12 Other Language Changes

Here are all of the changes that Python 2.5 makes to the core Python language.

  • The dict type has a new hook for letting subclasses provide a default value when a key isn't contained in the dictionary. When a key isn't found, the dictionary's __missing__(key) method will be called. This hook is used to implement the new defaultdict class in the collections module. The following example defines a dictionary that returns zero for any missing key:

    class zerodict (dict):
        def __missing__ (self, key):
            return 0
    
    d = zerodict({1:1, 2:2})
    print d[1], d[2]   # Prints 1 2
    print d[3], d[4]   # Prints 0 0
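
    As a further sketch, the difference between a bare __missing__ hook and defaultdict shows up in what is stored after a lookup. ZeroDict is a hypothetical name mirroring the example above; the comparison relies only on the fact that get() and the in operator do not consult __missing__.

```python
from collections import defaultdict

class ZeroDict(dict):
    # Hypothetical name; same idea as the zerodict example above.
    def __missing__(self, key):
        return 0              # returned to the caller, but not stored

z = ZeroDict()
z_value = z['spam']           # d[key] consults __missing__
z_stored = 'spam' in z        # the lookup inserted nothing
z_get = z.get('spam')         # get() does not call __missing__

dd = defaultdict(int)         # int() produces the default value 0
dd_value = dd['spam']
dd_stored = 'spam' in dd      # defaultdict stores the default it returns
```

    After these lookups z is still empty, while dd holds the key 'spam'.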
    

  • Both 8-bit and Unicode strings have new partition(sep) and rpartition(sep) methods that simplify a common use case.

    The find(sep) method is often used to get an index that is then used to slice the string and obtain the pieces before and after the separator. partition(sep) condenses this pattern into a single method call that returns a 3-tuple containing the substring before the separator, the separator itself, and the substring after the separator. If the separator isn't found, the first element of the tuple is the entire string and the other two elements are empty. rpartition(sep) also returns a 3-tuple but starts searching from the end of the string; the "r" stands for "reverse". If rpartition() doesn't find the separator, the first two elements are empty and the last element is the entire string.

    Some examples:

    >>> ('http://www.python.org').partition('://')
    ('http', '://', 'www.python.org')
    >>> ('file:/usr/share/doc/index.html').partition('://')
    ('file:/usr/share/doc/index.html', '', '')
    >>> (u'Subject: a quick question').partition(':')
    (u'Subject', u':', u' a quick question')
    >>> 'www.python.org'.rpartition('.')
    ('www.python', '.', 'org')
    >>> 'www.python.org'.rpartition(':')
    ('', '', 'www.python.org')
    

    (Implemented by Fredrik Lundh following a suggestion by Raymond Hettinger.)
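
    To make the comparison with the find()-based idiom concrete, here is a minimal sketch. The helper split_with_find and the sample strings are invented for illustration; the helper only imitates what partition() already does.

```python
def split_with_find(text, sep):
    # Older idiom: locate the separator, then slice around it.
    index = text.find(sep)
    if index == -1:
        return (text, '', '')     # imitate partition() when sep is absent
    return (text[:index], sep, text[index + len(sep):])

header = 'Subject: a quick question'
by_find = split_with_find(header, ':')
by_partition = header.partition(':')

# Splitting a 'name=value' setting at the first '=' only.
name, _, value = 'path=/usr/bin=old'.partition('=')
```

    Both calls give the same 3-tuple, and the tuple unpacking in the last line works whether or not the separator is present.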

  • The startswith() and endswith() methods of string types now accept tuples of strings to check for.

    def is_image_file (filename):
        return filename.endswith(('.gif', '.jpg', '.tiff'))
    

    (Implemented by Georg Brandl following a suggestion by Tom Lynn.)
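
    As a rough sketch of what the tuple form replaces, the two functions below are meant to be equivalent. The suffix list and the helper names are assumptions made for this illustration.

```python
IMAGE_SUFFIXES = ('.gif', '.jpg', '.tiff')

def is_image_file_loop(filename):
    # What had to be written before tuples were accepted.
    for suffix in IMAGE_SUFFIXES:
        if filename.endswith(suffix):
            return True
    return False

def is_image_file(filename):
    # Tuple form: true if any one of the suffixes matches.
    return filename.endswith(IMAGE_SUFFIXES)

def is_web_url(text):
    # startswith() takes a tuple in the same way.
    return text.startswith(('http://', 'https://'))
```

    The argument has to be a tuple; passing a list of suffixes raises TypeError.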

  • The min() and max() built-in functions gained a key keyword parameter analogous to the key argument for sort(). This parameter supplies a function that takes a single argument and is called for every value in the iterable; min()/max() will return the element with the smallest/largest return value from this function. For example, to find the longest string in a list, you can do:

    L = ['medium', 'longest', 'short']
    # Prints 'longest'
    print max(L, key=len)              
    # Prints 'short', because lexicographically 'short' has the largest value
    print max(L)
    

    (Contributed by Steven Bethard and Raymond Hettinger.)
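
    The key parameter is also handy for picking records by one field. A small sketch with made-up inventory data; quantity is a hypothetical key function:

```python
inventory = [('apple', 30), ('kiwi', 12), ('fig', 7)]

def quantity(item):
    # Key function: called once per element, returns the value to compare.
    return item[1]

scarcest = min(inventory, key=quantity)
most_stocked = max(inventory, key=quantity)

# Without key, tuples compare element by element, so the names decide.
first_by_name = min(inventory)
```

    Note that the whole element is returned, not the value computed by the key function.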

  • Two new built-in functions, any() and all(), evaluate whether an iterator contains any true or false values. any() returns True if any value returned by the iterator is true; otherwise it will return False. all() returns True only if all of the values returned by the iterator evaluate as true. (Suggested by Guido van Rossum, and implemented by Raymond Hettinger.)
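
    A short sketch of both functions, including the empty case and the fact that they stop consuming values once the answer is known. The tracked generator is a hypothetical helper that records what has been pulled from it.

```python
numbers = [2, 4, 6, 7]
has_odd = any(n % 2 for n in numbers)        # 7 % 2 is 1, which is true
all_even = all(n % 2 == 0 for n in numbers)  # fails on 7

# Edge cases for an empty iterable.
any_empty = any([])
all_empty = all([])

# Record which values are actually consumed.
consumed = []
def tracked(values):
    for v in values:
        consumed.append(v)
        yield v

found = any(v > 1 for v in tracked([1, 2, 3]))
```

    Here any() returns as soon as it sees 2, so the value 3 is never requested from the generator.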

  • The result of a class's __hash__() method can now be either a long integer or a regular integer. If a long integer is returned, the hash of that value is taken. In earlier versions the hash value was required to be a regular integer, but in 2.5 the id() built-in was changed to always return non-negative numbers, and users often seem to use id(self) in __hash__() methods (though this is discouraged).
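
    A minimal sketch of a __hash__() method that returns a value too large for a machine word. Token is a hypothetical class, and the final comparison assumes CPython's behavior of hashing the returned integer.

```python
class Token(object):
    # Hypothetical class whose hash may exceed a machine word.
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return isinstance(other, Token) and self.value == other.value
    def __hash__(self):
        return self.value         # may be a very large integer

big = Token(2 ** 100)
lookup = {big: 'found'}
result = lookup[Token(2 ** 100)]  # an equal object finds the same entry

# The oversized result is itself hashed to get the final value.
same_as_int_hash = hash(big) == hash(2 ** 100)
```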

  • ASCII is now the default encoding for modules. It's now a syntax error if a module contains string literals with 8-bit characters but doesn't have an encoding declaration. In Python 2.4 this triggered a warning, not a syntax error. See PEP 263 for how to declare a module's encoding; for example, you might add a line like this near the top of the source file:

    # -*- coding: latin1 -*-
    

  • A new warning, UnicodeWarning, is triggered when you attempt to compare a Unicode string and an 8-bit string that can't be converted to Unicode using the default ASCII encoding. The result of the comparison is false:

    >>> chr(128) == unichr(128)   # Can't convert chr(128) to Unicode
    __main__:1: UnicodeWarning: Unicode equal comparison failed 
      to convert both arguments to Unicode - interpreting them 
      as being unequal
    False
    >>> chr(127) == unichr(127)   # chr(127) can be converted
    True
    

    Previously this would raise a UnicodeDecodeError exception, but in 2.5 this could result in puzzling problems when accessing a dictionary. If you looked up unichr(128) and chr(128) was being used as a key, you'd get a UnicodeDecodeError exception. Other changes in 2.5 resulted in this exception being raised instead of suppressed by the code in dictobject.c that implements dictionaries.

    Raising an exception for such a comparison is strictly correct, but the change might have broken code, so instead UnicodeWarning was introduced.

    (Implemented by Marc-André Lemburg.)

  • One error that Python programmers sometimes make is forgetting to include an __init__.py module in a package directory. Debugging this mistake can be confusing, and usually requires running Python with the -v switch to log all the paths searched. In Python 2.5, a new ImportWarning warning is triggered when an import would have picked up a directory as a package but no __init__.py was found. This warning is silently ignored by default; provide the -Wd option when running the Python executable to display the warning message. (Implemented by Thomas Wouters.)
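
    Much the same effect as -Wd, restricted to this one category, can be had from inside a program through the warnings module. A minimal sketch; it only changes the filter and does not itself trigger the warning:

```python
import warnings

# Report ImportWarning instead of silently ignoring it.
warnings.simplefilter('default', ImportWarning)

# simplefilter() puts the new entry at the front of the filter list.
newest_filter = warnings.filters[0]
```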

  • The list of base classes in a class definition can now be empty. As an example, this is now legal:

    class C():
        pass
    
    (Implemented by Brett Cannon.)


12.1 Interactive Interpreter Changes

In the interactive interpreter, quit and exit have long been strings so that new users get a somewhat helpful message when they try to quit:

>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'

In Python 2.5, quit and exit are now objects that still produce string representations of themselves, but are also callable. Newbies who try quit() or exit() will now exit the interpreter as they expect. (Implemented by Georg Brandl.)
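
The idea can be sketched with an object that has both a helpful repr and a __call__ method. The Quitter class and its message text below are an illustrative stand-in, not the interpreter's actual implementation.

```python
class Quitter(object):
    # Illustrative stand-in, not the interpreter's actual implementation.
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        # Shown when the bare name is typed at the prompt.
        return 'Use %s() or Ctrl-D (i.e. EOF) to exit' % self.name
    def __call__(self, code=None):
        # Calling the object ends the session.
        raise SystemExit(code)

quit_like = Quitter('quit')
message = repr(quit_like)

try:
    quit_like()
except SystemExit:
    exited = True
else:
    exited = False
```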

The Python executable now accepts the standard long options --help and --version; on Windows, it also accepts the /? option for displaying a help message. (Implemented by Georg Brandl.)


12.2 Optimizations

Several of the optimizations were developed at the NeedForSpeed sprint, an event held in Reykjavik, Iceland, from May 21 to 28, 2006. The sprint focused on speed enhancements to the CPython implementation and was funded by EWT LLC with local support from CCP Games. The optimizations added at this sprint are specially marked in the following list.

  • When they were introduced in Python 2.4, the built-in set and frozenset types were built on top of Python's dictionary type. In 2.5 the internal data structure has been customized for implementing sets, and as a result sets will use a third less memory and are somewhat faster. (Implemented by Raymond Hettinger.)

  • The speed of some Unicode operations, such as finding substrings, string splitting, and character map encoding and decoding, has been improved. (Substring search and splitting improvements were added by Fredrik Lundh and Andrew Dalke at the NeedForSpeed sprint. Character maps were improved by Walter Dörwald and Martin von Löwis.)

  • The long(str, base) function is now faster on long digit strings because fewer intermediate results are calculated. The peak is for strings of around 800-1000 digits where the function is 6 times faster. (Contributed by Alan McIntyre and committed at the NeedForSpeed sprint.)

  • The struct module now compiles structure format strings into an internal representation and caches this representation, yielding a 20% speedup. (Contributed by Bob Ippolito at the NeedForSpeed sprint.)
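
    Code that packs the same layout many times can also hold on to a compiled format explicitly through the struct.Struct class. This is a sketch with an invented record layout (a 2-byte type code followed by a 4-byte length, little-endian), not a measurement of the speedup:

```python
import struct

# Hypothetical record layout: unsigned short + unsigned int, little-endian.
HEADER = struct.Struct('<HI')

packed = HEADER.pack(1, 2)
unpacked = HEADER.unpack(packed)

# The module-level functions produce the same bytes for the same format.
same_bytes = struct.pack('<HI', 1, 2) == packed
```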

  • The re module got a 1 or 2% speedup by switching to Python's allocator functions instead of the system's malloc() and free(). (Contributed by Jack Diederich at the NeedForSpeed sprint.)

  • The code generator's peephole optimizer now performs simple constant folding in expressions. If you write something like a = 2+3, the code generator will do the arithmetic and produce code corresponding to a = 5. (Proposed and implemented by Raymond Hettinger.)
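
    One way to observe the folding is to look at the constants stored in a function's code object. A small sketch; seconds_per_day is a made-up function, and the attribute lookup is written to cope with both the func_code and __code__ spellings:

```python
def seconds_per_day():
    return 24 * 60 * 60

# Python 2.5 spells the attribute func_code; later versions use __code__.
code = getattr(seconds_per_day, '__code__', None) or seconds_per_day.func_code

# If the product was computed at compile time, it is stored as a constant.
folded = 86400 in code.co_consts
```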

  • Function calls are now faster because code objects now keep the most recently finished frame (a "zombie frame") in an internal field of the code object, reusing it the next time the code object is invoked. (Original patch by Michael Hudson, modified by Armin Rigo and Richard Jones; committed at the NeedForSpeed sprint.)

    Frame objects are also slightly smaller, which may improve cache locality and reduce memory usage a bit. (Contributed by Neal Norwitz.)

  • Python's built-in exceptions are now new-style classes, a change that speeds up instantiation considerably. Exception handling in Python 2.5 is therefore about 30% faster than in 2.4. (Contributed by Richard Jones, Georg Brandl and Sean Reifschneider at the NeedForSpeed sprint.)
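
    The new-style status of the exception classes can be checked directly. A brief sketch; ConfigError is a hypothetical application exception:

```python
# New-style classes are instances of type and inherit from object.
is_new_style = isinstance(ValueError, type)
inherits_object = issubclass(ValueError, object)

class ConfigError(Exception):
    # Hypothetical application error; it is new-style by inheritance.
    pass

mro_names = [cls.__name__ for cls in ConfigError.__mro__]
```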

  • Importing now caches the paths tried, recording whether they exist or not so that the interpreter makes fewer open() and stat() calls on startup. (Contributed by Martin von Löwis and Georg Brandl.)
