10 PEP 353: Using ssize_t as the index type
A wide-ranging change to Python's C API, using a new Py_ssize_t type definition instead of int, will permit the interpreter to handle more data on 64-bit platforms. This change doesn't affect Python's capacity on 32-bit platforms.
Various pieces of the Python interpreter used C's int type to store sizes or counts; for example, the number of items in a list or tuple was stored in an int. The C compilers for most 64-bit platforms still define int as a 32-bit type, so lists could only hold up to 2**31 - 1 = 2147483647 items. (There are actually a few different programming models that 64-bit C compilers can use; see http://www.unix.org/version2/whatsnew/lp64_wp.html for a discussion. The most commonly available model leaves int as 32 bits.)
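The limit quoted above follows directly from the width of a signed integer type. The C sketch below is illustrative only and is not taken from the interpreter's sources; the helper names max_signed_count() and report_sizes() are invented for this example. It computes the largest count a signed type of a given width can hold and prints the widths the local compiler actually uses.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Largest value of a signed two's-complement type that is `bits` wide,
   i.e. 2**(bits - 1) - 1.  (Hypothetical helper, for illustration.) */
static long long
max_signed_count(int bits)
{
    assert(bits >= 2 && bits <= 64);
    return (long long)((1ULL << (bits - 1)) - 1ULL);
}

/* Print the widths chosen by the local compiler.  On the common LP64
   model, int is 32 bits while pointers and size_t are 64 bits. */
static void
report_sizes(void)
{
    printf("int:    %zu bits, INT_MAX = %d\n",
           sizeof(int) * CHAR_BIT, INT_MAX);
    printf("void *: %zu bits\n", sizeof(void *) * CHAR_BIT);
    printf("size_t: %zu bits\n", sizeof(size_t) * CHAR_BIT);
    printf("32-bit signed limit: %lld\n", max_signed_count(32));
}
```

Calling report_sizes() from a small main() on a given platform shows whether int is narrower than a pointer there, which is the situation PEP 353 addresses.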
A limit of 2147483647 items doesn't really matter on a 32-bit platform because you'll run out of memory before hitting the length limit. Each list item requires space for a pointer, which is 4 bytes, plus space for a PyObject representing the item. 2147483647*4 is already more bytes than a 32-bit address space can contain.
It's possible to address that much memory on a 64-bit platform, however. The pointers for a list that size would only require 16 GiB of space, so it's not unreasonable that Python programmers might construct lists that large. Therefore, the Python interpreter had to be changed to use some type other than int, and this will be a 64-bit type on 64-bit platforms. The change will cause incompatibilities on 64-bit machines, so it was deemed worth making the transition now, while the number of 64-bit users is still relatively small. (In 5 or 10 years, we may all be on 64-bit machines, and the transition would be more painful then.)
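The memory figures in the two paragraphs above can be checked with a few lines of C. This is a sketch for verification only; the names MAX_INT_ITEMS, ADDRESS_SPACE_32, GIB and pointer_array_bytes() are invented here and do not appear in Python's sources. It counts only the item pointers, not the objects they refer to.

```c
#include <assert.h>
#include <stdint.h>

/* Largest item count an int-based length can express: 2**31 - 1. */
#define MAX_INT_ITEMS    UINT64_C(2147483647)
/* Whole address space of a 32-bit process: 2**32 bytes (4 GiB). */
#define ADDRESS_SPACE_32 (UINT64_C(1) << 32)
/* One gibibyte: 2**30 bytes. */
#define GIB              (UINT64_C(1) << 30)

/* Bytes needed just for the item pointers of a list with `items`
   entries.  (Hypothetical helper, for illustration.) */
static uint64_t
pointer_array_bytes(uint64_t items, uint64_t pointer_size)
{
    /* Guard against 64-bit overflow in the multiplication. */
    assert(pointer_size == 0 || items <= UINT64_MAX / pointer_size);
    return items * pointer_size;
}
```

With 4-byte pointers the result is 8589934588 bytes, about twice the 4294967296 bytes a 32-bit process can address. With 8-byte pointers it is 17179869176 bytes, which is 8 bytes short of 16 GiB.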
This change most strongly affects authors of C extension modules. Python strings and container types such as lists and tuples now use Py_ssize_t to store their size. Functions such as PyList_Size() now return Py_ssize_t. Code in extension modules may therefore need to have some variables changed to Py_ssize_t.
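One practical consequence for extension code is that a size returned as Py_ssize_t may no longer fit in an int variable. The sketch below is a hedged illustration that deliberately avoids Python.h: ssize_stand_in is a hypothetical stand-in for Py_ssize_t (assumed here to be a signed type as wide as a pointer, approximated with ptrdiff_t), and narrow_to_int() is an invented helper, not part of the C API. It shows a range check before a size is handed to older code that still expects an int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for Py_ssize_t, so the sketch builds without
   Python.h.  Assumption: a signed type as wide as a pointer. */
typedef ptrdiff_t ssize_stand_in;

/* Store n in *out and return 1 if it fits in an int; otherwise leave
   *out untouched and return 0.  (Invented helper, not a C API call.) */
static int
narrow_to_int(ssize_stand_in n, int *out)
{
    assert(out != NULL);
    if (n < INT_MIN || n > INT_MAX)
        return 0;
    *out = (int)n;
    return 1;
}
```

Where possible, changing the receiving variable to Py_ssize_t is simpler than narrowing; a check like this is only useful at boundaries with code that cannot be changed.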
The PyArg_ParseTuple() and Py_BuildValue() functions have a new conversion code, "n", for Py_ssize_t. PyArg_ParseTuple()'s "s#" and "t#" codes still output an int length by default, but you can define the macro PY_SSIZE_T_CLEAN before including Python.h to make them output Py_ssize_t instead.
PEP 353 has a section on conversion guidelines that extension authors should read to learn about supporting 64-bit platforms.
See Also:
- PEP 353, Using ssize_t as the index type: PEP written and implemented by Martin von Löwis.