Porting Cython code to PyPy
===========================

Cython has basic support for cpyext, the layer in
`PyPy <http://pypy.org/>`_ that emulates CPython's C-API. This is
achieved by making the generated C code adapt at C compile time, so
the generated code will compile in both CPython and PyPy unchanged.

However, beyond what Cython can cover and adapt internally, the cpyext
C-API emulation involves some differences to the real C-API in CPython
that have a visible impact on user code. This page lists major
differences and ways to deal with them in order to write Cython code
that works in both CPython and PyPy.


Reference counts
----------------

A general design difference in PyPy is that the runtime does not use
reference counting internally but always a garbage collector. Reference
counting is only emulated at the cpyext layer by counting references
being held in C space. This implies that the reference count in PyPy
is generally different from that in CPython because it does not count
any references held in Python space.


Object lifetime
---------------

As a direct consequence of the different garbage collection characteristics,
objects may see the end of their lifetime at other points than in
CPython. Special care therefore has to be taken when objects are expected
to have died in CPython but may not in PyPy. Specifically, a deallocator
method of an extension type (``__dealloc__()``) may get called at a much
later point than in CPython, triggered rather by memory getting tighter
than by objects dying.

If the point in the code is known when an object is supposed to die (e.g.
when it is tied to another object or to the execution time of a function),
it is worth considering if it can be invalidated and cleaned up manually at
that point, rather than relying on a deallocator.

As a side effect, this can sometimes even lead to a better code design,
e.g.
when context managers can be used together with the ``with`` statement.


Borrowed references and data pointers
-------------------------------------

The memory management in PyPy is allowed to move objects around in memory.
The C-API layer is only an indirect view on PyPy objects and often replicates
data or state into C space that is then tied to the lifetime of a C-API
object rather than the underlying PyPy object. It is important to understand
that these two objects are separate things in cpyext.

The effect can be that when data pointers or borrowed references are used,
and the owning object is no longer directly referenced from C space, the
reference or data pointer may become invalid at some point, even if the
object itself is still alive. As opposed to CPython, it is not enough to
keep the reference to the object alive in a list (or other Python container),
because the contents of those are only managed in Python space and thus only
reference the PyPy object. A reference in a Python container will not keep
the C-API view on it alive. Entries in a Python class dict will obviously
not work either.

One of the more visible places where this may happen is when accessing the
:c:type:`char*` buffer of a byte string. In PyPy, this will only work as
long as the Cython code holds a direct reference to the byte string object
itself.

Another point is when CPython C-API functions are used directly that return
borrowed references, e.g. :c:func:`PyTuple_GET_ITEM()` and similar functions,
but also some functions that return borrowed references to built-in modules or
low-level objects of the runtime environment. The GIL in PyPy only guarantees
that the borrowed reference stays valid up to the next call into PyPy (or
its C-API), but not necessarily longer.
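
As a minimal sketch of the byte string case described above, the following
Cython function keeps the byte string in a typed local variable for as long
as the :c:type:`char*` pointer into its buffer is in use. The function name
``first_byte`` and its argument are hypothetical and only serve as an
illustration of the pattern.

.. code-block:: cython

    def first_byte(text):
        # Hypothetical helper, for illustration only.
        # Keep an owned reference to the byte string in a local variable ...
        cdef bytes data = text.encode("utf-8")
        # ... and only then take the pointer into its buffer.
        cdef char* buf = data
        if len(data) == 0:
            return None
        # ``data`` is still referenced here, so ``buf`` can be read safely.
        return buf[0]

When the pointer has to outlive a single function call, storing the byte
string in an attribute of an extension type next to the pointer should serve
the same purpose.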

When accessing the internals of Python objects or using borrowed references
longer than up to the next call into PyPy, including reference counting or
anything that frees the GIL, it is therefore required to additionally keep
direct owned references to these objects alive in C space, e.g. in local
variables in a function or in the attributes of an extension type.

When in doubt, avoid using C-API functions that return borrowed references,
or surround the usage of a borrowed reference explicitly by a pair of calls
to :c:func:`Py_INCREF()` when getting the reference and :c:func:`Py_DECREF()`
when done with it to convert it into an owned reference.


Builtin types, slots and fields
-------------------------------

The following builtin types are not currently available in cpyext in
the form of their C level representation: :c:type:`PyComplexObject`,
:c:type:`PyFloatObject` and :c:type:`PyBoolObject`.

Many of the type slot functions of builtin types are not initialised
in cpyext and can therefore not be used directly.

Similarly, almost none of the implementation-specific struct fields of
builtin types are exposed at the C level, such as the ``ob_digit`` field
of :c:type:`PyLongObject` or the ``allocated`` field of the
:c:type:`PyListObject` struct etc. Although the ``ob_size`` field of
containers (used by the :c:func:`Py_SIZE()` macro) is available, it is
not guaranteed to be accurate.

It is best not to access any of these struct fields and slots and to
use the normal Python types instead as well as the normal Python
protocols for object operations. Cython will map them to an appropriate
usage of the C-API in both CPython and cpyext.


GIL handling
------------

Currently, the GIL handling function :c:func:`PyGILState_Ensure` is not
re-entrant in PyPy and deadlocks when called twice.
This means that
code that tries to acquire the GIL "just in case", because it might be
called with or without the GIL, will not work as expected in PyPy.
See `PyGILState_Ensure should not deadlock if GIL already held
<https://bitbucket.org/pypy/pypy/issues/1778>`_.


Efficiency
----------

Simple functions and especially macros that are used for speed in CPython
may exhibit substantially different performance characteristics in cpyext.

Functions returning borrowed references were already mentioned as requiring
special care, but they also induce substantially more runtime overhead because
they often create weak references in PyPy where they only return a plain
pointer in CPython. A visible example is :c:func:`PyTuple_GET_ITEM()`.

Some more high-level functions may also show entirely different performance
characteristics, e.g. :c:func:`PyDict_Next()` for dict iteration. While
being the fastest way to iterate over a dict in CPython, having linear time
complexity and a low overhead, it currently has quadratic runtime in PyPy
because it maps to normal dict iteration, which cannot keep track of the
current position between two calls and thus needs to restart the iteration
on each call.

The general advice applies here even more than in CPython, that it is always
better to rely on Cython generating appropriately adapted C-API handling code
for you than to use the C-API directly - unless you really know what you are
doing. And if you find a better way of doing something in PyPy and cpyext
than Cython currently does, it's best to fix Cython for everyone's benefit.


Known problems
--------------

* As of PyPy 1.9, subtyping builtin types can result in infinite recursion
  on method calls in some rare cases.

* Docstrings of special methods are not propagated to Python space.

* The Python 3.x adaptations in pypy3 only slowly start to include the
  C-API, so more incompatibilities can be expected there.


Bugs and crashes
----------------

The cpyext implementation in PyPy is much younger and substantially less
mature than the well tested C-API and its underlying native implementation
in CPython. This should be remembered when running into crashes, as the
problem may not always be in your code or in Cython. Also, PyPy and its
cpyext implementation are less easy to debug at the C level than CPython
and Cython, simply because they were not designed for it.