Porting Cython code to PyPy
===========================

Cython has basic support for cpyext, the layer in
`PyPy <https://pypy.org/>`_ that emulates CPython's C-API.  This is
achieved by making the generated C code adapt at C compile time, so
the generated code will compile in both CPython and PyPy unchanged.

However, beyond what Cython can cover and adapt internally, the cpyext
C-API emulation differs from the real C-API in CPython in some ways
that have a visible impact on user code.  This page lists the major
differences and ways to deal with them in order to write Cython code
that works in both CPython and PyPy.


Reference counts
----------------

A general design difference in PyPy is that the runtime does not use
reference counting internally but always relies on a garbage collector.
Reference counting is only emulated at the cpyext layer by counting
references being held in C space.  This implies that the reference count
in PyPy is generally different from that in CPython because it does not
count any references held in Python space.

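As a minimal sketch of what this means for user code, the following pure
Python helper (which is also valid Cython) treats reference counts as
diagnostic information only and never bases program logic on them.  The
name ``describe_refcount`` is hypothetical, and the assumption that
``sys.getrefcount()`` may be missing on some runtimes should be checked
against the PyPy version in use.

.. code-block:: python

    import platform
    import sys


    def describe_refcount(obj):
        """Return (implementation name, refcount or None), for diagnostics only."""
        # Assumption: some runtimes may not provide sys.getrefcount() at all.
        getrefcount = getattr(sys, "getrefcount", None)
        count = getrefcount(obj) if getrefcount is not None else None
        return platform.python_implementation(), count
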

Object lifetime
---------------

As a direct consequence of the different garbage collection characteristics,
objects may reach the end of their lifetime at other points than in
CPython.  Special care therefore has to be taken when objects are expected
to have died in CPython but may still be alive in PyPy.  Specifically, a
deallocator method of an extension type (``__dealloc__()``) may get called
at a much later point than in CPython, triggered by memory getting tighter
rather than by objects dying.

If the point in the code is known when an object is supposed to die (e.g.
when it is tied to another object or to the execution time of a function),
it is worth considering whether it can be invalidated and cleaned up
manually at that point, rather than relying on a deallocator.

As a side effect, this can sometimes even lead to a better code design,
e.g. when context managers can be used together with the ``with`` statement.

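The manual cleanup described above could look roughly like the following
sketch, written in pure Python so that it also compiles as Cython code.
The ``Resource`` class and the ``use_resource()`` function are hypothetical
names, and the actual release step is left as a comment.  In an extension
type, ``__dealloc__()`` could additionally perform the same cleanup as a
last resort.

.. code-block:: python

    class Resource:
        """Hypothetical wrapper around some external resource."""

        def __init__(self, name):
            self.name = name
            self.closed = False

        def close(self):
            # Idempotent, so a later second call is harmless.
            if not self.closed:
                self.closed = True
                # Release the underlying resource here.

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            self.close()
            return False  # do not swallow exceptions


    def use_resource():
        # Cleanup happens when the block exits, not when the GC runs.
        with Resource("example") as res:
            assert not res.closed
        return res.closed
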

Borrowed references and data pointers
-------------------------------------

The memory management in PyPy is allowed to move objects around in memory.
The C-API layer is only an indirect view on PyPy objects and often replicates
data or state into C space that is then tied to the lifetime of a C-API
object rather than the underlying PyPy object.  It is important to understand
that these two objects are separate things in cpyext.

The effect can be that when data pointers or borrowed references are used,
and the owning object is no longer directly referenced from C space, the
reference or data pointer may become invalid at some point, even if the
object itself is still alive.  As opposed to CPython, it is not enough to
keep the reference to the object alive in a list (or other Python container),
because the contents of such a container are only managed in Python space
and thus only reference the PyPy object.  A reference in a Python container
will not keep the C-API view on it alive.  Entries in a Python class dict
will not work either, for the same reason.

One of the more visible places where this may happen is when accessing the
:c:type:`char*` buffer of a byte string.  In PyPy, this will only work as
long as the Cython code holds a direct reference to the byte string object
itself.

Another case is the direct use of CPython C-API functions that return
borrowed references, e.g. :c:func:`PyTuple_GET_ITEM()` and similar functions,
but also some functions that return borrowed references to built-in modules or
low-level objects of the runtime environment.  The GIL in PyPy only guarantees
that the borrowed reference stays valid up to the next call into PyPy (or
its C-API), but not necessarily longer.

When accessing the internals of Python objects or using borrowed references
longer than up to the next call into PyPy, including reference counting or
anything that frees the GIL, it is therefore required to additionally keep
direct owned references to these objects alive in C space, e.g. in local
variables in a function or in the attributes of an extension type.

When in doubt, avoid using C-API functions that return borrowed references,
or convert the borrowed reference into an owned one by explicitly surrounding
its usage with a pair of calls: :c:func:`Py_INCREF()` when getting the
reference and :c:func:`Py_DECREF()` when done with it.


Builtin types, slots and fields
-------------------------------

The following builtin types are not currently available in cpyext in
the form of their C level representation: :c:type:`PyComplexObject`,
:c:type:`PyFloatObject` and :c:type:`PyBoolObject`.

Many of the type slot functions of builtin types are not initialised
in cpyext and can therefore not be used directly.

Similarly, almost none of the implementation-specific struct fields of
builtin types are exposed at the C level, such as the ``ob_digit`` field
of :c:type:`PyLongObject` or the ``allocated`` field of the
:c:type:`PyListObject` struct.  Although the ``ob_size`` field of
containers (used by the :c:func:`Py_SIZE()` macro) is available, it is
not guaranteed to be accurate.

It is best not to access any of these struct fields and slots, and to
use the normal Python types and the normal Python protocols for object
operations instead.  Cython will map them to an appropriate usage of
the C-API in both CPython and cpyext.

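As an illustration of the advice above, the following sketch uses only
protocol-level operations on a list of numbers.  The function name
``summarize`` is hypothetical, and the comments merely name the low-level
accessors that such code avoids; which C-API calls Cython actually
generates for each line is left to Cython.

.. code-block:: python

    def summarize(items):
        """Return (length, first item or None, float total) for a list of numbers."""
        n = len(items)                   # instead of Py_SIZE() / ob_size
        first = items[0] if n else None  # instead of a borrowed-reference macro
        total = 0.0
        for value in items:              # normal iteration protocol
            total += float(value)        # instead of reading PyFloatObject fields
        return n, first, total
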

GIL handling
------------

Currently, the GIL handling function :c:func:`PyGILState_Ensure` is not
re-entrant in PyPy and deadlocks when called a second time while the GIL
is already held.  This means that code that tries to acquire the GIL
"just in case", because it might be called with or without the GIL, will
not work as expected in PyPy.
See `PyGILState_Ensure should not deadlock if GIL already held
<https://bitbucket.org/pypy/pypy/issues/1778>`_.


Efficiency
----------

Simple functions and especially macros that are used for speed in CPython
may exhibit substantially different performance characteristics in cpyext.

Functions returning borrowed references were already mentioned as requiring
special care, but they also induce substantially more runtime overhead because
they often create weak references in PyPy where they only return a plain
pointer in CPython.  A visible example is :c:func:`PyTuple_GET_ITEM()`.

Some more high-level functions may also show entirely different performance
characteristics, e.g. :c:func:`PyDict_Next()` for dict iteration.  While
it is the fastest way to iterate over a dict in CPython, with linear time
complexity and low overhead, it currently has quadratic runtime in PyPy
because it maps to normal dict iteration, which cannot keep track of the
current position between two calls and thus needs to restart the iteration
on each call.

The general advice applies here even more than in CPython: it is always
best to rely on Cython generating appropriately adapted C-API handling code
for you rather than using the C-API directly, unless you really know what
you are doing.  And if you find a better way of doing something in PyPy and
cpyext than Cython currently does, it's best to fix Cython for everyone's
benefit.

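For the dict iteration case mentioned above, a minimal sketch of the
recommended style is a plain Python loop over ``items()`` instead of a
hand-written :c:func:`PyDict_Next()` loop.  The function name ``invert``
is hypothetical; the choice of C-API calls behind the loop is left to
Cython.

.. code-block:: python

    def invert(mapping):
        """Build a value -> key dict using plain dict iteration."""
        inverted = {}
        for key, value in mapping.items():  # instead of a PyDict_Next() loop
            inverted[value] = key
        return inverted
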

Known problems
--------------

* As of PyPy 1.9, subtyping builtin types can result in infinite recursion
  on method calls in some rare cases.

* Docstrings of special methods are not propagated to Python space.

* The Python 3.x adaptations in pypy3 only slowly start to include the
  C-API, so more incompatibilities can be expected there.


Bugs and crashes
----------------

The cpyext implementation in PyPy is much younger and substantially less
mature than the well-tested C-API and its underlying native implementation
in CPython.  This should be remembered when running into crashes, as the
problem may not always be in your code or in Cython.  Also, PyPy and its
cpyext implementation are less easy to debug at the C level than CPython
and Cython, simply because they were not designed for it.