=======================
Polymorphic dispatching
=======================

Functions compiled using :func:`~numba.jit` or :func:`~numba.vectorize`
are open-ended: they can be called with many different input types and
have to select (possibly compile on-the-fly) the right low-level
specialization. We hereby explain how this mechanism is implemented.


Requirements
============

JIT-compiled functions can take several arguments and each of them is
taken into account when selecting a specialization. Thus it is a
form of multiple dispatch, more complex than single dispatch.

Each argument weighs in the selection based on its :ref:`Numba type
<numba-types>`. Numba types are often more granular than Python types:
for example, Numba types Numpy arrays differently depending on their
dimensionality and their layout (C-contiguous, etc.).

Once a Numba type is inferred for each argument, a specialization must
be chosen amongst the available ones; or, if no suitable specialization
is found, a new one must be compiled. This is not a trivial decision:
there can be multiple specializations compatible with a given concrete
signature (for example, say a two-argument function has compiled
specializations for ``(float64, float64)`` and ``(complex64, complex64)``,
and it is called with ``(float32, float32)``).

Therefore, there are two crucial steps in the dispatch mechanism:

1. infer the Numba types of the concrete arguments
2. select the best available specialization (or choose to compile a new one)
   for the inferred Numba types

Compile-time vs. run-time
-------------------------

This document discusses dispatching when it is done at runtime, i.e.
when a JIT-compiled function is called from pure Python. In that context,
performance is important. To stay in the realm of normal function call
overhead in Python, the overhead of dispatching should stay under a
microsecond.
Of course, *the faster the better*...

When a JIT-compiled function is called from another JIT-compiled
function (in :term:`nopython mode`), the polymorphism is resolved at
compile-time, using a non-performance critical mechanism, bearing zero
runtime performance overhead.

.. note::
   In practice, the performance-critical parts described here are coded in C.


Type resolution
===============

The first step is therefore to infer, at call-time, a Numba type for each
of the function's concrete arguments. Given the finer granularity of
Numba types compared to Python types, one cannot simply look up an object's
class and key a dictionary with it to obtain the corresponding Numba type.

Instead, there is a machinery to inspect the object and, based on its
Python type, query various properties to infer the appropriate Numba
type. This can be more or less complex: for example, a Python ``int``
argument will always infer to a Numba ``intp`` (a pointer-sized integer),
but a Python ``tuple`` argument can infer to multiple Numba types (depending
on the tuple's size and the concrete type of each of its elements).

The Numba type system is high-level and written in pure Python; there is
a pure Python machinery, based on a generic function, to do said inference
(in :mod:`numba.typing.typeof`). That machinery is used for compile-time
inference, e.g. on constants. Unfortunately, it is too slow for run-time
value-based dispatching. It is only used as a fallback for rarely used
(or difficult to infer) types, and exhibits multiple-microsecond overhead.

Typecodes
---------

The Numba type system is really too high-level to be manipulated efficiently
from C code. Therefore, the C dispatching layer uses another representation
based on integer typecodes. Each Numba type gets a unique integer typecode
when constructed; also, an interning system ensures no two instances of the
same type are created.
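Such an interning scheme can be sketched in pure Python. This is an
illustrative model, not Numba's actual implementation: the ``Type`` class,
the description tuples, and the function names below are invented for the
example.

.. code-block:: python

   # Illustrative sketch (not Numba's real code): interning type instances
   # so that each distinct type exists once and owns one integer typecode.

   _type_cache = {}   # description tuple -> unique Type instance
   _next_code = 0

   class Type:
       """A toy type keyed by an immutable description tuple,
       e.g. ('array', 'int32', 2, 'C') for a C-contiguous 2D int32 array."""
       def __init__(self, descr, code):
           self.descr = descr
           self.code = code   # unique integer typecode

   def intern_type(descr):
       """Return the unique Type for *descr*, creating it on first use."""
       global _next_code
       ty = _type_cache.get(descr)
       if ty is None:
           ty = Type(descr, _next_code)
           _next_code += 1
           _type_cache[descr] = ty
       return ty

   a = intern_type(('array', 'int32', 2, 'C'))
   b = intern_type(('array', 'int32', 2, 'C'))
   assert a is b            # interned: the very same instance
   assert a.code == b.code  # hence a stable integer typecode

Because interning guarantees one instance per type, the integer typecode is
a faithful stand-in for the type itself.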
The dispatching layer is therefore able to *eschew*
the overhead of the Numba type system by working with simple integer
typecodes, amenable to well-known optimizations (fast hash tables, etc.).

The goal of the type resolution step becomes: infer a Numba *typecode*
for each of the function's concrete arguments. Ideally, it doesn't deal
with Numba types anymore...

Hard-coded fast paths
---------------------

While eschewing the abstraction and object-orientation overhead of the type
system, the integer typecodes still have the same conceptual complexity.
Therefore, an important technique to speed up inference is to first go
through checks for the most important types, and hard-code a fast resolution
for each of them.

Several types benefit from such an optimization, notably:

* basic Python scalars (``bool``, ``int``, ``float``, ``complex``);
* basic Numpy scalars (the various kinds of integer, floating-point,
  complex numbers);
* Numpy arrays of certain dimensionalities and basic element types.

Each of those fast paths ideally uses a hard-coded result value or a direct
table lookup after a few simple checks.

However, we can't apply that technique to all argument types; there would
be an explosion of ad-hoc internal caches, and it would become difficult to
maintain. Besides, the recursive application of hard-coded fast paths
would not necessarily combine into a low overhead (in the nested tuple
case, for example).

Fingerprint-based typecode cache
--------------------------------

For not-so-trivial types (imagine a tuple, or a Numpy ``datetime64`` array,
for example), the hard-coded fast paths don't match. Another, more generic
mechanism then kicks in.

The principle here is to examine each argument value, as the pure Python
machinery would do, and to describe its Numba type unambiguously.
The
difference is that *we don't actually compute a Numba type*. Instead, we
compute a simple bytestring, a low-level possible denotation of that
Numba type: a *fingerprint*. The fingerprint format is designed to be
short and extremely simple to compute from C code (in practice, it has
a bytecode-like format).

Once the fingerprint is computed, it is looked up in a cache mapping
fingerprints to typecodes. The cache is a hash table, and the lookup
is fast thanks to the fingerprints being generally very short (rarely
more than 20 bytes).

If the cache lookup fails, the typecode must first be computed using the
slow pure Python machinery. Luckily, this would only happen once: on
subsequent calls, the cached typecode would be returned for the given
fingerprint.

In rare cases, a fingerprint cannot be computed efficiently. This is
the case for some types which cannot be easily inspected from C: for
example ``cffi`` function pointers. Then, the slow pure Python machinery
is invoked at each function call with such an argument.

.. note::
   Two fingerprints may denote a single Numba type. This does not make
   the mechanism incorrect; it only creates more cache entries.


Summary
-------

Type resolution of a function argument involves the following mechanisms
in order:

* Try a few hard-coded fast paths, for common simple types.
* If the above failed, compute a fingerprint for the argument and look up
  its typecode in a cache.
* If all the above failed, invoke the pure Python machinery which will
  determine a Numba type for the argument (and look up its typecode).


Specialization selection
========================

At the previous step, an integer typecode has been determined for each
concrete argument to the JIT-compiled function.
Now it remains to match
that concrete signature against each of the available specializations for
the function. There can be three outcomes:

* There is a satisfying best match: the corresponding specialization
  is then invoked (it will handle argument unboxing and other details).
* There is a tie between two or more "best matches": an exception is raised,
  refusing to solve the ambiguity.
* There is no satisfying match: a new specialization is compiled, tailored
  for the concrete argument types that were inferred.

The selection works by looping over all available specializations, and
computing the compatibility of each concrete argument type with the
corresponding type in the specialization's intended signature. Specifically,
we are interested in:

1. Whether the concrete argument type is allowed to convert implicitly to
   the specialization's argument type;
2. If so, at what semantic (user-visible) cost the conversion comes.

Implicit conversion rules
-------------------------

There are five possible kinds of implicit conversion from a source type
to a destination type (note this is an asymmetric relationship):

1. *exact match*: the two types are identical; this is the ideal case,
   since the specialization would behave exactly as intended;
2. *same-kind promotion*: the two types belong to the same "kind" (for
   example ``int32`` and ``int64`` are two integer types), and the source
   type can be converted losslessly to the destination type (e.g. from
   ``int32`` to ``int64``, but not the reverse);
3. *safe conversion*: the two types belong to different kinds, but the
   source type can be reasonably converted to the destination type
   (e.g. from ``int32`` to ``float64``, but not the reverse);
4. *unsafe conversion*: a conversion is available from the source type
   to the destination type, but it may lose precision, magnitude, or
   another desirable quality.
5. *no conversion*: there is no correct or reasonably efficient way to
   convert between the two types (for example between an ``int64`` and a
   ``datetime64``, or a C-contiguous array and a Fortran-contiguous array).

When a specialization is examined, the latter two cases eliminate it from
the final choice: i.e. when at least one argument has *no conversion* or
only an *unsafe conversion* to the signature's argument type.

.. note::
   However, if the function is compiled with explicit signatures
   in the :func:`~numba.jit` call (and therefore it is not allowed to compile
   new specializations), *unsafe conversion* is allowed.

Candidates and best match
-------------------------

If a specialization is not eliminated by the rule above, it enters the
list of *candidates* for the final choice. Those candidates are ranked
by an ordered 4-tuple of integers: ``(number of unsafe conversions,
number of safe conversions, number of same-kind promotions, number of
exact matches)`` (note the sum of the tuple's elements is equal to the
number of arguments). The best match is then the first result in sorted
ascending order, thereby preferring exact matches over promotions,
promotions over safe conversions, and safe conversions over unsafe
conversions.

Implementation
--------------

The above-described mechanism works on integer typecodes, not on Numba
types. It uses an internal hash table storing the possible conversion
kind for each pair of compatible types. The internal hash table is in part
built at startup (for built-in trivial types such as ``int32``, ``int64``,
etc.), in part filled dynamically (for arbitrarily complex types such
as array types: for example to allow using a C-contiguous 2D array where
a function expects a non-contiguous 2D array).
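The elimination and ranking rules above can be modelled in a few lines of
pure Python. This is an illustrative sketch, not Numba's actual C
implementation: the conversion-table entries, the ``allow_unsafe`` flag
(standing in for the explicit-signatures case from the note above), and the
function names are all invented for the example.

.. code-block:: python

   # Conversion kinds, ordered from best to worst.
   EXACT, PROMOTE, SAFE, UNSAFE, NONE = range(5)

   # Hypothetical conversion table for a few scalar typecodes; a pair
   # absent from the table means "no conversion".
   CONVERSIONS = {
       ('int32', 'int64'): PROMOTE,    # same-kind, lossless
       ('int32', 'float64'): SAFE,     # cross-kind, reasonable
       ('float64', 'int32'): UNSAFE,   # may lose precision/magnitude
   }

   def conversion_kind(src, dst):
       if src == dst:
           return EXACT
       return CONVERSIONS.get((src, dst), NONE)

   def best_match(args, signatures, allow_unsafe=False):
       """Return the best signature for *args*; raise on tie or no match."""
       worst = UNSAFE if allow_unsafe else SAFE
       candidates = []
       for sig in signatures:
           kinds = [conversion_kind(a, f) for a, f in zip(args, sig)]
           if max(kinds) > worst:
               continue   # at least one argument can't convert acceptably
           # Rank key: (#unsafe, #safe, #promotions, #exact).  Ascending
           # sort prefers exact matches over promotions, promotions over
           # safe conversions, etc. (the counts sum to len(args)).
           key = (kinds.count(UNSAFE), kinds.count(SAFE),
                  kinds.count(PROMOTE), kinds.count(EXACT))
           candidates.append((key, sig))
       if not candidates:
           raise TypeError("no matching specialization")
       candidates.sort(key=lambda c: c[0])
       if len(candidates) > 1 and candidates[0][0] == candidates[1][0]:
           raise TypeError("ambiguous match")
       return candidates[0][1]

   sigs = [('float64', 'float64'), ('int64', 'int64')]
   # Same-kind promotion beats safe conversion:
   assert best_match(('int32', 'int32'), sigs) == ('int64', 'int64')

Note how, because the four counts always sum to the number of arguments,
a single ascending sort on the 4-tuple is enough to express the whole
preference order.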
Summary
-------

Selecting the right specialization involves the following steps:

* Examine each available specialization and match it against the concrete
  argument types.
* Eliminate any specialization where at least one argument doesn't offer
  sufficient compatibility.
* If there are remaining candidates, choose the best one in terms of
  preserving the types' semantics.


Miscellaneous
=============

Some `benchmarks of dispatch performance
<https://github.com/numba/numba-benchmark/blob/master/benchmarks/bench_dispatch.py>`_
exist in the `Numba benchmarks <https://github.com/numba/numba-benchmark>`_
repository.

Some unit tests of specific aspects of the machinery are available
in :mod:`numba.tests.test_typeinfer` and :mod:`numba.tests.test_typeof`.
Higher-level dispatching tests are in :mod:`numba.tests.test_dispatcher`.