=======================
Polymorphic dispatching
=======================

Functions compiled using :func:`~numba.jit` or :func:`~numba.vectorize`
are open-ended: they can be called with many different input types and
have to select (possibly compile on-the-fly) the right low-level
specialization.  This document explains how that mechanism is implemented.


Requirements
============

JIT-compiled functions can take several arguments and each of them is
taken into account when selecting a specialization.  Thus it is a
form of multiple dispatch, more complex than single dispatch.

Each argument weighs in the selection based on its :ref:`Numba type
<numba-types>`.  Numba types are often more granular than Python types:
for example, Numba types Numpy arrays differently depending on their
dimensionality and their layout (C-contiguous, etc.).
Once a Numba type is inferred for each argument, a specialization must
be chosen amongst the available ones; or, if no suitable specialization
is found, a new one must be compiled.  This is not a trivial decision:
there can be multiple specializations compatible with a given concrete
signature (for example, say a two-argument function has compiled
specializations for ``(float64, float64)`` and ``(complex64, complex64)``,
and it is called with ``(float32, float32)``).

Therefore, there are two crucial steps in the dispatch mechanism:

1. infer the Numba types of the concrete arguments;
2. select the best available specialization (or choose to compile a new one)
   for the inferred Numba types.
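
The two steps above can be sketched in pure Python (the names and data
structures here are invented for illustration; the real machinery is more
elaborate and its hot paths are written in C):

```python
# Illustrative sketch only -- these names are invented here,
# not Numba's actual implementation.

def resolve_types(args):
    """Step 1: infer a type tag for each concrete argument (toy version)."""
    return tuple(type(a).__name__ for a in args)

def dispatch(specializations, compile_new, args):
    """Step 2: pick the matching specialization, or compile a new one."""
    sig = resolve_types(args)
    func = specializations.get(sig)
    if func is None:  # no specialization yet: compile on the fly
        func = specializations[sig] = compile_new(sig)
    return func(*args)

# Toy "compiler": the compiled function just reports its signature.
specs = {}
result = dispatch(specs, lambda sig: (lambda *a: sig), (1, 2.0))
```

On a second call with the same argument types, the cached specialization
would be reused without recompiling.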

Compile-time vs. run-time
-------------------------

This document discusses dispatching when it is done at runtime, i.e.
when a JIT-compiled function is called from pure Python.  In that context,
performance is important.  To stay in the realm of normal function call
overhead in Python, the overhead of dispatching should stay under a
microsecond.  Of course, *the faster the better*...

When a JIT-compiled function is called from another JIT-compiled
function (in :term:`nopython mode`), the polymorphism is resolved at
compile-time, using a mechanism that is not performance-critical and
bears zero runtime overhead.

.. note::
   In practice, the performance-critical parts described here are coded in C.



Type resolution
===============

The first step is therefore to infer, at call-time, a Numba type for each
of the function's concrete arguments.  Given the finer granularity of
Numba types compared to Python types, one cannot simply look up an object's
class and key a dictionary with it to obtain the corresponding Numba type.

Instead, there is a machinery to inspect the object and, based on its
Python type, query various properties to infer the appropriate Numba
type.  This can be more or less complex: for example, a Python ``int``
argument will always infer to a Numba ``intp`` (a pointer-sized integer),
but a Python ``tuple`` argument can infer to multiple Numba types (depending
on the tuple's size and the concrete type of each of its elements).
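
As a toy illustration of value-based inference (this is not Numba's actual
``typeof`` machinery, and the type names are simplified):

```python
def infer_numba_type(value):
    """Toy value-based inference; real Numba inspects many more properties."""
    if isinstance(value, bool):        # bool is a subclass of int: check first
        return "bool"
    if isinstance(value, int):
        return "intp"                  # a Python int always maps to one type
    if isinstance(value, float):
        return "float64"
    if isinstance(value, tuple):
        # A tuple's Numba type depends on its size and its elements' types.
        return "(%s)" % ", ".join(infer_numba_type(v) for v in value)
    raise TypeError("cannot infer type of %r" % (value,))

print(infer_numba_type(3))         # intp
print(infer_numba_type((1, 2.5)))  # (intp, float64)
```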

The Numba type system is high-level and written in pure Python; there is
a pure Python machinery, based on a generic function, to do said inference
(in :mod:`numba.typing.typeof`).  That machinery is used for compile-time
inference, e.g. on constants.  Unfortunately, it is too slow for run-time
value-based dispatching: it exhibits multiple-microsecond overhead, so it
is only used as a fallback for rarely used (or difficult to infer) types.
Typecodes
---------

The Numba type system is really too high-level to be manipulated efficiently
from C code.  Therefore, the C dispatching layer uses another representation
based on integer typecodes.  Each Numba type gets a unique integer typecode
when constructed; an interning system also ensures that no two instances of
the same type are created.  The dispatching layer is therefore able to
*eschew* the overhead of the Numba type system by working with simple
integer typecodes, amenable to well-known optimizations (fast hash tables,
etc.).

The goal of the type resolution step becomes: infer a Numba *typecode*
for each of the function's concrete arguments.  Ideally, it doesn't deal
with Numba types anymore...
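
The interning scheme can be pictured with a small sketch (illustrative only;
the class and method names below are invented, not Numba's internals):

```python
class TypeInterner:
    """Toy interning: each distinct type description gets exactly one
    instance and one integer typecode (illustrative, not Numba's code)."""

    def __init__(self):
        self._interned = {}  # description -> (type instance, typecode)

    def intern(self, description):
        try:
            return self._interned[description]
        except KeyError:
            # Assign the next free integer typecode to this new type.
            entry = (object(), len(self._interned))
            self._interned[description] = entry
            return entry

interner = TypeInterner()
t1, code1 = interner.intern("int32")
t2, code2 = interner.intern("int32")    # same description: interned
t3, code3 = interner.intern("float64")  # new description: new typecode
```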

Hard-coded fast paths
---------------------

While eschewing the abstraction and object-orientation overhead of the type
system, the integer typecodes still have the same conceptual complexity.
Therefore, an important technique to speed up inference is to first go
through checks for the most important types, and hard-code a fast resolution
for each of them.

Several types benefit from such an optimization, notably:

* basic Python scalars (``bool``, ``int``, ``float``, ``complex``);
* basic Numpy scalars (the various kinds of integer, floating-point,
  complex numbers);
* Numpy arrays of certain dimensionalities and basic element types.

Each of those fast paths ideally uses a hard-coded result value or a direct
table lookup after a few simple checks.
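
In spirit, a fast path looks like the following (a hypothetical Python
rendition of what the C code hard-codes; the typecode values are made up):

```python
# Hypothetical fast-path table; the typecode numbers are invented here.
_FAST_TYPECODES = {bool: 0, int: 1, float: 2, complex: 3}

def fast_typecode(value):
    """Return a typecode via a direct table lookup, or None to signal
    that the slower, generic machinery must take over."""
    # An exact type() lookup (rather than isinstance checks) keeps the
    # fast path cheap and immune to surprises from subclasses.
    return _FAST_TYPECODES.get(type(value))

hit = fast_typecode(3.5)      # a common scalar: resolved immediately
miss = fast_typecode("spam")  # not hard-coded: fall through to slow paths
```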

However, we can't apply that technique to all argument types; there would
be an explosion of ad-hoc internal caches, and it would become difficult to
maintain.  Besides, the recursive application of hard-coded fast paths
would not necessarily combine into a low overhead (in the nested tuple
case, for example).

Fingerprint-based typecode cache
--------------------------------

For not-so-trivial types (imagine a tuple, or a Numpy ``datetime64`` array,
for example), the hard-coded fast paths don't match.  Another, more generic
mechanism then kicks in.

The principle here is to examine each argument value, as the pure Python
machinery would do, and to describe its Numba type unambiguously.  The
difference is that *we don't actually compute a Numba type*.  Instead, we
compute a simple bytestring, a possible low-level denotation of that
Numba type: a *fingerprint*.  The fingerprint format is designed to be
short and extremely simple to compute from C code (in practice, it has
a bytecode-like format).

Once the fingerprint is computed, it is looked up in a cache mapping
fingerprints to typecodes.  The cache is a hash table, and the lookup
is fast thanks to the fingerprints being generally very short (rarely
more than 20 bytes).

If the cache lookup fails, the typecode must first be computed using the
slow pure Python machinery.  Luckily, this happens only once: on
subsequent calls, the cached typecode is returned for the given
fingerprint.
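
A sketch of this cache (with an invented fingerprint format; the real one is
a compact bytecode-like bytestring computed in C):

```python
def fingerprint(value):
    """Toy fingerprint: a short bytestring unambiguously describing the
    value's would-be Numba type, without building that type."""
    if isinstance(value, tuple):
        return b"(" + b"".join(fingerprint(v) for v in value) + b")"
    return {int: b"i", float: b"f", complex: b"c"}[type(value)]

_typecode_cache = {}   # fingerprint -> typecode

def typecode_from_fingerprint(value, slow_compute):
    fp = fingerprint(value)
    try:
        return _typecode_cache[fp]     # fast hash-table hit
    except KeyError:
        # Slow path, taken once per distinct fingerprint: delegate to
        # the (expensive) pure Python machinery.
        code = _typecode_cache[fp] = slow_compute(value)
        return code

calls = []
def slow(value):
    calls.append(value)          # stand-in for the pure Python machinery
    return len(_typecode_cache)  # allocate the "next" typecode

a = typecode_from_fingerprint((1, 2.0), slow)
b = typecode_from_fingerprint((3, 4.5), slow)  # same fingerprint: cache hit
```

Both calls yield the same typecode, and the slow machinery runs only once.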

In rare cases, a fingerprint cannot be computed efficiently.  This is
the case for some types which cannot be easily inspected from C: for
example ``cffi`` function pointers.  The slow pure Python machinery is
then invoked at each function call with such an argument.

.. note::
   Two fingerprints may denote a single Numba type.  This does not make
   the mechanism incorrect; it only creates more cache entries.


Summary
-------

Type resolution of a function argument involves the following mechanisms,
in order:

* Try a few hard-coded fast paths, for common simple types.
* If the above failed, compute a fingerprint for the argument and look up
  its typecode in a cache.
* If all the above failed, invoke the pure Python machinery which will
  determine a Numba type for the argument (and look up its typecode).


Specialization selection
========================

At the previous step, an integer typecode has been determined for each
concrete argument to the JIT-compiled function.  Now it remains to match
that concrete signature against each of the available specializations for
the function.  There can be three outcomes:

* There is a satisfying best match: the corresponding specialization
  is then invoked (it will handle argument unboxing and other details).
* There is a tie between two or more "best matches": an exception is raised,
  refusing to solve the ambiguity.
* There is no satisfying match: a new specialization is compiled, tailored
  for the concrete argument types that were inferred.

The selection works by looping over all available specializations, and
computing the compatibility of each concrete argument type with the
corresponding type in the specialization's intended signature.  Specifically,
we are interested in:

1. Whether the concrete argument type is allowed to convert implicitly to
   the specialization's argument type;
2. If so, at what semantic (user-visible) cost the conversion comes.

Implicit conversion rules
-------------------------

There are five possible kinds of implicit conversion from a source type
to a destination type (note this is an asymmetric relationship):

1. *exact match*: the two types are identical; this is the ideal case,
   since the specialization would behave exactly as intended;
2. *same-kind promotion*: the two types belong to the same "kind" (for
   example ``int32`` and ``int64`` are two integer types), and the source
   type can be converted losslessly to the destination type (e.g. from
   ``int32`` to ``int64``, but not the reverse);
3. *safe conversion*: the two types belong to different kinds, but the
   source type can be reasonably converted to the destination type
   (e.g. from ``int32`` to ``float64``, but not the reverse);
4. *unsafe conversion*: a conversion is available from the source type
   to the destination type, but it may lose precision, magnitude, or
   another desirable quality;
5. *no conversion*: there is no correct or reasonably efficient way to
   convert between the two types (for example between an ``int64`` and a
   ``datetime64``, or a C-contiguous array and a Fortran-contiguous array).

When a specialization is examined, the latter two cases eliminate it from
the final choice: i.e. when at least one argument has *no conversion* or
only an *unsafe conversion* to the signature's argument type.
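
These rules can be sketched as a small lookup table (the type pairs and
their kinds below are illustrative examples in the spirit of the list above,
not Numba's full conversion table):

```python
# Conversion kinds, ordered from best to worst (illustrative encoding).
EXACT, PROMOTE, SAFE, UNSAFE = 0, 1, 2, 3

# A few example (source, destination) pairs; absence means "no conversion".
CONVERSIONS = {
    ("int32", "int32"):   EXACT,
    ("int32", "int64"):   PROMOTE,   # same kind, lossless
    ("int32", "float64"): SAFE,      # different kinds, reasonable
    ("int64", "float64"): UNSAFE,    # large ints may lose precision
}

def eliminates(src, dst, allow_unsafe=False):
    """True if this argument removes the specialization from consideration."""
    kind = CONVERSIONS.get((src, dst))   # None means "no conversion"
    if kind is None:
        return True
    return kind == UNSAFE and not allow_unsafe
```

The ``allow_unsafe`` flag mirrors the explicit-signatures case described in
the note below.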

.. note::
   However, if the function is compiled with explicit signatures
   in the :func:`~numba.jit` call (and therefore it is not allowed to compile
   new specializations), *unsafe conversion* is allowed.

Candidates and best match
-------------------------

If a specialization is not eliminated by the rule above, it enters the
list of *candidates* for the final choice.  Those candidates are ranked
by an ordered 4-tuple of integers: ``(number of unsafe conversions,
number of safe conversions, number of same-kind promotions, number of
exact matches)`` (note the sum of the tuple's elements is equal to the
number of arguments).  The best match is then the first result in sorted
ascending order, thereby preferring exact matches over promotions,
promotions over safe conversions, and safe conversions over unsafe
conversions.
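
For example, revisiting the earlier ``(float32, float32)`` call against
``(float64, float64)`` and ``(complex64, complex64)`` specializations, the
ranking could be sketched as follows (the per-argument conversion kinds are
assumed for illustration):

```python
# Conversion kinds as before (illustrative encoding).
EXACT, PROMOTE, SAFE, UNSAFE = 0, 1, 2, 3

def rank(kinds):
    """Sort key: the ordered 4-tuple described above (worst kind first)."""
    return (kinds.count(UNSAFE), kinds.count(SAFE),
            kinds.count(PROMOTE), kinds.count(EXACT))

# Assumed per-argument conversion kinds for a (float32, float32) call:
candidates = {
    ("float64", "float64"):     [PROMOTE, PROMOTE],  # same-kind promotions
    ("complex64", "complex64"): [SAFE, SAFE],        # safe conversions
}
best = min(candidates, key=lambda sig: rank(candidates[sig]))
# best is ("float64", "float64"): promotions beat safe conversions
```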

Implementation
--------------

The above-described mechanism works on integer typecodes, not on Numba
types.  It uses an internal hash table storing the possible conversion
kind for each pair of compatible types.  The internal hash table is in part
built at startup (for built-in trivial types such as ``int32``, ``int64``
etc.), in part filled dynamically (for arbitrarily complex types such
as array types: for example to allow using a C-contiguous 2D array where
a function expects a non-contiguous 2D array).

Summary
-------

Selecting the right specialization involves the following steps:

* Examine each available specialization and match it against the concrete
  argument types.
* Eliminate any specialization where at least one argument doesn't offer
  sufficient compatibility.
* If there are remaining candidates, choose the best one in terms of
  preserving the types' semantics.


Miscellaneous
=============

Some `benchmarks of dispatch performance
<https://github.com/numba/numba-benchmark/blob/master/benchmarks/bench_dispatch.py>`_
exist in the `Numba benchmarks <https://github.com/numba/numba-benchmark>`_
repository.

Some unit tests of specific aspects of the machinery are available
in :mod:`numba.tests.test_typeinfer` and :mod:`numba.tests.test_typeof`.
Higher-level dispatching tests are in :mod:`numba.tests.test_dispatcher`.
268