================
Notes on Hashing
================

Numba supports the built-in :func:`hash` and does so by simply calling the
:func:`__hash__` member function on the supplied argument. This makes it
trivial to add hash support for a new type: all that is required is to apply
the extension API :func:`overload_method` decorator to a function that computes
the hash value for the new type, registering it against the type's
:func:`__hash__` method. For example::

    from numba.extending import overload_method

    # myType stands for the Numba type class of the new type.
    @overload_method(myType, '__hash__')
    def myType_hash_overload(obj):
        # Return the function that computes the hash for the new type.
        def impl(obj):
            ...  # implementation details
        return impl


The Implementation
==================

The implementation of the Numba hashing functions strictly follows that of
Python 3. The only exception is that for hashing Unicode and bytes (for
content longer than ``sys.hash_info.cutoff``) the only supported algorithm is
``siphash24`` (default in CPython 3). As a result, Numba will match Python 3
hash values for all supported types under the default conditions described.

Unicode hash cache differences
------------------------------

Both Numba and CPython Unicode string internal representations have a ``hash``
member for the purposes of caching the string's hash value. This member is
always checked ahead of computing a hash value, with a view to simply returning
the cached value, as it is considerably cheaper to do so. The Numba Unicode
string hash caching implementation behaves in a similar way to that of
CPython. The only notable behavioral change (and its only impact is a minor
potential change in performance) is that Numba always computes and caches the
hash for Unicode strings created in ``nopython mode`` at the time they are boxed
for reuse in Python. This is more eager than CPython in some cases, as CPython
may delay hashing a new Unicode string depending on the creation method. It
should also be noted that Numba copies in the ``hash`` member of the CPython
internal representation for Unicode strings when unboxing them to its own
representation, so as not to recompute the hash of a string that already has a
hash value associated with it.

The accommodation of ``PYTHONHASHSEED``
---------------------------------------

The ``PYTHONHASHSEED`` environment variable can be used to seed the CPython
hashing algorithms, for example for the purposes of reproducibility. The Numba
hashing implementation directly reads the CPython hashing algorithms' internal
state and, as a result, the influence of ``PYTHONHASHSEED`` is replicated in
Numba's hashing implementations.
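
The effect of ``PYTHONHASHSEED`` on CPython itself can be observed with a small
standard-library sketch. The helper name ``hash_in_subprocess`` and the seed
values are illustrative choices, not part of Numba or CPython. The sketch only
exercises the interpreter's own string hashing; it does not call Numba.

```python
import os
import subprocess
import sys


def hash_in_subprocess(value, seed):
    """Return hash(value) as computed by a fresh interpreter.

    Hypothetical helper for illustration: the child process is started with
    PYTHONHASHSEED set to `seed`, so its string hashing is seeded accordingly.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# The same seed in two separate processes gives the same string hash.
first = hash_in_subprocess("numba", seed=123)
second = hash_in_subprocess("numba", seed=123)
print(first == second)

# A different seed will, in general, give a different value.
other = hash_in_subprocess("numba", seed=124)
print(first, other)
```

Given the behavior described above, a jitted function that calls ``hash`` on
the same string in a process started with the same seed would be expected to
return the value CPython reports.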