
|Azure Status| |asv|

About
-----

Sometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No
nonsense. `Numpy's <http://www.numpy.org>`__ histogram functions are
versatile, and can handle for example non-regular binning, but this
versatility comes at the expense of performance.

The **fast-histogram** mini-package aims to provide simple and fast
histogram functions for regular bins that don't compromise on performance. It doesn't do
anything complicated - it just implements a simple histogram algorithm
in C and keeps it simple. The aim is to have functions that are fast but
also robust and reliable. The result is a 1D histogram function here that
is **7-15x faster** than ``numpy.histogram``, and a 2D histogram function
that is **20-25x faster** than ``numpy.histogram2d``.
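
To give a feel for why regular bins allow a simpler algorithm, here is a minimal NumPy sketch of the idea: with equal-width bins, the bin index of each value follows from one subtraction and one multiplication, with no search over bin edges. This is an illustration only, not the package's C code. The name ``regular_histogram1d`` is hypothetical, and skipping values equal to the upper limit is an assumption of this sketch.

```python
import numpy as np


def regular_histogram1d(x, bins, value_range):
    """Count values into `bins` equal-width bins spanning `value_range`.

    Hypothetical helper for illustration; values outside
    [lower, upper) are ignored (an assumption of this sketch).
    """
    lower, upper = value_range
    x = np.asarray(x, dtype=float)

    # Keep only the values that fall inside the half-open range.
    inside = (x >= lower) & (x < upper)

    # With regular bins the index is a simple linear function of the value.
    scale = bins / (upper - lower)
    index = ((x[inside] - lower) * scale).astype(np.intp)

    # Guard against floating-point rounding pushing a value past the last bin.
    index = np.minimum(index, bins - 1)

    return np.bincount(index, minlength=bins)


counts = regular_histogram1d([0.05, 0.15, 0.15, 0.95, 1.5, -0.2],
                             bins=10, value_range=(0.0, 1.0))
```

Non-regular bins need a search over the edges for every value, which is part of the extra work a general-purpose function has to be prepared for.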

To install::

    pip install fast-histogram

or if you use conda you can instead do::

    conda install -c conda-forge fast-histogram

The ``fast_histogram`` module then provides two functions:
``histogram1d`` and ``histogram2d``:

.. code:: python

    from fast_histogram import histogram1d, histogram2d

Example
-------

Here's an example of binning 10 million points into a regular 2D
histogram:

.. code:: python

    In [1]: import numpy as np

    In [2]: x = np.random.random(10_000_000)

    In [3]: y = np.random.random(10_000_000)

    In [4]: %timeit _ = np.histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)
    935 ms ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [5]: from fast_histogram import histogram2d

    In [6]: %timeit _ = histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)
    40.2 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

(Note that the ``10_000_000`` syntax requires Python 3.6 or later; use ``10000000`` in earlier versions.)

The version here is over 20 times faster! The following plot shows the
speedup as a function of array size for the bin parameters shown above:

.. figure:: https://github.com/astrofrog/fast-histogram/raw/master/speedup_compared.png
   :alt: Comparison of performance between Numpy and fast-histogram

The plot also shows results for the 1D case, again with 30 bins. The speedup for
the 2D case is consistently between 20x and 25x, and for the 1D case goes
from 15x for small arrays to around 7x for large arrays.

Q&A
---

Why don't the histogram functions return the edges?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Computing and returning the edges may seem trivial but it can slow things down by a factor of a few when computing histograms of 10^5 or fewer elements, so not returning the edges is a deliberate decision related to performance. You can easily compute the edges yourself if needed though, using ``numpy.linspace``.
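
As a minimal sketch of that suggestion: for ``bins`` regular bins over a given range there are ``bins + 1`` edges, evenly spaced between the two limits. The helper name ``bin_edges`` is hypothetical and only used here for illustration.

```python
import numpy as np


def bin_edges(bins, value_range):
    """Return the bins + 1 edges of equal-width bins spanning value_range."""
    lower, upper = value_range
    return np.linspace(lower, upper, bins + 1)


# Edges for 30 bins between -1 and 2, as in the x axis of the example above.
edges = bin_edges(30, (-1.0, 2.0))

# Bin centers are often more convenient for plotting.
centers = 0.5 * (edges[:-1] + edges[1:])
```

For a 2D histogram the same helper can be called once per axis with that axis's range and number of bins.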

Doesn't package X already do this, but better?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This may very well be the case! If this duplicates another package, or
if it is possible to use Numpy in a smarter way to get the same
performance gains, please open an issue and I'll consider deprecating
this package :)

One package that does include fast histogram functions (including in
n-dimensions) and can compute other statistics is
`vaex <https://github.com/maartenbreddels/vaex>`_, so take a look there
if you need more advanced functionality!

Are the 2D histograms not transposed compared to what they should be?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is technically no 'right' and 'wrong' orientation - here we adopt
the convention which gives results consistent with Numpy, so:

.. code:: python

    numpy.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny])

should give the same result as:

.. code:: python

    fast_histogram.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny])
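
The Numpy convention itself can be checked with a tiny example, sketched below using Numpy only so that it runs without this package installed. The returned array has shape ``(nx, ny)``, with ``x`` along the first axis. According to the text above, ``fast_histogram.histogram2d`` called with the same arguments should return the same array. The data values are made up for illustration.

```python
import numpy as np

# Three points share the first x bin and spread over the three y bins;
# the fourth point is the only one in the second x bin.
x = np.array([0.5, 0.5, 0.5, 1.5])
y = np.array([0.5, 1.5, 2.5, 0.5])

hist, xedges, yedges = np.histogram2d(x, y, range=[[0, 2], [0, 3]], bins=[2, 3])

# The first axis indexes x bins and the second axis indexes y bins.
print(hist.shape)

# Image-style display usually expects rows to follow y, so transpose first.
image = hist.T
```

The transpose is only needed when something downstream expects the other orientation, for example when plotting the array as an image.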

Why not contribute this to Numpy directly?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As mentioned above, the Numpy functions are much more versatile, so they could not
be replaced by the ones here. One option would be to check in Numpy's functions for
cases that are simple and dispatch to functions such as the ones here, or add
dedicated functions for regular binning. I hope we can get this in Numpy in some form
or another eventually, but for now, the aim is to have this available to packages
that need to support a range of Numpy versions.

Why not use Cython?
~~~~~~~~~~~~~~~~~~~

I originally implemented this in Cython, but found that I could get a
50% performance improvement by going straight to a C extension.

What about using Numba?
~~~~~~~~~~~~~~~~~~~~~~~

I specifically want to keep this package as easy as possible to install,
and while `Numba <https://numba.pydata.org>`__ is a great package, it is
not trivial to install outside of Anaconda.

Could this be parallelized?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

This may benefit from parallelization under certain circumstances. The
easiest solution might be to use OpenMP, but this won't work on all
platforms, so it would need to be made optional.
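
The reason parallelization is possible at all is that histograms are additive: the histogram of an array equals the sum of the histograms of its pieces. The sketch below illustrates this with Python threads and ``numpy.histogram`` as a stand-in. It is not OpenMP, it is not part of this package, and whether it is actually faster depends on the platform and array size. The name ``chunked_histogram1d`` and the ``n_chunks`` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunked_histogram1d(x, bins, value_range, n_chunks=4):
    """Histogram `x` by summing per-chunk histograms computed in threads."""
    chunks = np.array_split(np.asarray(x), n_chunks)

    def count(chunk):
        # Every chunk uses the same bins and range, so the counts line up.
        return np.histogram(chunk, bins=bins, range=value_range)[0]

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partial_counts = list(pool.map(count, chunks))

    # Adding the partial counts bin by bin gives the full histogram.
    return np.sum(partial_counts, axis=0)


rng = np.random.default_rng(0)
values = rng.random(10_000)
total = chunked_histogram1d(values, bins=30, value_range=(-1.0, 2.0))
```

An OpenMP version in C would follow the same pattern, with each thread filling its own private array of counts before a final reduction.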

Couldn't you make it faster by using the GPU?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Almost certainly, though the aim here is to have an easily installable
and portable package, and introducing GPUs is going to affect both of
these.

Why make a package specifically for this? This is a tiny amount of functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Packages that need this could simply bundle their own C extension or
Cython code to do this, but the main motivation for releasing this as a
mini-package is to avoid making pure-Python packages into packages that
require compilation just because of the need to compute fast histograms.

Can I contribute?
~~~~~~~~~~~~~~~~~

Yes please! This is not meant to be a finished package, and I welcome
pull requests to improve things.

.. |Azure Status| image:: https://dev.azure.com/thomasrobitaille/fast-histogram/_apis/build/status/astrofrog.fast-histogram?branchName=master
   :target: https://dev.azure.com/thomasrobitaille/fast-histogram/_build/latest?definitionId=13&branchName=master

.. |asv| image:: https://img.shields.io/badge/benchmarked%20by-asv-brightgreen.svg
   :target: https://astrofrog.github.io/fast-histogram