1.. index::
2   single: histograms
3   single: binning data
4
5**********
6Histograms
7**********
8
9This chapter describes functions for creating histograms.  Histograms
10provide a convenient way of summarizing the distribution of a set of
11data. A histogram consists of a set of *bins* which count the number
12of events falling into a given range of a continuous variable :math:`x`.
13In GSL the bins of a histogram contain floating-point numbers, so they
14can be used to record both integer and non-integer distributions.  The
15bins can use arbitrary sets of ranges (uniformly spaced bins are the
16default).  Both one and two-dimensional histograms are supported.
17
18Once a histogram has been created it can also be converted into a
19probability distribution function.  The library provides efficient
20routines for selecting random samples from probability distributions.
21This can be useful for generating simulations based on real data.
22
23The functions are declared in the header files :file:`gsl_histogram.h`
24and :file:`gsl_histogram2d.h`.
25
26The histogram struct
27====================
28
29A histogram is defined by the following struct,
30
31.. type:: gsl_histogram
32
33   ============================= ============================================================================
34   size_t n                      This is the number of histogram bins
35   double * range                The ranges of the bins are stored in an array of :code:`n+1` elements
36                                 pointed to by range.
37   double * bin                  The counts for each bin are stored in an array of :data:`n` elements
38                                 pointed to by :data:`bin`.  The bins are floating-point numbers, so you can
39                                 increment them by non-integer values if necessary.
40   ============================= ============================================================================
41
42   The range for :code:`bin[i]` is given by :code:`range[i]` to
43   :code:`range[i+1]`.  For :math:`n` bins there are :code:`n+1` entries in the
44   array :data:`range`.  Each bin is inclusive at the lower end and exclusive
45   at the upper end.  Mathematically this means that the bins are defined by
46   the following inequality,
47
48   .. only:: not texinfo
49
50      .. math:: \hbox{bin[i] corresponds to range[i]} \le x < \hbox{range[i+1]}
51
52   .. only:: texinfo
53
54      ::
55
56         bin[i] corresponds to range[i] <= x < range[i+1]
57
58   Here is a diagram of the correspondence between ranges and bins on the
59   number-line for :math:`x`::
60
61          [ bin[0] )[ bin[1] )[ bin[2] )[ bin[3] )[ bin[4] )
62       ---|---------|---------|---------|---------|---------|---  x
63        r[0]      r[1]      r[2]      r[3]      r[4]      r[5]
64
65   In this picture the values of the :data:`range` array are denoted by
66   :math:`r`.  On the left-hand side of each bin the square bracket
67   :code:`[` denotes an inclusive lower bound
68   (:math:`r \le x`),
69   and the round parentheses :code:`)` on the right-hand
70   side denote an exclusive upper bound (:math:`x < r`).  Thus any samples
71   which fall on the upper end of the histogram are excluded.  If you want
72   to include this value for the last bin you will need to add an extra bin
73   to your histogram.
74
75   The :type:`gsl_histogram` struct and its associated functions are defined
76   in the header file :file:`gsl_histogram.h`.
77
78Histogram allocation
79====================
80
81The functions for allocating memory to a histogram follow the style of
82:func:`malloc` and :func:`free`.  In addition they also perform their own
83error checking.  If there is insufficient memory available to allocate a
84histogram then the functions call the error handler (with an error
85number of :macro:`GSL_ENOMEM`) in addition to returning a null pointer.
86Thus if you use the library error handler to abort your program then it
87isn't necessary to check every histogram :code:`alloc`.
88
89.. function:: gsl_histogram * gsl_histogram_alloc (size_t n)
90
91   This function allocates memory for a histogram with :data:`n` bins, and
92   returns a pointer to a newly created :type:`gsl_histogram` struct.  If
93   insufficient memory is available a null pointer is returned and the
94   error handler is invoked with an error code of :macro:`GSL_ENOMEM`. The
95   bins and ranges are not initialized, and should be prepared using one of
96   the range-setting functions below in order to make the histogram ready
97   for use.
98
99.. @deftypefun {gsl_histogram *} gsl_histogram_calloc (size_t n)
100.. This function allocates memory for a histogram with :data:`n` bins, and
101.. returns a pointer to its newly initialized :type:`gsl_histogram` struct.
102.. The bins are uniformly spaced with a total range of
103.. @c{$0 \le  x < n$}
104.. @math{0 <=  x < n},
105.. as shown in the table below.
106
107.. @tex
108.. \beforedisplay
109.. $$
110.. \matrix{
111.. \hbox{bin[0]}&\hbox{corresponds to}& 0 \le x < 1\cr
112.. \hbox{bin[1]}&\hbox{corresponds to}& 1 \le x < 2\cr
113.. \dots&\dots&\dots\cr
114.. \hbox{bin[n-1]}&\hbox{corresponds to}&n-1 \le x < n}
115.. $$
116.. \afterdisplay
117.. @end tex
118.. @ifinfo
119.. @display
120.. bin[0] corresponds to 0 <= x < 1
121.. bin[1] corresponds to 1 <= x < 2
122.. @dots{}
123.. bin[n-1] corresponds to n-1 <= x < n
124.. @end display
125.. @end ifinfo
126.. @noindent
127.. The bins are initialized to zero so the histogram is ready for use.
128
129.. If insufficient memory is available a null pointer is returned and the
130.. error handler is invoked with an error code of :macro:`GSL_ENOMEM`.
131.. @end deftypefun
132
133.. @deftypefun {gsl_histogram *} gsl_histogram_calloc_uniform (size_t n, double xmin, double xmax)
134.. This function allocates memory for a histogram with :data:`n` uniformly
135.. spaced bins from :data:`xmin` to :data:`xmax`, and returns a pointer to the
136.. newly initialized :type:`gsl_histogram` struct.
137.. If insufficient memory is available a null pointer is returned and the
138.. error handler is invoked with an error code of :macro:`GSL_ENOMEM`.
139.. @end deftypefun
140
141.. @deftypefun {gsl_histogram *} gsl_histogram_calloc_range (size_t n, double * range)
142.. This function allocates a histogram of size :data:`n` using the @math{n+1}
143.. bin ranges specified by the array :data:`range`.
144.. @end deftypefun
145
146.. function:: int gsl_histogram_set_ranges (gsl_histogram * h, const double range[], size_t size)
147
148   This function sets the ranges of the existing histogram :data:`h` using
149   the array :data:`range` of size :data:`size`.  The values of the histogram
150   bins are reset to zero.  The :data:`range` array should contain the
151   desired bin limits.  The ranges can be arbitrary, subject to the
152   restriction that they are monotonically increasing.
153
154   The following example shows how to create a histogram with logarithmic
155   bins with ranges [1,10), [10,100) and [100,1000)::
156
157      gsl_histogram * h = gsl_histogram_alloc (3);
158
159      /* bin[0] covers the range 1 <= x < 10 */
160      /* bin[1] covers the range 10 <= x < 100 */
161      /* bin[2] covers the range 100 <= x < 1000 */
162
163      double range[4] = { 1.0, 10.0, 100.0, 1000.0 };
164
165      gsl_histogram_set_ranges (h, range, 4);
166
167   Note that the size of the :data:`range` array should be defined to be one
168   element bigger than the number of bins.  The additional element is
169   required for the upper value of the final bin.
170
171.. function:: int gsl_histogram_set_ranges_uniform (gsl_histogram * h, double xmin, double xmax)
172
173   This function sets the ranges of the existing histogram :data:`h` to cover
174   the range :data:`xmin` to :data:`xmax` uniformly.  The values of the
175   histogram bins are reset to zero.  The bin ranges are shown in the table
176   below,
177
178   .. only:: not texinfo
179
180      .. math::
181
182         \begin{array}{ccc}
183           \hbox{bin[0]}&\hbox{corresponds to}& xmin \le  x < xmin + d \\
184           \hbox{bin[1]} &\hbox{corresponds to}& xmin + d \le  x < xmin + 2 d \\
185           \dots&\dots&\dots \\
186           \hbox{bin[n-1]} & \hbox{corresponds to}& xmin + (n-1)d \le  x < xmax
187         \end{array}
188
189   .. only:: texinfo
190
191      ::
192
193         bin[0] corresponds to xmin <= x < xmin + d
194         bin[1] corresponds to xmin + d <= x < xmin + 2 d
195         ......
196         bin[n-1] corresponds to xmin + (n-1)d <= x < xmax
197
198   where :math:`d` is the bin spacing, :math:`d = (xmax-xmin)/n`.
199
200.. function:: void gsl_histogram_free (gsl_histogram * h)
201
202   This function frees the histogram :data:`h` and all of the memory
203   associated with it.
204
205Copying Histograms
206==================
207
208.. function:: int gsl_histogram_memcpy (gsl_histogram * dest, const gsl_histogram * src)
209
210   This function copies the histogram :data:`src` into the pre-existing
211   histogram :data:`dest`, making :data:`dest` into an exact copy of :data:`src`.
212   The two histograms must be of the same size.
213
214.. function:: gsl_histogram * gsl_histogram_clone (const gsl_histogram * src)
215
216   This function returns a pointer to a newly created histogram which is an
217   exact copy of the histogram :data:`src`.
218
219Updating and accessing histogram elements
220=========================================
221
222There are two ways to access histogram bins, either by specifying an
223:math:`x` coordinate or by using the bin-index directly.  The functions
224for accessing the histogram through :math:`x` coordinates use a binary
225search to identify the bin which covers the appropriate range.
226
227.. function:: int gsl_histogram_increment (gsl_histogram * h, double x)
228
229   This function updates the histogram :data:`h` by adding one (1.0) to the
230   bin whose range contains the coordinate :data:`x`.
231
232   If :data:`x` lies in the valid range of the histogram then the function
233   returns zero to indicate success.  If :data:`x` is less than the lower
234   limit of the histogram then the function returns :macro:`GSL_EDOM`, and
235   none of bins are modified.  Similarly, if the value of :data:`x` is greater
236   than or equal to the upper limit of the histogram then the function
237   returns :macro:`GSL_EDOM`, and none of the bins are modified.  The error
238   handler is not called, however, since it is often necessary to compute
239   histograms for a small range of a larger dataset, ignoring the values
240   outside the range of interest.
241
242.. function:: int gsl_histogram_accumulate (gsl_histogram * h, double x, double weight)
243
244   This function is similar to :func:`gsl_histogram_increment` but increases
245   the value of the appropriate bin in the histogram :data:`h` by the
246   floating-point number :data:`weight`.
247
248.. function:: double gsl_histogram_get (const gsl_histogram * h, size_t i)
249
250   This function returns the contents of the :data:`i`-th bin of the histogram
251   :data:`h`.  If :data:`i` lies outside the valid range of indices for the
252   histogram then the error handler is called with an error code of
253   :macro:`GSL_EDOM` and the function returns 0.
254
255.. function:: int gsl_histogram_get_range (const gsl_histogram * h, size_t i, double * lower, double * upper)
256
257   This function finds the upper and lower range limits of the :data:`i`-th
258   bin of the histogram :data:`h`.  If the index :data:`i` is valid then the
259   corresponding range limits are stored in :data:`lower` and :data:`upper`.
260   The lower limit is inclusive (i.e. events with this coordinate are
261   included in the bin) and the upper limit is exclusive (i.e. events with
262   the coordinate of the upper limit are excluded and fall in the
263   neighboring higher bin, if it exists).  The function returns 0 to
264   indicate success.  If :data:`i` lies outside the valid range of indices for
265   the histogram then the error handler is called and the function returns
266   an error code of :macro:`GSL_EDOM`.
267
268.. function:: double gsl_histogram_max (const gsl_histogram * h)
269              double gsl_histogram_min (const gsl_histogram * h)
270              size_t gsl_histogram_bins (const gsl_histogram * h)
271
272   These functions return the maximum upper and minimum lower range limits
273   and the number of bins of the histogram :data:`h`.  They provide a way of
274   determining these values without accessing the :type:`gsl_histogram`
275   struct directly.
276
277.. function:: void gsl_histogram_reset (gsl_histogram * h)
278
279   This function resets all the bins in the histogram :data:`h` to zero.
280
281Searching histogram ranges
282==========================
283
284The following functions are used by the access and update routines to
285locate the bin which corresponds to a given :math:`x` coordinate.
286
287.. function:: int gsl_histogram_find (const gsl_histogram * h, double x, size_t * i)
288
289   This function finds and sets the index :data:`i` to the bin number which
290   covers the coordinate :data:`x` in the histogram :data:`h`.  The bin is
291   located using a binary search. The search includes an optimization for
292   histograms with uniform range, and will return the correct bin
293   immediately in this case.  If :data:`x` is found in the range of the
294   histogram then the function sets the index :data:`i` and returns
295   :macro:`GSL_SUCCESS`.  If :data:`x` lies outside the valid range of the
296   histogram then the function returns :macro:`GSL_EDOM` and the error
297   handler is invoked.
298
299.. index::
300   single: histogram statistics
301   single: statistics, from histogram
302   single: maximum value, from histogram
303   single: minimum value, from histogram
304
305Histogram Statistics
306====================
307
308.. function:: double gsl_histogram_max_val (const gsl_histogram * h)
309
310   This function returns the maximum value contained in the histogram bins.
311
312.. function:: size_t gsl_histogram_max_bin (const gsl_histogram * h)
313
314   This function returns the index of the bin containing the maximum
315   value. In the case where several bins contain the same maximum value the
316   smallest index is returned.
317
318.. function:: double gsl_histogram_min_val (const gsl_histogram * h)
319
320   This function returns the minimum value contained in the histogram bins.
321
322.. function:: size_t gsl_histogram_min_bin (const gsl_histogram * h)
323
324   This function returns the index of the bin containing the minimum
325   value. In the case where several bins contain the same minimum value the
326   smallest index is returned.
327
328.. index::
329   single: mean value, from histogram
330
331.. function:: double gsl_histogram_mean (const gsl_histogram * h)
332
333   This function returns the mean of the histogrammed variable, where the
334   histogram is regarded as a probability distribution. Negative bin values
335   are ignored for the purposes of this calculation.  The accuracy of the
336   result is limited by the bin width.
337
338.. index::
339   single: standard deviation, from histogram
340   single: variance, from histogram
341
342.. function:: double gsl_histogram_sigma (const gsl_histogram * h)
343
344   This function returns the standard deviation of the histogrammed
345   variable, where the histogram is regarded as a probability
346   distribution. Negative bin values are ignored for the purposes of this
347   calculation. The accuracy of the result is limited by the bin width.
348
349.. function:: double gsl_histogram_sum (const gsl_histogram * h)
350
351   This function returns the sum of all bin values. Negative bin values
352   are included in the sum.
353
354Histogram Operations
355====================
356
357.. function:: int gsl_histogram_equal_bins_p (const gsl_histogram * h1, const gsl_histogram * h2)
358
359   This function returns 1 if the all of the individual bin
360   ranges of the two histograms are identical, and 0
361   otherwise.
362
363.. function:: int gsl_histogram_add (gsl_histogram * h1, const gsl_histogram * h2)
364
365   This function adds the contents of the bins in histogram :data:`h2` to the
366   corresponding bins of histogram :data:`h1`,  i.e. :math:`h'_1(i) = h_1(i) + h_2(i)`.
367   The two histograms must have identical bin ranges.
368
369.. function:: int gsl_histogram_sub (gsl_histogram * h1, const gsl_histogram * h2)
370
371   This function subtracts the contents of the bins in histogram :data:`h2`
372   from the corresponding bins of histogram :data:`h1`, i.e. :math:`h'_1(i) = h_1(i) - h_2(i)`.
373   The two histograms must have identical bin ranges.
374
375.. function:: int gsl_histogram_mul (gsl_histogram * h1, const gsl_histogram * h2)
376
377   This function multiplies the contents of the bins of histogram :data:`h1`
378   by the contents of the corresponding bins in histogram :data:`h2`,
379   i.e. :math:`h'_1(i) = h_1(i) * h_2(i)`.  The two histograms must have
380   identical bin ranges.
381
382.. function:: int gsl_histogram_div (gsl_histogram * h1, const gsl_histogram * h2)
383
384   This function divides the contents of the bins of histogram :data:`h1` by
385   the contents of the corresponding bins in histogram :data:`h2`,
386   i.e. :math:`h'_1(i) = h_1(i) / h_2(i)`.  The two histograms must have
387   identical bin ranges.
388
389.. function:: int gsl_histogram_scale (gsl_histogram * h, double scale)
390
391   This function multiplies the contents of the bins of histogram :data:`h`
392   by the constant :data:`scale`, i.e.
393
394   .. only:: not texinfo
395
396      .. math:: h'_1(i) = h_1(i) * \hbox{\it scale}
397
398   .. only:: texinfo
399
400      ::
401
402         h'_1(i) = h_1(i) * scale
403
404.. function:: int gsl_histogram_shift (gsl_histogram * h, double offset)
405
406   This function shifts the contents of the bins of histogram :data:`h` by
407   the constant :data:`offset`, i.e.
408
409   .. only:: not texinfo
410
411      .. math:: h'_1(i) = h_1(i) + \hbox{\it offset}
412
413   .. only:: texinfo
414
415      ::
416
417         h'_1(i) = h_1(i) + offset
418
419Reading and writing histograms
420==============================
421
422The library provides functions for reading and writing histograms to a file
423as binary data or formatted text.
424
425.. function:: int gsl_histogram_fwrite (FILE * stream, const gsl_histogram * h)
426
427   This function writes the ranges and bins of the histogram :data:`h` to the
428   stream :data:`stream` in binary format.  The return value is 0 for success
429   and :macro:`GSL_EFAILED` if there was a problem writing to the file.  Since
430   the data is written in the native binary format it may not be portable
431   between different architectures.
432
433.. function:: int gsl_histogram_fread (FILE * stream, gsl_histogram * h)
434
435   This function reads into the histogram :data:`h` from the open stream
436   :data:`stream` in binary format.  The histogram :data:`h` must be
437   preallocated with the correct size since the function uses the number of
438   bins in :data:`h` to determine how many bytes to read.  The return value is
439   0 for success and :macro:`GSL_EFAILED` if there was a problem reading from
440   the file.  The data is assumed to have been written in the native binary
441   format on the same architecture.
442
443.. function:: int gsl_histogram_fprintf (FILE * stream, const gsl_histogram * h, const char * range_format, const char * bin_format)
444
445   This function writes the ranges and bins of the histogram :data:`h`
446   line-by-line to the stream :data:`stream` using the format specifiers
447   :data:`range_format` and :data:`bin_format`.  These should be one of the
448   :code:`%g`, :code:`%e` or :code:`%f` formats for floating point
449   numbers.  The function returns 0 for success and :macro:`GSL_EFAILED` if
450   there was a problem writing to the file.  The histogram output is
451   formatted in three columns, and the columns are separated by spaces,
452   like this::
453
454      range[0] range[1] bin[0]
455      range[1] range[2] bin[1]
456      range[2] range[3] bin[2]
457      ....
458      range[n-1] range[n] bin[n-1]
459
460   The values of the ranges are formatted using :data:`range_format` and the
461   value of the bins are formatted using :data:`bin_format`.  Each line
462   contains the lower and upper limit of the range of the bins and the
463   value of the bin itself.  Since the upper limit of one bin is the lower
464   limit of the next there is duplication of these values between lines but
465   this allows the histogram to be manipulated with line-oriented tools.
466
467.. function:: int gsl_histogram_fscanf (FILE * stream, gsl_histogram * h)
468
469   This function reads formatted data from the stream :data:`stream` into the
470   histogram :data:`h`.  The data is assumed to be in the three-column format
471   used by :func:`gsl_histogram_fprintf`.  The histogram :data:`h` must be
472   preallocated with the correct length since the function uses the size of
473   :data:`h` to determine how many numbers to read.  The function returns 0
474   for success and :macro:`GSL_EFAILED` if there was a problem reading from
475   the file.
476
477.. index::
478   single: resampling from histograms
479   single: sampling from histograms
480   single: probability distributions, from histograms
481
482Resampling from histograms
483==========================
484
485A histogram made by counting events can be regarded as a measurement of
486a probability distribution.  Allowing for statistical error, the height
487of each bin represents the probability of an event where the value of
488:math:`x` falls in the range of that bin.  The probability distribution
489function has the one-dimensional form :math:`p(x)dx` where,
490
491.. math:: p(x) = n_i / (N w_i)
492
493In this equation :math:`n_i` is the number of events in the bin which
494contains :math:`x`, :math:`w_i` is the width of the bin and :math:`N` is
495the total number of events.  The distribution of events within each bin
496is assumed to be uniform.
497
498.. index::
499   single: probability distribution, from histogram
500   single: sampling from histograms
501   single: random sampling from histograms
502   single: histograms, random sampling from
503
504The histogram probability distribution struct
505=============================================
506
507The probability distribution function for a histogram consists of a set
508of *bins* which measure the probability of an event falling into a
509given range of a continuous variable :math:`x`. A probability
510distribution function is defined by the following struct, which actually
511stores the cumulative probability distribution function.  This is the
512natural quantity for generating samples via the inverse transform
513method, because there is a one-to-one mapping between the cumulative
514probability distribution and the range [0,1].  It can be shown that by
515taking a uniform random number in this range and finding its
516corresponding coordinate in the cumulative probability distribution we
517obtain samples with the desired probability distribution.
518
519.. type:: gsl_histogram_pdf
520
521   ================================ =======================================================================
522   :code:`size_t n`                 This is the number of bins used to approximate the probability
523                                    distribution function.
524   :code:`double * range`           The ranges of the bins are stored in an array of :math:`n + 1`
525                                    elements pointed to by :data:`range`.
526   :code:`double * sum`             The cumulative probability for the bins is stored in an array of
527                                    :data:`n` elements pointed to by :data:`sum`.
528   ================================ =======================================================================
529
530The following functions allow you to create a :type:`gsl_histogram_pdf`
531struct which represents this probability distribution and generate
532random samples from it.
533
534.. function:: gsl_histogram_pdf * gsl_histogram_pdf_alloc (size_t n)
535
536   This function allocates memory for a probability distribution with
537   :data:`n` bins and returns a pointer to a newly initialized
538   :type:`gsl_histogram_pdf` struct. If insufficient memory is available a
539   null pointer is returned and the error handler is invoked with an error
540   code of :macro:`GSL_ENOMEM`.
541
542.. function:: int gsl_histogram_pdf_init (gsl_histogram_pdf * p, const gsl_histogram * h)
543
544   This function initializes the probability distribution :data:`p` with
545   the contents of the histogram :data:`h`. If any of the bins of :data:`h` are
546   negative then the error handler is invoked with an error code of
547   :macro:`GSL_EDOM` because a probability distribution cannot contain
548   negative values.
549
550.. function:: void gsl_histogram_pdf_free (gsl_histogram_pdf * p)
551
552   This function frees the probability distribution function :data:`p` and
553   all of the memory associated with it.
554
555.. function:: double gsl_histogram_pdf_sample (const gsl_histogram_pdf * p, double r)
556
557   This function uses :data:`r`, a uniform random number between zero and
558   one, to compute a single random sample from the probability distribution
559   :data:`p`.  The algorithm used to compute the sample :math:`s` is given by
560   the following formula,
561
562   .. only:: not texinfo
563
564      .. math:: s = \hbox{range}[i] + \delta * (\hbox{range}[i+1] - \hbox{range}[i])
565
566   .. only:: texinfo
567
568      ::
569
570         s = range[i] + delta * (range[i+1] - range[i])
571
572   where :math:`i` is the index which satisfies
573   :math:`sum[i] \le  r < sum[i+1]`
574   and :math:`delta` is
575   :math:`(r - sum[i])/(sum[i+1] - sum[i])`.
576
577Example programs for histograms
578===============================
579
580The following program shows how to make a simple histogram of a column
581of numerical data supplied on :code:`stdin`.  The program takes three
582arguments, specifying the upper and lower bounds of the histogram and
583the number of bins.  It then reads numbers from :code:`stdin`, one line at
584a time, and adds them to the histogram.  When there is no more data to
585read it prints out the accumulated histogram using
586:func:`gsl_histogram_fprintf`.
587
588.. include:: examples/histogram.c
589   :code:
590
591Here is an example of the program in use.  We generate 10000 random
592samples from a Cauchy distribution with a width of 30 and histogram
593them over the range -100 to 100, using 200 bins::
594
595  $ gsl-randist 0 10000 cauchy 30
596     | gsl-histogram -- -100 100 200 > histogram.dat
597
598:numref:`fig_histogram` shows the familiar shape of the
599Cauchy distribution and the fluctuations caused by the finite sample
600size.
601
602.. _fig_histogram:
603
604.. figure:: /images/histogram.png
605   :scale: 60%
606
607   Histogram output from example program
608
609.. index::
610   single: two dimensional histograms
611   single: 2D histograms
612
613Two dimensional histograms
614==========================
615
616A two dimensional histogram consists of a set of *bins* which count
617the number of events falling in a given area of the :math:`(x,y)`
618plane.  The simplest way to use a two dimensional histogram is to record
619two-dimensional position information, :math:`n(x,y)`.  Another possibility
620is to form a *joint distribution* by recording related
621variables.  For example a detector might record both the position of an
622event (:math:`x`) and the amount of energy it deposited :math:`E`.  These
623could be histogrammed as the joint distribution :math:`n(x,E)`.
624
625The 2D histogram struct
626=======================
627
628Two dimensional histograms are defined by the following struct,
629
630.. type:: gsl_histogram2d
631
632   =========================== ============================================================================
633   :code:`size_t nx, ny`       This is the number of histogram bins in the x and y directions.
634   :code:`double * xrange`     The ranges of the bins in the x-direction are stored in an array of
635                               :code:`nx + 1` elements pointed to by :data:`xrange`.
636   :code:`double * yrange`     The ranges of the bins in the y-direction are stored in an array of
637                               :code:`ny + 1` elements pointed to by :data:`yrange`.
638   :code:`double * bin`        The counts for each bin are stored in an array pointed to by :data:`bin`.
639                               The bins are floating-point numbers, so you can increment them by
640                               non-integer values if necessary.  The array :data:`bin` stores the two
641                               dimensional array of bins in a single block of memory according to the
642                               mapping :code:`bin(i,j)` = :code:`bin[i * ny + j]`.
643   =========================== ============================================================================
644
645The range for :code:`bin(i,j)` is given by :code:`xrange[i]` to
646:code:`xrange[i+1]` in the x-direction and :code:`yrange[j]` to
647:code:`yrange[j+1]` in the y-direction.  Each bin is inclusive at the lower
648end and exclusive at the upper end.  Mathematically this means that the
649bins are defined by the following inequality,
650
651.. only:: not texinfo
652
653   .. math::
654
655      \begin{array}{cc}
656        \hbox{bin(i,j) corresponds to} & \hbox{\it xrange}[i] \le x < \hbox{\it xrange}[i+1] \\
657        \hbox{and} & \hbox{\it yrange}[j] \le y < \hbox{\it yrange}[j+1]
658      \end{array}
659
660.. only:: texinfo
661
662   ::
663
664      bin(i,j) corresponds to xrange[i] <= x < xrange[i+1]
665                          and yrange[j] <= y < yrange[j+1]
666
667Note that any samples which fall on the upper sides of the histogram are
668excluded.  If you want to include these values for the side bins you will
669need to add an extra row or column to your histogram.
670
671The :type:`gsl_histogram2d` struct and its associated functions are
672defined in the header file :file:`gsl_histogram2d.h`.
673
6742D Histogram allocation
675=======================
676
677The functions for allocating memory to a 2D histogram follow the style
678of :func:`malloc` and :func:`free`.  In addition they also perform their
679own error checking.  If there is insufficient memory available to
680allocate a histogram then the functions call the error handler (with
681an error number of :macro:`GSL_ENOMEM`) in addition to returning a null
682pointer.  Thus if you use the library error handler to abort your program
683then it isn't necessary to check every 2D histogram :code:`alloc`.
684
685.. function:: gsl_histogram2d * gsl_histogram2d_alloc (size_t nx, size_t ny)
686
687   This function allocates memory for a two-dimensional histogram with
688   :data:`nx` bins in the x direction and :data:`ny` bins in the y direction.
689   The function returns a pointer to a newly created :type:`gsl_histogram2d`
690   struct. If insufficient memory is available a null pointer is returned
691   and the error handler is invoked with an error code of
692   :macro:`GSL_ENOMEM`. The bins and ranges must be initialized with one of
693   the functions below before the histogram is ready for use.
694
695.. @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc (size_t nx, size_t ny)
696.. This function allocates memory for a two-dimensional histogram with
697.. :data:`nx` bins in the x direction and :data:`ny` bins in the y
698.. direction.  The function returns a pointer to a newly initialized
699.. :type:`gsl_histogram2d` struct.  The bins are uniformly spaced with a
700.. total range of
701.. @c{$0 \le  x < nx$}
702.. @math{0 <= x < nx} in the x-direction and
703.. @c{$0 \le  y < ny$}
704.. @math{0 <=  y < ny} in the y-direction, as shown in the table below.
705..
706.. The bins are initialized to zero so the histogram is ready for use.
707..
708.. If insufficient memory is available a null pointer is returned and the
709.. error handler is invoked with an error code of :macro:`GSL_ENOMEM`.
710.. @end deftypefun
711..
712.. @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc_uniform (size_t nx, size_t ny, double xmin, double xmax, double ymin, double ymax)
713.. This function allocates a histogram of size :data:`nx`-by-:data:`ny` which
714.. uniformly covers the ranges :data:`xmin` to :data:`xmax` and :data:`ymin` to
715.. :data:`ymax` in the :math:`x` and :math:`y` directions respectively.
716.. @end deftypefun
717..
718.. @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc_range (size_t nx, size_t ny, double * xrange, double * yrange)
719.. This function allocates a histogram of size :data:`nx`-by-:data:`ny` using
720.. the @math{nx+1} and @math{ny+1} bin ranges specified by the arrays
721.. :data:`xrange` and :data:`yrange`.
722.. @end deftypefun
723
724.. function:: int gsl_histogram2d_set_ranges (gsl_histogram2d * h,  const double xrange[], size_t xsize, const double yrange[], size_t ysize)
725
726   This function sets the ranges of the existing histogram :data:`h` using
727   the arrays :data:`xrange` and :data:`yrange` of size :data:`xsize` and
728   :data:`ysize` respectively.  The values of the histogram bins are reset to
729   zero.
730
731.. function:: int gsl_histogram2d_set_ranges_uniform (gsl_histogram2d * h, double xmin, double xmax, double ymin, double ymax)
732
733   This function sets the ranges of the existing histogram :data:`h` to cover
734   the ranges :data:`xmin` to :data:`xmax` and :data:`ymin` to :data:`ymax`
735   uniformly.  The values of the histogram bins are reset to zero.
736
737.. function:: void gsl_histogram2d_free (gsl_histogram2d * h)
738
739   This function frees the 2D histogram :data:`h` and all of the memory
740   associated with it.
741
742Copying 2D Histograms
743=====================
744
745.. function:: int gsl_histogram2d_memcpy (gsl_histogram2d * dest, const gsl_histogram2d * src)
746
747   This function copies the histogram :data:`src` into the pre-existing
748   histogram :data:`dest`, making :data:`dest` into an exact copy of :data:`src`.
749   The two histograms must be of the same size.
750
751.. function:: gsl_histogram2d * gsl_histogram2d_clone (const gsl_histogram2d * src)
752
753   This function returns a pointer to a newly created histogram which is an
754   exact copy of the histogram :data:`src`.
755
756Updating and accessing 2D histogram elements
757============================================
758
759You can access the bins of a two-dimensional histogram either by
760specifying a pair of :math:`(x,y)` coordinates or by using the bin
761indices :math:`(i,j)` directly.  The functions for accessing the histogram
762through :math:`(x,y)` coordinates use binary searches in the x and y
763directions to identify the bin which covers the appropriate range.
764
765.. function:: int gsl_histogram2d_increment (gsl_histogram2d * h, double x, double y)
766
767   This function updates the histogram :data:`h` by adding one (1.0) to the
768   bin whose x and y ranges contain the coordinates (:data:`x`, :data:`y`).
769
770   If the point :math:`(x,y)` lies inside the valid ranges of the
771   histogram then the function returns zero to indicate success.  If
772   :math:`(x,y)` lies outside the limits of the histogram then the
773   function returns :macro:`GSL_EDOM`, and none of the bins are modified.  The
774   error handler is not called, since it is often necessary to compute
775   histograms for a small range of a larger dataset, ignoring any
776   coordinates outside the range of interest.
777
778.. function:: int gsl_histogram2d_accumulate (gsl_histogram2d * h, double x, double y, double weight)
779
780   This function is similar to :func:`gsl_histogram2d_increment` but increases
781   the value of the appropriate bin in the histogram :data:`h` by the
782   floating-point number :data:`weight`.
783
784.. function:: double gsl_histogram2d_get (const gsl_histogram2d * h, size_t i, size_t j)
785
786   This function returns the contents of the (:data:`i`, :data:`j`)-th bin of the
787   histogram :data:`h`.  If (:data:`i`, :data:`j`) lies outside the valid range of
788   indices for the histogram then the error handler is called with an error
789   code of :macro:`GSL_EDOM` and the function returns 0.
790
791.. function:: int gsl_histogram2d_get_xrange (const gsl_histogram2d * h, size_t i, double * xlower, double * xupper)
792              int gsl_histogram2d_get_yrange (const gsl_histogram2d * h, size_t j, double * ylower, double * yupper)
793
794   These functions find the upper and lower range limits of the :data:`i`-th
795   and :data:`j`-th bins in the x and y directions of the histogram :data:`h`.
796   The range limits are stored in :data:`xlower` and :data:`xupper` or
797   :data:`ylower` and :data:`yupper`.  The lower limits are inclusive
798   (i.e. events with these coordinates are included in the bin) and the
799   upper limits are exclusive (i.e. events with the value of the upper
800   limit are not included and fall in the neighboring higher bin, if it
801   exists).  The functions return 0 to indicate success.  If :data:`i` or
802   :data:`j` lies outside the valid range of indices for the histogram then
803   the error handler is called with an error code of :macro:`GSL_EDOM`.
804
805.. function:: double gsl_histogram2d_xmax (const gsl_histogram2d * h)
806              double gsl_histogram2d_xmin (const gsl_histogram2d * h)
807              size_t gsl_histogram2d_nx (const gsl_histogram2d * h)
808              double gsl_histogram2d_ymax (const gsl_histogram2d * h)
809              double gsl_histogram2d_ymin (const gsl_histogram2d * h)
810              size_t gsl_histogram2d_ny (const gsl_histogram2d * h)
811
812   These functions return the maximum upper and minimum lower range limits
813   and the number of bins for the x and y directions of the histogram
814   :data:`h`.  They provide a way of determining these values without
815   accessing the :type:`gsl_histogram2d` struct directly.
816
817.. function:: void gsl_histogram2d_reset (gsl_histogram2d * h)
818
819   This function resets all the bins of the histogram :data:`h` to zero.
820
821Searching 2D histogram ranges
822=============================
823
824The following functions are used by the access and update routines to
825locate the bin which corresponds to a given :math:`(x,y)` coordinate.
826
827.. function:: int gsl_histogram2d_find (const gsl_histogram2d * h, double x, double y, size_t * i, size_t * j)
828
829   This function finds and sets the indices :data:`i` and :data:`j` to
830   the bin which covers the coordinates (:data:`x`, :data:`y`). The bin is
831   located using a binary search.  The search includes an optimization for
832   histograms with uniform ranges, and will return the correct bin immediately
833   in this case. If :math:`(x,y)` is found then the function sets the
834   indices (:data:`i`, :data:`j`) and returns :macro:`GSL_SUCCESS`.  If
835   :math:`(x,y)` lies outside the valid range of the histogram then the
836   function returns :macro:`GSL_EDOM` and the error handler is invoked.
837
8382D Histogram Statistics
839=======================
840
841.. function:: double gsl_histogram2d_max_val (const gsl_histogram2d * h)
842
843   This function returns the maximum value contained in the histogram bins.
844
845.. function:: void gsl_histogram2d_max_bin (const gsl_histogram2d * h, size_t * i, size_t * j)
846
847   This function finds the indices of the bin containing the maximum value
848   in the histogram :data:`h` and stores the result in (:data:`i`, :data:`j`). In
849   the case where several bins contain the same maximum value the first bin
850   found is returned.
851
852.. function:: double gsl_histogram2d_min_val (const gsl_histogram2d * h)
853
854   This function returns the minimum value contained in the histogram bins.
855
856.. function:: void gsl_histogram2d_min_bin (const gsl_histogram2d * h, size_t * i, size_t * j)
857
858   This function finds the indices of the bin containing the minimum value
859   in the histogram :data:`h` and stores the result in (:data:`i`, :data:`j`). In
860   the case where several bins contain the same maximum value the first bin
861   found is returned.
862
863.. function:: double gsl_histogram2d_xmean (const gsl_histogram2d * h)
864
865   This function returns the mean of the histogrammed x variable, where the
866   histogram is regarded as a probability distribution. Negative bin values
867   are ignored for the purposes of this calculation.
868
869.. function:: double gsl_histogram2d_ymean (const gsl_histogram2d * h)
870
871   This function returns the mean of the histogrammed y variable, where the
872   histogram is regarded as a probability distribution. Negative bin values
873   are ignored for the purposes of this calculation.
874
875.. function:: double gsl_histogram2d_xsigma (const gsl_histogram2d * h)
876
877   This function returns the standard deviation of the histogrammed
878   x variable, where the histogram is regarded as a probability
879   distribution. Negative bin values are ignored for the purposes of this
880   calculation.
881
882.. function:: double gsl_histogram2d_ysigma (const gsl_histogram2d * h)
883
884   This function returns the standard deviation of the histogrammed
885   y variable, where the histogram is regarded as a probability
886   distribution. Negative bin values are ignored for the purposes of this
887   calculation.
888
889.. function:: double gsl_histogram2d_cov (const gsl_histogram2d * h)
890
891   This function returns the covariance of the histogrammed x and y
892   variables, where the histogram is regarded as a probability
893   distribution. Negative bin values are ignored for the purposes of this
894   calculation.
895
896.. function:: double gsl_histogram2d_sum (const gsl_histogram2d * h)
897
898   This function returns the sum of all bin values. Negative bin values
899   are included in the sum.
900
9012D Histogram Operations
902=======================
903
904.. function:: int gsl_histogram2d_equal_bins_p (const gsl_histogram2d * h1, const gsl_histogram2d * h2)
905
906   This function returns 1 if all the individual bin ranges of the two
907   histograms are identical, and 0 otherwise.
908
909.. function:: int gsl_histogram2d_add (gsl_histogram2d * h1, const gsl_histogram2d * h2)
910
911   This function adds the contents of the bins in histogram :data:`h2` to the
912   corresponding bins of histogram :data:`h1`,
913   i.e. :math:`h'_1(i,j) = h_1(i,j) + h_2(i,j)`.
914   The two histograms must have identical bin ranges.
915
916.. function:: int gsl_histogram2d_sub (gsl_histogram2d * h1, const gsl_histogram2d * h2)
917
918   This function subtracts the contents of the bins in histogram :data:`h2` from the
919   corresponding bins of histogram :data:`h1`,
920   i.e. :math:`h'_1(i,j) = h_1(i,j) - h_2(i,j)`.
921   The two histograms must have identical bin ranges.
922
923.. function:: int gsl_histogram2d_mul (gsl_histogram2d * h1, const gsl_histogram2d * h2)
924
925   This function multiplies the contents of the bins of histogram :data:`h1`
926   by the contents of the corresponding bins in histogram :data:`h2`,
927   i.e. :math:`h'_1(i,j) = h_1(i,j) * h_2(i,j)`.
928   The two histograms must have identical bin ranges.
929
930.. function:: int gsl_histogram2d_div (gsl_histogram2d * h1, const gsl_histogram2d * h2)
931
932   This function divides the contents of the bins of histogram :data:`h1`
933   by the contents of the corresponding bins in histogram :data:`h2`,
934   i.e. :math:`h'_1(i,j) = h_1(i,j) / h_2(i,j)`.
935   The two histograms must have identical bin ranges.
936
937.. function:: int gsl_histogram2d_scale (gsl_histogram2d * h, double scale)
938
939   This function multiplies the contents of the bins of histogram :data:`h`
940   by the constant :data:`scale`, i.e.
941
942   .. only:: not texinfo
943
944      .. math:: h'_1(i,j) = h_1(i,j) * \hbox{\it scale}
945
946   .. only:: texinfo
947
948      ::
949
950         h'_1(i,j) = h_1(i,j) scale
951
952.. function:: int gsl_histogram2d_shift (gsl_histogram2d * h, double offset)
953
954   This function shifts the contents of the bins of histogram :data:`h`
955   by the constant :data:`offset`, i.e.
956
957   .. only:: not texinfo
958
959      .. math:: h'_1(i,j) = h_1(i,j) + \hbox{\it offset}
960
961   .. only:: texinfo
962
963      ::
964
965         h'_1(i,j) = h_1(i,j) + offset
966
967Reading and writing 2D histograms
968=================================
969
970The library provides functions for reading and writing two dimensional
971histograms to a file as binary data or formatted text.
972
973.. function:: int gsl_histogram2d_fwrite (FILE * stream, const gsl_histogram2d * h)
974
975   This function writes the ranges and bins of the histogram :data:`h` to the
976   stream :data:`stream` in binary format.  The return value is 0 for success
977   and :macro:`GSL_EFAILED` if there was a problem writing to the file.  Since
978   the data is written in the native binary format it may not be portable
979   between different architectures.
980
981.. function:: int gsl_histogram2d_fread (FILE * stream, gsl_histogram2d * h)
982
983   This function reads into the histogram :data:`h` from the stream
984   :data:`stream` in binary format.  The histogram :data:`h` must be
985   preallocated with the correct size since the function uses the number of
986   x and y bins in :data:`h` to determine how many bytes to read.  The return
987   value is 0 for success and :macro:`GSL_EFAILED` if there was a problem
988   reading from the file.  The data is assumed to have been written in the
989   native binary format on the same architecture.
990
991.. function:: int gsl_histogram2d_fprintf (FILE * stream, const gsl_histogram2d * h, const char * range_format, const char * bin_format)
992
993   This function writes the ranges and bins of the histogram :data:`h`
994   line-by-line to the stream :data:`stream` using the format specifiers
995   :data:`range_format` and :data:`bin_format`.  These should be one of the
996   :code:`%g`, :code:`%e` or :code:`%f` formats for floating point
997   numbers.  The function returns 0 for success and :macro:`GSL_EFAILED` if
998   there was a problem writing to the file.  The histogram output is
999   formatted in five columns, and the columns are separated by spaces,
1000   like this::
1001
1002      xrange[0] xrange[1] yrange[0] yrange[1] bin(0,0)
1003      xrange[0] xrange[1] yrange[1] yrange[2] bin(0,1)
1004      xrange[0] xrange[1] yrange[2] yrange[3] bin(0,2)
1005      ....
1006      xrange[0] xrange[1] yrange[ny-1] yrange[ny] bin(0,ny-1)
1007
1008      xrange[1] xrange[2] yrange[0] yrange[1] bin(1,0)
1009      xrange[1] xrange[2] yrange[1] yrange[2] bin(1,1)
1010      xrange[1] xrange[2] yrange[1] yrange[2] bin(1,2)
1011      ....
1012      xrange[1] xrange[2] yrange[ny-1] yrange[ny] bin(1,ny-1)
1013
1014      ....
1015
1016      xrange[nx-1] xrange[nx] yrange[0] yrange[1] bin(nx-1,0)
1017      xrange[nx-1] xrange[nx] yrange[1] yrange[2] bin(nx-1,1)
1018      xrange[nx-1] xrange[nx] yrange[1] yrange[2] bin(nx-1,2)
1019      ....
1020      xrange[nx-1] xrange[nx] yrange[ny-1] yrange[ny] bin(nx-1,ny-1)
1021
1022   Each line contains the lower and upper limits of the bin and the
1023   contents of the bin.  Since the upper limits of the each bin are the
1024   lower limits of the neighboring bins there is duplication of these
1025   values but this allows the histogram to be manipulated with
1026   line-oriented tools.
1027
1028.. function:: int gsl_histogram2d_fscanf (FILE * stream, gsl_histogram2d * h)
1029
1030   This function reads formatted data from the stream :data:`stream` into the
1031   histogram :data:`h`.  The data is assumed to be in the five-column format
1032   used by :func:`gsl_histogram2d_fprintf`.  The histogram :data:`h` must be
1033   preallocated with the correct lengths since the function uses the sizes
1034   of :data:`h` to determine how many numbers to read.  The function returns 0
1035   for success and :macro:`GSL_EFAILED` if there was a problem reading from
1036   the file.
1037
1038Resampling from 2D histograms
1039=============================
1040
1041As in the one-dimensional case, a two-dimensional histogram made by
1042counting events can be regarded as a measurement of a probability
1043distribution.  Allowing for statistical error, the height of each bin
1044represents the probability of an event where (:math:`x`, :math:`y`) falls in
1045the range of that bin.  For a two-dimensional histogram the probability
1046distribution takes the form :math:`p(x,y) dx dy` where,
1047
1048.. math:: p(x,y) = n_{ij} / (N A_{ij})
1049
1050In this equation :math:`n_{ij}`
1051is the number of events in the bin which
1052contains :math:`(x,y)`, :math:`A_{ij}`
1053is the area of the bin and :math:`N` is
1054the total number of events.  The distribution of events within each bin
1055is assumed to be uniform.
1056
1057.. type:: gsl_histogram2d_pdf
1058
1059   ============================= ===========================================================================
1060   :code:`size_t nx, ny`         This is the number of histogram bins used to approximate the probability
1061                                 distribution function in the x and y directions.
1062   :code:`double * xrange`       The ranges of the bins in the x-direction are stored in an array of
1063                                 :code:`nx + 1` elements pointed to by :data:`xrange`.
1064   :code:`double * yrange`       The ranges of the bins in the y-direction are stored in an array of
1065                                 :code:`ny + 1` pointed to by :data:`yrange`.
1066   :code:`double * sum`          The cumulative probability for the bins is stored in an array of
1067                                 :data:`nx` * :data:`ny` elements pointed to by :data:`sum`.
1068   ============================= ===========================================================================
1069
1070The following functions allow you to create a :type:`gsl_histogram2d_pdf`
1071struct which represents a two dimensional probability distribution and
1072generate random samples from it.
1073
1074.. function:: gsl_histogram2d_pdf * gsl_histogram2d_pdf_alloc (size_t nx, size_t ny)
1075
1076   This function allocates memory for a two-dimensional probability
1077   distribution of size :data:`nx`-by-:data:`ny` and returns a pointer to a
1078   newly initialized :type:`gsl_histogram2d_pdf` struct. If insufficient
1079   memory is available a null pointer is returned and the error handler is
1080   invoked with an error code of :macro:`GSL_ENOMEM`.
1081
1082.. function:: int gsl_histogram2d_pdf_init (gsl_histogram2d_pdf * p, const gsl_histogram2d * h)
1083
1084   This function initializes the two-dimensional probability distribution
1085   calculated :data:`p` from the histogram :data:`h`.  If any of the bins of
1086   :data:`h` are negative then the error handler is invoked with an error
1087   code of :macro:`GSL_EDOM` because a probability distribution cannot
1088   contain negative values.
1089
1090.. function:: void gsl_histogram2d_pdf_free (gsl_histogram2d_pdf * p)
1091
1092   This function frees the two-dimensional probability distribution
1093   function :data:`p` and all of the memory associated with it.
1094
1095.. function:: int gsl_histogram2d_pdf_sample (const gsl_histogram2d_pdf * p, double r1, double r2, double * x, double * y)
1096
1097   This function uses two uniform random numbers between zero and one,
1098   :data:`r1` and :data:`r2`, to compute a single random sample from the
1099   two-dimensional probability distribution :data:`p`.
1100
1101Example programs for 2D histograms
1102==================================
1103
1104This program demonstrates two features of two-dimensional histograms.
1105First a 10-by-10 two-dimensional histogram is created with x and y running
1106from 0 to 1.  Then a few sample points are added to the histogram, at
1107(0.3,0.3) with a height of 1, at (0.8,0.1) with a height of 5 and at
1108(0.7,0.9) with a height of 0.5.  This histogram with three events is
1109used to generate a random sample of 1000 simulated events, which are
1110printed out.
1111
1112.. include:: examples/histogram2d.c
1113   :code:
1114
1115The following plot shows the distribution of the simulated events.  Using
1116a higher resolution grid we can see the original underlying histogram
1117and also the statistical fluctuations caused by the events being
1118uniformly distributed over the area of the original bins.
1119
1120.. figure:: /images/histogram2d.png
1121   :scale: 60%
1122
1123   Distribution of simulated events from example program
1124