1.. index:: 2 single: histograms 3 single: binning data 4 5********** 6Histograms 7********** 8 9This chapter describes functions for creating histograms. Histograms 10provide a convenient way of summarizing the distribution of a set of 11data. A histogram consists of a set of *bins* which count the number 12of events falling into a given range of a continuous variable :math:`x`. 13In GSL the bins of a histogram contain floating-point numbers, so they 14can be used to record both integer and non-integer distributions. The 15bins can use arbitrary sets of ranges (uniformly spaced bins are the 16default). Both one and two-dimensional histograms are supported. 17 18Once a histogram has been created it can also be converted into a 19probability distribution function. The library provides efficient 20routines for selecting random samples from probability distributions. 21This can be useful for generating simulations based on real data. 22 23The functions are declared in the header files :file:`gsl_histogram.h` 24and :file:`gsl_histogram2d.h`. 25 26The histogram struct 27==================== 28 29A histogram is defined by the following struct, 30 31.. type:: gsl_histogram 32 33 ============================= ============================================================================ 34 size_t n This is the number of histogram bins 35 double * range The ranges of the bins are stored in an array of :code:`n+1` elements 36 pointed to by range. 37 double * bin The counts for each bin are stored in an array of :data:`n` elements 38 pointed to by :data:`bin`. The bins are floating-point numbers, so you can 39 increment them by non-integer values if necessary. 40 ============================= ============================================================================ 41 42 The range for :code:`bin[i]` is given by :code:`range[i]` to 43 :code:`range[i+1]`. For :math:`n` bins there are :code:`n+1` entries in the 44 array :data:`range`. Each bin is inclusive at the lower end and exclusive 45 at the upper end. Mathematically this means that the bins are defined by 46 the following inequality, 47 48 .. only:: not texinfo 49 50 .. math:: \hbox{bin[i] corresponds to range[i]} \le x < \hbox{range[i+1]} 51 52 .. only:: texinfo 53 54 :: 55 56 bin[i] corresponds to range[i] <= x < range[i+1] 57 58 Here is a diagram of the correspondence between ranges and bins on the 59 number-line for :math:`x`:: 60 61 [ bin[0] )[ bin[1] )[ bin[2] )[ bin[3] )[ bin[4] ) 62 ---|---------|---------|---------|---------|---------|--- x 63 r[0] r[1] r[2] r[3] r[4] r[5] 64 65 In this picture the values of the :data:`range` array are denoted by 66 :math:`r`. On the left-hand side of each bin the square bracket 67 :code:`[` denotes an inclusive lower bound 68 (:math:`r \le x`), 69 and the round parentheses :code:`)` on the right-hand 70 side denote an exclusive upper bound (:math:`x < r`). Thus any samples 71 which fall on the upper end of the histogram are excluded. If you want 72 to include this value for the last bin you will need to add an extra bin 73 to your histogram. 74 75 The :type:`gsl_histogram` struct and its associated functions are defined 76 in the header file :file:`gsl_histogram.h`. 77 78Histogram allocation 79==================== 80 81The functions for allocating memory to a histogram follow the style of 82:func:`malloc` and :func:`free`. In addition they also perform their own 83error checking. If there is insufficient memory available to allocate a 84histogram then the functions call the error handler (with an error 85number of :macro:`GSL_ENOMEM`) in addition to returning a null pointer. 86Thus if you use the library error handler to abort your program then it 87isn't necessary to check every histogram :code:`alloc`. 88 89.. function:: gsl_histogram * gsl_histogram_alloc (size_t n) 90 91 This function allocates memory for a histogram with :data:`n` bins, and 92 returns a pointer to a newly created :type:`gsl_histogram` struct. If 93 insufficient memory is available a null pointer is returned and the 94 error handler is invoked with an error code of :macro:`GSL_ENOMEM`. The 95 bins and ranges are not initialized, and should be prepared using one of 96 the range-setting functions below in order to make the histogram ready 97 for use. 98 99.. @deftypefun {gsl_histogram *} gsl_histogram_calloc (size_t n) 100.. This function allocates memory for a histogram with :data:`n` bins, and 101.. returns a pointer to its newly initialized :type:`gsl_histogram` struct. 102.. The bins are uniformly spaced with a total range of 103.. @c{$0 \le x < n$} 104.. @math{0 <= x < n}, 105.. as shown in the table below. 106 107.. @tex 108.. \beforedisplay 109.. $$ 110.. \matrix{ 111.. \hbox{bin[0]}&\hbox{corresponds to}& 0 \le x < 1\cr 112.. \hbox{bin[1]}&\hbox{corresponds to}& 1 \le x < 2\cr 113.. \dots&\dots&\dots\cr 114.. \hbox{bin[n-1]}&\hbox{corresponds to}&n-1 \le x < n} 115.. $$ 116.. \afterdisplay 117.. @end tex 118.. @ifinfo 119.. @display 120.. bin[0] corresponds to 0 <= x < 1 121.. bin[1] corresponds to 1 <= x < 2 122.. @dots{} 123.. bin[n-1] corresponds to n-1 <= x < n 124.. @end display 125.. @end ifinfo 126.. @noindent 127.. The bins are initialized to zero so the histogram is ready for use. 128 129.. If insufficient memory is available a null pointer is returned and the 130.. error handler is invoked with an error code of :macro:`GSL_ENOMEM`. 131.. @end deftypefun 132 133.. @deftypefun {gsl_histogram *} gsl_histogram_calloc_uniform (size_t n, double xmin, double xmax) 134.. This function allocates memory for a histogram with :data:`n` uniformly 135.. spaced bins from :data:`xmin` to :data:`xmax`, and returns a pointer to the 136.. newly initialized :type:`gsl_histogram` struct. 137.. If insufficient memory is available a null pointer is returned and the 138.. error handler is invoked with an error code of :macro:`GSL_ENOMEM`. 139.. @end deftypefun 140 141.. @deftypefun {gsl_histogram *} gsl_histogram_calloc_range (size_t n, double * range) 142.. This function allocates a histogram of size :data:`n` using the @math{n+1} 143.. bin ranges specified by the array :data:`range`. 144.. @end deftypefun 145 146.. function:: int gsl_histogram_set_ranges (gsl_histogram * h, const double range[], size_t size) 147 148 This function sets the ranges of the existing histogram :data:`h` using 149 the array :data:`range` of size :data:`size`. The values of the histogram 150 bins are reset to zero. The :data:`range` array should contain the 151 desired bin limits. The ranges can be arbitrary, subject to the 152 restriction that they are monotonically increasing. 153 154 The following example shows how to create a histogram with logarithmic 155 bins with ranges [1,10), [10,100) and [100,1000):: 156 157 gsl_histogram * h = gsl_histogram_alloc (3); 158 159 /* bin[0] covers the range 1 <= x < 10 */ 160 /* bin[1] covers the range 10 <= x < 100 */ 161 /* bin[2] covers the range 100 <= x < 1000 */ 162 163 double range[4] = { 1.0, 10.0, 100.0, 1000.0 }; 164 165 gsl_histogram_set_ranges (h, range, 4); 166 167 Note that the size of the :data:`range` array should be defined to be one 168 element bigger than the number of bins. The additional element is 169 required for the upper value of the final bin. 170 171.. function:: int gsl_histogram_set_ranges_uniform (gsl_histogram * h, double xmin, double xmax) 172 173 This function sets the ranges of the existing histogram :data:`h` to cover 174 the range :data:`xmin` to :data:`xmax` uniformly. The values of the 175 histogram bins are reset to zero. The bin ranges are shown in the table 176 below, 177 178 .. only:: not texinfo 179 180 .. math:: 181 182 \begin{array}{ccc} 183 \hbox{bin[0]}&\hbox{corresponds to}& xmin \le x < xmin + d \\ 184 \hbox{bin[1]} &\hbox{corresponds to}& xmin + d \le x < xmin + 2 d \\ 185 \dots&\dots&\dots \\ 186 \hbox{bin[n-1]} & \hbox{corresponds to}& xmin + (n-1)d \le x < xmax 187 \end{array} 188 189 .. only:: texinfo 190 191 :: 192 193 bin[0] corresponds to xmin <= x < xmin + d 194 bin[1] corresponds to xmin + d <= x < xmin + 2 d 195 ...... 196 bin[n-1] corresponds to xmin + (n-1)d <= x < xmax 197 198 where :math:`d` is the bin spacing, :math:`d = (xmax-xmin)/n`. 199 200.. function:: void gsl_histogram_free (gsl_histogram * h) 201 202 This function frees the histogram :data:`h` and all of the memory 203 associated with it. 204 205Copying Histograms 206================== 207 208.. function:: int gsl_histogram_memcpy (gsl_histogram * dest, const gsl_histogram * src) 209 210 This function copies the histogram :data:`src` into the pre-existing 211 histogram :data:`dest`, making :data:`dest` into an exact copy of :data:`src`. 212 The two histograms must be of the same size. 213 214.. function:: gsl_histogram * gsl_histogram_clone (const gsl_histogram * src) 215 216 This function returns a pointer to a newly created histogram which is an 217 exact copy of the histogram :data:`src`. 218 219Updating and accessing histogram elements 220========================================= 221 222There are two ways to access histogram bins, either by specifying an 223:math:`x` coordinate or by using the bin-index directly. The functions 224for accessing the histogram through :math:`x` coordinates use a binary 225search to identify the bin which covers the appropriate range. 226 227.. function:: int gsl_histogram_increment (gsl_histogram * h, double x) 228 229 This function updates the histogram :data:`h` by adding one (1.0) to the 230 bin whose range contains the coordinate :data:`x`. 231 232 If :data:`x` lies in the valid range of the histogram then the function 233 returns zero to indicate success. If :data:`x` is less than the lower 234 limit of the histogram then the function returns :macro:`GSL_EDOM`, and 235 none of bins are modified. Similarly, if the value of :data:`x` is greater 236 than or equal to the upper limit of the histogram then the function 237 returns :macro:`GSL_EDOM`, and none of the bins are modified. The error 238 handler is not called, however, since it is often necessary to compute 239 histograms for a small range of a larger dataset, ignoring the values 240 outside the range of interest. 241 242.. function:: int gsl_histogram_accumulate (gsl_histogram * h, double x, double weight) 243 244 This function is similar to :func:`gsl_histogram_increment` but increases 245 the value of the appropriate bin in the histogram :data:`h` by the 246 floating-point number :data:`weight`. 247 248.. function:: double gsl_histogram_get (const gsl_histogram * h, size_t i) 249 250 This function returns the contents of the :data:`i`-th bin of the histogram 251 :data:`h`. If :data:`i` lies outside the valid range of indices for the 252 histogram then the error handler is called with an error code of 253 :macro:`GSL_EDOM` and the function returns 0. 254 255.. function:: int gsl_histogram_get_range (const gsl_histogram * h, size_t i, double * lower, double * upper) 256 257 This function finds the upper and lower range limits of the :data:`i`-th 258 bin of the histogram :data:`h`. If the index :data:`i` is valid then the 259 corresponding range limits are stored in :data:`lower` and :data:`upper`. 260 The lower limit is inclusive (i.e. events with this coordinate are 261 included in the bin) and the upper limit is exclusive (i.e. events with 262 the coordinate of the upper limit are excluded and fall in the 263 neighboring higher bin, if it exists). The function returns 0 to 264 indicate success. If :data:`i` lies outside the valid range of indices for 265 the histogram then the error handler is called and the function returns 266 an error code of :macro:`GSL_EDOM`. 267 268.. function:: double gsl_histogram_max (const gsl_histogram * h) 269 double gsl_histogram_min (const gsl_histogram * h) 270 size_t gsl_histogram_bins (const gsl_histogram * h) 271 272 These functions return the maximum upper and minimum lower range limits 273 and the number of bins of the histogram :data:`h`. They provide a way of 274 determining these values without accessing the :type:`gsl_histogram` 275 struct directly. 276 277.. function:: void gsl_histogram_reset (gsl_histogram * h) 278 279 This function resets all the bins in the histogram :data:`h` to zero. 280 281Searching histogram ranges 282========================== 283 284The following functions are used by the access and update routines to 285locate the bin which corresponds to a given :math:`x` coordinate. 286 287.. function:: int gsl_histogram_find (const gsl_histogram * h, double x, size_t * i) 288 289 This function finds and sets the index :data:`i` to the bin number which 290 covers the coordinate :data:`x` in the histogram :data:`h`. The bin is 291 located using a binary search. The search includes an optimization for 292 histograms with uniform range, and will return the correct bin 293 immediately in this case. If :data:`x` is found in the range of the 294 histogram then the function sets the index :data:`i` and returns 295 :macro:`GSL_SUCCESS`. If :data:`x` lies outside the valid range of the 296 histogram then the function returns :macro:`GSL_EDOM` and the error 297 handler is invoked. 298 299.. index:: 300 single: histogram statistics 301 single: statistics, from histogram 302 single: maximum value, from histogram 303 single: minimum value, from histogram 304 305Histogram Statistics 306==================== 307 308.. function:: double gsl_histogram_max_val (const gsl_histogram * h) 309 310 This function returns the maximum value contained in the histogram bins. 311 312.. function:: size_t gsl_histogram_max_bin (const gsl_histogram * h) 313 314 This function returns the index of the bin containing the maximum 315 value. In the case where several bins contain the same maximum value the 316 smallest index is returned. 317 318.. function:: double gsl_histogram_min_val (const gsl_histogram * h) 319 320 This function returns the minimum value contained in the histogram bins. 321 322.. function:: size_t gsl_histogram_min_bin (const gsl_histogram * h) 323 324 This function returns the index of the bin containing the minimum 325 value. In the case where several bins contain the same minimum value the 326 smallest index is returned. 327 328.. index:: 329 single: mean value, from histogram 330 331.. function:: double gsl_histogram_mean (const gsl_histogram * h) 332 333 This function returns the mean of the histogrammed variable, where the 334 histogram is regarded as a probability distribution. Negative bin values 335 are ignored for the purposes of this calculation. The accuracy of the 336 result is limited by the bin width. 337 338.. index:: 339 single: standard deviation, from histogram 340 single: variance, from histogram 341 342.. function:: double gsl_histogram_sigma (const gsl_histogram * h) 343 344 This function returns the standard deviation of the histogrammed 345 variable, where the histogram is regarded as a probability 346 distribution. Negative bin values are ignored for the purposes of this 347 calculation. The accuracy of the result is limited by the bin width. 348 349.. function:: double gsl_histogram_sum (const gsl_histogram * h) 350 351 This function returns the sum of all bin values. Negative bin values 352 are included in the sum. 353 354Histogram Operations 355==================== 356 357.. function:: int gsl_histogram_equal_bins_p (const gsl_histogram * h1, const gsl_histogram * h2) 358 359 This function returns 1 if the all of the individual bin 360 ranges of the two histograms are identical, and 0 361 otherwise. 362 363.. function:: int gsl_histogram_add (gsl_histogram * h1, const gsl_histogram * h2) 364 365 This function adds the contents of the bins in histogram :data:`h2` to the 366 corresponding bins of histogram :data:`h1`, i.e. :math:`h'_1(i) = h_1(i) + h_2(i)`. 367 The two histograms must have identical bin ranges. 368 369.. function:: int gsl_histogram_sub (gsl_histogram * h1, const gsl_histogram * h2) 370 371 This function subtracts the contents of the bins in histogram :data:`h2` 372 from the corresponding bins of histogram :data:`h1`, i.e. :math:`h'_1(i) = h_1(i) - h_2(i)`. 373 The two histograms must have identical bin ranges. 374 375.. function:: int gsl_histogram_mul (gsl_histogram * h1, const gsl_histogram * h2) 376 377 This function multiplies the contents of the bins of histogram :data:`h1` 378 by the contents of the corresponding bins in histogram :data:`h2`, 379 i.e. :math:`h'_1(i) = h_1(i) * h_2(i)`. The two histograms must have 380 identical bin ranges. 381 382.. function:: int gsl_histogram_div (gsl_histogram * h1, const gsl_histogram * h2) 383 384 This function divides the contents of the bins of histogram :data:`h1` by 385 the contents of the corresponding bins in histogram :data:`h2`, 386 i.e. :math:`h'_1(i) = h_1(i) / h_2(i)`. The two histograms must have 387 identical bin ranges. 388 389.. function:: int gsl_histogram_scale (gsl_histogram * h, double scale) 390 391 This function multiplies the contents of the bins of histogram :data:`h` 392 by the constant :data:`scale`, i.e. 393 394 .. only:: not texinfo 395 396 .. math:: h'_1(i) = h_1(i) * \hbox{\it scale} 397 398 .. only:: texinfo 399 400 :: 401 402 h'_1(i) = h_1(i) * scale 403 404.. function:: int gsl_histogram_shift (gsl_histogram * h, double offset) 405 406 This function shifts the contents of the bins of histogram :data:`h` by 407 the constant :data:`offset`, i.e. 408 409 .. only:: not texinfo 410 411 .. math:: h'_1(i) = h_1(i) + \hbox{\it offset} 412 413 .. only:: texinfo 414 415 :: 416 417 h'_1(i) = h_1(i) + offset 418 419Reading and writing histograms 420============================== 421 422The library provides functions for reading and writing histograms to a file 423as binary data or formatted text. 424 425.. function:: int gsl_histogram_fwrite (FILE * stream, const gsl_histogram * h) 426 427 This function writes the ranges and bins of the histogram :data:`h` to the 428 stream :data:`stream` in binary format. The return value is 0 for success 429 and :macro:`GSL_EFAILED` if there was a problem writing to the file. Since 430 the data is written in the native binary format it may not be portable 431 between different architectures. 432 433.. function:: int gsl_histogram_fread (FILE * stream, gsl_histogram * h) 434 435 This function reads into the histogram :data:`h` from the open stream 436 :data:`stream` in binary format. The histogram :data:`h` must be 437 preallocated with the correct size since the function uses the number of 438 bins in :data:`h` to determine how many bytes to read. The return value is 439 0 for success and :macro:`GSL_EFAILED` if there was a problem reading from 440 the file. The data is assumed to have been written in the native binary 441 format on the same architecture. 442 443.. function:: int gsl_histogram_fprintf (FILE * stream, const gsl_histogram * h, const char * range_format, const char * bin_format) 444 445 This function writes the ranges and bins of the histogram :data:`h` 446 line-by-line to the stream :data:`stream` using the format specifiers 447 :data:`range_format` and :data:`bin_format`. These should be one of the 448 :code:`%g`, :code:`%e` or :code:`%f` formats for floating point 449 numbers. The function returns 0 for success and :macro:`GSL_EFAILED` if 450 there was a problem writing to the file. The histogram output is 451 formatted in three columns, and the columns are separated by spaces, 452 like this:: 453 454 range[0] range[1] bin[0] 455 range[1] range[2] bin[1] 456 range[2] range[3] bin[2] 457 .... 458 range[n-1] range[n] bin[n-1] 459 460 The values of the ranges are formatted using :data:`range_format` and the 461 value of the bins are formatted using :data:`bin_format`. Each line 462 contains the lower and upper limit of the range of the bins and the 463 value of the bin itself. Since the upper limit of one bin is the lower 464 limit of the next there is duplication of these values between lines but 465 this allows the histogram to be manipulated with line-oriented tools. 466 467.. function:: int gsl_histogram_fscanf (FILE * stream, gsl_histogram * h) 468 469 This function reads formatted data from the stream :data:`stream` into the 470 histogram :data:`h`. The data is assumed to be in the three-column format 471 used by :func:`gsl_histogram_fprintf`. The histogram :data:`h` must be 472 preallocated with the correct length since the function uses the size of 473 :data:`h` to determine how many numbers to read. The function returns 0 474 for success and :macro:`GSL_EFAILED` if there was a problem reading from 475 the file. 476 477.. index:: 478 single: resampling from histograms 479 single: sampling from histograms 480 single: probability distributions, from histograms 481 482Resampling from histograms 483========================== 484 485A histogram made by counting events can be regarded as a measurement of 486a probability distribution. Allowing for statistical error, the height 487of each bin represents the probability of an event where the value of 488:math:`x` falls in the range of that bin. The probability distribution 489function has the one-dimensional form :math:`p(x)dx` where, 490 491.. math:: p(x) = n_i / (N w_i) 492 493In this equation :math:`n_i` is the number of events in the bin which 494contains :math:`x`, :math:`w_i` is the width of the bin and :math:`N` is 495the total number of events. The distribution of events within each bin 496is assumed to be uniform. 497 498.. index:: 499 single: probability distribution, from histogram 500 single: sampling from histograms 501 single: random sampling from histograms 502 single: histograms, random sampling from 503 504The histogram probability distribution struct 505============================================= 506 507The probability distribution function for a histogram consists of a set 508of *bins* which measure the probability of an event falling into a 509given range of a continuous variable :math:`x`. A probability 510distribution function is defined by the following struct, which actually 511stores the cumulative probability distribution function. This is the 512natural quantity for generating samples via the inverse transform 513method, because there is a one-to-one mapping between the cumulative 514probability distribution and the range [0,1]. It can be shown that by 515taking a uniform random number in this range and finding its 516corresponding coordinate in the cumulative probability distribution we 517obtain samples with the desired probability distribution. 518 519.. type:: gsl_histogram_pdf 520 521 ================================ ======================================================================= 522 :code:`size_t n` This is the number of bins used to approximate the probability 523 distribution function. 524 :code:`double * range` The ranges of the bins are stored in an array of :math:`n + 1` 525 elements pointed to by :data:`range`. 526 :code:`double * sum` The cumulative probability for the bins is stored in an array of 527 :data:`n` elements pointed to by :data:`sum`. 528 ================================ ======================================================================= 529 530The following functions allow you to create a :type:`gsl_histogram_pdf` 531struct which represents this probability distribution and generate 532random samples from it. 533 534.. function:: gsl_histogram_pdf * gsl_histogram_pdf_alloc (size_t n) 535 536 This function allocates memory for a probability distribution with 537 :data:`n` bins and returns a pointer to a newly initialized 538 :type:`gsl_histogram_pdf` struct. If insufficient memory is available a 539 null pointer is returned and the error handler is invoked with an error 540 code of :macro:`GSL_ENOMEM`. 541 542.. function:: int gsl_histogram_pdf_init (gsl_histogram_pdf * p, const gsl_histogram * h) 543 544 This function initializes the probability distribution :data:`p` with 545 the contents of the histogram :data:`h`. If any of the bins of :data:`h` are 546 negative then the error handler is invoked with an error code of 547 :macro:`GSL_EDOM` because a probability distribution cannot contain 548 negative values. 549 550.. function:: void gsl_histogram_pdf_free (gsl_histogram_pdf * p) 551 552 This function frees the probability distribution function :data:`p` and 553 all of the memory associated with it. 554 555.. function:: double gsl_histogram_pdf_sample (const gsl_histogram_pdf * p, double r) 556 557 This function uses :data:`r`, a uniform random number between zero and 558 one, to compute a single random sample from the probability distribution 559 :data:`p`. The algorithm used to compute the sample :math:`s` is given by 560 the following formula, 561 562 .. only:: not texinfo 563 564 .. math:: s = \hbox{range}[i] + \delta * (\hbox{range}[i+1] - \hbox{range}[i]) 565 566 .. only:: texinfo 567 568 :: 569 570 s = range[i] + delta * (range[i+1] - range[i]) 571 572 where :math:`i` is the index which satisfies 573 :math:`sum[i] \le r < sum[i+1]` 574 and :math:`delta` is 575 :math:`(r - sum[i])/(sum[i+1] - sum[i])`. 576 577Example programs for histograms 578=============================== 579 580The following program shows how to make a simple histogram of a column 581of numerical data supplied on :code:`stdin`. The program takes three 582arguments, specifying the upper and lower bounds of the histogram and 583the number of bins. It then reads numbers from :code:`stdin`, one line at 584a time, and adds them to the histogram. When there is no more data to 585read it prints out the accumulated histogram using 586:func:`gsl_histogram_fprintf`. 587 588.. include:: examples/histogram.c 589 :code: 590 591Here is an example of the program in use. We generate 10000 random 592samples from a Cauchy distribution with a width of 30 and histogram 593them over the range -100 to 100, using 200 bins:: 594 595 $ gsl-randist 0 10000 cauchy 30 596 | gsl-histogram -- -100 100 200 > histogram.dat 597 598:numref:`fig_histogram` shows the familiar shape of the 599Cauchy distribution and the fluctuations caused by the finite sample 600size. 601 602.. _fig_histogram: 603 604.. figure:: /images/histogram.png 605 :scale: 60% 606 607 Histogram output from example program 608 609.. index:: 610 single: two dimensional histograms 611 single: 2D histograms 612 613Two dimensional histograms 614========================== 615 616A two dimensional histogram consists of a set of *bins* which count 617the number of events falling in a given area of the :math:`(x,y)` 618plane. The simplest way to use a two dimensional histogram is to record 619two-dimensional position information, :math:`n(x,y)`. Another possibility 620is to form a *joint distribution* by recording related 621variables. For example a detector might record both the position of an 622event (:math:`x`) and the amount of energy it deposited :math:`E`. These 623could be histogrammed as the joint distribution :math:`n(x,E)`. 624 625The 2D histogram struct 626======================= 627 628Two dimensional histograms are defined by the following struct, 629 630.. type:: gsl_histogram2d 631 632 =========================== ============================================================================ 633 :code:`size_t nx, ny` This is the number of histogram bins in the x and y directions. 634 :code:`double * xrange` The ranges of the bins in the x-direction are stored in an array of 635 :code:`nx + 1` elements pointed to by :data:`xrange`. 636 :code:`double * yrange` The ranges of the bins in the y-direction are stored in an array of 637 :code:`ny + 1` elements pointed to by :data:`yrange`. 638 :code:`double * bin` The counts for each bin are stored in an array pointed to by :data:`bin`. 639 The bins are floating-point numbers, so you can increment them by 640 non-integer values if necessary. The array :data:`bin` stores the two 641 dimensional array of bins in a single block of memory according to the 642 mapping :code:`bin(i,j)` = :code:`bin[i * ny + j]`. 643 =========================== ============================================================================ 644 645The range for :code:`bin(i,j)` is given by :code:`xrange[i]` to 646:code:`xrange[i+1]` in the x-direction and :code:`yrange[j]` to 647:code:`yrange[j+1]` in the y-direction. Each bin is inclusive at the lower 648end and exclusive at the upper end. Mathematically this means that the 649bins are defined by the following inequality, 650 651.. only:: not texinfo 652 653 .. math:: 654 655 \begin{array}{cc} 656 \hbox{bin(i,j) corresponds to} & \hbox{\it xrange}[i] \le x < \hbox{\it xrange}[i+1] \\ 657 \hbox{and} & \hbox{\it yrange}[j] \le y < \hbox{\it yrange}[j+1] 658 \end{array} 659 660.. only:: texinfo 661 662 :: 663 664 bin(i,j) corresponds to xrange[i] <= x < xrange[i+1] 665 and yrange[j] <= y < yrange[j+1] 666 667Note that any samples which fall on the upper sides of the histogram are 668excluded. If you want to include these values for the side bins you will 669need to add an extra row or column to your histogram. 670 671The :type:`gsl_histogram2d` struct and its associated functions are 672defined in the header file :file:`gsl_histogram2d.h`. 673 6742D Histogram allocation 675======================= 676 677The functions for allocating memory to a 2D histogram follow the style 678of :func:`malloc` and :func:`free`. In addition they also perform their 679own error checking. If there is insufficient memory available to 680allocate a histogram then the functions call the error handler (with 681an error number of :macro:`GSL_ENOMEM`) in addition to returning a null 682pointer. Thus if you use the library error handler to abort your program 683then it isn't necessary to check every 2D histogram :code:`alloc`. 684 685.. function:: gsl_histogram2d * gsl_histogram2d_alloc (size_t nx, size_t ny) 686 687 This function allocates memory for a two-dimensional histogram with 688 :data:`nx` bins in the x direction and :data:`ny` bins in the y direction. 689 The function returns a pointer to a newly created :type:`gsl_histogram2d` 690 struct. If insufficient memory is available a null pointer is returned 691 and the error handler is invoked with an error code of 692 :macro:`GSL_ENOMEM`. The bins and ranges must be initialized with one of 693 the functions below before the histogram is ready for use. 694 695.. @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc (size_t nx, size_t ny) 696.. This function allocates memory for a two-dimensional histogram with 697.. :data:`nx` bins in the x direction and :data:`ny` bins in the y 698.. direction. The function returns a pointer to a newly initialized 699.. :type:`gsl_histogram2d` struct. The bins are uniformly spaced with a 700.. total range of 701.. @c{$0 \le x < nx$} 702.. @math{0 <= x < nx} in the x-direction and 703.. @c{$0 \le y < ny$} 704.. @math{0 <= y < ny} in the y-direction, as shown in the table below. 705.. 706.. The bins are initialized to zero so the histogram is ready for use. 707.. 708.. If insufficient memory is available a null pointer is returned and the 709.. error handler is invoked with an error code of :macro:`GSL_ENOMEM`. 710.. @end deftypefun 711.. 712.. @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc_uniform (size_t nx, size_t ny, double xmin, double xmax, double ymin, double ymax) 713.. This function allocates a histogram of size :data:`nx`-by-:data:`ny` which 714.. uniformly covers the ranges :data:`xmin` to :data:`xmax` and :data:`ymin` to 715.. :data:`ymax` in the :math:`x` and :math:`y` directions respectively. 716.. @end deftypefun 717.. 718.. @deftypefun {gsl_histogram2d *} gsl_histogram2d_calloc_range (size_t nx, size_t ny, double * xrange, double * yrange) 719.. This function allocates a histogram of size :data:`nx`-by-:data:`ny` using 720.. the @math{nx+1} and @math{ny+1} bin ranges specified by the arrays 721.. :data:`xrange` and :data:`yrange`. 722.. @end deftypefun 723 724.. function:: int gsl_histogram2d_set_ranges (gsl_histogram2d * h, const double xrange[], size_t xsize, const double yrange[], size_t ysize) 725 726 This function sets the ranges of the existing histogram :data:`h` using 727 the arrays :data:`xrange` and :data:`yrange` of size :data:`xsize` and 728 :data:`ysize` respectively. The values of the histogram bins are reset to 729 zero. 730 731.. function:: int gsl_histogram2d_set_ranges_uniform (gsl_histogram2d * h, double xmin, double xmax, double ymin, double ymax) 732 733 This function sets the ranges of the existing histogram :data:`h` to cover 734 the ranges :data:`xmin` to :data:`xmax` and :data:`ymin` to :data:`ymax` 735 uniformly. The values of the histogram bins are reset to zero. 736 737.. function:: void gsl_histogram2d_free (gsl_histogram2d * h) 738 739 This function frees the 2D histogram :data:`h` and all of the memory 740 associated with it. 741 742Copying 2D Histograms 743===================== 744 745.. function:: int gsl_histogram2d_memcpy (gsl_histogram2d * dest, const gsl_histogram2d * src) 746 747 This function copies the histogram :data:`src` into the pre-existing 748 histogram :data:`dest`, making :data:`dest` into an exact copy of :data:`src`. 749 The two histograms must be of the same size. 750 751.. function:: gsl_histogram2d * gsl_histogram2d_clone (const gsl_histogram2d * src) 752 753 This function returns a pointer to a newly created histogram which is an 754 exact copy of the histogram :data:`src`. 755 756Updating and accessing 2D histogram elements 757============================================ 758 759You can access the bins of a two-dimensional histogram either by 760specifying a pair of :math:`(x,y)` coordinates or by using the bin 761indices :math:`(i,j)` directly. The functions for accessing the histogram 762through :math:`(x,y)` coordinates use binary searches in the x and y 763directions to identify the bin which covers the appropriate range. 764 765.. function:: int gsl_histogram2d_increment (gsl_histogram2d * h, double x, double y) 766 767 This function updates the histogram :data:`h` by adding one (1.0) to the 768 bin whose x and y ranges contain the coordinates (:data:`x`, :data:`y`). 769 770 If the point :math:`(x,y)` lies inside the valid ranges of the 771 histogram then the function returns zero to indicate success. If 772 :math:`(x,y)` lies outside the limits of the histogram then the 773 function returns :macro:`GSL_EDOM`, and none of the bins are modified. The 774 error handler is not called, since it is often necessary to compute 775 histograms for a small range of a larger dataset, ignoring any 776 coordinates outside the range of interest. 777 778.. function:: int gsl_histogram2d_accumulate (gsl_histogram2d * h, double x, double y, double weight) 779 780 This function is similar to :func:`gsl_histogram2d_increment` but increases 781 the value of the appropriate bin in the histogram :data:`h` by the 782 floating-point number :data:`weight`. 783 784.. function:: double gsl_histogram2d_get (const gsl_histogram2d * h, size_t i, size_t j) 785 786 This function returns the contents of the (:data:`i`, :data:`j`)-th bin of the 787 histogram :data:`h`. If (:data:`i`, :data:`j`) lies outside the valid range of 788 indices for the histogram then the error handler is called with an error 789 code of :macro:`GSL_EDOM` and the function returns 0. 790 791.. function:: int gsl_histogram2d_get_xrange (const gsl_histogram2d * h, size_t i, double * xlower, double * xupper) 792 int gsl_histogram2d_get_yrange (const gsl_histogram2d * h, size_t j, double * ylower, double * yupper) 793 794 These functions find the upper and lower range limits of the :data:`i`-th 795 and :data:`j`-th bins in the x and y directions of the histogram :data:`h`. 796 The range limits are stored in :data:`xlower` and :data:`xupper` or 797 :data:`ylower` and :data:`yupper`. The lower limits are inclusive 798 (i.e. events with these coordinates are included in the bin) and the 799 upper limits are exclusive (i.e. events with the value of the upper 800 limit are not included and fall in the neighboring higher bin, if it 801 exists). The functions return 0 to indicate success. If :data:`i` or 802 :data:`j` lies outside the valid range of indices for the histogram then 803 the error handler is called with an error code of :macro:`GSL_EDOM`. 804 805.. function:: double gsl_histogram2d_xmax (const gsl_histogram2d * h) 806 double gsl_histogram2d_xmin (const gsl_histogram2d * h) 807 size_t gsl_histogram2d_nx (const gsl_histogram2d * h) 808 double gsl_histogram2d_ymax (const gsl_histogram2d * h) 809 double gsl_histogram2d_ymin (const gsl_histogram2d * h) 810 size_t gsl_histogram2d_ny (const gsl_histogram2d * h) 811 812 These functions return the maximum upper and minimum lower range limits 813 and the number of bins for the x and y directions of the histogram 814 :data:`h`. They provide a way of determining these values without 815 accessing the :type:`gsl_histogram2d` struct directly. 816 817.. function:: void gsl_histogram2d_reset (gsl_histogram2d * h) 818 819 This function resets all the bins of the histogram :data:`h` to zero. 820 821Searching 2D histogram ranges 822============================= 823 824The following functions are used by the access and update routines to 825locate the bin which corresponds to a given :math:`(x,y)` coordinate. 826 827.. function:: int gsl_histogram2d_find (const gsl_histogram2d * h, double x, double y, size_t * i, size_t * j) 828 829 This function finds and sets the indices :data:`i` and :data:`j` to 830 the bin which covers the coordinates (:data:`x`, :data:`y`). The bin is 831 located using a binary search. The search includes an optimization for 832 histograms with uniform ranges, and will return the correct bin immediately 833 in this case. If :math:`(x,y)` is found then the function sets the 834 indices (:data:`i`, :data:`j`) and returns :macro:`GSL_SUCCESS`. If 835 :math:`(x,y)` lies outside the valid range of the histogram then the 836 function returns :macro:`GSL_EDOM` and the error handler is invoked. 837 8382D Histogram Statistics 839======================= 840 841.. function:: double gsl_histogram2d_max_val (const gsl_histogram2d * h) 842 843 This function returns the maximum value contained in the histogram bins. 844 845.. function:: void gsl_histogram2d_max_bin (const gsl_histogram2d * h, size_t * i, size_t * j) 846 847 This function finds the indices of the bin containing the maximum value 848 in the histogram :data:`h` and stores the result in (:data:`i`, :data:`j`). In 849 the case where several bins contain the same maximum value the first bin 850 found is returned. 851 852.. function:: double gsl_histogram2d_min_val (const gsl_histogram2d * h) 853 854 This function returns the minimum value contained in the histogram bins. 855 856.. function:: void gsl_histogram2d_min_bin (const gsl_histogram2d * h, size_t * i, size_t * j) 857 858 This function finds the indices of the bin containing the minimum value 859 in the histogram :data:`h` and stores the result in (:data:`i`, :data:`j`). In 860 the case where several bins contain the same maximum value the first bin 861 found is returned. 862 863.. function:: double gsl_histogram2d_xmean (const gsl_histogram2d * h) 864 865 This function returns the mean of the histogrammed x variable, where the 866 histogram is regarded as a probability distribution. Negative bin values 867 are ignored for the purposes of this calculation. 868 869.. function:: double gsl_histogram2d_ymean (const gsl_histogram2d * h) 870 871 This function returns the mean of the histogrammed y variable, where the 872 histogram is regarded as a probability distribution. Negative bin values 873 are ignored for the purposes of this calculation. 874 875.. function:: double gsl_histogram2d_xsigma (const gsl_histogram2d * h) 876 877 This function returns the standard deviation of the histogrammed 878 x variable, where the histogram is regarded as a probability 879 distribution. Negative bin values are ignored for the purposes of this 880 calculation. 881 882.. function:: double gsl_histogram2d_ysigma (const gsl_histogram2d * h) 883 884 This function returns the standard deviation of the histogrammed 885 y variable, where the histogram is regarded as a probability 886 distribution. Negative bin values are ignored for the purposes of this 887 calculation. 888 889.. function:: double gsl_histogram2d_cov (const gsl_histogram2d * h) 890 891 This function returns the covariance of the histogrammed x and y 892 variables, where the histogram is regarded as a probability 893 distribution. Negative bin values are ignored for the purposes of this 894 calculation. 895 896.. function:: double gsl_histogram2d_sum (const gsl_histogram2d * h) 897 898 This function returns the sum of all bin values. Negative bin values 899 are included in the sum. 900 9012D Histogram Operations 902======================= 903 904.. function:: int gsl_histogram2d_equal_bins_p (const gsl_histogram2d * h1, const gsl_histogram2d * h2) 905 906 This function returns 1 if all the individual bin ranges of the two 907 histograms are identical, and 0 otherwise. 908 909.. function:: int gsl_histogram2d_add (gsl_histogram2d * h1, const gsl_histogram2d * h2) 910 911 This function adds the contents of the bins in histogram :data:`h2` to the 912 corresponding bins of histogram :data:`h1`, 913 i.e. :math:`h'_1(i,j) = h_1(i,j) + h_2(i,j)`. 914 The two histograms must have identical bin ranges. 915 916.. function:: int gsl_histogram2d_sub (gsl_histogram2d * h1, const gsl_histogram2d * h2) 917 918 This function subtracts the contents of the bins in histogram :data:`h2` from the 919 corresponding bins of histogram :data:`h1`, 920 i.e. :math:`h'_1(i,j) = h_1(i,j) - h_2(i,j)`. 921 The two histograms must have identical bin ranges. 922 923.. function:: int gsl_histogram2d_mul (gsl_histogram2d * h1, const gsl_histogram2d * h2) 924 925 This function multiplies the contents of the bins of histogram :data:`h1` 926 by the contents of the corresponding bins in histogram :data:`h2`, 927 i.e. :math:`h'_1(i,j) = h_1(i,j) * h_2(i,j)`. 928 The two histograms must have identical bin ranges. 929 930.. function:: int gsl_histogram2d_div (gsl_histogram2d * h1, const gsl_histogram2d * h2) 931 932 This function divides the contents of the bins of histogram :data:`h1` 933 by the contents of the corresponding bins in histogram :data:`h2`, 934 i.e. :math:`h'_1(i,j) = h_1(i,j) / h_2(i,j)`. 935 The two histograms must have identical bin ranges. 936 937.. function:: int gsl_histogram2d_scale (gsl_histogram2d * h, double scale) 938 939 This function multiplies the contents of the bins of histogram :data:`h` 940 by the constant :data:`scale`, i.e. 941 942 .. only:: not texinfo 943 944 .. math:: h'_1(i,j) = h_1(i,j) * \hbox{\it scale} 945 946 .. only:: texinfo 947 948 :: 949 950 h'_1(i,j) = h_1(i,j) scale 951 952.. function:: int gsl_histogram2d_shift (gsl_histogram2d * h, double offset) 953 954 This function shifts the contents of the bins of histogram :data:`h` 955 by the constant :data:`offset`, i.e. 956 957 .. only:: not texinfo 958 959 .. math:: h'_1(i,j) = h_1(i,j) + \hbox{\it offset} 960 961 .. only:: texinfo 962 963 :: 964 965 h'_1(i,j) = h_1(i,j) + offset 966 967Reading and writing 2D histograms 968================================= 969 970The library provides functions for reading and writing two dimensional 971histograms to a file as binary data or formatted text. 972 973.. function:: int gsl_histogram2d_fwrite (FILE * stream, const gsl_histogram2d * h) 974 975 This function writes the ranges and bins of the histogram :data:`h` to the 976 stream :data:`stream` in binary format. The return value is 0 for success 977 and :macro:`GSL_EFAILED` if there was a problem writing to the file. Since 978 the data is written in the native binary format it may not be portable 979 between different architectures. 980 981.. function:: int gsl_histogram2d_fread (FILE * stream, gsl_histogram2d * h) 982 983 This function reads into the histogram :data:`h` from the stream 984 :data:`stream` in binary format. The histogram :data:`h` must be 985 preallocated with the correct size since the function uses the number of 986 x and y bins in :data:`h` to determine how many bytes to read. The return 987 value is 0 for success and :macro:`GSL_EFAILED` if there was a problem 988 reading from the file. The data is assumed to have been written in the 989 native binary format on the same architecture. 990 991.. function:: int gsl_histogram2d_fprintf (FILE * stream, const gsl_histogram2d * h, const char * range_format, const char * bin_format) 992 993 This function writes the ranges and bins of the histogram :data:`h` 994 line-by-line to the stream :data:`stream` using the format specifiers 995 :data:`range_format` and :data:`bin_format`. These should be one of the 996 :code:`%g`, :code:`%e` or :code:`%f` formats for floating point 997 numbers. The function returns 0 for success and :macro:`GSL_EFAILED` if 998 there was a problem writing to the file. The histogram output is 999 formatted in five columns, and the columns are separated by spaces, 1000 like this:: 1001 1002 xrange[0] xrange[1] yrange[0] yrange[1] bin(0,0) 1003 xrange[0] xrange[1] yrange[1] yrange[2] bin(0,1) 1004 xrange[0] xrange[1] yrange[2] yrange[3] bin(0,2) 1005 .... 1006 xrange[0] xrange[1] yrange[ny-1] yrange[ny] bin(0,ny-1) 1007 1008 xrange[1] xrange[2] yrange[0] yrange[1] bin(1,0) 1009 xrange[1] xrange[2] yrange[1] yrange[2] bin(1,1) 1010 xrange[1] xrange[2] yrange[1] yrange[2] bin(1,2) 1011 .... 1012 xrange[1] xrange[2] yrange[ny-1] yrange[ny] bin(1,ny-1) 1013 1014 .... 1015 1016 xrange[nx-1] xrange[nx] yrange[0] yrange[1] bin(nx-1,0) 1017 xrange[nx-1] xrange[nx] yrange[1] yrange[2] bin(nx-1,1) 1018 xrange[nx-1] xrange[nx] yrange[1] yrange[2] bin(nx-1,2) 1019 .... 1020 xrange[nx-1] xrange[nx] yrange[ny-1] yrange[ny] bin(nx-1,ny-1) 1021 1022 Each line contains the lower and upper limits of the bin and the 1023 contents of the bin. Since the upper limits of the each bin are the 1024 lower limits of the neighboring bins there is duplication of these 1025 values but this allows the histogram to be manipulated with 1026 line-oriented tools. 1027 1028.. function:: int gsl_histogram2d_fscanf (FILE * stream, gsl_histogram2d * h) 1029 1030 This function reads formatted data from the stream :data:`stream` into the 1031 histogram :data:`h`. The data is assumed to be in the five-column format 1032 used by :func:`gsl_histogram2d_fprintf`. The histogram :data:`h` must be 1033 preallocated with the correct lengths since the function uses the sizes 1034 of :data:`h` to determine how many numbers to read. The function returns 0 1035 for success and :macro:`GSL_EFAILED` if there was a problem reading from 1036 the file. 1037 1038Resampling from 2D histograms 1039============================= 1040 1041As in the one-dimensional case, a two-dimensional histogram made by 1042counting events can be regarded as a measurement of a probability 1043distribution. Allowing for statistical error, the height of each bin 1044represents the probability of an event where (:math:`x`, :math:`y`) falls in 1045the range of that bin. For a two-dimensional histogram the probability 1046distribution takes the form :math:`p(x,y) dx dy` where, 1047 1048.. math:: p(x,y) = n_{ij} / (N A_{ij}) 1049 1050In this equation :math:`n_{ij}` 1051is the number of events in the bin which 1052contains :math:`(x,y)`, :math:`A_{ij}` 1053is the area of the bin and :math:`N` is 1054the total number of events. The distribution of events within each bin 1055is assumed to be uniform. 1056 1057.. type:: gsl_histogram2d_pdf 1058 1059 ============================= =========================================================================== 1060 :code:`size_t nx, ny` This is the number of histogram bins used to approximate the probability 1061 distribution function in the x and y directions. 1062 :code:`double * xrange` The ranges of the bins in the x-direction are stored in an array of 1063 :code:`nx + 1` elements pointed to by :data:`xrange`. 1064 :code:`double * yrange` The ranges of the bins in the y-direction are stored in an array of 1065 :code:`ny + 1` pointed to by :data:`yrange`. 1066 :code:`double * sum` The cumulative probability for the bins is stored in an array of 1067 :data:`nx` * :data:`ny` elements pointed to by :data:`sum`. 1068 ============================= =========================================================================== 1069 1070The following functions allow you to create a :type:`gsl_histogram2d_pdf` 1071struct which represents a two dimensional probability distribution and 1072generate random samples from it. 1073 1074.. function:: gsl_histogram2d_pdf * gsl_histogram2d_pdf_alloc (size_t nx, size_t ny) 1075 1076 This function allocates memory for a two-dimensional probability 1077 distribution of size :data:`nx`-by-:data:`ny` and returns a pointer to a 1078 newly initialized :type:`gsl_histogram2d_pdf` struct. If insufficient 1079 memory is available a null pointer is returned and the error handler is 1080 invoked with an error code of :macro:`GSL_ENOMEM`. 1081 1082.. function:: int gsl_histogram2d_pdf_init (gsl_histogram2d_pdf * p, const gsl_histogram2d * h) 1083 1084 This function initializes the two-dimensional probability distribution 1085 calculated :data:`p` from the histogram :data:`h`. If any of the bins of 1086 :data:`h` are negative then the error handler is invoked with an error 1087 code of :macro:`GSL_EDOM` because a probability distribution cannot 1088 contain negative values. 1089 1090.. function:: void gsl_histogram2d_pdf_free (gsl_histogram2d_pdf * p) 1091 1092 This function frees the two-dimensional probability distribution 1093 function :data:`p` and all of the memory associated with it. 1094 1095.. function:: int gsl_histogram2d_pdf_sample (const gsl_histogram2d_pdf * p, double r1, double r2, double * x, double * y) 1096 1097 This function uses two uniform random numbers between zero and one, 1098 :data:`r1` and :data:`r2`, to compute a single random sample from the 1099 two-dimensional probability distribution :data:`p`. 1100 1101Example programs for 2D histograms 1102================================== 1103 1104This program demonstrates two features of two-dimensional histograms. 1105First a 10-by-10 two-dimensional histogram is created with x and y running 1106from 0 to 1. Then a few sample points are added to the histogram, at 1107(0.3,0.3) with a height of 1, at (0.8,0.1) with a height of 5 and at 1108(0.7,0.9) with a height of 0.5. This histogram with three events is 1109used to generate a random sample of 1000 simulated events, which are 1110printed out. 1111 1112.. include:: examples/histogram2d.c 1113 :code: 1114 1115The following plot shows the distribution of the simulated events. Using 1116a higher resolution grid we can see the original underlying histogram 1117and also the statistical fluctuations caused by the events being 1118uniformly distributed over the area of the original bins. 1119 1120.. figure:: /images/histogram2d.png 1121 :scale: 60% 1122 1123 Distribution of simulated events from example program 1124