`folly/stats/Histogram.h`
-------------------

### Classes
***

#### `Histogram`

`Histogram.h` defines a simple histogram class, templated on the type of data
you want to store. This class is useful for tracking a large stream of data
points, where you want to remember the overall distribution of the data, but do
not need to remember each data point individually.

Each histogram bucket stores the number of data points that fell in the bucket,
as well as the overall sum of the data points in the bucket. Note that no
overflow checking is performed, so if you have a bucket with a large number of
very large values, the sum may overflow and make the data for this bucket
inaccurate. As such, the histogram class is not well suited to storing data
points with very large values. However, it works very well for smaller data
points such as request latencies, request or response sizes, etc.

In addition to providing access to the raw bucket data, the `Histogram` class
also provides methods for estimating percentile values. This allows you to
estimate the median value (the 50th percentile) and other values such as the
95th or 99th percentiles.

All of the buckets have the same width. The number of buckets and the bucket
width are fixed for the lifetime of the histogram. As such, you do need to know
your expected data range ahead of time in order to have accurate statistics.
The histogram does keep one bucket to store all data points that fall below the
histogram minimum, and one bucket for the data points above the maximum.
However, because these buckets don't have a good lower/upper bound, percentile
estimates in these buckets may be inaccurate.

#### `HistogramBuckets`

The `Histogram` class is built on top of `HistogramBuckets`.
`HistogramBuckets` provides an API very similar to `Histogram`, but allows a
user-defined bucket class.
This allows users to implement more complex
histogram types that store more than just the count and sum in each bucket.

When computing percentile estimates, `HistogramBuckets` allows user-defined
functions for computing the average value and data count in each bucket. This
allows you to define more complex buckets which may have multiple different
ways of computing the average value and the count.

For example, one use case could be tracking timeseries data in each bucket.
Each set of timeseries data can have independent data in the bucket, which can
show how the data distribution is changing over time.

### Example Usage
***

Say we have code that sends many requests to remote services, and want to
generate a histogram showing how long the requests take. The following code
will initialize a histogram with 50 buckets, tracking values between 0 and 5000.
(There are 50 buckets since the bucket width is specified as 100. If the
histogram range is not an even multiple of the bucket width, the last bucket
will simply be shorter than the others.)

``` Cpp
  // Arguments: bucket width, minimum value, maximum value.
  folly::Histogram<int64_t> latencies(100, 0, 5000);
```

The `addValue()` method is used to add values to the histogram. Each time a
request finishes we can add its latency to the histogram:

``` Cpp
  latencies.addValue(now - startTime);
```

You can access each of the histogram buckets to display the overall
distribution. Note that bucket 0 tracks all data points that were below the
specified histogram minimum, and the last bucket tracks the data points that
were above the maximum.

``` Cpp
  auto numBuckets = latencies.getNumBuckets();
  // Bucket 0 holds everything below the histogram minimum.
  std::cout << "Below min: " << latencies.getBucketByIndex(0).count << "\n";
  // Buckets 1 .. numBuckets - 2 cover the configured range.
  for (size_t n = 1; n < numBuckets - 1; ++n) {
    std::cout << latencies.getBucketMin(n) << "-" << latencies.getBucketMax(n)
              << ": " << latencies.getBucketByIndex(n).count << "\n";
  }
  // The last bucket holds everything above the histogram maximum.
  std::cout << "Above max: "
            << latencies.getBucketByIndex(numBuckets - 1).count << "\n";
```

You can also use the `getPercentileEstimate()` method to estimate the value at
the Nth percentile in the distribution. For example, to estimate the median,
as well as the 95th and 99th percentile values:

``` Cpp
  int64_t median = latencies.getPercentileEstimate(0.5);
  int64_t p95 = latencies.getPercentileEstimate(0.95);
  int64_t p99 = latencies.getPercentileEstimate(0.99);
```

### Thread Safety
***

Note that `Histogram` and `HistogramBuckets` objects are not thread-safe. If
you wish to access a single `Histogram` from multiple threads, you must perform
your own locking to ensure that multiple threads do not access it at the same
time.
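
The locking requirement above can be met in several ways. As a minimal sketch,
assuming a plain `std::mutex` is acceptable for your workload, the hypothetical
`LockedHistogram` wrapper below funnels every access through one lock. The
names `LockedHistogram`, `CountingHistogram`, and `recordFromThreads` are
illustrative and not part of folly. `CountingHistogram` is a stand-in that only
exposes `addValue()`, so that the example compiles without folly installed.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for a histogram type so this sketch builds without folly.
// It only tracks a count and a sum; it is NOT folly's implementation.
struct CountingHistogram {
  uint64_t count = 0;
  int64_t sum = 0;
  void addValue(int64_t value) {
    ++count;
    sum += value;
  }
};

// Hypothetical wrapper: all access to the wrapped histogram takes mutex_.
template <typename Hist>
class LockedHistogram {
 public:
  explicit LockedHistogram(Hist hist) : hist_(std::move(hist)) {}

  // Record one value while holding the lock.
  template <typename T>
  void addValue(T value) {
    std::lock_guard<std::mutex> guard(mutex_);
    hist_.addValue(value);
  }

  // Run fn with exclusive access to the histogram and return its result.
  // Use this for reads such as bucket dumps or percentile estimates.
  template <typename Fn>
  auto withLock(Fn&& fn) {
    std::lock_guard<std::mutex> guard(mutex_);
    return fn(hist_);
  }

 private:
  std::mutex mutex_;
  Hist hist_;
};

// Adds perThread values from each of numThreads threads, then returns the
// total number of recorded values.
uint64_t recordFromThreads(int numThreads, int perThread) {
  assert(numThreads >= 0 && perThread >= 0);
  LockedHistogram<CountingHistogram> latencies{CountingHistogram{}};
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&latencies, perThread] {
      for (int i = 0; i < perThread; ++i) {
        latencies.addValue(static_cast<int64_t>(i));
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return latencies.withLock(
      [](const CountingHistogram& h) { return h.count; });
}
```

With folly available, the same wrapper should be able to hold a
`folly::Histogram<int64_t>` in place of the stand-in, with reads such as
`getPercentileEstimate()` going through `withLock()`. A single mutex keeps the
sketch simple; if contention becomes a problem, one common alternative is to
give each thread its own histogram and combine the results afterwards.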