Statsite [![Build Status](https://travis-ci.org/armon/statsite.png)](https://travis-ci.org/armon/statsite)
========

Statsite is a metrics aggregation server. Statsite is based heavily
on Etsy's StatsD <https://github.com/etsy/statsd>, and is wire compatible.

Features
--------

* Multiple metric types
  - Key / Value
  - Gauges
  - Counters
  - Timers
  - Sets
* Efficient summary metrics for timer data:
  - Mean
  - Min/Max
  - Standard deviation
  - Median, Percentile 95, Percentile 99
  - Histograms
* Dynamic set implementation:
  - Exact counts for small sets
  - HyperLogLog for large sets
* Included sinks:
  - Graphite
  - InfluxDB
  - Ganglia
  - Librato
  - CloudWatch
  - OpenTSDB
* Binary protocol
* TCP, UDP, and STDIN
* Fast


Architecture
-------------

Statsite is designed to be both highly performant
and very flexible. To achieve this, it implements the stats
collection and aggregation in pure C, using an event loop to be
extremely fast. This allows it to handle hundreds of connections
and millions of metrics. After each flush interval expires,
statsite performs a fork/exec to start a new stream handler
invoking a specified application. Statsite then streams the
aggregated metrics over stdin to the application, which is
free to handle the metrics as it sees fit.

This allows statsite to aggregate metrics and then ship them
to any number of sinks (Graphite, SQL databases, etc). There
is an included Python script that ships metrics to Graphite.

Statsite tries to minimize memory usage by not
storing all the metrics that are received. Counter values are
aggregated as they are received, and timer values are stored
and aggregated using the Cormode-Muthukrishnan algorithm from
"Effective Computation of Biased Quantiles over Data Streams".
This means that the percentile values are not perfectly accurate,
and are subject to a specifiable error epsilon. This allows us to
store only a fraction of the samples.

Histograms can also be optionally maintained for timer values.
The minimum and maximum values along with the bin widths must
be specified in advance, and as samples are received the bins
are updated. Statsite supports multiple histogram configurations,
and uses a longest-prefix match policy.

Handling of sets in statsite depends on the number of
entries received. For small cardinalities (<64 currently),
statsite will count exactly the number of unique items. For
larger sets, it switches to using a HyperLogLog to estimate
cardinalities with high accuracy and low space utilization.
This allows statsite to estimate huge set sizes without
retaining all the values. The parameters of the HyperLogLog
can be tuned to provide greater accuracy at the cost of memory.

The HyperLogLog is based on the Google paper, "HyperLogLog in
Practice: Algorithmic Engineering of a State of The Art Cardinality
Estimation Algorithm".

Install
-------

Download and build from source. This requires `autoconf`, `automake` and `libtool`,
which are usually available through a system package manager. Steps:

    $ git clone https://github.com/armon/statsite.git
    $ cd statsite
    $ ./bootstrap.sh
    $ ./configure
    $ make
    $ ./src/statsite

Building the test code may generate errors if libcheck is not available.
To build the test code successfully, do the following::

    $ cd deps/check-0.9.8/
    $ ./configure
    $ make
    # make install
    # ldconfig (necessary on some Linux distros)
    $ cd ../../
    $ make test

At this point, the test code should build successfully.

Usage
-----

Statsite is configured using a simple INI file.
Here is an example configuration file::

    [statsite]
    port = 8125
    udp_port = 8125
    log_level = INFO
    log_facility = local0
    flush_interval = 10
    timer_eps = 0.01
    set_eps = 0.02
    stream_cmd = python sinks/graphite.py localhost 2003

    [histogram_api]
    prefix=api
    min=0
    max=100
    width=5

    [histogram_default]
    prefix=
    min=0
    max=200
    width=20

Then run statsite, pointing it to that file::

    statsite -f /etc/statsite.conf

A full list of configuration options is below.

Configuration Options
---------------------

Each statsite configuration option is documented below. Statsite configuration
options must exist in the `statsite` section of the INI file:

* tcp\_port : Integer, sets the TCP port to listen on. Default 8125. 0 to disable.

* port: Same as above. For compatibility.

* udp\_port : Integer, sets the UDP port. Default 8125. 0 to disable.

* bind\_address : The address to bind on. Defaults to 0.0.0.0

* parse\_stdin: Enables parsing stdin as an input stream. Defaults to 0.

* log\_level : The logging level that statsite should use. One of:
  DEBUG, INFO, WARN, ERROR, or CRITICAL. All logs go to syslog,
  and also stderr when not daemonizing. Default is DEBUG.

* log\_facility : The syslog logging facility that statsite should use.
  One of: user, daemon, local0, local1, local2, local3, local4, local5,
  local6, local7. All logs go to syslog.

* flush\_interval : How often the metrics should be flushed to the
  sink in seconds. Defaults to 10 seconds.

* timer\_eps : The upper bound on error for timer estimates. Defaults
  to 1%. Decreasing this value causes more memory utilization per timer.

* set\_eps : The upper bound on error for unique set estimates. Defaults
  to 2%. Decreasing this value causes more memory utilization per set.

* stream\_cmd : This is the command that statsite invokes every
  `flush_interval` seconds to handle the metrics. It can be any executable.
  It should read inputs over stdin and exit with status code 0 on success.

* input\_counter : If set, statsite will count how many commands it received
  in the flush interval, and the count will be emitted under this name. For
  example if set to "numStats", then statsite will emit "counter.numStats" with
  the number of samples it has received.

* daemonize : Should statsite daemonize. Defaults to 0.

* pid\_file : When daemonizing, where to put the pid file. Defaults
  to /var/run/statsite.pid

* binary\_stream : Should data be streamed to the stream\_cmd in
  binary form instead of ASCII form. Defaults to 0.

* use\_type\_prefix : Should prefixes with message type be added to the messages.
  Does not affect global\_prefix. Defaults to 1.

* global\_prefix : Prefix that will be added to all messages.
  Defaults to empty string.

* kv\_prefix, gauges\_prefix, counts\_prefix, sets\_prefix, timers\_prefix : Prefix for
  each message type. Defaults to respectively: "kv.", "gauges.", "counts.",
  "sets.", "timers.". Values will be ignored if use\_type\_prefix is set to 0.

* extended\_counters : If enabled, the counter output is extended to include
  all the computed summary values. Otherwise, the counter is emitted as just
  the sum value. Summary values include `count`, `mean`, `stdev`, `sum`, `sum_sq`,
  `lower`, `upper`, and `rate`.
  Defaults to false.

* extended\_counters\_include : Allows you to configure which extended counters to include
  through a comma separated list of values. Requires extended\_counters to be set to true.
  Supported values include `count`, `mean`, `stdev`, `sum`, `sum_sq`, `lower`, `upper`, and `rate`.
  If this option is not specified but extended\_counters is set to true, then all values
  are included by default.

* timers\_include : Allows you to configure which timer metrics to include
  through a comma separated list of values. Supported values include `count`, `mean`, `stdev`,
  `sum`, `sum_sq`, `lower`, `upper`, `rate`, `median` and `sample_rate`. If this option is not
  specified then all values except `median` will be included by default.
  `median` will be included if `quantiles` includes 0.5.

* prefix\_binary\_stream : If enabled, the keys streamed to the stream\_cmd
  when using binary\_stream mode are also prefixed. By default, this is false,
  and keys do not get the prefix.

* quantiles : A comma-separated list of quantiles to calculate for timers.
  Defaults to `0.5, 0.95, 0.99`

In addition to global configurations, statsite supports histograms
as well. Histograms are configured one per section, and the INI
section must start with the word `histogram`. These are the recognized
options:

* prefix : This is the key prefix to match on. The longest matching prefix
  is used. If the prefix is blank, it is the default for all keys.

* min : Floating value. The minimum bound on the histogram. Values below
  this go into a special bucket containing everything less than this value.

* max: Floating value. The maximum bound on the histogram. Values above
  this go into a special bucket containing everything more than this value.

* width : Floating value. The width of each bucket between the min and max.

Each histogram section must specify all options to be valid.

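As a rough illustration of the longest-prefix match and bucket layout described
above, here is a minimal Python sketch. `HistogramConfig`, `pick_config` and
`bin_floor` are hypothetical names made up for this example and are not part of
statsite. Treating `max` itself as falling into the ceiling bucket is also an
assumption; the C implementation may handle that edge differently.

```python
import math
from typing import NamedTuple, Optional, Sequence


class HistogramConfig(NamedTuple):
    """Hypothetical mirror of one [histogram*] INI section."""
    prefix: str
    min: float
    max: float
    width: float


def pick_config(key: str, configs: Sequence[HistogramConfig]) -> Optional[HistogramConfig]:
    """Return the config whose prefix is the longest match for key, if any."""
    matches = [c for c in configs if key.startswith(c.prefix)]
    return max(matches, key=lambda c: len(c.prefix)) if matches else None


def bin_floor(value: float, cfg: HistogramConfig):
    """Return the lower edge of the bin holding value, or 'floor' / 'ceiling'
    for samples outside the range (max is treated as exclusive here)."""
    if value < cfg.min:
        return "floor"
    if value >= cfg.max:
        return "ceiling"
    index = math.floor((value - cfg.min) / cfg.width)
    return cfg.min + index * cfg.width


# Same values as the example configuration file in the Usage section.
configs = [
    HistogramConfig(prefix="api", min=0, max=100, width=5),
    HistogramConfig(prefix="", min=0, max=200, width=20),
]

for key, sample in (("api.session_created", 42), ("db.query", 150)):
    cfg = pick_config(key, configs)
    print(key, repr(cfg.prefix), bin_floor(sample, cfg))
```

Keys starting with `api` use the 5-wide buckets, and everything else falls back
to the blank-prefix section.
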
Protocol
--------

By default, Statsite will listen for TCP and UDP connections. A message
looks like the following (where the flag is optional)::

    key:value|type[|@flag]

Messages must be terminated by newlines (`\n`).

Currently supported message types:

* `kv` - Simple Key/Value.
* `g`  - Gauge, similar to `kv` but only the last value per key is retained
* `ms` - Timer.
* `h`  - Alias for timer
* `c`  - Counter.
* `s`  - Unique Set

After the flush interval, the counters and timers of the same key are
aggregated and the result is sent to the sink.

Gauges also support "delta" updates, which are requested by prefixing the
value with either a `+` or a `-`. This implies you can't explicitly set a
gauge to a negative number without first setting it to zero.

Examples:

The following is a simple key/value pair, in this case reporting how many
queries we've seen in the last second on MySQL::

    mysql.queries:1381|kv

The following is a timer, timing the response speed of an API call::

    api.session_created:114|ms

The next example increments the "rewards" counter by 1::

    rewards:1|c

Here we initialize a gauge and then modify its value::

    inventory:100|g
    inventory:-5|g
    inventory:+2|g

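The gauge rules above can be sketched in a few lines of Python. This is only an
illustration of the semantics as described here, not statsite's C code, and
`apply_gauge` is a hypothetical helper name.

```python
def apply_gauge(current: float, raw_value: str) -> float:
    """Apply one gauge update; a leading '+' or '-' marks a delta."""
    if raw_value[:1] in ("+", "-"):
        return current + float(raw_value)
    return float(raw_value)


# Replay the three inventory messages from the example above.
inventory = 0.0
for raw in ("100", "-5", "+2"):
    inventory = apply_gauge(inventory, raw)
print(inventory)  # 97.0

# Setting a gauge to -5 requires resetting it to zero first.
inventory = apply_gauge(apply_gauge(inventory, "0"), "-5")
print(inventory)  # -5.0
```
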
Sets count the unique items, so if statsite gets::

    users:abe|s
    users:zoe|s
    users:bob|s
    users:abe|s

Then it will emit a count of 3 for the number of uniques it has seen.

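For reference, a client for the ASCII protocol can be very small. The sketch
below builds messages in the `key:value|type[|@flag]` form and sends them over
UDP. `format_metric` and `send_metrics` are hypothetical helper names, and the
host and port are assumptions (8125 is the documented default `udp_port`).

```python
import socket
from typing import Optional


def format_metric(key: str, value, metric_type: str, flag: Optional[str] = None) -> bytes:
    """Build one newline-terminated 'key:value|type[|@flag]' message."""
    message = f"{key}:{value}|{metric_type}"
    if flag is not None:
        message += f"|@{flag}"
    return (message + "\n").encode("ascii")


def send_metrics(messages, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send each message as its own UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for message in messages:
            sock.sendto(message, (host, port))


batch = [
    format_metric("mysql.queries", 1381, "kv"),
    format_metric("api.session_created", 114, "ms"),
    format_metric("rewards", 1, "c"),
    format_metric("users", "abe", "s"),
]
print(b"".join(batch).decode("ascii"), end="")

# Uncomment with a statsite instance listening on the assumed address:
# send_metrics(batch)
```

UDP gives no delivery guarantee, so a client that needs one would use the TCP
port instead.
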
Writing Statsite Sinks
---------------------

Statsite ships with graphite, librato, gmetric, and influxdb sinks, but ANY executable
or script can be used as a sink. The sink should read its inputs from stdin, where
each metric is in the form::

    key|val|timestamp\n

Each metric is separated by a newline. The process should terminate with
an exit code of 0 to indicate success.

Here is an example of the simplest possible Python sink:

    #!/usr/bin/env python
    import sys

    # Each non-empty stdin line has the form "key|val|timestamp".
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines, e.g. after the final newline
        key, value, timestamp = line.split("|")
        print(key, value, timestamp)


Binary Protocol
---------------

In addition to the statsd compatible ASCII protocol, statsite includes
a lightweight binary protocol. This can be used if you want to make use
of special characters such as the colon, pipe character, or newlines. It
is also marginally faster to process, and may provide 10-20% more throughput.

Each command is sent to statsite over the same ports with this header:

    <Magic Byte><Metric Type><Key Length>

Then depending on the metric type, it is followed by either:

    <Value><Key>
    <Set Length><Key><Set Key>

The "Magic Byte" is the value 0xaa (170). This switches the internal
processing from the ASCII mode to binary. The metric type is one of:

* 0x1 : Key value / Gauge
* 0x2 : Counter
* 0x3 : Timer
* 0x4 : Set
* 0x5 : Gauge
* 0x6 : Gauge Delta update

The key length is a 2 byte unsigned integer with the length of the
key, INCLUDING a NULL terminator. The key must include a NULL terminator,
and its length must include this.

If the metric type is K/V, Counter or Timer, then we expect a value and
a key. The value is a standard IEEE754 double value, which is 8 bytes in length.
The key is provided as a byte stream which is `Key Length` long,
terminated by a NULL (0) byte.

If the metric type is Set, then we expect the length of a set key,
provided like the key length. The key should then be followed by
an additional Set Key, which is `Set Length` long, terminated
by a NULL (0) byte.

All of these values must be transmitted in Little Endian order.

Here is an example of sending ("Conns", "c", 200) as hex:

    0xaa 0x02 0x0600 0x0000000000006940 0x436f6e6e7300


Note: The binary protocol does not include support for "flags" and consequently
cannot be used for transmitting sampled counters.


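The layout above maps directly onto Python's `struct` module. The following is
a minimal sketch, not code shipped with statsite: `encode_value_metric` and
`encode_set_metric` are hypothetical helper names, and the set encoding assumes
that `Set Length` is a 2 byte value that counts the NULL terminator, as the text
says it is "provided like the key length".

```python
import struct

MAGIC = 0xAA
TYPE_COUNTER = 0x2
TYPE_SET = 0x4


def encode_value_metric(metric_type: int, key: str, value: float) -> bytes:
    """Encode <Magic><Type><Key Length><Value><Key> in little-endian order."""
    key_bytes = key.encode("utf-8") + b"\x00"  # the length includes the NULL
    header = struct.pack("<BBH", MAGIC, metric_type, len(key_bytes))
    return header + struct.pack("<d", value) + key_bytes


def encode_set_metric(key: str, member: str) -> bytes:
    """Encode <Magic><Type><Key Length><Set Length><Key><Set Key>."""
    key_bytes = key.encode("utf-8") + b"\x00"
    member_bytes = member.encode("utf-8") + b"\x00"
    header = struct.pack("<BBH", MAGIC, TYPE_SET, len(key_bytes))
    return header + struct.pack("<H", len(member_bytes)) + key_bytes + member_bytes


# Reproduces the ("Conns", "c", 200) example from the text.
packet = encode_value_metric(TYPE_COUNTER, "Conns", 200)
print(packet.hex())  # aa0206000000000000006940436f6e6e7300
```
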
Binary Sink Protocol
--------------------

It is also possible to have the data streamed to the sink
in a binary format. Again, this is used if you want to use the reserved
characters. It may also be faster.

Each command is sent to the sink in the following manner:

    <Timestamp><Metric Type><Value Type><Key Length><Value><Key>[<Count>]

Most of these are the same as the binary protocol. There are a few
changes, however. The Timestamp is sent as an 8 byte unsigned integer,
which is the current Unix timestamp. The Metric type is one of:

* 0x1 : Key value
* 0x2 : Counter
* 0x3 : Timer
* 0x4 : Set
* 0x5 : Gauge

The value type is one of:

* 0x0 : No type (Key/Value)
* 0x1 : Sum (Also used for Sets)
* 0x2 : Sum Squared
* 0x3 : Mean
* 0x4 : Count
* 0x5 : Standard deviation
* 0x6 : Minimum Value
* 0x7 : Maximum Value
* 0x8 : Histogram Floor Value
* 0x9 : Histogram Bin Value
* 0xa : Histogram Ceiling Value
* 0xb : Count Rate (Sum / Flush Interval)
* 0xc : Sample Rate (Count / Flush Interval)
* 0x80 OR `percentile` :  If the type has the 0x80 (128) bit set, then it is a
    percentile amount. The amount is OR'd with 0x80 to provide the type. For
    example (0x80 | 0x32) = 0xb2 is the 50th percentile or median. The 95th
    percentile is (0x80 | 0x5f) = 0xdf.

The key length is a 2 byte unsigned integer giving the length of the key,
which is terminated by a NULL character. The Value is an IEEE754 double. Lastly,
the key is a NULL-terminated character stream.

The final `<Count>` field is only set for histogram values.
It is always provided as an unsigned 32 bit integer value. Histograms use the
value field to specify the bin, and the count field for the entries in that
bin. The special values for histogram floor and ceiling indicate values that
were outside the specified histogram range. For example, if the min value was
50 and the max 200, then HISTOGRAM\_FLOOR will have value 50, and the count is
the number of entries which were below this minimum value. The ceiling is the same
but vice versa. For bin values, the value is the minimum value of the bin, up to
but not including the next bin.

To enable the binary sink protocol, add a configuration variable `binary_stream`
to the configuration file with the value `yes`. An example sink is provided in
`sinks/binary_sink.py`.

433