Configuring Carbon
==================

Carbon's config files all live in ``/opt/graphite/conf/``. If you've just installed Graphite, none of the ``.conf`` files will exist yet, but there will be a ``.conf.example`` file for each one. Copy the example files, removing the ``.example`` extension, and customize your settings.

::

  pushd /opt/graphite/conf
  cp carbon.conf.example carbon.conf
  cp storage-schemas.conf.example storage-schemas.conf

The example defaults are sane, but they may not meet your needs for data resolution or your storage limitations.


carbon.conf
-----------
This is the main config file, and defines the settings for each Carbon daemon.

**Each setting within this file is documented via comments in the config file itself.** The settings are broken down into sections for each daemon: carbon-cache is controlled by the ``[cache]`` section, carbon-relay by ``[relay]``, and carbon-aggregator by ``[aggregator]``. If this is your first time using Graphite, don't worry about anything but the ``[cache]`` section for now.

.. TIP::
    Carbon-cache and carbon-relay can run on the same host! Try swapping the default ports listed for ``LINE_RECEIVER_PORT`` and ``PICKLE_RECEIVER_PORT`` between the ``[cache]`` and ``[relay]`` sections so that you don't have to reconfigure your deployed metric senders. When setting ``DESTINATIONS`` in the ``[relay]`` section, keep in mind your newly-set ``PICKLE_RECEIVER_PORT`` in the ``[cache]`` section.


storage-schemas.conf
--------------------
This configuration file details retention rates for storing metrics. It matches metric paths to patterns, and tells whisper what frequency and history of datapoints to store.

Important notes before continuing:

* There can be many sections in this file.
* The sections are applied in order from the top (first) to the bottom (last).
* The patterns are `regular expressions <https://docs.python.org/3/library/re.html#regular-expression-syntax>`_, as opposed to the wildcards used in the URL API.
* The first pattern that matches the metric name is used.
* The retention is set at the time the first metric is sent.
* Changing this file will not affect already-created ``.wsp`` files. Use ``whisper-resize.py`` to change those.

A given rule is made up of 3 lines:

* A name, specified inside square brackets.
* A regex, specified after ``pattern=``.
* A retention rate line, specified after ``retentions=``.

The retentions line can specify multiple retentions. Each retention of ``frequency:history`` is separated by a comma.

Frequencies and histories are specified using the following suffixes:

* s - second
* m - minute
* h - hour
* d - day
* w - week
* y - year


Here's a simple, single retention example:

.. code-block:: none

 [garbage_collection]
 pattern = garbageCollections$
 retentions = 10s:14d

The name ``[garbage_collection]`` is mainly for documentation purposes, and will show up in ``creates.log`` when metrics matching this section are created.

The regular expression ``pattern`` will match any metric that ends with ``garbageCollections``. For example, ``com.acmeCorp.instance01.jvm.memory.garbageCollections`` would match, but ``com.acmeCorp.instance01.jvm.memory.garbageCollections.full`` would not. Graphite uses the `Python Regular Expression Syntax <https://docs.python.org/3/library/re.html#regular-expression-syntax>`_. For an introduction to regular expressions, consult the `Regular Expression HOWTO <https://docs.python.org/3/howto/regex.html#regex-howto>`_.

The ``retentions`` line is saying that each datapoint represents 10 seconds, and we want to keep enough datapoints so that they add up to 14 days of data.

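To make the arithmetic behind a ``frequency:history`` pair concrete, here is a minimal Python sketch. It is not whisper's own parser: the names ``UNIT_SECONDS``, ``to_seconds`` and ``parse_retention`` are made up for illustration, and a year is assumed to be 365 days.

.. code-block:: python

  # Seconds per unit suffix, following the suffix list above.
  # Assumption: a year is counted as 365 days.
  UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


  def to_seconds(spec):
      """Convert a string such as '10s' or '14d' to a number of seconds."""
      return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


  def parse_retention(retention):
      """Return (seconds_per_point, number_of_points) for 'frequency:history'."""
      frequency, history = retention.split(":")
      seconds_per_point = to_seconds(frequency)
      return seconds_per_point, to_seconds(history) // seconds_per_point


  print(parse_retention("10s:14d"))  # (10, 120960)

In other words, ``10s:14d`` asks for 120960 datapoints of 10 seconds each.
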
Here's a more complicated example with multiple retention rates:

.. code-block:: none

 [apache_busyWorkers]
 pattern = ^servers\.www.*\.workers\.busyWorkers$
 retentions = 15s:7d,1m:21d,15m:5y

In this example, imagine that your metric scheme is ``servers.<servername>.<metrics>``. The pattern would match server names that start with 'www', followed by anything, that are sending metrics that end in '.workers.busyWorkers' (note the escaped '.' characters).

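The first-match rule and the two example patterns can be tried out with a short Python sketch. This is an illustration only, not Carbon's code: it assumes patterns are applied with search semantics (so an unanchored pattern may match anywhere in the name), and the ``catch_all`` section and the ``find_schema`` helper are hypothetical.

.. code-block:: python

  import re

  # Sections in file order: (name, pattern, retentions).
  # The last entry is a hypothetical catch-all added for this sketch.
  SCHEMAS = [
      ("garbage_collection", r"garbageCollections$", "10s:14d"),
      ("apache_busyWorkers", r"^servers\.www.*\.workers\.busyWorkers$", "15s:7d,1m:21d,15m:5y"),
      ("catch_all", r".*", "60s:1d"),
  ]


  def find_schema(metric):
      """Return (name, retentions) of the first section whose pattern matches."""
      for name, pattern, retentions in SCHEMAS:
          if re.search(pattern, metric):
              return name, retentions
      return None


  print(find_schema("servers.www01.workers.busyWorkers"))
  # ('apache_busyWorkers', '15s:7d,1m:21d,15m:5y')
  print(find_schema("com.acmeCorp.instance01.jvm.memory.garbageCollections.full"))
  # ('catch_all', '60s:1d')

Because sections are tried from top to bottom, a broad pattern placed first would shadow every section below it.
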
Additionally, the ``[apache_busyWorkers]`` example uses multiple retentions. The general rule is to specify retentions from most-precise:least-history to least-precise:most-history. Whisper will properly downsample metrics (averaging by default) as thresholds for retention are crossed.

By using multiple retentions, you can store long histories of metrics while saving on disk space and I/O. Because whisper averages (by default) as it downsamples, you can determine totals of metrics by reversing the averaging process later on.

Example: You store the number of sales per minute for 1 year, and the sales per hour for 5 years after that. You need to know the total sales for January 1st of the year before. You can query whisper for the raw data, and you'll get 24 datapoints, one for each hour. They will most likely be floating point numbers. You can take each datapoint, multiply it by 60 (the ratio of high-precision to low-precision datapoints) and still get the total sales per hour.


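The sales example can be worked through in a few lines of Python. The datapoint values below are invented for illustration, and ``total_from_average`` is a hypothetical helper. The sketch also assumes every one-minute slot in each hour held a value; if some were empty, multiplying the average by 60 would overestimate the total.

.. code-block:: python

  def total_from_average(average, points_per_slot=60):
      """Undo averaging: an average over N points times N gives their sum."""
      return average * points_per_slot


  # Hypothetical hourly datapoints for one day: each value is the average
  # number of sales per minute during that hour.
  hourly_averages = [2.5] * 24

  hourly_totals = [total_from_average(average) for average in hourly_averages]
  print(hourly_totals[0])    # 150.0 sales in the first hour
  print(sum(hourly_totals))  # 3600.0 sales for the whole day
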
Additionally, whisper supports a legacy retention specification for backwards compatibility reasons: ``seconds-per-datapoint:count-of-datapoints``

.. code-block:: none

  retentions = 60:1440

60 represents the number of seconds per datapoint, and 1440 represents the number of datapoints to store. This requires some unnecessarily complicated math, so although it's valid, it's not recommended.


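The math the legacy form asks for is a single multiplication, sketched here in Python (``legacy_history_seconds`` is a hypothetical helper name):

.. code-block:: python

  def legacy_history_seconds(retention):
      """Return the time span covered by 'seconds-per-datapoint:count-of-datapoints'."""
      seconds_per_point, points = (int(part) for part in retention.split(":"))
      return seconds_per_point * points


  covered = legacy_history_seconds("60:1440")
  print(covered)          # 86400 seconds
  print(covered / 86400)  # 1.0 day

So ``60:1440`` describes the same archive as ``1m:1d``.
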
storage-aggregation.conf
------------------------
This file defines how to aggregate data to lower-precision retentions. The format is similar to ``storage-schemas.conf``.

Important notes before continuing:

* This file is optional. If it is not present, defaults will be used.
* The sections are applied in order from the top (first) to the bottom (last), similar to ``storage-schemas.conf``.
* The first pattern that matches the metric name is used, similar to ``storage-schemas.conf``.
* There is no ``retentions`` line. Instead, there are ``xFilesFactor`` and/or ``aggregationMethod`` lines.
* ``xFilesFactor`` should be a floating point number between 0 and 1, and specifies what fraction of the previous retention level's slots must have non-null values in order to aggregate to a non-null value. The default is 0.5.
* ``aggregationMethod`` specifies the function used to aggregate values for the next retention level. Legal methods are ``average``, ``sum``, ``min``, ``max``, and ``last``. The default is ``average``.
* These are set at the time the first metric is sent.
* Changing this file will not affect ``.wsp`` files already created on disk. Use ``whisper-set-aggregation-method.py`` to change those.

Here's an example:

.. code-block:: none

 [all_min]
 pattern = \.min$
 xFilesFactor = 0.1
 aggregationMethod = min

The pattern above will match any metric that ends with ``.min``.

The ``xFilesFactor`` line is saying that a minimum of 10% of the slots in the previous retention level must have values for the next retention level to contain an aggregate.
The ``aggregationMethod`` line is saying that the aggregate function to use is ``min``.

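The interaction between ``xFilesFactor`` and ``aggregationMethod`` can be sketched in Python as follows. This is a simplified model rather than whisper's implementation: the ``aggregate`` function is hypothetical, and ``None`` stands for an empty slot.

.. code-block:: python

  def aggregate(values, x_files_factor, method):
      """Aggregate the slots of the previous retention level into one value.

      ``values`` holds the higher-precision datapoints that fall into one
      lower-precision slot; ``None`` marks a slot without data.
      """
      known = [value for value in values if value is not None]
      if not known or len(known) / len(values) < x_files_factor:
          return None  # too few known values: the aggregate stays null
      return method(known)


  # Six 10-second slots rolled up into one 1-minute slot, only one of them filled.
  slots = [None, 4.0, None, None, None, None]
  print(aggregate(slots, x_files_factor=0.1, method=min))  # 4.0
  print(aggregate(slots, x_files_factor=0.5, method=min))  # None

With one slot out of six filled (about 17%), the 10% threshold is met but the default of 50% is not.
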
If either ``xFilesFactor`` or ``aggregationMethod`` is left out, the default value will be used.

The aggregation parameters are kept separate from the retention parameters because the former depends on the type of data being collected and the latter depends on volume and importance.

If you want to change aggregation methods for existing data, be sure that you update the whisper files as well.

Example:

.. code-block:: none

  /opt/graphite/bin/whisper-set-aggregation-method.py /opt/graphite/storage/whisper/test.wsp max

This example sets the aggregation method for ``test.wsp`` to ``max``. (The location of the Python script depends on your installation.)


relay-rules.conf
----------------
Relay rules are used to send certain metrics to a certain backend. This is handled by the carbon-relay system, which must be running for relaying to work. You can use a regular expression to select the metrics, and define the servers to which they should go with the ``servers`` line.

Example:

.. code-block:: none

  [example]
  pattern = ^mydata\.foo\..+
  servers = 10.1.2.3, 10.1.2.4:2004, myserver.mydomain.com

You must define at least one section as the default.


aggregation-rules.conf
----------------------
Aggregation rules allow you to add several metrics together as they come in, reducing the need to sum() many metrics in every URL. Note that unlike some other config files, any time this file is modified it will take effect automatically. This requires the carbon-aggregator service to be running.

The form of each line in this file should be as follows:

.. code-block:: none

  output_template (frequency) = method input_pattern

This will capture any received metrics that match ``input_pattern``
for calculating an aggregate metric. The calculation will occur
every ``frequency`` seconds and the ``method`` can specify ``sum`` or
``avg``. The name of the aggregate metric will be derived from
``output_template``, filling in any captured fields from ``input_pattern``.
Any metric that arrives at ``carbon-aggregator`` proceeds to its
output untouched unless a rule overrides it.

For example, if your metric naming scheme is:

.. code-block:: none

  <env>.applications.<app>.<server>.<metric>

You could configure some aggregations like so:

.. code-block:: none

  <env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests
  <env>.applications.<app>.all.latency (60) = avg <env>.applications.<app>.*.latency

As an example, if the following metrics are received:

.. code-block:: none

  prod.applications.apache.www01.requests
  prod.applications.apache.www02.requests
  prod.applications.apache.www03.requests
  prod.applications.apache.www04.requests
  prod.applications.apache.www05.requests

They would all go into the same aggregation buffer and after 60 seconds the
aggregate metric ``prod.applications.apache.all.requests`` would be calculated
by summing their values.

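The following Python sketch mimics what the ``sum`` rule above does with those five metrics. It is a simplified model, not carbon-aggregator's code: the function names are hypothetical, the values are invented, and timing (the 60-second flush) is left out. Only plain ``<field>`` components and ``*`` are handled.

.. code-block:: python

  import re
  from collections import defaultdict


  def input_pattern_to_regex(input_pattern):
      """Translate '<field>' to a named group and '*' to one dot-free component."""
      parts = []
      for part in input_pattern.split("."):
          field = re.fullmatch(r"<(\w+)>", part)
          if field:
              parts.append("(?P<%s>[^.]+)" % field.group(1))
          elif part == "*":
              parts.append("[^.]+")
          else:
              parts.append(re.escape(part))
      return re.compile(r"\.".join(parts) + "$")


  def aggregate_sum(output_template, input_pattern, datapoints):
      """Sum the values of matching metrics, keyed by the derived output name."""
      regex = input_pattern_to_regex(input_pattern)
      template = output_template.replace("<", "{").replace(">", "}")
      totals = defaultdict(float)
      for metric, value in datapoints:
          match = regex.match(metric)
          if match:
              # Fill the template fields from the groups captured in the input.
              totals[template.format(**match.groupdict())] += value
      return dict(totals)


  # Hypothetical values: each of the five servers reports 10 requests.
  datapoints = [("prod.applications.apache.www%02d.requests" % n, 10.0) for n in range(1, 6)]
  print(aggregate_sum("<env>.applications.<app>.all.requests",
                      "<env>.applications.<app>.*.requests",
                      datapoints))
  # {'prod.applications.apache.all.requests': 50.0}
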
Template components such as ``<env>`` will match everything up to the next dot.
To match multiple metric components, including the dots, use ``<<metric>>`` in the input template:

.. code-block:: none

  <env>.applications.<app>.all.<app_metric> (60) = sum <env>.applications.<app>.*.<<app_metric>>

It is also possible to use regular expressions. Following the example above, when using:

.. code-block:: none

  <env>.applications.<app>.<domain>.requests (60) = sum <env>.applications.<app>.<domain>\d{2}.requests

You will end up with ``prod.applications.apache.www.requests`` instead of ``prod.applications.apache.all.requests``.

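To see why ``www`` ends up in the output name, the input pattern of that rule can be written out by hand as a Python regular expression. The translation below is an assumption made for illustration (each ``<field>`` captures part of one dot-free component); it is not taken from carbon-aggregator.

.. code-block:: python

  import re

  # Hand-written stand-in for: <env>.applications.<app>.<domain>\d{2}.requests
  pattern = re.compile(
      r"(?P<env>[^.]+)\.applications\.(?P<app>[^.]+)\.(?P<domain>[^.]+?)\d{2}\.requests$"
  )

  match = pattern.match("prod.applications.apache.www01.requests")
  fields = match.groupdict()
  print(fields["domain"])  # www
  print("{env}.applications.{app}.{domain}.requests".format(**fields))
  # prod.applications.apache.www.requests

The two digits are matched by ``\d{2}`` outside the ``<domain>`` field, so ``www01`` through ``www05`` all map to the same output metric.
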
Another common use pattern of ``carbon-aggregator`` is to aggregate several data points
of the *same metric*. This can come in handy when you have the same metric coming from
several hosts, or when you are bound to send data more frequently than your shortest retention.

rewrite-rules.conf
------------------

Rewrite rules allow you to rewrite metric names using Python regular
expressions. Note that unlike some other config files, any time this file is
modified it will take effect automatically. This requires the carbon-aggregator
service to be running.

The form of each line in this file should be as follows:

.. code-block:: none

  regex-pattern = replacement-text

This will capture any received metrics that match ``regex-pattern`` and rewrite
the matched portion of the text with ``replacement-text``. The ``regex-pattern``
must be a valid Python regular expression, and the ``replacement-text`` can be any
value. You may also use capture groups:

.. code-block:: none

  ^collectd\.([a-z0-9]+)\. = \1.system.

Which would result in:

.. code-block:: none

  collectd.prod.cpu-0.idle-time => prod.system.cpu-0.idle-time

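Since the rules are Python regular expressions, the effect of the capture-group rule above can be checked with ``re.sub``. This is a sketch: the ``rewrite`` helper is hypothetical, and it assumes the rule behaves like a plain regular-expression substitution.

.. code-block:: python

  import re


  def rewrite(metric, pattern, replacement):
      """Replace the portion of ``metric`` matched by ``pattern``."""
      return re.sub(pattern, replacement, metric)


  print(rewrite("collectd.prod.cpu-0.idle-time", r"^collectd\.([a-z0-9]+)\.", r"\1.system."))
  # prod.system.cpu-0.idle-time

Metric names that do not match the pattern come back unchanged.
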
``rewrite-rules.conf`` consists of two sections, ``[pre]`` and ``[post]``. The rules in the
pre section are applied to metric names as soon as they are received. The post
rules are applied after aggregation has taken place.

For example:

.. code-block:: none

  [post]
  _sum$ =
  _avg$ =

These rules would strip off a suffix of ``_sum`` or ``_avg`` from any metric names after
aggregation.

**Note:** If you plan to use the ``=`` sign in your rewrite rules, use its octal value: ``\075``.
For example, ``foo=bar = foo.bar`` would be ``foo\075bar = foo.bar``.

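The octal escape works because Python's ``re`` module reads ``\075`` as the character with octal value 75, which is ``=``. A quick check, using an invented metric name:

.. code-block:: python

  import re

  # \075 is the octal escape for '=', so this pattern matches the text "foo=bar".
  rewritten = re.sub(r"foo\075bar", "foo.bar", "app.foo=bar.count")
  print(rewritten)  # app.foo.bar.count
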
whitelist and blacklist
-----------------------
The whitelist functionality allows any of the carbon daemons to only accept metrics that are explicitly
whitelisted and/or to reject blacklisted metrics. The functionality can be enabled in ``carbon.conf`` with
the ``USE_WHITELIST`` flag. This can be useful when too many metrics are being sent to a Graphite
instance or when there are metric senders sending useless or invalid metrics.

``GRAPHITE_CONF_DIR`` is searched for ``whitelist.conf`` and ``blacklist.conf``. Each file contains one regular
expression per line to match against metric names. If the whitelist configuration is missing or empty,
all metrics will be passed through by default.

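The filtering described above can be modelled with a few lines of Python. This is a sketch under assumptions, not Carbon's code: ``is_accepted`` is a hypothetical helper, the patterns are invented, expressions are applied with search semantics, and the whitelist is checked before the blacklist.

.. code-block:: python

  import re


  def is_accepted(metric, whitelist, blacklist):
      """Decide whether a metric name passes the whitelist and blacklist."""
      # An empty whitelist lets everything through.
      if whitelist and not any(re.search(pattern, metric) for pattern in whitelist):
          return False
      # Any blacklist match rejects the metric.
      if any(re.search(pattern, metric) for pattern in blacklist):
          return False
      return True


  # Hypothetical file contents, one regular expression per line.
  whitelist = [r"^servers\.", r"^prod\."]
  blacklist = [r"\.debug$"]

  print(is_accepted("servers.www01.cpu.idle", whitelist, blacklist))  # True
  print(is_accepted("servers.www01.cpu.debug", whitelist, blacklist))  # False
  print(is_accepted("test.www01.cpu.idle", whitelist, blacklist))      # False
  print(is_accepted("test.www01.cpu.idle", [], blacklist))             # True
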