# Service Discovery

This directory contains the service discovery (SD) component of Prometheus.

## Design of a Prometheus SD

There are many requests to add new SDs to Prometheus; this section looks at
what makes a good SD and covers some of the common implementation issues.

### Does this make sense as an SD?

The first question to ask is whether it makes sense to add this particular
SD. An SD mechanism should be reasonably well established, and at a minimum in
use across multiple organizations. It should allow discovering machines
and/or services running somewhere. When exactly an SD is popular enough to
justify being added to Prometheus natively is an open question.

Note: As part of lifting the past moratorium on new SD implementations it was
agreed that, in addition to the existing requirements, new service discovery
implementations will be required to have a committed maintainer with push access (i.e., on -team).

It should not be a brand new SD mechanism, or a variant of an established
mechanism. We want to integrate Prometheus with the SD that's already there in
your infrastructure, not invent yet more ways to do service discovery. We also
do not add mechanisms to work around users lacking service discovery and/or
configuration management infrastructure.

SDs that merely discover other applications running the same software (e.g.
talk to one Kafka or Cassandra server to find the others) are not service
discovery. In that case the SD you should be looking at is whatever decides
that a machine is going to be a Kafka server, likely a machine database or
configuration management system.

If something is particularly custom or unusual, `file_sd` is the generic
mechanism provided for users to hook in. Generally with Prometheus we offer a
single generic mechanism for things with infinite variations, rather than
trying to support everything natively (see also: alertmanager webhook, remote
read, remote write, node exporter textfile collector). For example, anything
that would involve talking to a relational database should use `file_sd`
instead.

For configuration management systems like Chef, while they do have a
database/API that'd in principle make sense to talk to for service discovery,
the idiomatic approach is to use Chef's templating facilities to write out a
file for use with `file_sd`.
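
As an illustration, a Chef template could render a `file_sd` JSON file along these lines (the targets and labels here are invented):

```json
[
  {
    "targets": ["db1.example.com:9100", "db2.example.com:9100"],
    "labels": {
      "job": "node",
      "env": "production"
    }
  }
]
```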


### Mapping from SD to Prometheus

The general principle with SD is to extract all the potentially useful
information we can out of the SD, and let the user choose what they need of it
using
[relabelling](https://prometheus.io/docs/operating/configuration/#<relabel_config>).
This information is generally termed metadata.

Metadata is exposed as a set of key/value pairs (labels) per target. The keys
take the form `__meta_<sdname>_<key>`, and there should also be an `__address__`
label with the host:port of the target (preferably an IP address to avoid DNS
lookups). No other labelnames should be exposed.
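
For instance, a single target from a hypothetical `example` SD might carry the following labels (all values invented):

```go
model.LabelSet{
	"__address__":              "10.0.0.5:9100",
	"__meta_example_region":    "eu-west-1",
	"__meta_example_node_name": "web-7",
}
```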

It is very common for initial pull requests for new SDs to include hardcoded
assumptions that make sense for the author's setup. SDs should be generic;
any customisation should be handled via relabelling. There should be basically
no business logic, filtering, or transformations of the data from the SD beyond
that which is needed to fit it into the metadata data model.

Arrays (e.g. a list of tags) should be converted to a single label with the
array values joined with a comma. Also prefix and suffix the value with a
comma. So, for example, the array `[a, b, c]` would become `,a,b,c,`. As
relabelling regexes are fully anchored, this makes it easier to write correct
regexes against (`.*,a,.*` works no matter where `a` appears in the list). The
canonical example of this is `__meta_consul_tags`.
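
A minimal sketch of this convention in Go:

```go
import (
	"strings"

	"github.com/prometheus/common/model"
)

// joinTags renders a tag list in the convention described above:
// []string{"a", "b", "c"} becomes ",a,b,c,". The leading and trailing
// commas let a fully anchored relabelling regex such as `.*,a,.*`
// match a tag regardless of its position in the list.
func joinTags(tags []string) model.LabelValue {
	return model.LabelValue("," + strings.Join(tags, ",") + ",")
}
```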

Maps, hashes, and other forms of key/value pairs should all be prefixed and
exposed as labels. For example, for EC2 tags, there would be
`__meta_ec2_tag_Description=mydescription` for the Description tag. Labelnames
may only contain `[_a-zA-Z0-9]`; sanitize by replacing invalid characters with underscores as needed.
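
A sketch of that sanitization, using a hypothetical `__meta_example_tag_` prefix:

```go
import (
	"regexp"

	"github.com/prometheus/common/model"
)

// invalidLabelChars matches every character that is not allowed in a labelname.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// tagLabelName maps an arbitrary tag key to a valid labelname by
// replacing disallowed characters with underscores.
func tagLabelName(key string) model.LabelName {
	return model.LabelName("__meta_example_tag_" + invalidLabelChars.ReplaceAllString(key, "_"))
}
```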

For targets with multiple potential ports, you can a) expose them as a list,
b) expose them as a map if they're named, or c) expose each port as its own
target. Kubernetes SD takes the target-per-port approach. a) and b) can be
combined.
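
For option b), named ports exposed as a map might look like this (the SD name and ports are invented):

```go
model.LabelSet{
	"__address__":                 "10.0.0.5:8080",
	"__meta_example_port_http":    "8080",
	"__meta_example_port_metrics": "9090",
}
```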

For machine-like SDs (OpenStack, EC2, Kubernetes to some extent) there may
be multiple network interfaces for a target. Thus far reporting the details
of only the first/primary network interface has sufficed.


### Other implementation considerations

SDs are intended to dump all possible targets. For example, the optimal use of
EC2 service discovery would be to take the entire region's worth of EC2
instances it provides and do everything needed in one `scrape_config`. For
large deployments where you are only interested in a small proportion of the
returned targets, this may cause performance issues. If this occurs it is
acceptable to also offer filtering via whatever mechanisms the SD exposes. For
EC2 that would be the `Filter` option on `DescribeInstances`. Keep in mind that
this is a performance optimisation; it should be possible to do the same
filtering using relabelling alone. As with SD generally, we do not invent new
ways to filter targets (that is what relabelling is for), merely offer up
whatever functionality the SD itself offers.
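
As a sketch, pushing such a filter down to the EC2 API with aws-sdk-go could look like the following; the function name and filter values are illustrative:

```go
import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// describeFilteredInstances forwards user-configured filters to the EC2
// API so that only matching instances are returned, instead of fetching
// the whole region and filtering client-side.
func describeFilteredInstances(client *ec2.EC2, name string, values []string) (*ec2.DescribeInstancesOutput, error) {
	input := &ec2.DescribeInstancesInput{
		Filters: []*ec2.Filter{
			{
				Name:   aws.String(name), // e.g. "tag:Environment"
				Values: aws.StringSlice(values),
			},
		},
	}
	return client.DescribeInstances(input)
}
```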

It is a general rule with Prometheus that all configuration comes from the
configuration file. While the libraries you use to talk to the SD may also
offer other mechanisms for providing configuration/authentication under the
covers (EC2's use of environment variables being a prime example), using your SD
mechanism should not require this. Put another way, your SD implementation
should not read environment variables or files to obtain configuration.

Some SD mechanisms have rate limits that make them challenging to use. As an
example we have unfortunately had to reject Amazon ECS service discovery due to
the rate limits being so low that it would not be usable for anything beyond
small setups.

If a system offers multiple distinct types of SD, select which is in use with a
configuration option rather than returning them all from one mega SD that
requires relabelling to select just the one you want. So far we have only seen
this with Kubernetes. When a single SD with a selector versus multiple distinct
SDs makes sense is an open question.

If there is a failure while talking to the SD, abort rather than
returning partial data. It is better to work from stale targets than partial
or incorrect metadata.

The information obtained from service discovery is not considered sensitive
security-wise. Do not return secrets in metadata; anyone with access to
the Prometheus server will be able to see them.


## Writing an SD mechanism

### The SD interface

A Service Discovery (SD) mechanism has to discover targets and provide them to Prometheus. We expect similar targets to be grouped together, in the form of a [target group](https://pkg.go.dev/github.com/prometheus/prometheus/discovery/targetgroup#Group). The SD mechanism sends the targets down to Prometheus as a list of target groups.

An SD mechanism has to implement the `Discoverer` interface:
```go
type Discoverer interface {
	Run(ctx context.Context, up chan<- []*targetgroup.Group)
}
```

Prometheus will call the `Run()` method on a provider to initialize the discovery mechanism. The mechanism will then send *all* the target groups into the channel.
Now the mechanism will watch for changes. For each update it can send all target groups, or only changed and new target groups, down the channel. `Manager` will handle
both cases.
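
A minimal, hypothetical `Discoverer` that sends a single static group and then waits for shutdown might look like this (assuming the usual `context`, `github.com/prometheus/common/model`, and `github.com/prometheus/prometheus/discovery/targetgroup` imports):

```go
// staticDiscoverer sends one fixed target group and then idles.
type staticDiscoverer struct{}

func (d *staticDiscoverer) Run(ctx context.Context, up chan<- []*targetgroup.Group) {
	tgs := []*targetgroup.Group{
		{
			Targets: []model.LabelSet{
				{model.AddressLabel: "10.11.150.1:7870"},
			},
			Source: "static",
		},
	}
	// Send the initial, complete list of target groups, respecting
	// cancellation in case the manager has already shut down.
	select {
	case up <- tgs:
	case <-ctx.Done():
		return
	}
	// A real implementation would watch for changes here and resend
	// changed groups; this sketch just waits until it is stopped.
	<-ctx.Done()
}
```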

For example, if a discovery mechanism retrieves the following groups:

```go
[]targetgroup.Group{
	{
		Targets: []model.LabelSet{
			{
				"__instance__": "10.11.150.1:7870",
				"hostname":     "demo-target-1",
				"test":         "simple-test",
			},
			{
				"__instance__": "10.11.150.4:7870",
				"hostname":     "demo-target-2",
				"test":         "simple-test",
			},
		},
		Labels: model.LabelSet{
			"job": "mysql",
		},
		Source: "file1",
	},
	{
		Targets: []model.LabelSet{
			{
				"__instance__": "10.11.122.11:6001",
				"hostname":     "demo-postgres-1",
				"test":         "simple-test",
			},
			{
				"__instance__": "10.11.122.15:6001",
				"hostname":     "demo-postgres-2",
				"test":         "simple-test",
			},
		},
		Labels: model.LabelSet{
			"job": "postgres",
		},
		Source: "file2",
	},
}
```

Here there are two target groups: one with source `file1` and another with source `file2`. The grouping is implementation specific and could even be one target per group. But one has to make sure that every target group sent by an SD instance has a `Source` that is unique across all the target groups of that SD instance.

In this case, both the target groups are sent down the channel the first time `Run()` is called. Now, for an update, we need to send the whole _changed_ target group down the channel. I.e., if the target with `hostname: demo-postgres-2` goes away, we send:
```go
&targetgroup.Group{
	Targets: []model.LabelSet{
		{
			"__instance__": "10.11.122.11:6001",
			"hostname":     "demo-postgres-1",
			"test":         "simple-test",
		},
	},
	Labels: model.LabelSet{
		"job": "postgres",
	},
	Source: "file2",
}
```
down the channel.

If all the targets in a group go away, we need to send a target group with empty `Targets` down the channel. I.e., if all targets with `job: postgres` go away, we send:
```go
&targetgroup.Group{
	Targets: nil,
	Source:  "file2",
}
```
down the channel.

### The Config interface

Now that your service discovery mechanism is ready to discover targets, you must help
Prometheus discover it. This is done by implementing the `discovery.Config` interface
and registering it with `discovery.RegisterConfig` in an init function of your package.

```go
type Config interface {
	// Name returns the name of the discovery mechanism.
	Name() string

	// NewDiscoverer returns a Discoverer for the Config
	// with the given DiscovererOptions.
	NewDiscoverer(DiscovererOptions) (Discoverer, error)
}

type DiscovererOptions struct {
	Logger log.Logger
}
```

The value returned by `Name()` should be short, descriptive, lowercase, and unique.
It's used to tag the provided `Logger` and as part of the YAML key for your SD
mechanism's list of configs in `scrape_config` and `alertmanager_config`
(e.g. `${NAME}_sd_configs`).
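
Putting this together, registration for the hypothetical `example` SD might look like the following sketch, reusing the `staticDiscoverer` from above (the `SDConfig` fields are invented):

```go
import (
	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/discovery"
)

// SDConfig is an invented configuration struct for the example SD.
type SDConfig struct {
	RefreshInterval model.Duration `yaml:"refresh_interval,omitempty"`
}

func init() {
	// Registration makes `example_sd_configs` sections recognised
	// in scrape_config and alertmanager_config.
	discovery.RegisterConfig(&SDConfig{})
}

// Name implements discovery.Config.
func (*SDConfig) Name() string { return "example" }

// NewDiscoverer implements discovery.Config.
func (c *SDConfig) NewDiscoverer(opts discovery.DiscovererOptions) (discovery.Discoverer, error) {
	return &staticDiscoverer{}, nil
}
```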

### New Service Discovery Check List

Here are some non-obvious parts of adding service discoveries that need to be verified:

- Validate that discovery configs can be DeepEqualled by adding them to
  `config/testdata/conf.good.yml` and to the associated tests.

- If the config contains file paths directly or indirectly (e.g. with a TLSConfig or
  HTTPClientConfig field), then it must implement `config.DirectorySetter`
  (see the sketch after this list).

- Import your SD package from `prometheus/discovery/install`. The install package is
  imported from `main` to register all builtin SD mechanisms.

- List the service discovery in both `<scrape_config>` and
  `<alertmanager_config>` in `docs/configuration/configuration.md`.
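
For the `config.DirectorySetter` point, here is a sketch assuming the hypothetical `SDConfig` also carried an `HTTPClientConfig` and a `CAFile` path (both invented here):

```go
import "github.com/prometheus/common/config"

// SetDirectory resolves any relative file paths in the configuration
// against the directory of the Prometheus config file.
func (c *SDConfig) SetDirectory(dir string) {
	// HTTPClientConfig knows how to rewrite its own paths.
	c.HTTPClientConfig.SetDirectory(dir)
	// Additional path fields must be joined by hand.
	c.CAFile = config.JoinDir(dir, c.CAFile)
}
```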
260
<!-- TODO: Add best-practices -->

### Examples of Service Discovery pull requests

The examples given might become out of date but should give a good impression of the areas touched by a new service discovery.

- [Eureka](https://github.com/prometheus/prometheus/pull/3369)