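| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 09-Feb-2022 | - | | |
| api/ | 09-Feb-2022 | - | 17,022 | 15,536 |
| eval/ | 09-Feb-2022 | - | 1,068 | 860 |
| logging/ | 09-Feb-2022 | - | 25 | 16 |
| metrics/ | 09-Feb-2022 | - | 332 | 270 |
| models/ | 09-Feb-2022 | - | 1,079 | 833 |
| notifier/ | 09-Feb-2022 | - | 10,405 | 8,921 |
| schedule/ | 09-Feb-2022 | - | 2,583 | 2,084 |
| sender/ | 09-Feb-2022 | - | 205 | 158 |
| state/ | 09-Feb-2022 | - | 3,108 | 2,910 |
| store/ | 09-Feb-2022 | - | 1,228 | 950 |
| tests/ | 09-Feb-2022 | - | 138 | 120 |
| README.md | 09-Feb-2022 | 5 KiB | 87 | 64 |
| ngalert.go | 09-Feb-2022 | 7.4 KiB | 218 | 172 |
| ngalert_test.go | 09-Feb-2022 | 3 KiB | 74 | 67 |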

README.md

# Next generation alerting (ngalert) in Grafana 8

Ngalert (Next generation alert) is the next generation of alerting in Grafana 8.

## Overview

The ngalert package can be found in `pkg/services/ngalert` and has the following sub-packages:

- api
- eval
- logging
- metrics
- models
- notifier
- schedule
- sender
- state
- store
- tests

## Scheduling and evaluation of alert rules

The scheduling of alert rules happens in the `schedule` package. This package is responsible for managing the evaluation
of alert rules including checking for new alert rules and stopping the evaluation of deleted alert rules.

The scheduler runs at a fixed interval, called its heartbeat, in which it does a number of tasks:

1. Fetch the alert rules for all organizations (excluding disabled)
2. Start a goroutine (if this is a new alert rule or the scheduler has just started) to evaluate the alert rule
3. Send an `*evalContext` event to the goroutine for each alert rule if its interval has elapsed
4. Stop the goroutines for all alert rules that have been deleted since the last heartbeat

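The heartbeat tasks above can be sketched roughly as follows. This is a minimal illustration, not the code in the `schedule` package: the `rule`, `ruleInfo` and `scheduler` types, the fields of `evalContext`, and the rule that a rule is due every `IntervalSeconds / baseInterval` heartbeats are all assumptions made for the example. Step 1 (fetching the rules) is left to the caller.

```go
package main

import "sync"

// rule is a simplified, hypothetical stand-in for an alert rule.
type rule struct {
	UID             string
	Version         int64
	IntervalSeconds int64
}

// evalContext carries what a rule goroutine needs for one evaluation.
// Its fields are assumptions made for this sketch.
type evalContext struct {
	tick    int64 // heartbeat number that triggered the evaluation
	version int64 // rule version the scheduler saw on that heartbeat
}

// ruleInfo holds the channels used to talk to one rule goroutine.
type ruleInfo struct {
	evalCh chan *evalContext
	stopCh chan struct{}
}

type scheduler struct {
	baseInterval int64 // heartbeat length in seconds
	evaluate     func(uid string, ctx *evalContext)
	routines     map[string]*ruleInfo
	wg           sync.WaitGroup
}

func newScheduler(baseInterval int64, evaluate func(string, *evalContext)) *scheduler {
	return &scheduler{
		baseInterval: baseInterval,
		evaluate:     evaluate,
		routines:     make(map[string]*ruleInfo),
	}
}

// heartbeat runs steps 2 to 4 for the rules fetched in step 1.
func (s *scheduler) heartbeat(tick int64, rules []rule) {
	seen := make(map[string]bool, len(rules))
	for _, r := range rules {
		seen[r.UID] = true
		info, ok := s.routines[r.UID]
		if !ok { // step 2: new rule, or the scheduler has just started
			info = &ruleInfo{evalCh: make(chan *evalContext), stopCh: make(chan struct{})}
			s.routines[r.UID] = info
			s.wg.Add(1)
			go s.ruleRoutine(r.UID, info)
		}
		// Step 3: the rule is due every IntervalSeconds/baseInterval heartbeats.
		// The send blocks until the rule goroutine is ready to receive.
		every := r.IntervalSeconds / s.baseInterval
		if every > 0 && tick%every == 0 {
			info.evalCh <- &evalContext{tick: tick, version: r.Version}
		}
	}
	for uid, info := range s.routines { // step 4: the rule was deleted
		if !seen[uid] {
			close(info.stopCh)
			delete(s.routines, uid)
		}
	}
}

// ruleRoutine evaluates one rule each time it receives an event.
func (s *scheduler) ruleRoutine(uid string, info *ruleInfo) {
	defer s.wg.Done()
	for {
		select {
		case ctx := <-info.evalCh:
			s.evaluate(uid, ctx)
		case <-info.stopCh:
			return
		}
	}
}

// stop ends every rule goroutine and waits for them to return.
func (s *scheduler) stop() {
	for uid, info := range s.routines {
		close(info.stopCh)
		delete(s.routines, uid)
	}
	s.wg.Wait()
}
```

In this sketch a caller would invoke `heartbeat` from a ticker loop, passing an increasing heartbeat number and the freshly fetched rules, and call `stop` on shutdown.
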
The function that evaluates each alert rule is called `ruleRoutine`. It waits for an `*evalContext` event (sent each
time the rule's interval, which is configurable per alert rule, has elapsed) and then evaluates the alert rule. To ensure
that the scheduler is evaluating the latest version of the alert rule, it compares its local version of the alert rule
with that in the `*evalContext` event, fetching the latest version of the alert rule from the database if the version
numbers do not match. It then invokes the Evaluator, which evaluates any queries, classic conditions or expressions in
the alert rule and passes the results of this evaluation to the State Manager. An evaluation can return no results in
the case of NoData or Error, a single result in the case of classic conditions, or more than one result if the alert
rule is multi-dimensional (i.e. one result per label set). In the case of multi-dimensional alert rules the results from
an evaluation should never contain more than one result per label set.

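The version check described above might look something like the sketch below. The `alertRule` type, the `ruleFetcher` callback and the `currentRule` helper are hypothetical names introduced for illustration; the real `ruleRoutine` may structure this differently.

```go
package main

import "fmt"

// alertRule is a simplified, hypothetical stand-in for a stored alert rule.
type alertRule struct {
	UID     string
	Version int64
}

// evalContext is reduced here to the one field the version check needs.
type evalContext struct {
	version int64
}

// ruleFetcher is a hypothetical database lookup by rule UID.
type ruleFetcher func(uid string) (*alertRule, error)

// currentRule returns the rule to evaluate: the local copy when its version
// matches the one in the event, otherwise a fresh copy loaded through fetch.
func currentRule(uid string, local *alertRule, ctx *evalContext, fetch ruleFetcher) (*alertRule, error) {
	if local != nil && local.Version == ctx.version {
		return local, nil
	}
	fresh, err := fetch(uid)
	if err != nil {
		return nil, fmt.Errorf("fetch alert rule %q: %w", uid, err)
	}
	return fresh, nil
}
```

The point of comparing versions first is that the database is only queried when the scheduler has seen a newer version of the rule than the goroutine holds.
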
The State Manager is responsible for determining the current state of the alert rule (normal, pending, firing, etc.) by
comparing each evaluation result to the previous evaluations of the same label set in the state cache. Given a label set,
it updates the state cache with the new current state and the evaluation time of the current evaluation, and appends the
current evaluation to the slice of previous evaluations. If the state changes (e.g. from pending to firing)
then it also creates an annotation to mark the change on the dashboard and panel for this alert rule.

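As a rough illustration of a state cache keyed by label set, consider the sketch below. The transition rule it uses (an alerting result moves a label set to pending, and to firing once it has been alerting for `forSeconds`) is an assumption made for the example and is not described in this document; the type names, `labelKey` and the `annotate` callback are likewise hypothetical.

```go
package main

import (
	"sort"
	"strings"
)

type state string

const (
	normal  state = "Normal"
	pending state = "Pending"
	firing  state = "Firing"
)

// evaluation is one result for one label set.
type evaluation struct {
	at       int64 // evaluation time, Unix seconds
	alerting bool
}

// cacheEntry is what the cache keeps per label set.
type cacheEntry struct {
	current      state
	pendingSince int64
	lastEval     int64
	history      []evaluation
}

type stateManager struct {
	forSeconds int64 // assumed wait before pending becomes firing
	annotate   func(key string, from, to state)
	cache      map[string]*cacheEntry
}

func newStateManager(forSeconds int64, annotate func(string, state, state)) *stateManager {
	return &stateManager{forSeconds: forSeconds, annotate: annotate, cache: make(map[string]*cacheEntry)}
}

// labelKey builds a stable cache key from a label set.
func labelKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(labels[k])
		b.WriteByte(',')
	}
	return b.String()
}

// process records one evaluation for a label set and returns its new state.
func (m *stateManager) process(labels map[string]string, ev evaluation) state {
	key := labelKey(labels)
	e, ok := m.cache[key]
	if !ok {
		e = &cacheEntry{current: normal}
		m.cache[key] = e
	}
	prev := e.current
	switch {
	case !ev.alerting:
		e.current = normal
	case prev == normal:
		e.current, e.pendingSince = pending, ev.at
	case prev == pending && ev.at-e.pendingSince >= m.forSeconds:
		e.current = firing
	}
	e.lastEval = ev.at
	e.history = append(e.history, ev)
	if e.current != prev && m.annotate != nil {
		m.annotate(key, prev, e.current) // stands in for creating an annotation
	}
	return e.current
}
```

Note that nothing in this sketch produces an alert; it only tracks, per label set, the state, the latest evaluation time and the evaluation history.
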
You might have noticed that so far we have avoided using the word "Alert" and instead talked about evaluation results
and the current state of an alert rule. The reason is that at this point in the evaluation of an alert rule the
State Manager does not know about alerts; it only knows, for each label set, the state of the alert rule, the current
evaluation and the previous evaluations.

## Notification of alerts

When an evaluation transitions the state of an alert rule for a given label set from pending to firing or from firing
to normal, the scheduler creates an alert instance and passes it to Alertmanager. When a label set transitions from
pending to firing the state of the alert instance is "Firing", and when it transitions from firing to normal the state
of the alert instance is "Normal".

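The mapping from state transitions to alert instances can be written down compactly. In the sketch below, `alertInstance` and `alertFor` are hypothetical names; only the two transitions and the resulting "Firing" and "Normal" states come from the description above.

```go
package main

type state string

const (
	normal  state = "Normal"
	pending state = "Pending"
	firing  state = "Firing"
)

// alertInstance is a hypothetical, minimal alert as handed to Alertmanager.
type alertInstance struct {
	labels map[string]string
	state  string
}

// alertFor returns the alert instance for a transition, or false when the
// transition does not produce an alert.
func alertFor(labels map[string]string, from, to state) (alertInstance, bool) {
	switch {
	case from == pending && to == firing:
		return alertInstance{labels: labels, state: "Firing"}, true
	case from == firing && to == normal:
		return alertInstance{labels: labels, state: "Normal"}, true
	default:
		return alertInstance{}, false
	}
}
```

Every other transition, such as normal to pending, yields no alert instance in this sketch.
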
### Which Alertmanager?

In ngalert it is possible to send alerts to the internal Alertmanager, an external Alertmanager, or both.

The internal Alertmanager is called `MultiOrgAlertmanager`. It creates an Alertmanager for each organization in
Grafana to preserve isolation between organizations. The `MultiOrgAlertmanager` receives alerts from the
scheduler and then forwards each alert to the Alertmanager for its organization.

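The per-organization routing can be pictured as a map from organization ID to Alertmanager. The sketch below is only an illustration of that idea: `multiOrg`, `orgAlertmanager`, `forward` and `putAlerts` are hypothetical names, not the API of `MultiOrgAlertmanager`, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// alert is a minimal, hypothetical alert.
type alert struct {
	name string
}

// orgAlertmanager stands in for one organization's Alertmanager.
type orgAlertmanager struct {
	received []alert
}

func (am *orgAlertmanager) putAlerts(alerts ...alert) {
	am.received = append(am.received, alerts...)
}

// multiOrg keeps one Alertmanager per organization ID.
type multiOrg struct {
	perOrg map[int64]*orgAlertmanager
}

func newMultiOrg(orgIDs ...int64) *multiOrg {
	m := &multiOrg{perOrg: make(map[int64]*orgAlertmanager, len(orgIDs))}
	for _, id := range orgIDs {
		m.perOrg[id] = &orgAlertmanager{}
	}
	return m
}

// forward hands alerts to the Alertmanager of the given organization only.
func (m *multiOrg) forward(orgID int64, alerts ...alert) error {
	am, ok := m.perOrg[orgID]
	if !ok {
		return fmt.Errorf("no Alertmanager for organization %d", orgID)
	}
	am.putAlerts(alerts...)
	return nil
}
```

Because each organization has its own instance, an alert forwarded for one organization is never visible to another.
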
When Grafana is configured to send alerts to an external Alertmanager it does so via the sender, which creates an
abstraction over notification of alerts and discovery of external Alertmanagers in Prometheus. The sender receives
alerts via the `SendAlerts` function and then passes them to Prometheus.

### How does Alertmanager turn alerts into notifications?

Alertmanager receives alerts via the `PutAlerts` function. Each alert is validated and its annotations and labels are
normalized, then the alerts are put in an in-memory structure. The dispatcher iterates over the alerts and matches
each one to a route in the configuration as explained in the
[Prometheus route configuration documentation](https://prometheus.io/docs/alerting/latest/configuration/#route).

The alert is then matched to an alert group depending on the configuration in the route. The alert is then sent through
a number of stages including silencing and inhibition and, finally, the receiver, which can include wait, de-duplication
and retry.

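One way to picture the stages is as a chain of functions, each of which may drop alerts before the next one runs. The sketch below is a deliberately simplified model, not Alertmanager's own pipeline types: `stage`, `chain`, `silence` and `dedup` are hypothetical, and inhibition, wait and retry are omitted.

```go
package main

// alert is a minimal, hypothetical alert.
type alert struct {
	name   string
	labels map[string]string
}

// stage is a simplified pipeline step: it receives alerts and returns the
// ones that should continue to the next step.
type stage func(alerts []alert) []alert

// chain runs stages in order and stops early when no alerts are left.
func chain(stages ...stage) stage {
	return func(alerts []alert) []alert {
		for _, s := range stages {
			alerts = s(alerts)
			if len(alerts) == 0 {
				return nil
			}
		}
		return alerts
	}
}

// silence drops every alert for which muted reports true.
func silence(muted func(alert) bool) stage {
	return func(alerts []alert) []alert {
		var out []alert
		for _, a := range alerts {
			if !muted(a) {
				out = append(out, a)
			}
		}
		return out
	}
}

// dedup drops alerts whose name was already passed on earlier.
func dedup(sent map[string]bool) stage {
	return func(alerts []alert) []alert {
		var out []alert
		for _, a := range alerts {
			if !sent[a.name] {
				sent[a.name] = true
				out = append(out, a)
			}
		}
		return out
	}
}
```

A pipeline built as `chain(silence(...), dedup(...))` passes an unsilenced alert on the first call and nothing on a repeat call with the same alerts, which is the behaviour a notification channel at the end of the chain would rely on.
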
### What are notification channels?

Notification channels receive alerts and turn them into notifications. A notification channel is often the last callback
in the receiver, after wait, de-duplication and retry.