[![Download](https://api.bintray.com/packages/nbari/epazote/epazote/images/download.svg)](https://bintray.com/nbari/epazote/epazote/_latestVersion)
[![Build Status](https://travis-ci.org/epazote/epazote.svg?branch=master)](https://travis-ci.org/epazote/epazote)
[![Coverage Status](https://coveralls.io/repos/github/epazote/epazote/badge.svg?branch=develop)](https://coveralls.io/github/epazote/epazote?branch=develop)
[![Go Report Card](https://goreportcard.com/badge/github.com/epazote/epazote)](https://goreportcard.com/report/github.com/epazote/epazote)

# Epazote
Automated HTTP (microservices) supervisor

**Epazote** automatically updates/adds services specified in a file called
``epazote.yml``. It periodically checks the defined endpoints and executes
recovery commands when a service response is not as expected, automating the
actions needed to keep services/applications up and running.

In Continuous Integration/Deployment environments the file ``epazote.yml`` can
be updated dynamically without restarting the supervisor. This avoids adding
an extra dependency to the deployment flow, such as having to restart
**Epazote** on every deploy.

## How it works
In its basic mode of operation, **Epazote** periodically checks the service
endpoints ([URLs](https://en.wikipedia.org/wiki/Uniform_Resource_Locator))
by doing an [HTTP GET Request](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods).
Based on the response [Status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes),
[Headers](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields) or
[body](https://en.wikipedia.org/wiki/HTTP_message_body), it executes a command.

In most scenarios the command should be applied directly to the affected
application, like a signal (``kill -HUP``) or a restart (``sv restart app``).
In that case **Epazote** and the application should be running on the
same server.

**Epazote** can also work in a standalone mode, only monitoring and sending
alerts if desired.

# How to use it
First you need to install **Epazote**: you can either compile it from
[source](https://github.com/nbari/epazote)
or download a pre-compiled binary matching your operating system from here:
https://dl.bintray.com/nbari/epazote/

> To compile from source, after downloading the sources use ``make`` to build the binary.

**Epazote** was designed with simplicity in mind, as an easy tool for
[DevOps](https://en.wikipedia.org/wiki/DevOps) and as a complement to
infrastructure orchestration tools like [Ansible](http://www.ansible.com/) and
[SaltStack](http://saltstack.com/). Because of this, [YAML](http://www.yaml.org/)
is used for the configuration files, which avoids having to learn a new
language or syntax and simplifies the setup.

## Basic example

```yaml
services:
    google:
        url: https://www.google.com
        seconds: 5
        expect:
            status: 302
            ssl:
                hours: 72
            if_not:
                cmd: echo -n "google down"
```

To supervise ``google`` you would run (``basic.yml`` is a file containing the above code):

    $ epazote -f /path/to/yaml/file/basic.yml -d

> -d is for debugging; it prints all output to standard output.

This basic setup supervises the service named ``google`` every 5 seconds: it
does an HTTP GET to ``https://www.google.com`` and expects a
``302 Status code``; if not, it runs ``echo -n "google down"``.

The ``ssl: hours: 72`` setting means an alert is sent if the certificate is
going to expire within the next 72 hours.

Extending the basic example for receiving notifications:

```yaml
config:
    smtp:
        username: smtp@domain.tld
        password: password
        server: mail.example.com
        port: 587
        headers:
            from: you@domain.tld
            to: team@domain.tld
            subject: "[_name_ - _exit_ - _status_]"

services:
    google:
        url: http://www.google.com
        minutes: 3
        expect:
            status: 200
            if_not:
                cmd: echo -n "google down"
                notify: yes
```

In this case the service is checked every 3 minutes. If the response is not a
``200 Status code``, then besides executing the command
``echo -n "google down"``, an email is sent to ``team@domain.tld`` because of
the ``notify: yes`` setting.

## The configuration file

The configuration file ([YAML formatted](https://en.wikipedia.org/wiki/YAML))
consists of two parts, **config** and **services** (key-value pairs).

## The config section

The **config** section is composed of:

    - smtp (Email settings for sending notifications)
    - scan (Paths used to find the file 'epazote.yml')

Example:

```yaml
config:
    smtp:
        username: smtp@domain.tld
        password: password
        server: mail.example.com
        port: 587
        headers:
            from: epazote@domain.tld
            to: team@domain.tld ops@domain.tld etc@domain.tld
            subject: "[_name_, _because_]"
    scan:
        paths:
            - /arena/home/sites
            - /home/apps
        minutes: 5
```

### config - smtp

Required to send alerts via email. All fields are required; the
``headers`` section can be extended with any desired key-value pairs.

### config - smtp - subject
The subject can be built using these keywords: ``_because_`` ``_exit_``
``_name_`` ``_output_`` ``_status_`` ``_url_``. For example,
``subject: "[_name_ - _status_]"`` would be transformed to ``[my service - 500]``:
``_name_`` is replaced by the service name, ``my service``, and
``_status_`` by the response status code, ``500`` in this case.

### config - scan

The paths are scanned every N ``seconds``, ``minutes`` or ``hours``, searching
for services specified in a file called ``epazote.yml``.

The **scan** setting is optional, but it is very useful when doing Continuous
Deployment. For example, if your code is automatically uploaded to the
directory ``/arena/home/sites/application_1`` and your scan paths contain
``/arena/home/sites``, you could simply upload a file named ``epazote.yml``
with the service rules to your application directory, thus deploying your
application and its supervision at the same time.
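
As an illustration, a file like the following could be deployed as
``/arena/home/sites/application_1/epazote.yml``. This is only a minimal sketch:
it assumes the scanned file uses the same ``services`` layout as the main
configuration file, and the service name, URL and command are hypothetical.

```yaml
# Hypothetical /arena/home/sites/application_1/epazote.yml
services:
    application 1:
        url: http://localhost:8080/_healthcheck_
        seconds: 30
        expect:
            status: 200
            if_not:
                cmd: sv restart /services/application_1
```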

### config (optional)

As you may notice, apart from the ``scan`` setting the ``config`` section
mainly contains settings for sending alerts/notifications. It is therefore
totally optional: **Epazote** can still run and check your services without
a ``config`` section.

If you want to automatically update/load services you will need the
``config - scan`` setting.

## The services section

Services are the main functionality of **Epazote**; this is where the URLs and
the rules based on the response are defined. Since options vary from service to
service, an example helps to understand the setup:

```yaml
services:
    my service 1:
        url: http://myservice.domain.tld/_healthcheck_
        timeout: 5
        seconds: 60
        log: http://monitor.domain.tld
        expect:
            status: 200
            header:
                content-type: application/json
            body: find this string on my site
            if_not:
                cmd: sv restart /services/my_service_1
                notify: team@domain.tld
                msg: |
                    line 1 bla bla
                    line 2
        if_status:
            500:
                cmd: reboot
            404:
                cmd: sv restart /services/cache
                msg: restarting cache
                notify: team@domain.tld x@domain.tld
        if_header:
            x-amqp-kapputt:
                cmd: restart abc
                notify: bunny@domain.tld
                msg: |
                    The rabbit is angry
                    & hungry
            x-db-kapputt:
                cmd: svc restart /services/db

    other service:
        url: https://self-signed.ssl.tld/ping
        header:
            Origin: http://localhost
            Accept-Encoding: gzip
        insecure: true
        minutes: 3

    redirect service:
        url: http://test.domain.tld/
        follow: yes
        hours: 1
        expect:
            status: 302
            if_not:
                cmd: service restart abc
                notify: yes
                emoji: 1F600-1F621

    salt-master:
        test: pgrep -f salt
        if_not:
            cmd: service restart salt_master
            notify: operations@domain.tld
```

### services - name of service (string)
A unique string that identifies your service. In the above example there are 4
services named:
 - my service 1
 - other service
 - redirect service
 - salt-master

### services - url (string)
URL of the service to supervise.

### services - follow (boolean true/false)
By default, if a [302 Status code](https://en.wikipedia.org/wiki/HTTP_302) is
received, **Epazote** will not follow it. If you would like to follow all
redirects, this setting must be set to **true**.

### services - insecure (boolean true/false)
This option explicitly allows **Epazote** to perform "insecure" SSL connections
by disabling certificate verification.

### services - stop (int)
Defines the number of times the ``cmd`` will be executed. By default (``0``)
the ``cmd`` is executed only once, to avoid indefinite loops. If the value
is set to ``-1`` the ``cmd`` never stops. Counting starts at 0, so ``stop: 2``
will execute the ``cmd`` 3 times ("0, 1, 2").
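
A minimal sketch of a service using ``stop``. The placement of ``stop`` at the
service level is an assumption based on the heading above, and the service
name and command are reused from the earlier example.

```yaml
services:
    my service 1:
        url: http://myservice.domain.tld/_healthcheck_
        seconds: 60
        stop: 2    # run the cmd at most 3 times (0, 1, 2)
        expect:
            status: 200
            if_not:
                cmd: sv restart /services/my_service_1
```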

### services - timeout in seconds (int)
Timeout specifies a time limit for the HTTP requests. A value of zero means no
timeout; defaults to 5 seconds.

### services - retry_limit (int)
Specifies the number of times to retry a request; defaults to 3.

### services - retry_interval (int)
Specifies the time between attempts in milliseconds. The default value is 500 (0.5 seconds).
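
The three settings above could be combined as in the following sketch. The
service name and the chosen values are illustrative assumptions, not
recommended defaults.

```yaml
services:
    slow service:
        url: http://myservice.domain.tld/_healthcheck_
        minutes: 1
        timeout: 10           # seconds allowed per HTTP request
        retry_limit: 5        # retry a request up to 5 times
        retry_interval: 1000  # wait 1000 ms (1 second) between attempts
        expect:
            status: 200
            if_not:
                notify: yes
```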

### services - read_limit (int)
Read only ``N`` bytes instead of the full
body. This helps to make a more "complete" request and
avoid getting an HTTP status code [408 when testing AWS ELB](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/ts-elb-error-message.html#ts-elb-errorcodes-http408).
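
For example, a service behind a load balancer might be checked like this. It
is only a sketch: the hostname is hypothetical and ``1024`` is an arbitrary
value chosen for illustration.

```yaml
services:
    elb service:
        url: http://elb.domain.tld/
        seconds: 30
        read_limit: 1024   # read at most 1024 bytes of the body
        expect:
            status: 200
```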

### services - seconds, minutes, hours
How often to check the service. The options are (only one should be used):
 - seconds N
 - minutes N
 - hours N

``N`` should be an integer.
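
A short sketch with two hypothetical services, each using exactly one of the
interval options:

```yaml
services:
    frequent check:
        url: http://myservice.domain.tld/ping
        seconds: 30    # check every 30 seconds

    hourly check:
        url: http://test.domain.tld/
        hours: 1       # check once per hour
```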

### services - log (URL)
A URL to which all events are posted; disabled by default.

### services - expect
The ``expect`` block options are:
- status (int)
- header (key, value)
- body   (regular expression)
- if_not (Action block)

### services - expect - status
An integer representing the expected [HTTP Status Code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes).

### services - expect - header (start_with match)
A key-value map of one or more expected headers.

A header is considered valid if it starts with the required value.
For example, if you want to check for ``Content-Type: application/json; charset=utf-8``
you can simply do something like:

```yaml
    header:
        Content-Type: application/json
```

This simplifies the matching and is useful in cases where the header value
changes, for example ``Content-Range: bytes 100-64656926/64656927`` can be
matched with:

```yaml
    header:
        Content-Range: bytes
```

### services - expect - body
A [regular expression](https://en.wikipedia.org/wiki/Regular_expression) used
to match a string in the body of the response, useful in cases where you want
to ensure that the content delivered is always the same or follows a pattern.
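
A minimal sketch of a ``body`` check. The service name, URL and pattern are
hypothetical, and the exact regular expression dialect supported is an
assumption, so the pattern uses only basic constructs.

```yaml
services:
    api version:
        url: http://myservice.domain.tld/version
        minutes: 5
        expect:
            # single quotes keep the backslash literal in YAML
            body: 'version [0-9]+\.[0-9]+'
            if_not:
                notify: yes
```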

### services - expect (How it works)
The ``expect`` logic tries to implement an
[if-else](https://en.wikipedia.org/wiki/if_else) logic: ``status``, ``header``
and ``body`` are the **if**, and the ``if_not`` block becomes the **else**.

    if
        status
        header
        body
    else:
        if_not

In most cases only one option is required; see the service named "redirect service" in the above example.

If more than one option is used, this is the order in which they are evaluated, no matter how they were introduced in the configuration file:

    1. body
    2. status
    3. header

The reason for this order is related to performance: in the end we want to
monitor/supervise the services efficiently, without wasting extra
resources. In most cases the HTTP headers alone are enough to take an action,
so there is no need to read the full body. Because of this, if no
``body`` is defined, **Epazote** will only read the headers, saving
time and processing.

### services - expect - if_not
``if_not`` is a block with an action describing what to do if we don't get
what we were expecting (``expect``). See services - Actions.

### services - if_status & if_header
There may be cases where third-party dependencies are down and because of this
your application is not working properly. For these cases ``if_status``
and ``if_header`` can be useful.

For example, if the database is down, your application could start responding
with a status code 500 or a custom header, and based on those values an action
can be executed.

The format for ``if_status`` is a key-value pair where the key is an int
representing an HTTP status code, and the value is an Action.

The format for ``if_header`` is a key-value pair where the key is a string to
match against the response headers and, as in the other if_X conditions, the
value is an Action.

These are the only ``if``s and their order of execution:
 1. if_status
 2. if_header
 3. if_not

This means that if a service uses ``if_status`` and ``if_not``, it will
first evaluate ``if_status`` and execute an Action if required. The same
applies when both ``if_status`` and ``if_header`` are set: ``if_status`` is
evaluated first, then ``if_header`` and last ``if_not``.
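
The following sketch combines the three conditions in one service. The service
name and commands are hypothetical; the comments only restate the evaluation
order described above.

```yaml
services:
    my app:
        url: http://myservice.domain.tld/_healthcheck_
        seconds: 60
        expect:
            status: 200
            if_not:                 # evaluated last
                cmd: sv restart /services/my_app
        if_status:
            500:                    # evaluated first
                cmd: sv restart /services/db
                notify: yes
        if_header:
            x-db-kapputt:           # evaluated second
                cmd: sv restart /services/db
```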

## services - Actions
An Action has five options:
 - cmd
 - notify
 - msg
 - emoji
 - http

They can be used all together, individually, or not at all.

### services - Actions - cmd (string)
``cmd`` contains the command to be executed.

### services - Actions - notify (string)
``notify`` should contain ``yes`` or the email address or addresses (space separated)
of the recipients that will be notified when the action is executed.

If the string is ``yes`` the global recipients will be used.
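
A sketch contrasting both forms of ``notify``. It assumes, as in the basic
example, that the global recipients are the ones listed in the ``to`` header
of the ``config - smtp`` section; the service names are hypothetical.

```yaml
config:
    smtp:
        username: smtp@domain.tld
        password: password
        server: mail.example.com
        port: 587
        headers:
            from: epazote@domain.tld
            to: team@domain.tld          # global recipients
            subject: "[_name_ - _status_]"

services:
    uses global recipients:
        url: http://myservice.domain.tld/ping
        minutes: 1
        expect:
            status: 200
            if_not:
                notify: yes              # sent to team@domain.tld

    uses custom recipients:
        url: http://test.domain.tld/ping
        minutes: 1
        expect:
            status: 200
            if_not:
                notify: ops@domain.tld etc@domain.tld
```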

### services - Actions - msg (list)
```yaml
msg:
 - send this if exit 0 (all OK)
 - send this if exit 1 (something is wrong)
```
Based on the exit status, either ``msg[0]`` or ``msg[1]`` is used.

### services - Actions - emoji (list)
``emoji`` [Unicode](https://en.wikipedia.org/wiki/Unicode) characters
to be used in the subject, for example:
```yaml
emoji:
  - 1F600
  - 1F621
```
If services are OK the first one (``1F600``) is used; if not, ``1F621`` is
used. If set to ``0`` no emoji will be used. The idea behind using
[unicode/emoji](https://en.wikipedia.org/wiki/Emoticons_(Unicode_block))
is to catch attention faster, so that the email is not ignored as spam.

### services - Actions - http (list(key, value))
A custom URL to GET/POST depending on the exit status, for example:
```yaml
http:
  - url: "https://api.hipchat.com/v1/rooms/message?auth_token=your_token&room_id=7&from=Alerts&message=service+OK+_name_+_because_"
  - url: "https://api.hipchat.com/"
    header:
      Content-Type: application/x-www-form-urlencoded
    data: |
     room_id=10&from=Alerts&message=_name_+exit+code+_exit_
    method: POST
```
When a service fails or returns an exit 1, the second url
``https://api.hipchat.com/`` with method ``POST`` and the custom ``data``
will be used. Notice that all the occurrences in the data of the form
``_(key)_`` are replaced with the corresponding value; in this case:

     room_id=10&from=Alerts&message=_name_+exit+code+_exit_

will be replaced with:

     room_id=10&from=Alerts&message=SERVICE NAME+exit+code+1

On recovery the first url will be used, in this case with a GET instead of a POST, so:

    https://api.hipchat.com/v1/rooms/message?auth_token=your_token&room_id=7&from=Alerts&message=service+OK+_name_+_because_

becomes:

    https://api.hipchat.com/v1/rooms/message?auth_token=your_token&room_id=7&from=Alerts&message=service+OK+SERVICE+NAME+STATUS+200

> Notice that ``_name_``, ``_exit_`` and ``_because_`` are replaced with the name, exit and because values of the service.

## services - Test
**Epazote** is mainly used for HTTP services. For supervising other
applications that don't listen for or accept HTTP connections, like a database
or a cache engine, there are tools like
[daemontools](https://cr.yp.to/daemontools.html) and
[runit](http://smarden.org/runit/). Even so, **Epazote**
can be used to execute an action based on the exit status of a command,
for example:

```yaml
    salt-master:
        test: pgrep -f salt
        if_not:
            cmd: service restart salt_master
            notify: operations@domain.tld
```

In this case ``test: pgrep -f salt`` is run, and the ``cmd`` in the ``if_not``
block is executed if the exit code is > 0. From the ``pgrep`` man page:

```txt
EXIT STATUS
     The pgrep and pkill utilities return one of the following values upon exit:

          0       One or more processes were matched.
          1       No processes were matched.
          2       Invalid options were specified on the command line.
          3       An internal error occurred.
```

## Extra setup
*green dots give some comfort* -- Because of this, when using the ``log``
option an extra service could be configured as a receiver for all the posts
that **Epazote** produces and, based on the data obtained, create a custom
dashboard, something similar to https://status.cloud.google.com/ or
http://status.aws.amazon.com/

# Issues
Please report any problems or bugs here: https://github.com/nbari/epazote/issues