[![Download](https://api.bintray.com/packages/nbari/epazote/epazote/images/download.svg)](https://bintray.com/nbari/epazote/epazote/_latestVersion)
[![Build Status](https://travis-ci.org/epazote/epazote.svg?branch=master)](https://travis-ci.org/epazote/epazote)
[![Coverage Status](https://coveralls.io/repos/github/epazote/epazote/badge.svg?branch=develop)](https://coveralls.io/github/epazote/epazote?branch=develop)
[![Go Report Card](https://goreportcard.com/badge/github.com/epazote/epazote)](https://goreportcard.com/report/github.com/epazote/epazote)

# Epazote
Automated HTTP (microservices) supervisor

**Epazote** automatically updates/adds services specified in a file called
``epazote.yml``. It periodically checks the defined endpoints and executes recovery
commands when a service does not respond as expected, automating the actions
needed to keep services/applications up and running.

In Continuous Integration/Deployment environments the file ``epazote.yml`` can
be updated dynamically without restarting the supervisor (**Epazote**), which
keeps a supervisor restart out of the deployment flow.

## How it works
In its basic mode of operation, **Epazote** periodically checks the service endpoints
"[URLs](https://en.wikipedia.org/wiki/Uniform_Resource_Locator)"
by doing an [HTTP GET Request](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods);
based on the response [Status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes),
[Headers](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields) or
[body](https://en.wikipedia.org/wiki/HTTP_message_body), it executes a command.

In most scenarios it is desirable to apply a command directly to the application
in question, such as a signal (``kill -HUP``) or a restart (``sv restart app``);
in that case **Epazote** and the application must be running on the
same server.

**Epazote** can also work in a standalone mode by only monitoring and sending
alerts if desired.

# How to use it
First you need to install **Epazote**: either compile it from
[source](https://github.com/nbari/epazote)
or download a pre-compiled binary matching your operating system from here:
https://dl.bintray.com/nbari/epazote/

 [![Download](https://api.bintray.com/packages/nbari/epazote/epazote/images/download.svg)](https://bintray.com/nbari/epazote/epazote/_latestVersion)

> To compile from source, after downloading the sources use ``make`` to build the binary

**Epazote** was designed with simplicity in mind, as an easy tool for
[DevOps](https://en.wikipedia.org/wiki/DevOps) and as a complement to
infrastructure orchestration tools like [Ansible](http://www.ansible.com/) and
[SaltStack](http://saltstack.com/). Because of this, [YAML](http://www.yaml.org/)
is used for the configuration files, which avoids having to learn a new
language or syntax and simplifies the setup.

## Basic example

```yaml
services:
    google:
        url: https://www.google.com
        seconds: 5
        expect:
            status: 302
            ssl:
                hours: 72
            if_not:
                cmd: echo -n "google down"
```

To supervise ``google`` you would run (basic.yml is a file containing the above code):

    $ epazote -f /path/to/yaml/file/basic.yml -d

> -d is for debugging; it prints all output to standard output.


This basic setup supervises the service named ``google`` every 5 seconds:
it does an HTTP GET to ``https://www.google.com`` and expects
a ``302 Status code``; if not, it runs ``echo -n "google down"``.

The ``ssl: hours: 72`` setting means an alert is sent if the certificate is about to
expire in the next 72 hours.

Extending the basic example for receiving notifications:

```yaml
config:
    smtp:
        username: smtp@domain.tld
        password: password
        server: mail.example.com
        port: 587
        headers:
            from: you@domain.tld
            to: team@domain.tld
            subject: "[_name_ - _exit_ - _status_]"

services:
    google:
        url: http://www.google.com
        minutes: 3
        expect:
            status: 200
            if_not:
                cmd: echo -n "google down"
                notify: yes
```

In this case the service is checked every 3 minutes and, if a
``200 Status code`` is not received, besides executing the command ``echo -n
"google down"`` an email is sent to ``team@domain.tld`` because of
the ``notify: yes`` setting.

## The configuration file

The configuration file ([YAML formatted](https://en.wikipedia.org/wiki/YAML))
consists of two parts, a **config** and a **services** section (key-value pairs).

## The config section

The **config** section is composed of:

 - smtp (Email settings for sending notifications)
 - scan (Paths used to find the file 'epazote.yml')

Example:

```yaml
config:
    smtp:
        username: smtp@domain.tld
        password: password
        server: mail.example.com
        port: 587
        headers:
            from: epazote@domain.tld
            to: team@domain.tld ops@domain.tld etc@domain.tld
            subject: "[_name_, _because_]"
    scan:
        paths:
            - /arena/home/sites
            - /home/apps
        minutes: 5
```

### config - smtp

Required to properly send alerts via email. All fields are required; the
``headers`` section can be extended with any desired key-value pairs.

### config - smtp - subject
The subject can be built from these keywords: ``_because_`` ``_exit_``
``_name_`` ``_output_`` ``_status_`` ``_url_``. For example,
``subject: "[_name_ - _status_]"`` would be rendered as ``[my service - 500]``:
``_name_`` is replaced by the service name, ``my service``, and
``_status_`` by the response status code, ``500`` in this case.

### config - scan

Paths to scan every N ``seconds``, ``minutes`` or ``hours``; each scan searches for
services specified in a file called ``epazote.yml``.

The **scan** setting is optional, but it is very useful when doing Continuous
Deployment. For example, if your code is automatically uploaded to the
directory ``/arena/home/sites/application_1`` and your scan paths contain
``/arena/home/sites``, you can simply upload to your application directory a
file named ``epazote.yml`` with the service rules, thus achieving the deployment
of your application and its supervision at the same time.

### config (optional)

As you may notice, apart from the ``scan`` setting the ``config`` section mainly
contains settings for sending alerts/notifications; it is therefore entirely
optional, meaning that **Epazote** can still run and check your services without
a ``config`` section.

If you want to automatically update/load services you will need the
``config - scan`` setting.
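
The idea behind ``config - scan`` can be illustrated with a minimal Python sketch. It is not Epazote's actual implementation (Epazote is written in Go); the function name ``find_service_files`` is hypothetical, and searching sub-directories recursively is an assumption made for illustration.

```python
import os


def find_service_files(paths, filename="epazote.yml"):
    """Return every file named `filename` found under the given scan paths.

    Hypothetical helper: the recursive search is an assumption, not a
    description of how Epazote itself walks the scan paths.
    """
    found = []
    for base in paths:
        # os.walk yields nothing for a missing directory, so absent paths are skipped
        for root, _dirs, files in os.walk(base):
            if filename in files:
                found.append(os.path.join(root, filename))
    return sorted(found)
```

With the scan paths from the example above, ``find_service_files(["/arena/home/sites", "/home/apps"])`` would list each application's ``epazote.yml`` so that its services can be loaded on the next scan.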


## The services section

Services are the main functionality of **Epazote**: this is where the URLs and the
rules based on the response are defined. Since options vary from service to
service, an example helps to understand the setup:

```yaml
services:
    my service 1:
        url: http://myservice.domain.tld/_healthcheck_
        timeout: 5
        seconds: 60
        log: http://monitor.domain.tld
        expect:
            status: 200
            header:
                content-type: application/json
            body: find this string on my site
            if_not:
                cmd: sv restart /services/my_service_1
                notify: team@domain.tld
                msg: |
                    line 1 bla bla
                    line 2
        if_status:
            500:
                cmd: reboot
            404:
                cmd: sv restart /services/cache
                msg: restarting cache
                notify: team@domain.tld x@domain.tld
        if_header:
            x-amqp-kapputt:
                cmd: restart abc
                notify: bunny@domain.tld
                msg: |
                    The rabbit is angry
                    & hungry
            x-db-kapputt:
                cmd: svc restart /services/db

    other service:
        url: https://self-signed.ssl.tld/ping
        header:
            Origin: http://localhost
            Accept-Encoding: gzip
        insecure: true
        minutes: 3

    redirect service:
        url: http://test.domain.tld/
        follow: yes
        hours: 1
        expect:
            status: 302
            if_not:
                cmd: service restart abc
                notify: yes
                emoji: 1F600-1F621

    salt-master:
        test: pgrep -f salt
        if_not:
            cmd: service restart salt_master
            notify: operations@domain.tld
```

### services - name of service (string)
A unique string that identifies your service. In the above example there are 4
services, named:
 - my service 1
 - other service
 - redirect service
 - salt-master

### services - url (string)
URL of the service to supervise

### services - follow (boolean true/false)
By default, if a [302 Status code](https://en.wikipedia.org/wiki/HTTP_302) is
received, **Epazote** will not follow it; if you would like to follow all
redirects, this setting must be set to **true**.

### services - insecure (boolean true/false)
This option explicitly allows **Epazote** to perform "insecure" SSL connections.
It disables certificate verification.

### services - stop (int)
Defines the number of times the ``cmd`` will be executed. By default the ``cmd``
is executed only once, with the intention of avoiding infinite loops. If the value
is set to ``-1`` the ``cmd`` never stops. Defaults to 0; ``stop 2`` will execute
the ``cmd`` for "0, 1, 2" (3 times).

### services - timeout in seconds (int)
Timeout specifies a time limit for the HTTP requests. A value of zero means no
timeout; defaults to 5 seconds.

### services - retry_limit (int)
Specifies the number of times to retry a request; defaults to 3.

### services - retry_interval (int)
Specifies the time between attempts in milliseconds. The default value is 500 (0.5 seconds).

### services - read_limit (int)
Read only ``N`` bytes instead of the full
body. This helps to make a more "complete" request and
avoid getting an HTTP status code [408 when testing aws ELB](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/ts-elb-error-message.html#ts-elb-errorcodes-http408).

### services - seconds, minutes, hours
How often to check the service. The options are (only one should be used):
 - seconds N
 - minutes N
 - hours N

``N`` should be an integer.

### services - log (URL)
A URL to post all events to; disabled by default.
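
As an illustration of the ``seconds``, ``minutes``, ``hours`` options described above, here is a minimal Python sketch that normalizes a service's interval to seconds. The function ``interval_seconds`` is hypothetical and not part of Epazote; returning ``None`` when no interval is set is an assumption, since this document does not state what Epazote does in that case.

```python
def interval_seconds(service):
    """Convert a service's seconds/minutes/hours option into seconds.

    Hypothetical helper for illustration only. Returns None when no
    interval option is present (an assumption, not documented behavior).
    """
    factors = {"seconds": 1, "minutes": 60, "hours": 3600}
    used = [key for key in factors if key in service]
    if not used:
        return None
    if len(used) > 1:
        # The documentation says only one of the options should be used
        raise ValueError("use only one of seconds, minutes or hours")
    key = used[0]
    value = service[key]
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError(f"{key} should be an integer")
    return value * factors[key]
```

For instance, a service declared with ``minutes: 3`` would be checked every 180 seconds.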

### services - expect
The ``expect`` block options are:
- status (int)
- header (key, value)
- body (regular expression)
- if_not (Action block)

### services - expect - status
An integer representing the expected [HTTP Status Code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

### services - expect - header (start_with match)
A key-value map of expected headers; it can contain one or more entries.

A header is considered valid if it starts with the required value.
For example, if you want to check for ``Content-type: application/json; charset=utf-8``
you can simply do something like:

```yaml
    header:
        Content-Type: application/json
```

This simplifies the matching and is useful in cases where the header value
changes. For example, ``Content-Range: bytes 100-64656926/64656927`` can be
matched with:

```yaml
    header:
        Content-Range: bytes
```

### services - expect - body
A [regular expression](https://en.wikipedia.org/wiki/Regular_expression) used
to match a string in the body of the site, useful in cases where you want to ensure
that the content delivered is always the same or keeps a pattern.

### services - expect (How it works)
The ``expect`` logic tries to implement an
[if-else](https://en.wikipedia.org/wiki/if_else) logic: ``status``, ``header`` and
``body`` are the **if**, and the ``if_not`` block becomes the **else**.

    if
        status
        header
        body
    else:
        if_not

In most cases only one option is required; see the service named "redirect service" in the above example.

If more than one option is used, this is the order in which they are evaluated, no matter how they were written in the configuration file:

 1. body
 2. status
 3. header

The reason for this order is related to performance. In the end we want to
monitor/supervise the services in an efficient way, without wasting extra
resources. In most cases the HTTP headers alone are enough to take an action,
so there is no need to read the full body of the page. Because of this, if no
``body`` is defined, **Epazote** will only read the headers, saving
time and processing.

### services - expect - if_not
``if_not`` is a block with an action describing what to do if we don't get what we were
expecting (``expect``). See services - Actions

### services - if_status & if_header
There may be cases where third-party dependencies are down and because of this
your application is not working properly; for these cases ``if_status``
and ``if_header`` can be useful.

For example, if the database is down your application could start responding with a status
code 500 or a custom header, and based on those values an action can be
executed.

The format for ``if_status`` is a key-value pair where the key is an int representing an
HTTP status code, and the value is an Action.

The format for ``if_header`` is a key-value pair where the key is a string
to match against the response headers and, as in the other if_X conditions, the value is an Action.

These are the only ``if's`` and their order of execution:
 1. if_status
 2. if_header
 3. if_not

This means that if a service uses ``if_status`` and ``if_not``, it will
first evaluate ``if_status`` and execute an Action if required. If both
``if_status`` and ``if_header`` are set, the same applies: ``if_status`` is
evaluated first, then ``if_header`` and last ``if_not``.

## services - Actions
An Action has five options:
 - cmd
 - notify
 - msg
 - emoji
 - http

They can all be used together, individually, or not at all.
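
The order in which ``if_status``, ``if_header`` and ``if_not`` are evaluated (see the previous section) can be illustrated with a minimal Python sketch. This is not Epazote's actual code: the function ``pick_action`` is hypothetical, and two points are assumptions made for illustration, namely that the first matching block wins and that an ``if_header`` entry matches when a header with that name is present (compared case-insensitively).

```python
def pick_action(service, status, headers, expectation_met):
    """Return (block name, Action) for the first matching block, or None.

    Hypothetical helper. Assumptions: the first match wins, and an
    if_header key matches when a response header with that name exists.
    """
    # 1. if_status: keyed by HTTP status code
    action = service.get("if_status", {}).get(status)
    if action is not None:
        return "if_status", action

    # 2. if_header: keyed by header name
    present = {name.lower() for name in headers}
    for name, action in service.get("if_header", {}).items():
        if name.lower() in present:
            return "if_header", action

    # 3. if_not: only when the expect block was not satisfied
    if not expectation_met:
        action = service.get("expect", {}).get("if_not")
        if action is not None:
            return "if_not", action

    return None
```

The returned Action is a plain mapping holding any of the five options listed above (``cmd``, ``notify``, ``msg``, ``emoji``, ``http``).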

### services - Actions - cmd (string)
``cmd`` contains the command to be executed.

### services - Actions - notify (string)
``notify`` should contain either ``yes`` or the email address or addresses (space separated)
of the recipients that will be notified when the action is executed.

If the string is ``yes`` the global recipients will be used.

### services - Actions - msg (list)
```yaml
msg:
    - send this if exit 0 (all OK)
    - send this if exit 1 (something is wrong)
```
Based on the exit status, either msg[0] or msg[1] is used.

### services - Actions - emoji (list)
``emoji`` holds the [Unicode](https://en.wikipedia.org/wiki/Unicode) characters
to be used in the subject, for example:
```yaml
emoji:
    - 1F600
    - 1F621
```
If services are OK the first one, ``1F600``, is used; if not,
``1F621`` is used. If set to ``0`` no emoji will be used. The idea behind using
[unicode/emoji](https://en.wikipedia.org/wiki/Emoticons_(Unicode_block))
is to catch attention faster, so that the email is not ignored as if it were spam.
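
How the ``msg`` and ``emoji`` lists are indexed by the exit status can be illustrated with a minimal Python sketch. Both functions are hypothetical and not part of Epazote; treating every non-zero exit status like exit 1 is an assumption, since the text above only mentions exit 0 and exit 1.

```python
def pick_by_exit(options, exit_code):
    """Return options[0] when exit_code is 0, otherwise options[1].

    Hypothetical helper; mapping every non-zero exit code to the second
    entry is an assumption made for illustration.
    """
    return options[0] if exit_code == 0 else options[1]


def emoji_char(code_point):
    """Turn a hexadecimal code point such as '1F600' into the character itself."""
    return chr(int(code_point, 16))
```

For example, ``emoji_char(pick_by_exit(["1F600", "1F621"], 1))`` yields the character U+1F621, the one intended for a failing service.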

### services - Actions - http (list(key, value))
A custom URL to GET/POST depending on the exit status, for example:
```yaml
http:
    - url: "https://api.hipchat.com/v1/rooms/message?auth_token=your_token&room_id=7&from=Alerts&message=service+OK+_name_+_because_"
    - url: "https://api.hipchat.com/"
      header:
          Content-Type: application/x-www-form-urlencoded
      data: |
          room_id=10&from=Alerts&message=_name_+exit+code+_exit_
      method: POST
```
When a service fails or returns an exit 1, the second url
``https://api.hipchat.com/`` with method ``POST`` and the custom ``data``
will be used. Notice that all the occurrences in the data of the form
``_(key)_`` will be replaced with the corresponding value, in this case:

    room_id=10&from=Alerts&message=_name_+exit+code+_exit_

will be replaced with:

    room_id=10&from=Alerts&message=SERVICE NAME+exit+code+1

On recovery the first url will be used; in this case it will be a GET instead of a POST, so:

    https://api.hipchat.com/v1/rooms/message?auth_token=your_token&room_id=7&from=Alerts&message=service+OK+_name_+_because_

becomes:

    https://api.hipchat.com/v1/rooms/message?auth_token=your_token&room_id=7&from=Alerts&message=service+OK+SERVICE+NAME+STATUS+200

> notice that ``_name_``, ``_exit_`` and ``_because_`` are replaced with the service's values for name, exit and because.


## services - Test
**Epazote** is mainly used for HTTP services, not for supervising other
applications that don't listen for or accept HTTP connections, like a database,
cache engine, etc.
There are tools like
[daemontools](https://cr.yp.to/daemontools.html) and
[runit](http://smarden.org/runit/) for those; even so, **Epazote**
can also be used to execute an action based on the exit code of a command,
for example:

```yaml
    salt-master:
        test: pgrep -f salt
        if_not:
            cmd: service restart salt_master
            notify: operations@domain.tld
```

In this case, ``test: pgrep -f salt`` will execute the ``cmd`` in the ``if_not``
block if the exit code is > 0. From the ``pgrep`` man page:

```txt
EXIT STATUS
     The pgrep and pkill utilities return one of the following values upon exit:

     0       One or more processes were matched.
     1       No processes were matched.
     2       Invalid options were specified on the command line.
     3       An internal error occurred.
```


## Extra setup
*green dots give some comfort* -- Because of this, when using the ``log``
option an extra service can be configured as a receiver for all the POSTs
that **Epazote** produces and, based on the data obtained, used to build a custom
dashboard, something similar to: https://status.cloud.google.com/ or
http://status.aws.amazon.com/

# Issues
Please report any problem or bug here: https://github.com/nbari/epazote/issues