#MCollective NRPE Agent

After making a change on your servers, you often want to confirm that they all still pass a particular Nagios check.

Say you have deployed new code and restarted Tomcat. Load may be high for a while afterwards, and this agent lets you determine very quickly when all your machines are back to normal. Nagios can take many minutes to check every machine, which is an unnecessary delay in your deployment process.

If you deploy your Nagios checks to your servers using the common Nagios NRPE, this agent can quickly run a check across your entire estate.

I wrote a blog post on using this plugin to aggregate checks for Nagios: [Aggregating Nagios Checks With MCollective](http://www.devco.net/archives/2010/07/03/aggregating_nagios_checks_with_mcollective.php)

##Setting up NRPE
This agent makes a couple of assumptions about how you set up NRPE. In your nrpe.cfg, add the following:

```
include_dir=/etc/nagios/nrpe.d/
```

Then set your commands up with one file per check command, for example /etc/nagios/nrpe.d/check_load.cfg:

```
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 1.5,1.5,1.5 -c 2,2,2
```

With this setup the agent will be able to find your check_load command.
I’ve added a Puppet define and template to help you create checks like this [on GitHub](http://github.com/puppetlabs/mcollective-plugins/tree/master/agent/nrpe/puppet/).

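As a rough sketch of why the one-file-per-command layout matters, the lookup can be pictured as: open `<conf_dir>/<command>.cfg` and pull out the `command[<name>]=` line. The shell function below only illustrates that idea and is not the agent's actual code; the name `nrpe_lookup` and its arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative only: nrpe_lookup is a hypothetical helper, not part of the agent.
# Usage: nrpe_lookup <conf_dir> <command_name>
nrpe_lookup() {
    conf_dir=$1
    name=$2
    file="$conf_dir/$name.cfg"

    # One file per check command: a missing file means an unknown command.
    # Exit status 3 mirrors the Nagios UNKNOWN code.
    [ -f "$file" ] || { echo "No such command: $name"; return 3; }

    # Keep whatever follows "command[<name>]=" on the first matching line.
    # Assumes the command name contains no regex metacharacters.
    line=$(sed -n "s/^command\[$name]=//p" "$file" | head -n 1)
    [ -n "$line" ] || { echo "No such command: $name"; return 3; }

    printf '%s\n' "$line"
}
```

Running `nrpe_lookup /etc/nagios/nrpe.d check_load` against the example file above would print the plugin command line, which is a quick way to confirm a check is laid out the way the agent expects.
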
##Agent Installation
Follow the basic [plugin install guide](http://projects.puppetlabs.com/projects/mcollective-plugins/wiki/InstalingPlugins).

##Agent Configuration
You can set the directory where the NRPE cfg files live using the plugin.nrpe.conf_dir setting.

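For example, assuming your checks live in /etc/nagios/nrpe.d as in the setup above, and that your MCollective server configuration is at /etc/mcollective/server.cfg (an assumption; the path varies by installation), the setting might look like this:

```
plugin.nrpe.conf_dir = /etc/nagios/nrpe.d
```
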
##Usage
###Using generic mco rpc
You can use the normal mco rpc script to run the agent:

```
% mco rpc nrpe runcommand command=check_load
Discovering hosts using the mongo method .... 27

 * [ ============================================================> ] 27 / 27


dev1.example.com Request Aborted
 UNKNOWN
 Exit Code: 3
 Output: No such command: check_load
 Performance Data:


Summary of Exit Code:

 OK : 26
 UNKNOWN : 1
 WARNING : 0
 CRITICAL : 0


Finished processing 27 / 27 hosts in 380.57 ms
```

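In a deployment script you may want a simple pass/fail answer rather than reading the summary by eye. The sketch below assumes the output format shown above and parses the `Summary of Exit Code` block from standard input. `all_ok` is a hypothetical helper name, and the parsing would need adjusting if your mco version formats the summary differently.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the summary read from stdin reports
# zero UNKNOWN, WARNING and CRITICAL results and contains an OK line.
all_ok() {
    awk '
        # Sum the counts after the colon on the three failure lines.
        /^ *(UNKNOWN|WARNING|CRITICAL) *:/ { split($0, parts, ":"); bad += parts[2] }
        # Remember that a summary was actually present.
        /^ *OK *:/ { seen = 1 }
        END { if (seen && bad == 0) exit 0; exit 1 }
    '
}
```

It could then be used as `mco rpc nrpe runcommand command=check_load | all_ok && echo "all nodes OK"`, on the assumption that the summary is still printed when the output is piped.
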
###Supplied Client
We also provide a client written specifically for this agent, which is better suited to the purpose.

By default the client shows only problems:

```
% mco nrpe -W /dev_server/ check_load

 * [ ============================================================> ] 19 / 19

dev1.example.com status=UNKNOWN
 No such command: check_load

Summary of Exit Code:

 OK : 18
 UNKNOWN : 1
 WARNING : 0
 CRITICAL : 0


Finished processing 19 / 19 hosts in 216.59 ms
```

To see all the statuses, pass -v:

```
% mco nrpe -W /dev_server/ check_load -v
Discovering hosts using the mongo method .... 3

 * [ ============================================================> ] 6 / 6

dev1.example.com status=UNKNOWN
 No such command: check_load

dev9.example.com status=OK
 OK - load average: 0.00, 0.00, 0.00

dev7.example.com status=OK
 OK - load average: 0.00, 0.00, 0.00

Summary of Exit Code:

 OK : 2
 UNKNOWN : 1
 CRITICAL : 0
 WARNING : 0


---- check_load NRPE results ----
 Nodes: 3 / 3
 Pass / Fail: 2 / 1
 Start Time: Fri Dec 14 11:21:58 +0000 2012
 Discovery Time: 50.86ms
 Agent Time: 212.90ms
 Total Time: 263.76ms
```

###Data Plugin

The NRPE agent ships with a data plugin that lets you filter discovery on the results of NRPE commands.

```
% mco rpc rpcutil ping -S "Nrpe('check_disk1').exitcode=0"
Discovering hosts using the mc method for 3 second(s) .... 1

 * [ ============================================================> ] 1 / 1


dev2.example.com
 Timestamp: 1355484245


Finished processing 1 / 1 hosts in 138.15 ms
```
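
The `exitcode` value used in the filter follows the standard Nagios plugin return codes, which also appear in the summaries above (for example `Exit Code: 3` for UNKNOWN). As a quick reference, a small shell helper could translate them. `nagios_status` is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a Nagios plugin exit code to its status name.
nagios_status() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        # Treating any other code as UNKNOWN is this sketch's choice.
        *) echo UNKNOWN ;;
    esac
}
```

Read this way, the filter `Nrpe('check_disk1').exitcode=0` in the example above limits discovery to nodes where check_disk1 currently returns OK.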
137