:man source:   crmsh_hb_report
:man version:  1.2
:man manual:   Pacemaker documentation

crmsh_hb_report(8)
==================

NAME
----
crmsh_hb_report - create report for CRM based clusters (Pacemaker)


SYNOPSIS
--------
*crm report* -f {time|"cts:"testnum} [-t time] [-u user] [-l file]
       [-n nodes] [-E files] [-p patt] [-L patt] [-e prog]
       [-X ssh-options] [-MSDCZAQVsvhd] [dest]


DESCRIPTION
-----------
crmsh_hb_report(8), invoked as `crm report`, is a utility which collects
all information (logs, configuration files, system information, etc.)
relevant to Pacemaker (CRM) over a given period of time.


OPTIONS
-------
dest::
	The report name. It may also include the path to the directory
	in which to put the report tarball. If left out, the tarball is
	created in the current directory and named
	"hb_report-current_date", for instance
	hb_report-Wed-03-Mar-2010.

*-d*::
	Don't create the compressed tar, but leave the result in a
	directory.

*-f* { time | "cts:"testnum }::
	The start time from which to collect logs. The time is in a
	format accepted by the Date::Parse perl module. For CTS tests,
	specify the "cts:" string followed by the test number. This
	option is required.

*-t* time::
	The end time to which to collect logs. Defaults to now.

*-n* nodes::
	A list of space-separated hostnames (cluster members).
	`crm report` may try to find out the set of nodes by itself, but
	if it runs on the loghost which, as is usually the case,
	does not belong to the cluster, that may be difficult. Also,
	OpenAIS doesn't contain a list of nodes and if Pacemaker is
	not running, there is no way to find it out automatically.
	This option is cumulative (i.e. use -n "a b" or -n a -n b).

*-l* file::
	Log file location. If, for whatever reason, `crm report` cannot
	find the log file, you can specify its absolute path.

*-E* files::
	Extra log files to collect. This option is cumulative. By
	default, /var/log/messages is collected along with the
	cluster logs.

*-M*::
	Don't collect extra log files, but only the file containing
	messages from the cluster subsystems.

*-L* patt::
	A list of regular expressions to match in log files for
	analysis. This option is additive (default: "CRIT: ERROR:").

*-p* patt::
	Additional patterns to match the names of parameters which
	contain sensitive information. This option is additive
	(default: "passw.*").

*-Q*::
	Quick run. Gathering some system information can be expensive.
	With this option, such operations are skipped and information
	collecting is thus sped up. The operations considered
	I/O or CPU intensive are: verifying the content of installed
	packages, sanitizing files for sensitive information, and
	producing dot files from PE inputs.

*-A*::
	This is an OpenAIS cluster. `crm report` has some heuristics to
	find the cluster stack, but that is not always reliable.
	By default, `crm report` assumes that it is run on a Heartbeat
	cluster.

*-u* user::
	The ssh user. `crm report` will try to log in to other nodes
	without specifying a user, then as "root", and finally as
	"hacluster". If you have another user for administration over
	ssh, please use this option.

*-X* ssh-options::
	Extra ssh options. These will be added to every ssh
	invocation. Alternatively, use `$HOME/.ssh/config` to set up
	the desired ssh connection options.

*-S*::
	Single node operation. Run `crm report` only on this node and
	don't try to start slave collectors on other members of the
	cluster. Under normal circumstances this option is not
	needed. Use it if ssh(1) to the other nodes does not work.

*-Z*::
	If the destination directory exists, remove it instead of
	exiting (this is the default for CTS).

*-V*::
	Print the version including the last repository changeset.

*-v*::
	Increase verbosity. Normally used to debug unexpected
	behaviour.

*-h*::
	Show usage and some examples.

*-D* (obsolete)::
	Don't invoke an editor to fill the description text file.

*-e* prog (obsolete)::
	Your favourite text editor. Defaults to $EDITOR, vim, vi,
	emacs, or nano, whichever is found first.

*-C* (obsolete)::
	Remove the destination directory once the report has been put
	in a tarball.

EXAMPLES
--------
Last night during the backup there were several warnings
encountered (logserver is the log host):

	logserver# crm report -f 3:00 -t 4:00 -n "node1 node2" report

This collects everything from all nodes from 3am to 4am last night.
The files are compressed to a tarball report.tar.bz2.

Just found a problem during testing:

	# note the current time
	node1# date
	Fri Sep 11 18:51:40 CEST 2009
	node1# /etc/init.d/heartbeat start
	node1# nasty-command-that-breaks-things
	node1# sleep 120 #wait for the cluster to settle
	node1# crm report -f 18:51 hb1

	# if crm report can't figure out that this is corosync
	node1# crm report -f 18:51 -A hb1

	# if crm report can't figure out the cluster members
	node1# crm report -f 18:51 -n "node1 node2" hb1

The files are compressed to a tarball hb1.tar.bz2.

INTERPRETING RESULTS
--------------------
The compressed tar archive is the final product of `crm report`.
This is one example of its content, for a CTS test case on a
three node OpenAIS cluster:

	$ ls -RF 001-Restart

	001-Restart:
	analysis.txt     events.txt  logd.cf       s390vm13/  s390vm16/
	description.txt  ha-log.txt  openais.conf  s390vm14/

	001-Restart/s390vm13:
	STOPPED  crm_verify.txt  hb_uuid.txt  openais.conf@   sysinfo.txt
	cib.txt  dlm_dump.txt    logd.cf@     pengine/        sysstats.txt
	cib.xml  events.txt      messages     permissions.txt

	001-Restart/s390vm13/pengine:
	pe-input-738.bz2  pe-input-740.bz2  pe-warn-450.bz2
	pe-input-739.bz2  pe-warn-449.bz2   pe-warn-451.bz2

	001-Restart/s390vm14:
	STOPPED  crm_verify.txt  hb_uuid.txt  openais.conf@   sysstats.txt
	cib.txt  dlm_dump.txt    logd.cf@     permissions.txt
	cib.xml  events.txt      messages     sysinfo.txt

	001-Restart/s390vm16:
	STOPPED  crm_verify.txt  hb_uuid.txt  messages        sysinfo.txt
	cib.txt  dlm_dump.txt    hostcache    openais.conf@   sysstats.txt
	cib.xml  events.txt      logd.cf@     permissions.txt

The top directory contains information which pertains to the
cluster or event as a whole. Files with exactly the same content
on all nodes will also be at the top, with per-node links created
(as is the case in this example with openais.conf and logd.cf).

The cluster log files are named ha-log.txt regardless of the
actual log file name on the system. If the log is found on the
loghost, then it is placed in the top directory. If not, the top
directory ha-log.txt contains the logs of all nodes merged and
sorted by time. Files named messages are excerpts of
/var/log/messages from the nodes.

Most files are copied verbatim or they contain the output of a
command. For instance, cib.xml is a copy of the CIB found in
/var/lib/heartbeat/crm/cib.xml. crm_verify.txt is the output of the
crm_verify(8) program.

Some files are the result of more involved processing:

*analysis.txt*::
	A set of log messages matching user defined patterns (may be
	provided with the -L option).

*events.txt*::
	A set of log messages matching event patterns. It should
	provide information about major cluster motions without
	unnecessary details. These patterns are devised by the
	cluster experts. Currently, the patterns cover membership
	and quorum changes, resource starts and stops, fencing
	(stonith) actions, and cluster starts and stops. events.txt
	is always generated for each node. If the central cluster
	log was found, a combined events.txt for all nodes is
	generated as well.

*permissions.txt*::
	File and directory permissions are among the more common
	causes of problems. `crm report` looks for a set of predefined
	directories and checks their permissions. Any issues are
	reported here.

*backtraces.txt*::
	gdb generated backtrace information for cores dumped
	within the specified period.

*sysinfo.txt*::
	Various release information about the platform, kernel,
	operating system, packages, and anything else deemed to be
	relevant. The static part of the system.

*sysstats.txt*::
	Output of various system commands such as ps(1), uptime(1),
	netstat(8), and ip(8). The dynamic part of the system.
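
To illustrate how a file like analysis.txt comes about, here is a
minimal sketch which filters a log with the default patterns plus any
extra ones, in the spirit of the `-L` option. The function name
`match_patterns` is hypothetical and this is not necessarily how
`crm report` does it internally.

```shell
# Hypothetical helper, not part of crm report: print the lines of a log
# file which match the default patterns or any extra pattern given.
match_patterns() {
    log=$1; shift
    # Extra patterns are added to the documented defaults.
    set -- "CRIT:" "ERROR:" "$@"
    # Join all patterns into one extended regular expression: p1|p2|...
    re=$(printf '%s|' "$@")
    re=${re%|}
    grep -E -- "$re" "$log"
}
```

For example, `match_patterns ha-log.txt "WARN:"` would print the lines
containing CRIT:, ERROR:, or WARN:.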

description.txt should contain a user supplied description of the
problem, but since it is very seldom used, it will be dropped
from future releases.

PREREQUISITES
-------------

ssh::
	It is not strictly required, but you won't regret having a
	password-less ssh. It is not too difficult to set up and will save
	you a lot of time. If you can't have it, for example because your
	security policy does not allow such a thing, or you just prefer
	menial work, then you will have to resort to the semi-manual
	semi-automated report generation. See below for instructions.
	+
	If you need to supply a password for your passphrase/login, then
	always use the `-u` option.
	+
	For extra ssh(1) options, if you're too lazy to set up
	$HOME/.ssh/config, use the `-X` option. Do not forget to put
	the options in quotes.

sudo::
	If the ssh user (as specified with the `-u` option) is other
	than `root`, then `crm report` uses `sudo` to collect the
	information which is readable only by the `root` user. In that
	case the `sudoers` file must be set up properly. The
	user (or group to which the user belongs) should have the
	following line:
	+
	<user> ALL = NOPASSWD: /usr/sbin/crm
	+
	See the `sudoers(5)` man page for more details.

Times::
	In order to find files and messages in the given period and to
	parse the `-f` and `-t` options, `crm report` uses perl and one of the
	`Date::Parse` or `Date::Manip` perl modules. Note that you need
	only one of these. Furthermore, on nodes which have no logs and
	where you don't run `crm report` directly, no date parsing is
	necessary. In other words, if you run this on a loghost then you
	don't need these perl modules on the cluster nodes.
	+
	On rpm based distributions, you can find `Date::Parse` in
	`perl-TimeDate` and on Debian and its derivatives in
	`libtimedate-perl`.

Core dumps::
	To backtrace core dumps, gdb and the packages with the
	debugging info are needed. The debug info packages may be
	installed at the time the report is created. Hopefully, you
	will need this only seldom.

TIMES
-----

Specifying times can at times be a nuisance. That is why we have
chosen to use one of the perl modules, which allow a certain
freedom when specifying dates. You can either read the instructions
at the
http://search.cpan.org/dist/TimeDate/lib/Date/Parse.pm#EXAMPLE_DATES[Date::Parse
examples page]
or just rely on common sense and try things like:

	3:00          (today at 3am)
	15:00         (today at 3pm)
	2007/9/1 2pm  (September 1st at 2pm)
	Tue Sep 15 20:46:27 CEST 2009 (September 15th etc)

`crm report` will (probably) complain if it can't figure out what
you mean.

Try to delimit the event as closely as possible in order to reduce
the size of the report, while still leaving a minute or two around
it for good measure.

`-f` is not optional. And don't forget to quote dates when they
contain spaces.


Should I send all this to the rest of the Internet?
---------------------------------------------------

By default, the sensitive data in CIB and PE files is not mangled
by `crm report` because that makes PE input files mostly useless.
If you still have no other option but to send the report to a
public mailing list and do not want the sensitive data to be
included, use the `-s` option. Without this option, `crm report`
will issue a warning if it finds information which should not be
exposed. By default, parameters matching 'passw.*' are considered
sensitive. Use the `-p` option to specify additional regular
expressions to match variable names which may contain information
you don't want to leak. For example:

	# crm report -f 18:00 -p "user.*" -p "secret.*" /var/tmp/report

Heartbeat's ha.cf is always sanitized. Logs and other files are
not filtered.
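
If you want to check by hand which parameter names would be caught
by a given set of patterns, something like the following sketch may
help. It assumes that parameters appear in the CIB as `name="..."`
attributes and that a pattern has to match the whole name; both are
assumptions made for illustration. The function name `sensitive_names`
is hypothetical and `grep -o` is not available in every grep.

```shell
# Hypothetical helper, not part of crm report: list the parameter names
# in a CIB file which match "passw.*" or any extra pattern given.
sensitive_names() {
    cib=$1; shift
    # Extra patterns are added to the documented default.
    set -- "passw.*" "$@"
    re=$(printf '%s|' "$@")
    re=${re%|}
    # Extract every name="..." attribute, strip the wrapping, and keep
    # the names which match one of the patterns as a whole (-x).
    grep -o 'name="[^"]*"' "$cib" |
        sed 's/^name="\(.*\)"$/\1/' |
        grep -E -x -- "($re)" |
        sort -u
}
```

For example, `sensitive_names cib.xml "user.*" "secret.*"` lists the
names which the `-p` patterns from the example above would add to the
default.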

LOGS
----

It may be tricky to find syslog logs. The scheme used is to log a
unique message on all nodes and then look it up in the usual
syslog locations. This procedure is not foolproof, in particular
if the syslog files are in a non-standard directory. We look in
/var/log, /var/logs, /var/syslog, /var/adm, /var/log/ha, and
/var/log/cluster. In case we can't find the logs, please supply
their location:

	# crm report -f 5pm -l /var/log/cluster1/ha-log -S /tmp/report_node1
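
The lookup scheme described above can be sketched roughly as follows.
This is only an illustration: the function name `find_log_by_mark`, the
marker text, and the syslog facility in the usage comment are
assumptions, not what `crm report` actually uses.

```shell
# Hypothetical helper, not part of crm report: print the regular files
# in the given directories which contain the marker string.
find_log_by_mark() {
    mark=$1; shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for f in "$dir"/*; do
            # -F: fixed string, -q: no output, -s: ignore unreadable files
            [ -f "$f" ] && grep -q -s -F -- "$mark" "$f" && printf '%s\n' "$f"
        done
    done
    return 0
}

# Usage sketch (needs a running syslog daemon and read access to the logs):
#   mark="Mark:report-test:$(date +%s)"
#   logger -p local7.info "$mark"      # the facility is an assumption
#   sleep 1
#   find_log_by_mark "$mark" /var/log /var/logs /var/syslog \
#       /var/adm /var/log/ha /var/log/cluster
```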

If you have different log locations on different nodes, well,
perhaps you'd like to make them the same and make life easier for
everybody.

Files starting with "ha-" are preferred. In case syslog sends
messages to more than one file, if one of them is named ha-log or
ha-debug it will be favoured over syslog or messages.

`crm report` also supports archived logs in case the period
specified extends that far into the past. The archives must reside
in the same directory as the current log and their names must
be prefixed with the name of the current log (syslog-1.gz or
messages-20090105.bz2).
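
The naming rule for archives can be checked with a small sketch like
this one. The function name `list_archives` is hypothetical; it only
shows which files would qualify under the rule above and makes no
claim about how `crm report` picks or unpacks them.

```shell
# Hypothetical helper, not part of crm report: list the files in the
# directory of the current log whose names start with the name of the
# current log, excluding the current log itself.
list_archives() {
    log=$1
    # "?*" requires at least one extra character after the log name.
    for f in "$log"?*; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

For example, `list_archives /var/log/messages` would print
/var/log/messages-20090105.bz2 if it exists, but never
/var/log/syslog-1.gz.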

If there is no separate log for the cluster, possibly unrelated
messages from other programs are included. We don't filter logs,
but just pick a segment for the period you specified.

MANUAL REPORT COLLECTION
------------------------

So, your ssh doesn't work. In that case, you will have to run
this procedure on all nodes. Use `-S` so that `crm report` doesn't
bother with ssh:

	# crm report -f 5:20pm -t 5:30pm -S /tmp/report_node1

If you also have a log host which is not in the cluster, then
you'll have to copy the log to one of the nodes and tell us where
it is:

	# crm report -f 5:20pm -t 5:30pm -l /var/tmp/ha-log -S /tmp/report_node1

OPERATION
---------
`crm report` collects files and other information in a fairly
straightforward way. The most complex tasks are discovering the
log file locations (if syslog is used, which is the most common
case) and coordinating the operation on multiple nodes.

The instance of `crm report` running on the host where it was
invoked is the master instance. Instances running on other nodes
are slave instances. The master instance communicates with slave
instances by ssh. There are multiple ssh invocations per run, so
it is essential that ssh works without a password, i.e. with
public key authentication and authorized_keys.

The operation consists of three phases. Each phase must finish
on all nodes before the next one can commence. The first phase
consists of logging unique messages through syslog on all nodes.
This is the shortest of all phases.

The second phase is the most involved. During this phase all
local information is collected, which includes:

- logs (both current and archived if the start time is far in the past)
- various configuration files (corosync, heartbeat, logd)
- the CIB (both as xml and as represented by the crm shell)
- pengine inputs (if this node was the DC at any point in
  time over the given period)
- system information and status
- package information and status
- dlm lock information
- backtraces (if there were core dumps)

The third phase is collecting information from all nodes and
analyzing it. The analysis consists of the following tasks:

- identify files equal on all nodes which may then be moved to
  the top directory
- save log messages matching user defined patterns
  (defaults to ERRORs and CRITical conditions)
- report whether there were core dumps and which programs
  produced them
- report crm_verify(8) results
- save log messages matching major events to events.txt
- in case logging is configured without loghost, node logs and
  events files are combined using a perl utility
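
The first of these tasks, finding files which are equal on all nodes
and moving them to the top directory with per-node links, can be
sketched as follows. The function names `same_on_all_nodes` and
`hoist_file` are hypothetical and the sketch assumes one subdirectory
per node below the report directory, as in the listing in INTERPRETING
RESULTS; the real implementation may well differ.

```shell
# Hypothetical helper: succeed if the named file exists in every node
# directory of the report and has the same content everywhere.
same_on_all_nodes() {
    top=$1; name=$2; shift 2
    first=$top/$1/$name
    [ -f "$first" ] || return 1
    for node in "$@"; do
        # cmp -s is silent and fails on a difference or a missing file.
        cmp -s "$first" "$top/$node/$name" || return 1
    done
    return 0
}

# Hypothetical helper: move one copy of the file to the top directory
# and replace the per-node copies with relative symbolic links.
hoist_file() {
    top=$1; name=$2; shift 2
    same_on_all_nodes "$top" "$name" "$@" || return 1
    mv "$top/$1/$name" "$top/$name"
    for node in "$@"; do
        rm -f "$top/$node/$name"
        ln -s "../$name" "$top/$node/$name"
    done
}
```

For example, `hoist_file 001-Restart logd.cf s390vm13 s390vm14 s390vm16`
would leave a single logd.cf in the top directory and a link to it in
each node directory, but only if the three copies are identical.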


BUGS
----
Finding logs may at times be extremely difficult, depending on
how unusual the syslog configuration is. It would be nice to ask
syslog-ng developers to provide a way to find out the log
destination based on facility and priority.

If you think you found a bug, please rerun with the -v option and
attach the output to bugzilla.

`crm report` can function in a satisfactory way only if ssh works to
all nodes using authorized_keys (without password).

There are way too many options.


AUTHOR
------
Written by Dejan Muhamedagic, <dejan@suse.de>


RESOURCES
---------
ClusterLabs: <http://clusterlabs.org/>

Heartbeat and other Linux HA resources: <http://linux-ha.org/wiki>

OpenAIS: <http://www.openais.org/>

Corosync: <http://www.corosync.org/>


SEE ALSO
--------
crm(8), Date::Parse(3)


COPYING
-------
Copyright \(C) 2007-2009 Dejan Muhamedagic. Free use of this
software is granted under the terms of the GNU General Public License (GPL).