:man source: crmsh_hb_report
:man version: 1.2
:man manual: Pacemaker documentation

crmsh_hb_report(8)
==================

NAME
----
crmsh_hb_report - create report for CRM based clusters (Pacemaker)


SYNOPSIS
--------
*crm report* -f {time|"cts:"testnum} [-t time] [-u user] [-l file]
        [-n nodes] [-E files] [-p patt] [-L patt] [-e prog]
        [-MSDCZAQVsvhd] [dest]


DESCRIPTION
-----------
crmsh_hb_report(8) is a utility to collect all information (logs,
configuration files, system information, etc.) relevant to
Pacemaker (CRM) over the given period of time.


OPTIONS
-------
dest::
    The report name. It can also contain a path where to put the
    report tarball. If left out, the tarball is created in the
    current directory named "hb_report-current_date", for instance
    hb_report-Wed-03-Mar-2010.

*-d*::
    Don't create the compressed tar, but leave the result in a
    directory.

*-f* { time | "cts:"testnum }::
    The start time from which to collect logs. The time is in the
    format as used by the Date::Parse perl module. For cts tests,
    specify the "cts:" string followed by the test number. This
    option is required.

*-t* time::
    The end time to which to collect logs. Defaults to now.

*-n* nodes::
    A list of space separated hostnames (cluster members).
    crm report may try to find out the set of nodes by itself, but
    if it runs on the loghost which, as is usually the case,
    does not belong to the cluster, that may be difficult. Also,
    OpenAIS doesn't contain a list of nodes and if Pacemaker is
    not running, there is no way to find it out automatically.
    This option is cumulative (i.e. use -n "a b" or -n a -n b).

*-l* file::
    Log file location. If, for whatever reason, crm report cannot
    find the log files, you can specify its absolute path.

*-E* files::
    Extra log files to collect. This option is cumulative.
    By default, /var/log/messages is collected along with the
    cluster logs.

*-M*::
    Don't collect extra log files, but only the file containing
    messages from the cluster subsystems.

*-L* patt::
    A list of regular expressions to match in log files for
    analysis. This option is additive (default: "CRIT: ERROR:").

*-p* patt::
    Additional patterns to match parameter names which contain
    sensitive information. This option is additive (default: "passw.*").

*-Q*::
    Quick run. Gathering some system information can be expensive.
    With this option, such operations are skipped and thus
    information collecting sped up. The operations considered
    I/O or CPU intensive: verifying installed packages content,
    sanitizing files for sensitive information, and producing dot
    files from PE inputs.

*-A*::
    This is an OpenAIS cluster. `crm report` has some heuristics to
    find the cluster stack, but that is not always reliable.
    By default, `crm report` assumes that it is run on a Heartbeat
    cluster.

*-u* user::
    The ssh user. `crm report` will try to login to other nodes
    without specifying a user, then as "root", and finally as
    "hacluster". If you have another user for administration over
    ssh, please use this option.

*-X* ssh-options::
    Extra ssh options. These will be added to every ssh
    invocation. Alternatively, use `$HOME/.ssh/config` to set up
    desired ssh connection options.

*-S*::
    Single node operation. Run `crm report` only on this node and
    don't try to start slave collectors on other members of the
    cluster. Under normal circumstances this option is not
    needed. Use if ssh(1) does not work to other nodes.

*-Z*::
    If the destination directory exists, remove it instead of
    exiting (this is the default for CTS).

*-V*::
    Print the version including the last repository changeset.

*-v*::
    Increase verbosity.
    Normally used to debug unexpected behaviour.

*-h*::
    Show usage and some examples.

*-D* (obsolete)::
    Don't invoke editor to fill the description text file.

*-e* prog (obsolete)::
    Your favourite text editor. Defaults to $EDITOR, vim, vi,
    emacs, or nano, whichever is found first.

*-C* (obsolete)::
    Remove the destination directory once the report has been put
    in a tarball.

EXAMPLES
--------
Last night during the backup there were several warnings
encountered (logserver is the log host):

    logserver# crm report -f 3:00 -t 4:00 -n "node1 node2" report

collects everything from all nodes from 3am to 4am last night.
The files are compressed to a tarball report.tar.bz2.

Just found a problem during testing:

    # note the current time
    node1# date
    Fri Sep 11 18:51:40 CEST 2009
    node1# /etc/init.d/heartbeat start
    node1# nasty-command-that-breaks-things
    node1# sleep 120 #wait for the cluster to settle
    node1# crm report -f 18:51 hb1

    # if crm report can't figure out that this is corosync
    node1# crm report -f 18:51 -A hb1

    # if crm report can't figure out the cluster members
    node1# crm report -f 18:51 -n "node1 node2" hb1

The files are compressed to a tarball hb1.tar.bz2.

INTERPRETING RESULTS
--------------------
The compressed tar archive is the final product of `crm report`.
This is one example of its content, for a CTS test case on a
three node OpenAIS cluster:

    $ ls -RF 001-Restart

    001-Restart:
    analysis.txt     events.txt  logd.cf       s390vm13/  s390vm16/
    description.txt  ha-log.txt  openais.conf  s390vm14/

    001-Restart/s390vm13:
    STOPPED  crm_verify.txt  hb_uuid.txt  openais.conf@    sysinfo.txt
    cib.txt  dlm_dump.txt    logd.cf@     pengine/         sysstats.txt
    cib.xml  events.txt      messages     permissions.txt

    001-Restart/s390vm13/pengine:
    pe-input-738.bz2  pe-input-740.bz2  pe-warn-450.bz2
    pe-input-739.bz2  pe-warn-449.bz2   pe-warn-451.bz2

    001-Restart/s390vm14:
    STOPPED  crm_verify.txt  hb_uuid.txt  openais.conf@    sysstats.txt
    cib.txt  dlm_dump.txt    logd.cf@     permissions.txt
    cib.xml  events.txt      messages     sysinfo.txt

    001-Restart/s390vm16:
    STOPPED  crm_verify.txt  hb_uuid.txt  messages       sysinfo.txt
    cib.txt  dlm_dump.txt    hostcache    openais.conf@  sysstats.txt
    cib.xml  events.txt      logd.cf@     permissions.txt

The top directory contains information which pertains to the
cluster or event as a whole. Files with exactly the same content
on all nodes will also be at the top, with per-node links created
(as is the case in this example with openais.conf and logd.cf).

The cluster log files are named ha-log.txt regardless of the
actual log file name on the system. If it is found on the
loghost, then it is placed in the top directory. If not, the top
directory ha-log.txt contains all nodes' logs merged and sorted by
time. Files named messages are excerpts of /var/log/messages from
nodes.

Most files are copied verbatim or they contain output of a
command. For instance, cib.xml is a copy of the CIB found in
/var/lib/heartbeat/crm/cib.xml. crm_verify.txt is the output of the
crm_verify(8) program.
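
The per-node links described above are easy to spot once a report
has been unpacked. The following is only a sketch: `list_shared_links`
is a hypothetical helper, not part of `crm report`, and it assumes a
find(1) which supports `-mindepth` and `-maxdepth` (GNU and BSD find
do). It prints the symbolic links found in the node directories, i.e.
the files whose content was identical on all nodes and which were
therefore moved to the top directory:

```shell
# Hypothetical helper (not part of crm report): list the per-node
# symbolic links in an unpacked report directory.
list_shared_links() {
    report_dir=$1
    # Node directories sit directly below the top directory, so the
    # links of interest are exactly two levels deep.
    find "$report_dir" -mindepth 2 -maxdepth 2 -type l | sort
}
```

For the listing above, `list_shared_links 001-Restart` should print
the openais.conf and logd.cf links of each node directory.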

Some files are the result of more involved processing:

    *analysis.txt*::
    A set of log messages matching user defined patterns (may be
    provided with the -L option).

    *events.txt*::
    A set of log messages matching event patterns. It should
    provide information about major cluster motions without
    unnecessary details. These patterns are devised by the
    cluster experts. Currently, the patterns cover membership
    and quorum changes, resource starts and stops, fencing
    (stonith) actions, and cluster starts and stops. events.txt
    is always generated for each node. If the central cluster
    log was found, events are also combined for all nodes.

    *permissions.txt*::
    File and directory permissions are among the more common
    causes of problems. `crm report` looks for a set of predefined
    directories and checks their permissions. Any issues are
    reported here.

    *backtraces.txt*::
    gdb generated backtrace information for cores dumped
    within the specified period.

    *sysinfo.txt*::
    Various release information about the platform, kernel,
    operating system, packages, and anything else deemed to be
    relevant. The static part of the system.

    *sysstats.txt*::
    Output of various system commands such as ps(1), uptime(1),
    netstat(8), and ip(8). The dynamic part of the system.

description.txt should contain a user supplied description of the
problem, but since it is very seldom used, it will be dropped
from future releases.

PREREQUISITES
-------------

ssh::
    It is not strictly required, but you won't regret having a
    password-less ssh. It is not too difficult to set up and will save
    you a lot of time. If you can't have it, for example because your
    security policy does not allow such a thing, or you just prefer
    menial work, then you will have to resort to the semi-manual
    semi-automated report generation. See below for instructions.
    +
    If you need to supply a password for your passphrase/login, then
    always use the `-u` option.
    +
    For extra ssh(1) options, if you're too lazy to set up
    $HOME/.ssh/config, use the `-X` option. Do not forget to put
    the options in quotes.

sudo::
    If the ssh user (as specified with the `-u` option) is other
    than `root`, then `crm report` uses `sudo` to collect the
    information which is readable only by the `root` user. In that
    case it is required to set up the `sudoers` file properly. The
    user (or group to which the user belongs) should have the
    following line:
    +
    <user> ALL = NOPASSWD: /usr/sbin/crm
    +
    See the `sudoers(5)` man page for more details.

Times::
    In order to find files and messages in the given period and to
    parse the `-f` and `-t` options, `crm report` uses perl and one of the
    `Date::Parse` or `Date::Manip` perl modules. Note that you need
    only one of these. Furthermore, on nodes which have no logs and
    where you don't run `crm report` directly, no date parsing is
    necessary. In other words, if you run this on a loghost then you
    don't need these perl modules on the cluster nodes.
    +
    On rpm based distributions, you can find `Date::Parse` in
    `perl-TimeDate` and on Debian and its derivatives in
    `libtimedate-perl`.

Core dumps::
    To backtrace core dumps, gdb and the packages with the
    debugging info are needed. The debug info packages may be
    installed at the time the report is created. Let's hope that
    you will need this really seldom.

TIMES
-----

Specifying times can at times be a nuisance. That is why we have
chosen to use one of the perl modules--they do allow certain
freedom when talking dates. You can either read the instructions
at the
http://search.cpan.org/dist/TimeDate/lib/Date/Parse.pm#EXAMPLE_DATES[Date::Parse
examples page]
or just rely on common sense and try stuff like:

    3:00          (today at 3am)
    15:00         (today at 3pm)
    2007/9/1 2pm  (September 1st at 2pm)
    Tue Sep 15 20:46:27 CEST 2009 (September 15th etc)

`crm report` will (probably) complain if it can't figure out what
you mean.

Try to delimit the event as closely as possible in order to reduce
the size of the report, but still leaving a minute or two around
for good measure.

`-f` is not optional. And don't forget to quote dates when they
contain spaces.


Should I send all this to the rest of the Internet?
---------------------------------------------------

By default, the sensitive data in CIB and PE files is not mangled
by `crm report` because that makes PE input files mostly useless.
If you still have no other option but to send the report to a
public mailing list and do not want the sensitive data to be
included, use the `-s` option. Without this option, `crm report`
will issue a warning if it finds information which should not be
exposed. By default, parameters matching 'passw.*' are considered
sensitive. Use the `-p` option to specify additional regular
expressions to match variable names which may contain information
you don't want to leak. For example:

    # crm report -f 18:00 -p "user.*" -p "secret.*" /var/tmp/report

Heartbeat's ha.cf is always sanitized. Logs and other files are
not filtered.

LOGS
----

It may be tricky to find syslog logs. The scheme used is to log a
unique message on all nodes and then look it up in the usual
syslog locations. This procedure is not foolproof, in particular
if the syslog files are in a non-standard directory. We look in
/var/log, /var/logs, /var/syslog, /var/adm, /var/log/ha, and
/var/log/cluster.
In case we can't find the logs, please supply
their location:

    # crm report -f 5pm -l /var/log/cluster1/ha-log -S /tmp/report_node1

If you have different log locations on different nodes, well,
perhaps you'd like to make them the same and make life easier for
everybody.

Files starting with "ha-" are preferred. In case syslog sends
messages to more than one file, if one of them is named ha-log or
ha-debug those will be favoured over syslog or messages.

`crm report` also supports archived logs in case the period
specified extends that far in the past. The archives must reside
in the same directory as the current log and their names must
be prefixed with the name of the current log (syslog-1.gz or
messages-20090105.bz2).

If there is no separate log for the cluster, possibly unrelated
messages from other programs are included. We don't filter logs,
but just pick a segment for the period you specified.

MANUAL REPORT COLLECTION
------------------------

So, your ssh doesn't work. In that case, you will have to run
this procedure on all nodes. Use `-S` so that `crm report` doesn't
bother with ssh:

    # crm report -f 5:20pm -t 5:30pm -S /tmp/report_node1

If you also have a log host which is not in the cluster, then
you'll have to copy the log to one of the nodes and tell us where
it is:

    # crm report -f 5:20pm -t 5:30pm -l /var/tmp/ha-log -S /tmp/report_node1

OPERATION
---------
`crm report` collects files and other information in a fairly
straightforward way. The most complex tasks are discovering the
log file locations (if syslog is used, which is the most common
case) and coordinating the operation on multiple nodes.

The instance of `crm report` running on the host where it was
invoked is the master instance. Instances running on other nodes
are slave instances.
The master instance communicates with slave
instances by ssh. There are multiple ssh invocations per run, so
it is essential that ssh works without a password, i.e. with
public key authentication and authorized_keys.

The operation consists of three phases. Each phase must finish
on all nodes before the next one can commence. The first phase
consists of logging unique messages through syslog on all nodes.
This is the shortest of all phases.

The second phase is the most involved. During this phase all
local information is collected, which includes:

- logs (both current and archived if the start time is far in the past)
- various configuration files (corosync, heartbeat, logd)
- the CIB (both as xml and as represented by the crm shell)
- pengine inputs (if this node was the DC at any point in
  time over the given period)
- system information and status
- package information and status
- dlm lock information
- backtraces (if there were core dumps)

The third phase is collecting information from all nodes and
analyzing it. The analysis consists of the following tasks:

- identify files equal on all nodes which may then be moved to
  the top directory
- save log messages matching user defined patterns
  (defaults to ERRORs and CRITical conditions)
- report if there were coredumps and by whom
- report crm_verify(8) results
- save log messages matching major events to events.txt
- in case logging is configured without loghost, node logs and
  events files are combined using a perl utility


BUGS
----
Finding logs may at times be extremely difficult, depending on
how weird the syslog configuration is. It would be nice to ask
syslog-ng developers to provide a way to find out the log
destination based on facility and priority.

If you think you found a bug, please rerun with the -v option and
attach the output to bugzilla.

`crm report` can function in a satisfactory way only if ssh works to
all nodes using authorized_keys (without password).

There are way too many options.


AUTHOR
------
Written by Dejan Muhamedagic, <dejan@suse.de>


RESOURCES
---------
ClusterLabs: <http://clusterlabs.org/>

Heartbeat and other Linux HA resources: <http://linux-ha.org/wiki>

OpenAIS: <http://www.openais.org/>

Corosync: <http://www.corosync.org/>


SEE ALSO
--------
crm(8), Date::Parse(3)


COPYING
-------
Copyright \(C) 2007-2009 Dejan Muhamedagic. Free use of this
software is granted under the terms of the GNU General Public License (GPL).