# Pacemaker Cluster Test Suite (CTS)

## Purpose

CTS thoroughly exercises a pacemaker test cluster by running a randomized
series of predefined tests on the cluster. CTS can be run against a
pre-existing cluster configuration or (more typically) overwrite the existing
configuration with a test configuration.


## Requirements

* Three or more machines (one test exerciser and two or more test cluster
  machines).

* The test cluster machines should be on the same subnet and have journalling
  filesystems (ext3, ext4, xfs, etc.) for all of their filesystems other than
  /boot. You also need a number of free IP addresses on that subnet if you
  intend to test mutual IP address takeover.

* The test exerciser machine doesn't need to be on the same subnet as the test
  cluster machines. Minimal demands are made on the exerciser machine - it
  just has to stay up during the tests.

* It helps a lot in tracking problems if all machines' clocks are closely
  synchronized. NTP does this automatically, but you can do it by hand if you
  want.

* The exerciser needs to be able to ssh over to the cluster nodes as root
  without a password challenge. Configure ssh accordingly (see the Mini-HOWTO
  at the end of this document for more details).

* The exerciser needs to be able to resolve the machine names of the
  test cluster - either by DNS or by /etc/hosts.

* CTS is not guaranteed to run on all platforms that pacemaker itself does.
  It calls commands such as service that may not be provided by all OSes.

## Preparation

Install Pacemaker (including CTS) on all machines. These scripts are
coordinated with particular versions of Pacemaker, so you need the same version
of CTS as the rest of Pacemaker, and you need the same version of
pacemaker and CTS on both the test exerciser and the test cluster machines.

You can install CTS from source, although many distributions provide
packages that include it (e.g. pacemaker-cts or pacemaker-dev).
Typically, packages will install CTS as /usr/share/pacemaker/tests/cts.

Configure cluster communications (Corosync, CMAN or Heartbeat) on the
cluster machines and verify everything works.

NOTE: Do not run the cluster on the test exerciser machine.

NOTE: Wherever machine names are mentioned in these configuration files,
they must match the machines' `uname -n` name. This may or may not match
the machines' FQDN (fully qualified domain name) - it depends on how
you (and your OS) have named the machines.


## Run CTS

Now assuming you did all this, what you need to do is run CTSlab.py:

    python ./CTSlab.py [options] number-of-tests-to-run

You must specify which nodes are part of the cluster with --nodes, e.g.:

    --nodes "pcmk-1 pcmk-2 pcmk-3"

Most people will want to save the output with --outputfile, e.g.:

    --outputfile ~/cts.log

Unless you want to test your pre-existing cluster configuration, you also want:

    --clobber-cib
    --populate-resources
    --test-ip-base $IP   # e.g. --test-ip-base 192.168.9.100

and configure some sort of fencing:

    --stonith $TYPE  # e.g. "--stonith xvm" to use fence_xvm or "--stonith lha" to use external/ssh

A complete command line might look like:

    python ./CTSlab.py --nodes "pcmk-1 pcmk-2 pcmk-3" --outputfile ~/cts.log \
        --clobber-cib --populate-resources --test-ip-base 192.168.9.100 \
        --stonith xvm 50

For more options, use the --help option.

NOTE: A perhaps more convenient way to compose a command line like the one
      above is to use the cluster_test script, which, at least in the source
      repository, sits in the same directory as this file.
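
If you launch CTS from your own tooling, the options described above can be
assembled programmatically. The following is only a minimal sketch, not part
of CTS: the helper name build_cts_command and its parameters are hypothetical,
and it uses only the options named in this section.

```python
import shlex


def build_cts_command(nodes, num_tests, output_file=None,
                      test_ip_base=None, stonith=None, clobber=True):
    """Return the CTSlab.py argument list (hypothetical helper, not part of CTS)."""
    # --nodes takes a single space-separated string of node names.
    cmd = ["python", "./CTSlab.py", "--nodes", " ".join(nodes)]
    if output_file:
        cmd += ["--outputfile", output_file]
    if clobber:
        # Overwrite the existing configuration with a test configuration.
        cmd += ["--clobber-cib", "--populate-resources"]
        if test_ip_base:
            cmd += ["--test-ip-base", test_ip_base]
    if stonith:
        cmd += ["--stonith", stonith]
    # The number of tests to run is the final positional argument.
    cmd.append(str(num_tests))
    return cmd


if __name__ == "__main__":
    example = build_cts_command(
        ["pcmk-1", "pcmk-2", "pcmk-3"], 50,
        output_file="cts.log", test_ip_base="192.168.9.100", stonith="xvm",
    )
    # shlex.join (Python 3.8+) quotes the node list so the line can be pasted
    # into a shell.
    print(shlex.join(example))
```

Returning a list rather than a string means the result can be passed straight
to subprocess.run without going through a shell. In that case a path such as
~/cts.log is not expanded automatically, so expand it first (for example with
os.path.expanduser).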

To extract the result of a particular test, run:

    crm_report -T $test


## Optional/advanced testing

### Memory testing

Pacemaker and CTS have various options for testing memory management. On the
cluster nodes, pacemaker components will use various environment variables to
control these options. How these variables are set varies by OS, but usually
they are set in the /etc/sysconfig/pacemaker or /etc/default/pacemaker file.

Valgrind is a program for detecting memory management problems (such as
use-after-free errors). If you have valgrind installed, you can enable it by
setting the following environment variables on all cluster nodes:

    PCMK_valgrind_enabled=attrd,cib,crmd,lrmd,pengine,stonith-ng
    VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25
        --log-file=/var/lib/pacemaker/valgrind-%p
        --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions
        --gen-suppressions=all"

and running CTS with these options:

    --valgrind-tests --valgrind-procs="attrd cib crmd lrmd pengine stonith-ng"

These options should only be set while specifically testing memory management,
because they may slow down the cluster significantly, and they will disable
writes to the CIB. If desired, you can enable valgrind on a subset of pacemaker
components rather than all of them as listed above.

Valgrind will put a text file for each process in the location specified by
valgrind's --log-file option. For explanations of the messages valgrind
generates, see http://valgrind.org/docs/manual/mc-manual.html

Separately, if you are using the GNU C library, the G_SLICE, MALLOC_PERTURB_,
and MALLOC_CHECK_ environment variables can be set to affect the library's
memory management functions.

When using valgrind, G_SLICE should be set to "always-malloc", which helps
valgrind track memory by always using the malloc() and free() routines
directly. When not using valgrind, G_SLICE can be left unset, or set to
"debug-blocks", which enables the C library to catch many memory errors
but may impact performance.

If the MALLOC_PERTURB_ environment variable is set to an 8-bit integer, the C
library will initialize all newly allocated bytes of memory to the integer
value, and will set all newly freed bytes of memory to the bitwise inverse of
the integer value. This helps catch uses of uninitialized or freed memory
blocks that might otherwise go unnoticed. Example:

    MALLOC_PERTURB_=221

If the MALLOC_CHECK_ environment variable is set, the C library will check for
certain heap corruption errors. The most useful value in testing is 3, which
will cause the library to print a message to stderr and abort execution.
Example:

    MALLOC_CHECK_=3

Valgrind should be enabled for either all nodes or none, but the C library
variables may be set differently on different nodes.


### Remote node testing

If the pacemaker_remoted daemon is installed on all cluster nodes, CTS will
enable remote node tests.

The remote node tests choose a random node, stop the cluster on it, start
pacemaker_remote on it, and add an ocf:pacemaker:remote resource to turn it
into a remote node. When the test is done, CTS will turn the node back into
a cluster node.

To avoid conflicts, CTS will rename the node, prefixing the original node name
with "remote-". For example, "pcmk-1" will become "remote-pcmk-1".

The name change may require special stonith configuration, if the fence agent
expects the node name to be the same as its hostname. A common approach is to
specify the "remote-" names in pcmk_host_list.
If you use pcmk_host_list=all, CTS will expand that to all cluster nodes and
their "remote-" names.
You may additionally need a pcmk_host_map argument to map the "remote-" names
to the hostnames. Example:

    --stonith xvm --stonith-args \
    pcmk_arg_map=domain:uname,pcmk_host_list=all,pcmk_host_map=remote-pcmk-1:pcmk-1;remote-pcmk-2:pcmk-2

### Remote node testing with valgrind

When running the remote node tests, the pacemaker components on the cluster
nodes can be run under valgrind as described in the "Memory testing" section.
However, pacemaker_remote cannot be run under valgrind that way, because it is
started by the OS's regular boot system and not by pacemaker.

Details vary by system, but the goal is to set the VALGRIND_OPTS environment
variable and then start pacemaker_remoted by prefixing it with the path to
valgrind.

The init script and systemd service file provided with pacemaker_remote will
load the pacemaker environment variables from the same location used by other
pacemaker components, so VALGRIND_OPTS will be set correctly if using one of
those.

For an OS using systemd, you can override the ExecStart parameter to run
valgrind. For example:

    mkdir /etc/systemd/system/pacemaker_remote.service.d
    cat >/etc/systemd/system/pacemaker_remote.service.d/valgrind.conf <<EOF
    [Service]
    ExecStart=
    ExecStart=/usr/bin/valgrind /usr/sbin/pacemaker_remoted
    EOF

### Container testing

If the --container-tests option is given to CTS, it will enable
testing of LXC resources (currently only the RemoteLXC test,
which starts a remote node using an LXC container).

The container tests have additional package dependencies (see the toplevel
README).
Also, SELinux must be enabled (in either permissive or enforcing mode),
libvirtd must be enabled and running, and root must be able to ssh without a
password between all cluster nodes (not just from the test machine). Before
running the tests, you can verify your environment with:

    /usr/share/pacemaker/tests/cts/lxc_autogen.sh -v

LXC tests will create two containers with hardcoded parameters: a NAT'ed bridge
named virbr0 using the IP network 192.168.123.0/24 will be created on the
cluster node hosting the containers; the host will be assigned
52:54:00:A8:12:35 as the MAC address and 192.168.123.1 as the IP address.
Each container will be assigned a random MAC address starting with 52:54:,
the IP address 192.168.123.11 or 192.168.123.12, the hostname lxc1 or lxc2
(which will be added to the host's /etc/hosts file), and 196MB RAM.

The test will revert all of the configuration when it is done.


## Mini-HOWTOs

### Allow passwordless remote SSH connections

The CTS scripts run "ssh -l root" so you don't have to do any of your testing
logged in as root on the test machine. Here is how to allow such connections
without requiring a password to be entered each time:

* On your test exerciser, create an SSH key if you do not already have one.
  Most commonly, SSH keys will be in your ~/.ssh directory, with the
  private key file not having an extension, and the public key file
  named the same with the extension ".pub" (for example, ~/.ssh/id_rsa.pub).

  If you don't already have a key, you can create one with:

      ssh-keygen -t rsa

* From your test exerciser, authorize your SSH public key for root on all test
  machines (both the exerciser and the cluster test machines):

      ssh-copy-id -i ~/.ssh/id_rsa.pub root@$MACHINE

  You will probably have to provide your password, and possibly say
  "yes" to some questions about accepting the identity of the test machines.

  The above assumes you have an RSA SSH key in the specified location;
  if you have some other type of key (DSA, ECDSA, etc.), use its file name
  in the -i option above.

* To test, try this command from the exerciser machine for each
  of your cluster machines, and for the exerciser machine itself.

      ssh -l root $MACHINE

  If this works without prompting for a password, you're in business.
  If not, look at the documentation for your version of ssh.
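
With more than a few machines, the manual check in the last step can be
scripted. The following is a minimal sketch, not part of CTS: the function
names are hypothetical, and it assumes an OpenSSH client, whose BatchMode
option makes ssh fail instead of prompting for a password.

```python
import subprocess


def ssh_check_command(machine, timeout=10):
    """Build an ssh command that fails rather than prompting (hypothetical helper)."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # assumption: OpenSSH; never prompt
        "-o", f"ConnectTimeout={timeout}",  # give up on unreachable machines
        "-l", "root",
        machine,
        "true",                             # trivial remote command
    ]


def check_passwordless(machines, run=subprocess.run):
    """Return {machine: bool}, True where root login worked without a prompt."""
    results = {}
    for machine in machines:
        proc = run(ssh_check_command(machine), capture_output=True)
        # ssh exits with 0 only if the login and the remote command succeeded.
        results[machine] = proc.returncode == 0
    return results
```

For example, check_passwordless(["pcmk-1", "pcmk-2", "pcmk-3"]) run on the
exerciser reports which nodes still need their key authorized. Because
BatchMode also suppresses the host key confirmation question, a machine whose
identity has not been accepted yet will be reported as failing too.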