.. index::
   single: storage; active/active

Convert Storage to Active/Active
--------------------------------

The primary requirement for an Active/Active cluster is that the data
required for your services is available, simultaneously, on both
machines. Pacemaker makes no requirement on how this is achieved; you
could use a Storage Area Network (SAN) if you had one available, but
since DRBD supports multiple Primaries, we can continue to use it here.

.. index::
   single: GFS2
   single: DLM
   single: filesystem; GFS2

Install Cluster Filesystem Software
###################################

The only hitch is that we need to use a cluster-aware filesystem. The
one we used earlier with DRBD, xfs, is not one of those. Both OCFS2
and GFS2 are supported; here, we will use GFS2.

On both nodes, install the GFS2 command-line utilities required by
cluster filesystems:

.. code-block:: none

    # yum install -y gfs2-utils

Additionally, install the Distributed Lock Manager (DLM) on both nodes.
To do so, download the RPM from the `CentOS composes artifacts tree
<https://composes.centos.org/latest-CentOS-Stream-8/compose/ResilientStorage/x86_64/os/Packages/>`_
onto your nodes, and then run the following command:

.. code-block:: none

    # rpm -i dlm-4.1.0-1.el8.x86_64.rpm

Configure the Cluster for the DLM
#################################

The DLM control daemon needs to run on both nodes, so we'll start by creating a
resource for it (using the **ocf:pacemaker:controld** resource script), and clone
it:

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib dlm_cfg
    [root@pcmk-1 ~]# pcs -f dlm_cfg resource create dlm \
        ocf:pacemaker:controld op monitor interval=60s
    [root@pcmk-1 ~]# pcs -f dlm_cfg resource clone dlm clone-max=2 clone-node-max=1
    [root@pcmk-1 ~]# pcs resource status
      * ClusterIP   (ocf::heartbeat:IPaddr2):        Started pcmk-2
      * WebSite     (ocf::heartbeat:apache):         Started pcmk-2
      * Clone Set: WebData-clone [WebData] (promotable):
        * Masters: [ pcmk-2 ]
        * Slaves: [ pcmk-1 ]
      * WebFS       (ocf::heartbeat:Filesystem):     Started pcmk-2
    [root@pcmk-1 ~]# pcs resource config
     Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
      Attributes: cidr_netmask=24 ip=192.168.122.120
      Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
                  start interval=0s timeout=20s (ClusterIP-start-interval-0s)
                  stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
     Resource: WebSite (class=ocf provider=heartbeat type=apache)
      Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
      Operations: monitor interval=1min (WebSite-monitor-interval-1min)
                  start interval=0s timeout=40s (WebSite-start-interval-0s)
                  stop interval=0s timeout=60s (WebSite-stop-interval-0s)
     Clone: WebData-clone
      Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true promoted-max=1 promoted-node-max=1
      Resource: WebData (class=ocf provider=linbit type=drbd)
       Attributes: drbd_resource=wwwdata
       Operations: demote interval=0s timeout=90 (WebData-demote-interval-0s)
                   monitor interval=60s (WebData-monitor-interval-60s)
                   notify interval=0s timeout=90 (WebData-notify-interval-0s)
                   promote interval=0s timeout=90 (WebData-promote-interval-0s)
                   reload interval=0s timeout=30 (WebData-reload-interval-0s)
                   start interval=0s timeout=240 (WebData-start-interval-0s)
                   stop interval=0s timeout=100 (WebData-stop-interval-0s)
     Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
      Attributes: device=/dev/drbd1 directory=/var/www/html fstype=xfs
      Operations: monitor interval=20s timeout=40s (WebFS-monitor-interval-20s)
                  start interval=0s timeout=60s (WebFS-start-interval-0s)
                  stop interval=0s timeout=60s (WebFS-stop-interval-0s)

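Note that ``pcs resource status`` and ``pcs resource config`` above were run
against the live CIB, so the new ``dlm`` resource does not appear; it exists
only in the ``dlm_cfg`` shadow file until we push it. If you want to
double-check the clone before pushing, you can query the file directly. The
output below is an illustrative sketch; the default operation timeouts and the
exact layout vary with the pcs and resource agent versions.

.. code-block:: none

    [root@pcmk-1 ~]# pcs -f dlm_cfg resource config dlm-clone
     Clone: dlm-clone
      Meta Attrs: clone-max=2 clone-node-max=1
      Resource: dlm (class=ocf provider=pacemaker type=controld)
       Operations: monitor interval=60s (dlm-monitor-interval-60s)
                   start interval=0s timeout=90s (dlm-start-interval-0s)
                   stop interval=0s timeout=100s (dlm-stop-interval-0s)
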
Activate our new configuration, and see how the cluster responds:

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib-push dlm_cfg --config
    CIB updated
    [root@pcmk-1 ~]# pcs status
    Cluster name: mycluster
    Cluster Summary:
      * Stack: corosync
      * Current DC: pcmk-2 (version 2.1.0-3.el8-7c3f660707) - partition with quorum
      * Last updated: Wed Jul 13 10:57:20 2021
      * Last change:  Wed Jul 13 10:57:15 2021 by root via cibadmin on pcmk-1
      * 2 nodes configured
      * 7 resource instances configured

    Node List:
      * Online: [ pcmk-1 pcmk-2 ]

    Full List of Resources:
      * ClusterIP   (ocf::heartbeat:IPaddr2):        Started pcmk-1
      * WebSite     (ocf::heartbeat:apache):         Started pcmk-1
      * Clone Set: WebData-clone [WebData] (promotable):
        * Masters: [ pcmk-1 ]
        * Slaves: [ pcmk-2 ]
      * WebFS       (ocf::heartbeat:Filesystem):     Started pcmk-1
      * Clone Set: dlm-clone [dlm]:
        * Started: [ pcmk-1 pcmk-2 ]

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

Create and Populate GFS2 Filesystem
###################################

Before we do anything to the existing partition, we need to make sure it
is unmounted. We do this by telling the cluster to stop the WebFS resource.
This will ensure that other resources (in our case, Apache) using WebFS
are not only stopped, but stopped in the correct order.

.. code-block:: none

    [root@pcmk-1 ~]# pcs resource disable WebFS
    [root@pcmk-1 ~]# pcs resource
      * ClusterIP   (ocf::heartbeat:IPaddr2):        Started pcmk-1
      * WebSite     (ocf::heartbeat:apache):         Stopped
      * Clone Set: WebData-clone [WebData] (promotable):
        * Masters: [ pcmk-1 ]
        * Slaves: [ pcmk-2 ]
      * WebFS       (ocf::heartbeat:Filesystem):     Stopped (disabled)
      * Clone Set: dlm-clone [dlm]:
        * Started: [ pcmk-1 pcmk-2 ]

You can see that both Apache and WebFS have been stopped, and that **pcmk-1**
is currently running the promoted instance for the DRBD device.

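If you are unsure which node currently holds the Primary role, you can also ask
DRBD directly before formatting. A minimal check, assuming the DRBD 9
``drbdadm status`` output format (DRBD 8.x prints roles differently):

.. code-block:: none

    [root@pcmk-1 ~]# drbdadm status wwwdata
    wwwdata role:Primary
      disk:UpToDate
      pcmk-2 role:Secondary
        peer-disk:UpToDate
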
Now we can create a new GFS2 filesystem on the DRBD device.

.. WARNING::

    This will erase all previous content stored on the DRBD device. Ensure
    you have a copy of any important data.

.. IMPORTANT::

    Run the next command on whichever node has the DRBD Primary role.
    Otherwise, you will receive the message:

    .. code-block:: none

        /dev/drbd1: Read-only file system

.. code-block:: none

    [root@pcmk-1 ~]# mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:web /dev/drbd1
    It appears to contain an existing filesystem (xfs)
    This will destroy any data on /dev/drbd1
    Are you sure you want to proceed? [y/n] y
    Discarding device contents (may take a while on large devices): Done
    Adding journals: Done
    Building resource groups: Done
    Creating quota file: Done
    Writing superblock and syncing: Done
    Device:                    /dev/drbd1
    Block size:                4096
    Device size:               0.50 GB (131059 blocks)
    Filesystem size:           0.50 GB (131055 blocks)
    Journals:                  2
    Journal size:              8MB
    Resource groups:           4
    Locking protocol:          "lock_dlm"
    Lock table:                "mycluster:web"
    UUID:                      19712677-7206-4660-a079-5d17341dd720

The ``mkfs.gfs2`` command required a number of additional parameters:

* ``-p lock_dlm`` specifies that we want to use the kernel's DLM.

* ``-j 2`` indicates that the filesystem should reserve enough
  space for two journals (one for each node that will access the filesystem).

* ``-t mycluster:web`` specifies the lock table name. The format for this
  field is ``<CLUSTERNAME>:<FSNAME>``. For ``CLUSTERNAME``, we need to use the
  same value we specified originally with ``pcs cluster setup`` (which is
  also the value of **cluster_name** in ``/etc/corosync/corosync.conf``). If
  you are unsure what your cluster name is, you can look in
  ``/etc/corosync/corosync.conf`` or execute the command
  ``pcs cluster corosync pcmk-1 | grep cluster_name``.

Now we can (re-)populate the new filesystem with data
(web pages). We'll create yet another variation on our home page.

.. code-block:: none

    [root@pcmk-1 ~]# mount /dev/drbd1 /mnt
    [root@pcmk-1 ~]# cat <<-END >/mnt/index.html
    <html>
    <body>My Test Site - GFS2</body>
    </html>
    END
    [root@pcmk-1 ~]# chcon -R --reference=/var/www/html /mnt
    [root@pcmk-1 ~]# umount /dev/drbd1
    [root@pcmk-1 ~]# drbdadm verify wwwdata

Reconfigure the Cluster for GFS2
################################

With the WebFS resource stopped, let's update the configuration.

.. code-block:: none

    [root@pcmk-1 ~]# pcs resource config WebFS
     Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
      Attributes: device=/dev/drbd1 directory=/var/www/html fstype=xfs
      Meta Attrs: target-role=Stopped
      Operations: monitor interval=20s timeout=40s (WebFS-monitor-interval-20s)
                  start interval=0s timeout=60s (WebFS-start-interval-0s)
                  stop interval=0s timeout=60s (WebFS-stop-interval-0s)

The ``fstype`` option needs to be updated to **gfs2** instead of **xfs**.

.. code-block:: none

    [root@pcmk-1 ~]# pcs resource update WebFS fstype=gfs2
    [root@pcmk-1 ~]# pcs resource config WebFS
     Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
      Attributes: device=/dev/drbd1 directory=/var/www/html fstype=gfs2
      Meta Attrs: target-role=Stopped
      Operations: monitor interval=20s timeout=40s (WebFS-monitor-interval-20s)
                  start interval=0s timeout=60s (WebFS-start-interval-0s)
                  stop interval=0s timeout=60s (WebFS-stop-interval-0s)

GFS2 requires that DLM be running, so we also need to set up new colocation
and ordering constraints for it:

.. code-block:: none

    [root@pcmk-1 ~]# pcs constraint colocation add WebFS with dlm-clone INFINITY
    [root@pcmk-1 ~]# pcs constraint order dlm-clone then WebFS
    Adding dlm-clone WebFS (kind: Mandatory) (Options: first-action=start then-action=start)

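You can confirm that both new constraints were recorded, along with the IDs pcs
generated for them. The filter and IDs below are illustrative; list the real
ones on your cluster with ``pcs constraint --full``:

.. code-block:: none

    [root@pcmk-1 ~]# pcs constraint --full | grep -i dlm
      start dlm-clone then start WebFS (kind:Mandatory) (id:order-dlm-clone-WebFS-mandatory)
      WebFS with dlm-clone (score:INFINITY) (id:colocation-WebFS-dlm-clone-INFINITY)
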
.. index::
   pair: filesystem; clone

Clone the Filesystem Resource
#############################

Now that we have a cluster filesystem ready to go, we can configure the cluster
so both nodes mount the filesystem.

Clone the filesystem resource in a new configuration.
Notice how pcs automatically updates the relevant constraints again.

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib active_cfg
    [root@pcmk-1 ~]# pcs -f active_cfg resource clone WebFS
    [root@pcmk-1 ~]# pcs -f active_cfg constraint
    Location Constraints:
      Resource: WebSite
        Enabled on:
          Node: pcmk-1 (score:50)
    Ordering Constraints:
      start ClusterIP then start WebSite (kind:Mandatory)
      promote WebData-clone then start WebFS-clone (kind:Mandatory)
      start WebFS-clone then start WebSite (kind:Mandatory)
      start dlm-clone then start WebFS-clone (kind:Mandatory)
    Colocation Constraints:
      WebSite with ClusterIP (score:INFINITY)
      WebFS-clone with WebData-clone (score:INFINITY) (with-rsc-role:Master)
      WebSite with WebFS-clone (score:INFINITY)
      WebFS-clone with dlm-clone (score:INFINITY)
    Ticket Constraints:

Tell the cluster that it is now allowed to promote both instances to be DRBD
Primary.

.. code-block:: none

    [root@pcmk-1 ~]# pcs -f active_cfg resource update WebData-clone promoted-max=2

Finally, load our configuration to the cluster, and re-enable the WebFS resource
(which we disabled earlier).

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib-push active_cfg --config
    CIB updated
    [root@pcmk-1 ~]# pcs resource enable WebFS

After all the processes are started, the status should look similar to this.

.. code-block:: none

    [root@pcmk-1 ~]# pcs resource
      * ClusterIP   (ocf::heartbeat:IPaddr2):        Started pcmk-1
      * WebSite     (ocf::heartbeat:apache):         Started pcmk-1
      * Clone Set: WebData-clone [WebData] (promotable):
        * Masters: [ pcmk-1 pcmk-2 ]
      * Clone Set: dlm-clone [dlm]:
        * Started: [ pcmk-1 pcmk-2 ]
      * Clone Set: WebFS-clone [WebFS]:
        * Started: [ pcmk-1 pcmk-2 ]

Test Failover
#############

Testing failover is left as an exercise for the reader.

With this configuration, the data is now active/active. The website
administrator could change HTML files on either node, and the live website will
show the changes even if it is running on the opposite node.

If the web server is configured to listen on all IP addresses, it is possible
to remove the constraints between the WebSite and ClusterIP resources, and
clone the WebSite resource. The web server would always be ready to serve web
pages, and only the IP address would need to be moved in a failover.

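A minimal sketch of that variation, assuming the resource names used in this
chapter. The constraint IDs below are illustrative of pcs's auto-generated
names; list the real ones on your cluster with ``pcs constraint --full``
before removing anything:

.. code-block:: none

    [root@pcmk-1 ~]# pcs constraint --full
    [root@pcmk-1 ~]# pcs constraint remove colocation-WebSite-ClusterIP-INFINITY
    [root@pcmk-1 ~]# pcs constraint remove order-ClusterIP-WebSite-mandatory
    [root@pcmk-1 ~]# pcs resource clone WebSite

With those constraints gone and WebSite cloned, Apache would run on both nodes
at all times, and only ClusterIP would move when a node fails.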