.. index::
   single: storage; active/active

Convert Storage to Active/Active
--------------------------------

The primary requirement for an Active/Active cluster is that the data
required for your services is available, simultaneously, on both
machines. Pacemaker makes no requirement on how this is achieved; you
could use a Storage Area Network (SAN) if you had one available, but
since DRBD supports multiple Primaries, we can continue to use it here.

.. index::
   single: GFS2
   single: DLM
   single: filesystem; GFS2

Install Cluster Filesystem Software
###################################

The only hitch is that we need to use a cluster-aware filesystem. The
one we used earlier with DRBD, xfs, is not one of those. Both OCFS2
and GFS2 are supported; here, we will use GFS2.

On both nodes, install the GFS2 command-line utilities required by
cluster filesystems:

.. code-block:: none

    # yum install -y gfs2-utils

Additionally, install the Distributed Lock Manager (DLM) on both nodes.
To do so, download the RPM from the `CentOS composes artifacts tree <https://composes.centos.org/latest-CentOS-Stream-8/compose/ResilientStorage/x86_64/os/Packages/>`_
onto your nodes, and then run the following command:

.. code-block:: none

    # rpm -i dlm-4.1.0-1.el8.x86_64.rpm
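
If you prefer to fetch the RPM from the command line, something like the
following should work on both nodes (the exact package version in the
filename is an assumption; check the composes tree linked above for the
current one):

.. code-block:: none

    # curl -O https://composes.centos.org/latest-CentOS-Stream-8/compose/ResilientStorage/x86_64/os/Packages/dlm-4.1.0-1.el8.x86_64.rpm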

Configure the Cluster for the DLM
#################################

The DLM control daemon needs to run on both nodes, so we'll start by creating
a resource for it (using the **ocf:pacemaker:controld** resource agent), and
clone it:

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib dlm_cfg
    [root@pcmk-1 ~]# pcs -f dlm_cfg resource create dlm \
        ocf:pacemaker:controld op monitor interval=60s
    [root@pcmk-1 ~]# pcs -f dlm_cfg resource clone dlm clone-max=2 clone-node-max=1
    [root@pcmk-1 ~]# pcs resource status
      * ClusterIP	(ocf::heartbeat:IPaddr2):	 Started pcmk-2
      * WebSite	(ocf::heartbeat:apache):	 Started pcmk-2
      * Clone Set: WebData-clone [WebData] (promotable):
        * Masters: [ pcmk-2 ]
        * Slaves: [ pcmk-1 ]
      * WebFS	(ocf::heartbeat:Filesystem):	 Started pcmk-2
    [root@pcmk-1 ~]# pcs resource config
     Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
      Attributes: cidr_netmask=24 ip=192.168.122.120
      Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
                  start interval=0s timeout=20s (ClusterIP-start-interval-0s)
                  stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
     Resource: WebSite (class=ocf provider=heartbeat type=apache)
      Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
      Operations: monitor interval=1min (WebSite-monitor-interval-1min)
                  start interval=0s timeout=40s (WebSite-start-interval-0s)
                  stop interval=0s timeout=60s (WebSite-stop-interval-0s)
     Clone: WebData-clone
      Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true promoted-max=1 promoted-node-max=1
      Resource: WebData (class=ocf provider=linbit type=drbd)
       Attributes: drbd_resource=wwwdata
       Operations: demote interval=0s timeout=90 (WebData-demote-interval-0s)
                   monitor interval=60s (WebData-monitor-interval-60s)
                   notify interval=0s timeout=90 (WebData-notify-interval-0s)
                   promote interval=0s timeout=90 (WebData-promote-interval-0s)
                   reload interval=0s timeout=30 (WebData-reload-interval-0s)
                   start interval=0s timeout=240 (WebData-start-interval-0s)
                   stop interval=0s timeout=100 (WebData-stop-interval-0s)
     Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
      Attributes: device=/dev/drbd1 directory=/var/www/html fstype=xfs
      Operations: monitor interval=20s timeout=40s (WebFS-monitor-interval-20s)
                  start interval=0s timeout=60s (WebFS-start-interval-0s)
                  stop interval=0s timeout=60s (WebFS-stop-interval-0s)

Activate our new configuration, and see how the cluster responds:

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib-push dlm_cfg --config
    CIB updated
    [root@pcmk-1 ~]# pcs status
    Cluster name: mycluster
    Cluster Summary:
      * Stack: corosync
      * Current DC: pcmk-2 (version 2.1.0-3.el8-7c3f660707) - partition with quorum
      * Last updated: Wed Jul 13 10:57:20 2021
      * Last change:  Wed Jul 13 10:57:15 2021 by root via cibadmin on pcmk-1
      * 2 nodes configured
      * 7 resource instances configured

    Node List:
      * Online: [ pcmk-1 pcmk-2 ]

    Full List of Resources:
      * ClusterIP	(ocf::heartbeat:IPaddr2):	 Started pcmk-1
      * WebSite	(ocf::heartbeat:apache):	 Started pcmk-1
      * Clone Set: WebData-clone [WebData] (promotable):
        * Masters: [ pcmk-1 ]
        * Slaves: [ pcmk-2 ]
      * WebFS	(ocf::heartbeat:Filesystem):	 Started pcmk-1
      * Clone Set: dlm-clone [dlm]:
        * Started: [ pcmk-1 pcmk-2 ]

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

Create and Populate GFS2 Filesystem
###################################

Before we do anything to the existing partition, we need to make sure it
is unmounted. We do this by telling the cluster to stop the WebFS resource.
This will ensure that other resources (in our case, Apache) using WebFS
are not only stopped, but stopped in the correct order.

.. code-block:: none

    [root@pcmk-1 ~]# pcs resource disable WebFS
    [root@pcmk-1 ~]# pcs resource
      * ClusterIP	(ocf::heartbeat:IPaddr2):	 Started pcmk-1
      * WebSite	(ocf::heartbeat:apache):	 Stopped
      * Clone Set: WebData-clone [WebData] (promotable):
        * Masters: [ pcmk-1 ]
        * Slaves: [ pcmk-2 ]
      * WebFS	(ocf::heartbeat:Filesystem):	 Stopped (disabled)
      * Clone Set: dlm-clone [dlm]:
        * Started: [ pcmk-1 pcmk-2 ]

You can see that both Apache and WebFS have been stopped, and that **pcmk-1**
is currently running the promoted instance for the DRBD device.

Now we can create a new GFS2 filesystem on the DRBD device.

.. WARNING::

    This will erase all previous content stored on the DRBD device. Ensure
    you have a copy of any important data.

.. IMPORTANT::

    Run the next command on whichever node has the DRBD Primary role.
    Otherwise, you will receive the message:

    .. code-block:: none

        /dev/drbd1: Read-only file system
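
    If you are not sure which node currently holds the DRBD Primary role,
    one quick way to check (a sketch, assuming the DRBD utilities installed
    earlier in this guide) is:

    .. code-block:: none

        # drbdadm role wwwdata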

.. code-block:: none

    [root@pcmk-1 ~]# mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:web /dev/drbd1
    It appears to contain an existing filesystem (xfs)
    This will destroy any data on /dev/drbd1
    Are you sure you want to proceed? [y/n] y
    Discarding device contents (may take a while on large devices): Done
    Adding journals: Done
    Building resource groups: Done
    Creating quota file: Done
    Writing superblock and syncing: Done
    Device:                    /dev/drbd1
    Block size:                4096
    Device size:               0.50 GB (131059 blocks)
    Filesystem size:           0.50 GB (131055 blocks)
    Journals:                  2
    Journal size:              8MB
    Resource groups:           4
    Locking protocol:          "lock_dlm"
    Lock table:                "mycluster:web"
    UUID:                      19712677-7206-4660-a079-5d17341dd720

The ``mkfs.gfs2`` command required a number of additional parameters:

* ``-p lock_dlm`` specifies that we want to use the kernel's DLM.

* ``-j 2`` indicates that the filesystem should reserve enough
  space for two journals (one for each node that will access the filesystem).

* ``-t mycluster:web`` specifies the lock table name. The format for this
  field is ``<CLUSTERNAME>:<FSNAME>``. For ``CLUSTERNAME``, we need to use the
  same value we specified originally with ``pcs cluster setup`` (which is also
  the value of **cluster_name** in ``/etc/corosync/corosync.conf``). If you
  are unsure what your cluster name is, you can look in
  ``/etc/corosync/corosync.conf`` or execute the command
  ``pcs cluster corosync pcmk-1 | grep cluster_name``, as shown below.
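
For example, on the cluster used in this guide (the output shown is
illustrative; your cluster name may differ):

.. code-block:: none

    [root@pcmk-1 ~]# grep cluster_name /etc/corosync/corosync.conf
        cluster_name: mycluster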

Now we can (re-)populate the new filesystem with data
(web pages). We'll create yet another variation on our home page.

.. code-block:: none

    [root@pcmk-1 ~]# mount /dev/drbd1 /mnt
    [root@pcmk-1 ~]# cat <<-END >/mnt/index.html
    <html>
    <body>My Test Site - GFS2</body>
    </html>
    END
    [root@pcmk-1 ~]# chcon -R --reference=/var/www/html /mnt
    [root@pcmk-1 ~]# umount /dev/drbd1
    [root@pcmk-1 ~]# drbdadm verify wwwdata

Reconfigure the Cluster for GFS2
################################

With the WebFS resource stopped, let's update the configuration.

.. code-block:: none

    [root@pcmk-1 ~]# pcs resource config WebFS
     Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
       Attributes: device=/dev/drbd1 directory=/var/www/html fstype=xfs
       Meta Attrs: target-role=Stopped
       Operations: monitor interval=20s timeout=40s (WebFS-monitor-interval-20s)
                   start interval=0s timeout=60s (WebFS-start-interval-0s)
                   stop interval=0s timeout=60s (WebFS-stop-interval-0s)

The fstype option needs to be updated to **gfs2** instead of **xfs**.

.. code-block:: none

    [root@pcmk-1 ~]# pcs resource update WebFS fstype=gfs2
    [root@pcmk-1 ~]# pcs resource config WebFS
     Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
       Attributes: device=/dev/drbd1 directory=/var/www/html fstype=gfs2
       Meta Attrs: target-role=Stopped
       Operations: monitor interval=20s timeout=40s (WebFS-monitor-interval-20s)
                   start interval=0s timeout=60s (WebFS-start-interval-0s)
                   stop interval=0s timeout=60s (WebFS-stop-interval-0s)

GFS2 requires that DLM be running, so we also need to set up new colocation
and ordering constraints for it:

.. code-block:: none

    [root@pcmk-1 ~]# pcs constraint colocation add WebFS with dlm-clone INFINITY
    [root@pcmk-1 ~]# pcs constraint order dlm-clone then WebFS
    Adding dlm-clone WebFS (kind: Mandatory) (Options: first-action=start then-action=start)


.. index::
   pair: filesystem; clone

Clone the Filesystem Resource
#############################

Now that we have a cluster filesystem ready to go, we can configure the cluster
so both nodes mount the filesystem.

Clone the filesystem resource in a new configuration.
Notice how pcs automatically updates the relevant constraints again.

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib active_cfg
    [root@pcmk-1 ~]# pcs -f active_cfg resource clone WebFS
    [root@pcmk-1 ~]# pcs -f active_cfg constraint
    Location Constraints:
      Resource: WebSite
        Enabled on:
          Node: pcmk-1 (score:50)
    Ordering Constraints:
      start ClusterIP then start WebSite (kind:Mandatory)
      promote WebData-clone then start WebFS-clone (kind:Mandatory)
      start WebFS-clone then start WebSite (kind:Mandatory)
    Colocation Constraints:
      WebSite with ClusterIP (score:INFINITY)
      WebFS-clone with WebData-clone (score:INFINITY) (with-rsc-role:Master)
      WebSite with WebFS-clone (score:INFINITY)
    Ticket Constraints:

Tell the cluster that it is now allowed to promote both instances to be DRBD
Primary.

.. code-block:: none

    [root@pcmk-1 ~]# pcs -f active_cfg resource update WebData-clone promoted-max=2

Finally, load our configuration into the cluster, and re-enable the WebFS
resource (which we disabled earlier).

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib-push active_cfg --config
    CIB updated
    [root@pcmk-1 ~]# pcs resource enable WebFS

After all the processes are started, the status should look similar to this.

.. code-block:: none

    [root@pcmk-1 ~]# pcs resource
      * ClusterIP	(ocf::heartbeat:IPaddr2):	 Started pcmk-1
      * WebSite	(ocf::heartbeat:apache):	 Started pcmk-1
      * Clone Set: WebData-clone [WebData] (promotable):
        * Masters: [ pcmk-1 pcmk-2 ]
      * Clone Set: dlm-clone [dlm]:
        * Started: [ pcmk-1 pcmk-2 ]
      * Clone Set: WebFS-clone [WebFS]:
        * Started: [ pcmk-1 pcmk-2 ]

Test Failover
#############

Testing failover is left as an exercise for the reader.
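
If you would like a starting point, one simple check (a sketch, not the only
way to test) is to put one node into standby, confirm that the surviving node
still serves the website, and then bring the node back:

.. code-block:: none

    [root@pcmk-1 ~]# pcs node standby pcmk-2
    [root@pcmk-1 ~]# pcs status
    [root@pcmk-1 ~]# pcs node unstandby pcmk-2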

With this configuration, the data is now active/active. The website
administrator could change HTML files on either node, and the live website will
show the changes even if it is running on the opposite node.
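
A quick way to convince yourself of this (the file name here is only an
example) is to write a file on one node and read it back from the other:

.. code-block:: none

    [root@pcmk-1 ~]# date > /var/www/html/written-on-pcmk-1.txt
    [root@pcmk-2 ~]# cat /var/www/html/written-on-pcmk-1.txt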

If the web server is configured to listen on all IP addresses, it is possible
to remove the constraints between the WebSite and ClusterIP resources, and
clone the WebSite resource. The web server would always be ready to serve web
pages, and only the IP address would need to be moved in a failover.

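A sketch of what that could look like follows. The constraint IDs depend on
your configuration, so list them first with ``pcs constraint --full`` and then
remove the WebSite/ClusterIP ordering and colocation constraints by ID before
cloning:

.. code-block:: none

    [root@pcmk-1 ~]# pcs constraint --full
    [root@pcmk-1 ~]# pcs constraint remove <constraint-id>
    [root@pcmk-1 ~]# pcs resource clone WebSite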