.. index::
   single: fencing
   single: STONITH

.. _fencing:

Fencing
-------

What Is Fencing?
################

*Fencing* is the ability to make a node unable to run resources, even when that
node is unresponsive to cluster commands.

Fencing is also known as *STONITH*, an acronym for "Shoot The Other Node In The
Head", since the most common fencing method is cutting power to the node.
Another method is "fabric fencing", which cuts the node's access to some
capability required to run resources (such as network access or a shared disk).

.. index::
   single: fencing; why necessary

Why Is Fencing Necessary?
#########################

Fencing protects your data from being corrupted by malfunctioning nodes or
unintentional concurrent access to shared resources.

Fencing protects against the "split brain" failure scenario, where cluster
nodes have lost the ability to reliably communicate with each other but are
still able to run resources. If the cluster simply assumed that uncommunicative
nodes were down, then multiple instances of a resource could be started on
different nodes.

The effect of split brain depends on the resource type. For example, an IP
address brought up on two hosts on a network will cause packets to be sent
randomly to one host or the other, rendering the IP address useless. For a
database or clustered file system, the effect could be much more severe,
causing data corruption or divergence.

Fencing is also used when a resource cannot otherwise be stopped. If a
resource fails to stop on a node, it cannot be started on a different node
without risking the same type of conflict as split brain. Fencing the
original node ensures the resource can be safely started elsewhere.

Users may also configure the ``on-fail`` property of :ref:`operation` or the
``loss-policy`` property of
:ref:`ticket constraints <ticket-constraints>` to ``fence``, in which
case the cluster will fence the resource's node if the operation fails or the
ticket is lost.

.. index::
   single: fencing; device

Fence Devices
#############

A *fence device* or *fencing device* is a special type of resource that
provides the means to fence a node.

Examples of fencing devices include intelligent power switches (often
controlled via SNMP) and IPMI devices, both of which can cut power to a node,
and iSCSI controllers that allow SCSI reservations to be used to cut a node's
access to a shared disk.

Since fencing devices will be used to recover from loss of networking
connectivity to other nodes, it is essential that they do not rely on the same
network as the cluster itself; otherwise, that network becomes a single point
of failure.

Since loss of a node due to power outage is indistinguishable from loss of
network connectivity to that node, it is also essential that at least one fence
device for a node does not share power with that node. For example, an on-board
IPMI controller that shares power with its host should not be used as the sole
fencing device for that host.

Since fencing is used to isolate malfunctioning nodes, no fence device should
rely on its target functioning properly. This rules out, for example, devices
that ssh into a node and issue a shutdown command (such devices might be
suitable for testing, but never for production).

.. index::
   single: fencing; agent

Fence Agents
############

A *fence agent* or *fencing agent* is a ``stonith``-class resource agent.

The fence agent standard provides commands (such as ``off`` and ``reboot``)
that the cluster can use to fence nodes. As with other resource agent classes,
this provides a layer of abstraction, so Pacemaker does not need any knowledge
of specific fencing technologies; that knowledge is isolated in the agent.
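
As an illustration of that abstraction, the sketch below shows one way a caller could hand an action and its options to an RHCS-style agent. It uses ``name=value`` lines on the agent's standard input, a convention those agents commonly accept. The helper names ``build_agent_input`` and ``parse_agent_input`` are hypothetical, and this is not Pacemaker's own code. The point is that the caller only chooses a generic action name, while the device-specific meaning stays inside the agent.

.. code-block:: python

   from typing import Dict


   def build_agent_input(action: str, options: Dict[str, str]) -> str:
       """Serialize a fencing request as name=value lines (hypothetical helper).

       Assumes the RHCS-style convention of one ``name=value`` pair per line
       on the agent's standard input. The caller names only a generic action;
       how to carry it out on the hardware is left to the agent.
       """
       lines = [f"action={action}"]
       lines.extend(f"{name}={value}" for name, value in options.items())
       return "\n".join(lines) + "\n"


   def parse_agent_input(text: str) -> Dict[str, str]:
       """Agent-side counterpart (hypothetical): read name=value lines into a dict."""
       options: Dict[str, str] = {}
       for line in text.splitlines():
           name, sep, value = line.strip().partition("=")
           if sep:  # ignore blank or malformed lines
               options[name.strip()] = value.strip()
       return options


   if __name__ == "__main__":
       # Illustrative option names only; each real agent documents its own
       # parameters in its metadata.
       payload = build_agent_input("reboot", {"ip": "192.0.2.10", "plug": "3"})
       print(payload, end="")
       print(parse_agent_input(payload)["action"])

Running the example prints the three ``name=value`` lines followed by ``reboot``. To deliver such a payload to an installed agent, a caller could pass it as the ``input`` argument of Python's ``subprocess.run``. The appropriate agent and options depend on the fencing hardware.
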

Pacemaker supports two fence agent standards, both inherited from
no-longer-active projects:

* Red Hat Cluster Suite (RHCS) style: These are typically installed in
  ``/usr/sbin`` with names starting with ``fence_``.

* Linux-HA style: These typically have names starting with ``external/``.
  Pacemaker can support these agents using the **fence_legacy** RHCS-style
  agent as a wrapper, *if* support was enabled when Pacemaker was built, which
  requires the ``cluster-glue`` library.

When a Fence Device Can Be Used
###############################

Fencing devices do not actually "run" like most services. Typically, they just
provide an interface for sending commands to an external device.

Additionally, fencing may be initiated by Pacemaker, by other cluster-aware
software such as DRBD or DLM, or manually by an administrator, at any point in
the cluster life cycle, including before any resources have been started.

To accommodate this, Pacemaker does not require the fence device resource to be
"started" in order to be used. Whether a fence device is started determines
whether a node runs any recurring monitor for the device. It also gives the
node where the device is started a slight preference when the cluster chooses
which node executes fencing with that device.

By default, any node can use any fencing device. If a fence device is
disabled by setting its ``target-role`` to ``Stopped``, then no node can use
that device. If a location constraint with a negative score prevents a specific
node from "running" a fence device, then that node will never be chosen to
execute fencing using the device. A node may fence itself, but the cluster will
choose that only if no other nodes can do the fencing.

A common configuration scenario is to have one fence device per target node.
In such a case, users often configure anti-location constraints (location
constraints with a negative score) so that the target node does not monitor
its own device.
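
The selection rules above can be modelled with a short sketch. This is a simplified, hypothetical model and not Pacemaker's actual scheduler logic. ``FenceDevice`` and ``choose_executor`` are invented names. ``disabled`` stands for ``target-role=Stopped`` and ``banned_nodes`` stands for negative-score location constraints. The "slight preference" for the node where the device is started is modelled as a simple tie-breaker.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import Iterable, Optional, Set


   @dataclass
   class FenceDevice:
       """Hypothetical, simplified view of a fence device's configuration."""

       name: str
       disabled: bool = False  # stands for target-role=Stopped
       banned_nodes: Set[str] = field(default_factory=set)  # negative location constraints
       started_on: Optional[str] = None  # node running the device's recurring monitor


   def choose_executor(device: FenceDevice, target: str,
                       online_nodes: Iterable[str]) -> Optional[str]:
       """Pick a node to execute fencing of ``target`` with ``device``, or None."""
       if device.disabled:
           return None  # a stopped device cannot be used by any node
       eligible = sorted(n for n in online_nodes if n not in device.banned_nodes)
       # Prefer nodes other than the target; self-fencing is the last resort.
       others = [n for n in eligible if n != target]
       pool = others if others else [n for n in eligible if n == target]
       if not pool:
           return None
       # Slight preference for the node where the device is "started".
       if device.started_on in pool:
           return device.started_on
       return pool[0]  # arbitrary but deterministic choice among the rest


   if __name__ == "__main__":
       # One device per target, with an anti-location constraint keeping
       # node1 from monitoring its own device.
       fence_node1 = FenceDevice("fence-node1", banned_nodes={"node1"},
                                 started_on="node2")
       print(choose_executor(fence_node1, "node1", ["node1", "node2", "node3"]))

With the anti-location constraint in place, ``node1`` is never chosen to use ``fence-node1``. If it were the only node left, the function would return ``None`` rather than fall back to self-fencing, matching the rule that a negatively constrained node is never chosen.
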

Limitations of Fencing Resources
################################

Fencing resources have certain limitations that other resource classes do not:

* They may have only one set of meta-attributes and one set of instance
  attributes.
* If :ref:`rules` are used to determine fencing resource options, these
  might be evaluated only when first read, meaning that later changes to the
  rules will have no effect. To avoid confusion, it is better not to use rules
  with fencing resources at all.

These limitations could be revisited if there is sufficient user demand.

.. index::
   single: fencing; special instance attributes

.. _fencing-attributes:

Special Meta-Attributes for Fencing Resources
#############################################

The table below lists special resource meta-attributes that may be set for any
fencing resource.

.. table:: **Additional Properties of Fencing Resources**

   +----------------------+---------+--------------------+----------------------------------------+
   | Field                | Type    | Default            | Description                            |
   +======================+=========+====================+========================================+
   | provides             | string  |                    | .. index::                             |
   |                      |         |                    |    single: provides                    |
   |                      |         |                    |                                        |
   |                      |         |                    | Any special capability provided by the |
   |                      |         |                    | fence device. Currently, only one such |
   |                      |         |                    | capability is meaningful:              |
   |                      |         |                    | :ref:`unfencing <unfencing>`.          |
   +----------------------+---------+--------------------+----------------------------------------+

Special Instance Attributes for Fencing Resources
#################################################

The table below lists special instance attributes that may be set for any
fencing resource (*not* meta-attributes, even though they are interpreted by
Pacemaker rather than the fence agent). These are also listed in the man page
for ``pacemaker-fenced``.

.. Not_Yet_Implemented:

   +----------------------+---------+--------------------+----------------------------------------+
   | priority             | integer | 0                  | .. index::                             |
   |                      |         |                    |    single: priority                    |
   |                      |         |                    |                                        |
   |                      |         |                    | The priority of the fence device.      |
   |                      |         |                    | Devices are tried in order of highest  |
   |                      |         |                    | priority to lowest.                    |
   +----------------------+---------+--------------------+----------------------------------------+

.. table:: **Additional Properties of Fencing Resources**

   +----------------------+---------+--------------------+----------------------------------------+
   | Field                | Type    | Default            | Description                            |
   +======================+=========+====================+========================================+
   | stonith-timeout      | time    |                    | .. index::                             |
   |                      |         |                    |    single: stonith-timeout             |
   |                      |         |                    |                                        |
   |                      |         |                    | This is not used by Pacemaker (see the |
   |                      |         |                    | ``pcmk_reboot_timeout``,               |
   |                      |         |                    | ``pcmk_off_timeout``, etc. properties  |
   |                      |         |                    | instead), but it may be used by        |
   |                      |         |                    | Linux-HA fence agents.                 |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_host_map        | string  |                    | .. index::                             |
   |                      |         |                    |    single: pcmk_host_map               |
   |                      |         |                    |                                        |
   |                      |         |                    | A mapping of host names to port        |
   |                      |         |                    | numbers for devices that do not        |
   |                      |         |                    | support host names.                    |
   |                      |         |                    |                                        |
   |                      |         |                    | Example: ``node1:1;node2:2,3`` tells   |
   |                      |         |                    | the cluster to use port 1 for          |
   |                      |         |                    | ``node1`` and ports 2 and 3 for        |
   |                      |         |                    | ``node2``. If ``pcmk_host_check`` is   |
   |                      |         |                    | explicitly set to ``static-list``,     |
   |                      |         |                    | either this or ``pcmk_host_list`` must |
   |                      |         |                    | be set.                                |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_host_list       | string  |                    | .. index::                             |
   |                      |         |                    |    single: pcmk_host_list              |
   |                      |         |                    |                                        |
   |                      |         |                    | A list of machines controlled by this  |
   |                      |         |                    | device. If ``pcmk_host_check`` is      |
   |                      |         |                    | explicitly set to ``static-list``,     |
   |                      |         |                    | either this or ``pcmk_host_map`` must  |
   |                      |         |                    | be set.                                |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_host_check      | string  | Value appropriate  | .. index::                             |
   |                      |         | to other           |    single: pcmk_host_check             |
   |                      |         | parameters (see    |                                        |
   |                      |         | "Default Check     | The method Pacemaker should use to     |
   |                      |         | Type" below)       | determine which nodes can be targeted  |
   |                      |         |                    | by this device. Allowed values:        |
   |                      |         |                    |                                        |
   |                      |         |                    | * ``static-list``: targets are listed  |
   |                      |         |                    |   in the ``pcmk_host_list`` or         |
   |                      |         |                    |   ``pcmk_host_map`` attribute          |
   |                      |         |                    | * ``dynamic-list``: query the device   |
   |                      |         |                    |   via the agent's ``list`` action      |
   |                      |         |                    | * ``status``: query the device via the |
   |                      |         |                    |   agent's ``status`` action            |
   |                      |         |                    | * ``none``: assume the device can      |
   |                      |         |                    |   fence any node                       |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_delay_max       | time    | 0s                 | .. index::                             |
   |                      |         |                    |    single: pcmk_delay_max              |
   |                      |         |                    |                                        |
   |                      |         |                    | Enable a delay of no more than the     |
   |                      |         |                    | time specified before executing        |
   |                      |         |                    | fencing actions. Pacemaker derives the |
   |                      |         |                    | overall delay by taking the value of   |
   |                      |         |                    | pcmk_delay_base and adding a random    |
   |                      |         |                    | delay value such that the sum is kept  |
   |                      |         |                    | below this maximum. This is sometimes  |
   |                      |         |                    | used in two-node clusters to ensure    |
   |                      |         |                    | that the nodes don't fence each other  |
   |                      |         |                    | at the same time.                      |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_delay_base      | time    | 0s                 | .. index::                             |
   |                      |         |                    |    single: pcmk_delay_base             |
   |                      |         |                    |                                        |
   |                      |         |                    | Enable a static delay before executing |
   |                      |         |                    | fencing actions. This can be used, for |
   |                      |         |                    | example, in two-node clusters to       |
   |                      |         |                    | ensure that the nodes don't fence each |
   |                      |         |                    | other, by having separate fencing      |
   |                      |         |                    | resources with different values. The   |
   |                      |         |                    | node that is fenced with the shorter   |
   |                      |         |                    | delay will lose a fencing race. The    |
   |                      |         |                    | overall delay introduced by Pacemaker  |
   |                      |         |                    | is derived from this value plus a      |
   |                      |         |                    | random delay such that the sum is kept |
   |                      |         |                    | below the maximum delay.               |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_action_limit    | integer | 1                  | .. index::                             |
   |                      |         |                    |    single: pcmk_action_limit           |
   |                      |         |                    |                                        |
   |                      |         |                    | The maximum number of actions that can |
   |                      |         |                    | be performed in parallel on this       |
   |                      |         |                    | device. A value of -1 means unlimited. |
   |                      |         |                    | Node fencing actions initiated by the  |
   |                      |         |                    | cluster (as opposed to an administrator|
   |                      |         |                    | running the ``stonith_admin`` tool or  |
   |                      |         |                    | the fencer running recurring device    |
   |                      |         |                    | monitors and ``status`` and ``list``   |
   |                      |         |                    | commands) are additionally subject to  |
   |                      |         |                    | the ``concurrent-fencing`` cluster     |
   |                      |         |                    | property.                              |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_host_argument   | string  | ``plug`` if the    | .. index::                             |
   |                      |         | fence agent's      |    single: pcmk_host_argument          |
   |                      |         | metadata supports  |                                        |
   |                      |         | it, otherwise      | *Advanced use only.* Which parameter   |
   |                      |         | ``port``           | should be supplied to the fence agent  |
   |                      |         |                    | to identify the node to be fenced.     |
   |                      |         |                    | Some devices support neither the       |
   |                      |         |                    | standard ``plug`` nor the deprecated   |
   |                      |         |                    | ``port`` parameter, or may provide     |
   |                      |         |                    | additional ones. Use this to specify   |
   |                      |         |                    | an alternate, device-specific          |
   |                      |         |                    | parameter. A value of ``none`` tells   |
   |                      |         |                    | the cluster not to supply any          |
   |                      |         |                    | additional parameters.                 |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_reboot_action   | string  | reboot             | .. index::                             |
   |                      |         |                    |    single: pcmk_reboot_action          |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* The command to    |
   |                      |         |                    | send to the resource agent in order to |
   |                      |         |                    | reboot a node. Some devices do not     |
   |                      |         |                    | support the standard commands or may   |
   |                      |         |                    | provide additional ones. Use this to   |
   |                      |         |                    | specify an alternate, device-specific  |
   |                      |         |                    | command.                               |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_reboot_timeout  | time    | 60s                | .. index::                             |
   |                      |         |                    |    single: pcmk_reboot_timeout         |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* Specify an        |
   |                      |         |                    | alternate timeout to use for           |
   |                      |         |                    | ``reboot`` actions instead of the      |
   |                      |         |                    | value of ``stonith-timeout``. Some     |
   |                      |         |                    | devices need much more or less time to |
   |                      |         |                    | complete than normal. Use this to      |
   |                      |         |                    | specify an alternate, device-specific  |
   |                      |         |                    | timeout.                               |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_reboot_retries  | integer | 2                  | .. index::                             |
   |                      |         |                    |    single: pcmk_reboot_retries         |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* The maximum       |
   |                      |         |                    | number of times to retry the           |
   |                      |         |                    | ``reboot`` command within the timeout  |
   |                      |         |                    | period. Some devices do not support    |
   |                      |         |                    | multiple connections, and operations   |
   |                      |         |                    | may fail if the device is busy with    |
   |                      |         |                    | another task, so Pacemaker will        |
   |                      |         |                    | automatically retry the operation, if  |
   |                      |         |                    | there is time remaining. Use this      |
   |                      |         |                    | option to alter the number of times    |
   |                      |         |                    | Pacemaker retries before giving up.    |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_off_action      | string  | off                | .. index::                             |
   |                      |         |                    |    single: pcmk_off_action             |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* The command to    |
   |                      |         |                    | send to the resource agent in order to |
   |                      |         |                    | shut down a node. Some devices do not  |
   |                      |         |                    | support the standard commands or may   |
   |                      |         |                    | provide additional ones. Use this to   |
   |                      |         |                    | specify an alternate, device-specific  |
   |                      |         |                    | command.                               |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_off_timeout     | time    | 60s                | .. index::                             |
   |                      |         |                    |    single: pcmk_off_timeout            |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* Specify an        |
   |                      |         |                    | alternate timeout to use for           |
   |                      |         |                    | ``off`` actions instead of the         |
   |                      |         |                    | value of ``stonith-timeout``. Some     |
   |                      |         |                    | devices need much more or less time to |
   |                      |         |                    | complete than normal. Use this to      |
   |                      |         |                    | specify an alternate, device-specific  |
   |                      |         |                    | timeout.                               |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_off_retries     | integer | 2                  | .. index::                             |
   |                      |         |                    |    single: pcmk_off_retries            |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* The maximum       |
   |                      |         |                    | number of times to retry the           |
   |                      |         |                    | ``off`` command within the timeout     |
   |                      |         |                    | period. Some devices do not support    |
   |                      |         |                    | multiple connections, and operations   |
   |                      |         |                    | may fail if the device is busy with    |
   |                      |         |                    | another task, so Pacemaker will        |
   |                      |         |                    | automatically retry the operation, if  |
   |                      |         |                    | there is time remaining. Use this      |
   |                      |         |                    | option to alter the number of times    |
   |                      |         |                    | Pacemaker retries before giving up.    |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_list_action     | string  | list               | .. index::                             |
   |                      |         |                    |    single: pcmk_list_action            |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* The command to    |
   |                      |         |                    | send to the resource agent in order to |
   |                      |         |                    | list nodes. Some devices do not        |
   |                      |         |                    | support the standard commands or may   |
   |                      |         |                    | provide additional ones. Use this to   |
   |                      |         |                    | specify an alternate, device-specific  |
   |                      |         |                    | command.                               |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_list_timeout    | time    | 60s                | .. index::                             |
   |                      |         |                    |    single: pcmk_list_timeout           |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* Specify an        |
   |                      |         |                    | alternate timeout to use for           |
   |                      |         |                    | ``list`` actions instead of the        |
   |                      |         |                    | value of ``stonith-timeout``. Some     |
   |                      |         |                    | devices need much more or less time to |
   |                      |         |                    | complete than normal. Use this to      |
   |                      |         |                    | specify an alternate, device-specific  |
   |                      |         |                    | timeout.                               |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_list_retries    | integer | 2                  | .. index::                             |
   |                      |         |                    |    single: pcmk_list_retries           |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* The maximum       |
   |                      |         |                    | number of times to retry the           |
   |                      |         |                    | ``list`` command within the timeout    |
   |                      |         |                    | period. Some devices do not support    |
   |                      |         |                    | multiple connections, and operations   |
   |                      |         |                    | may fail if the device is busy with    |
   |                      |         |                    | another task, so Pacemaker will        |
   |                      |         |                    | automatically retry the operation, if  |
   |                      |         |                    | there is time remaining. Use this      |
   |                      |         |                    | option to alter the number of times    |
   |                      |         |                    | Pacemaker retries before giving up.    |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_monitor_action  | string  | monitor            | .. index::                             |
   |                      |         |                    |    single: pcmk_monitor_action         |
   |                      |         |                    |                                        |
   |                      |         |                    | *Advanced use only.* The command to    |
   |                      |         |                    | send to the resource agent in order to |
   |                      |         |                    | report extended status. Some devices do|
   |                      |         |                    | not support the standard commands or   |
   |                      |         |                    | may provide additional ones. Use this  |
   |                      |         |                    | to specify an alternate,               |
   |                      |         |                    | device-specific command.               |
   +----------------------+---------+--------------------+----------------------------------------+
   | pcmk_monitor_timeout | time    | 60s                | .. index::                             |
   |                      |         |                    |    single: pcmk_monitor_timeout        |
   |                      |         |                    |                                        |
434   |                      |         |                    | *Advanced use only.* Specify an        |
435   |                      |         |                    | alternate timeout to use for           |
436   |                      |         |                    | ``monitor`` actions instead of the     |
437   |                      |         |                    | value of ``stonith-timeout``. Some     |
438   |                      |         |                    | devices need much more or less time to |
439   |                      |         |                    | complete than normal. Use this to      |
440   |                      |         |                    | specify an alternate, device-specific  |
441   |                      |         |                    | timeout.                               |
442   +----------------------+---------+--------------------+----------------------------------------+
443   | pcmk_monitor_retries | integer | 2                  | .. index::                             |
444   |                      |         |                    |    single: pcmk_monitor_retries        |
445   |                      |         |                    |                                        |
446   |                      |         |                    | *Advanced use only.* The maximum       |
447   |                      |         |                    | number of times to retry the           |
448   |                      |         |                    | ``monitor`` command within the timeout |
449   |                      |         |                    | period. Some devices do not support    |
450   |                      |         |                    | multiple connections, and operations   |
451   |                      |         |                    | may fail if the device is busy with    |
452   |                      |         |                    | another task, so Pacemaker will        |
453   |                      |         |                    | automatically retry the operation, if  |
454   |                      |         |                    | there is time remaining. Use this      |
455   |                      |         |                    | option to alter the number of times    |
456   |                      |         |                    | Pacemaker retries before giving up.    |
457   +----------------------+---------+--------------------+----------------------------------------+
458   | pcmk_status_action   | string  | status             | .. index::                             |
459   |                      |         |                    |    single: pcmk_status_action          |
460   |                      |         |                    |                                        |
461   |                      |         |                    | *Advanced use only.* The command to    |
462   |                      |         |                    | send to the resource agent in order to |
463   |                      |         |                    | report status. Some devices do         |
464   |                      |         |                    | not support the standard commands or   |
465   |                      |         |                    | may provide additional ones. Use this  |
466   |                      |         |                    | to specify an alternate,               |
467   |                      |         |                    | device-specific command.               |
468   +----------------------+---------+--------------------+----------------------------------------+
469   | pcmk_status_timeout  | time    | 60s                | .. index::                             |
470   |                      |         |                    |    single: pcmk_status_timeout         |
471   |                      |         |                    |                                        |
472   |                      |         |                    | *Advanced use only.* Specify an        |
473   |                      |         |                    | alternate timeout to use for           |
474   |                      |         |                    | ``status`` actions instead of the      |
475   |                      |         |                    | value of ``stonith-timeout``. Some     |
476   |                      |         |                    | devices need much more or less time to |
477   |                      |         |                    | complete than normal. Use this to      |
478   |                      |         |                    | specify an alternate, device-specific  |
479   |                      |         |                    | timeout.                               |
480   +----------------------+---------+--------------------+----------------------------------------+
481   | pcmk_status_retries  | integer | 2                  | .. index::                             |
482   |                      |         |                    |    single: pcmk_status_retries         |
483   |                      |         |                    |                                        |
484   |                      |         |                    | *Advanced use only.* The maximum       |
485   |                      |         |                    | number of times to retry the           |
486   |                      |         |                    | ``status`` command within the timeout  |
487   |                      |         |                    | period. Some devices do not support    |
488   |                      |         |                    | multiple connections, and operations   |
489   |                      |         |                    | may fail if the device is busy with    |
490   |                      |         |                    | another task, so Pacemaker will        |
491   |                      |         |                    | automatically retry the operation, if  |
492   |                      |         |                    | there is time remaining. Use this      |
493   |                      |         |                    | option to alter the number of times    |
494   |                      |         |                    | Pacemaker retries before giving up.    |
495   +----------------------+---------+--------------------+----------------------------------------+
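
The action and retry rows above share a naming pattern, ``pcmk_<action>_action``
and ``pcmk_<action>_retries``. As a minimal sketch, a hypothetical Python model
(not Pacemaker's implementation) might resolve the command sent to the agent
and the retry limit for an action as follows. Timeouts are not modeled, the
retry default of 2 is taken from the table, and ``getstatus`` is an invented
agent command used only for illustration.

.. code-block:: python

   from typing import Mapping, Tuple

   DEFAULT_RETRIES = 2  # default listed for the pcmk_*_retries rows above


   def resolve_action(action: str,
                      device_params: Mapping[str, str]) -> Tuple[str, int]:
       """Return (agent command, retry limit) for a Pacemaker fencing action.

       Hypothetical model: pcmk_<action>_action replaces the command sent to
       the agent, and pcmk_<action>_retries replaces the default retry limit.
       """
       command = device_params.get(f"pcmk_{action}_action", action)
       retries = int(device_params.get(f"pcmk_{action}_retries", DEFAULT_RETRIES))
       return command, retries


   # A device whose agent (hypothetically) names its status command "getstatus":
   print(resolve_action("status", {"pcmk_status_action": "getstatus"}))
   # prints ('getstatus', 2)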

Default Check Type
##################

If the user does not explicitly configure ``pcmk_host_check`` for a fence
device, a default value appropriate to other configured parameters will be
used:

* If either ``pcmk_host_list`` or ``pcmk_host_map`` is configured,
  ``static-list`` will be used;
* otherwise, if the fence device supports the ``list`` action, and the first
  attempt at using ``list`` succeeds, ``dynamic-list`` will be used;
* otherwise, if the fence device supports the ``status`` action, ``status``
  will be used;
* otherwise, ``none`` will be used.
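
The selection order above can be sketched as a small decision function. This
is a hypothetical Python model rather than Pacemaker's implementation: the
``first_list_succeeds`` callback stands in for the fencer's first real
``list`` attempt, and the supported action names would come from the agent's
metadata.

.. code-block:: python

   from typing import Callable, Mapping, Sequence


   def default_host_check(params: Mapping[str, str],
                          supported_actions: Sequence[str],
                          first_list_succeeds: Callable[[], bool]) -> str:
       """Return the pcmk_host_check value implied by the rules above."""
       if "pcmk_host_check" in params:
           # An explicitly configured value is used as-is.
           return params["pcmk_host_check"]
       if "pcmk_host_list" in params or "pcmk_host_map" in params:
           return "static-list"
       # first_list_succeeds() is only called if the agent advertises "list".
       if "list" in supported_actions and first_list_succeeds():
           return "dynamic-list"
       if "status" in supported_actions:
           return "status"
       return "none"


   # fence_ipmilan's metadata (shown later in this chapter) lists no "list"
   # action, so without pcmk_host_list or pcmk_host_map the default is "status".
   ipmilan_actions = ["on", "off", "reboot", "status", "monitor"]
   print(default_host_check({}, ipmilan_actions, lambda: True))  # status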

.. index::
   single: unfencing
   single: fencing; unfencing

.. _unfencing:

Unfencing
#########

With fabric fencing (such as cutting network or shared disk access rather than
power), it is expected that the cluster will fence the node, and then a system
administrator must manually investigate what went wrong, correct any issues
found, and then reboot (or restart the cluster services on) the node.

Once the node reboots and rejoins the cluster, some fabric fencing devices
require an explicit command to restore the node's access. This capability is
called *unfencing* and is typically implemented as the fence agent's ``on``
command.

If any cluster resource has ``requires`` set to ``unfencing``, then that
resource will not be probed or started on a node until that node has been
unfenced.
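
The ``requires=unfencing`` rule can be modeled as a per-node gate. The sketch
below is a hypothetical Python illustration: ``NodeState`` and
``may_probe_or_start`` are invented names, and other ``requires`` values
(which have their own conditions) are deliberately not modeled.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class NodeState:
       name: str
       unfenced: bool = False  # set once the device's unfence ("on") succeeds


   def may_probe_or_start(requires: str, node: NodeState) -> bool:
       """Return whether a resource may be probed or started on node."""
       if requires == "unfencing":
           return node.unfenced
       return True  # simplification: other requires values are not checked


   node = NodeState("node1")  # rejoined after fabric fencing, not yet unfenced
   print(may_probe_or_start("unfencing", node))  # False
   node.unfenced = True       # the fence agent's "on" action has succeeded
   print(may_probe_or_start("unfencing", node))  # True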

Fencing and Quorum
##################

In general, a cluster partition may execute fencing only if the partition has
quorum and the ``stonith-enabled`` cluster property is set to true. However,
there are exceptions:

* The requirements apply only to fencing initiated by Pacemaker. If an
  administrator initiates fencing using the ``stonith_admin`` command, or an
  external application such as DLM initiates fencing using Pacemaker's C API,
  the requirements do not apply.

* A cluster partition without quorum is allowed to fence any active member of
  that partition. As a corollary, this allows a ``no-quorum-policy`` of
  ``suicide`` to work.

* If the ``no-quorum-policy`` cluster property is set to ``ignore``, then
  quorum is not required to execute fencing of any node.
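
The rules above can be summarized as a decision function. This is a
hypothetical Python sketch, not Pacemaker code. It assumes that the two
quorum-related exceptions relax only the quorum requirement, so fencing that
Pacemaker itself initiates still requires ``stonith-enabled`` to be true.

.. code-block:: python

   def pacemaker_may_fence(initiated_by_pacemaker: bool,
                           stonith_enabled: bool,
                           partition_has_quorum: bool,
                           target_is_active_member: bool,
                           no_quorum_policy: str) -> bool:
       """Return whether a partition may execute a given fencing request."""
       if not initiated_by_pacemaker:
           # stonith_admin or an external C API user such as DLM.
           return True
       if not stonith_enabled:
           return False  # assumption: no exception relaxes stonith-enabled
       if partition_has_quorum:
           return True
       if target_is_active_member:
           # Lets no-quorum-policy=suicide fence the partition's own members.
           return True
       # no-quorum-policy=ignore removes the quorum requirement entirely.
       return no_quorum_policy == "ignore"


   # An inquorate partition with no-quorum-policy=stop fencing an outside node:
   print(pacemaker_may_fence(True, True, False, False, "stop"))  # False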

Fencing Timeouts
################

Fencing timeouts are complicated, since a single fencing operation can involve
many steps, each of which may have a separate timeout.

Fencing may be initiated in one of several ways:

* An administrator may initiate fencing using the ``stonith_admin`` tool,
  which has a ``--timeout`` option (defaulting to 2 minutes) that will be used
  as the fence operation timeout.

* An external application such as DLM may initiate fencing using the Pacemaker
  C API. The application will specify the fence operation timeout in this case,
  which might or might not be configurable by the user.

* The cluster may initiate fencing itself. In this case, the
  ``stonith-timeout`` cluster property (defaulting to 1 minute) will be used as
  the fence operation timeout.

However fencing is initiated, the initiator contacts Pacemaker's fencer
(``pacemaker-fenced``) to request fencing. This connection and request have
their own timeout, separate from the fencing operation timeout, but usually
complete very quickly.

The fencer will contact all fencers in the cluster to ask what devices they
have available to fence the target node. The fence operation timeout will be
used as the timeout for each of these queries.

Once a fencing device has been selected, the fencer will check whether any
action-specific timeout has been configured for the device, to use instead of
the fence operation timeout. For example, if ``stonith-timeout`` is 60 seconds,
but the fencing device has ``pcmk_reboot_timeout`` configured as 90 seconds,
then a timeout of 90 seconds will be used for reboot actions using that device.

A device may have retries configured, in which case the timeout applies across
all attempts. For example, if a device has ``pcmk_reboot_retries`` configured
as 2, and the first reboot attempt fails, the second attempt will only have
whatever time is remaining in the action timeout after subtracting how much
time the first attempt used. This means that if the first attempt fails due to
using the entire timeout, no further attempts will be made. There is currently
no way to configure a per-attempt timeout.
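
The shared retry budget can be simulated in a few lines. In this hypothetical
Python sketch, ``attempts`` lists invented ``(duration, succeeded)`` results
for each attempt, and ``max_attempts`` is the total number of attempts
allowed; how a given ``pcmk_*_retries`` value maps onto that total is not
asserted here. The fencer's own timeout adjustments are ignored.

.. code-block:: python

   from typing import Sequence, Tuple


   def run_with_shared_budget(timeout_s: float,
                              attempts: Sequence[Tuple[float, bool]],
                              max_attempts: int) -> Tuple[int, bool]:
       """Return (attempts started, overall success) under one action timeout."""
       remaining = timeout_s
       started = 0
       for duration, succeeded in attempts[:max_attempts]:
           if remaining <= 0:
               break  # no time left, so no further attempt is made
           started += 1
           if duration >= remaining:
               return started, False  # this attempt consumed the rest of the budget
           remaining -= duration
           if succeeded:
               return started, True
       return started, False


   # 90-second timeout: a failed 50-second attempt leaves only 40 seconds, so a
   # second attempt that would need 45 seconds times out.
   print(run_with_shared_budget(90, [(50, False), (45, True)], 2))  # (2, False)
   # A first attempt that uses the entire timeout leaves nothing to retry with.
   print(run_with_shared_budget(90, [(90, False), (5, True)], 2))   # (1, False)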

If more than one device is required to fence a target, whether due to failure
of the first device or a fencing topology with multiple devices configured for
the target, each device will have its own separate action timeout.

For all of the above timeouts, the fencer will generally multiply the
configured value by 1.2 to obtain the value it actually uses, allowing for the
time needed by the fencer's own processing.
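
As a worked example of the per-device lookup and the 1.2 factor, the
hypothetical Python sketch below reuses the figures from the
``pcmk_reboot_timeout`` example above (a 60-second ``stonith-timeout`` and a
90-second device override). It assumes the values have already been parsed
into seconds and that the factor applies uniformly, although the text only
says the fencer "generally" applies it.

.. code-block:: python

   from typing import Mapping

   FENCER_FACTOR = 1.2  # multiplier described above


   def effective_timeout_s(action: str,
                           device_params: Mapping[str, float],
                           fence_op_timeout_s: float) -> float:
       """Timeout the fencer would use for one device action (hypothetical)."""
       configured = device_params.get(f"pcmk_{action}_timeout", fence_op_timeout_s)
       return configured * FENCER_FACTOR


   device = {"pcmk_reboot_timeout": 90}
   print(round(effective_timeout_s("reboot", device, 60)))  # 108
   print(round(effective_timeout_s("off", device, 60)))     # 72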

Separate from the fencer's timeouts, some fence agents have internal timeouts
for individual steps of their fencing process. These agents often have
parameters to configure these timeouts, such as ``login_timeout``,
``shell_timeout``, or ``power_timeout``. Many such agents also have a
``disable_timeout`` parameter to ignore their internal timeouts and just let
Pacemaker handle the timeout. This choice affects retry behavior.
If ``disable_timeout`` is not set, and the agent hits one of its internal
timeouts, it will report that as a failure to Pacemaker, which can then retry.
If ``disable_timeout`` is set, and Pacemaker hits a timeout for the agent, then
there will be no time remaining, and no retry will be done.

Fence Devices Dependent on Other Resources
##########################################

In some cases, a fence device may require some other cluster resource (such as
an IP address) to be active in order to function properly.

This is obviously undesirable in general: fencing may be required when the
depended-on resource is not active, or fencing may be required because the node
running the depended-on resource is no longer responding.

However, this may be acceptable under certain conditions:

* The dependent fence device should not be able to target any node that is
  allowed to run the depended-on resource.

* The depended-on resource should not be disabled during production operation.

* The ``concurrent-fencing`` cluster property should be set to ``true``.
  Otherwise, if both the node running the depended-on resource and some node
  targeted by the dependent fence device need to be fenced, the fencing of the
  node running the depended-on resource might be ordered first, making the
  second fencing impossible and blocking further recovery. With concurrent
  fencing, the dependent fence device might fail at first due to the
  depended-on resource being unavailable, but it will be retried and eventually
  succeed once the resource is brought back up.
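
These conditions lend themselves to a pre-deployment checklist. The following
is a hypothetical Python sketch: the node names are invented, and the inputs
(which nodes the device can target and which nodes may run the depended-on
resource) would have to be derived by the administrator from the actual
configuration. It does not cover the DC scenario described next.

.. code-block:: python

   from typing import Iterable, List


   def dependent_device_problems(device_targets: Iterable[str],
                                 depended_on_allowed_nodes: Iterable[str],
                                 depended_on_disabled: bool,
                                 concurrent_fencing: bool) -> List[str]:
       """Return the conditions above that are violated (empty list means OK)."""
       problems = []
       overlap = sorted(set(device_targets) & set(depended_on_allowed_nodes))
       if overlap:
           problems.append("device can target nodes allowed to run the "
                           "depended-on resource: " + ", ".join(overlap))
       if depended_on_disabled:
           problems.append("depended-on resource is disabled")
       if not concurrent_fencing:
           problems.append("concurrent-fencing is not set to true")
       return problems


   # The device fences node3 only; the depended-on resource may run on node1/2.
   print(dependent_device_problems(["node3"], ["node1", "node2"], False, True))  # []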

Even under those conditions, there is one unlikely problem scenario. The DC
always schedules fencing of itself after any other fencing needed, to avoid
unnecessary repeated DC elections. If the dependent fence device targets the
DC, and both the DC and a different node running the depended-on resource need
to be fenced, the DC fencing will always fail and block further recovery. Note,
however, that losing a DC node entirely causes some other node to become DC and
schedule the fencing, so this is only a risk when a stop or other operation
with ``on-fail`` set to ``fence`` fails on the DC.

.. index::
   single: fencing; configuration

Configuring Fencing
###################

Higher-level tools can provide simpler interfaces to this process, but this is
how you could configure a fence device using Pacemaker's command-line tools:

#. Find the correct driver:

   .. code-block:: none

      # stonith_admin --list-installed

   .. note::

      You may have to install packages to make fence agents available on your
      host. Searching your available packages for ``fence-`` is usually
      helpful. Ensure the packages providing the fence agents you require are
      installed on every cluster node.

#. Find the required parameters associated with the device
   (replacing ``$AGENT_NAME`` with the name obtained from the previous step):

   .. code-block:: none

      # stonith_admin --metadata --agent $AGENT_NAME

#. Create a file called ``stonith.xml`` containing a primitive resource
   with a class of ``stonith``, a type equal to the agent name obtained earlier,
   and a parameter for each of the values returned in the previous step.

#. If the device does not know how to fence nodes based on their uname,
   you may also need to set the special ``pcmk_host_map`` parameter. See
   :ref:`fencing-attributes` for details.

#. If the device does not support the ``list`` command, you may also need
   to set the special ``pcmk_host_list`` and/or ``pcmk_host_check``
   parameters. See :ref:`fencing-attributes` for details.

#. If the device does not expect the victim to be specified with the
   ``port`` parameter, you may also need to set the special
   ``pcmk_host_argument`` parameter. See :ref:`fencing-attributes` for details.

#. Upload the file into the CIB using ``cibadmin``:

   .. code-block:: none

      # cibadmin --create --scope resources --xml-file stonith.xml

#. Set ``stonith-enabled`` to true:

   .. code-block:: none

      # crm_attribute --type crm_config --name stonith-enabled --update true

#. Once the stonith resource is running, you can test it by executing the
   following, replacing ``$NODE_NAME`` with the name of the node to fence
   (although you might want to stop the cluster on that machine first):

   .. code-block:: none

      # stonith_admin --reboot $NODE_NAME
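
Step 3 can also be scripted. The sketch below is a hypothetical Python helper
built on the standard library's ``xml.etree.ElementTree``. The resource ID,
the agent name ``fence_example``, and the parameter values in the usage
example are placeholders, and the ``id`` naming scheme is an arbitrary choice
(IDs only need to be unique within the CIB). Operations such as a recurring
monitor are not generated.

.. code-block:: python

   import xml.etree.ElementTree as ET
   from typing import Mapping


   def stonith_primitive_xml(rsc_id: str, agent: str,
                             params: Mapping[str, str]) -> str:
       """Build a <primitive class="stonith"> element as an XML string."""
       primitive = ET.Element("primitive",
                              {"id": rsc_id, "class": "stonith", "type": agent})
       attrs = ET.SubElement(primitive, "instance_attributes",
                             {"id": f"{rsc_id}-params"})
       for name, value in params.items():
           # Derive a unique nvpair id from the resource id and parameter name.
           ET.SubElement(attrs, "nvpair", {"id": f"{rsc_id}-params-{name}",
                                           "name": name, "value": value})
       return ET.tostring(primitive, encoding="unicode")


   xml_text = stonith_primitive_xml("my-fence", "fence_example",
                                    {"ip": "192.0.2.1", "pcmk_host_list": "node1"})
   print(xml_text)

Writing ``xml_text`` to ``stonith.xml`` would then let the ``cibadmin`` command
from the upload step load it into the CIB.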

Example Fencing Configuration
_____________________________

For this example, we assume we have a cluster node, ``pcmk-1``, whose IPMI
controller is reachable at the IP address 192.0.2.1. The IPMI controller uses
the username ``testuser`` and the password ``abc123``.

#. Looking at what's installed, we may see a variety of available agents:

   .. code-block:: none

      # stonith_admin --list-installed

   .. code-block:: none

      (... some output omitted ...)
      fence_idrac
      fence_ilo3
      fence_ilo4
      fence_ilo5
      fence_imm
      fence_ipmilan
      (... some output omitted ...)

   Perhaps after reading some man pages and doing some Internet searches,
   we might decide ``fence_ipmilan`` is our best choice.

#. Next, we would check what parameters ``fence_ipmilan`` provides:

   .. code-block:: none

      # stonith_admin --metadata -a fence_ipmilan

   .. code-block:: xml

      <resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI">
        <symlink name="fence_ilo3" shortdesc="Fence agent for HP iLO3"/>
        <symlink name="fence_ilo4" shortdesc="Fence agent for HP iLO4"/>
        <symlink name="fence_ilo5" shortdesc="Fence agent for HP iLO5"/>
        <symlink name="fence_imm" shortdesc="Fence agent for IBM Integrated Management Module"/>
        <symlink name="fence_idrac" shortdesc="Fence agent for Dell iDRAC"/>
        <longdesc>fence_ipmilan is an I/O Fencing agentwhich can be used with machines controlled by IPMI.This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option.</longdesc>
        <vendor-url/>
        <parameters>
          <parameter name="action" unique="0" required="0">
            <getopt mixed="-o, --action=[action]"/>
            <content type="string" default="reboot"/>
            <shortdesc lang="en">Fencing action</shortdesc>
          </parameter>
          <parameter name="auth" unique="0" required="0">
            <getopt mixed="-A, --auth=[auth]"/>
            <content type="select">
              <option value="md5"/>
              <option value="password"/>
              <option value="none"/>
            </content>
            <shortdesc lang="en">IPMI Lan Auth type.</shortdesc>
          </parameter>
          <parameter name="cipher" unique="0" required="0">
            <getopt mixed="-C, --cipher=[cipher]"/>
            <content type="string"/>
            <shortdesc lang="en">Ciphersuite to use (same as ipmitool -C parameter)</shortdesc>
          </parameter>
          <parameter name="hexadecimal_kg" unique="0" required="0">
            <getopt mixed="--hexadecimal-kg=[key]"/>
            <content type="string"/>
            <shortdesc lang="en">Hexadecimal-encoded Kg key for IPMIv2 authentication</shortdesc>
          </parameter>
          <parameter name="ip" unique="0" required="0" obsoletes="ipaddr">
            <getopt mixed="-a, --ip=[ip]"/>
            <content type="string"/>
            <shortdesc lang="en">IP address or hostname of fencing device</shortdesc>
          </parameter>
          <parameter name="ipaddr" unique="0" required="0" deprecated="1">
            <getopt mixed="-a, --ip=[ip]"/>
            <content type="string"/>
            <shortdesc lang="en">IP address or hostname of fencing device</shortdesc>
          </parameter>
          <parameter name="ipport" unique="0" required="0">
            <getopt mixed="-u, --ipport=[port]"/>
            <content type="integer" default="623"/>
            <shortdesc lang="en">TCP/UDP port to use for connection with device</shortdesc>
          </parameter>
          <parameter name="lanplus" unique="0" required="0">
            <getopt mixed="-P, --lanplus"/>
            <content type="boolean" default="0"/>
            <shortdesc lang="en">Use Lanplus to improve security of connection</shortdesc>
          </parameter>
          <parameter name="login" unique="0" required="0" deprecated="1">
            <getopt mixed="-l, --username=[name]"/>
            <content type="string"/>
            <shortdesc lang="en">Login name</shortdesc>
          </parameter>
          <parameter name="method" unique="0" required="0">
            <getopt mixed="-m, --method=[method]"/>
            <content type="select" default="onoff">
              <option value="onoff"/>
              <option value="cycle"/>
            </content>
            <shortdesc lang="en">Method to fence</shortdesc>
          </parameter>
          <parameter name="passwd" unique="0" required="0" deprecated="1">
            <getopt mixed="-p, --password=[password]"/>
            <content type="string"/>
            <shortdesc lang="en">Login password or passphrase</shortdesc>
          </parameter>
          <parameter name="passwd_script" unique="0" required="0" deprecated="1">
            <getopt mixed="-S, --password-script=[script]"/>
            <content type="string"/>
            <shortdesc lang="en">Script to run to retrieve password</shortdesc>
          </parameter>
          <parameter name="password" unique="0" required="0" obsoletes="passwd">
            <getopt mixed="-p, --password=[password]"/>
            <content type="string"/>
            <shortdesc lang="en">Login password or passphrase</shortdesc>
          </parameter>
          <parameter name="password_script" unique="0" required="0" obsoletes="passwd_script">
            <getopt mixed="-S, --password-script=[script]"/>
            <content type="string"/>
            <shortdesc lang="en">Script to run to retrieve password</shortdesc>
          </parameter>
          <parameter name="plug" unique="0" required="0" obsoletes="port">
            <getopt mixed="-n, --plug=[ip]"/>
            <content type="string"/>
            <shortdesc lang="en">IP address or hostname of fencing device (together with --port-as-ip)</shortdesc>
          </parameter>
          <parameter name="port" unique="0" required="0" deprecated="1">
            <getopt mixed="-n, --plug=[ip]"/>
            <content type="string"/>
            <shortdesc lang="en">IP address or hostname of fencing device (together with --port-as-ip)</shortdesc>
          </parameter>
          <parameter name="privlvl" unique="0" required="0">
            <getopt mixed="-L, --privlvl=[level]"/>
            <content type="select" default="administrator">
              <option value="callback"/>
              <option value="user"/>
              <option value="operator"/>
              <option value="administrator"/>
            </content>
            <shortdesc lang="en">Privilege level on IPMI device</shortdesc>
          </parameter>
          <parameter name="target" unique="0" required="0">
            <getopt mixed="--target=[targetaddress]"/>
            <content type="string"/>
            <shortdesc lang="en">Bridge IPMI requests to the remote target address</shortdesc>
          </parameter>
          <parameter name="username" unique="0" required="0" obsoletes="login">
            <getopt mixed="-l, --username=[name]"/>
            <content type="string"/>
            <shortdesc lang="en">Login name</shortdesc>
          </parameter>
          <parameter name="quiet" unique="0" required="0">
            <getopt mixed="-q, --quiet"/>
            <content type="boolean"/>
            <shortdesc lang="en">Disable logging to stderr. Does not affect --verbose or --debug-file or logging to syslog.</shortdesc>
          </parameter>
          <parameter name="verbose" unique="0" required="0">
            <getopt mixed="-v, --verbose"/>
            <content type="boolean"/>
            <shortdesc lang="en">Verbose mode</shortdesc>
          </parameter>
          <parameter name="debug" unique="0" required="0" deprecated="1">
            <getopt mixed="-D, --debug-file=[debugfile]"/>
            <content type="string"/>
            <shortdesc lang="en">Write debug information to given file</shortdesc>
          </parameter>
          <parameter name="debug_file" unique="0" required="0" obsoletes="debug">
            <getopt mixed="-D, --debug-file=[debugfile]"/>
            <content type="string"/>
            <shortdesc lang="en">Write debug information to given file</shortdesc>
          </parameter>
          <parameter name="version" unique="0" required="0">
            <getopt mixed="-V, --version"/>
            <content type="boolean"/>
            <shortdesc lang="en">Display version information and exit</shortdesc>
          </parameter>
          <parameter name="help" unique="0" required="0">
            <getopt mixed="-h, --help"/>
            <content type="boolean"/>
            <shortdesc lang="en">Display help and exit</shortdesc>
          </parameter>
          <parameter name="delay" unique="0" required="0">
            <getopt mixed="--delay=[seconds]"/>
            <content type="second" default="0"/>
            <shortdesc lang="en">Wait X seconds before fencing is started</shortdesc>
          </parameter>
          <parameter name="ipmitool_path" unique="0" required="0">
            <getopt mixed="--ipmitool-path=[path]"/>
            <content type="string" default="/usr/bin/ipmitool"/>
            <shortdesc lang="en">Path to ipmitool binary</shortdesc>
          </parameter>
          <parameter name="login_timeout" unique="0" required="0">
            <getopt mixed="--login-timeout=[seconds]"/>
            <content type="second" default="5"/>
            <shortdesc lang="en">Wait X seconds for cmd prompt after login</shortdesc>
          </parameter>
          <parameter name="port_as_ip" unique="0" required="0">
            <getopt mixed="--port-as-ip"/>
            <content type="boolean"/>
            <shortdesc lang="en">Make "port/plug" to be an alias to IP address</shortdesc>
          </parameter>
          <parameter name="power_timeout" unique="0" required="0">
            <getopt mixed="--power-timeout=[seconds]"/>
            <content type="second" default="20"/>
            <shortdesc lang="en">Test X seconds for status change after ON/OFF</shortdesc>
          </parameter>
          <parameter name="power_wait" unique="0" required="0">
            <getopt mixed="--power-wait=[seconds]"/>
            <content type="second" default="2"/>
            <shortdesc lang="en">Wait X seconds after issuing ON/OFF</shortdesc>
          </parameter>
          <parameter name="shell_timeout" unique="0" required="0">
            <getopt mixed="--shell-timeout=[seconds]"/>
            <content type="second" default="3"/>
            <shortdesc lang="en">Wait X seconds for cmd prompt after issuing command</shortdesc>
          </parameter>
          <parameter name="retry_on" unique="0" required="0">
            <getopt mixed="--retry-on=[attempts]"/>
            <content type="integer" default="1"/>
            <shortdesc lang="en">Count of attempts to retry power on</shortdesc>
          </parameter>
          <parameter name="sudo" unique="0" required="0" deprecated="1">
            <getopt mixed="--use-sudo"/>
            <content type="boolean"/>
            <shortdesc lang="en">Use sudo (without password) when calling 3rd party software</shortdesc>
          </parameter>
          <parameter name="use_sudo" unique="0" required="0" obsoletes="sudo">
            <getopt mixed="--use-sudo"/>
            <content type="boolean"/>
            <shortdesc lang="en">Use sudo (without password) when calling 3rd party software</shortdesc>
          </parameter>
          <parameter name="sudo_path" unique="0" required="0">
            <getopt mixed="--sudo-path=[path]"/>
            <content type="string" default="/usr/bin/sudo"/>
            <shortdesc lang="en">Path to sudo binary</shortdesc>
          </parameter>
        </parameters>
        <actions>
          <action name="on" automatic="0"/>
          <action name="off"/>
          <action name="reboot"/>
          <action name="status"/>
          <action name="monitor"/>
          <action name="metadata"/>
          <action name="manpage"/>
          <action name="validate-all"/>
          <action name="diag"/>
          <action name="stop" timeout="20s"/>
          <action name="start" timeout="20s"/>
        </actions>
      </resource-agent>

   Once we've decided what parameter values we think we need, it is a good idea
   to run the fence agent's status action manually, to verify that our values
   work correctly:

   .. code-block:: none

      # fence_ipmilan --lanplus -a 192.0.2.1 -l testuser -p abc123 -o status

      Chassis Power is on
978
#. Based on that, we might create a fencing resource configuration like this in
   ``stonith.xml`` (or any other file name, as long as the same name is used with
   ``cibadmin`` later):

   .. code-block:: xml

      <primitive id="Fencing-pcmk-1" class="stonith" type="fence_ipmilan" >
        <instance_attributes id="Fencing-params" >
          <nvpair id="Fencing-lanplus" name="lanplus" value="1" />
          <nvpair id="Fencing-ip" name="ip" value="192.0.2.1" />
          <nvpair id="Fencing-password" name="password" value="abc123" />
          <nvpair id="Fencing-username" name="username" value="testuser" />
        </instance_attributes>
        <operations >
          <op id="Fencing-monitor-10m" interval="10m" name="monitor" timeout="300s" />
        </operations>
      </primitive>

   .. note::

      Even though the man page shows that the ``action`` parameter is
      supported, we do not provide that in the resource configuration.
      Pacemaker will supply an appropriate action whenever the fence device
      must be used.

#. In this case, we don't need to configure ``pcmk_host_map`` because
   ``fence_ipmilan`` ignores the target node name and instead uses its
   ``ip`` parameter to know how to contact the IPMI controller.

#. We do need to let Pacemaker know which cluster node can be fenced by this
   device, since ``fence_ipmilan`` doesn't support the ``list`` action. Add
   a line like this to the agent's instance attributes:

   .. code-block:: xml

          <nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="pcmk-1" />

#. We don't need to configure ``pcmk_host_argument`` since ``ip`` is all the
   fence agent needs (it ignores the target name).

#. Make the configuration active:

   .. code-block:: none

      # cibadmin --create --scope resources --xml-file stonith.xml

#. Set ``stonith-enabled`` to true (this only has to be done once):

   .. code-block:: none

      # crm_attribute --type crm_config --name stonith-enabled --update true

#. Since our cluster is still in testing, we can reboot ``pcmk-1`` without
   bothering anyone, so we'll test our fencing configuration by running this
   from one of the other cluster nodes:

   .. code-block:: none

      # stonith_admin --reboot pcmk-1

   Then we will verify that the node did, in fact, reboot.

We can repeat that process to create a separate fencing resource for each node.

With some other fence device types, a single fencing resource can be used for
all nodes. In fact, we could do that with ``fence_ipmilan`` too, by using the
``port-as-ip`` parameter along with ``pcmk_host_map``. Either approach is
fine.

.. index::
   single: fencing; topology
   single: fencing-topology
   single: fencing-level

Fencing Topologies
##################

Pacemaker supports fencing nodes with multiple devices through a feature called
*fencing topologies*. Fencing topologies may be used to provide alternative
devices in case one fails, to require that multiple devices all succeed before
the node is considered fenced, or a combination of the two.

Create the individual devices as you normally would, then define one or more
``fencing-level`` entries in the ``fencing-topology`` section of the
configuration.

* Each fencing level is attempted in order of ascending ``index``. Allowed
  values are 1 through 9.
* If a device fails, processing terminates for the current level. No further
  devices in that level are exercised, and the next level is attempted instead.
* If the operation succeeds for all the listed devices in a level, the level is
  deemed to have passed.
* The operation is finished when a level has passed (success), or all levels
  have been attempted (failed).
* If the operation failed, the next step is determined by the scheduler and/or
  the controller.

Some possible uses of topologies include:

* Try on-board IPMI, then an intelligent power switch if that fails
* Try fabric fencing of both disk and network, then fall back to power fencing
  if either fails
* Wait up to a certain time for a kernel dump to complete, then cut power to
  the node
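
The level-processing rules above can be modelled in a few lines. The following
is a minimal Python sketch, not Pacemaker's implementation: the function
``run_topology`` is hypothetical, and it assumes each fence device can be
represented as a callable that returns ``True`` when it fences the target
successfully. Real fencing also involves actions, timeouts and retries, which
are not modelled here.

.. code-block:: python

   from typing import Callable, Dict, List

   # Hypothetical model: a fence device is a callable that takes the target
   # node name and returns True if fencing succeeded.
   Device = Callable[[str], bool]


   def run_topology(target: str,
                    levels: Dict[int, List[str]],
                    devices: Dict[str, Device]) -> bool:
       """Try each fencing level in ascending index order.

       Returns True as soon as one level passes, or False if every level
       was attempted and none passed.
       """
       for index in levels:
           if not 1 <= index <= 9:
               raise ValueError(f"fencing-level index must be 1-9, got {index}")

       for index in sorted(levels):
           # all() stops at the first device that fails, so no further
           # devices in this level are exercised after a failure.
           if all(devices[device_id](target) for device_id in levels[index]):
               return True
       return False


   if __name__ == "__main__":
       attempted = []

       def fake_device(name: str, works: bool) -> Device:
           def run(target: str) -> bool:
               attempted.append(name)
               return works
           return run

       devices = {"ipmi": fake_device("ipmi", False),
                  "switch": fake_device("switch", True)}
       # Level 1: on-board IPMI; level 2: intelligent power switch.
       print(run_topology("pcmk-1", {1: ["ipmi"], 2: ["switch"]}, devices))
       print(attempted)

Run as a script, this prints ``True`` followed by ``['ipmi', 'switch']``: the
IPMI level fails, so the switch level is tried and passes.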

.. table:: **Attributes of a fencing-level Element**

   +------------------+-----------------------------------------------------------------------------------------+
   | Attribute        | Description                                                                             |
   +==================+=========================================================================================+
   | id               | .. index::                                                                              |
   |                  |    pair: fencing-level; id                                                              |
   |                  |                                                                                         |
   |                  | A unique name for this element (required)                                               |
   +------------------+-----------------------------------------------------------------------------------------+
   | target           | .. index::                                                                              |
   |                  |    pair: fencing-level; target                                                          |
   |                  |                                                                                         |
   |                  | The name of a single node to which this level applies                                   |
   +------------------+-----------------------------------------------------------------------------------------+
   | target-pattern   | .. index::                                                                              |
   |                  |    pair: fencing-level; target-pattern                                                  |
   |                  |                                                                                         |
   |                  | An extended regular expression (as defined in `POSIX                                    |
   |                  | <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04>`_) |
   |                  | matching the names of nodes to which this level applies                                 |
   +------------------+-----------------------------------------------------------------------------------------+
   | target-attribute | .. index::                                                                              |
   |                  |    pair: fencing-level; target-attribute                                                |
   |                  |                                                                                         |
   |                  | The name of a node attribute that is set (to ``target-value``) for nodes to which this  |
   |                  | level applies                                                                           |
   +------------------+-----------------------------------------------------------------------------------------+
   | target-value     | .. index::                                                                              |
   |                  |    pair: fencing-level; target-value                                                    |
   |                  |                                                                                         |
   |                  | The node attribute value (of ``target-attribute``) that is set for nodes to which this  |
   |                  | level applies                                                                           |
   +------------------+-----------------------------------------------------------------------------------------+
   | index            | .. index::                                                                              |
   |                  |    pair: fencing-level; index                                                           |
   |                  |                                                                                         |
   |                  | The order in which to attempt the levels. Levels are attempted in ascending order       |
   |                  | *until one succeeds*. Valid values are 1 through 9.                                     |
   +------------------+-----------------------------------------------------------------------------------------+
   | devices          | .. index::                                                                              |
   |                  |    pair: fencing-level; devices                                                         |
   |                  |                                                                                         |
   |                  | A comma-separated list of devices that must all be tried for this level                 |
   +------------------+-----------------------------------------------------------------------------------------+
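
To make the three ways of selecting nodes concrete, here is a hypothetical
Python helper, ``level_applies``, that decides whether a single
``fencing-level`` covers a node. It assumes exactly one of ``target``,
``target-pattern`` or ``target-attribute`` is set. It uses Python's
``re.search()`` to approximate POSIX extended regular expressions; the two
syntaxes overlap for simple patterns but are not identical. Like POSIX
``regexec()``, ``re.search()`` finds a match anywhere in the name, so the
example pattern is anchored with ``^`` and ``$``. The ``rack`` node attribute
used below is purely illustrative.

.. code-block:: python

   import re
   from typing import Dict, Optional


   def level_applies(node: str,
                     node_attrs: Dict[str, str],
                     target: Optional[str] = None,
                     target_pattern: Optional[str] = None,
                     target_attribute: Optional[str] = None,
                     target_value: Optional[str] = None) -> bool:
       """Return True if a fencing-level with these settings covers `node`.

       Hypothetical helper; `node_attrs` holds the node's attributes.
       """
       if target is not None:
           # target: exact node name
           return node == target
       if target_pattern is not None:
           # target-pattern: regular expression searched in the node name
           return re.search(target_pattern, node) is not None
       if target_attribute is not None:
           # target-attribute/target-value: the node attribute must match
           return node_attrs.get(target_attribute) == target_value
       raise ValueError("set target, target_pattern, or target_attribute")


   if __name__ == "__main__":
       print(level_applies("prod-mysql2", {},
                           target_pattern=r"^prod-mysql[0-9]+$"))
       print(level_applies("pcmk-3", {"rack": "r2"},
                           target_attribute="rack", target_value="r1"))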

.. note:: **Fencing topology with different devices for different nodes**

   .. code-block:: xml

      <cib crm_feature_set="3.6.0" validate-with="pacemaker-3.5" admin_epoch="1" epoch="0" num_updates="0">
        <configuration>
          ...
          <fencing-topology>
            <!-- For pcmk-1, try poison-pill and fall back to power -->
            <fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
            <fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>

            <!-- For pcmk-2, try disk and network, and fall back to power -->
            <fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
            <fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
          </fencing-topology>
          ...
        </configuration>
        <status/>
      </cib>

Example Dual-Layer, Dual-Device Fencing Topologies
__________________________________________________

The following example illustrates an advanced use of ``fencing-topology`` in a
cluster with the following properties:

* 2 nodes (prod-mysql1 and prod-mysql2)
* the nodes have IPMI controllers reachable at 192.0.2.1 and 192.0.2.2
* the nodes each have two independent Power Supply Units (PSUs) connected to
  two independent Power Distribution Units (PDUs) reachable at 198.51.100.1
  (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
* fencing via the IPMI controller uses the ``fence_ipmilan`` agent (1 fence device
  per controller, with each device targeting a separate node)
* fencing via the PDUs uses the ``fence_apc_snmp`` agent (1 fence device per
  PDU, with both devices targeting both nodes)
* a random delay (configured with ``pcmk_delay_max``) is used to lessen the
  chance of a "death match"
* fencing topology is set to try IPMI fencing first, then dual PDU fencing if
  that fails

In a node failure scenario, Pacemaker will first select ``fence_ipmilan`` to
try to kill the faulty node. Using the fencing topology, if that method fails,
it will then move on to selecting ``fence_apc_snmp`` twice (once for the first
PDU, then again for the second PDU).

The fence action is considered successful only if both PDUs report the required
status. If either of them fails, fencing loops back to the first fencing method,
``fence_ipmilan``, and so on, until the node is fenced or the fencing action is
cancelled.

.. note:: **First fencing method: single IPMI device per target**

   Each cluster node has its own dedicated IPMI controller that can be contacted
   for fencing using the following primitives:

   .. code-block:: xml

      <primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
        <instance_attributes id="fence_prod-mysql1_ipmi-instance_attributes">
          <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.1"/>
          <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-login" name="login" value="fencing"/>
          <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
          <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
          <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
          <nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
        <instance_attributes id="fence_prod-mysql2_ipmi-instance_attributes">
          <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.2"/>
          <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-login" name="login" value="fencing"/>
          <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
          <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
          <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
          <nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
        </instance_attributes>
      </primitive>

.. note:: **Second fencing method: dual PDU devices**

   Each cluster node also has 2 distinct power supplies controlled by 2
   distinct PDUs:

   * Node 1: PDU 1 port 10 and PDU 2 port 10
   * Node 2: PDU 1 port 11 and PDU 2 port 11

   The matching fencing devices are configured as follows:

   .. code-block:: xml

      <primitive class="stonith" id="fence_apc1" type="fence_apc_snmp">
        <instance_attributes id="fence_apc1-instance_attributes">
          <nvpair id="fence_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
          <nvpair id="fence_apc1-instance_attributes-login" name="login" value="fencing"/>
          <nvpair id="fence_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
          <nvpair id="fence_apc1-instance_attributes-pcmk_host_map"
             name="pcmk_host_map" value="prod-mysql1:10;prod-mysql2:11"/>
          <nvpair id="fence_apc1-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_apc2" type="fence_apc_snmp">
        <instance_attributes id="fence_apc2-instance_attributes">
          <nvpair id="fence_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
          <nvpair id="fence_apc2-instance_attributes-login" name="login" value="fencing"/>
          <nvpair id="fence_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
          <nvpair id="fence_apc2-instance_attributes-pcmk_host_map"
             name="pcmk_host_map" value="prod-mysql1:10;prod-mysql2:11"/>
          <nvpair id="fence_apc2-instance_attributes-pcmk_delay_max" name="pcmk_delay_max" value="8s"/>
        </instance_attributes>
      </primitive>
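
Both PDU devices above share the same ``pcmk_host_map`` value, which maps each
node name to the PDU port that feeds it. The following hypothetical Python
helper, ``parse_host_map``, sketches how such a value resolves a target node
to a port. It only handles the simple ``node:port;node:port`` form used in this
example; it is not Pacemaker's own parser, which may accept other syntax.

.. code-block:: python

   from typing import Dict


   def parse_host_map(value: str) -> Dict[str, str]:
       """Turn "node:port;node:port" into {node: port} (simple form only)."""
       mapping: Dict[str, str] = {}
       for entry in value.split(";"):
           entry = entry.strip()
           if not entry:
               continue  # tolerate an empty entry, e.g. a trailing ";"
           node, sep, port = entry.partition(":")
           if not sep or not node or not port:
               raise ValueError(f"malformed host map entry: {entry!r}")
           mapping[node] = port
       return mapping


   if __name__ == "__main__":
       host_map = parse_host_map("prod-mysql1:10;prod-mysql2:11")
       # The same map is used by fence_apc1 and fence_apc2, so fencing
       # prod-mysql1 acts on port 10 of both PDUs.
       print(host_map["prod-mysql1"])  # 10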

.. note:: **Fencing topology**

   Now that all the fencing resources are defined, it's time to create the
   right topology. We want to fence using IPMI first and, if that does not work,
   fall back to fencing through both PDUs, so that both power supplies lose power.

   .. code-block:: xml

      <fencing-topology>
        <fencing-level id="level-1-1" target="prod-mysql1" index="1" devices="fence_prod-mysql1_ipmi" />
        <fencing-level id="level-1-2" target="prod-mysql1" index="2" devices="fence_apc1,fence_apc2"  />
        <fencing-level id="level-2-1" target="prod-mysql2" index="1" devices="fence_prod-mysql2_ipmi" />
        <fencing-level id="level-2-2" target="prod-mysql2" index="2" devices="fence_apc1,fence_apc2"  />
      </fencing-topology>

   In ``fencing-topology``, the lowest ``index`` value for a target determines
   its first fencing method.

Remapping Reboots
#################

When the cluster needs to reboot a node, whether because ``stonith-action`` is
``reboot`` or because a reboot was requested externally (such as by
``stonith_admin --reboot``), it will remap that to other commands in two cases:

* If the chosen fencing device does not support the ``reboot`` command, the
  cluster will ask it to perform ``off`` instead.

* If a fencing topology level with multiple devices must be executed, the
  cluster will ask all the devices to perform ``off``, then ask the devices to
  perform ``on``.

To understand the second case, consider the example of a node with redundant
power supplies connected to intelligent power switches. Rebooting one switch
and then the other would have no effect on the node, because the other power
supply would keep it running the whole time. Turning both switches off, and
then on, actually reboots the node.

In such a case, the fencing operation will be treated as successful as long as
the ``off`` commands succeed, because then it is safe for the cluster to
recover any resources that were on the node. Timeouts and errors in the ``on``
phase will be logged but ignored.
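
The remapping rules and the success criterion above can be summarized in a
short Python sketch. Everything in it is hypothetical: ``plan_reboot`` and
``reboot_succeeded`` are illustrative names, ``supported`` stands in for the
actions a fence agent advertises in its metadata, and action-specific timeouts
are not modelled.

.. code-block:: python

   from typing import Dict, List, Set, Tuple

   Step = Tuple[str, str]  # (device ID, action)


   def plan_reboot(level_devices: List[str],
                   supported: Dict[str, Set[str]]) -> List[Step]:
       """Return the steps a reboot request is remapped to for one level."""
       if not level_devices:
           raise ValueError("a fencing level needs at least one device")
       if len(level_devices) > 1:
           # Several devices: turn them all off, then turn them all back on.
           return ([(dev, "off") for dev in level_devices] +
                   [(dev, "on") for dev in level_devices])
       dev = level_devices[0]
       # A single device that cannot reboot is asked to power off instead.
       return [(dev, "reboot" if "reboot" in supported[dev] else "off")]


   def reboot_succeeded(results: Dict[Step, bool]) -> bool:
       """Only non-"on" steps decide success; "on" failures are ignored."""
       return all(ok for (_dev, action), ok in results.items() if action != "on")


   if __name__ == "__main__":
       print(plan_reboot(["fence_apc1", "fence_apc2"],
                         {"fence_apc1": {"reboot", "off", "on"},
                          "fence_apc2": {"reboot", "off", "on"}}))

Because success is decided only by the ``off`` (or ``reboot``) steps, a failed
``on`` step leaves the result of ``reboot_succeeded`` unchanged in this model.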

When a reboot operation is remapped, any action-specific timeout for the
remapped action will be used (for example, ``pcmk_off_timeout`` will be used
when executing the ``off`` command, not ``pcmk_reboot_timeout``).
