<refentry id="repmgr-node-check">
  <indexterm>
    <primary>repmgr node check</primary>
  </indexterm>

  <refmeta>
    <refentrytitle>repmgr node check</refentrytitle>
  </refmeta>

  <refnamediv>
    <refname>repmgr node check</refname>
    <refpurpose>performs some health checks on a node from a replication perspective</refpurpose>
  </refnamediv>

  <refsect1>
    <title>Description</title>
    <para>
      Performs a series of health checks on a node from a replication perspective.
      This command must be run on the node being checked.
    </para>
    <note>
      <para>
        Currently, &repmgr; checks physical replication slots only, with the aim
        of warning about streaming replication standbys which have become detached,
        and about the associated risk of uncontrolled WAL file growth.
      </para>
    </note>
  </refsect1>

  <refsect1>
    <title>Example</title>
    <para>
      Execution on the primary server:
      <programlisting>
       $ repmgr -f /etc/repmgr.conf node check
       Node "node1":
            Server role: OK (node is primary)
            Replication lag: OK (N/A - node is primary)
            WAL archiving: OK (0 pending files)
            Upstream connection: OK (N/A - is primary)
            Downstream servers: OK (2 of 2 downstream nodes attached)
            Replication slots: OK (node has no physical replication slots)
            Missing replication slots: OK (node has no missing physical replication slots)
            Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")</programlisting>
    </para>
    <para>
      Execution on a standby server:
      <programlisting>
       $ repmgr -f /etc/repmgr.conf node check
       Node "node2":
            Server role: OK (node is standby)
            Replication lag: OK (0 seconds)
            WAL archiving: OK (0 pending archive ready files)
            Upstream connection: OK (node "node2" (ID: 2) is attached to expected upstream node "node1" (ID: 1))
            Downstream servers: OK (this node has no downstream nodes)
            Replication slots: OK (node has no physical replication slots)
            Missing physical replication slots: OK (node has no missing physical replication slots)
            Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")</programlisting>
    </para>
  </refsect1>
  <refsect1>
    <title>Individual checks</title>
    <para>
      Each check can be performed individually by supplying
      an additional command line parameter, e.g.:
      <programlisting>
        $ repmgr node check --role
        OK (node is primary)</programlisting>
    </para>
    <para>
      Parameters for individual checks are as follows:
    <itemizedlist spacing="compact" mark="bullet">

     <listitem>
      <simpara>
        <option>--role</option>: checks whether the node has the expected role
      </simpara>
     </listitem>

     <listitem>
      <simpara>
        <option>--replication-lag</option>: checks whether the node is lagging by more than
        <varname>replication_lag_warning</varname> or <varname>replication_lag_critical</varname>
        seconds, and returns <literal>WARNING</literal> or <literal>CRITICAL</literal> respectively
      </simpara>
     </listitem>

     <listitem>
      <simpara>
        <option>--archive-ready</option>: checks for WAL files which have not yet been archived,
        and returns <literal>WARNING</literal> or <literal>CRITICAL</literal> if the number
        exceeds <varname>archive_ready_warning</varname> or <varname>archive_ready_critical</varname> respectively.
      </simpara>
     </listitem>

     <listitem>
      <simpara>
        <option>--downstream</option>: checks that the expected downstream nodes are attached
      </simpara>
     </listitem>

     <listitem>
      <simpara>
        <option>--upstream</option>: checks that the node is attached to its expected upstream node
      </simpara>
     </listitem>

     <listitem>
      <simpara>
        <option>--slots</option>: checks that there are no inactive physical replication slots
      </simpara>
     </listitem>

     <listitem>
      <simpara>
        <option>--missing-slots</option>: checks that there are no missing physical replication slots
      </simpara>
     </listitem>

     <listitem>
      <simpara>
        <option>--data-directory-config</option>: checks that the data directory configured in
        <filename>repmgr.conf</filename> matches the actual data directory.
        This check is not directly related to replication, but is useful to verify that &repmgr;
        is correctly configured.
      </simpara>
     </listitem>

    </itemizedlist>
  </para>
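    <para>
      As a sketch of how the individual checks might be scripted, the following
      shell function runs each check listed above in turn and prints its output
      together with its exit code. The function name <literal>run_individual_checks</literal>
      and the <literal>REPMGR_CONF</literal> environment variable are hypothetical;
      the default configuration file path <filename>/etc/repmgr.conf</filename> is
      taken from the examples above and may differ on your system.
      <programlisting><![CDATA[
# Hypothetical helper (not part of repmgr): run each individual check in turn.
# Assumes "repmgr" is on the PATH; set REPMGR_CONF to override the
# configuration file location.
run_individual_checks() {
    conf=${REPMGR_CONF:-/etc/repmgr.conf}
    worst=0
    for opt in --role --replication-lag --archive-ready --downstream \
               --upstream --slots --missing-slots --data-directory-config; do
        # Capture the one-line status message; $? is then the check's exit code
        output=$(repmgr -f "$conf" node check "$opt")
        rc=$?
        printf '%s: %s (exit code %s)\n' "$opt" "$output" "$rc"
        # Track the highest exit code seen across all checks
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}
]]></programlisting>
    </para>
    <para>
      The function returns the numerically highest exit code encountered. Note that
      this simple rule ranks an exit code of 3 above an exit code of 2, which may
      not suit every monitoring setup.
    </para>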
  </refsect1>

  <refsect1>
    <title>Additional checks</title>
    <para>
      Several checks are provided for diagnostic purposes and are not
      included in the general output:
      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <option>--db-connection</option>: checks whether &repmgr; can connect to the
            database on the local node.
          </simpara>
          <simpara>
            This option is particularly useful in combination with <command>SSH</command>, as
            it can be used to troubleshoot connection issues encountered when &repmgr; is
            executed remotely (e.g. during a switchover operation).
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--replication-config-owner</option>: checks whether the file containing replication
            configuration (PostgreSQL 12 and later: <filename>postgresql.auto.conf</filename>;
            PostgreSQL 11 and earlier: <filename>recovery.conf</filename>) is
            owned by the same user who owns the data directory.
          </simpara>
          <simpara>
            Incorrect ownership of this file (e.g. if it is owned by <literal>root</literal>)
            will cause operations which need to update the replication configuration
            (e.g. <link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>
            or <link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>)
            to fail.
          </simpara>
        </listitem>

      </itemizedlist>
    </para>
  </refsect1>
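    <para>
      When <option>--replication-config-owner</option> reports a problem, it can be
      helpful to compare the ownership manually. The following shell function is a
      hypothetical diagnostic helper, not the way &repmgr; implements the check; it
      assumes GNU coreutils <command>stat</command> (on BSD or macOS, the equivalent
      invocation is <literal>stat -f %Su</literal>).
      <programlisting><![CDATA[
# Hypothetical helper: compare the owner of the replication configuration file
# with the owner of the data directory.
#   $1: data directory
#   $2: configuration file (default: postgresql.auto.conf, as used by
#       PostgreSQL 12 and later; pass the path to recovery.conf for 11 and earlier)
# Returns 0 if the owners match, 1 if they differ, 2 if stat fails on either path.
config_owner_matches() {
    datadir=$1
    conffile=${2:-$datadir/postgresql.auto.conf}
    dir_owner=$(stat -c %U "$datadir") || return 2
    file_owner=$(stat -c %U "$conffile") || return 2
    if [ "$dir_owner" = "$file_owner" ]; then
        echo "owner OK: $conffile is owned by $file_owner"
        return 0
    fi
    echo "owner mismatch: $conffile is owned by $file_owner, data directory by $dir_owner"
    return 1
}
]]></programlisting>
    </para>
    <para>
      For example, <literal>config_owner_matches /var/lib/postgresql/data</literal>
      would check <filename>postgresql.auto.conf</filename> in the data directory used
      in the examples above.
    </para>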

  <refsect1>
    <title>Connection options</title>
    <para>
      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <option>-S</option>/<option>--superuser</option>: connect as the
            named superuser instead of the &repmgr; user
          </simpara>
        </listitem>

      </itemizedlist>
    </para>
  </refsect1>

  <refsect1>
    <title>Output format</title>
    <para>
      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <option>--csv</option>: generate output in CSV format (not available
            for individual checks)
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--nagios</option>: generate output in a Nagios-compatible format
            (for individual checks only)
          </simpara>
        </listitem>
      </itemizedlist>
    </para>
  </refsect1>

  <refsect1>
    <title>Exit codes</title>

    <para>
      When executing <command>repmgr node check</command> with one of the individual
      checks listed above, &repmgr; will emit one of the following Nagios-style exit codes
      (even if <option>--nagios</option> is not supplied):

      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <literal>0</literal>: OK
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <literal>1</literal>: WARNING
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <literal>2</literal>: CRITICAL
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <literal>3</literal>: UNKNOWN
          </simpara>
        </listitem>

      </itemizedlist>
    </para>
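    <para>
      In a script, the exit code of an individual check can be mapped back to its
      status label. The following shell function is a hypothetical helper which follows
      the standard Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL,
      3 = UNKNOWN).
      <programlisting><![CDATA[
# Hypothetical helper: translate a Nagios-style exit code into a status label.
nagios_label() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNEXPECTED ($1)" ;;
    esac
}

# Example usage: run a single check and report its status
#   repmgr -f /etc/repmgr.conf node check --replication-lag
#   rc=$?
#   echo "replication lag check: $(nagios_label "$rc")"
]]></programlisting>
    </para>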

    <para>
      If no individual check is specified, <command>repmgr node check</command> will
      emit one of the following exit codes:
    </para>

    <variablelist>

      <varlistentry>
        <term><option>SUCCESS (0)</option></term>
        <listitem>
          <para>
            No issues were detected.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>ERR_NODE_STATUS (25)</option></term>
        <listitem>
          <para>
            One or more issues were detected.
          </para>
        </listitem>
      </varlistentry>

    </variablelist>

  </refsect1>

  <refsect1>
    <title>See also</title>
    <para>
     <xref linkend="repmgr-node-status"/>, <xref linkend="repmgr-cluster-show"/>
    </para>
  </refsect1>

</refentry>

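    <para>
      For the full check, a monitoring or <command>cron</command> job only needs to
      distinguish between the two exit codes listed above. The following shell
      function is a hypothetical wrapper sketch; the function name and the
      <literal>REPMGR_CONF</literal> environment variable are assumptions, and any
      exit code other than 0 or 25 is simply reported as unexpected.
      <programlisting><![CDATA[
# Hypothetical wrapper around the full "repmgr node check".
# Assumes "repmgr" is on the PATH; set REPMGR_CONF to override the
# configuration file location.
full_node_check() {
    conf=${REPMGR_CONF:-/etc/repmgr.conf}
    output=$(repmgr -f "$conf" node check)
    rc=$?
    case $rc in
        0)  echo "node check: no issues detected" ;;
        25) # ERR_NODE_STATUS: print the full report for investigation
            echo "node check: issues detected"
            printf '%s\n' "$output" ;;
        *)  echo "node check: repmgr exited with unexpected code $rc" >&2 ;;
    esac
    return "$rc"
}
]]></programlisting>
    </para>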