.\" Copyright (c) 2018 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd August 8, 2018
.Dt PNFSSERVER 4
.Os
.Sh NAME
.Nm pNFSserver
.Nd NFS Version 4.1 Parallel NFS Protocol Server
.Sh DESCRIPTION
A set of FreeBSD servers may be configured to provide a
.Xr pnfs 4
service.
One FreeBSD system needs to be configured as a MetaData Server (MDS) and
at least one additional FreeBSD system needs to be configured as one or
more Data Servers (DSs).
.Pp
These FreeBSD systems are configured as NFSv4.1 servers; see
.Xr nfsd 8
and
.Xr exports 5
if you are not familiar with configuring an NFSv4.1 server.
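.Pp
As a minimal sketch, assuming the standard
.Xr rc 8
start-up scripts are used, the
.Xr rc.conf 5
lines that enable an NFSv4 server on each of these systems would typically
include the following.
The
.Dq nfsuserd_enable
line assumes that NFSv4 owner and group names are mapped by
.Xr nfsuserd 8 ,
which is a common setup but may not be needed in yours.
.Bd -literal -offset indent
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
.Ed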
.Sh DS server configuration
The DS(s) need to be configured as NFSv4.1 server(s), with a top level exported
directory used for storage of data files.
This directory must be owned by
.Dq root
and would normally have a mode of
.Dq 700 .
Within this directory there must be subdirectories named
ds0,...,dsN (where N is 19 by default), also owned by
.Dq root
with mode
.Dq 700 .
These are the directories where the data files are stored.
The following command can be run by root in the top level exported
directory to create these subdirectories:
.Bd -literal -offset indent
jot -w ds 20 0 | xargs mkdir -m 700
.Ed
.sp
Note that
.Dq 20
is the default number of subdirectories and can be set to a larger value on
the MDS, as described below.
.sp
The top level exported directory used for storage of data files must be
exported to the MDS with the
.Dq maproot=root sec=sys
export options so that the MDS can create entries in these subdirectories.
It must also be exported to all pNFS aware clients.
These clients do not require the
.Dq maproot=root
export option; the directory should be exported to them with the same
options that the MDS uses to export its file system(s) to the clients.
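.Pp
For example, assuming a DS whose top level exported directory is
.Pa /data ,
an MDS named nfsv4-mds and two pNFS clients named client1 and client2
(all hypothetical names), the
.Xr exports 5
file on the DS might look like the following sketch.
The
.Dq V4:
line makes
.Pa /data
the root of the DS's NFSv4 tree, which is what a mount of
.Dq nfsv4-data0:/
on the MDS, as in the
.Sx MDS server configuration
section, expects.
.Bd -literal -offset indent
/data -maproot=root -sec=sys nfsv4-mds
/data -sec=sys client1 client2
V4: /data -sec=sys
.Ed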
.Pp
It is possible to have multiple DSs on the same FreeBSD system, but each
of these DSs must have a separate top level exported directory used for storage
of data files, and each must be mountable via a separate IP address.
Alias addresses can be set on a network interface of the DS system via
.Xr ifconfig 8
to create these different IP addresses.
Multiple DSs on the same server may be useful when data for different file
systems on the MDS is being stored on different file system volumes on the
FreeBSD DS system.
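.Pp
For example, assuming a hypothetical interface em0 with the address
192.168.1.10 on a /24 network, the following
.Xr rc.conf 5
lines on the DS system add a second address, 192.168.1.11, so that two DSs
on this system can each be mounted on the MDS via their own IP address.
The /32 mask on the alias follows the usual FreeBSD convention for an alias
in the same subnet as the primary address.
.Bd -literal -offset indent
ifconfig_em0="inet 192.168.1.10/24"
ifconfig_em0_alias0="inet 192.168.1.11/32"
.Ed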
.Sh MDS server configuration
The MDS must be a separate FreeBSD system from the FreeBSD DS system(s) and
NFS clients.
It is configured as an NFSv4.1 server with file system(s) exported to
clients.
However, the
.Dq -p
command line argument for
.Xr nfsd 8
is used to indicate that it is running as the MDS for a pNFS server.
.Pp
The DS(s) must all be mounted on the MDS using the following mount options:
.Bd -literal -offset indent
nfsv4,minorversion=1,soft,retrans=2
.Ed
.sp
so that they can be defined as DSs in the
.Dq -p
option.
Normally these mounts would be entered in
.Xr fstab 5
on the MDS.
For example, if there are four DSs named nfsv4-data[0-3], the
.Xr fstab 5
lines might look like:
.Bd -literal -offset indent
nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
.Ed
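.Pp
The mount points named in this
.Xr fstab 5
example must exist before the DSs can be mounted.
The following is a minimal sketch, assuming the /data0 to /data3 paths above,
that creates them and mounts the DSs on the MDS without a reboot:
.Bd -literal -offset indent
# mkdir -p /data0 /data1 /data2 /data3
# mount -a -t nfs
.Ed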
.sp
The
.Xr nfsd 8
command line option
.Dq -p
indicates that the NFS server is a pNFS MDS and specifies which
DSs are to be used.
.br
For the above
.Xr fstab 5
example, the
.Xr nfsd 8
nfs_server_flags line in your
.Xr rc.conf 5
might look like:
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"
.Ed
.sp
This example specifies that the data files should be distributed over the
four DSs and that File layouts will be issued to pNFS enabled clients.
If Flexible File layouts are desired for this case, setting the sysctl
.Dq vfs.nfsd.default_flexfile
to a non-zero value in your
.Xr sysctl.conf 5
file will make the
.Nm
issue them instead.
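.Pp
For example, a
.Xr sysctl.conf 5
entry like the following on the MDS would enable Flexible File layouts at
boot time.
The value 1 is just one choice of a non-zero value.
.Bd -literal -offset indent
vfs.nfsd.default_flexfile=1
.Ed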
.br
Alternatively, this variant of
.Dq nfs_server_flags
specifies that two-way mirroring is to be done, via the
.Dq -m
command line option:
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2"
.Ed
.sp
With two-way mirroring, the data file for each exported file on the MDS
will be stored on two of the DSs.
When mirroring is enabled, the server will always issue Flexible File layouts.
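.Pp
With many DSs, the
.Dq -p
argument becomes long.
The following
.Xr sh 1
function is a hypothetical helper, not part of FreeBSD, that builds the
comma-separated list from DS host names.
It assumes that the i-th DS listed is mounted on
.Pa /data<i>
on the MDS, as in the
.Xr fstab 5
example above.
.Bd -literal -offset indent
# pnfs_dslist (hypothetical helper): print a -p argument of the form
# host0:/data0,host1:/data1,... for the DS host names given as arguments.
pnfs_dslist() {
	list=""
	i=0
	for ds in "$@"; do
		# Append host:/dataN, adding a comma only after the first entry.
		list="${list:+$list,}$ds:/data$i"
		i=$((i + 1))
	done
	echo "$list"
}
.Ed
.sp
It could then be combined with
.Xr sysrc 8
to set the two-way mirroring variant shown above:
.Bd -literal -offset indent
# sysrc nfs_server_flags="-u -t -n 128 -p $(pnfs_dslist nfsv4-data0 nfsv4-data1 nfsv4-data2 nfsv4-data3) -m 2"
.Ed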
.Pp
It is also possible to specify which DSs are to be used to store data files for
specific exported file systems on the MDS.
For example, if the MDS has exported two file systems
.Dq /export1
and
.Dq /export2
to clients, the following variant of
.Dq nfs_server_flags
specifies that data files for
.Dq /export1
will be stored on nfsv4-data0 and nfsv4-data1, whereas the data files for
.Dq /export2
will be stored on nfsv4-data2 and nfsv4-data3.
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"
.Ed
.sp
System administrators can use this to control where data files are
stored, which might be useful for managing storage use.
For this case, it may be convenient to co-locate more than one of the DSs
on the same FreeBSD server, using separate file systems on the DS system
for storage of each DS's data files.
If mirroring is desired for this case, the
.Dq -m
option also needs to be specified.
There must be enough DSs assigned to each exported file system on the MDS
to support the level of mirroring.
The above example would be fine for two-way mirroring, but four-way mirroring
would not work, since there are only two DSs assigned to each exported file
system on the MDS.
.Pp
The number of subdirectories in each DS is defined by the
.Dq vfs.nfs.dsdirsize
sysctl on the MDS.
This value can be increased from the default of 20, but only when
.Xr nfsd 8
is not running and after the additional ds20,... subdirectories have been
created on all the DSs.
For a service that will store a large number of files, this sysctl should be
set much larger to keep the number of entries in each subdirectory from
getting too large.
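.Pp
For example, to raise the number of subdirectories from 20 to 40, a minimal
sketch is to run the following as root in the top level exported directory
on each DS, which creates ds20 through ds39:
.Bd -literal -offset indent
jot -w ds 20 20 | xargs mkdir -m 700
.Ed
.sp
Then, with
.Xr nfsd 8
stopped on the MDS, set the sysctl described above to 40 (for example, via
.Xr sysctl.conf 5 )
before starting it again.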
.Sh Client mounts
Once the service is operational, NFSv4.1 FreeBSD client mounts done with the
.Dq pnfs
option should do I/O directly on the DSs.
The clients mounting the MDS must be running the
.Xr nfscbd 8
daemon for pNFS to work.
Set
.Bd -literal -offset indent
nfscbd_enable="YES"
.Ed
.sp
in
.Xr rc.conf 5
on these clients.
Non-pNFS aware clients and NFSv3 mounts will do all I/O RPCs on the MDS,
which acts as a proxy for the appropriate DS(s).
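.Pp
For example, assuming an MDS named nfsv4-mds (a hypothetical name) that
exports its NFSv4 root, a FreeBSD client could mount it with pNFS enabled
using the
.Xr mount_nfs 8
options in the first line below, or with the equivalent
.Xr fstab 5
entry in the second line:
.Bd -literal -offset indent
# mount -t nfs -o nfsv4,minorversion=1,pnfs nfsv4-mds:/ /mnt
nfsv4-mds:/ /mnt nfs rw,nfsv4,minorversion=1,pnfs 0 0
.Ed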
.Sh Backing up a pNFS service
Since the data is separated from the metadata, the simplest way to back up
a pNFS service is to do so from an NFS client that has the service mounted
on it.
If you back up the MDS exported file system(s) on the MDS, you must do it
in such a way that the
.Dq system
namespace extended attributes get backed up.
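.Pp
To see which
.Dq system
namespace extended attributes a file on the MDS carries, and therefore what
such a backup must preserve,
.Xr lsextattr 8
and
.Xr getextattr 8
can be used as sketched below.
Here <file-on-MDS> is a placeholder for a file in an exported file system.
The
.Dq pnfsd.dsfile
attribute is binary, so it is printed in hexadecimal here;
.Xr pnfsdsfile 8
displays its contents in readable form.
.Bd -literal -offset indent
# lsextattr system <file-on-MDS>
# getextattr -x system pnfsd.dsfile <file-on-MDS>
.Ed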
.Sh Handling of failed mirrored DSs
When a mirrored DS fails, it can be disabled in one of three ways:
.sp
1 - The MDS detects a problem when trying to do proxy
operations on the DS.
This can take a couple of minutes
after the DS failure or network partitioning occurs.
.sp
2 - A pNFS client can report an I/O error that occurred for a DS to the MDS in
the arguments for a LayoutReturn operation.
.sp
3 - The system administrator can run the
.Xr pnfsdskill 8
command on the MDS to disable it.
If the system administrator runs
.Xr pnfsdskill 8
and it fails with
.Er ENXIO
(Device not configured), that normally means the DS was already
disabled via #1 or #2.
Since doing this is harmless, running the command is recommended once a
system administrator knows that there is a problem with a mirrored DS.
.sp
Once a system administrator knows that a mirrored DS has malfunctioned
or has been network partitioned, they should do the following as root
on the MDS:
.Bd -literal -offset indent
# pnfsdskill <mounted-on-path-of-DS>
# umount -N <mounted-on-path-of-DS>
.Ed
.sp
Note that <mounted-on-path-of-DS> must be the exact mounted-on path
string used when the DS was mounted on the MDS.
.Pp
Once the mirrored DS has been disabled, the pNFS service should continue to
function, but file updates will only happen on the DS(s)
that have not been disabled.
Assuming two-way mirroring, this means that for each file whose data is
stored on the disabled DS, updates will only go to the other DS of the pair
recorded in the
.Dq pnfsd.dsfile
extended attribute for that file on the MDS.
.Pp
The next step is to clear the IP address in the
.Dq pnfsd.dsfile
extended attribute on all files on the MDS for the failed DS.
This is done so that, when the disabled DS is repaired and brought back online,
the data files on this DS will not be used, since they may be out of date.
The command that clears the IP address is
.Xr pnfsdsfile 8
with the
.Dq -r
option.
For example, the following command replaces nfsv4-data3 with an IPv4 address
of 0.0.0.0 in the extended attribute for the file yyy.c, so that nfsv4-data3
will not be used for it:
.Bd -literal -offset indent
# pnfsdsfile -r nfsv4-data3 yyy.c
yyy.c:	nfsv4-data2.home.rick	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000	0.0.0.0	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
.Ed
.Pp
Normally this will be run within a
.Xr find 1
command for all regular
files in the exported directory tree and must be done on the MDS.
When used with
.Xr find 1 ,
you will probably also want the
.Dq -q
option so that it does not print the result for every file.
If the disabled/repaired DS is nfsv4-data3, the commands done on the MDS
would be:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \e;
.Ed
.sp
There is a problem with the above command if a file found by
.Xr find 1
is renamed or unlinked before the
.Xr pnfsdsfile 8
command is run on it.
This will normally generate an error message.
A simple unlink is harmless,
but a link/unlink or rename might result in the file not being processed
under its new name.
To check that the IP address for nfsv4-data3 has been set to 0.0.0.0 for all
files, these commands can be used (assuming the
.Xr sh 1
shell):
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \e; | sed "/nfsv4-data3/!d"
.Ed
.sp
For any line(s) printed,
.Xr pnfsdsfile 8
with
.Dq -r
must be run again on the file(s) listed.
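.Pp
The check above prints whole lines, so the file names still have to be
picked out before
.Xr pnfsdsfile 8
can be rerun on them.
The following
.Xr sh 1
function is a hypothetical helper, not part of FreeBSD.
It assumes that each output line starts with the file name followed by a
colon and white space, as in the yyy.c example output shown earlier.
Like the
.Xr sed 1
check, it matches the given string anywhere on the line, and it does not
handle file names that contain a newline or a colon followed by white space.
.Bd -literal -offset indent
# pnfs_files_with (hypothetical helper): read pnfsdsfile(8) output on
# standard input and print the file name from each line that contains
# the fixed string given as the first argument.
pnfs_files_with() {
	grep -F -e "$1" | sed 's/:[[:space:]].*$//'
}
.Ed
.sp
For the nfsv4-data3 example, it could be used on the MDS as follows:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} ';' | pnfs_files_with nfsv4-data3 |
    while IFS= read -r f; do pnfsdsfile -q -r nfsv4-data3 "$f"; done
.Ed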
Once this is done, the replaced/repaired DS can be brought back online.
It should have empty ds0,...,dsN directories under the top level exported
directory for storage of data files, just like it did when first set up.
Mount it on the MDS exactly as you did before disabling it.
For the nfsv4-data3 example, the command would be:
.Bd -literal -offset indent
# mount -t nfs -o nfsv4,minorversion=1,soft,retrans=2 nfsv4-data3:/ /data3
.Ed
.sp
Then restart
.Xr nfsd 8
to re-enable the DS:
.Bd -literal -offset indent
# /etc/rc.d/nfsd restart
.Ed
.sp
Now, new files can be stored on nfsv4-data3,
but files with the IP address zeroed out on the MDS will not yet use the
repaired DS (nfsv4-data3).
The next step is to go through the exported file tree on the MDS and,
for each file with an IPv4 address of 0.0.0.0 in its extended attribute,
copy the file data to the repaired DS and re-enable use of this mirror for it.
The command that copies the file data for one MDS file is
.Xr pnfsdscopymr 8 ,
and it will also normally be run from
.Xr find 1 .
For the example case, the commands on the MDS would be:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdscopymr -r /data3 {} \e;
.Ed
.sp
When this completes, the recovery should be complete or at least nearly so.
As noted above, if a link/unlink or rename occurs on a file name while the
above
.Xr find 1
is in progress, the file may not get copied.
To check for any file(s) not yet copied, the commands are:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \e; | sed "/0\e.0\e.0\e.0/!d"
.Ed
.sp
If this command prints out any file name(s), these files must
have the
.Xr pnfsdscopymr 8
command done on them to complete the recovery:
.Bd -literal -offset indent
# pnfsdscopymr -r /data3 <file-path-reported>
.Ed
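.Pp
When many files are reported, a minimal sketch that copies all of them in
one pass is shown below.
It assumes the
.Xr sh 1
shell, the
.Pa /data3
mount point from the example, and the output format shown in the yyy.c
example earlier (the file name followed by a colon and white space).
It does not handle file names that contain a newline or a colon followed by
white space.
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} ';' | grep -F 0.0.0.0 |
    sed 's/:[[:space:]].*$//' |
    while IFS= read -r f; do pnfsdscopymr -r /data3 "$f"; done
.Ed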
.sp
If
.Xr pnfsdscopymr 8
fails with the error
.br
.Dq pnfsdscopymr: Copymr failed for file <path>: Device not configured
.br
repeatedly, this may be caused by a Read/Write layout that has not
been returned.
The only way to get rid of such a layout is to restart
.Xr nfsd 8 .
.sp
All of these commands are designed to be
run while the pNFS service is running and can safely be re-run.
.Pp
For a more detailed discussion of the setup and management of a pNFS service,
see:
.Bd -literal -offset indent
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
.Ed
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfs 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh HISTORY
The
.Nm
service first appeared in
.Fx 12.0 .
.Sh BUGS
Since the MDS cannot be mirrored, it is a single point of failure, just
as a non-pNFS server is.
For non-mirrored configurations, all FreeBSD systems used in the service
are single points of failure.
415