$DragonFly: src/bin/cpdup/BACKUPS,v 1.4 2007/05/17 08:19:00 swildner Exp $

			    INCREMENTAL BACKUP HOWTO

    This document describes one of several ways to set up a LAN backup and
    an off-site WAN backup system using cpdup's hardlinking capabilities.

    The features described in this document are also encapsulated in scripts
    which can be found in the scripts/ directory.  These scripts can be used
    to automate all backup steps except for the initial preparation of the
    directory topology on the backup and off-site machines.  Operation of
    these scripts is described in the section "PUTTING IT ALL TOGETHER -
    SOME SCRIPTS" below.


		    PART 1 - PREPARE THE LAN BACKUP BOX

    The easiest way to create a LAN backup box is to NFS mount all your
    backup clients onto the backup box.  It is also possible to use cpdup's
    remote host feature to access your client boxes but that requires root
    access to the client boxes and is not described here.  (But see the
    sections "OFF-SITE BACKUPS" and "SSH SECURITY TIPS" below.)

    Create a directory on the backup machine called /nfs, a subdirectory
    for each remote client, and subdirectories for each partition on each
    client.  Remember that cpdup does not cross mount points so you will
    need a mount for each partition you wish to back up.  For example:

	[ ON LAN BACKUP BOX ]

	mkdir /nfs
	mkdir /nfs/box1
	mkdir /nfs/box1/home
	mkdir /nfs/box1/var

    Before you actually do the NFS mount, create a dummy file for each
    mount point that can be used by scripts to detect when an NFS mount
    has not been done.  Scripts can thus avoid a common failure scenario
    and not accidentally cpdup an empty mount point to the backup partition
    (destroying that day's backup in the process).

	touch /nfs/box1/home/NOT_MOUNTED
	touch /nfs/box1/var/NOT_MOUNTED
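
    The marker check can be wrapped in two small helpers.  This is a
    minimal sketch in POSIX sh (the examples in this document use csh);
    the function names and the NFS_ROOT variable are made up for
    illustration and are not part of cpdup or the supplied scripts.

```shell
# Assumed location of the NFS mount points; override for testing.
NFS_ROOT=${NFS_ROOT:-/nfs}

# prepare_mountpoint CLIENT PARTITION
# Create the mount point and drop the marker file into it.
# Run this BEFORE the NFS mount is made.
prepare_mountpoint() {
    mkdir -p "$NFS_ROOT/$1/$2" && touch "$NFS_ROOT/$1/$2/NOT_MOUNTED"
}

# is_mounted CLIENT PARTITION
# Succeed only if the marker is no longer visible, i.e. a filesystem
# has been mounted on top of the directory and hides it.
is_mounted() {
    [ -d "$NFS_ROOT/$1/$2" ] && [ ! -e "$NFS_ROOT/$1/$2/NOT_MOUNTED" ]
}
```

    A backup script could call 'is_mounted box1 home' before each cpdup
    and skip the partition when the check fails.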

    Once the directory structure has been set up, do your NFS mounts and
    also add them to your fstab.  Since you will probably wind up with a
    lot of mounts it is a good idea to use 'ro,bg' (read-only, background
    mount) in the fstab entries.

	mount box1:/home /nfs/box1/home
	mount box1:/var /nfs/box1/var
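
    The matching fstab entries could be generated rather than typed by
    hand.  This is only a sketch in POSIX sh: fstab_line is a hypothetical
    helper, and the six-field layout it prints is the usual BSD fstab
    format, so check fstab(5) on your system before using the output.

```shell
# fstab_line CLIENT PARTITION
# Print one fstab line that mounts CLIENT:/PARTITION read-only and in
# the background on /nfs/CLIENT/PARTITION.
fstab_line() {
    printf '%s:/%s\t/nfs/%s/%s\tnfs\tro,bg\t0\t0\n' "$1" "$2" "$1" "$2"
}

# Example: review the output, then append it to /etc/fstab yourself.
# fstab_line box1 home
# fstab_line box1 var
```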

    You should create a huge /backup partition on your backup machine which
    is capable of holding all your mirrors.  Create a subdirectory called
    /backup/mirrors in your huge backup partition.

	mount <huge_disk> /backup
	mkdir /backup/mirrors


			PART 2 - DOING A LEVEL 0 BACKUP

    (If you use the supplied scripts, a level 0 backup can be accomplished
    simply by running the 'do_mirror' script with an argument of 0.)

    Create a level 0 backup using a standard cpdup with no special arguments
    other than -i0 -s0 (tell it not to ask questions and turn off the
    file-overwrite-with-directory safety feature).  Name the mirror with
    the date in a string-sortable format.

	set date = `date "+%Y%m%d"`
	mkdir /backup/mirrors/box1.${date}
	cpdup -i0 -s0 /nfs/box1/home /backup/mirrors/box1.${date}/home
	cpdup -i0 -s0 /nfs/box1/var /backup/mirrors/box1.${date}/var
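
    The same steps, combined with a NOT_MOUNTED check on each source
    directory, can be sketched as one POSIX sh function (the examples above
    use csh).  The names level0, MIRRORS, NFS_ROOT and CPDUP are
    assumptions made for this sketch; only the cpdup options come from the
    text above.

```shell
# Assumed settings for this sketch; none of these are read by cpdup itself.
MIRRORS=${MIRRORS:-/backup/mirrors}
NFS_ROOT=${NFS_ROOT:-/nfs}
CPDUP=${CPDUP:-cpdup}           # set CPDUP=echo for a dry run

# level0 CLIENT DATE PARTITION...
# Copy each partition into $MIRRORS/CLIENT.DATE, refusing to continue if
# a NOT_MOUNTED marker is visible in a source directory.
level0() {
    client=$1 date=$2
    shift 2
    mkdir -p "$MIRRORS/$client.$date" || return 1
    for part in "$@"; do
        if [ -e "$NFS_ROOT/$client/$part/NOT_MOUNTED" ]; then
            echo "$client/$part is not mounted" >&2
            return 1
        fi
        $CPDUP -i0 -s0 "$NFS_ROOT/$client/$part" \
            "$MIRRORS/$client.$date/$part" || return 1
    done
}

# Usage: level0 box1 "$(date +%Y%m%d)" home var
```

    Running it once with CPDUP=echo prints the cpdup argument lists without
    copying anything, which is a cheap way to check the paths.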

    Create a softlink to the most recently completed backup, which is your
    level 0 backup.  Note that if the link already exists, 'ln -sf' follows
    it and creates the new link inside the directory it points to instead
    of replacing it.  'mv -f' has the same problem.  'ln -shf' can be used
    to replace the link but is not portable.  Remove the old link first:

	sync
	rm -f /backup/mirrors/box1
	ln -s /backup/mirrors/box1.${date} /backup/mirrors/box1
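
    The remove-then-link sequence is easy to get wrong in a script, so it
    may be worth isolating.  A minimal POSIX sh sketch follows;
    update_latest is a hypothetical helper name, not one of the supplied
    scripts.

```shell
# update_latest MIRRORS_DIR CLIENT DATE
# Repoint MIRRORS_DIR/CLIENT at MIRRORS_DIR/CLIENT.DATE.  The old link is
# removed first so that ln never follows it into the previous mirror.
update_latest() {
    [ -d "$1/$2.$3" ] || return 1       # never point at a missing mirror
    rm -f "$1/$2" && ln -s "$1/$2.$3" "$1/$2"
}

# Usage: update_latest /backup/mirrors box1 "$date"
```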

			PART 3 - DO AN INCREMENTAL BACKUP

    An incremental backup is exactly the same as a level 0 backup EXCEPT
    you use the -H option to specify the location of the most recent
    completed backup.  Because we maintain the handy softlink pointing at
    the most recent completed backup, the cpdup command required to do
    this becomes trivial.

    Each day's incremental backup will reproduce the ENTIRE directory topology
    for the client, but cpdup will hardlink unchanged files from the most
    recent backup instead of copying them, and this is what saves you all
    the disk space.

	set date = `date "+%Y%m%d"`
	# the softlink holds a full path; :t keeps only the last component
	set latest = `readlink /backup/mirrors/box1`
	if ( "${latest:t}" == "box1.${date}" ) then
	    echo "silly boy, an incremental already exists for today"
	    exit 1
	endif
	mkdir /backup/mirrors/box1.${date}
	cpdup -H /backup/mirrors/box1 \
	      -i0 -s0 /nfs/box1/home /backup/mirrors/box1.${date}/home

    Be sure to update your 'most recent backup' softlink, but only do it
    if the cpdups for all the partitions for that client have succeeded.
    That way the next incremental backup will always be based on the last
    complete one.

	rm -f /backup/mirrors/box1
	ln -s /backup/mirrors/box1.${date} /backup/mirrors/box1

    Since these backups are mirrors, locating a backup is as simple
    as cd'ing into the appropriate directory.  If your filesystem has a
    hardlink limit and cpdup hits it, cpdup will 'break' the hardlink
    and copy the file instead.  Generally speaking only a few special cases
    will hit the hardlink limit for a filesystem.  For example, the
    CVS/Root file in a checked out cvs repository is often hardlinked, and
    the sheer number of hardlinked 'Root' files multiplied by the number
    of backups can often hit the filesystem hardlink limit.
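
    To see how many directory entries currently share a given file, and so
    how close it is to the limit, the link count can be read from ls.  A
    minimal POSIX sh sketch; link_count is a hypothetical helper name and
    the path in the usage comment is only an example.

```shell
# link_count FILE
# Print the number of hardlinks to FILE (the second column of ls -l).
link_count() {
    ls -ld "$1" | awk '{ print $2 }'
}

# Usage: link_count /backup/mirrors/box1/home/src/CVS/Root
```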

		    PART 4 - DO AN INCREMENTAL VERIFIED BACKUP

    Since your incremental backups use hardlinks heavily the actual file
    might exist on the physical /backup disk in only one place even though
    it may be present in dozens of daily mirrors.  To ensure that the
    file being hardlinked has not been corrupted, cpdup's -f option can be
    used in conjunction with -H to force cpdup to validate the contents
    of the file, even if all the stat info looks identical.

	cpdup -f -H /backup/mirrors/box1 ...

    You can create completely redundant (non-hardlink-dependent) backups
    by doing the equivalent of your level 0, i.e. not using -H.  However I
    do NOT recommend that you do this very often (maybe once every 6 months
    at the most), because each mirror created this way will have a distinct
    copy of all the file data and you will quickly run out of space in your
    /backup partition.

		    MAINTENANCE OF THE "/backup" DIRECTORY

    Now, clearly you are going to run out of space in /backup if you keep
    doing this, but you may be surprised at just how many daily incrementals
    you can create before you fill up your /backup partition.

    If /backup becomes full, simply start rm -rf'ing older mirror directories
    until enough space is freed up.  You do not have to remove the oldest
    directory first.  In fact, you might want to keep it around and remove
    a day's backup here, a day's backup there, etc., until you free up enough
    space.
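
    One simple policy, keeping only the newest few mirrors, can be sketched
    in POSIX sh as below.  It is just one possible policy (as noted above,
    the oldest mirror need not go first), and prune_candidates is a
    hypothetical helper that only prints names and deletes nothing.

```shell
# prune_candidates MIRRORS_DIR CLIENT KEEP
# Print the dated mirror directories of CLIENT, oldest first, that fall
# outside the newest KEEP.  The YYYYMMDD suffix makes the shell's sorted
# glob expansion chronological.  The CLIENT softlink itself never matches.
prune_candidates() {
    dir=$1 client=$2 keep=$3
    set -- "$dir/$client".[0-9]*
    [ -d "$1" ] || return 0             # no mirrors at all
    drop=$(( $# - keep ))
    for m in "$@"; do
        [ "$drop" -gt 0 ] || break
        echo "$m"
        drop=$(( drop - 1 ))
    done
}

# Usage: prune_candidates /backup/mirrors box1 30
```

    Review the printed list before feeding it to rm -rf, and keep KEEP at 1
    or more so the mirror that the softlink points at survives.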

				OFF-SITE BACKUPS

    Making an off-site backup involves similar methodology, but you use
    cpdup's remote host capability to generate the backup.  To avoid
    complications it is usually best to take a mirror already generated on
    your LAN backup box and copy that to the remote box.

    The remote backup box does not use NFS, so setup is trivial.  Just
    create your super-large /backup partition and mkdir /backup/mirrors.
    Your LAN backup box will need root access via ssh to your remote backup
    box.  See the section "SSH SECURITY TIPS" below.

    You can use the handy softlink to get the latest 'box1.date' mirror
    directory and since the mirror is all in one partition you can just
    cpdup the entire machine in one command.  Use the same dated directory
    name on the remote box, so:

	# the softlink holds a full path; :t keeps only the last component,
	# so latest will wind up something like 'box1.20060915'
	set latest = `readlink /backup/mirrors/box1`
	set latest = ${latest:t}
	cpdup -i0 -s0 /backup/mirrors/$latest remote.box:/backup/mirrors/$latest

    As with your LAN backup, create a softlink on the remote backup box
    denoting the latest mirror for any given site.

	if ( $status == 0 ) then
	    ssh remote.box -n \
		"rm -f /backup/mirrors/box1; ln -s /backup/mirrors/$latest /backup/mirrors/box1"
	endif

    Incremental backups can be accomplished using the same cpdup command,
    but adding the -H option pointing at the latest backup on the remote
    box.  Note that the -H path is interpreted on the remote box, not on
    the LAN backup box you are running the command from.

	set latest = `readlink /backup/mirrors/box1`
	set latest = ${latest:t}
	set remotelatest = `ssh remote.box -n "readlink /backup/mirrors/box1"`
	set remotelatest = ${remotelatest:t}
	if ( "$latest" == "$remotelatest" ) then
	    echo "silly boy, you already made a remote incremental backup today"
	    exit 1
	endif
	cpdup -H /backup/mirrors/$remotelatest \
	      -i0 -s0 /backup/mirrors/$latest remote.box:/backup/mirrors/$latest
	if ( $status == 0 ) then
	    ssh remote.box -n \
		"rm -f /backup/mirrors/box1; ln -s /backup/mirrors/$latest /backup/mirrors/box1"
	endif
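
    Both commands above depend on turning the softlink into a bare mirror
    name such as 'box1.20060915'.  A minimal POSIX sh sketch of that step
    follows (the examples above use csh); latest_mirror is a hypothetical
    helper name.

```shell
# latest_mirror MIRRORS_DIR CLIENT
# Print the name (not the full path) of the mirror that the softlink
# MIRRORS_DIR/CLIENT points at.  Fails if the link does not exist.
latest_mirror() {
    target=$(readlink "$1/$2") || return 1
    basename "$target"
}

# Usage on the LAN backup box:
#   latest=$(latest_mirror /backup/mirrors box1)
# The remote side can be queried the same way over ssh:
#   remotelatest=$(ssh remote.box -n "readlink /backup/mirrors/box1")
#   remotelatest=$(basename "$remotelatest")
```

    Taking the basename gives the same result whether the softlink was
    created with a full path or with a relative one.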

    Cleaning out the remote directory works the same as cleaning out the LAN
    backup directory.


			    RESTORING FROM BACKUPS

    Each backup is a full filesystem mirror, and depending on how much space
    you have you should be able to restore it simply by cd'ing into the
    appropriate backup directory and using 'cpdup blah box1:blah' (assuming
    root access), or you can export the backup directory via NFS to your
    client boxes and use cpdup locally on the client to extract the backup.
    Using NFS is probably the most efficient solution.


			PUTTING IT ALL TOGETHER - SOME SCRIPTS

    Please refer to the scripts in the scripts/ subdirectory.  These scripts
    are EXAMPLES ONLY.  If you want to use them, put them in your ~root/adm
    directory on your backup box and set up a root crontab.

    First follow the preparation rules in PART 1 above.  The scripts do not
    do this automatically.  Edit the 'params' file that the scripts use
    to set default paths and such.

	** FOLLOW DIRECTIONS IN PART 1 ABOVE TO SET UP THE LAN BACKUP BOX **

    Copy the scripts to ~/adm.  Do NOT install a crontab yet (but an example
    can be found in scripts/crontab).

    Do a manual level 0 LAN BACKUP using the do_mirror script.

	cd ~/adm
	./do_mirror 0

    Once done you can do incremental backups using './do_mirror 1' to do a
    verified incremental, or './do_mirror 2' to do a stat-optimized
    incremental.  You can enable the cron jobs that run do_mirror and
    do_cleanup now.

    --

    Setting up an off-site backup box is trivial.  The off-site backup box
    needs to allow root ssh logins from the LAN backup box (at least for
    now, sorry!).  Set up the off-site backup directory, typically
    /backup/mirrors.  Then do a level 0 backup from your LAN backup box
    to the off-site box using the do_remote script.

	cd ~/adm
	./do_remote 0

    Once done you can do incremental backups using './do_remote 1' to do a
    verified incremental, or './do_remote 2' to do a stat-optimized
    incremental.  You can enable the cron jobs that run do_remote now.

    NOTE!  It is NOT recommended that you use verified-incremental backups
    over a WAN, as all related data must be copied over the wire every single
    day.  Instead, I recommend sticking with stat-optimized backups
    (./do_remote 2).

    You will also need to set up a daily cleaning script on the off-site
    backup box.

    SCRIPT TODOS - the ./do_cleanup script is not very smart.  We really
    should do a tower-of-hanoi removal.


			      SSH SECURITY TIPS

    To allow root access via ssh, add the following line to your sshd
    configuration on the client boxes (typically /etc/ssh/sshd_config):

	PermitRootLogin forced-commands-only

    If your OpenSSH version is too old to recognize that setting, you
    should update to a more recent version immediately.  Restart sshd
    for the setting to take effect.

    On the backup machine, create a special backup key for root:

	mkdir /root/.ssh	# if it doesn't already exist
	cd /root/.ssh
	ssh-keygen -t dsa -N "" -f backup-key

    You now have a key pair, consisting of a secret key called "backup-key"
    and a public key called "backup-key.pub".  The secret key must *NEVER*
    leave the backup machine nor be disclosed in any way!  Note that we
    haven't protected the secret key with a passphrase (-N "") because it
    will be used by cron jobs where no passphrase can be entered.

    On the client boxes, create a file /root/.ssh/authorized_keys.
    It should contain just this line:

	command="/usr/local/bin/cpdup -S",from="<BAKHOST>",no-pty,
	no-port-forwarding,no-X11-forwarding,no-agent-forwarding <PUBKEY>

    This must be on one long line; it has been broken up here for
    readability only.  Note that the options must be separated by commas
    *ONLY* (no spaces).  Replace <BAKHOST> with the IP address or DNS name
    of the backup machine.  Replace <PUBKEY> with the contents of the
    file /root/.ssh/backup-key.pub from the backup machine (the public key,
    not the secret key!).  It typically starts with "ssh-dss" followed by
    a long character sequence that looks like line noise, followed by a
    comment that typically indicates who created the key.

    The format of the authorized_keys file is documented in the sshd(8)
    manual page.  Please refer to it for more details.

    If you have done all of the above correctly, then the root user on the
    backup machine will be able to log into the client boxes as root and
    execute "/usr/local/bin/cpdup -S", but nothing else.

    To further improve security, you can place the slave cpdup on the client
    machine into read-only mode by adding the -R option.  In this case, the
    line from the authorized_keys file should begin as follows:

	command="/usr/local/bin/cpdup -RS",from="<BAKHOST>",etc...

    If you do that, your backup server can only pull backups from the client
    machine, but it won't be able to change anything on it.  That is, you
    cannot use the client machine as a remote target.  So, if an attacker
    manages to be able to execute commands on your backup machine, he won't
    be able to do any harm to your clients.  This also protects against
    human errors, e.g. accidentally swapping source and destination.
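
    Because the entry has to be one long line with no spaces between the
    options, it can be less error-prone to generate it than to type it.
    The following POSIX sh sketch does that; authorized_keys_line is a
    hypothetical helper name, and the address in the usage comment is a
    placeholder.

```shell
# authorized_keys_line BAKHOST PUBKEY_FILE [readonly]
# Print the single-line authorized_keys entry described above.  Pass
# "readonly" as the third argument to run the slave cpdup with -R.
authorized_keys_line() {
    flags=-S
    if [ "${3:-}" = readonly ]; then
        flags=-RS
    fi
    printf 'command="/usr/local/bin/cpdup %s",from="%s",no-pty,' "$flags" "$1"
    printf 'no-port-forwarding,no-X11-forwarding,no-agent-forwarding %s\n' \
        "$(cat "$2")"
}

# Usage (run where backup-key.pub is readable, then copy the output into
# /root/.ssh/authorized_keys on the client):
#   authorized_keys_line 192.0.2.10 /root/.ssh/backup-key.pub readonly
```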

    By the way, it doesn't really matter much whether you specify the -R
    option when running cpdup on the backup machine.  If you do it, then
    the -R option will be passed to the slave, but the command="..." entry
    from the authorized_keys file overrides it anyway, so the slave always
    runs with the -R option.

    When using cpdup on the backup machine, make sure that the right key is
    used by passing the -i option to the ssh command:

	cpdup -F -i/root/.ssh/backup-key ...

    If one or both of the machines involved has a slow processor, it might
    be worthwhile to use a faster encryption algorithm, for example:

	cpdup -F -cblowfish-cbc ...

    If your OpenSSH version has been patched to support unencrypted transfers
    *AND* you trust the physical network between the machines involved, you
    might want to disable encryption altogether:

	cpdup -F -cnone ...