			    INCREMENTAL BACKUP HOWTO

    This document describes one of several ways to set up a LAN backup and
    an off-site WAN backup system using cpdup's hardlinking capabilities.

    The features described in this document are also encapsulated in scripts
    which can be found in the scripts/ directory.  These scripts can be used
    to automate all backup steps except for the initial preparation of the
    directory topology on the backup and off-site machines.  Operation of
    these scripts is described in the section "PUTTING IT ALL TOGETHER -
    SOME SCRIPTS" below.


		    PART 1 - PREPARE THE LAN BACKUP BOX

    The easiest way to create a LAN backup box is to NFS mount all your
    backup clients onto the backup box.  It is also possible to use cpdup's
    remote host feature to access your client boxes, but that requires root
    access to the client boxes and is not described here.  (But see the
    sections "OFF-SITE BACKUPS" and "SSH SECURITY TIPS" below.)

    Create a directory on the backup machine called /nfs, a subdirectory
    for each remote client, and subdirectories for each partition on each
    client.  Remember that cpdup does not cross mount points, so you will
    need a mount for each partition you wish to back up.  For example:

	[ ON LAN BACKUP BOX ]

	mkdir /nfs
	mkdir /nfs/box1
	mkdir /nfs/box1/home
	mkdir /nfs/box1/var

    Before you actually do the NFS mount, create a dummy file in each
    mount point that scripts can use to detect that an NFS mount has not
    been done.  The mounted filesystem hides the dummy file, so if the
    file is visible the mount is missing.  Scripts can thus avoid a common
    failure scenario and not accidentally cpdup an empty mount point to the
    backup partition (destroying that day's backup in the process).

	touch /nfs/box1/home/NOT_MOUNTED
	touch /nfs/box1/var/NOT_MOUNTED

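    A script might test for the marker along the following lines.  This is
    a minimal sketch in POSIX sh (not the csh used by the other examples in
    this document); the is_mounted helper name is hypothetical, and a
    scratch directory stands in for /nfs so the check can be tried safely.

```sh
#!/bin/sh
# Sketch only: is_mounted is a hypothetical helper, and the scratch
# directory below is a stand-in for the real /nfs tree.

# Succeed only if DIR exists and the NOT_MOUNTED marker is hidden,
# i.e. something has been mounted on top of the directory.
is_mounted() {
    [ -d "$1" ] && [ ! -e "$1/NOT_MOUNTED" ]
}

demo=$(mktemp -d)
mkdir -p "$demo/box1/home" "$demo/box1/var"
touch "$demo/box1/home/NOT_MOUNTED"   # marker visible: looks unmounted
# box1/var has no visible marker, which is how a mounted partition looks

for dir in "$demo/box1/home" "$demo/box1/var"; do
    if is_mounted "$dir"; then
        echo "ok: $dir"
    else
        echo "skip: $dir is not mounted"
    fi
done
```

    A backup script would call such a check for every partition before
    running cpdup and skip the client if any check fails.
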
    Once the directory structure has been set up, do your NFS mounts and
    also add them to your fstab.  Since you will probably wind up with a
    lot of mounts, it is a good idea to use 'ro,bg' (read-only, background
    mount) in the fstab entries.

	mount box1:/home /nfs/box1/home
	mount box1:/var /nfs/box1/var

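    Matching fstab entries might look like the following.  This is only a
    sketch that assumes the usual six-field fstab(5) layout of a BSD-style
    system and reuses the host and paths from the mount commands above;
    check fstab(5) on your own system before copying it.

```
box1:/home	/nfs/box1/home	nfs	ro,bg	0	0
box1:/var	/nfs/box1/var	nfs	ro,bg	0	0
```
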
    You should create a huge /backup partition on your backup machine which
    is capable of holding all your mirrors.  Create a subdirectory called
    /backup/mirrors in your huge backup partition.

	mount <huge_disk> /backup
	mkdir /backup/mirrors


			PART 2 - DOING A LEVEL 0 BACKUP

    (If you use the supplied scripts, a level 0 backup can be accomplished
    simply by running the 'do_mirror' script with an argument of 0.)

    Create a level 0 backup using a standard cpdup with no special arguments
    other than -i0 -s0 (tell it not to ask questions and turn off the
    file-overwrite-with-directory safety feature).  Name the mirror with
    the date in a string-sortable format.  The command examples in this
    document use csh syntax.

	set date = `date "+%Y%m%d"`
	mkdir /backup/mirrors/box1.${date}
	cpdup -i0 -s0 /nfs/box1/home /backup/mirrors/box1.${date}/home
	cpdup -i0 -s0 /nfs/box1/var /backup/mirrors/box1.${date}/var

    Create a softlink to the most recently completed backup, which is your
    level 0 backup.  Remove the old link before creating the new one:
    'ln -sf' would follow the existing link and create the new link inside
    the directory it points to instead of replacing it, and 'mv -f' has the
    same problem.  'ln -shf' can be used to replace the link but is not
    portable.  The link target is given relative to /backup/mirrors so that
    readlink returns just the directory name, which the later examples
    rely on.

	sync
	rm -f /backup/mirrors/box1
	ln -s box1.${date} /backup/mirrors/box1

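    The remove-then-link step can be tried out safely in a scratch
    directory.  The following is a minimal sketch in POSIX sh (not the csh
    used by the other examples); point_latest is a hypothetical helper name
    and the dated directory names are made up for the demonstration.  It
    assumes the link target is the bare directory name.

```sh
#!/bin/sh
# Sketch only: point_latest is a hypothetical helper; the scratch
# directory stands in for /backup/mirrors.

# Repoint the softlink LINK at TARGET by removing the old link first,
# so the new link is never created inside the old target directory.
point_latest() {
    target=$1
    link=$2
    rm -f "$link"
    ln -s "$target" "$link"
}

mirrors=$(mktemp -d)
mkdir "$mirrors/box1.20060914" "$mirrors/box1.20060915"

point_latest box1.20060914 "$mirrors/box1"
point_latest box1.20060915 "$mirrors/box1"

# Show where the link points after the second call.
readlink "$mirrors/box1"
```

    Note that 'rm -f' on the link removes only the link itself, never the
    mirror directory it points to.
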
			PART 3 - DO AN INCREMENTAL BACKUP

    An incremental backup is exactly the same as a level 0 backup EXCEPT
    you use the -H option to specify the location of the most recent
    completed backup.  We simply maintain the handy softlink pointing at
    the most recent completed backup and the cpdup required to do this
    becomes trivial.

    Each day's incremental backup will reproduce the ENTIRE directory topology
    for the client, but cpdup will hardlink files from the most recent backup
    instead of copying them and this is what saves you all the disk space.
    Check for an existing incremental before creating today's directory.

	set date = `date "+%Y%m%d"`
	if ( "`readlink /backup/mirrors/box1`" == "box1.${date}" ) then
	    echo "silly boy, an incremental already exists for today"
	    exit 1
	endif
	mkdir /backup/mirrors/box1.${date}
	cpdup -H /backup/mirrors/box1 \
	      -i0 -s0 /nfs/box1/home /backup/mirrors/box1.${date}/home

    Be sure to update your 'most recent backup' softlink, but only do it
    if the cpdup runs for all of that client's partitions have succeeded.
    That way the next incremental backup will always be based on the most
    recent complete one.

	rm -f /backup/mirrors/box1
	ln -s box1.${date} /backup/mirrors/box1

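    The "only if every partition succeeded" rule could be scripted roughly
    as follows.  This is a sketch in POSIX sh (not the csh used by the other
    examples), not the logic of the supplied scripts.  The incremental
    function and the CPDUP, MIRRORS and NFS variables are assumptions
    introduced here so the logic can be exercised with a stand-in command.
    The cpdup options are passed exactly as in the example above, and the
    softlink target is assumed to be the bare directory name.

```sh
#!/bin/sh
# Sketch only: CPDUP, MIRRORS and NFS are overridable assumptions.
CPDUP=${CPDUP:-cpdup}
MIRRORS=${MIRRORS:-/backup/mirrors}
NFS=${NFS:-/nfs}

# incremental CLIENT DATE PARTITION...
# Back up every listed partition; repoint the softlink only if all
# of the cpdup runs succeed.
incremental() {
    client=$1
    date=$2
    shift 2
    dest="$MIRRORS/$client.$date"

    # Refuse to run twice for the same date.
    if [ "$(readlink "$MIRRORS/$client")" = "$client.$date" ]; then
        echo "an incremental already exists for $date" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    for part in "$@"; do
        # Any failure returns early and leaves the old softlink alone.
        "$CPDUP" -H "$MIRRORS/$client" -i0 -s0 \
            "$NFS/$client/$part" "$dest/$part" || return 1
    done

    rm -f "$MIRRORS/$client"
    ln -s "$client.$date" "$MIRRORS/$client"
}
```

    It would be invoked as, for example,
    'incremental box1 20060915 home var'.  Because a failed run leaves the
    softlink untouched, the next run still hardlinks against the last
    complete mirror.
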
    Since these backups are mirrors, locating a backup is as simple
    as cd'ing into the appropriate directory.  If your filesystem has a
    hardlink limit and cpdup hits it, cpdup will 'break' the hardlink
    and copy the file instead.  Generally speaking only a few special cases
    will hit the hardlink limit for a filesystem.  For example, the
    CVS/Root file in a checked out cvs repository is often hardlinked, and
    the sheer number of hardlinked 'Root' files multiplied by the number
    of backups can often hit the filesystem hardlink limit.

		    PART 4 - DO AN INCREMENTAL VERIFIED BACKUP

    Since your incremental backups use hardlinks heavily, the actual file
    might exist on the physical /backup disk in only one place even though
    it may be present in dozens of daily mirrors.  To guard against
    corruption of the file being hardlinked, cpdup's -f option can be used
    in conjunction with -H to force cpdup to validate the contents of the
    file, even if all the stat info looks identical.

	cpdup -f -H /backup/mirrors/box1 ...

    You can create completely redundant (non-hardlink-dependent) backups
    by doing the equivalent of your level 0, i.e. not using -H.  However, I
    do NOT recommend that you do this often (maybe once every 6 months at
    the most), because each mirror created this way will have a distinct
    copy of all the file data and you will quickly run out of space in
    your /backup partition.

		    MAINTENANCE OF THE "/backup" DIRECTORY

    Now, clearly you are going to run out of space in /backup if you keep
    doing this, but you may be surprised at just how many daily incrementals
    you can create before you fill up your /backup partition.

    If /backup becomes full, simply start rm -rf'ing older mirror directories
    until enough space is freed up.  You do not have to remove the oldest
    directory first, because a hardlinked file's data stays on disk until
    the last mirror that links to it is removed.  In fact, you might want
    to keep the oldest mirror around and remove a day's backup here, a
    day's backup there, etc., until you free up enough space.

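    The reason mirrors can be removed in any order can be seen with plain
    hardlinks.  The following is a small illustration in POSIX sh (not the
    csh used by the other examples); ln stands in for what cpdup -H does
    for unchanged files, and the day1/day2/day3 directory names are made up.

```sh
#!/bin/sh
# Sketch only: plain ln imitates the hardlinks cpdup -H would create,
# and a scratch directory stands in for /backup/mirrors.
backup=$(mktemp -d)
mkdir "$backup/box1.day1" "$backup/box1.day2" "$backup/box1.day3"

# One copy of the data, linked into three daily mirrors.
echo "unchanged data" > "$backup/box1.day1/file"
ln "$backup/box1.day1/file" "$backup/box1.day2/file"
ln "$backup/box1.day2/file" "$backup/box1.day3/file"

# Remove a mirror from the middle of the sequence.
rm -rf "$backup/box1.day2"

# The remaining mirrors still see the complete file.
cat "$backup/box1.day1/file" "$backup/box1.day3/file"
```

    The flip side is that removing a mirror frees only the space of files
    that no other mirror links to.
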
				OFF-SITE BACKUPS

    Making an off-site backup involves similar methodology, but you use
    cpdup's remote host capability to generate the backup.  To avoid
    complications it is usually best to take a mirror already generated on
    your LAN backup box and copy that to the remote box.

    The remote backup box does not use NFS, so setup is trivial.  Just
    create your super-large /backup partition and mkdir /backup/mirrors.
    Your LAN backup box will need root access via ssh to your remote backup
    box.  See the section "SSH SECURITY TIPS" below.

    You can use the handy softlink to get the latest 'box1.date' mirror
    directory and since the mirror is all in one partition you can just
    cpdup the entire machine in one command.  Use the same dated directory
    name on the remote box, so:

	# latest will wind up something like 'box1.20060915'
	set latest = `readlink /backup/mirrors/box1`
	cpdup -i0 -s0 /backup/mirrors/$latest remote.box:/backup/mirrors/$latest

    As with your LAN backup, create a softlink on the remote backup box
    denoting the latest mirror for any given site.

	if ( $status == 0 ) then
	    ssh remote.box -n \
		"rm -f /backup/mirrors/box1; ln -s $latest /backup/mirrors/box1"
	endif

    Incremental backups can be accomplished using the same cpdup command,
    but adding the -H option pointing at the latest backup on the remote
    box.  Note that the -H path is interpreted on the remote box, not on
    the LAN backup box you are running the command from.

	set latest = `readlink /backup/mirrors/box1`
	set remotelatest = `ssh remote.box -n "readlink /backup/mirrors/box1"`
	if ( "$latest" == "$remotelatest" ) then
	    echo "silly boy, you already made a remote incremental backup today"
	    exit 1
	endif
	cpdup -H /backup/mirrors/$remotelatest \
	      -i0 -s0 /backup/mirrors/$latest remote.box:/backup/mirrors/$latest
	if ( $status == 0 ) then
	    ssh remote.box -n \
		"rm -f /backup/mirrors/box1; ln -s $latest /backup/mirrors/box1"
	endif

    Cleaning out the remote directory works the same as cleaning out the LAN
    backup directory.


			    RESTORING FROM BACKUPS

    Each backup is a full filesystem mirror, and depending on how much space
    you have you should be able to restore it simply by cd'ing into the
    appropriate backup directory and using 'cpdup blah box1:blah' (assuming
    root access), or you can export the backup directory via NFS to your
    client boxes and use cpdup locally on the client to extract the backup.
    Using NFS is probably the most efficient solution.

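    As a concrete illustration of the 'cpdup blah box1:blah' form, a
    restore of one partition might be wrapped as follows.  This is a sketch
    in POSIX sh (not the csh used by the other examples); restore_partition
    and the CPDUP and MIRRORS variables are assumptions introduced here,
    and only the -i0 option already used in this document is passed.  Keep
    in mind that cpdup makes the destination match the source, so files
    created on the client after that backup was taken would be removed.

```sh
#!/bin/sh
# Sketch only: restore_partition is a hypothetical helper; CPDUP and
# MIRRORS can be overridden to rehearse a restore without running cpdup.
CPDUP=${CPDUP:-cpdup}
MIRRORS=${MIRRORS:-/backup/mirrors}

# restore_partition MIRROR PARTITION DESTINATION
# e.g. restore_partition box1.20060915 home box1:/home
restore_partition() {
    src="$MIRRORS/$1/$2"
    if [ ! -d "$src" ]; then
        echo "no such backup: $src" >&2
        return 1
    fi
    "$CPDUP" -i0 "$src" "$3"
}
```

    Setting CPDUP=echo before calling the function prints the command line
    instead of running it, which is a cheap way to double-check source and
    destination before a real restore.
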

			PUTTING IT ALL TOGETHER - SOME SCRIPTS

    Please refer to the scripts in the scripts/ subdirectory.  These scripts
    are EXAMPLES ONLY.  If you want to use them, put them in your ~root/adm
    directory on your backup box and set up a root crontab.

    First follow the preparation rules in PART 1 above.  The scripts do not
    do this automatically.  Edit the 'params' file that the scripts use
    to set default paths and such.

	** FOLLOW DIRECTIONS IN PART 1 ABOVE TO SET UP THE LAN BACKUP BOX **

    Copy the scripts to ~/adm.  Do NOT install a crontab yet (but an example
    can be found in scripts/crontab).

    Do a manual level 0 LAN BACKUP using the do_mirror script.

	cd ~/adm
	./do_mirror 0

    Once done you can do incremental backups using './do_mirror 1' to do a
    verified incremental, or './do_mirror 2' to do a stat-optimized
    incremental.  You can enable the cron jobs that run do_mirror and
    do_cleanup now.

    --

    Setting up an off-site backup box is trivial.  The off-site backup box
    needs to allow root ssh logins from the LAN backup box (at least for
    now, sorry!).  Set up the off-site backup directory, typically
    /backup/mirrors.  Then do a level 0 backup from your LAN backup box
    to the off-site box using the do_remote script.

	cd ~/adm
	./do_remote 0

    Once done you can do incremental backups using './do_remote 1' to do a
    verified incremental, or './do_remote 2' to do a stat-optimized
    incremental.  You can enable the cron jobs that run do_remote now.

    NOTE!  It is NOT recommended that you use verified-incremental backups
    over a WAN, as all related data must be copied over the wire every single
    day.  Instead, I recommend sticking with stat-optimized backups
    (./do_remote 2).

    You will also need to set up a daily cleaning script on the off-site
    backup box.

    SCRIPT TODOS - the ./do_cleanup script is not very smart.  We really
    should do a tower-of-hanoi removal.


			      SSH SECURITY TIPS

    To allow root access via ssh, add the following line to your sshd
    configuration on the client boxes (typically /etc/ssh/sshd_config):

	PermitRootLogin forced-commands-only

    If your OpenSSH version is too old to recognize that setting, you
    should update to a more recent version immediately.
    Restart sshd for the settings to take effect.

    On the backup machine, create a special backup key for root:

	mkdir /root/.ssh	# if it doesn't already exist
	cd /root/.ssh
	ssh-keygen -t dsa -N "" -f backup-key

    You now have a key pair, consisting of a secret key called "backup-key"
    and a public key called "backup-key.pub".  The secret key must *NEVER*
    leave the backup machine nor be disclosed in any way!  Note that we
    haven't protected the secret key with a passphrase (-N "") because it
    will be used by cron jobs where no passphrase can be entered.

    OpenSSH 7.0 and later disable DSA (ssh-dss) keys by default; on such
    systems generate a different key type instead, for example with
    '-t ed25519'.

    On the client boxes, create a file /root/.ssh/authorized_keys.
    It should contain just this line:

	command="/usr/local/bin/cpdup -S",from="<BAKHOST>",no-pty,
	no-port-forwarding,no-X11-forwarding,no-agent-forwarding <PUBKEY>

    This must be on one long line; it has been broken up here for
    readability only.  Note that the options must be separated by commas
    *ONLY* (no spaces).  Replace <BAKHOST> with the IP address or DNS name
    of the backup machine.  Replace <PUBKEY> with the contents of the
    file /root/.ssh/backup-key.pub from the backup machine (the public key,
    not the secret key!).  It typically starts with "ssh-dss" followed by
    a long character sequence that looks like line noise, followed by a
    comment that typically indicates who created the key.

    The format of the authorized_keys file is documented in the sshd(8)
    manual page.  Please refer to it for more details.

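    Since a stray space or line break in that entry is easy to introduce by
    hand, the line can also be assembled by a small script.  The following
    is a sketch in POSIX sh (not the csh used by the other examples);
    make_authkeys_line is a hypothetical helper, and the option list is
    copied from the entry shown above.

```sh
#!/bin/sh
# Sketch only: make_authkeys_line is a hypothetical helper name.

# make_authkeys_line BAKHOST PUBKEYFILE [CPDUP_FLAGS]
# Print a single authorized_keys line that restricts the key in
# PUBKEYFILE to running cpdup in slave mode from BAKHOST.
make_authkeys_line() {
    bakhost=$1
    pubkey=$(cat "$2") || return 1
    flags=${3:--S}
    printf 'command="/usr/local/bin/cpdup %s",from="%s",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding %s\n' \
        "$flags" "$bakhost" "$pubkey"
}
```

    For example, 'make_authkeys_line <BAKHOST> backup-key.pub', run on a
    copy of the public key file, prints the complete entry on one line,
    ready to be placed in /root/.ssh/authorized_keys on the client.
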
    If you have done all of the above correctly, then the root user on the
    backup machine will be able to log into the client boxes as root and
    execute "/usr/local/bin/cpdup -S", but nothing else.

    To further improve security, you can place the slave cpdup on the client
    machine into read-only mode by adding the -R option.  In this case, the
    line from the authorized_keys file should begin as follows:

	command="/usr/local/bin/cpdup -RS",from="<BAKHOST>",etc...

    If you do that, your backup server can only pull backups from the client
    machine, but it won't be able to change anything on it.  That is, you
    cannot use the client machine as a remote target.  So, if an attacker
    manages to execute commands on your backup machine, he won't be able
    to do any harm to your clients.  This also protects against human
    errors, e.g. accidentally swapping source and destination.

    By the way, it doesn't really matter much whether you specify the -R
    option when running cpdup on the backup machine.  If you do, the -R
    option will be passed to the slave, but the command="..." entry from
    the authorized_keys file overrides it anyway, so the slave always runs
    with the -R option.

    When using cpdup on the backup machine, make sure that the right key is
    used by passing the -i option to the ssh command:

	cpdup -F -i/root/.ssh/backup-key ...

    If one or both of the machines involved has a slow processor, it might
    be worthwhile to use a faster encryption algorithm, for example:

	cpdup -F -cblowfish-cbc ...

    (OpenSSH 7.6 and later no longer support blowfish-cbc; 'ssh -Q cipher'
    lists the ciphers your version offers.)

    If your OpenSSH version has been patched to support unencrypted transfers
    *AND* you trust the physical network between the machines involved, you
    might want to disable encryption altogether:

	cpdup -F -cnone ...

342