$DragonFly: src/bin/cpdup/BACKUPS,v 1.4 2007/05/17 08:19:00 swildner Exp $

INCREMENTAL BACKUP HOWTO

This document describes one of several ways to set up a LAN backup and
an off-site WAN backup system using cpdup's hardlinking capabilities.

The features described in this document are also encapsulated in scripts
which can be found in the scripts/ directory. These scripts can be used
to automate all backup steps except for the initial preparation of the
directory topology on the backup and off-site machines. Operation of
these scripts is described in the last section of this document.


PART 1 - PREPARE THE LAN BACKUP BOX

The easiest way to create a LAN backup box is to NFS mount all your
backup clients onto the backup box. It is also possible to use cpdup's
remote host feature to access your client boxes, but that requires root
access to the client boxes and is not described here.

Create a directory on the backup machine called /nfs, a subdirectory
for each remote client, and subdirectories for each partition on each
client. Remember that cpdup does not cross mount points, so you will
need a mount for each partition you wish to back up. For example:

[ ON LAN BACKUP BOX ]

    mkdir /nfs
    mkdir /nfs/box1
    mkdir /nfs/box1/home
    mkdir /nfs/box1/var

Before you actually do the NFS mount, create a dummy file in each
mount point that scripts can use to detect when an NFS mount has not
been done. While the mount is missing the file is visible; once the
mount succeeds the file is hidden beneath it. Scripts can thus avoid a
common failure scenario and not accidentally cpdup an empty mount point
to the backup partition (destroying that day's backup in the process).

    touch /nfs/box1/home/NOT_MOUNTED
    touch /nfs/box1/var/NOT_MOUNTED

Once the directory structure has been set up, do your NFS mounts and
also add them to your fstab. Since you will probably wind up with a
lot of mounts it is a good idea to use 'ro,bg' (read-only, background
mount) in the fstab entries.

    mount box1:/home /nfs/box1/home
    mount box1:/var /nfs/box1/var

You should create a huge /backup partition on your backup machine which
is capable of holding all your mirrors. Create a subdirectory called
/backup/mirrors in your huge backup partition.

    mount <huge_disk> /backup
    mkdir /backup/mirrors


PART 2 - DOING A LEVEL 0 BACKUP

(If you use the supplied scripts, a level 0 backup can be accomplished
simply by running the 'do_mirror' script with an argument of 0.)

Create a level 0 backup using a standard cpdup with no special arguments
other than -i0 -s0 (tell it not to ask questions and turn off the
file-overwrite-with-directory safety feature). Name the mirror with
the date in a string-sortable format.

    set date = `date "+%Y%m%d"`
    mkdir /backup/mirrors/box1.${date}
    cpdup -i0 -s0 /nfs/box1/home /backup/mirrors/box1.${date}/home
    cpdup -i0 -s0 /nfs/box1/var /backup/mirrors/box1.${date}/var

Create a softlink to the most recently completed backup, which is your
level 0 backup. Note that if the link already exists, 'ln -sf' will
create a new link inside the directory the current link points to
instead of replacing the current link. 'ln -shf' can be used to replace
the link but is not portable. 'mv -f' has the same problem. Remove the
old link first instead. The link target is given as a relative name so
that readlink returns just 'box1.<date>', which the later examples rely
on.

    sync
    rm -f /backup/mirrors/box1
    ln -s box1.${date} /backup/mirrors/box1

PART 3 - DO AN INCREMENTAL BACKUP

An incremental backup is exactly the same as a level 0 backup EXCEPT
you use the -H option to specify the location of the most recent
completed backup. Because we maintain the handy softlink pointing at
the most recent completed backup, the cpdup command required to do this
becomes trivial.

Each day's incremental backup will reproduce the ENTIRE directory topology
for the client, but cpdup will hardlink unchanged files from the most
recent backup instead of copying them, and this is what saves you all the
disk space.

    set date = `date "+%Y%m%d"`
    if ( "`readlink /backup/mirrors/box1`" == "box1.${date}" ) then
        echo "silly boy, an incremental already exists for today"
        exit 1
    endif
    mkdir /backup/mirrors/box1.${date}
    cpdup -H /backup/mirrors/box1 \
        -i0 -s0 /nfs/box1/home /backup/mirrors/box1.${date}/home
    cpdup -H /backup/mirrors/box1 \
        -i0 -s0 /nfs/box1/var /backup/mirrors/box1.${date}/var

Be sure to update your 'most recent backup' softlink, but only do it
if the cpdups for all the partitions for that client have succeeded.
That way the next incremental backup will be based on the most recent
complete one.

    rm -f /backup/mirrors/box1
    ln -s box1.${date} /backup/mirrors/box1

Since these backups are mirrors, locating a backup is as simple
as cd'ing into the appropriate directory. If your filesystem has a
hardlink limit and cpdup hits it, cpdup will 'break' the hardlink
and copy the file instead. Generally speaking only a few special cases
will hit the hardlink limit for a filesystem. For example, the
CVS/Root file in a checked out cvs repository is often hardlinked, and
the sheer number of hardlinked 'Root' files multiplied by the number
of backups can often hit the filesystem hardlink limit.

PART 4 - DO AN INCREMENTAL VERIFIED BACKUP

Since your incremental backups use hardlinks heavily, the actual file
might exist on the physical /backup disk in only one place even though
it may be present in dozens of daily mirrors. To ensure that the
file being hardlinked has not been corrupted, cpdup's -f option can be
used in conjunction with -H to force cpdup to validate the contents
of the file, even if all the stat info looks identical.

    cpdup -f -H /backup/mirrors/box1 ...

You can create completely redundant (non-hardlink-dependent) backups
by doing the equivalent of your level 0, i.e. not using -H.
However, I do NOT recommend that you do this, or at least not very often
(maybe once every 6 months at the most), because each mirror created this
way will have a distinct copy of all the file data and you will quickly
run out of space in your /backup partition.

MAINTENANCE OF THE "/backup" DIRECTORY

Now, clearly you are going to run out of space in /backup if you keep
doing this, but you may be surprised at just how many daily incrementals
you can create before you fill up your /backup partition.

If /backup becomes full, simply start rm -rf'ing older mirror directories
until enough space is freed up. Removing a mirror only frees the data of
files that no other mirror still links to. You do not have to remove the
oldest directory first. In fact, you might want to keep it around and
remove a day's backup here, a day's backup there, etc., until you free up
enough space.

OFF-SITE BACKUPS

Making an off-site backup involves similar methodology, but you use
cpdup's remote host capability to generate the backup. To avoid
complications it is usually best to take a mirror already generated on
your LAN backup box and copy that to the remote box.

The remote backup box does not use NFS, so setup is trivial. Just
create your super-large /backup partition and mkdir /backup/mirrors.
Your LAN backup box will need root access via ssh to your remote backup
box.

You can use the handy softlink to get the latest 'box1.date' mirror
directory, and since the mirror is all in one partition you can just
cpdup the entire machine in one command. Use the same dated directory
name on the remote box, so:

    # latest will wind up something like 'box1.20060915'
    set latest = `readlink /backup/mirrors/box1`
    # strip any leading directory in case the link target is a full path
    set latest = $latest:t
    cpdup -i0 -s0 /backup/mirrors/$latest remote.box:/backup/mirrors/$latest

As with your LAN backup, create a softlink on the remote backup box
denoting the latest mirror for any given site.

    if ( $status == 0 ) then
        ssh -n remote.box \
            "rm -f /backup/mirrors/box1; ln -s $latest /backup/mirrors/box1"
    endif

Incremental backups can be accomplished using the same cpdup command,
but adding the -H option pointing at the latest backup on the remote box.
Note that the -H path is relative to the remote box, not the LAN backup
box you are running the command from.

    set latest = `readlink /backup/mirrors/box1`
    set latest = $latest:t
    set remotelatest = `ssh -n remote.box "readlink /backup/mirrors/box1"`
    set remotelatest = $remotelatest:t
    if ( "$latest" == "$remotelatest" ) then
        echo "silly boy, you already made a remote incremental backup today"
        exit 1
    endif
    cpdup -H /backup/mirrors/$remotelatest \
        -i0 -s0 /backup/mirrors/$latest remote.box:/backup/mirrors/$latest
    if ( $status == 0 ) then
        ssh -n remote.box \
            "rm -f /backup/mirrors/box1; ln -s $latest /backup/mirrors/box1"
    endif

Cleaning out the remote directory works the same as cleaning out the LAN
backup directory.


RESTORING FROM BACKUPS

Each backup is a full filesystem mirror, so provided the target has
enough space you can restore it simply by cd'ing into the appropriate
backup directory and using 'cpdup blah box1:blah' (assuming root access
to the client). Alternatively, you can export the backup directory via
NFS to your client boxes and use cpdup locally on the client to extract
the backup. Using NFS is probably the most efficient solution.


PUTTING IT ALL TOGETHER - SOME SCRIPTS

Please refer to the scripts in the scripts/ subdirectory. These scripts
are EXAMPLES ONLY. If you want to use them, put them in your ~root/adm
directory on your backup box and set up a root crontab.

First follow the preparation rules in PART 1 above. The scripts do not
do this automatically. Edit the 'params' file that the scripts use
to set default paths and such.
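
Since the scripts do not do the Part 1 preparation for you, it can be
worth verifying the mounts by hand before the first run. The following is
a minimal sketch only, written in POSIX sh rather than the csh used in the
examples above. The function name check_mounted is made up for this
illustration and is not part of the supplied scripts. It relies on the
NOT_MOUNTED marker files from Part 1: if the marker is visible, the NFS
mount is missing.

```sh
# check_mounted DIR  (hypothetical helper, not part of cpdup's scripts)
# Succeeds only if DIR exists and the NOT_MOUNTED marker created in
# Part 1 is hidden, i.e. something is mounted on top of DIR.
check_mounted() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: no such directory" >&2
        return 1
    fi
    if [ -e "$dir/NOT_MOUNTED" ]; then
        echo "$dir: NOT_MOUNTED marker is visible, NFS mount is missing" >&2
        return 1
    fi
    return 0
}
```

A wrapper could call 'check_mounted /nfs/box1/home' (and likewise for
each other partition) and refuse to run cpdup when the check fails.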

** FOLLOW DIRECTIONS IN PART 1 ABOVE TO SET UP THE LAN BACKUP BOX **

Copy the scripts to ~/adm. Do NOT install a crontab yet (but an example
can be found in scripts/crontab).

Do a manual level 0 LAN backup using the do_mirror script.

    cd ~/adm
    ./do_mirror 0

Once done you can do incremental backups using './do_mirror 1' to do a
verified incremental, or './do_mirror 2' to do a stat-optimized
incremental. You can enable the cron jobs that run do_mirror and
do_cleanup now.

--

Setting up an off-site backup box is trivial. The off-site backup box
needs to allow root ssh logins from the LAN backup box (at least for
now, sorry!). Set up the off-site backup directory, typically
/backup/mirrors. Then do a level 0 backup from your LAN backup box
to the off-site box using the do_remote script.

    cd ~/adm
    ./do_remote 0

Once done you can do incremental backups using './do_remote 1' to do a
verified incremental, or './do_remote 2' to do a stat-optimized
incremental. You can enable the cron jobs that run do_remote now.

NOTE! It is NOT recommended that you use verified-incremental backups
over a WAN, as all related data must be copied over the wire every single
day. Instead, I recommend sticking with stat-optimized backups
(./do_remote 2).

You will also need to set up a daily cleaning script on the off-site
backup box.

SCRIPT TODOS - the ./do_cleanup script is not very smart. We really
should do a tower-of-hanoi removal.
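
Until do_cleanup gets smarter, candidates for removal can be listed with
a few lines of shell. This is a rough sketch in POSIX sh, not the supplied
do_cleanup script and not a tower-of-hanoi scheme. The function name
list_removable and its arguments are assumptions made for illustration.
It keeps the oldest mirror and the newest KEEP mirrors for a client,
relying on the string-sortable dates in the directory names, and only
prints the rest so they can be reviewed before anything is deleted.

```sh
# list_removable MIRRORS CLIENT KEEP  (hypothetical helper)
# Print the mirror directories of CLIENT under MIRRORS that are neither
# the oldest mirror nor one of the newest KEEP mirrors. Nothing is removed.
list_removable() {
    mirrors=$1
    client=$2
    keep=$3
    # Names like box1.20060915 sort chronologically as plain strings.
    # The 'box1' softlink itself does not match the pattern.
    ( cd "$mirrors" && ls -d "$client".[0-9]* 2>/dev/null ) | sort |
    awk -v keep="$keep" '
        { name[NR] = $0 }
        END {
            # Entry 1 is the oldest mirror; the last "keep" entries are
            # the newest ones. Print everything in between.
            for (i = 2; i <= NR - keep; i++)
                print name[i]
        }'
}
```

For example, 'list_removable /backup/mirrors box1 14' prints the box1
mirrors other than the oldest one and the 14 newest ones.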