1=encoding ISO8859-1 2 3=head1 BackupPC Introduction 4 5This documentation describes BackupPC version __VERSION__, 6released on __RELEASEDATE__. 7 8=head2 Overview 9 10BackupPC is a high-performance, enterprise-grade system for backing up 11Unix, Linux, WinXX, and MacOSX PCs, desktops and laptops to a server's 12disk. BackupPC is highly configurable and easy to install and maintain. 13 14Given the ever decreasing cost of disks and raid systems, it is now 15practical and cost effective to backup a large number of machines onto 16a server's local disk or network storage. For some sites this might be 17the complete backup solution. For other sites additional permanent 18archives could be created by periodically backing up the server to tape. 19 20Features include: 21 22=over 4 23 24=item * 25 26A clever pooling scheme minimizes disk storage and disk I/O. 27Identical files across multiple backups of the same or different PC 28are stored only once, resulting in substantial savings in disk storage 29and disk writes. 30 31=item * 32 33Compression provides additional reductions in storage, depending 34on the type of data being backed up. The CPU impact of compression 35is low since only new files (those not already in the pool) need 36to be compressed. 37 38=item * 39 40A powerful http/cgi user interface allows administrators to view 41the current status, edit configuration, add/delete hosts, view log 42files, and allows users to initiate and cancel backups and browse 43and restore files from backups. 44 45=item * 46 47The http/cgi user interface has internationalization (i18n) support, 48currently providing English, French, German, Spanish, Italian, 49Dutch, Polish, Portuguese-Brazilian, Chinese, Polish, Czech, 50Japanese, Ukrainian, and Russian. 51 52=item * 53 54No client-side software is needed. On WinXX the standard smb 55protocol is used to extract backup data. On linux, unix or MacOSX 56clients, rsync, tar (over ssh/rsh/nfs) or ftp is used to extract 57backup data. Alternatively, rsync can also be used on WinXX (using 58cygwin), since rsync provides for efficient transfers and allows 59incremental backups to detect almost all changes. 60 61=item * 62 63Flexible restore options. Single files can be downloaded from 64any backup directly from the CGI interface. Zip or Tar archives 65for selected files or directories from any backup can also be 66downloaded from the CGI interface. Finally, direct restore to 67the client machine (using smb or tar) for selected files or 68directories is also supported from the CGI interface. 69 70=item * 71 72BackupPC supports mobile environments where laptops are only 73intermittently connected to the network and have dynamic IP addresses 74(DHCP). Configuration settings allow machines connected via slower WAN 75connections (eg: dial up, DSL, cable) to not be backed up, even if they 76use the same fixed or dynamic IP address as when they are connected 77directly to the LAN. 78 79=item * 80 81Flexible configuration parameters allow multiple backups to be performed 82in parallel, specification of which shares to backup, which directories 83to backup or not backup, various schedules for full and incremental 84backups, schedules for email reminders to users and so on. Configuration 85parameters can be set system-wide or also on a per-PC basis. 86 87=item * 88 89Users are sent periodic email reminders if their PC has not 90recently been backed up. Email content, timing and policies 91are configurable. 92 93=item * 94 95BackupPC is Open Source software hosted by GitHub. 96 97=back 98 99=head2 BackupPC 4.0 100 101This is the first release of 4.0, which is a significant rewrite of 102BackupPC. This section provides a short overview of the changes and 103features in 4.0. 104 105Here's a short summary of what has changed in V4: 106 107=over 4 108 109=item * 110 111No use of hardlinks (except temporarily to do atomic renames). Reference 112counting is handled at the application level in a batch manner (hardlinks 113will still remain for any legacy V3 backups). 114 115=item * 116 117Backups are stored as "reverse deltas" - the most recent backup is always filled 118and older backups are reconstituted by merging all the deltas starting with the 119nearest future filled backup and working backwards. 120 121This is the opposite of V3 where incrementals are stored as "forward deltas" 122to a prior backup (typically the last full backup or prior lower-level 123incremental backup, or the last full in the case of rsync). 124 125=item * 126 127Since the most recent backup is filled, viewing/restoring that backup (which is 128the most common backup used) doesn't require merging any deltas from other backups. 129 130=item * 131 132The concepts of incr/full backups and unfilled/filled storage are decoupled. The most 133recent backup is always filled. By default, for the remaining backups, full backups 134are filled and incremental backups are unfilled, but that is configurable. 135 136=item * 137 138Uses full-file MD5 digests, which are stored in the directory attrib 139files. Each backup directory only contains an empty attrib file whose 140name includes its own MD5 digest, which is used to look up the attrib 141file's contents in the pool. In turn, that file contains the metadata 142for every file in that directory, including each files's MD5 digest. 143 144=item * 145 146The Pool layout still supports chains to handle md5 collisions. While collisions 147can be constructed and are now well-known, they are highly unlikely in the wild. 148Pool files are never renamed or moved, unlike V3. 149 150=item * 151 152Any backup can be deleted (deltas are merged into next older backup if it is 153not filled). 154 155=item * 156 157The reverse deltas allow "infinite incrementals" - no need for a full backup 158if you are willing to trade speed for the risk that a file change will 159not be detected if the metadata (eg, mtime or size) doesn't change. 160 161=item * 162 163An rsync "full" backup now uses --checksum (instead of --ignore-times), 164which is much more efficient on the server side - the server just needs to 165check the full-file checksum computed by the client, together with the mtime, 166nlinks, size attributes, to see if the file has changed. If you want a more 167conservative approach, you can change it back to --ignore-times, which 168requires the server to send block checksums to the client. 169 170=item * 171 172The use of rsync --checksum allows BackupPC to guess a potential match 173anywhere in the pool, even on a first-time backup. In that case, the usual 174rsync block checksums are still exchanged to make sure the complete file 175is identical. 176 177=item * 178 179Uses a modified rsync called rsync_bpc (currently based on rsync-3.0.9) 180on the server side (in place of File::RsyncP), with a C code interface 181to the BackupPC storage. So the whole data path for rsync is now in compiled 182C code, which is much faster than perl. 183 184=item * 185 186Due to the use of rsync-3.X, acls and xattrs are supported, and many other 187useful options (but not all) are supported. Rsync protocol 30 supports 188the efficient incremental file list, which significantly improves memory 189usage and startup time. It also supports MD5 full-file checksums, which 190match BackupPC's new digest. That allows a full-file digest to be checked 191as easily as an mtime on the server side. 192 193=item * 194 195Significant portions of the BackupPC code are now compiled C code in a 196new module called BackupPC::XS that is dynamically linked to perl. 197 198=back 199 200Here is a more detailed discussion: 201 202=over 4 203 204=item * 205 206Completely new backup storage. No hardlinks! Backups are stored as reverse deltas, 207with the most recent backup always filled. Prior backup "n" contains the changes 208relative to prior backup "n+1". 209 210=item * 211 212Since every backup is based on the last filled backup, the concept of incremental 213levels is removed. 214 215=item * 216 217Example: let's assume backup #4 is the most recent, and therefore filled, and 218backups #0..3 are not filled. 219 220Backups #0..3 store just the necessary reverse changes needed to 221reconstruct those backups, relative to the next backup. 222 223 - To view/restore backup #4, all the information is stored in backup #4. 224 - To view/restore backup #3, backup #4 (the filled one), is merged with the deltas in #3. 225 - To view/restore backup #2, backup #4 (the filled one), is merged with the deltas in #3 and #2 226 - etc. 227 228When a new backup is started (#5), we begin by renaming backup #4 to #5. 229At that instant, backup #4 storage is now empty (which means backups #4 230and #5 are currently identical). As the backup runs, changes are made 231to #5 with the changed/new files in place, and the opposite changes are 232added to backup #4, to keep the "view" of backup #4 unchanged. 233 234After the backup is done, #5 is now the filled version of the latest 235backup, and #4 contains the changes necessary to turn #5 back into the state 236when backup #4 was done. If there are no changes detected in the new 237backup, the storage tree for #4 will be empty. If just one file changed, 238the new file will be below #5, and the prior file will be below #4 (well, 239technically not quite true, since files aren't stored below the backup 240trees; more correctly, the attrib file in #5 will point to the new pool 241file, and the attrib file in #4 will point to the old pool file). 242 243=item * 244 245The concepts of incr/full backups and unfilled/filled storage are now 246decoupled. The most recent backup is always filled (whether or not the 247last backup was a full or incr). Certain older backups can be filled 248for convenience to make restoring old backups faster (because fewer 249backups need to be merged), and are used to specify expiry schedules. 250 251=item * 252 253When a backup starts, there are several different cases that determine 254how the backups are stored and whether prior deltas are stored: 255 256=over 4 257 258=item 1 259 260No existing backups: create a new backup #0 and do a full backup in place 261(ie: no prior deltas are stored). 262 263=item 2 264 265V3 backups exist, but no V4 backups. The last V3 backup is duplicated into 266V4 format, and a full backup is done in place (ie: no prior deltas are stored). 267 268=item 3 269 270Last V4 backup is a full, or more than $Conf{FillCycle} since last filled 271backup. The last backup is duplicated to create a new filled backup, and 272the new backup is done in place (ie: no prior deltas are stored). 273 274=item 4 275 276There are V4 backups and it's less than $Conf{FillCycle} since last one is 277filled. Renumber the last backup to #n+1, and put the reverse deltas in 278initially empty backup tree #n. 279 280=item 5 281 282CompressLevel has toggled on/off between backups. This isn't well tested and 283it's very hard to support efficiently. We treat this as a brand new (empty) backup 284in place, that is therefore filled. That way we won't need to merge between 285backups with compress on/off. 286 287=item 6 288 289Last backup was a V4 partial. If prior V4 backup is filled (and not partial), 290then just do another in-place backup. Otherwise, treat as case 4. When complete 291(whether successful or another partial), delete the prior deltas in #n, which 292merges the cumulative changes into #n-1. 293 294=back 295 296=item * 297 298The treatment of a "Partial" backup has changed. Unlike in V3 where partials are 299removed prior to the next backup, in V4 partials are kept and are used as the starting 300point for the next backup. See case 6 above. If the new backup fails, if no files 301have been backed up, the empty backup #n is removed. 302 303=item * 304 305Backups are stored as mangled directory trees, but each directory only 306contains an "attrib" file. The attrib file is zero-length, and its name 307includes the MD5 digest so the contents can be looked up in the pool. 308 309The attrib contents in the pool contains the directory contents: for each 310file, that means the metadata, xattrs and the MD5 digest of the file 311contents. 312 313=item * 314 315A modified rsync called rsync_bpc, based on rsync 3.0.9, is used on 316the server side, with a C code layer that emulates all the file-system 317OS calls to be compatible with the BackupPC store. That means for 318rsync, the data path is now fully in compiled C, which should mean a 319significant speedup. It also means many (but not all) of the rsync 320options are supported natively. 321 322=item * 323 324Significant parts of the BackupPC storage and pooling code have been written in C 325(the same code is used in the server rsync_bpc). BackupPC::FileZIO, BackupPC::PoolWrite, 326BackupPC::Attrib, BackupPC::AttribCache and BackupPC::PoolRefCnt (reference counting and 327storage) are all replaced with BackupPC::XS, a C-code perl extension. 328 329=item * 330 331Extended attributes (xattr) are supported. Rsync is configured to "store acls using xattr", 332meaning both acls and xattrs are supported. 333 334=item * 335 336infinite incrementals with rsync are supported. The most recent backup 337is always filled, so an incremental will still leave the most recent 338backup filled. 339 340=item * 341 342any V4 backup can be deleted - dependencies are merged into the next older backup 343if it isn't already filled. 344 345=item * 346 347file digests are full-file MD5. Collisions are much more unlikely than V3, 348but still possible. Duplicates are implemented with an extension to the 34916 byte MD5 digest (ie: 16 bytes for plain file, 17 bytes for next 350255 duplicates etc). 351 352=item * 353 354V4 pool files are stored in a new hierarchy, two levels deep, with 3557 bits at each level (ie: 128 directories at top-level, and each 356with 128 directories at next level). 357 358=item * 359 360V4 pool files are never moved or renamed. 361 362=item * 363 364Inodes for hardlinked files are stored in each backup tree. This makes 365backing up hardlinks accurate, compared to V3, and provides for consistent 366inode numbering across backups. 367 368=item * 369 370zero-sized files or empty attribute files don't get written or pooled. 371 372=item * 373 374the elimination of hardlinks means that reference counting has to be maintained by 375the BackupPC code. This is one of the riskiest area in terms of development 376and testing. Reference counts are maintained per-backup, per-host, and for the 377whole pool. 378 379Each operation that changes reference counts (eg: doing a new backup, deleting 380a backup, or duplicating (filling) a backup) creates one or more poolRefDelta 381files in that client's backup directory (ie: TopDir/pc/HOST/NNN). These files 382are lists of MD5 digests, and corresponding counts deltas. 383 384Each night, BackupPC_nightly runs BackupPC_refCountUpdate, which, for each 385host, updates the per-host reference count database with the new deltas. 386It then combines all the per-host reference count files to create the 387global pool reference count database. 388 389BackupPC_refCountUpdate can run concurrently with backups. If you still 390have V3 backups and pool, BackupPC_nightly still needs to run and check 391for old V3 pool files that can be deleted. But since there are no 392new V3 backups happening, BackupPC_nightly can run concurrently with 393backups. 394 395=item * 396 397There is a new utility BackupPC_fsck that can check/fix the per-host 398and global reference counts. The per-host reference count database 399is verified by parsing all the attrib files in each backup tree. 400The global reference count database is verified by combing all the 401per-host reference count databases and comparing them. 402 403BackupPC_fsck cannot run when BackupPC is. 404 405=item * 406 407When BackupPC_refCountUpdate updates the overall reference counts, it 408removes pool files that have a reference count of zero. To avoid race 409conditions, it uses a two-phase process. It first flags files that have 410zero reference counts using one of the file attributes. The next time 411it runs (typically 24 hours later), any flagged files that still have 412zero reference count are then removed. The rest of the code knows not 413to use flagged pool files to avoid race conditions. 414 415=item * 416 417Progress indication: a simple status that shows the number of files 418processed so far. It's hard to convert that to a percentage, since 419the total isn't known until the end of the backup. But knowing the 420number of files is quite helpful, since you can get an idea of the 421expected total based on the prior backups, or knowing what configuration 422you have changed (ie: adding a large new tree). 423 424=item * 425 426BackupPC_link is removed since it is no longer used. 427 428=item * 429 430Since files are no longer stored in backup trees, browsing the backup 431trees is even harder than V3 (where you just had to deal with mangling). 432A new utility BackupPC_ls acts like "ls -l", showing accurate directory 433listings of files, together with the MD5 digests. 434 435BackupPC_ls can be given either an explicit hostname, number, 436and unmangled path, or can be given the full (mangled) path, 437which makes it easier to use directory completion. It should 438be possible to configure tcsh and bash, together with some new 439hooks in BackupPC_ls, to give a more natural file/directory 440completion. 441 442BackupPC_zcat also can take just the MD5 digest (which you can paste 443from BackupPC_ls). Currently BackupPC_zcat doesn't support the tree 444parsing that BackupPC_ls does (it can only zcat actual files), but 445that should be easy to rectify. 446 447=item * 448 449Configuration for expiry: since full/incr are decoupled from filled/unfilled, 450expiry is a bit trickier. 451 452The convention for expiry parameters is "FullKeepPeriod/FullKeepCnt" 453etc refer to B<Filled> backups, and "IncrKeepPeriod/IncrKeepCnt" refer 454to B<Unfilled> backups. 455 456=item * 457 458V3 migration: nothing specific is needed. V4 can browse/view/restore 459V3 backups. When you install V4, no changes are made to any V3 backups. 460If you are upgrading from V3, be sure to set $Conf{PoolV3Enabled} to 1 so 461the old V3 pool is searched for matching files. 462 463=over 4 464 465=item * 466 467When you install V4, it will notice that the V3 pool exists. Running 468configure.pl should set $Conf{PoolV3Enabled} to 1 in that case, but 469you should be sure to check that. 470 471=item * 472 473When a V4 backup is first done, BackupPC_backupDuplicate is 474run to duplicate the most recent V3 backup to create a new V4 backup. 475A "filled" view of the most recent V3 backup is used to create 476a "filled" V4 backup tree. 477 478This step could be time consuming, since every file needs to be read 479(as a V3 file) and written as a V4 file. However, the V4 pooling 480code knows about the V3 pool, so it will move the V3 pool file 481into the V4 pool. So this duplication process doesn't burn a lot of 482pool storage space, but every file still needs to be read 483(to compute the MD5 digest) and "written" (really just 484matching/linking). 485 486=item * 487 488Expiry: all the V3 + V4 backups are considered on a combined basis 489for expiry checking. 490 491=item * 492 493On a clean new V4 install, the steps of computing and checking V3 494digests is eliminated. 495 496=item * 497 498Downgrading V4->V3: Not tested and not recommended. 499In theory you can remove any new V4 backups, remove the V4 pool 500itself, and you should be able to re-install V3 and still have 501access to your original full working V3 store (except for any 502V3 backups that V4 might have routinely removed based on normal 503backup expiry configuration). 504 505However, any V3 pool files moved to V4 will no longer be in the V3 506pool. So subsequent V3 backups will burn more storage as files 507get re-added to the old V3 pool. 508 509Hopefully downgrading isn't necessary... 510 511=back 512 513=item * 514 515Optimizations: the C code implementation should give a significant performance 516advantage, as well as the more flexible. 517 518Potential V4 optimizations that are planned, but not yet implemented, include: 519 520=over 4 521 522=item * 523 524rsync-bpc doesn't support checksum caching. 525 526=item * 527 528rsync-bpc with --ignore-times actually reads each unchanged file three times, 529and writes it once (normal rsync reads twice and writes once; the extra one 530is due to compression). Some careful optimization can eliminate two reads 531and the write. The final read can be eliminated with checksum caching. 532 533=item * 534 535BackupPC_refCountUpdate, BackupPC_fsck, BackupPC_backupDuplicate, 536BackupPC_backupDelete are all single-threaded. 537 538=back 539 540=back 541 542=head2 Backup basics 543 544=over 4 545 546=item Full Backup 547 548A full backup is a complete backup of a share. BackupPC can be configured 549to do a full backup at a regular interval (typically weekly). BackupPC 550can be configured to keep a certain number of full backups. Exponential 551expiry is also supported, allowing full backups with various vintages to 552be kept (for example, a settable number of most recent weekly fulls, plus 553a settable number of older fulls that are 2, 4, 8, or 16 weeks apart). 554 555=item Incremental Backup 556 557An incremental backup is a backup of files that have changed since the 558last successful backup. 559 560Rsync is the best option for BackupPC. Any files whose attributes 561have changed (ie: uid, gid, mtime, modes, size) since the last full 562are backed up. Deleted, new files and renamed files are detected by 563rsync incrementals. 564 565For SMB and tar, BackupPC uses the modification time (mtime) to 566determine which files have changed since the last backup. That 567means SMB and tar incrementals are not able to detect deleted files, 568renamed files or new files whose modification time is prior to the 569last lower-level backup. 570 571BackupPC can also be configured to keep a certain number of incremental 572backups, and to keep a smaller number of very old incremental backups. 573 574BackupPC "fills-in" incremental backups when browsing or restoring, 575based on the levels of each backup, giving every backup a "full" 576appearance. This makes browsing and restoring backups much easier: 577you can restore from any one backup independent of whether it was 578an incremental or full. 579 580=item Partial Backup 581 582When a full or incremental backup fails or is canceled, the most 583recent backup is labeled "partial". Prior to V4, that backup was 584incomplete, and would be deleted when the next backup completed. 585 586In V4 a partial backup denotes that the last backup is incomplete. 587However, since V4 does backup updating in place, it represents the best 588and latest backup. A partial backup can be browsed or used to restore 589files just like a successful full or incremental backup. And it will 590be used as the starting point for the next backup attempt. 591 592=item Identical Files 593 594BackupPC pools identical files. By "identical files" we mean files 595with identical contents, not necessary the same permissions, ownership 596or modification time. Two files might have different permissions, 597ownership, or modification time but will still be pooled whenever 598the contents are identical. This is possible since BackupPC stores 599the file metadata (permissions, ownership, and modification time) 600separately from the file contents. 601 602Prior to V4, identical files were stored using hardlinks. In V4+, 603hardlinks are eliminated (except for temporary atomic renames), and 604reference counting is done at the application level. 605 606=item Backup Policy 607 608Based on your site's requirements you need to decide what your backup 609policy is. BackupPC is not designed to provide exact re-imaging of 610failed disks. See L<Some Limitations> for more information. 611However, with rsync and tar transports for linux/unix clients, plus 612full support for special file types, extended attributes etc, 613likely means an exact image of a linux/unix file system can be made. 614 615BackupPC saves backups onto disk. Because of pooling you can relatively 616economically keep several weeks or months of old backups. 617 618At some sites the disk-based backup will be adequate, without a 619secondary offsite cloud, disk or tape backup. This system is robust 620to any single failure: if a client disk fails or loses files, the 621BackupPC server can be used to restore files. If the server disk 622fails, BackupPC can be restarted on a fresh file system, and create 623new backups from the clients. The chance of the server disk failing 624can be made very small by spending more money on increasingly better 625RAID systems. However, there is still the risk of catastrophic 626events like fires or earthquakes that can destroy both the BackupPC 627server and the clients it is backing up if they are physically 628nearby. 629 630Some sites might choose to do periodic backups to tape or cd/dvd. 631This backup can be done perhaps weekly using the archive function of 632BackupPC. 633 634Other users have reported success with removable disks to rotate the 635BackupPC data drives, or using rsync to mirror the BackupPC data pool 636offsite. 637 638In V4, since hardlinks are not used permanently, duplicating a V4 pool 639is much easier, allowing remote copying of the pool. 640 641=back 642 643=head2 Resources 644 645=over 4 646 647=item BackupPC home page 648 649The BackupPC project page is at: 650 651 https://backuppc.github.io/backuppc 652 653This page has links to the current documentation, github project source 654and general information. 655 656=item Github 657 658BackupPC development is hosted on github: 659 660 https://github.com/backuppc 661 662Releases for BackupPC and the required packages BackupPC-XS and rsync-bpc are 663available at: 664 665 https://github.com/backuppc/backuppc/releases 666 https://github.com/backuppc/backuppc-xs/releases 667 https://github.com/backuppc/rsync-bpc/releases 668 669=item BackupPC Wiki 670 671BackupPC has a Wiki at L<https://github.com/backuppc/backuppc/wiki>. 672Everyone is encouraged to contribute to the Wiki. Anyone with a 673Github account can edit the Wiki. 674 675=item Mailing lists 676 677Three BackupPC mailing lists exist for announcements (backuppc-announce), 678developers (backuppc-devel), and a general user list for support, asking 679questions or any other topic relevant to BackupPC (backuppc-users). 680 681The lists are archived on SourceForge: 682 683 https://sourceforge.net/p/backuppc/mailman/backuppc-users/ 684 685You can subscribe to these lists by visiting: 686 687 http://lists.sourceforge.net/lists/listinfo/backuppc-announce 688 http://lists.sourceforge.net/lists/listinfo/backuppc-users 689 http://lists.sourceforge.net/lists/listinfo/backuppc-devel 690 691The backuppc-announce list is moderated and is used only for 692important announcements (eg: new versions). It is low traffic. 693You only need to subscribe to one of backuppc-announce and 694backuppc-users: backuppc-users also receives any messages on 695backuppc-announce. 696 697The backuppc-devel list is only for developers who are working on BackupPC. 698Do not post questions or support requests there. But detailed technical 699discussions should happen on this list. 700 701To post a message to the backuppc-users list, send an email to 702 703 backuppc-users@lists.sourceforge.net 704 705Do not send subscription requests to this address! 706 707=item Other Programs of Interest 708 709If you want to mirror linux or unix files or directories to a remote server 710you should use rsync, L<http://rsync.samba.org>. BackupPC uses 711rsync as a transport mechanism; if you are already an rsync user you 712can think of BackupPC as adding efficient storage (compression and 713pooling) and a convenient user interface to rsync. 714 715Two popular open source packages that do tape backup are 716Amanda (L<http://www.amanda.org>) 717and Bacula (L<http://www.bacula.org>). 718These packages can be used as complete solutions, or also as back 719ends to BackupPC to backup the BackupPC server data to tape. 720 721Avery Pennarun's bup (L<https://github.com/bup/bup>) uses the git packfile format to 722do efficient incrementals and deduplication. 723Various programs and scripts use rsync to provide hardlinked backups. 724See, for example, Mike Rubel's site (L<http://www.mikerubel.org/computers/rsync_snapshots>), 725JW Schultz's dirvish (L<http://www.dirvish.org/>), 726Ben Escoto's rdiff-backup (L<http://www.nongnu.org/rdiff-backup>), 727and John Bowman's rlbackup (L<http://www.math.ualberta.ca/imaging/rlbackup>). 728 729BackupPC provides many additional features, such as compressed storage, 730deduplicating any matching files (rather than just files with the same name), 731and storing special files without root privileges. But these other programs 732provide simple, effective and fast solutions and are definitely worthy of 733consideration. 734 735=back 736 737=head2 Road map 738 739The new features planned for future releases of BackupPC 740are on the Wiki at L<https://github.com/backuppc/backuppc/wiki>. 741 742Comments and suggestions are welcome. 743 744=head2 You can help 745 746BackupPC is free. I work on BackupPC because I enjoy doing it and I like 747to contribute to the open source community. 748 749BackupPC already has more than enough features for my own needs. The 750main compensation for continuing to work on BackupPC is knowing that 751more and more people find it useful. So feedback is certainly 752appreciated, both positive and negative. 753 754Also, everyone is encouraged to contribute patches, bug reports, 755feature and design suggestions, new code, Wiki additions (you can 756do those directly) and documentation corrections or improvements. 757Answering questions on the mailing list is a big help too. 758 759=head1 Installing BackupPC 760 761=head2 Requirements 762 763BackupPC requires: 764 765=over 4 766 767=item * 768 769A linux, solaris, or unix based server with a substantial amount of free 770disk space (see the next section for what that means). The CPU and disk 771performance on this server will determine how many simultaneous backups 772you can run. You should be able to run 4-8 simultaneous backups on a 773moderately configured server. 774 775It is also recommended you consider either an LVM or RAID setup so that 776you can expand the file system as necessary. 777 778=item * 779 780Perl version 5.8.0 or later. If you don't have perl, please 781see L<http://www.cpan.org>. 782 783=item * 784 785The perl modules BackupPC::XS (version >= 0.50) is required, and 786several others, File::Listing, Archive::Zip, XML::RSS, Net::FTP, 787Net::FTP::RetrHandle, Net::FTP::AutoReconnect are recommended. 788 789Try "perldoc BackupPC::XS" and "perldoc Archive::Zip" to see if you have these 790modules. If not, fetch them from L<http://www.cpan.org> and see the 791instructions below for how to build and install them. 792 793The CGI Perl module is required for the http/cgi user interface. CGI was a core module, 794but from version 5.22 Perl no longer ships with it. 795 796=item * 797 798If you are using rsync to backup linux/unix machines you should have 799rsync on each client machine. Version 3+ is strongly recommended, but 800earlier versions will work too. See L<http://rsync.samba.org>. 801Use "rsync --version" to check your version. 802 803For BackupPC to use Rsync you will also need to install rsync-bpc on 804the server. 805 806=item * 807 808If you are using smb to backup WinXX machines you need smbclient and 809nmblookup from the samba package. You will also need nmblookup if 810you are backing up linux/unix DHCP machines. See L<http://www.samba.org>. 811 812See L<http://www.samba.org> for source and binaries. It's pretty easy to 813fetch and compile samba, and just grab smbclient and nmblookup, without 814doing the installation. Alternatively, L<http://www.samba.org> has binary 815distributions for most platforms. 816 817=item * 818 819If you are using tar to backup linux/unix machines, those machines should have 820version 1.13.20 or higher recommended. Use "tar --version" to check your version. 821Various GNU mirrors have the newest versions of tar; 822see L<http://www.gnu.org/software/tar/>. 823 824=item * 825 826The Apache web server, see L<http://www.apache.org>, preferably built 827with mod_perl support. 828 829=item * 830 831If rrdtool is installed on the BackupPC server, graphs of the pool usage 832will be maintained and displayed. To enable the graphs, point $Conf{RrdToolPath} 833to the rrdtool executable. 834 835=back 836 837=head2 What type of storage space do I need? 838 839Starting with 4.0.0, BackupPC no longer uses hardlinks for storage of 840deduplicated files. However, hardlinks are still used temporarily in 841a few places for doing atomic renames, with a fallback doing a file copy 842if the hardlink fails, and files are moved (renamed) across various paths 843that turn into expensive file copies if they span multiple file systems. 844 845So ideally BackupPC's data store (__TOPDIR__) is a single file system that 846supports hardlinks. It is ok to use a single symbolic link at the top-level 847directory (__TOPDIR__) to point the entire data store somewhere else). 848You can of course use any kind of RAID system or logical volume manager 849that combines the capacity of multiple disks into a single, larger, 850file system. Such approaches have the advantage that the file system can 851be expanded without having to copy it. 852 853Any standard linux or unix file system supports hardlinks. NFS mounted 854file systems work too (provided the underlying file system supports 855hardlinks). But windows based FAT and NTFS file systems will not work. 856 857In BackupPC 3.x, hardlinks are fundamental to deduplication, so a startup 858check is done ensure that the file system can support hardlinks, since 859this is a common area of configuration problems in v3. In 4.x, that check 860is only done if the pool still contains v3 backups and pool files. 861 862=head2 How much disk space do I need? 863 864Here's one real example (circa 2002) for an environment that is 865backing up 65 laptops with compression off. Each full backup averages 8663.2GB. Each incremental backup averages about 0.2GB. Storing one 867full backup and two incremental backups per laptop is around 240GB 868of raw data. But because of the pooling of identical files, only 86987GB is used. This is without compression. 870 871Another example, with compression on: backing up 95 laptops, where 872each backup averages 3.6GB and each incremental averages about 0.3GB. 873Keeping three weekly full backups, and six incrementals is around 8741200GB of raw data. Because of pooling and compression, only 150GB 875is needed. 876 877Here's a rule of thumb. Add up the disk usage of all the machines you 878want to backup (210GB in the first example above). This is a rough 879minimum space estimate that should allow a couple of full backups and at 880least half a dozen incremental backups per machine. If compression is on 881you can reduce the storage requirements by maybe 30-40%. Add some margin 882in case you add more machines or decide to keep more old backups. 883 884Your actual mileage will depend upon the types of clients, operating 885systems and applications you have. The more uniform the clients and 886applications the bigger the benefit from pooling common files. 887 888In addition to total disk space, you should make sure you have 889plenty of inodes on your BackupPC data partition. Some users have 890reported running out of inodes on their BackupPC data partition. 891So even if you have plenty of disk space, BackupPC will report 892failures when the inodes are exhausted. This is a particular 893problem with ext2/ext3 file systems that have a fixed number of 894inodes when the file system is built. Use "df -i" to see your 895inode usage. 896 897=head2 Step 1: Getting BackupPC 898 899Many linux distributions now include BackupPC, so installing 900BackupPC via your package manager is the best approach. 901 902For example, for Debian, supported by Ludovic Drolez, can be found at 903L<http://packages.debian.org/backuppc> and is included in the current 904stable Debian release. On Debian, BackupPC can be installed with 905the command: 906 907 apt-get install backuppc 908 909You should also install rsync-bpc; the BackupPC package might include 910it already, but if not: 911 912 apt-get install rsync-bpc 913 914If those commands work, you can skip to Step 3. 915 916Alternatively, manually fetching and installing BackupPC is easy. 917Start by downloading the latest version from 918 919 https://github.com/backuppc/backuppc/releases 920 921=head2 Step 2: Installing the distribution 922 923Note: most information in this step is only relevant if you build 924and install BackupPC yourself. If you use a package provided by a 925distribution, the package management system should take of installing 926any needed dependencies. 927 928First off, there are several perl modules you should install. The 929first one, BackupPC::XS, is required. The others are optional 930but highly recommended. Use either your linux package manager, 931or the cpan command, or follow the instructions in the README files 932to install these packages: 933 934=over 4 935 936=item BackupPC::XS 937 938Significant portions of BackupPC are implemented in C code contained in 939this module. You can run "perldoc BackupPC::XS" to see if this module 940is installed. You need to have version >= 0.50. BackupPC::XS is 941available from: 942 943 https://github.com/backuppc/backuppc-xs/releases 944 945and also CPAN. 946 947=item Archive::Zip 948 949To support restore via Zip archives you will need to install 950Archive::Zip, also from L<http://www.cpan.org>. 951You can run "perldoc Archive::Zip" to see if this module is installed. 952 953=item XML::RSS 954 955To support the RSS feature you will need to install XML::RSS, also from 956L<http://www.cpan.org>. There is not need to install this module if you 957don't plan on using RSS. You can run "perldoc XML::RSS" to see if this 958module is installed. 959 960=item CGI 961 962The CGI Perl module is required for the http/cgi user interface. CGI was a core module, 963but from version 5.22 Perl no longer ships with it so you'll need to install it if you 964are using a recent version of perl. 965 966=item SCGI 967 968The SCGI Perl module is required to use the S/CGI protocol for the http/cgi user interface. 969 970=item File::Listing, Net::FTP, Net::FTP::RetrHandle, Net::FTP::AutoReconnect 971 972To use ftp with BackupPC you will need four libraries, but actually 973need to install only File::Listing from L<http://www.cpan.org>. 974You can run "perldoc File::Listing" to see if this module is installed. 975Net::FTP is a standard module. Net::FTP::RetrHandle and 976Net::FTP::AutoReconnect included in BackupPC distribution. 977 978=back 979 980To build and install these packages you should use the cpan command. At 981the prompt, type 982 983 install BackupPC::XS 984 985Alternatively, if you want to install these manually, you can fetch the tarball 986from L<http://www.cpan.org> and then run these commands: 987 988 tar zxvf BackupPC-XS-0.50.tar.gz 989 cd BackupPC-XS-0.50 990 perl Makefile.PL 991 make 992 make test 993 make install 994 995The same sequence of commands can be used for each module. 996 997Next, you should install rsync_bpc if you want to use rsync to backup clients 998(which is the recommended approach for all client types). If you don't use 999your package manager, fetch the release from: 1000 1001 https://github.com/backuppc/rsync-bpc/releases 1002 1003Then run these commands (updating the version number as appropriate): 1004 1005 tar zxf rsync-bpc-3.0.9.5.tar.gz 1006 cd rsync-bpc-3.0.9.5 1007 ./configure 1008 make 1009 make install 1010 1011Now let's move onto BackupPC itself. After fetching BackupPC-__VERSION__.tar.gz, 1012run these commands as root: 1013 1014 tar zxf BackupPC-__VERSION__.tar.gz 1015 cd BackupPC-__VERSION__ 1016 perl configure.pl 1017 1018The configure.pl script also accepts command-line options if you 1019wish to run it in a non-interactive manner. It has self-contained 1020documentation for all the command-line options, which you can 1021read with perldoc: 1022 1023 perldoc configure.pl 1024 1025Starting with BackupPC 3.0.0, the configure.pl script by default 1026complies with the file system hierarchy (FHS) conventions. The 1027major difference compared to earlier versions is that by default 1028configuration files will be stored in /etc/BackupPC 1029rather than below the data directory, __TOPDIR__/conf, 1030and the log files will be stored in /var/log/BackupPC 1031rather than below the data directory, __TOPDIR__/log. 1032 1033Note that distributions may choose to use different locations for 1034BackupPC files than these defaults. 1035 1036If you are upgrading from an earlier version the configure.pl script 1037will keep the configuration files and log files in their original 1038location. 1039 1040When you run configure.pl you will be prompted for the full paths 1041of various executables, and you will be prompted for the following 1042information. 1043 1044=over 4 1045 1046=item BackupPC User 1047 1048It is best if BackupPC runs as a special user, eg backuppc, that has 1049limited privileges. It is preferred that backuppc belongs to a system 1050administrator group so that sysadmin members can browse BackupPC files, 1051edit the configuration files and so on. Although configurable, the 1052default settings leave group read permission on pool files, so make 1053sure the BackupPC user's group is chosen restrictively. 1054 1055On this installation, this is __BACKUPPCUSER__. 1056 1057For security purposes you might choose to configure the BackupPC 1058user with the shell set to /bin/false. Since you might need to 1059run some BackupPC programs as the BackupPC user for testing 1060purposes, you can use the -s option to su to explicitly run 1061a shell, eg: 1062 1063 su -s /bin/bash __BACKUPPCUSER__ 1064 1065Depending upon your configuration you might also need the -l option. 1066 1067If the -s option is not available on your operating system, you can 1068specify the -m option to use your login shell as invoked shell: 1069 1070 su -m __BACKUPPCUSER__ 1071 1072=item Data Directory 1073 1074You need to decide where to put the data directory, below which 1075all the BackupPC data is stored. This needs to be a big file system. 1076 1077On this installation, this is __TOPDIR__. 1078 1079=item Install Directory 1080 1081You should decide where the BackupPC scripts, libraries and documentation 1082should be installed, eg: /usr/local/BackupPC. 1083 1084On this installation, this is __INSTALLDIR__. 1085 1086=item CGI bin Directory 1087 1088You should decide where the BackupPC CGI script resides. This will 1089usually be below Apache's cgi-bin directory. 1090 1091It is also possible to use a different directory and use Apache's 1092``<Directory>'' directive to specify that location. See the Apache 1093HTTP Server documentation for additional information. 1094 1095On this installation, this is __CGIDIR__. 1096 1097=item Apache image Directory 1098 1099A directory where BackupPC's images are stored so that Apache can 1100serve them. You should ensure this directory is readable by Apache and 1101create a symlink to this directory from the BackupPC CGI bin Directory. 1102 1103=item Config and Log Directories 1104 1105In this installation the configuration and log directories are 1106located in the following locations: 1107 1108 __CONFDIR__/config.pl main config file 1109 __CONFDIR__/hosts hosts file 1110 __CONFDIR__/pc/HOST.pl per-pc config file 1111 __LOGDIR__/BackupPC log files, pid, status 1112 1113The configure.pl script doesn't prompt for these locations but 1114they can be set for new installations using command-line options. 1115 1116=back 1117 1118=head2 Step 3: Setting up config.pl 1119 1120After running configure.pl, browse through the config file, 1121__CONFDIR__/config.pl, and make sure all the default settings are 1122correct. In particular, you will need to decide whether to use 1123smb, tar,or rsync or ftp transport (or whether to set it on a 1124per-PC basis) and set the relevant parameters for that transport 1125method. See the section L<Step 5: Client Setup> for 1126more details. 1127 1128=head2 Step 4: Setting up the hosts file 1129 1130The file __CONFDIR__/hosts contains the list of clients to backup. 1131BackupPC reads this file in three cases: 1132 1133=over 4 1134 1135=item * 1136 1137Upon startup. 1138 1139=item * 1140 1141When BackupPC is sent a HUP (-1) signal. Assuming you installed the 1142init.d script, you can also do this with "/etc/init.d/backuppc reload". 1143 1144=item * 1145 1146When the modification time of the hosts file changes. BackupPC 1147checks the modification time once during each regular wakeup. 1148 1149=back 1150 1151Whenever you change the hosts file (to add or remove a host) you can 1152either do a kill -HUP BackupPC_pid or simply wait until the next regular 1153wakeup period. 1154 1155Each line in the hosts file contains three fields, separated 1156by whitespace: 1157 1158=over 4 1159 1160=item Host name 1161 1162This is typically the hostname or NetBios name of the client machine 1163and should be in lowercase. The hostname can contain spaces (escape 1164with a backslash), but it is not recommended. 1165 1166Please read the section L<How BackupPC Finds Hosts>. 1167 1168In certain cases you might want several distinct clients to refer 1169to the same physical machine. For example, you might have a database 1170you want to backup, and you want to bracket the backup of the database 1171with shutdown/restart using $Conf{DumpPreUserCmd} and $Conf{DumpPostUserCmd}. 1172But you also want to backup the rest of the machine while the database 1173is still running. In the case you can specify two different clients in 1174the host file, using any mnemonic name (eg: myhost_mysql and myhost), and 1175use $Conf{ClientNameAlias} in myhost_mysql's config.pl to specify the 1176real hostname of the machine. 1177 1178=item DHCP flag 1179 1180Starting with v2.0.0 the way hosts are discovered has changed and now 1181in most cases you should specify 0 for the DHCP flag, even if the host 1182has a dynamically assigned IP address. 1183Please read the section L<How BackupPC Finds Hosts> 1184to understand whether you need to set the DHCP flag. 1185 1186You only need to set DHCP to 1 if your client machine doesn't 1187respond to the NetBios multicast request: 1188 1189 nmblookup myHost 1190 1191but does respond to a request directed to its IP address: 1192 1193 nmblookup -A W.X.Y.Z 1194 1195If you do set DHCP to 1 on any client you will need to specify the range of 1196DHCP addresses to search is specified in $Conf{DHCPAddressRanges}. 1197 1198Note also that the $Conf{ClientNameAlias} feature does not work for 1199clients with DHCP set to 1. 1200 1201=item User name 1202 1203This should be the unix login/email name of the user who "owns" or uses 1204this machine. This is the user who will be sent email about this 1205machine, and this user will have permission to stop/start/browse/restore 1206backups for this host. Leave this blank if no specific person should 1207receive email or be allowed to stop/start/browse/restore backups 1208for this host. Administrators will still have full permissions. 1209 1210=item More users 1211 1212Additional usernames, separated by commas and with no whitespace, 1213can be specified. These users will also have full permission in 1214the CGI interface to stop/start/browse/restore backups for this host. 1215These users will not be sent email about this host. 1216 1217=back 1218 1219The first non-comment line of the hosts file is special: it contains 1220the names of the columns and should not be edited. 1221 1222Here's a simple example of a hosts file: 1223 1224 host dhcp user moreUsers 1225 farside 0 craig jim,dave 1226 larson 1 gary andy 1227 1228=head2 Step 5: Client Setup 1229 1230Four methods for getting backup data from a client are supported: 1231smb, tar, rsync and ftp. Smb or rsync are the preferred methods 1232for WinXX clients and rsync or tar are the preferred methods for 1233linux/unix/MacOSX clients. 1234 1235The transfer method is set using the $Conf{XferMethod} configuration 1236setting. If you have a mixed environment (ie: you will use smb for some 1237clients and tar for others), you will need to pick the most common 1238choice for $Conf{XferMethod} for the main config.pl file, and then 1239override it in the per-PC config file for those hosts that will use 1240the other method. (Or you could run two completely separate instances 1241of BackupPC, with different data directories, one for WinXX and the 1242other for linux/unix, but then common files between the different 1243machine types will duplicated.) 1244 1245Here are some brief client setup notes: 1246 1247=over 4 1248 1249=item WinXX 1250 1251One setup for WinXX clients is to set $Conf{XferMethod} to "smb". 1252Actually, rsyncd is the better method for WinXX if you are prepared to 1253run rsync/cygwin on your WinXX client. 1254 1255If you want to use rsyncd for WinXX clients you can find a pre-packaged 1256exe installer on L<https://github.com/backuppc/cygwin-rsyncd/releases>. 1257The package is called cygwin-rsync. It contains rsync.exe, template setup files 1258and the minimal set of cygwin libraries for everything to run. The README file 1259contains instructions for running rsync as a service, so it starts 1260automatically everytime you boot your machine. If you use rsync 1261to backup WinXX machines, be sure to set $Conf{ClientCharset} 1262correctly (eg: 'cp1252') so that the WinXX filename encoding is 1263correctly converted to utf8. 1264 1265Otherwise, to use SMB, you can either create shares for the data you want 1266to backup or your can use the existing C$ share. To create a new 1267share, open "My Computer", right click on the drive (eg: C), and 1268select "Sharing..." (or select "Properties" and select the "Sharing" 1269tab). In this dialog box you can enable sharing, select the share name 1270and permissions. 1271 1272All Windows NT based OS (NT, 2000, XP Pro), are configured by default 1273to share the entire C drive as C$. This is a special share used for 1274various administration functions, one of which is to grant access to backup 1275operators. All you need to do is create a new domain user, specifically 1276for backup. Then add the new backup user to the built in "Backup 1277Operators" group. You now have backup capability for any directory on 1278any computer in the domain in one easy step. This avoids using 1279administrator accounts and only grants permission to do exactly what you 1280want for the given user, i.e.: backup. 1281Also, for additional security, you may wish to deny the ability for this 1282user to logon to computers in the default domain policy. 1283 1284If this machine uses DHCP you will also need to make sure the 1285NetBios name is set. Go to Control Panel|System|Network Identification 1286(on Win2K) or Control Panel|System|Computer Name (on WinXP). 1287Also, you should go to Control Panel|Network Connections|Local Area 1288Connection|Properties|Internet Protocol (TCP/IP)|Properties|Advanced|WINS 1289and verify that NetBios is not disabled. 1290 1291The relevant configuration settings are $Conf{SmbShareName}, 1292$Conf{SmbShareUserName}, $Conf{SmbSharePasswd}, $Conf{SmbClientPath}, 1293$Conf{SmbClientFullCmd}, $Conf{SmbClientIncrCmd} and 1294$Conf{SmbClientRestoreCmd}. 1295 1296BackupPC needs to know the smb share username and password for a 1297client machine that uses smb. The username is specified in 1298$Conf{SmbShareUserName}. There are four ways to tell BackupPC the 1299smb share password: 1300 1301=over 4 1302 1303=item * 1304 1305As an environment variable BPC_SMB_PASSWD set before BackupPC starts. 1306If you start BackupPC manually the BPC_SMB_PASSWD variable must be set 1307manually first. For backward compatibility for v1.5.0 and prior, the 1308environment variable PASSWD can be used if BPC_SMB_PASSWD is not set. 1309Warning: on some systems it is possible to see environment variables of 1310running processes. 1311 1312=item * 1313 1314Alternatively the BPC_SMB_PASSWD setting can be included in 1315/etc/init.d/backuppc, in which case you must make sure this file 1316is not world (other) readable. 1317 1318=item * 1319 1320As a configuration variable $Conf{SmbSharePasswd} in 1321__CONFDIR__/config.pl. If you put the password 1322here you must make sure this file is not world (other) readable. 1323 1324=item * 1325 1326As a configuration variable $Conf{SmbSharePasswd} in the per-PC 1327configuration file (__CONFDIR__/pc/$host.pl or 1328__TOPDIR__/pc/$host/config.pl in non-FHS versions of BackupPC). 1329You will have to use this option if the smb share password is different 1330for each host. If you put the password here you must make sure this file 1331is not world (other) readable. 1332 1333=back 1334 1335Placement and protection of the smb share password is a significant 1336security issue, so please double-check the file and directory 1337permissions. In a future version there might be support for 1338encryption of this password, but a private key will still have to 1339be stored in a protected place. Suggestions are welcome. 1340 1341As an alternative to setting $Conf{XferMethod} to "smb" (using 1342smbclient) for WinXX clients, you can use an smb network filesystem (eg: 1343ksmbfs or similar) on your linux/unix server to mount the share, 1344and then set $Conf{XferMethod} to "tar" (use tar on the network 1345mounted file system). 1346 1347Also, to make sure that filenames with special characters are correctly 1348transferred by smbclient you should make sure that the smb.conf file 1349has (for samba 3.x): 1350 1351 [global] 1352 unix charset = UTF8 1353 1354UTF8 is the default setting, so if the parameter is missing then it 1355is ok. With this setting $Conf{ClientCharset} should be empty, 1356since smbclient has already converted the filenames to utf8. 1357 1358=item Linux/Unix 1359 1360The preferred setup for linux/unix clients is to set $Conf{XferMethod} 1361to "rsync", "rsyncd" or "tar". 1362 1363You can use either rsync, smb, or tar for linux/unix machines. Smb requires 1364that the Samba server (smbd) be run to provide the shares. Since the smb 1365protocol can't represent special files like symbolic links and fifos, 1366tar and rsync are the better transport methods for linux/unix machines. 1367(In fact, by default samba makes symbolic links look like the file or 1368directory that they point to, so you could get an infinite loop if a 1369symbolic link points to the current or parent directory. If you really 1370need to use Samba shares for linux/unix backups you should turn off the 1371"follow symlinks" samba config setting. See the smb.conf manual page.) 1372 1373Important note: many linux systems use sparse files for /var/log/lastlog, 1374and have large special files below /proc and /run. Make sure you 1375exclude those directories and files when you configure your client. 1376 1377The requirements for each Xfer Method are: 1378 1379=over 4 1380 1381=item rsync 1382 1383To use rsync, you need rsync-bpc installed on the BackupPC server. 1384 1385On the client, you should have at least rsync 3.x. Rsync is run on 1386the remote client via ssh. 1387 1388The relevant configuration settings are $Conf{RsyncClientPath}, 1389$Conf{RsyncSshArgs}, $Conf{RsyncShareName}, $Conf{RsyncArgs}, 1390$Conf{RsyncArgsExtra}, $Conf{RsyncFullArgsExtra}, and $Conf{RsyncRestoreArgs}. 1391 1392=item rsyncd 1393 1394To use rsync, you need rsync-bpc installed on the BackupPC server. 1395 1396On the client, you should have at least rsync 3.x. In this case the 1397rsync daemon should be running on the client machine and BackupPC 1398connects directly to it. 1399 1400The relevant configuration settings are $Conf{RsyncBackupPCPath}, 1401$Conf{RsyncdClientPort}, $Conf{RsyncdUserName}, $Conf{RsyncdPasswd}, 1402$Conf{RsyncShareName}, $Conf{RsyncArgs}, $Conf{RsyncArgsExtra}, and 1403$Conf{RsyncRestoreArgs}. $Conf{RsyncShareName} is the name of an rsync 1404module (ie: the thing in square brackets in rsyncd's conf file -- see 1405rsyncd.conf), not a file system path. 1406 1407Be aware that rsyncd will remove the leading '/' from path names in 1408symbolic links if you specify "use chroot = no" in the rsynd.conf file. 1409See the rsyncd.conf manual page for more information. 1410 1411=item tar 1412 1413You must have GNU tar on the client machine. Use "tar --version" 1414or "gtar --version" to verify. The version should be at least 14151.13.20. Tar is run on the client machine via rsh or ssh. 1416 1417The relevant configuration settings are $Conf{TarClientPath}, 1418$Conf{TarShareName}, $Conf{TarClientCmd}, $Conf{TarFullArgs}, 1419$Conf{TarIncrArgs}, and $Conf{TarClientRestoreCmd}. 1420 1421=item ftp 1422 1423FTP Xfer Method is supported in V4 but not recommended since it only 1424handles minimal metadata, it doesn't support hardlinks or special 1425files, and can only restore regular files (not symbolic links etc). 1426 1427You need to be running an ftp server on the client machine. 1428The relevant configuration settings are $Conf{FtpShareName}, 1429$Conf{FtpUserName}, $Conf{FtpPasswd}, $Conf{FtpBlockSize}, 1430$Conf{FtpPort}, $Conf{FtpTimeout}, and $Conf{FtpFollowSymlinks}. 1431 1432=back 1433 1434You need to set $Conf{ClientCharset} to the client's charset so that 1435filenames are correctly converted to utf8. Use "locale charmap" 1436on the client to see its charset. Note, however, that modern versions 1437of smbclient and rsync handle this conversion automatically, so in 1438most cases you won't need to set $Conf{ClientCharset}. 1439 1440For linux/unix machines you should not backup "/proc". This directory 1441contains a variety of files that look like regular files but they are 1442special files that don't need to be backed up (eg: /proc/kcore is a 1443regular file that contains physical memory). See $Conf{BackupFilesExclude}. 1444It is safe to backup /dev since it contains mostly character-special 1445and block-special files, which are correctly handed by BackupPC 1446(eg: backing up /dev/hda5 just saves the block-special file information, 1447not the contents of the disk). Similarly, on many linux systems, 1448/var/log/lastlog is a sparse file, with a very large apparent size, 1449so you should exclude that too. 1450 1451Alternatively, rather than backup all the file systems as a single 1452share ("/"), it is easier to restore a single file system if you backup 1453each file system separately. To do this you should list each file system 1454mount point in $Conf{TarShareName} or $Conf{RsyncShareName}, and add the 1455--one-file-system option to $Conf{TarClientCmd} or $Conf{RsyncArgs}. 1456In this case there is no need to exclude /proc explicitly since it looks 1457like a different file system. 1458 1459Ssh allows BackupPC to run as a privileged user on the client (eg: 1460root), since it needs sufficient permissions to read all the backup 1461files. Ssh is setup so that BackupPC on the server (an otherwise low 1462privileged user) can ssh as root on the client, without being prompted 1463for a password. However, directly enabled ssh root logins is not 1464good practice. A better approach is the ssh as a regular user, and 1465then configure sudo to allow just rsync to be executed. 1466 1467There are two common versions of ssh: v1 and v2. Here are some 1468instructions for one way to setup ssh. (Check which version of SSH 1469you have by typing "ssh" or "man ssh".) 1470 1471=item MacOSX 1472 1473In general this should be similar to Linux/Unix machines. 1474In versions 10.4 and later, the native MacOSX tar works, 1475and also supports resource forks. xtar is another option, 1476and rsync works too (although the MacOSX-supplied rsync 1477has an extension for extended attributes that is not 1478compatible with standard rsync). 1479 1480=item SSH Setup 1481 1482SSH is a secure way to run tar or rsync on a backup client to extract 1483the data. SSH provides strong authentication and encryption of 1484the network data. 1485 1486Note that if you run rsyncd (rsync daemon), ssh is not used. 1487In this case, rsyncd provides its own authentication, but there 1488is no encryption of network data. If you want encryption of 1489network data you can use ssh to create a tunnel, or use a 1490program like stunnel. 1491 1492Setup instructions for ssh can be found on the 1493Wiki at L<https://github.com/backuppc/backuppc/wiki>. 1494 1495=item Clients that use DHCP 1496 1497If a client machine uses DHCP BackupPC needs some way to find the 1498IP address given the hostname. One alternative is to set dhcp 1499to 1 in the hosts file, and BackupPC will search a pool of IP 1500addresses looking for hosts. More efficiently, it is better to 1501set dhcp = 0 and provide a mechanism for BackupPC to find the 1502IP address given the hostname. 1503 1504For WinXX machines BackupPC uses the NetBios name server to determine 1505the IP address given the hostname. 1506For unix machines you can run nmbd (the NetBios name server) from 1507the Samba distribution so that the machine responds to a NetBios 1508name request. See the manual page and Samba documentation for more 1509information. 1510 1511Alternatively, you can set $Conf{NmbLookupFindHostCmd} to any command 1512that returns the IP address given the hostname. 1513 1514Please read the section L<How BackupPC Finds Hosts> 1515for more details. 1516 1517=back 1518 1519=head2 Step 6: Running BackupPC 1520 1521The installation contains an init.d backuppc script that can be copied 1522to /etc/init.d so that BackupPC can auto-start on boot. 1523See init.d/README for further instructions. 1524 1525BackupPC should be ready to start. If you installed the init.d script, 1526then you should be able to run BackupPC with: 1527 1528 /etc/init.d/backuppc start 1529 1530(This script can also be invoked with "stop" to stop BackupPC and "reload" 1531to tell BackupPC to reload config.pl and the hosts file.) 1532 1533Otherwise, just run 1534 1535 __INSTALLDIR__/bin/BackupPC -d 1536 1537as user __BACKUPPCUSER__. The -d option tells BackupPC to run as a daemon 1538(ie: it does an additional fork). 1539 1540Any immediate errors will be printed to stderr and BackupPC will quit. 1541Otherwise, look in __LOGDIR__/LOG and verify that BackupPC reports 1542it has started and all is ok. 1543 1544=head2 Step 7: Talking to BackupPC 1545 1546You should verify that BackupPC is running by using BackupPC_serverMesg. 1547This sends a message to BackupPC via the unix (or TCP) socket and prints 1548the response. Like all BackupPC programs, BackupPC_serverMesg 1549should be run as the BackupPC user (__BACKUPPCUSER__), so you 1550should 1551 1552 su __BACKUPPCUSER__ 1553 1554before running BackupPC_serverMesg. If the BackupPC user is 1555configured with /bin/false as the shell, you can use the -s 1556option to su to explicitly run a shell, eg: 1557 1558 su -s /bin/bash __BACKUPPCUSER__ 1559 1560Depending upon your configuration you might also need 1561the -l option. 1562 1563If the -s option is not available on your operating system, you can 1564specify the -m option to use your login shell as invoked shell: 1565 1566 su -m __BACKUPPCUSER__ 1567 1568You can request status information and start and stop backups using this 1569interface. This socket interface is mainly provided for the CGI interface 1570(and some of the BackupPC subprograms use it too). But right now we just 1571want to make sure BackupPC is happy. Each of these commands should 1572produce some status output: 1573 1574 __INSTALLDIR__/bin/BackupPC_serverMesg status info 1575 __INSTALLDIR__/bin/BackupPC_serverMesg status jobs 1576 __INSTALLDIR__/bin/BackupPC_serverMesg status hosts 1577 1578The output should be some hashes printed with Data::Dumper. If it 1579looks cryptic and confusing, and doesn't look like an error message, 1580then all is ok. 1581 1582The hosts status should produce a list of every host you have listed 1583in __CONFDIR__/hosts as part of a big cryptic output line. 1584 1585You can also request that all hosts be queued: 1586 1587 __INSTALLDIR__/bin/BackupPC_serverMesg backup all 1588 1589At this point you should make sure the CGI interface works since 1590it will be much easier to see what is going on. We'll get to that 1591shortly. 1592 1593=head2 Step 8: Checking email delivery 1594 1595The script BackupPC_sendEmail sends status and error emails to 1596the administrator and users. It is usually run each night 1597by BackupPC_nightly. 1598 1599To verify that it can run sendmail and deliver email correctly 1600you should ask it to send a test email to you: 1601 1602 su __BACKUPPCUSER__ 1603 __INSTALLDIR__/bin/BackupPC_sendEmail -u MYNAME@MYDOMAIN.COM 1604 1605BackupPC_sendEmail also takes a -c option that checks if BackupPC 1606is running, and it sends an email to $Conf{EMailAdminUserName} 1607if it is not. That can be used as a keep-alive check by adding 1608 1609 __INSTALLDIR__/bin/BackupPC_sendEmail -c 1610 1611to __BACKUPPCUSER__'s cron. 1612 1613The -t option to BackupPC_sendEmail causes it to print the email 1614message instead of invoking sendmail to deliver the message. 1615 1616=head2 Step 9: CGI interface 1617 1618The CGI interface script, BackupPC_Admin, is a powerful and flexible 1619way to see and control what BackupPC is doing. It is written for an 1620Apache server. If you don't have Apache, see L<http://www.apache.org>. 1621 1622There are three options for setting up the CGI interface: 1623 1624=over 4 1625 1626=item SCGI 1627 1628New to 4.x, SCGI uses the SCGI interface to Apache, which requires 1629the mod_scgi.so module to be installed and loaded by Apache. This 1630allows Apache to run as any unprivileged user. The actual SCGI 1631server runs as the as the BackupPC user (__BACKUPPCUSER__), and 1632handles the requests from Apache via a TCP socket. 1633 1634=item mod_perl 1635 1636Mod_perl required the mod_perl module to be loaded by Apache. This 1637allows BackupPC_Admin to be run from inside Apache. Unlike SCGI, 1638using mod_perl with BackupPC_Admin requires a dedicated Apache to 1639be run as the BackupPC user (__BACKUPPCUSER__). This is because 1640BackupPC_Admin needs permission to access various files in BackupPC's 1641data directories. 1642 1643=item standard 1644 1645The standard mode, which is significantly slower than SCGI or 1646mod_perl, is where Apache runs BackupPC_Admin as a separate process 1647for every request. This adds significant startup overhead for every 1648request, and also requires that BackupPC_Admin be run as setuid to 1649the BackupPC user (__BACKUPPCUSER__), if Apache isn't being run as 1650that user. Setuid scripts are discouraged, so the preference is to 1651use SCGI or mod_perl. 1652 1653=back 1654 1655Here are some specifics for each setup: 1656 1657=over 4 1658 1659=item SCGI Setup 1660 1661First you need to install mod_scgi. If you can't find a pre-built 1662package, the source is available at L<http://python.ca/scgi>. The 1663release has subdirectories for apache1 and apache2. Pick your 1664matching version (nowadays most likely apache2). You'll need apxs, 1665the Apache Extension Tool, installed to build from source. Once 1666compiled, the module mod_scgi.so should be installed via the Makefile. 1667 1668To enable the SCGI server, set $Conf{SCGIServerPort} to an available 1669non-privileged TCP port number, eg: 10268. The matching port number 1670has to appear in the Apache configuration file. Typical Apache 1671configuration entries will look like this: 1672 1673 LoadModule scgi_module modules/mod_scgi.so 1674 SCGIMount /BackupPC_Admin 127.0.0.1:10268 1675 <Location /BackupPC_Admin> 1676 AuthUserFile /etc/httpd/conf/passwd 1677 AuthType basic 1678 AuthName "access" 1679 require valid-user 1680 </Location> 1681 1682Or a typical Nginx configuration file: 1683 1684 server { 1685 listen 80; 1686 server_name yourBackupPCServerHost; 1687 1688 root /var/www/backuppc; 1689 1690 access_log /var/log/nginx/backuppc.access.log; 1691 error_log /var/log/nginx/backuppc.error.log; 1692 1693 location /BackupPC_Admin { 1694 auth_basic "BackupPC"; 1695 auth_basic_user_file conf.d/backuppc.users; 1696 1697 include scgi_params; 1698 scgi_pass 127.0.0.1:10268; 1699 scgi_param REMOTE_USER $remote_user; 1700 scgi_param SCRIPT_NAME $document_uri; 1701 } 1702 } 1703 1704This allows the SCGI interface to be accessed with a URL: 1705 1706 http://yourBackupPCServerHost/BackupPC_Admin 1707 1708You can use a different path or name if you prefer a different URL. 1709Unlike traditional CGI, there is no need to specify a valid path to 1710a CGI script. 1711 1712Important security warning!! The SCGIServerPort must not be 1713accessible by anyone untrusted. That means you can't allow 1714untrusted users access to the BackupPC server, and you should 1715block the SCGIServerPort TCP port on the BackupPC server. If you 1716don't understand what that means, or can't confirm you have 1717configured SCGI securely, then don't enable SCGI - use one of 1718the following two methods!! 1719 1720=item Mod_perl Setup 1721 1722The advantage of the mod_perl setup is that no setuid script is 1723needed (like in the standard method below), and there is a significant 1724performance advantage. Not only does all the perl code need to be 1725parsed just once, the config.pl and hosts files, plus the connection 1726to the BackupPC server are cached between requests. The typical 1727speedup is around 10-15x. 1728 1729To use mod_perl you need to run Apache as user __BACKUPPCUSER__. 1730If you need to run multiple Apaches for different services then 1731you need to create multiple top-level Apache directories, each 1732with their own config file. You can make copies of /etc/init.d/httpd 1733and use the -d option to httpd to point each http to a different 1734top-level directory. Or you can use the -f option to explicitly 1735point to the config file. Multiple Apache's will run on different 1736Ports (eg: 80 is standard, 8080 is a typical alternative port accessed 1737via http://yourhost.com:8080). 1738 1739Inside BackupPC's Apache http.conf file you should check the 1740settings for ServerRoot, DocumentRoot, User, Group, and Port. See 1741L<http://httpd.apache.org/docs/server-wide.html> for more details. 1742 1743For mod_perl, BackupPC_Admin should not have setuid permission, so 1744you should turn it off: 1745 1746 chmod u-s __CGIDIR__/BackupPC_Admin 1747 1748To tell Apache to use mod_perl to execute BackupPC_Admin, add this 1749to Apache's 1.x httpd.conf file: 1750 1751 <IfModule mod_perl.c> 1752 PerlModule Apache::Registry 1753 PerlTaintCheck On 1754 <Location /cgi-bin/BackupPC/BackupPC_Admin> # <--- change path as needed 1755 SetHandler perl-script 1756 PerlHandler Apache::Registry 1757 Options ExecCGI 1758 PerlSendHeader On 1759 </Location> 1760 </IfModule> 1761 1762Apache 2.0.44 with Perl 5.8.0 on RedHat 7.1, Don Silvia reports that 1763this works (with tweaks from Michael Tuzi): 1764 1765 LoadModule perl_module modules/mod_perl.so 1766 PerlModule Apache2 1767 1768 <Directory /path/to/cgi/> 1769 SetHandler perl-script 1770 PerlResponseHandler ModPerl::Registry 1771 PerlOptions +ParseHeaders 1772 Options +ExecCGI 1773 Order deny,allow 1774 Deny from all 1775 Allow from 192.168.0 1776 AuthName "Backup Admin" 1777 AuthType Basic 1778 AuthUserFile /path/to/user_file 1779 Require valid-user 1780 </Directory> 1781 1782There are other optimizations and options with mod_perl. For 1783example, you can tell mod_perl to preload various perl modules, 1784which saves memory compared to loading separate copies in every 1785Apache process after they are forked. See Stas's definitive 1786mod_perl guide at L<http://perl.apache.org/guide>. 1787 1788=item Standard Setup 1789 1790The CGI interface should have been installed by the configure.pl script 1791in __CGIDIR__/BackupPC_Admin. BackupPC_Admin should have been installed 1792as setuid to the BackupPC user (__BACKUPPCUSER__), in addition to user 1793and group execute permission. 1794 1795You should be very careful about permissions on BackupPC_Admin and 1796the directory __CGIDIR__: it is important that normal users cannot 1797directly execute or change BackupPC_Admin, otherwise they can access 1798backup files for any PC. You might need to change the group ownership 1799of BackupPC_Admin to a group that Apache belongs to so that Apache 1800can execute it (don't add "other" execute permission!). 1801The permissions should look like this: 1802 1803 ls -l __CGIDIR__/BackupPC_Admin 1804 -swxr-x--- 1 __BACKUPPCUSER__ web 82406 Jun 17 22:58 __CGIDIR__/BackupPC_Admin 1805 1806The setuid script won't work unless perl on your machine was installed 1807with setuid emulation. This is likely the problem if you get an error 1808saying such as "Wrong user: my userid is 25, instead of 150", meaning 1809the script is running as the httpd user, not the BackupPC user. 1810This is because setuid scripts are disabled by the kernel in most 1811flavors of unix and linux. 1812 1813To see if your perl has setuid emulation, see if there is a program 1814called sperl5.8.0 (or sperl5.8.2 etc, based on your perl version) 1815in the place where perl is installed. If you can't find this program, 1816then you have two options: rebuild and reinstall perl with the setuid 1817emulation turned on (answer "y" to the question "Do you want to do 1818setuid/setgid emulation?" when you run perl's configure script), or 1819switch to the mod_perl alternative for the CGI script (which doesn't 1820need setuid to work). 1821 1822=back 1823 1824BackupPC_Admin requires that users are authenticated by Apache. 1825Specifically, it expects that Apache sets the REMOTE_USER environment 1826variable when it runs. There are several ways to do this. One way 1827is to create a .htaccess file in the cgi-bin directory that looks like: 1828 1829 AuthGroupFile /etc/httpd/conf/group # <--- change path as needed 1830 AuthUserFile /etc/http/conf/passwd # <--- change path as needed 1831 AuthType basic 1832 AuthName "access" 1833 require valid-user 1834 1835You will also need "AllowOverride Indexes AuthConfig" in the Apache 1836httpd.conf file to enable the .htaccess file. Alternatively, everything 1837can go in the Apache httpd.conf file inside a Location directive. The 1838list of users and password file above can be extracted from the NIS 1839passwd file. 1840 1841One alternative is to use LDAP. In Apache's http.conf add these lines: 1842 1843 LoadModule auth_ldap_module modules/auth_ldap.so 1844 AddModule auth_ldap.c 1845 1846 # cgi-bin - auth via LDAP (for BackupPC) 1847 <Location /cgi-bin/BackupPC/BackupPC_Admin> # <--- change path as needed 1848 AuthType Basic 1849 AuthName "BackupPC login" 1850 # replace MYDOMAIN, PORT, ORG and CO as needed 1851 AuthLDAPURL ldap://ldap.MYDOMAIN.com:PORT/o=ORG,c=CO?uid?sub?(objectClass=*) 1852 require valid-user 1853 </Location> 1854 1855If you want to disable the user authentication you can set 1856$Conf{CgiAdminUsers} to '*', which allows any user to have 1857full access to all hosts and backups. In this case the REMOTE_USER 1858environment variable does not have to be set by Apache. 1859 1860Alternatively, you can force a particular username by getting Apache 1861to set REMOTE_USER, eg, to hard code the user to www you could add 1862this to Apache's httpd.conf: 1863 1864 <Location /cgi-bin/BackupPC/BackupPC_Admin> # <--- change path as needed 1865 Setenv REMOTE_USER www 1866 </Location> 1867 1868Finally, you should also edit the config.pl file and adjust, as necessary, 1869the CGI-specific settings. They're near the end of the config file. In 1870particular, you should specify which users or groups have administrator 1871(privileged) access: see the config settings $Conf{CgiAdminUserGroup} 1872and $Conf{CgiAdminUsers}. Also, the configure.pl script placed various 1873images into $Conf{CgiImageDir} that BackupPC_Admin needs to serve 1874up. You should make sure that $Conf{CgiImageDirURL} is the correct 1875URL for the image directory. 1876 1877See the section L<Fixing installation problems> for suggestions on debugging the Apache authentication setup. 1878 1879=head2 How BackupPC Finds Hosts 1880 1881Starting with v2.0.0 the way hosts are discovered has changed. In most 1882cases you should specify 0 for the DHCP flag in the conf/hosts file, 1883even if the host has a dynamically assigned IP address. 1884 1885BackupPC (starting with v2.0.0) looks up hosts with DHCP = 0 in this manner: 1886 1887=over 4 1888 1889=item * 1890 1891First DNS is used to lookup the IP address given the client's name 1892using perl's gethostbyname() function. This should succeed for machines 1893that have fixed IP addresses that are known via DNS. You can manually 1894see whether a given host have a DNS entry according to perl's 1895gethostbyname function with this command: 1896 1897 perl -e 'print(gethostbyname("myhost") ? "ok\n" : "not found\n");' 1898 1899=item * 1900 1901If gethostbyname() fails, BackupPC then attempts a NetBios multicast to 1902find the host. Provided your client machine is configured properly, 1903it should respond to this NetBios multicast request. Specifically, 1904BackupPC runs a command of this form: 1905 1906 nmblookup myhost 1907 1908If this fails you will see output like: 1909 1910 querying myhost on 10.10.255.255 1911 name_query failed to find name myhost 1912 1913If it is successful you will see output like: 1914 1915 querying myhost on 10.10.255.255 1916 10.10.1.73 myhost<00> 1917 1918Depending on your netmask you might need to specify the -B option to 1919nmblookup. For example: 1920 1921 nmblookup -B 10.10.1.255 myhost 1922 1923If necessary, experiment with the nmblookup command which will return the 1924IP address of the client given its name. Then update 1925$Conf{NmbLookupFindHostCmd} with any necessary options to nmblookup. 1926 1927=back 1928 1929For hosts that have the DHCP flag set to 1, these machines are 1930discovered as follows: 1931 1932=over 4 1933 1934=item * 1935 1936A DHCP address pool ($Conf{DHCPAddressRanges}) needs to be specified. 1937BackupPC will check the NetBIOS name of each machine in the range using 1938a command of the form: 1939 1940 nmblookup -A W.X.Y.Z 1941 1942where W.X.Y.Z is each candidate address from $Conf{DHCPAddressRanges}. 1943Any host that has a valid NetBIOS name returned by this command (ie: 1944matching an entry in the hosts file) will be backed up. You can 1945modify the specific nmblookup command if necessary via $Conf{NmbLookupCmd}. 1946 1947=item * 1948 1949You only need to use this DHCP feature if your client machine doesn't 1950respond to the NetBios multicast request: 1951 1952 nmblookup myHost 1953 1954but does respond to a request directed to its IP address: 1955 1956 nmblookup -A W.X.Y.Z 1957 1958=back 1959 1960=head2 Other installation topics 1961 1962=over 4 1963 1964=item Removing a client 1965 1966If there is a machine that no longer needs to be backed up (eg: a retired 1967machine) you have two choices. First, you can keep the backups accessible 1968and browsable, but disable all new backups. Alternatively, you can 1969completely remove the client and all its backups. 1970 1971To disable backups for a client $Conf{BackupsDisable} can be 1972set to two different values in that client's per-PC config.pl file: 1973 1974=over 4 1975 1976=item 1 1977 1978Don't do any regular backups on this machine. Manually 1979requested backups (via the CGI interface) will still occur. 1980 1981=item 2 1982 1983Don't do any backups on this machine. Manually requested 1984backups (via the CGI interface) will be ignored. 1985 1986=back 1987 1988This will still allow the client's old backups to be browsable 1989and restorable. 1990 1991To completely remove a client and all its backups, you should remove its 1992entry in the conf/hosts file, and then delete the __TOPDIR__/pc/$host 1993directory. Whenever you change the hosts file, you should send 1994BackupPC a HUP (-1) signal so that it re-reads the hosts file. 1995If you don't do this, BackupPC will automatically re-read the 1996hosts file at the next regular wakeup. 1997 1998Note that when you remove a client's backups you won't initially 1999recover much disk space. That's because the client's files are 2000still in the pool. Overnight, when BackupPC_nightly next runs, 2001all the unused pool files will be deleted and this will recover 2002the disk space used by the client's backups. 2003 2004=item Copying the pool 2005 2006If the pool disk requirements grow you might need to copy the entire 2007data directory to a new (bigger) file system. Hopefully you are lucky 2008enough to avoid this by having the data directory on a RAID file system 2009or LVM that allows the capacity to be grown in place by adding disks. 2010 2011Backups prior to V4 make extensive use of hardlinks. So unless you have 2012a virgin V4 installation, your file system will contain large numbers 2013of hardlinks. This makes it hard to copy. 2014 2015Prior to V4 (or a V4 upgrade to a V3 installation), the backup data 2016directories contain large numbers of hardlinks. If you try to copy 2017the pool the target directory will occupy a lot more space if the 2018hardlinks aren't re-established. 2019 2020Unless you have a pure V4 installation, the best way to copy a pool 2021file system, if possible, is by copying the raw device at the block 2022level (eg: using dd). Application level programs that understand 2023hardlinks include the GNU cp program with the -a option and rsync -H. 2024However, the large number of hardlinks in the pool will make the 2025memory usage large and the copy very slow. Don't forget to stop 2026BackupPC while the copy runs. 2027 2028If you have a pure V4 installation, copying the pool and PC backup 2029directories should be quite easy. Rsync 3.x should work well. 2030 2031=back 2032 2033=head2 Fixing installation problems 2034 2035If you find a solution to your problem that could help other users 2036please add it to the Wiki at L<https://github.com/backuppc/backuppc/wiki>. 2037 2038=head1 Restore functions 2039 2040BackupPC supports several different methods for restoring files. The 2041most convenient restore options are provided via the CGI interface. 2042Alternatively, backup files can be restored using manual commands. 2043 2044=head2 CGI restore options 2045 2046By selecting a host in the CGI interface, a list of all the backups 2047for that machine will be displayed. By selecting the backup number 2048you can navigate the shares and directory tree for that backup. 2049 2050BackupPC's CGI interface automatically fills incremental backups 2051with the corresponding full backup, which means each backup has 2052a filled appearance. Therefore, there is no need to do multiple 2053restores from the incremental and full backups: BackupPC does all 2054the hard work for you. You simply select the files and directories 2055you want from the correct backup vintage in one step. 2056 2057You can download a single backup file at any time simply by selecting 2058it. Your browser should prompt you with the filename and ask you 2059whether to open the file or save it to disk. 2060 2061Alternatively, you can select one or more files or directories in 2062the currently selected directory and select "Restore selected files". 2063(If you need to restore selected files and directories from several 2064different parent directories you will need to do that in multiple 2065steps.) 2066 2067If you select all the files in a directory, BackupPC will replace 2068the list of files with the parent directory. You will be presented 2069with a screen that has three options: 2070 2071=over 4 2072 2073=item Option 1: Direct Restore 2074 2075With this option the selected files and directories are restored 2076directly back onto the host, by default in their original location. 2077Any old files with the same name will be overwritten, so use caution. 2078You can optionally change the target hostname, target share name, 2079and target path prefix for the restore, allowing you to restore the 2080files to a different location. 2081 2082Once you select "Start Restore" you will be prompted one last time 2083with a summary of the exact source and target files and directories 2084before you commit. When you give the final go ahead the restore 2085operation will be queued like a normal backup job, meaning that it 2086will be deferred if there is a backup currently running for that host. 2087When the restore job is run, smbclient, tar, rsync or rsyncd is used 2088(depending upon $Conf{XferMethod}) to actually restore the files. 2089Sorry, there is currently no option to cancel a restore that has been 2090started. Currently ftp restores are not fully implemented. 2091 2092A record of the restore request, including the result and list of 2093files and directories, is kept. It can be browsed from the host's 2094home page. $Conf{RestoreInfoKeepCnt} specifies how many old restore 2095status files to keep. 2096 2097Note that for direct restore to work, the $Conf{XferMethod} must 2098be able to write to the client. For example, that means an SMB 2099share for smbclient needs to be writable, and the rsyncd module 2100needs "read only" set to "false". This creates additional security 2101risks. If you only create read-only SMB shares (which is a good 2102idea), then the direct restore will fail. You can disable the 2103direct restore option by setting $Conf{SmbClientRestoreCmd}, 2104$Conf{TarClientRestoreCmd} and $Conf{RsyncRestoreArgs} to undef. 2105 2106=item Option 2: Download Zip archive 2107 2108With this option a zip file containing the selected files and directories 2109is downloaded. The zip file can then be unpacked or individual files 2110extracted as necessary on the host machine. The compression level can be 2111specified. A value of 0 turns off compression. 2112 2113When you select "Download Zip File" you should be prompted where to 2114save the restore.zip file. 2115 2116BackupPC does not consider downloading a zip file as an actual 2117restore operation, so the details are not saved for later browsing 2118as in the first case. However, a mention that a zip file was 2119downloaded by a particular user, and a list of the files, does 2120appear in BackupPC's log file. 2121 2122=item Option 3: Download Tar archive 2123 2124This is identical to the previous option, except a tar file is downloaded 2125rather than a zip file (and there is currently no compression option). 2126 2127=back 2128 2129=head2 Command-line restore options 2130 2131Apart from the CGI interface, BackupPC allows you to restore files 2132and directories from the command line. The following programs can 2133be used: 2134 2135=over 4 2136 2137=item BackupPC_zcat 2138 2139For each filename argument it inflates (uncompresses) the file and 2140writes it to stdout. To use BackupPC_zcat you could give it the 2141full filename, eg: 2142 2143 __INSTALLDIR__/bin/BackupPC_zcat __TOPDIR__/pc/host/5/fc/fcraig/fexample.txt > example.txt 2144 2145It's your responsibility to make sure the file is really compressed: 2146BackupPC_zcat doesn't check which backup the requested file is from. 2147BackupPC_zcat returns a nonzero status if it fails to uncompress 2148a file. 2149 2150In V4, BackupPC_zcat can be invoked in several other ways: 2151 2152 BackupPC_zcat file... 2153 BackupPC_zcat MD5_digest... 2154 BackupPC_zcat $TopDir/pc/host/num/share/mangledPath... 2155 BackupPC_zcat [-h host] [-n num] [-s share] clientPath... 2156 2157For example, you can do this: 2158 2159 BackupPC_zcat d73955e08410dfc5ea8069b05d2f43b2 2160 2161That digest can be pasted from the output of BackupPC_ls. 2162 2163The last form uses unmangled paths, so you can do this: 2164 2165 BackupPC_zcat -h HOST -n 10 -s / /home/craig/file 2166 2167You can also mix real paths with unmangled paths. Both of these versions work: 2168 2169 BackupPC_zcat /data/BackupPC/pc/HOST/10/fhome/fcraig/ffile 2170 BackupPC_zcat /data/BackupPC/pc/HOST/10/home/craig/file 2171 2172=item BackupPC_tarCreate 2173 2174BackupPC_tarCreate creates a tar file for any files or directories in 2175a particular backup. Merging of incrementals is done automatically, 2176so you don't need to worry about whether certain files appear in the 2177incremental or full backup. 2178 2179The usage is: 2180 2181 BackupPC_tarCreate [options] files/directories... 2182 Required options: 2183 -h host host from which the tar archive is created 2184 -n dumpNum dump number from which the tar archive is created 2185 A negative number means relative to the end (eg -1 2186 means the most recent dump, -2 2nd most recent etc). 2187 -s shareName share name from which the tar archive is created; 2188 can be "*" to mean all shares. 2189 2190 Other options: 2191 -t print summary totals 2192 -r pathRemove path prefix that will be replaced with pathAdd 2193 -p pathAdd new path prefix 2194 -b BLOCKS BLOCKS x 512 bytes per record (default 20; same as tar) 2195 -w writeBufSz write buffer size (default 1048576 = 1MB) 2196 -e charset charset for encoding filenames (default: value of 2197 $Conf{ClientCharset} when backup was done) 2198 -l just print a file listing; don't generate an archive 2199 -L just print a detailed file listing; don't generate an archive 2200 2201The command-line files and directories are relative to the specified 2202shareName. The tar file is written to stdout. 2203 2204The -h, -n and -s options specify which dump is used to generate 2205the tar archive. The -r and -p options can be used to relocate 2206the paths in the tar archive so extracted files can be placed 2207in a location different from their original location. 2208 2209=item BackupPC_zipCreate 2210 2211BackupPC_zipCreate creates a zip file for any files or directories in 2212a particular backup. Merging of incrementals is done automatically, 2213so you don't need to worry about whether certain files appear in the 2214incremental or full backup. 2215 2216The usage is: 2217 2218 BackupPC_zipCreate [options] files/directories... 2219 Required options: 2220 -h host host from which the zip archive is created 2221 -n dumpNum dump number from which the tar archive is created 2222 A negative number means relative to the end (eg -1 2223 means the most recent dump, -2 2nd most recent etc). 2224 -s shareName share name from which the zip archive is created 2225 2226 Other options: 2227 -t print summary totals 2228 -r pathRemove path prefix that will be replaced with pathAdd 2229 -p pathAdd new path prefix 2230 -c level compression level (default is 0, no compression) 2231 -e charset charset for encoding filenames (default: utf8) 2232 2233The command-line files and directories are relative to the specified 2234shareName. The zip file is written to stdout. The -h, -n and -s 2235options specify which dump is used to generate the zip archive. The 2236-r and -p options can be used to relocate the paths in the zip archive 2237so extracted files can be placed in a location different from their 2238original location. 2239 2240=item BackupPC_ls 2241 2242In V3, a full (or filled) backup tree contains all the files, albeit with "mangled" 2243names, and the file contents are compressed. Some users found it convenient to 2244directly navigate a PC's backup tree to check for files. 2245 2246In V4 that is not possible, since only a single attrib file is stored per directory 2247in the PC backup tree, so the directory contents aren't visible without looking in 2248the attrib file. 2249 2250A new utility BackupPC_ls (like "ls") can be used to view PC backup trees. It shows file digests, 2251which can be pasted to BackupPC_zcat if you want to view the file contents. The arguments 2252are similar to BackupPC_zcat. The usage is: 2253 2254 BackupPC_ls [-iR] [-h host] [-n bkupNum] [-s shareName] dirs/files... 2255 2256The -i option will show inodes (inode number and number of links). The -R option recurses into 2257directories. 2258 2259If you don't specify -h, -n and -s, then you can specify the real file system path instead. 2260For example, the following three commands are equivalent: 2261 2262 BackupPC_ls -h HOST -n 10 -s cDrive /home/craig/file 2263 BackupPC_ls /data/BackupPC/pc/HOST/10/fcDrive/fhome/fcraig/ffile 2264 BackupPC_ls /data/BackupPC/pc/HOST/10/cDrive/home/craig/file 2265 2266As you can see, the portion of the full path after the backup number can 2267be either mangled or not. Note that using the mangled form allows directory-name 2268completion via the shell, since those directories actually exist. 2269 2270It would be great if someone would like to volunteer to add features to BackupPC_ls 2271to make file and directory completion work with unmangled names via the shell. In 2272tcsh you can specify a completion program to run - BackupPC_ls could be given special 2273arguments to spit out the potential (unmangled) completions. I'm not sure how bash 2274does this. 2275 2276=back 2277 2278Each of these programs reside in __INSTALLDIR__/bin. 2279 2280=head1 Archive functions 2281 2282BackupPC supports archiving to removable media. For users that require 2283offsite backups, BackupPC can create archives that stream to tape 2284devices, or create files of specified sizes to fit onto cd or dvd media. 2285 2286Each archive type is specified by a BackupPC host with its XferMethod 2287set to 'archive'. This allows for multiple configurations at sites where 2288there might be a combination of tape and cd/dvd backups being made. 2289 2290BackupPC provides a menu that allows one or more hosts to be archived. 2291The most recent backup of each host is archived using BackupPC_tarCreate, 2292and the output is optionally compressed and split into fixed-sized 2293files (eg: 650MB). 2294 2295The archive for each host is done by default using 2296__INSTALLDIR__/bin/BackupPC_archiveHost. This script can be copied 2297and customized as needed. 2298 2299=head2 Configuring an Archive Host 2300 2301To create an Archive Host, add it to the hosts file just as any other host 2302and call it a name that best describes the type of archive, e.g. ArchiveDLT 2303 2304To tell BackupPC that the Host is for Archives, create a config.pl file in 2305the Archive Hosts's pc directory, adding the following line: 2306 2307$Conf{XferMethod} = 'archive'; 2308 2309To further customise the archive's parameters you can add the changed 2310parameters in the host's config.pl file. The parameters are explained in 2311the config.pl file. Parameters may be fixed or the user can be allowed 2312to change them (eg: output device). 2313 2314The per-host archive command is $Conf{ArchiveClientCmd}. By default 2315this invokes 2316 2317 __INSTALLDIR__/bin/BackupPC_archiveHost 2318 2319which you can copy and customize as necessary. 2320 2321=head2 Starting an Archive 2322 2323In the web interface, click on the Archive Host you wish to use. You will see a 2324list of previous archives and a summary on each. By clicking the "Start Archive" 2325button you are presented with the list of hosts and the approximate backup size 2326(note this is raw size, not projected compressed size) Select the hosts you wish 2327to archive and press the "Archive Selected Hosts" button. 2328 2329The next screen allows you to adjust the parameters for this archive run. 2330Press the "Start the Archive" to start archiving the selected hosts with the 2331parameters displayed. 2332 2333=head2 Starting an Archive from the command line 2334 2335The script BackupPC_archiveStart can be used to start an archive from 2336the command line (or cron etc). The usage is: 2337 2338 BackupPC_archiveStart archiveHost userName hosts... 2339 2340This creates an archive of the most recent backup of each of 2341the specified hosts. The first two arguments are the archive 2342host and the username making the request. 2343 2344=head1 Other Command Line Utilities 2345 2346These utilities are automatically run by BackupPC when needed. You don't 2347need to manually run these utilities. 2348 2349=over 2350 2351=item BackupPC_attribPrint 2352 2353BackupPC_attribPrint prints the contents of an attrib file. Usage: 2354 2355 BackupPC_attribPrint attribPath 2356 BackupPC_attribPrint inodePath/inodeNum 2357 2358=item BackupPC_backupDelete 2359 2360BackupPC_backupDelete deletes an entire backup, or a directory path within a backup. Usage: 2361 2362 BackupPC_backupDelete -h host -n num [-p] [-l] [-r] [-s shareName [dirs...]] 2363 Options: 2364 -h host hostname 2365 -n num backup number to delete 2366 -s shareName don't delete the backup; delete just this share 2367 (or only dirs below this share if specified) 2368 -p don't print progress information 2369 -l don't remove XferLOG files 2370 -r do a ref count update (default: none) 2371 If a shareName is specified, just that share (or share/dirs) are deleted. 2372 The backup itself is not deleted, nor is the log file removed. 2373 2374=item BackupPC_backupDuplicate 2375 2376BackupPC_backupDuplicate duplicates the last backup, which is used to create a filled backup 2377copy, and also to convert a V3 backup to a new V4 starting point. Usage: 2378 2379 BackupPC_backupDuplicate -h host [-p] 2380 Options: 2381 -h host hostname 2382 -p don't print progress information 2383 2384=item BackupPC_fixupBackupSummary 2385 2386BackupPC_fixupBackupSummary is used to re-create the backups file for all the hosts if it 2387is damaged or deleted. Usage: 2388 2389 BackupPC_fixupBackupSummary [-l] 2390 Options: 2391 -l legacy mode: try to reconstruct backups from LOG 2392 files for backups prior to BackupPC v3.0. 2393 2394=item BackupPC_fsck 2395 2396BackupPC_fsck can only be run manually, and only while BackupPC isn't running. It updates 2397the host reference counts, the overall pool reference counts and stats. Usage: 2398 2399 BackupPC_fsck [options] 2400 Options: 2401 -f force regeneration of per-host reference counts 2402 -n don't remove zero count pool files - print only 2403 -s recompute pool stats 2404 2405=item BackupPC_migrateV3toV4 2406 2407If you upgraded an existing 3.x installation, BackupPC 4.x is backward compatible with 3.x backups: 2408it can browse, view and restore files. However, the existing 3.x backups will still use hardlinks 2409for storage, and until those 3.x backups eventually expire, hardlinks will still be used for 3.x 2410backups. 2411 2412BackupPC_migrateV3toV4 is an optional utility that can migrate existing 3.x backups to 4.x stoage 2413format, eliminating hardlinks. This allows you to eliminate the old V3 pool and you can then 2414set $Conf{PoolV3Enabled} to 0. 2415 2416 BackupPC_migrateV3toV4 -a [-m] [-p] [-v] 2417 BackupPC_migrateV3toV4 -h host [-n V3backupNum] [-m] [-p] [-v] 2418 Options: 2419 -a migrate all hosts and all backups 2420 -h host migrate just a specific host 2421 -n V3backupNum migrate specific host backup; does all V3 backups 2422 for that host if not specified 2423 -m don't migrate anything; just print what would be done 2424 -p don't print progress information 2425 -v verbose 2426 2427The BackupPC server should not be running when you run BackupPC_migrateV3toV4. 2428It will check and exit if the BackupPC server is running. 2429 2430If you want to test BackupPC_migrateV3toV4, a cautious approach is to make 2431backup copies of the V3 backups, allowing you to restore them if there is 2432any issue. For example, if exampleHost has three 3.x backups numbered 5, 24336, 7, you can use cp -prl (preserving hardlinks) to make copies: 2434 2435 cd /data/BackupPC/pc/exampleHost 2436 mv 5 5.orig ; cp -prl 5.orig 5 2437 mv 6 6.orig ; cp -prl 6.orig 6 2438 mv 7 7.orig ; cp -prl 7.orig 7 2439 cp backups backups.save 2440 2441 BackupPC_migrateV3toV4 -h exampleHost -n 5 2442 BackupPC_migrateV3toV4 -h exampleHost -n 6 2443 BackupPC_migrateV3toV4 -h exampleHost -n 7 2444 2445If you want to put things back the way they were: 2446 2447 rm -rf 5 ; mv 5.orig 5 2448 rm -rf 6 ; mv 6.orig 6 2449 rm -rf 7 ; mv 7.orig 7 2450 # copy the [567] lines from backups.save into backups; 2451 # only do "cp backups.save backups" if you are sure no 2452 # new backups have been done 2453 2454Two important things to note with BackupPC_migrateV3toV4. First, V4 2455storage does use more filesystem inodes than V3 (that's the small cost 2456of getting rid of hardlinks). In particular, each directory in a backup 2457tree uses two inodes in V4 (one for the directory, and one for the (empty) 2458attrib file), and only one inode in V3 (one for the directory, and the 2459attrib and all other files are hardlinked to the pool). So before you run 2460BackupPC_migrateV3toV4, make sure you have enough inodes in __TOPDIR__; 2461use df -i to make sure you are under 45% inode usage. 2462 2463Secondly, if you run BackupPC_migrateV3toV4 on all your backups, the 2464old V3 pool should be empty, except for old-style attrib files, which 2465should all have only one link since no backups should reference them any 2466longer. Before you turn off the V3 pool by setting $Conf{PoolV3Enabled} 2467to 0, make sure BackupPC_nightly has run enough times (specifically, 2468$Conf{PoolSizeNightlyUpdatePeriod} times) so that the V3 pool can be 2469emptied. You could do this manually, but only if you are very careful 2470to check that the remaining files only have one link. 2471 2472=item BackupPC_poolCntPrint 2473 2474BackupPC_poolCntPrint is used to print reference count information, either per-backup, 2475per-host or for the entire pool depending on the file path you use. 2476 2477If you provide a hex md5 digest, the entire pool count for that digest is printed. 2478Usage: 2479 2480 BackupPC_poolCntPrint [poolCntFilePath|hexDigest]... 2481 2482=item BackupPC_refCountUpdate 2483 2484BackupPC_refCountUpdate is used to either update the per-backup and 2485per-host reference counts, or the system-wide reference counts. It 2486is used by BackupPC_dump, BackupPC_nightly, BackupPC_backupDelete, 2487BackupPC_backupDuplicate and BackupPC_fsck. Usage: 2488 2489 BackupPC_refCountUpdate -h HOST [-c] [-f] [-F] [-o N] [-p] [-v] 2490 With no other args, updates count db on backups with poolCntDelta files 2491 and computers the host's total reference counts. Also builds refCnt for 2492 any >=4.0 backups without refCnts. 2493 -f - do an fsck on this HOST, which involves a rebuild of the 2494 last two backup refCnts. poolCntDelta files are ignored. 2495 Also forces fsck if requested by needFsck flag files 2496 in TopDir/pc/HOST/refCnt. Equivalent to -o 2. 2497 -F - rebuild all the >=4.0 per-backup refCnt files for this 2498 host. Equivalent to -o 3. 2499 -c - compare current count db to new db before replacing 2500 -o N - override $Conf{RefCntFsck}. 2501 -p - don't show progress 2502 -v - verbose 2503 Notes: in case there are legacy (ie: <=4.0.0alpha3) unapplied poolCntDelta 2504 files in TopDir/pc/HOST/refCnt then the -f flag is turned on. 2505 2506 BackupPC_refCountUpdate -m [-f] [-p] [-c] [-r N-M] [-s] [-v] [-P phase] 2507 -m Updates main count db, based on each HOST 2508 -f - do an fsck on all the hosts, ignoring poolCntDelta files, 2509 and replacing each host's count db. Will wait for backups 2510 to finish if any are running. 2511 -F - rebuild all the >=4.0 per-backup refCnt files. 2512 -p - don't show progress 2513 -c - clean pool files 2514 -r N-M - process a subset of the main count db, 0 <= N <= M <= 255 2515 -s - prints stats 2516 -v - verbose 2517 -P phase Phase from 0..15 each time we run BackupPC_nightly. Used 2518 to compute exact pool size for portions of the pool based 2519 on the phase and $Conf{PoolSizeNightlyUpdatePeriod}. 2520 2521=back 2522 2523=head1 Other CGI Functions 2524 2525=head2 Configuration and Host Editor 2526 2527The CGI interface has a complete configuration and host editor. 2528Only the administrator can edit the main configuration settings 2529and hosts. The edit links are in the left navigation bar. 2530 2531When changes are made to any parameter a "Save" button appears 2532at the top of the page. If you are editing a text box you will 2533need to click outside of the text box to make the Save button 2534appear. If you don't select Save then the changes won't be saved. 2535 2536The host-specific configuration can be edited from the host 2537summary page using the link in the left navigation bar. 2538The administrator can edit any of the host-specific 2539configuration settings. 2540 2541When editing the host-specific configuration, each parameter has 2542an "override" setting that denotes the value is host-specific, 2543meaning that it overrides the setting in the main configuration. 2544If you deselect "override" then the setting is removed from 2545the host-specific configuration, and the main configuration 2546file is displayed. 2547 2548User's can edit their host-specific configuration if enabled 2549via $Conf{CgiUserConfigEditEnable}. The specific subset 2550of configuration settings that a user can edit is specified 2551with $Conf{CgiUserConfigEdit}. It is recommended to make this 2552list short as possible (you probably don't want your users saving 2553dozens of backups) and it is essential that they can't edit any 2554of the Cmd configuration settings, otherwise they can specify 2555an arbitrary command that will be executed as the BackupPC 2556user. 2557 2558=head2 RSS 2559 2560BackupPC supports a very basic RSS feed. Provided you have the 2561XML::RSS perl module installed, a URL similar to this will 2562provide RSS information: 2563 2564 http://localhost/cgi-bin/BackupPC/BackupPC_Admin?action=rss 2565 2566This feature is experimental. The information included will 2567probably change. 2568 2569=head1 BackupPC Design 2570 2571=head2 Some design issues 2572 2573=over 4 2574 2575=item Pooling common files 2576 2577To see if a file is already in the pool, an MD5 digest of the file 2578contents is used. This can't guarantee a file is identical: it 2579just reduces the search to often a single file or handful of files. 2580 2581Depending on the Xfer method and settings, a complete file comparison 2582is done to verify if two files are really the same. 2583 2584Prior to V4, identical files on multiples backups are represented 2585by hard links. Hardlinks are used so that identical files all refer 2586to the same physical file on the server's disk. Also, hard links 2587maintain reference counts so that BackupPC knows when to delete 2588unused files from the pool. 2589 2590In V4+, hardlinks are not used and reference counting is done at the 2591application level. It is done in a batch manner, which simplifies 2592the implementation. 2593 2594For the computer-science majors among you, you can think of the pooling 2595system used by BackupPC as just a chained hash table stored on a (big) 2596file system. 2597 2598=item The hashing function 2599 2600In V4+, the file digest is the MD5 digest of the complete file. 2601While MD5 collisions are now well known, and can be easily constructed, 2602in real use collisions will be extremely unlikely. 2603 2604Prior to V4, just a portion of all but the smallest files was used 2605for the digest. That decision was made long ago when CPUs were a 2606lot slower. For files less than 256K, the digest is the MD5 digest 2607of the file size and the full file. For files up to 1MB, the first 2608and last 128K of the file, and for over 1MB, the first and eighth 2609128K chunks are used, together with the file size. 2610 2611=item Compression 2612 2613BackupPC supports compression. It uses the deflate and inflate methods 2614in the Compress::Zlib module, which is based on the zlib compression 2615library (see L<http://www.gzip.org/zlib/>). 2616 2617The $Conf{CompressLevel} setting specifies the compression level to use. 2618Zero (0) means no compression. Compression levels can be from 1 (least 2619cpu time, slightly worse compression) to 9 (most cpu time, slightly 2620better compression). The recommended value is 3. Changing it to 5, for 2621example, will take maybe 20% more cpu time and will get another 2-3% 2622additional compression. Diminishing returns set in above 5. See the zlib 2623documentation for more information about compression levels. 2624 2625BackupPC implements compression with minimal CPU load. Rather than 2626compressing every incoming backup file and then trying to match it 2627against the pool, BackupPC computes the MD5 digest based on the 2628uncompressed file, and matches against the candidate pool files by 2629comparing each uncompressed pool file against the incoming backup file. 2630Since inflating a file takes roughly a factor of 10 less CPU time than 2631deflating there is a big saving in CPU time. 2632 2633The combination of pooling common files and compression can yield 2634a factor of 8 or more overall saving in backup storage. 2635 2636Note that you should not turn compression on and off are you have 2637started running BackupPC. It will result in double the storage needs, 2638since all the files will be stored in both the compressed and uncompressed 2639pools. 2640 2641=back 2642 2643=head2 BackupPC operation 2644 2645BackupPC reads the configuration information from 2646__CONFDIR__/config.pl. It then runs and manages all the backup 2647activity. It maintains queues of pending backup requests, user backup 2648requests and administrative commands. Based on the configuration various 2649requests will be executed simultaneously. 2650 2651As specified by $Conf{WakeupSchedule}, BackupPC wakes up periodically 2652to queue backups on all the PCs. This is a four step process: 2653 2654=over 4 2655 2656=item 1 2657 2658For each host and DHCP address backup requests are queued on the 2659background command queue. 2660 2661=item 2 2662 2663For each PC, BackupPC_dump is forked. Several of these may be run in 2664parallel, based on the configuration. First a ping is done to see if 2665the machine is alive. If this is a DHCP address, nmblookup is run to 2666get the netbios name, which is used as the hostname. If DNS lookup 2667fails, $Conf{NmbLookupFindHostCmd} is run to find the IP address from 2668the hostname. The file __TOPDIR__/pc/$host/backups is read to decide 2669whether a full or incremental backup needs to be run. If no backup is 2670scheduled, or the ping to $host fails, then BackupPC_dump exits. 2671 2672The backup is done using the specified XferMethod. Either samba's smbclient 2673or tar over ssh/rsh/nfs piped into BackupPC_tarExtract, or rsync over ssh/rsh 2674is run, or rsyncd is connected to, with the incoming data 2675extracted to __TOPDIR__/pc/$host/new. The XferMethod output is put 2676into __TOPDIR__/pc/$host/XferLOG. 2677 2678The letter in the XferLOG file shows the type of object, similar to the 2679first letter of the modes displayed by ls -l: 2680 2681 d -> directory 2682 l -> symbolic link 2683 b -> block special file 2684 c -> character special file 2685 p -> pipe file (fifo) 2686 nothing -> regular file 2687 2688The words mean: 2689 2690=over 4 2691 2692=item create 2693 2694new for this backup (ie: directory or file not in pool) 2695 2696=item pool 2697 2698found a match in the pool 2699 2700=item same 2701 2702file is identical to previous backup (contents were 2703checksummed and verified during full dump). 2704 2705=item skip 2706 2707file skipped in incremental because attributes are the 2708same (only displayed if $Conf{XferLogLevel} >= 2). 2709 2710=back 2711 2712As BackupPC_tarExtract extracts the files from smbclient or tar, or as 2713rsync or ftp runs, it checks each file in the backup to see if it is 2714identical to an existing file from any previous backup of any PC. It 2715does this without needed to write the file to disk. If the file matches 2716an existing file, a hardlink is created to the existing file in the 2717pool. If the file does not match any existing files, the file is written 2718to disk and inserted into the pool. 2719 2720BackupPC_tarExtract and rsync can handle arbitrarily large files 2721and multiple candidate matching files without needing to write the 2722file to disk in the case of a match. This significantly reduces 2723disk writes (and also reads, since the pool file comparison is done 2724disk to memory, rather than disk to disk). 2725 2726Based on the configuration settings, BackupPC_dump checks each 2727old backup to see if any should be removed. 2728 2729=item 3 2730 2731Once each night, BackupPC_nightly is run to complete some additional 2732administrative tasks, such as cleaning the pool. This involves 2733removing any files in the pool that only have a single hard link 2734(meaning no backups are using that file). 2735 2736If BackupPC_nightly takes too long to run, the settings 2737$Conf{MaxBackupPCNightlyJobs} and $Conf{BackupPCNightlyPeriod} can 2738be used to run several BackupPC_nightly processes in parallel, and 2739to split its job over several nights. 2740 2741=back 2742 2743BackupPC also listens for TCP connections on $Conf{ServerPort}, which 2744is used by the CGI script BackupPC_Admin for status reporting and 2745user-initiated backup or backup cancel requests. 2746 2747=head2 Storage layout 2748 2749BackupPC resides in several directories: 2750 2751=over 4 2752 2753=item __INSTALLDIR__ 2754 2755Perl scripts comprising BackupPC reside in __INSTALLDIR__/bin, 2756libraries are in __INSTALLDIR__/lib and documentation 2757is in __INSTALLDIR__/doc. 2758 2759=item __CGIDIR__ 2760 2761The CGI script BackupPC_Admin resides in this cgi binary directory. 2762 2763=item __CONFDIR__ 2764 2765All the configuration information resides below __CONFDIR__. 2766This directory contains: 2767 2768The directory __CONFDIR__ contains: 2769 2770=over 4 2771 2772=item config.pl 2773 2774Configuration file. See L<Configuration File> below for more details. 2775 2776=item hosts 2777 2778Hosts file, which lists all the PCs to backup. 2779 2780=item pc 2781 2782The directory __CONFDIR__/pc contains per-client configuration files 2783that override settings in the main configuration file. Each file 2784is named __CONFDIR__/pc/HOST.pl, where HOST is the hostname. 2785 2786In pre-FHS versions of BackupPC these files were located in 2787__TOPDIR__/pc/HOST/config.pl. 2788 2789=back 2790 2791=item __LOGDIR__ 2792 2793The directory __LOGDIR__ (__TOPDIR__/log on pre-FHS versions 2794of BackupPC) contains: 2795 2796=over 4 2797 2798=item LOG 2799 2800Current (today's) log file output from BackupPC. 2801 2802=item LOG.0 or LOG.0.z 2803 2804Yesterday's log file output. Log files are aged daily and compressed 2805(if compression is enabled), and old LOG files are deleted. 2806 2807=item status.pl 2808 2809A summary of BackupPC's status written periodically by BackupPC so 2810that certain state information can be maintained if BackupPC is 2811restarted. Should not be edited. 2812 2813=item UserEmailInfo.pl 2814 2815A summary of what email was last sent to each user, and when the 2816last email was sent. Should not be edited. 2817 2818=back 2819 2820=item __RUNDIR__ 2821 2822The directory __RUNDIR__ (__TOPDIR__/log on pre-FHS versions 2823of BackupPC) contains: 2824 2825=over 4 2826 2827=item BackupPC.pid 2828 2829Contains BackupPC's process id. 2830 2831=item BackupPC.sock 2832 2833A unix domain socket for communicating to the BackupPC server. 2834 2835=back 2836 2837=item __TOPDIR__ 2838 2839All of BackupPC's data (PC backup images, logs, configuration information) 2840is stored below this directory. 2841 2842Below __TOPDIR__ are several directories: 2843 2844=over 4 2845 2846=item __TOPDIR__/pool 2847 2848All uncompressed files from PC backups are stored below __TOPDIR__/pool. 2849Each file's name is based on the MD5 hex digest of the file contents. 2850 2851For V4+, the digest is the MD5 digest of the full file contents (the length 2852is not used). For V4+ the pool files are stored in a 2 level tree, using 28537 bits from the top of the first two bytes of the digest. So there are 128 2854directories are each level, numbered evenly in hex from 0x00, 0x02, to 0xfe. 2855 2856For example, if a file has an MD5 digest of 123456789abcdef0123456789abcdef0, 2857the uncompressed file is stored in __TOPDIR__/pool/12/34/123456789abcdef0123456789abcdef0. 2858 2859Duplicates digest are represented with one (or more) hex byte extensions. 2860So three colliding files would be stored as 2861 2862 __TOPDIR__/pool/12/34/123456789abcdef0123456789abcdef0 2863 __TOPDIR__/pool/12/34/123456789abcdef0123456789abcdef000 2864 __TOPDIR__/pool/12/34/123456789abcdef0123456789abcdef001 2865 2866The rest of this section describes the old pool layout. Note that both V3 and V4 2867pools can exist together, since they use different names for their directory trees. 2868 2869As exampled earlier, prior to V4 the digest is computed as follows. 2870For files less than 256K, the file length and the entire 2871file is used. For files up to 1MB, the file length and the first and 2872last 128K are used. Finally, for files longer than 1MB, the file length, 2873and the first and eighth 128K chunks for the file are used. 2874 2875Both BackupPC_dump (actually, BackupPC_tarExtract or rsync_bpc) are 2876responsible for checking newly backed up files against the pool. For 2877each file, the MD5 digest is used to generate a filename in the pool 2878directory. 2879 2880If the file exists in the pool, the contents are compared. 2881If there is no match, additional files in the chain are checked (if any). 2882(Actually, multiple candidate files are compared in parallel.) 2883 2884If $Conf{PoolV3Enabled} is set, then the V3 pool is checked 2885if there are no matches in the V4 pool. If a V3 file matches, it is 2886simply moved (renamed) the the V4 pool with it's new filename based on 2887the V4 digest. That still allows the V3 backups to be browsed etc, since 2888those backups are still based on hardlinks. 2889 2890If the file contents exactly match, a reference count is incremented. 2891Otherwise, the file is added to the pool by using an atomic link operation, 2892followed by unlinking the temporary file. 2893 2894One other issue: zero length files are not pooled, since there are a lot 2895of these files and on most file systems it doesn't save any disk space 2896to turn these files into hard links. 2897 2898Prior to V4, each pool file is stored in a subdirectory X/Y/Z, where X, 2899Y, Z are the first 3 hex digits of the MD5 digest. 2900 2901For example, if a file has an MD5 digest of 123456789abcdef0123456789abcdef0, 2902the file is stored in __TOPDIR__/pool/1/2/3/123456789abcdef0123456789abcdef0. 2903 2904The MD5 digest might not be unique (especially since not all the file's 2905contents are used for files bigger than 256K). Different files that have 2906the same MD5 digest are stored with a trailing suffix "_n" where n is 2907an incrementing number starting at 0. So, for example, if two additional 2908files were identical to the first, except the last byte was different, 2909and assuming the file was larger than 1MB (so the MD5 digests are the 2910same but the files are actually different), the three files would be 2911stored as: 2912 2913 __TOPDIR__/pool/1/2/3/123456789abcdef0123456789abcdef0 2914 __TOPDIR__/pool/1/2/3/123456789abcdef0123456789abcdef0_0 2915 __TOPDIR__/pool/1/2/3/123456789abcdef0123456789abcdef0_1 2916 2917=item __TOPDIR__/cpool 2918 2919All compressed files from PC backups are stored below __TOPDIR__/cpool. 2920Its layout is the same as __TOPDIR__/pool, and the hashing function 2921is the same (and, importantly, based on the uncompressed file, not 2922the compressed file). 2923 2924=item __TOPDIR__/pc/$host 2925 2926For each PC $host, all the backups for that PC are stored below 2927the directory __TOPDIR__/pc/$host. This directory contains the 2928following files: 2929 2930=over 4 2931 2932=item LOG 2933 2934Current log file for this PC from BackupPC_dump. 2935 2936=item LOG.MMYYYY or LOG.MMYYYY.z 2937 2938Last month's log file. Log files are aged monthly and compressed 2939(if compression is enabled), and old LOG files are deleted. 2940In earlier versions of BackupPC these files used to have 2941a suffix of 0, 1, .... 2942 2943=item XferERR or XferERR.z 2944 2945Output from the transport program (ie: smbclient, tar, rsync or ftp) 2946for the most recent failed backup. 2947 2948=item XferLOG or XferLOG.z 2949 2950Output from the transport program (ie: smbclient, tar, rsync or ftp) 2951for the current backup. 2952 2953=item nnn (an integer) 2954 2955Backups are in directories numbered sequentially starting at 0. Below 2956each backup directory are the inodes (in nnn/inode) and the reference 2957counts for this backup are in nnn/refCnt. 2958 2959=item refCnt 2960 2961The host's reference count database is stored below the refCnt directory. 2962 2963=item XferLOG.nnn or XferLOG.nnn.z 2964 2965Output from the transport program (ie: smbclient, tar, rsync or ftp) 2966corresponding to backup number nnn. 2967 2968=item RestoreInfo.nnn 2969 2970Information about restore request #nnn including who, what, when, and 2971why. This file is in Data::Dumper format. (Note that the restore 2972numbers are not related to the backup number.) 2973 2974=item RestoreLOG.nnn.z 2975 2976Output from smbclient, tar or rsync during restore #nnn. (Note that the restore 2977numbers are not related to the backup number.) 2978 2979=item ArchiveInfo.nnn 2980 2981Information about archive request #nnn including who, what, when, and 2982why. This file is in Data::Dumper format. (Note that the archive 2983numbers are not related to the restore or backup number.) 2984 2985=item ArchiveLOG.nnn.z 2986 2987Output from archive #nnn. (Note that the archive numbers are not related 2988to the backup or restore number.) 2989 2990=item config.pl 2991 2992Old location of optional configuration settings specific to this host. 2993Settings in this file override the main configuration file. 2994In new versions of BackupPC the per-host configuration files are 2995stored in __CONFDIR__/pc/HOST.pl. 2996 2997=item backups 2998 2999A tab-delimited ascii table listing information about each successful 3000backup, one per row. The columns are: 3001 3002=over 4 3003 3004=item num 3005 3006The backup number, an integer that starts at 0 and increments 3007for each successive backup. The corresponding backup is stored 3008in the directory num (eg: if this field is 5, then the backup is 3009stored in __TOPDIR__/pc/$host/5). 3010 3011=item type 3012 3013Set to "full" or "incr" for full or incremental backup. 3014 3015=item startTime 3016 3017Start time of the backup in unix seconds. 3018 3019=item endTime 3020 3021Stop time of the backup in unix seconds. 3022 3023=item nFiles 3024 3025Number of files backed up (as reported by smbclient, tar, rsync or ftp). 3026 3027=item size 3028 3029Total file size backed up (as reported by smbclient, tar, rsync or ftp). 3030 3031=item nFilesExist 3032 3033Number of files that were already in the pool 3034(as determined by BackupPC_dump). 3035 3036=item sizeExist 3037 3038Total size of files that were already in the pool 3039(as determined by BackupPC_dump). 3040 3041=item nFilesNew 3042 3043Number of files that were not in the pool 3044(as determined by BackupPC_dump). 3045 3046=item sizeNew 3047 3048Total size of files that were not in the pool 3049(as determined by BackupPC_dump). 3050 3051=item xferErrs 3052 3053Number of errors or warnings from smbclient, tar, rsync or ftp. 3054 3055=item xferBadFile 3056 3057Number of errors from smbclient that were bad file errors (zero otherwise). 3058 3059=item xferBadShare 3060 3061Number of errors from smbclient that were bad share errors (zero otherwise). 3062 3063=item tarErrs 3064 3065Number of errors from BackupPC_tarExtract. 3066 3067=item compress 3068 3069The compression level used on this backup. Zero or empty means no 3070compression. 3071 3072=item sizeExistComp 3073 3074Total compressed size of files that were already in the pool 3075(as determined by BackupPC_dump). 3076 3077=item sizeNewComp 3078 3079Total compressed size of files that were not in the pool 3080(as determined by BackupPC_dump). 3081 3082=item noFill 3083 3084Set if this backup has not been filled - it just includes the 3085deltas from the next backup necessary to reconstruct this backup. 3086 3087=item fillFromNum 3088 3089If this backup was filled (ie: noFill is 0) then this is the 3090number of the backup that it was filled from 3091 3092=item mangle 3093 3094Set if this backup has mangled filenames and attributes. Always 3095true for backups in v1.4.0 and above. False for all backups prior 3096to v1.4.0. 3097 3098=item xferMethod 3099 3100Set to the value of $Conf{XferMethod} when this dump was done. 3101 3102=item level 3103 3104The level of this dump. A full dump is level 0. Currently incrementals 3105are 1. In V4+ multi-level incrementals are no longer supported, so this 3106is just a 0 or 1. 3107 3108=back 3109 3110=item restores 3111 3112A tab-delimited ascii table listing information about each requested 3113restore, one per row. The columns are: 3114 3115=over 4 3116 3117=item num 3118 3119Restore number (matches the suffix of the RestoreInfo.nnn and 3120RestoreLOG.nnn.z file), unrelated to the backup number. 3121 3122=item startTime 3123 3124Start time of the restore in unix seconds. 3125 3126=item endTime 3127 3128End time of the restore in unix seconds. 3129 3130=item result 3131 3132Result (ok or failed). 3133 3134=item errorMsg 3135 3136Error message if restore failed. 3137 3138=item nFiles 3139 3140Number of files restored. 3141 3142=item size 3143 3144Size in bytes of the restored files. 3145 3146=item tarCreateErrs 3147 3148Number of errors from BackupPC_tarCreate during restore. 3149 3150=item xferErrs 3151 3152Number of errors from smbclient, tar, rsync or ftp during restore. 3153 3154=back 3155 3156=item archives 3157 3158A tab-delimited ascii table listing information about each requested 3159archive, one per row. The columns are: 3160 3161=over 4 3162 3163=item num 3164 3165Archive number (matches the suffix of the ArchiveInfo.nnn and 3166ArchiveLOG.nnn.z file), unrelated to the backup or restore number. 3167 3168=item startTime 3169 3170Start time of the restore in unix seconds. 3171 3172=item endTime 3173 3174End time of the restore in unix seconds. 3175 3176=item result 3177 3178Result (ok or failed). 3179 3180=item errorMsg 3181 3182Error message if archive failed. 3183 3184=back 3185 3186=back 3187 3188=back 3189 3190=back 3191 3192=head2 Compressed file format 3193 3194The compressed file format is as generated by Compress::Zlib::deflate 3195with one minor, but important, tweak. Since Compress::Zlib::inflate 3196fully inflates its argument in memory, it could take large amounts of 3197memory if it was inflating a highly compressed file. For example, a 3198200MB file of 0x0 bytes compresses to around 200K bytes. If 3199Compress::Zlib::inflate was called with this single 200K buffer, it 3200would need to allocate 200MB of memory to return the result. 3201 3202BackupPC watches how efficiently a file is compressing. If a big file 3203has very high compression (meaning it will use too much memory when it 3204is inflated), BackupPC calls the flush() method, which gracefully 3205completes the current compression. BackupPC then starts another 3206deflate and simply appends the output file. So the BackupPC compressed 3207file format is one or more concatenated deflations/flushes. The specific 3208ratios that BackupPC uses is that if a 6MB chunk compresses to less 3209than 64K then a flush will be done. 3210 3211Back to the example of the 200MB file of 0x0 bytes. Adding flushes 3212every 6MB adds only 200 or so bytes to the 200K output. So the 3213storage cost of flushing is negligible. 3214 3215To easily decompress a BackupPC compressed file, the script 3216BackupPC_zcat can be found in __INSTALLDIR__/bin. For each 3217filename argument it inflates the file and writes it to stdout. 3218 3219=head2 Rsync checksum caching 3220 3221Rsync checksum caching is not implemented in V4. That's because a full 3222backup with rsync in V4 uses client-side whole-file checksums during a full 3223backup, meaning that the server doesn't need to send block-level digests on 3224every full backup. 3225 3226The rest of this section applies to V3. 3227 3228An incremental backup with rsync compares attributes on the client 3229with the last full backup. Any files with identical attributes 3230are skipped. In V3, a full backup with rsync sets the --ignore-times 3231option, which causes every file to be examined independent of 3232attributes. 3233 3234Each file is examined by generating block checksums (default 2K 3235blocks) on the receiving side (that's the BackupPC side), sending 3236those checksums to the client, where the remote rsync matches those 3237checksums with the corresponding file. The matching blocks and new 3238data is sent back, allowing the client file to be reassembled. 3239A checksum for the entire file is sent to as an extra check the 3240the reconstructed file is correct. 3241 3242This results in significant disk IO and computation for BackupPC: 3243every file in a full backup, or any file with non-matching attributes 3244in an incremental backup, needs to be uncompressed, block checksums 3245computed and sent. Then the receiving side reassembles the file and 3246has to verify the whole-file checksum. Even if the file is identical, 3247prior to 2.1.0, BackupPC had to read and uncompress the file twice, 3248once to compute the block checksums and later to verify the whole-file 3249checksum. 3250 3251=head2 Filename mangling 3252 3253Backup filenames are stored in "mangled" form. Each node of 3254a path is preceded by "f" (mnemonic: file), and special characters 3255(\n, \r, % and /) are URI-encoded as "%xx", where xx is the ascii 3256character's hex value. So c:/craig/example.txt is now stored as 3257fc/fcraig/fexample.txt. 3258 3259This was done mainly so metadata could be stored alongside the backup 3260files without name collisions. In particular, the attributes for the 3261files in a directory are stored in a file called "attrib", and mangling 3262avoids filename collisions (I discarded the idea of having a duplicate 3263directory tree for every backup just to store the attributes). Other 3264metadata (eg: rsync checksums) could be stored in filenames preceded 3265by, eg, "c". There are two other benefits to mangling: the share name 3266might contain "/" (eg: "/home/craig" for tar transport), and I wanted 3267that represented as a single level in the storage tree. 3268 3269The CGI script undoes the mangling, so it is invisible to the user. 3270 3271=head2 Special files 3272 3273Linux/unix file systems support several special file types: symbolic 3274links, character and block device files, fifos (pipes) and unix-domain 3275sockets. All except unix-domain sockets are supported by BackupPC 3276(there's no point in backing up or restoring unix-domain sockets since 3277they only have meaning after a process creates them). Symbolic links are 3278stored as a plain file whose contents are the contents of the link (not 3279the file it points to). This file is compressed and pooled like any 3280normal file. Character and block device files are also stored as plain 3281files, whose contents are two integers separated by a comma; the numbers 3282are the major and minor device number. These files are compressed and 3283pooled like any normal file. Fifo files are stored as empty plain files 3284(which are not pooled since they have zero size). In all cases, the 3285original file type is stored in the attrib file so it can be correctly 3286restored. 3287 3288Hardlinks are supported. In V4, file metadata include an inode number 3289and a link count. Any file with more than one link points at the inode 3290information stored below the backup directory in the inode directory. 3291That directory contains a tree of up to 16K attrib files based on bits 329210-23 of the inode number. In particular, the directory name uses bits 329317-23, and the attrib filename includes bits 10-16. The key (index) in 3294the attrib file is the hex inode number. The original file metadata's 3295link count might not be accurate; it's more a flag (>1) for when to look 3296up the inode information. The correct link count is stored in the inode. 3297 3298In V3, hardlinks are stored in a similar manner to symlinks. When GNU 3299tar first encounters a file with more than one link (ie: hardlinks) 3300it dumps it as a regular file. When it sees the second and subsequent 3301hardlinks to the same file, it dumps just the hardlink information. 3302BackupPC correctly recognizes these hardlinks and stores them just like 3303symlinks: a regular text file whose contents is the path of the file 3304linked to. The CGI script will download the original file when you 3305click on a hardlink. 3306 3307Also, BackupPC_tarCreate has enough magic to re-create the hardlinks 3308dynamically based on whether or not the original file and hardlinks 3309are both included in the tar file. For example, imagine a/b/x is a 3310hardlink to a/c/y. If you use BackupPC_tarCreate to restore directory 3311a, then the tar file will include a/b/x as the original file and a/c/y 3312will be a hardlink to a/b/x. If, instead you restore a/c, then the 3313tar file will include a/c/y as the original file, not a hardlink. 3314 3315=head2 Attribute file format 3316 3317=over 4 3318 3319=item V4 attrib files 3320 3321The attribute file format is new in V4. Every backup directory contains 3322an attrib file, which is zero length and its name includes the MD5 pool 3323digest, eg: 3324 3325 attrib_33fe8f9ae2f5cedbea63b9d3ea767ac0 3326 3327The digest is used to look up the contents in the V4 cpool, eg: 3328 3329 __TOPDIR__/cpool/32/fe/33fe8f9ae2f5cedbea63b9d3ea767ac0 3330 3331For inode attrib files, bits 17-23 (XX in hex) of the inode number are used for the 3332directory name, and the attrib filename includes bits 10-16 (YY in hex), so 3333relative to the backup directory: 3334 3335 inode/XX/attribYY_33fe8f9ae2f5cedbea63b9d3ea767ac0 3336 3337An empty attrib file has the name "attrib_0" (or "attribYY_0" for inodes). 3338 3339The attrib file starts with a magic number, followed by the concatenation 3340of the following information for each file (all integers are stored in 3341perl's pack "w" format (variable length base 128)): 3342 3343=over 4 3344 3345=item * 3346 3347Filename length, followed by the filename 3348 3349=item * 3350 3351Count of extended attributes 3352 3353=item * 3354 3355The unix file type, mtime, mode, uid, gid, size, inode number, compress, 3356number of links 3357 3358=item * 3359 3360MD5 digest length, followed by the digest contents 3361 3362=item * 3363 3364Each extended attribute (length of xattr name, length of xattr value, name, value) 3365 3366=back 3367 3368=item V3 attrib files 3369 3370The unix attributes for the contents of a directory (all the files and 3371directories in that directory) are stored in a file called attrib. 3372There is a single attrib file for each directory in a backup. 3373For example, if c:/craig contains a single file c:/craig/example.txt, 3374that file would be stored as fc/fcraig/fexample.txt and there would be an 3375attribute file in fc/fcraig/attrib (and also fc/attrib and ./attrib). 3376The file fc/fcraig/attrib would contain a single entry containing the 3377attributes for fc/fcraig/fexample.txt. 3378 3379The attrib file starts with a magic number, followed by the 3380concatenation of the following information for each file: 3381 3382=over 4 3383 3384=item * 3385 3386Filename length in perl's pack "w" format (variable length base 128). 3387 3388=item * 3389 3390Filename. 3391 3392=item * 3393 3394The unix file type, mode, uid, gid and file size divided by 4GB and 3395file size modulo 4GB (type mode uid gid sizeDiv4GB sizeMod4GB), 3396in perl's pack "w" format (variable length base 128). 3397 3398=item * 3399 3400The unix mtime (unix seconds) in perl's pack "N" format (32 bit integer). 3401 3402=back 3403 3404The attrib file is also compressed if compression is enabled. 3405See the lib/BackupPC/Attrib.pm module for full details. 3406 3407Attribute files are pooled just like normal backup files. This saves 3408space if all the files in a directory have the same attributes across 3409multiple backups, which is common. 3410 3411=back 3412 3413=head2 Optimizations 3414 3415BackupPC doesn't care about the access time of files in the pool 3416since it saves attribute metadata separate from the files. Since 3417BackupPC mostly does reads from disk, maintaining the access time of 3418files generates a lot of unnecessary disk writes. So, provided 3419BackupPC has a dedicated data disk, you should consider mounting 3420BackupPC's data directory with the noatime (or, with Linux kernels 3421>=2.6.20, relatime) attribute (see mount(1)). 3422 3423=head2 Some Limitations 3424 3425BackupPC isn't perfect (but it is getting better). Please see 3426L<http://backuppc.sourceforge.net/faq/limitations.html> for a 3427discussion of some of BackupPC's limitations. 3428(Note, this is old and we should move this to the Github Wiki.) 3429 3430=head2 Security issues 3431 3432Please see L<http://backuppc.sourceforge.net/faq/security.html> for a 3433discussion of some of various security issues. 3434(Note, this is old and we should move this to the Github Wiki.) 3435 3436=head1 Configuration File 3437 3438The BackupPC configuration file resides in __CONFDIR__/config.pl. 3439Optional per-PC configuration files reside in __CONFDIR__/pc/$host.pl 3440(or __TOPDIR__/pc/$host/config.pl in non-FHS versions of BackupPC). 3441This file can be used to override settings just for a particular PC. 3442 3443=head2 Modifying the main configuration file 3444 3445The configuration file is a perl script that is executed by BackupPC, so 3446you should be careful to preserve the file syntax (punctuation, quotes 3447etc) when you edit it. Specifically, preserving quotes means you should never 3448use undef for configuration parameters that expect string values. An empty 3449string ('') should be used in this case. 3450It is recommended that you use CVS, RCS or some 3451other method of source control for changing config.pl. 3452 3453BackupPC reads or re-reads the main configuration file and 3454the hosts file in three cases: 3455 3456=over 4 3457 3458=item * 3459 3460Upon startup. 3461 3462=item * 3463 3464When BackupPC is sent a HUP (-1) signal. Assuming you installed the 3465init.d script, you can also do this with "/etc/init.d/backuppc reload". 3466 3467=item * 3468 3469When the modification time of config.pl file changes. BackupPC 3470checks the modification time once during each regular wakeup. 3471 3472=back 3473 3474Whenever you change the configuration file you can either do 3475a kill -HUP BackupPC_pid or simply wait until the next regular 3476wakeup period. 3477 3478Each time the configuration file is re-read a message is reported in the 3479LOG file, so you can tail it (or view it via the CGI interface) to make 3480sure your kill -HUP worked. Errors in parsing the configuration file are 3481also reported in the LOG file. 3482 3483The optional per-PC configuration file (__CONFDIR__/pc/$host.pl or 3484__TOPDIR__/pc/$host/config.pl in non-FHS versions of BackupPC) 3485is read whenever it is needed by BackupPC_dump, BackupPC_restore and others. 3486 3487=head1 Configuration Parameters 3488 3489The configuration parameters are divided into five general groups. 3490The first group (general server configuration) provides general 3491configuration for BackupPC. The next two groups describe what to 3492backup, when to do it, and how long to keep it. The fourth group 3493are settings for email reminders, and the final group contains 3494settings for the CGI interface. 3495 3496All configuration settings in the second through fifth groups can 3497be overridden by the per-PC config.pl file. 3498 3499__CONFIGPOD__ 3500 3501=head1 Version Numbers 3502 3503BackupPC uses a X.Y.Z version numbering system. The first digit is for 3504major new releases, the middle digit is for significant feature releases 3505and improvements (most of the releases have been in this category). 3506 3507=head1 Author 3508 3509Craig Barratt <cbarratt@users.sourceforge.net> 3510 3511See L<https://backuppc.github.io/backuppc/BackupPC.html>. 3512 3513=head1 Copyright 3514 3515Copyright (C) 2001-2020 Craig Barratt 3516 3517=head1 Credits 3518 3519Ryan Kucera contributed the directory navigation code and images 3520for v1.5.0. He contributed the first skeleton of BackupPC_restore. 3521He also added a significant revision to the CGI interface, including 3522CSS tags, in v2.1.0, and designed the BackupPC logo. 3523 3524Xavier Nicollet, with additions from Guillaume Filion, added the 3525internationalization (i18n) support to the CGI interface for v2.0.0. 3526Xavier provided the French translation fr.pm, with additions from 3527Guillaume. 3528 3529Guillaume Filion wrote BackupPC_zipCreate and added the CGI support 3530for zip download, in addition to some CGI cleanup, for v1.5.0. 3531Guillaume continues to support fr.pm updates for each new version. 3532 3533Josh Marshall implemented the Archive feature in v2.1.0. 3534 3535Ludovic Drolez supports the BackupPC Debian package. 3536 3537Javier Gonzalez provided the Spanish translation, es.pm for v2.0.0. 3538 3539Manfred Herrmann provided the German translation, de.pm for v2.0.0. 3540Manfred continues to support de.pm updates for each new version, 3541together with some help from Ralph Paßgang. 3542 3543Lorenzo Cappelletti provided the Italian translation, it.pm for v2.1.0. 3544Giuseppe Iuculano and Vittorio Macchi updated it for 3.0.0. 3545 3546Lieven Bridts provided the Dutch translation, nl.pm, for v2.1.0, 3547with some tweaks from Guus Houtzager, and updates for 3.0.0. 3548 3549Reginaldo Ferreira provided the Portuguese Brazilian translation 3550pt_br.pm for v2.2.0. 3551 3552Rich Duzenbury provided the RSS feed option to the CGI interface. 3553 3554Jono Woodhouse from CapeSoft Software (www.capesoft.com) provided a 3555new CSS skin for 3.0.0 with several layout improvements. Sean Cameron 3556(also from CapeSoft) designed new and more compact file icons for 3.0.0. 3557 3558Youlin Feng provided the Chinese translation for 3.1.0. 3559 3560Karol 'Semper' Stelmaczonek provided the Polish translation for 3.1.0. 3561 3562Jeremy Tietsort provided the host summary table sorting feature for 3.1.0. 3563 3564Paul Mantz contributed the ftp Xfer method for 3.2.0. 3565 3566Petr Pokorny provided the Czech translation for 3.2.1. 3567 3568Rikiya Yamamoto provided the Japanese translation for 3.3.0. 3569 3570Yakim provided the Ukrainian translation for 3.3.0. 3571 3572Sergei Butakov provided the Russian translation for 3.3.0. 3573 3574Alexander Moisseev provided the rrdtool graphing code in 4.0.0 and has provided 3575many fixes and improvements in 3.x and 4.x. 3576 3577Many people have provided user support on the mail lists, reported bugs, 3578made useful suggestions, and helped with testing; see the ChangeLog 3579and the mailing lists. 3580 3581Your name could appear here in the next version! 3582 3583=head1 License 3584 3585This program is free software: you can redistribute it and/or modify 3586it under the terms of the GNU General Public License as published by 3587the Free Software Foundation, either version 3 of the License, or 3588(at your option) any later version. 3589 3590This program is distributed in the hope that it will be useful, 3591but WITHOUT ANY WARRANTY; without even the implied warranty of 3592MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 3593GNU General Public License for more details. 3594 3595You should have received a copy of the GNU General Public License 3596along with this program. If not, see <http://www.gnu.org/licenses/>. 3597