1=encoding ISO8859-1
2
3=head1 BackupPC Introduction
4
5This documentation describes BackupPC version __VERSION__,
6released on __RELEASEDATE__.
7
8=head2 Overview
9
10BackupPC is a high-performance, enterprise-grade system for backing up
11Unix, Linux, WinXX, and MacOSX PCs, desktops and laptops to a server's
12disk.  BackupPC is highly configurable and easy to install and maintain.
13
Given the ever decreasing cost of disks and RAID systems, it is now
practical and cost effective to back up a large number of machines onto
16a server's local disk or network storage.  For some sites this might be
17the complete backup solution.  For other sites additional permanent
18archives could be created by periodically backing up the server to tape.
19
20Features include:
21
22=over 4
23
24=item *
25
26A clever pooling scheme minimizes disk storage and disk I/O.
Identical files across multiple backups of the same or different PCs
28are stored only once, resulting in substantial savings in disk storage
29and disk writes.
30
31=item *
32
33Compression provides additional reductions in storage, depending
34on the type of data being backed up. The CPU impact of compression
35is low since only new files (those not already in the pool) need
36to be compressed.
37
38=item *
39
40A powerful http/cgi user interface allows administrators to view
41the current status, edit configuration, add/delete hosts, view log
42files, and allows users to initiate and cancel backups and browse
43and restore files from backups.
44
45=item *
46
47The http/cgi user interface has internationalization (i18n) support,
48currently providing English, French, German, Spanish, Italian,
Dutch, Polish, Portuguese-Brazilian, Chinese, Czech,
50Japanese, Ukrainian, and Russian.
51
52=item *
53
54No client-side software is needed. On WinXX the standard smb
55protocol is used to extract backup data. On linux, unix or MacOSX
56clients, rsync, tar (over ssh/rsh/nfs) or ftp is used to extract
57backup data.  Alternatively, rsync can also be used on WinXX (using
58cygwin), since rsync provides for efficient transfers and allows
59incremental backups to detect almost all changes.
60
61=item *
62
63Flexible restore options.  Single files can be downloaded from
64any backup directly from the CGI interface.  Zip or Tar archives
65for selected files or directories from any backup can also be
66downloaded from the CGI interface.  Finally, direct restore to
67the client machine (using smb or tar) for selected files or
68directories is also supported from the CGI interface.
69
70=item *
71
72BackupPC supports mobile environments where laptops are only
73intermittently connected to the network and have dynamic IP addresses
74(DHCP).  Configuration settings allow machines connected via slower WAN
75connections (eg: dial up, DSL, cable) to not be backed up, even if they
76use the same fixed or dynamic IP address as when they are connected
77directly to the LAN.
78
79=item *
80
Flexible configuration parameters allow multiple backups to be performed
in parallel, specification of which shares to back up, which directories
to back up or skip, various schedules for full and incremental
backups, schedules for email reminders to users and so on.  Configuration
parameters can be set system-wide or on a per-PC basis.
86
87=item *
88
89Users are sent periodic email reminders if their PC has not
90recently been backed up.  Email content, timing and policies
91are configurable.
92
93=item *
94
BackupPC is Open Source software hosted on GitHub.
96
97=back
98
99=head2 BackupPC 4.0
100
101This is the first release of 4.0, which is a significant rewrite of
102BackupPC.  This section provides a short overview of the changes and
103features in 4.0.
104
105Here's a short summary of what has changed in V4:
106
107=over 4
108
109=item *
110
111No use of hardlinks (except temporarily to do atomic renames).  Reference
112counting is handled at the application level in a batch manner (hardlinks
113will still remain for any legacy V3 backups).
114
115=item *
116
117Backups are stored as "reverse deltas" - the most recent backup is always filled
118and older backups are reconstituted by merging all the deltas starting with the
119nearest future filled backup and working backwards.
120
121This is the opposite of V3 where incrementals are stored as "forward deltas"
122to a prior backup (typically the last full backup or prior lower-level
123incremental backup, or the last full in the case of rsync).
124
125=item *
126
127Since the most recent backup is filled, viewing/restoring that backup (which is
128the most common backup used) doesn't require merging any deltas from other backups.
129
130=item *
131
132The concepts of incr/full backups and unfilled/filled storage are decoupled.  The most
133recent backup is always filled.  By default, for the remaining backups, full backups
134are filled and incremental backups are unfilled, but that is configurable.
135
136=item *
137
138Uses full-file MD5 digests, which are stored in the directory attrib
139files.  Each backup directory only contains an empty attrib file whose
140name includes its own MD5 digest, which is used to look up the attrib
141file's contents in the pool.  In turn, that file contains the metadata
for every file in that directory, including each file's MD5 digest.
143
144=item *
145
The pool layout still supports chains to handle MD5 collisions.  While collisions
147can be constructed and are now well-known, they are highly unlikely in the wild.
148Pool files are never renamed or moved, unlike V3.
149
150=item *
151
152Any backup can be deleted (deltas are merged into next older backup if it is
153not filled).
154
155=item *
156
157The reverse deltas allow "infinite incrementals" - no need for a full backup
158if you are willing to trade speed for the risk that a file change will
159not be detected if the metadata (eg, mtime or size) doesn't change.
160
161=item *
162
163An rsync "full" backup now uses --checksum (instead of --ignore-times),
164which is much more efficient on the server side - the server just needs to
check the full-file checksum computed by the client, together with the mtime,
nlinks and size attributes, to see if the file has changed.  If you want a more
167conservative approach, you can change it back to --ignore-times, which
168requires the server to send block checksums to the client.
169
170=item *
171
172The use of rsync --checksum allows BackupPC to guess a potential match
173anywhere in the pool, even on a first-time backup.  In that case, the usual
174rsync block checksums are still exchanged to make sure the complete file
175is identical.
176
177=item *
178
179Uses a modified rsync called rsync_bpc (currently based on rsync-3.0.9)
180on the server side (in place of File::RsyncP), with a C code interface
181to the BackupPC storage.  So the whole data path for rsync is now in compiled
182C code, which is much faster than perl.
183
184=item *
185
186Due to the use of rsync-3.X, acls and xattrs are supported, and many other
187useful options (but not all) are supported.  Rsync protocol 30 supports
188the efficient incremental file list, which significantly improves memory
189usage and startup time.  It also supports MD5 full-file checksums, which
190match BackupPC's new digest.  That allows a full-file digest to be checked
191as easily as an mtime on the server side.
192
193=item *
194
195Significant portions of the BackupPC code are now compiled C code in a
196new module called BackupPC::XS that is dynamically linked to perl.
197
198=back
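
The reverse-delta scheme summarized above can be illustrated with a small
sketch.  This is a toy model, not BackupPC's on-disk format: each backup is
a Python dict, the names C<reconstruct>, C<new_backup> and C<DELETED> are
invented for this example, and a new backup is built from a complete file
list rather than updated in place as V4 actually does.

```python
# Toy model of V4 "reverse deltas"; all names here are hypothetical.
DELETED = None  # marker: the path did not exist in the older backup


def reconstruct(backups, n):
    """Return the complete {path: digest} view of backup n.

    backups[i] is {"filled": bool, "files": dict}.  A filled backup holds
    the complete view; an unfilled one holds only the entries that differ
    from backup i+1.
    """
    # Find the nearest filled backup at or after n.
    k = n
    while not backups[k]["filled"]:
        k += 1
    view = dict(backups[k]["files"])
    # Walk backwards, applying each reverse delta in turn.
    for i in range(k - 1, n - 1, -1):
        for path, digest in backups[i]["files"].items():
            if digest is DELETED:
                view.pop(path, None)
            else:
                view[path] = digest
    return view


def new_backup(backups, current):
    """Append a backup whose complete view is `current` ({path: digest})."""
    if backups:
        old = backups[-1]["files"]  # the most recent backup is always filled
        delta = {}
        for path, digest in old.items():
            if current.get(path) != digest:
                delta[path] = digest   # changed or removed: keep the old digest
        for path in current:
            if path not in old:
                delta[path] = DELETED  # new file: absent in the older backup
        backups[-1] = {"filled": False, "files": delta}
    backups.append({"filled": True, "files": dict(current)})
```

In this model the newest backup never needs merging, while viewing an older
backup costs one merge per delta between it and the nearest filled backup.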
199
200Here is a more detailed discussion:
201
202=over 4
203
204=item *
205
206Completely new backup storage.  No hardlinks!  Backups are stored as reverse deltas,
207with the most recent backup always filled.  Prior backup "n" contains the changes
208relative to prior backup "n+1".
209
210=item *
211
212Since every backup is based on the last filled backup, the concept of incremental
213levels is removed.
214
215=item *
216
217Example: let's assume backup #4 is the most recent, and therefore filled, and
218backups #0..3 are not filled.
219
220Backups #0..3 store just the necessary reverse changes needed to
221reconstruct those backups, relative to the next backup.
222
   - To view/restore backup #4, all the information is stored in backup #4.
   - To view/restore backup #3, backup #4 (the filled one) is merged with the deltas in #3.
   - To view/restore backup #2, backup #4 (the filled one) is merged with the deltas in #3 and #2.
   - etc.
227
228When a new backup is started (#5), we begin by renaming backup #4 to #5.
229At that instant, backup #4 storage is now empty (which means backups #4
230and #5 are currently identical).  As the backup runs, changes are made
231to #5 with the changed/new files in place, and the opposite changes are
232added to backup #4, to keep the "view" of backup #4 unchanged.
233
234After the backup is done, #5 is now the filled version of the latest
235backup, and #4 contains the changes necessary to turn #5 back into the state
236when backup #4 was done.  If there are no changes detected in the new
237backup, the storage tree for #4 will be empty.  If just one file changed,
238the new file will be below #5, and the prior file will be below #4 (well,
239technically not quite true, since files aren't stored below the backup
240trees; more correctly, the attrib file in #5 will point to the new pool
241file, and the attrib file in #4 will point to the old pool file).
242
243=item *
244
245The concepts of incr/full backups and unfilled/filled storage are now
246decoupled.  The most recent backup is always filled (whether or not the
247last backup was a full or incr).  Certain older backups can be filled
248for convenience to make restoring old backups faster (because fewer
249backups need to be merged), and are used to specify expiry schedules.
250
251=item *
252
253When a backup starts, there are several different cases that determine
254how the backups are stored and whether prior deltas are stored:
255
256=over 4
257
258=item 1
259
260No existing backups: create a new backup #0 and do a full backup in place
261(ie: no prior deltas are stored).
262
263=item 2
264
265V3 backups exist, but no V4 backups.  The last V3 backup is duplicated into
266V4 format, and a full backup is done in place (ie: no prior deltas are stored).
267
268=item 3
269
270Last V4 backup is a full, or more than $Conf{FillCycle} since last filled
271backup.  The last backup is duplicated to create a new filled backup, and
272the new backup is done in place (ie: no prior deltas are stored).
273
274=item 4
275
There are V4 backups and it's less than $Conf{FillCycle} since the last filled
backup.  Renumber the last backup to #n+1, and put the reverse deltas in
initially empty backup tree #n.
279
280=item 5
281
282CompressLevel has toggled on/off between backups.  This isn't well tested and
283it's very hard to support efficiently.  We treat this as a brand new (empty) backup
284in place, that is therefore filled.  That way we won't need to merge between
285backups with compress on/off.
286
287=item 6
288
289Last backup was a V4 partial.  If prior V4 backup is filled (and not partial),
290then just do another in-place backup.  Otherwise, treat as case 4.  When complete
291(whether successful or another partial), delete the prior deltas in #n, which
292merges the cumulative changes into #n-1.
293
294=back
295
296=item *
297
298The treatment of a "Partial" backup has changed.  Unlike in V3 where partials are
299removed prior to the next backup, in V4 partials are kept and are used as the starting
point for the next backup.  See case 6 above.  If the new backup fails and no files
have been backed up, the empty backup #n is removed.
302
303=item *
304
305Backups are stored as mangled directory trees, but each directory only
306contains an "attrib" file.  The attrib file is zero-length, and its name
307includes the MD5 digest so the contents can be looked up in the pool.
308
The attrib contents in the pool contain the directory contents: for each
310file, that means the metadata, xattrs and the MD5 digest of the file
311contents.
312
313=item *
314
315A modified rsync called rsync_bpc, based on rsync 3.0.9, is used on
316the server side, with a C code layer that emulates all the file-system
317OS calls to be compatible with the BackupPC store.  That means for
318rsync, the data path is now fully in compiled C, which should mean a
319significant speedup.  It also means many (but not all) of the rsync
320options are supported natively.
321
322=item *
323
324Significant parts of the BackupPC storage and pooling code have been written in C
325(the same code is used in the server rsync_bpc).  BackupPC::FileZIO, BackupPC::PoolWrite,
326BackupPC::Attrib, BackupPC::AttribCache and BackupPC::PoolRefCnt (reference counting and
327storage) are all replaced with BackupPC::XS, a C-code perl extension.
328
329=item *
330
331Extended attributes (xattr) are supported.  Rsync is configured to "store acls using xattr",
332meaning both acls and xattrs are supported.
333
334=item *
335
Infinite incrementals with rsync are supported.  The most recent backup
337is always filled, so an incremental will still leave the most recent
338backup filled.
339
340=item *
341
Any V4 backup can be deleted - dependencies are merged into the next older backup
343if it isn't already filled.
344
345=item *
346
File digests are full-file MD5.  Collisions are much less likely than in V3,
but still possible.  Duplicates are implemented with an extension to the
16 byte MD5 digest (ie: 16 bytes for a plain file, 17 bytes for the next
255 duplicates, etc).
351
352=item *
353
354V4 pool files are stored in a new hierarchy, two levels deep, with
3557 bits at each level (ie: 128 directories at top-level, and each
356with 128 directories at next level).
357
358=item *
359
360V4 pool files are never moved or renamed.
361
362=item *
363
364Inodes for hardlinked files are stored in each backup tree.  This makes
365backing up hardlinks accurate, compared to V3, and provides for consistent
366inode numbering across backups.
367
368=item *
369
Zero-sized files or empty attribute files don't get written or pooled.
371
372=item *
373
The elimination of hardlinks means that reference counting has to be maintained by
the BackupPC code.  This is one of the riskiest areas in terms of development
376and testing.  Reference counts are maintained per-backup, per-host, and for the
377whole pool.
378
379Each operation that changes reference counts (eg: doing a new backup, deleting
380a backup, or duplicating (filling) a backup) creates one or more poolRefDelta
381files in that client's backup directory (ie: TopDir/pc/HOST/NNN).  These files
are lists of MD5 digests and corresponding count deltas.
383
384Each night, BackupPC_nightly runs BackupPC_refCountUpdate, which, for each
385host, updates the per-host reference count database with the new deltas.
386It then combines all the per-host reference count files to create the
387global pool reference count database.
388
389BackupPC_refCountUpdate can run concurrently with backups.  If you still
390have V3 backups and pool, BackupPC_nightly still needs to run and check
391for old V3 pool files that can be deleted.  But since there are no
392new V3 backups happening, BackupPC_nightly can run concurrently with
393backups.
394
395=item *
396
397There is a new utility BackupPC_fsck that can check/fix the per-host
398and global reference counts.  The per-host reference count database
399is verified by parsing all the attrib files in each backup tree.
The global reference count database is verified by combining all the
401per-host reference count databases and comparing them.
402
BackupPC_fsck cannot run while BackupPC is running.
404
405=item *
406
407When BackupPC_refCountUpdate updates the overall reference counts, it
408removes pool files that have a reference count of zero.  To avoid race
409conditions, it uses a two-phase process.  It first flags files that have
410zero reference counts using one of the file attributes.  The next time
411it runs (typically 24 hours later), any flagged files that still have
412zero reference count are then removed.  The rest of the code knows not
413to use flagged pool files to avoid race conditions.
414
415=item *
416
417Progress indication: a simple status that shows the number of files
418processed so far.  It's hard to convert that to a percentage, since
419the total isn't known until the end of the backup.  But knowing the
420number of files is quite helpful, since you can get an idea of the
421expected total based on the prior backups, or knowing what configuration
422you have changed (ie: adding a large new tree).
423
424=item *
425
426BackupPC_link is removed since it is no longer used.
427
428=item *
429
430Since files are no longer stored in backup trees, browsing the backup
431trees is even harder than V3 (where you just had to deal with mangling).
432A new utility BackupPC_ls acts like "ls -l", showing accurate directory
433listings of files, together with the MD5 digests.
434
435BackupPC_ls can be given either an explicit hostname, number,
436and unmangled path, or can be given the full (mangled) path,
437which makes it easier to use directory completion.  It should
438be possible to configure tcsh and bash, together with some new
439hooks in BackupPC_ls, to give a more natural file/directory
440completion.
441
442BackupPC_zcat also can take just the MD5 digest (which you can paste
443from BackupPC_ls).  Currently BackupPC_zcat doesn't support the tree
444parsing that BackupPC_ls does (it can only zcat actual files),  but
445that should be easy to rectify.
446
447=item *
448
449Configuration for expiry: since full/incr are decoupled from filled/unfilled,
450expiry is a bit trickier.
451
452The convention for expiry parameters is "FullKeepPeriod/FullKeepCnt"
453etc refer to B<Filled> backups, and "IncrKeepPeriod/IncrKeepCnt" refer
454to B<Unfilled> backups.
455
456=item *
457
458V3 migration: nothing specific is needed.  V4 can browse/view/restore
459V3 backups.  When you install V4, no changes are made to any V3 backups.
460If you are upgrading from V3, be sure to set $Conf{PoolV3Enabled} to 1 so
461the old V3 pool is searched for matching files.
462
463=over 4
464
465=item *
466
467When you install V4, it will notice that the V3 pool exists.  Running
468configure.pl should set $Conf{PoolV3Enabled} to 1 in that case, but
469you should be sure to check that.
470
471=item *
472
473When a V4 backup is first done, BackupPC_backupDuplicate is
474run to duplicate the most recent V3 backup to create a new V4 backup.
475A "filled" view of the most recent V3 backup is used to create
476a "filled" V4 backup tree.
477
478This step could be time consuming, since every file needs to be read
479(as a V3 file) and written as a V4 file.  However, the V4 pooling
480code knows about the V3 pool, so it will move the V3 pool file
481into the V4 pool.  So this duplication process doesn't burn a lot of
482pool storage space, but every file still needs to be read
483(to compute the MD5 digest) and "written" (really just
484matching/linking).
485
486=item *
487
488Expiry: all the V3 + V4 backups are considered on a combined basis
489for expiry checking.
490
491=item *
492
On a clean new V4 install, the steps of computing and checking V3
digests are eliminated.
495
496=item *
497
498Downgrading V4->V3: Not tested and not recommended.
499In theory you can remove any new V4 backups, remove the V4 pool
500itself, and you should be able to re-install V3 and still have
501access to your original full working V3 store (except for any
502V3 backups that V4 might have routinely removed based on normal
503backup expiry configuration).
504
505However, any V3 pool files moved to V4 will no longer be in the V3
506pool.  So subsequent V3 backups will burn more storage as files
507get re-added to the old V3 pool.
508
509Hopefully downgrading isn't necessary...
510
511=back
512
513=item *
514
Optimizations: the C code implementation should give a significant performance
advantage, as well as more flexibility.
517
518Potential V4 optimizations that are planned, but not yet implemented, include:
519
520=over 4
521
522=item *
523
524rsync-bpc doesn't support checksum caching.
525
526=item *
527
528rsync-bpc with --ignore-times actually reads each unchanged file three times,
529and writes it once (normal rsync reads twice and writes once; the extra one
530is due to compression).  Some careful optimization can eliminate two reads
531and the write.  The final read can be eliminated with checksum caching.
532
533=item *
534
535BackupPC_refCountUpdate, BackupPC_fsck, BackupPC_backupDuplicate,
536BackupPC_backupDelete are all single-threaded.
537
538=back
539
540=back
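
The reference-counting flow described above (per-backup deltas folded into
per-host counts, per-host counts combined into pool-wide counts, then
two-phase removal of unreferenced pool files) can be sketched as follows.
This is a simplified in-memory model under stated assumptions: the function
names are invented for illustration, the pool is a plain set of digests, and
the "flag" is kept in a separate set, whereas the text says BackupPC records
it in a file attribute.

```python
from collections import Counter


def apply_deltas(host_counts, deltas):
    """Fold a list of (digest, count_delta) pairs into one host's counts."""
    for digest, delta in deltas:
        host_counts[digest] += delta
    return host_counts


def combine_hosts(per_host):
    """Sum all per-host counts into pool-wide counts."""
    total = Counter()
    for counts in per_host.values():
        for digest, n in counts.items():
            total[digest] += n
    return total


def sweep(pool, total, flagged):
    """One pass of two-phase removal.

    pool:    set of digests currently in the pool (modified in place)
    total:   pool-wide reference counts
    flagged: digests flagged by the previous pass
    Returns the set of digests flagged by this pass.
    """
    now_flagged = set()
    for digest in sorted(pool):      # sorted() copies, so removal is safe
        if total.get(digest, 0) > 0:
            continue                 # referenced again: any old flag is dropped
        if digest in flagged:
            pool.remove(digest)      # zero on two consecutive passes: remove
        else:
            now_flagged.add(digest)  # first pass at zero: flag only
    return now_flagged
```

Deferring removal by one pass means a file that dropped to zero references
but is picked up again by a backup before the next pass is never deleted.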
541
542=head2 Backup basics
543
544=over 4
545
546=item Full Backup
547
548A full backup is a complete backup of a share. BackupPC can be configured
549to do a full backup at a regular interval (typically weekly).  BackupPC
550can be configured to keep a certain number of full backups.  Exponential
551expiry is also supported, allowing full backups with various vintages to
552be kept (for example, a settable number of most recent weekly fulls, plus
553a settable number of older fulls that are 2, 4, 8, or 16 weeks apart).
554
555=item Incremental Backup
556
557An incremental backup is a backup of files that have changed since the
558last successful backup.
559
560Rsync is the best option for BackupPC.  Any files whose attributes
561have changed (ie: uid, gid, mtime, modes, size) since the last full
are backed up.  Deleted, new and renamed files are detected by
rsync incrementals.
564
565For SMB and tar, BackupPC uses the modification time (mtime) to
566determine which files have changed since the last backup.  That
567means SMB and tar incrementals are not able to detect deleted files,
568renamed files or new files whose modification time is prior to the
569last lower-level backup.
570
571BackupPC can also be configured to keep a certain number of incremental
572backups, and to keep a smaller number of very old incremental backups.
573
574BackupPC "fills-in" incremental backups when browsing or restoring,
575based on the levels of each backup, giving every backup a "full"
576appearance.  This makes browsing and restoring backups much easier:
577you can restore from any one backup independent of whether it was
578an incremental or full.
579
580=item Partial Backup
581
582When a full or incremental backup fails or is canceled, the most
583recent backup is labeled "partial".  Prior to V4, that backup was
584incomplete, and would be deleted when the next backup completed.
585
586In V4 a partial backup denotes that the last backup is incomplete.
587However, since V4 does backup updating in place, it represents the best
588and latest backup.  A partial backup can be browsed or used to restore
589files just like a successful full or incremental backup.  And it will
590be used as the starting point for the next backup attempt.
591
592=item Identical Files
593
594BackupPC pools identical files.  By "identical files" we mean files
with identical contents, not necessarily the same permissions, ownership
596or modification time.  Two files might have different permissions,
597ownership, or modification time but will still be pooled whenever
598the contents are identical.  This is possible since BackupPC stores
599the file metadata (permissions, ownership, and modification time)
600separately from the file contents.
601
602Prior to V4, identical files were stored using hardlinks.  In V4+,
603hardlinks are eliminated (except for temporary atomic renames), and
604reference counting is done at the application level.
605
606=item Backup Policy
607
608Based on your site's requirements you need to decide what your backup
609policy is.  BackupPC is not designed to provide exact re-imaging of
610failed disks.  See L<Some Limitations> for more information.
However, the rsync and tar transports for linux/unix clients, plus
full support for special file types, extended attributes etc,
likely mean that an exact image of a linux/unix file system can be made.
614
615BackupPC saves backups onto disk. Because of pooling you can relatively
616economically keep several weeks or months of old backups.
617
618At some sites the disk-based backup will be adequate, without a
619secondary offsite cloud, disk or tape backup. This system is robust
620to any single failure: if a client disk fails or loses files, the
621BackupPC server can be used to restore files. If the server disk
622fails, BackupPC can be restarted on a fresh file system, and create
623new backups from the clients. The chance of the server disk failing
624can be made very small by spending more money on increasingly better
625RAID systems.  However, there is still the risk of catastrophic
626events like fires or earthquakes that can destroy both the BackupPC
627server and the clients it is backing up if they are physically
628nearby.
629
630Some sites might choose to do periodic backups to tape or cd/dvd.
631This backup can be done perhaps weekly using the archive function of
632BackupPC.
633
634Other users have reported success with removable disks to rotate the
635BackupPC data drives, or using rsync to mirror the BackupPC data pool
636offsite.
637
638In V4, since hardlinks are not used permanently, duplicating a V4 pool
639is much easier, allowing remote copying of the pool.
640
641=back
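
The pooling of identical files described above amounts to storing file
contents under their digest and keeping metadata separately.  The sketch
below is a minimal illustration only: the C<Pool> class and C<backup_file>
function are invented names, contents are held in memory, and the collision
chains and compression that BackupPC uses are left out.

```python
import hashlib


class Pool:
    """Toy content-addressed store: one copy per distinct file content."""

    def __init__(self):
        self.blobs = {}  # hex MD5 digest -> file contents

    def add(self, data):
        """Store `data` if it is new and return its full-file MD5 digest."""
        digest = hashlib.md5(data).hexdigest()
        self.blobs.setdefault(digest, data)
        return digest


def backup_file(pool, attrib, name, data, mode, uid, mtime):
    """Pool the contents; record the metadata in the directory's attrib table."""
    attrib[name] = {
        "digest": pool.add(data),
        "mode": mode,
        "uid": uid,
        "mtime": mtime,
    }
```

Two files with the same bytes but different owners or modification times
end up as two attrib entries pointing at a single pool entry.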
642
643=head2 Resources
644
645=over 4
646
647=item BackupPC home page
648
649The BackupPC project page is at:
650
651    https://backuppc.github.io/backuppc
652
653This page has links to the current documentation, github project source
654and general information.
655
656=item Github
657
658BackupPC development is hosted on github:
659
660    https://github.com/backuppc
661
662Releases for BackupPC and the required packages BackupPC-XS and rsync-bpc are
663available at:
664
665    https://github.com/backuppc/backuppc/releases
666    https://github.com/backuppc/backuppc-xs/releases
667    https://github.com/backuppc/rsync-bpc/releases
668
669=item BackupPC Wiki
670
671BackupPC has a Wiki at L<https://github.com/backuppc/backuppc/wiki>.
672Everyone is encouraged to contribute to the Wiki.  Anyone with a
673Github account can edit the Wiki.
674
675=item Mailing lists
676
677Three BackupPC mailing lists exist for announcements (backuppc-announce),
678developers (backuppc-devel), and a general user list for support, asking
679questions or any other topic relevant to BackupPC (backuppc-users).
680
681The lists are archived on SourceForge:
682
683    https://sourceforge.net/p/backuppc/mailman/backuppc-users/
684
685You can subscribe to these lists by visiting:
686
687    http://lists.sourceforge.net/lists/listinfo/backuppc-announce
688    http://lists.sourceforge.net/lists/listinfo/backuppc-users
689    http://lists.sourceforge.net/lists/listinfo/backuppc-devel
690
691The backuppc-announce list is moderated and is used only for
692important announcements (eg: new versions).  It is low traffic.
693You only need to subscribe to one of backuppc-announce and
694backuppc-users: backuppc-users also receives any messages on
695backuppc-announce.
696
697The backuppc-devel list is only for developers who are working on BackupPC.
698Do not post questions or support requests there.  But detailed technical
699discussions should happen on this list.
700
701To post a message to the backuppc-users list, send an email to
702
703    backuppc-users@lists.sourceforge.net
704
705Do not send subscription requests to this address!
706
707=item Other Programs of Interest
708
709If you want to mirror linux or unix files or directories to a remote server
710you should use rsync, L<http://rsync.samba.org>.  BackupPC uses
711rsync as a transport mechanism; if you are already an rsync user you
712can think of BackupPC as adding efficient storage (compression and
713pooling) and a convenient user interface to rsync.
714
715Two popular open source packages that do tape backup are
716Amanda (L<http://www.amanda.org>)
717and Bacula (L<http://www.bacula.org>).
718These packages can be used as complete solutions, or also as back
ends to BackupPC to back up the BackupPC server data to tape.
720
721Avery Pennarun's bup (L<https://github.com/bup/bup>) uses the git packfile format to
722do efficient incrementals and deduplication.
723Various programs and scripts use rsync to provide hardlinked backups.
724See, for example, Mike Rubel's site (L<http://www.mikerubel.org/computers/rsync_snapshots>),
725JW Schultz's dirvish (L<http://www.dirvish.org/>),
726Ben Escoto's rdiff-backup (L<http://www.nongnu.org/rdiff-backup>),
727and John Bowman's rlbackup (L<http://www.math.ualberta.ca/imaging/rlbackup>).
728
729BackupPC provides many additional features, such as compressed storage,
730deduplicating any matching files (rather than just files with the same name),
731and storing special files without root privileges.  But these other programs
732provide simple, effective and fast solutions and are definitely worthy of
733consideration.
734
735=back
736
737=head2 Road map
738
739The new features planned for future releases of BackupPC
740are on the Wiki at L<https://github.com/backuppc/backuppc/wiki>.
741
742Comments and suggestions are welcome.
743
744=head2 You can help
745
746BackupPC is free. I work on BackupPC because I enjoy doing it and I like
747to contribute to the open source community.
748
749BackupPC already has more than enough features for my own needs.  The
750main compensation for continuing to work on BackupPC is knowing that
751more and more people find it useful.  So feedback is certainly
752appreciated, both positive and negative.
753
754Also, everyone is encouraged to contribute patches, bug reports,
755feature and design suggestions, new code, Wiki additions (you can
756do those directly) and documentation corrections or improvements.
757Answering questions on the mailing list is a big help too.
758
759=head1 Installing BackupPC
760
761=head2 Requirements
762
763BackupPC requires:
764
765=over 4
766
767=item *
768
769A linux, solaris, or unix based server with a substantial amount of free
770disk space (see the next section for what that means). The CPU and disk
771performance on this server will determine how many simultaneous backups
772you can run. You should be able to run 4-8 simultaneous backups on a
773moderately configured server.
774
775It is also recommended you consider either an LVM or RAID setup so that
776you can expand the file system as necessary.
777
778=item *
779
780Perl version 5.8.0 or later.  If you don't have perl, please
781see L<http://www.cpan.org>.
782
783=item *
784
The perl module BackupPC::XS (version >= 0.50) is required, and
several others (File::Listing, Archive::Zip, XML::RSS, Net::FTP,
Net::FTP::RetrHandle and Net::FTP::AutoReconnect) are recommended.
788
789Try "perldoc BackupPC::XS" and "perldoc Archive::Zip" to see if you have these
790modules.  If not, fetch them from L<http://www.cpan.org> and see the
791instructions below for how to build and install them.
792
793The CGI Perl module is required for the http/cgi user interface. CGI was a core module,
794but from version 5.22 Perl no longer ships with it.
795
796=item *
797
If you are using rsync to back up linux/unix machines you should have
799rsync on each client machine.  Version 3+ is strongly recommended, but
800earlier versions will work too. See L<http://rsync.samba.org>.
801Use "rsync --version" to check your version.
802
803For BackupPC to use Rsync you will also need to install rsync-bpc on
804the server.
805
806=item *
807
If you are using smb to back up WinXX machines you need smbclient and
809nmblookup from the samba package.  You will also need nmblookup if
810you are backing up linux/unix DHCP machines.  See L<http://www.samba.org>.
811
812See L<http://www.samba.org> for source and binaries.  It's pretty easy to
813fetch and compile samba, and just grab smbclient and nmblookup, without
814doing the installation. Alternatively, L<http://www.samba.org> has binary
815distributions for most platforms.
816
817=item *
818
If you are using tar to backup linux/unix machines, those machines should have
GNU tar; version 1.13.20 or higher is recommended.  Use "tar --version" to check your version.
821Various GNU mirrors have the newest versions of tar;
822see L<http://www.gnu.org/software/tar/>.
823
824=item *
825
826The Apache web server, see L<http://www.apache.org>, preferably built
827with mod_perl support.
828
829=item *
830
831If rrdtool is installed on the BackupPC server, graphs of the pool usage
832will be maintained and displayed.  To enable the graphs, point $Conf{RrdToolPath}
833to the rrdtool executable.
834
835=back
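
The rsync version check suggested in the list above can be scripted.  The
following is a minimal sketch, not part of BackupPC: the function names
rsync_version and rsync_is_v3_or_later are made up for illustration, and
it assumes the first line of "rsync --version" has the usual form
"rsync  version X.Y.Z  protocol version N".

```shell
# Print the version number (eg "3.2.7") found on the first line of
# "rsync --version" output read from stdin.
rsync_version() {
    awk 'NR == 1 {
        for (i = 1; i < NF; i++)
            if ($i == "version") { print $(i + 1); break }
    }'
}

# Succeed if the version string given as $1 has a major number of 3 or more.
rsync_is_v3_or_later() {
    major=${1%%.*}
    [ "$major" -ge 3 ] 2>/dev/null
}

# Usage on a client (requires rsync to be installed):
#   v=$(rsync --version | rsync_version)
#   rsync_is_v3_or_later "$v" || echo "rsync $v is older than 3.x"
```

Earlier rsync versions still work, so a failed check here is only a hint
that an upgrade is worthwhile.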
836
837=head2 What type of storage space do I need?
838
Starting with 4.0.0, BackupPC no longer uses hardlinks for storage of
deduplicated files.  However, hardlinks are still used temporarily in
a few places for doing atomic renames, with a fallback to a file copy
if the hardlink fails.  Files are also moved (renamed) across various
paths, and those renames turn into expensive file copies if the paths
span multiple file systems.
844
So ideally BackupPC's data store (__TOPDIR__) is a single file system that
supports hardlinks.  (It is ok to use a single symbolic link at the top-level
directory (__TOPDIR__) to point the entire data store somewhere else.)
848You can of course use any kind of RAID system or logical volume manager
849that combines the capacity of multiple disks into a single, larger,
850file system. Such approaches have the advantage that the file system can
851be expanded without having to copy it.
852
853Any standard linux or unix file system supports hardlinks.  NFS mounted
854file systems work too (provided the underlying file system supports
855hardlinks).  But windows based FAT and NTFS file systems will not work.
856
857In BackupPC 3.x, hardlinks are fundamental to deduplication, so a startup
check is done to ensure that the file system can support hardlinks, since
859this is a common area of configuration problems in v3.  In 4.x, that check
860is only done if the pool still contains v3 backups and pool files.
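
If you want to check a candidate file system by hand before installing,
something like the following sketch can be used.  It is not the check
that BackupPC itself performs; the function name supports_hardlinks is
hypothetical, and the test simply tries to hardlink a temporary file
inside the given directory.

```shell
# Return success if a hardlink can be created inside directory $1.
supports_hardlinks() {
    dir=$1
    # Create a scratch file in the directory being tested.
    src=$(mktemp "$dir/hardlink-test.XXXXXX" 2>/dev/null) || return 1
    dst="$src.link"
    if ln "$src" "$dst" 2>/dev/null; then
        rm -f "$src" "$dst"
        return 0
    fi
    rm -f "$src"
    return 1
}

# Usage (TOPDIR is a placeholder for your data directory):
#   supports_hardlinks "$TOPDIR" || echo "no hardlink support in $TOPDIR"
```

Run it as the BackupPC user so that a permissions problem is not mistaken
for missing hardlink support.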
861
862=head2 How much disk space do I need?
863
864Here's one real example (circa 2002) for an environment that is
865backing up 65 laptops with compression off. Each full backup averages
8663.2GB. Each incremental backup averages about 0.2GB. Storing one
867full backup and two incremental backups per laptop is around 240GB
of raw data. But because of the pooling of identical files, only
87GB is used.
870
Another example, with compression on: backing up 95 laptops, where
each full backup averages 3.6GB and each incremental averages about 0.3GB.
873Keeping three weekly full backups, and six incrementals is around
8741200GB of raw data.  Because of pooling and compression, only 150GB
875is needed.
876
877Here's a rule of thumb. Add up the disk usage of all the machines you
878want to backup (210GB in the first example above). This is a rough
879minimum space estimate that should allow a couple of full backups and at
880least half a dozen incremental backups per machine. If compression is on
881you can reduce the storage requirements by maybe 30-40%.  Add some margin
882in case you add more machines or decide to keep more old backups.
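
The rule of thumb can be written as a small calculation.  This is only a
sketch: the function name estimate_pool_gb is made up, the 35% saving is
simply the middle of the 30-40% range quoted above, and the 20% margin is
an arbitrary assumption you should adjust.

```shell
# Estimate the pool size in GB using the rule of thumb.
#   $1  total disk usage of all clients, in GB (integer)
#   $2  expected saving from compression, in percent (0 if compression is off)
#   $3  extra margin to add, in percent
estimate_pool_gb() {
    total=$1 saving=$2 margin=$3
    # Integer arithmetic: the result is truncated to whole GB.
    echo $(( total * (100 - saving) * (100 + margin) / 10000 ))
}

estimate_pool_gb 210 0 20     # compression off, 20% margin: prints 252
estimate_pool_gb 210 35 20    # compression on,  20% margin: prints 163
```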
883
884Your actual mileage will depend upon the types of clients, operating
885systems and applications you have. The more uniform the clients and
886applications the bigger the benefit from pooling common files.
887
In addition to total disk space, you should make sure you have
plenty of inodes on your BackupPC data partition.  Some users have
reported running out of inodes, and BackupPC will report failures
when the inodes are exhausted even if plenty of disk space remains.
This is a particular problem with ext2/ext3 file systems, which have
a fixed number of inodes set when the file system is built.  Use
"df -i" to see your inode usage.
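
To watch inode usage from a script, the "df -i" output can be parsed.  The
sketch below assumes the GNU-style column layout (Filesystem, Inodes,
IUsed, IFree, IUse%, Mounted on); the function name inode_use_percent and
the 80% threshold in the usage comment are illustrative only.  Some file
systems report "-" instead of a percentage, which this sketch does not
handle.

```shell
# Read "df -i" style output on stdin and print the inode usage percentage
# (without the % sign) of the first file system listed.
inode_use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage (TOPDIR is a placeholder for your data directory; -P keeps each
# file system on a single line):
#   used=$(df -iP "$TOPDIR" | inode_use_percent)
#   [ "$used" -lt 80 ] || echo "warning: ${used}% of inodes in use"
```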
896
897=head2 Step 1: Getting BackupPC
898
899Many linux distributions now include BackupPC, so installing
900BackupPC via your package manager is the best approach.
901
For example, the Debian package, supported by Ludovic Drolez, can be found at
L<http://packages.debian.org/backuppc> and is included in the current
stable Debian release.  On Debian, BackupPC can be installed with
the command:
906
907    apt-get install backuppc
908
909You should also install rsync-bpc; the BackupPC package might include
910it already, but if not:
911
912    apt-get install rsync-bpc
913
914If those commands work, you can skip to Step 3.
915
916Alternatively, manually fetching and installing BackupPC is easy.
917Start by downloading the latest version from
918
919    https://github.com/backuppc/backuppc/releases
920
921=head2 Step 2: Installing the distribution
922
923Note: most information in this step is only relevant if you build
924and install BackupPC yourself.  If you use a package provided by a
distribution, the package management system should take care of installing
926any needed dependencies.
927
928First off, there are several perl modules you should install.  The
929first one, BackupPC::XS, is required.  The others are optional
930but highly recommended.  Use either your linux package manager,
931or the cpan command, or follow the instructions in the README files
932to install these packages:
933
934=over 4
935
936=item BackupPC::XS
937
938Significant portions of BackupPC are implemented in C code contained in
939this module.  You can run "perldoc BackupPC::XS" to see if this module
940is installed.  You need to have version >= 0.50.  BackupPC::XS is
941available from:
942
943    https://github.com/backuppc/backuppc-xs/releases
944
945and also CPAN.
946
947=item Archive::Zip
948
949To support restore via Zip archives you will need to install
950Archive::Zip, also from L<http://www.cpan.org>.
951You can run "perldoc Archive::Zip" to see if this module is installed.
952
953=item XML::RSS
954
955To support the RSS feature you will need to install XML::RSS, also from
L<http://www.cpan.org>.  There is no need to install this module if you
957don't plan on using RSS. You can run "perldoc XML::RSS" to see if this
958module is installed.
959
960=item CGI
961
962The CGI Perl module is required for the http/cgi user interface. CGI was a core module,
963but from version 5.22 Perl no longer ships with it so you'll need to install it if you
964are using a recent version of perl.
965
966=item SCGI
967
968The SCGI Perl module is required to use the S/CGI protocol for the http/cgi user interface.
969
970=item File::Listing, Net::FTP, Net::FTP::RetrHandle, Net::FTP::AutoReconnect
971
To use ftp with BackupPC you will need four libraries, but you actually
need to install only File::Listing from L<http://www.cpan.org>.
You can run "perldoc File::Listing" to see if this module is installed.
Net::FTP is a standard module.  Net::FTP::RetrHandle and
Net::FTP::AutoReconnect are included in the BackupPC distribution.
977
978=back
979
980To build and install these packages you should use the cpan command.  At
981the prompt, type
982
983    install BackupPC::XS
984
985Alternatively, if you want to install these manually, you can fetch the tarball
986from L<http://www.cpan.org> and then run these commands:
987
988    tar zxvf BackupPC-XS-0.50.tar.gz
989    cd BackupPC-XS-0.50
990    perl Makefile.PL
991    make
992    make test
993    make install
994
995The same sequence of commands can be used for each module.
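
Once the modules are installed, you can confirm that perl can load them.
The loop below is a minimal sketch rather than anything shipped with
BackupPC; check_perl_module is a hypothetical helper name, and the module
list is just the one discussed above.

```shell
# Report whether the Perl module named in $1 can be loaded.
check_perl_module() {
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

for mod in BackupPC::XS Archive::Zip XML::RSS CGI SCGI File::Listing; do
    check_perl_module "$mod"
done
```

To see which version of a module is installed, a one-liner such as
perl -MBackupPC::XS -e 'print BackupPC::XS->VERSION, "\n"' should work,
since it relies only on perl's generic VERSION method.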
996
Next, you should install rsync-bpc if you want to use rsync to backup clients
998(which is the recommended approach for all client types).  If you don't use
999your package manager, fetch the release from:
1000
1001    https://github.com/backuppc/rsync-bpc/releases
1002
1003Then run these commands (updating the version number as appropriate):
1004
1005    tar zxf rsync-bpc-3.0.9.5.tar.gz
1006    cd rsync-bpc-3.0.9.5
1007    ./configure
1008    make
1009    make install
1010
Now let's move on to BackupPC itself.  After fetching BackupPC-__VERSION__.tar.gz,
1012run these commands as root:
1013
1014    tar zxf BackupPC-__VERSION__.tar.gz
1015    cd BackupPC-__VERSION__
1016    perl configure.pl
1017
1018The configure.pl script also accepts command-line options if you
1019wish to run it in a non-interactive manner.  It has self-contained
1020documentation for all the command-line options, which you can
1021read with perldoc:
1022
1023    perldoc configure.pl
1024
Starting with BackupPC 3.0.0, the configure.pl script by default
complies with the Filesystem Hierarchy Standard (FHS) conventions.  The
1027major difference compared to earlier versions is that by default
1028configuration files will be stored in /etc/BackupPC
1029rather than below the data directory, __TOPDIR__/conf,
1030and the log files will be stored in /var/log/BackupPC
1031rather than below the data directory, __TOPDIR__/log.
1032
1033Note that distributions may choose to use different locations for
1034BackupPC files than these defaults.
1035
1036If you are upgrading from an earlier version the configure.pl script
1037will keep the configuration files and log files in their original
1038location.
1039
1040When you run configure.pl you will be prompted for the full paths
1041of various executables, and you will be prompted for the following
1042information.
1043
1044=over 4
1045
1046=item BackupPC User
1047
1048It is best if BackupPC runs as a special user, eg backuppc, that has
1049limited privileges. It is preferred that backuppc belongs to a system
1050administrator group so that sysadmin members can browse BackupPC files,
1051edit the configuration files and so on. Although configurable, the
1052default settings leave group read permission on pool files, so make
1053sure the BackupPC user's group is chosen restrictively.
1054
1055On this installation, this is __BACKUPPCUSER__.
1056
1057For security purposes you might choose to configure the BackupPC
1058user with the shell set to /bin/false.  Since you might need to
1059run some BackupPC programs as the BackupPC user for testing
1060purposes, you can use the -s option to su to explicitly run
1061a shell, eg:
1062
1063    su -s /bin/bash __BACKUPPCUSER__
1064
1065Depending upon your configuration you might also need the -l option.
1066
1067If the -s option is not available on your operating system, you can
specify the -m option to use your login shell as the invoked shell:
1069
1070    su -m __BACKUPPCUSER__
1071
1072=item Data Directory
1073
1074You need to decide where to put the data directory, below which
1075all the BackupPC data is stored.  This needs to be a big file system.
1076
1077On this installation, this is __TOPDIR__.
1078
1079=item Install Directory
1080
1081You should decide where the BackupPC scripts, libraries and documentation
1082should be installed, eg: /usr/local/BackupPC.
1083
1084On this installation, this is __INSTALLDIR__.
1085
1086=item CGI bin Directory
1087
1088You should decide where the BackupPC CGI script resides.  This will
1089usually be below Apache's cgi-bin directory.
1090
1091It is also possible to use a different directory and use Apache's
1092``<Directory>'' directive to specify that location.  See the Apache
1093HTTP Server documentation for additional information.
1094
1095On this installation, this is __CGIDIR__.
1096
1097=item Apache image Directory
1098
1099A directory where BackupPC's images are stored so that Apache can
1100serve them.  You should ensure this directory is readable by Apache and
1101create a symlink to this directory from the BackupPC CGI bin Directory.
1102
1103=item Config and Log Directories
1104
1105In this installation the configuration and log directories are
1106located in the following locations:
1107
1108    __CONFDIR__/config.pl    main config file
1109    __CONFDIR__/hosts        hosts file
1110    __CONFDIR__/pc/HOST.pl   per-pc config file
1111    __LOGDIR__/BackupPC      log files, pid, status
1112
1113The configure.pl script doesn't prompt for these locations but
1114they can be set for new installations using command-line options.
1115
1116=back
1117
1118=head2 Step 3: Setting up config.pl
1119
1120After running configure.pl, browse through the config file,
1121__CONFDIR__/config.pl, and make sure all the default settings are
1122correct. In particular, you will need to decide whether to use
smb, tar, rsync or ftp transport (or whether to set it on a
1124per-PC basis) and set the relevant parameters for that transport
1125method. See the section L<Step 5: Client Setup> for
1126more details.
1127
1128=head2 Step 4: Setting up the hosts file
1129
1130The file __CONFDIR__/hosts contains the list of clients to backup.
1131BackupPC reads this file in three cases:
1132
1133=over 4
1134
1135=item *
1136
1137Upon startup.
1138
1139=item *
1140
1141When BackupPC is sent a HUP (-1) signal.  Assuming you installed the
1142init.d script, you can also do this with "/etc/init.d/backuppc reload".
1143
1144=item *
1145
1146When the modification time of the hosts file changes.  BackupPC
1147checks the modification time once during each regular wakeup.
1148
1149=back
1150
1151Whenever you change the hosts file (to add or remove a host) you can
1152either do a kill -HUP BackupPC_pid or simply wait until the next regular
1153wakeup period.
1154
Each line in the hosts file contains three fields, plus an optional
fourth, separated by whitespace:
1157
1158=over 4
1159
1160=item Host name
1161
1162This is typically the hostname or NetBios name of the client machine
1163and should be in lowercase.  The hostname can contain spaces (escape
1164with a backslash), but it is not recommended.
1165
1166Please read the section L<How BackupPC Finds Hosts>.
1167
1168In certain cases you might want several distinct clients to refer
1169to the same physical machine.  For example, you might have a database
1170you want to backup, and you want to bracket the backup of the database
1171with shutdown/restart using $Conf{DumpPreUserCmd} and $Conf{DumpPostUserCmd}.
1172But you also want to backup the rest of the machine while the database
is still running.  In that case you can specify two different clients in
the hosts file, using any mnemonic name (eg: myhost_mysql and myhost), and
1175use $Conf{ClientNameAlias} in myhost_mysql's config.pl to specify the
1176real hostname of the machine.
1177
1178=item DHCP flag
1179
1180Starting with v2.0.0 the way hosts are discovered has changed and now
1181in most cases you should specify 0 for the DHCP flag, even if the host
1182has a dynamically assigned IP address.
1183Please read the section L<How BackupPC Finds Hosts>
1184to understand whether you need to set the DHCP flag.
1185
1186You only need to set DHCP to 1 if your client machine doesn't
1187respond to the NetBios multicast request:
1188
1189    nmblookup myHost
1190
1191but does respond to a request directed to its IP address:
1192
1193    nmblookup -A W.X.Y.Z
1194
If you do set DHCP to 1 on any client you will need to specify the range of
DHCP addresses to search in $Conf{DHCPAddressRanges}.
1197
1198Note also that the $Conf{ClientNameAlias} feature does not work for
1199clients with DHCP set to 1.
1200
1201=item User name
1202
1203This should be the unix login/email name of the user who "owns" or uses
1204this machine. This is the user who will be sent email about this
1205machine, and this user will have permission to stop/start/browse/restore
1206backups for this host.  Leave this blank if no specific person should
1207receive email or be allowed to stop/start/browse/restore backups
1208for this host.  Administrators will still have full permissions.
1209
1210=item More users
1211
1212Additional usernames, separated by commas and with no whitespace,
1213can be specified.  These users will also have full permission in
1214the CGI interface to stop/start/browse/restore backups for this host.
1215These users will not be sent email about this host.
1216
1217=back
1218
1219The first non-comment line of the hosts file is special: it contains
1220the names of the columns and should not be edited.
1221
1222Here's a simple example of a hosts file:
1223
1224    host        dhcp    user      moreUsers
1225    farside     0       craig     jim,dave
1226    larson      1       gary      andy
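
If you want to list the clients from a script, the hosts file is easy to
parse.  The following sketch is not BackupPC's own parser: list_hosts is a
hypothetical name, and it assumes that "#" starts a comment and that host
names contain no escaped spaces.

```shell
# Print "host dhcp user" for every client in the hosts file given as $1,
# skipping blank lines, comments and the column-name header line.
list_hosts() {
    awk '
        { sub(/#.*/, "") }                      # strip comments
        NF == 0 { next }                        # skip blank lines
        !seen_header { seen_header = 1; next }  # first real line is the header
        { print $1, $2, $3 }
    ' "$1"
}

# Usage (CONFDIR is a placeholder for your configuration directory):
#   list_hosts "$CONFDIR/hosts"
```

For the example file above this prints one line each for farside and
larson, with the moreUsers column dropped.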
1227
1228=head2 Step 5: Client Setup
1229
1230Four methods for getting backup data from a client are supported:
1231smb, tar, rsync and ftp.  Smb or rsync are the preferred methods
1232for WinXX clients and rsync or tar are the preferred methods for
1233linux/unix/MacOSX clients.
1234
1235The transfer method is set using the $Conf{XferMethod} configuration
1236setting. If you have a mixed environment (ie: you will use smb for some
1237clients and tar for others), you will need to pick the most common
1238choice for $Conf{XferMethod} for the main config.pl file, and then
1239override it in the per-PC config file for those hosts that will use
1240the other method.  (Or you could run two completely separate instances
1241of BackupPC, with different data directories, one for WinXX and the
1242other for linux/unix, but then common files between the different
machine types will be duplicated.)
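
As an illustration of the per-PC override, the sketch below creates a
per-PC config file containing a single $Conf{XferMethod} setting.  The
helper name set_host_xfer_method is made up, the file location follows
the __CONFDIR__/pc/HOST.pl layout described earlier, and the function
refuses to touch an existing file rather than guess how to merge settings.

```shell
# Create a per-PC config file that overrides the transfer method.
#   $1  configuration directory (__CONFDIR__ in this document)
#   $2  host name as listed in the hosts file
#   $3  transfer method: smb, tar, rsync, rsyncd or ftp
set_host_xfer_method() {
    confdir=$1 host=$2 method=$3
    file="$confdir/pc/$host.pl"
    if [ -e "$file" ]; then
        echo "$file already exists, edit it by hand" >&2
        return 1
    fi
    mkdir -p "$confdir/pc" || return 1
    # Writes a line such as:  $Conf{XferMethod} = 'tar';
    printf "\$Conf{XferMethod} = '%s';\n" "$method" > "$file"
}

# Usage (CONFDIR is a placeholder for your configuration directory):
#   set_host_xfer_method "$CONFDIR" larson tar
```

Make sure the resulting file is readable by the BackupPC user.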
1244
1245Here are some brief client setup notes:
1246
1247=over 4
1248
1249=item WinXX
1250
1251One setup for WinXX clients is to set $Conf{XferMethod} to "smb".
1252Actually, rsyncd is the better method for WinXX if you are prepared to
1253run rsync/cygwin on your WinXX client.
1254
1255If you want to use rsyncd for WinXX clients you can find a pre-packaged
1256exe installer on L<https://github.com/backuppc/cygwin-rsyncd/releases>.
1257The package is called cygwin-rsync. It contains rsync.exe, template setup files
1258and the minimal set of cygwin libraries for everything to run.  The README file
1259contains instructions for running rsync as a service, so it starts
automatically every time you boot your machine.  If you use rsync
1261to backup WinXX machines, be sure to set $Conf{ClientCharset}
1262correctly (eg: 'cp1252') so that the WinXX filename encoding is
1263correctly converted to utf8.
1264
1265Otherwise, to use SMB, you can either create shares for the data you want
to backup or you can use the existing C$ share.  To create a new
1267share, open "My Computer", right click on the drive (eg: C), and
1268select "Sharing..." (or select "Properties" and select the "Sharing"
1269tab). In this dialog box you can enable sharing, select the share name
1270and permissions.
1271
All Windows NT based operating systems (NT, 2000, XP Pro) are configured by default
1273to share the entire C drive as C$.  This is a special share used for
1274various administration functions, one of which is to grant access to backup
1275operators. All you need to do is create a new domain user, specifically
1276for backup. Then add the new backup user to the built in "Backup
1277Operators" group. You now have backup capability for any directory on
1278any computer in the domain in one easy step. This avoids using
1279administrator accounts and only grants permission to do exactly what you
1280want for the given user, i.e.: backup.
1281Also, for additional security, you may wish to deny the ability for this
1282user to logon to computers in the default domain policy.
1283
1284If this machine uses DHCP you will also need to make sure the
1285NetBios name is set.  Go to Control Panel|System|Network Identification
1286(on Win2K) or Control Panel|System|Computer Name (on WinXP).
1287Also, you should go to Control Panel|Network Connections|Local Area
1288Connection|Properties|Internet Protocol (TCP/IP)|Properties|Advanced|WINS
1289and verify that NetBios is not disabled.
1290
1291The relevant configuration settings are $Conf{SmbShareName},
1292$Conf{SmbShareUserName}, $Conf{SmbSharePasswd}, $Conf{SmbClientPath},
1293$Conf{SmbClientFullCmd}, $Conf{SmbClientIncrCmd} and
1294$Conf{SmbClientRestoreCmd}.
1295
1296BackupPC needs to know the smb share username and password for a
1297client machine that uses smb.  The username is specified in
1298$Conf{SmbShareUserName}. There are four ways to tell BackupPC the
1299smb share password:
1300
1301=over 4
1302
1303=item *
1304
1305As an environment variable BPC_SMB_PASSWD set before BackupPC starts.
1306If you start BackupPC manually the BPC_SMB_PASSWD variable must be set
1307manually first.  For backward compatibility for v1.5.0 and prior, the
1308environment variable PASSWD can be used if BPC_SMB_PASSWD is not set.
1309Warning: on some systems it is possible to see environment variables of
1310running processes.
1311
1312=item *
1313
1314Alternatively the BPC_SMB_PASSWD setting can be included in
1315/etc/init.d/backuppc, in which case you must make sure this file
1316is not world (other) readable.
1317
1318=item *
1319
1320As a configuration variable $Conf{SmbSharePasswd} in
1321__CONFDIR__/config.pl.  If you put the password
1322here you must make sure this file is not world (other) readable.
1323
1324=item *
1325
1326As a configuration variable $Conf{SmbSharePasswd} in the per-PC
1327configuration file (__CONFDIR__/pc/$host.pl or
1328__TOPDIR__/pc/$host/config.pl in non-FHS versions of BackupPC).
1329You will have to use this option if the smb share password is different
1330for each host. If you put the password here you must make sure this file
1331is not world (other) readable.
1332
1333=back
1334
1335Placement and protection of the smb share password is a significant
1336security issue, so please double-check the file and directory
1337permissions.  In a future version there might be support for
1338encryption of this password, but a private key will still have to
1339be stored in a protected place.  Suggestions are welcome.
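
One way to double-check the permissions is to test whether a file that
holds the password is world readable.  This is a small sketch, not a
BackupPC tool; is_world_readable is a hypothetical name and the paths in
the usage comment are placeholders.

```shell
# Succeed if file $1 is readable by "other" (world readable).
is_world_readable() {
    # find prints the path only when the other-read permission bit is set.
    [ -n "$(find "$1" -prune -perm -004 2>/dev/null)" ]
}

# Usage (CONFDIR is a placeholder for your configuration directory):
#   for f in "$CONFDIR/config.pl" "$CONFDIR"/pc/*.pl; do
#       is_world_readable "$f" && echo "warning: $f is world readable"
#   done
```

A file flagged this way can be fixed with "chmod o-rwx" on that file.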
1340
1341As an alternative to setting $Conf{XferMethod} to "smb" (using
1342smbclient) for WinXX clients, you can use an smb network filesystem (eg:
1343ksmbfs or similar) on your linux/unix server to mount the share,
1344and then set $Conf{XferMethod} to "tar" (use tar on the network
1345mounted file system).
1346
1347Also, to make sure that filenames with special characters are correctly
1348transferred by smbclient you should make sure that the smb.conf file
1349has (for samba 3.x):
1350
1351    [global]
1352	unix charset = UTF8
1353
1354UTF8 is the default setting, so if the parameter is missing then it
1355is ok.  With this setting $Conf{ClientCharset} should be empty,
1356since smbclient has already converted the filenames to utf8.
1357
1358=item Linux/Unix
1359
1360The preferred setup for linux/unix clients is to set $Conf{XferMethod}
1361to "rsync", "rsyncd" or "tar".
1362
1363You can use either rsync, smb, or tar for linux/unix machines. Smb requires
1364that the Samba server (smbd) be run to provide the shares. Since the smb
1365protocol can't represent special files like symbolic links and fifos,
1366tar and rsync are the better transport methods for linux/unix machines.
1367(In fact, by default samba makes symbolic links look like the file or
1368directory that they point to, so you could get an infinite loop if a
1369symbolic link points to the current or parent directory. If you really
1370need to use Samba shares for linux/unix backups you should turn off the
1371"follow symlinks" samba config setting. See the smb.conf manual page.)
1372
1373Important note: many linux systems use sparse files for /var/log/lastlog,
1374and have large special files below /proc and /run.  Make sure you
1375exclude those directories and files when you configure your client.
1376
1377The requirements for each Xfer Method are:
1378
1379=over 4
1380
1381=item rsync
1382
1383To use rsync, you need rsync-bpc installed on the BackupPC server.
1384
1385On the client, you should have at least rsync 3.x.  Rsync is run on
1386the remote client via ssh.
1387
1388The relevant configuration settings are $Conf{RsyncClientPath},
1389$Conf{RsyncSshArgs}, $Conf{RsyncShareName}, $Conf{RsyncArgs},
1390$Conf{RsyncArgsExtra}, $Conf{RsyncFullArgsExtra}, and $Conf{RsyncRestoreArgs}.
1391
1392=item rsyncd
1393
To use rsyncd, you need rsync-bpc installed on the BackupPC server.
1395
1396On the client, you should have at least rsync 3.x. In this case the
1397rsync daemon should be running on the client machine and BackupPC
1398connects directly to it.
1399
1400The relevant configuration settings are $Conf{RsyncBackupPCPath},
1401$Conf{RsyncdClientPort}, $Conf{RsyncdUserName}, $Conf{RsyncdPasswd},
1402$Conf{RsyncShareName}, $Conf{RsyncArgs}, $Conf{RsyncArgsExtra}, and
1403$Conf{RsyncRestoreArgs}. $Conf{RsyncShareName} is the name of an rsync
1404module (ie: the thing in square brackets in rsyncd's conf file -- see
1405rsyncd.conf), not a file system path.
1406
1407Be aware that rsyncd will remove the leading '/' from path names in
symbolic links if you specify "use chroot = no" in the rsyncd.conf file.
1409See the rsyncd.conf manual page for more information.
1410
1411=item tar
1412
1413You must have GNU tar on the client machine.  Use "tar --version"
1414or "gtar --version" to verify.  The version should be at least
14151.13.20.  Tar is run on the client machine via rsh or ssh.
1416
1417The relevant configuration settings are $Conf{TarClientPath},
1418$Conf{TarShareName}, $Conf{TarClientCmd}, $Conf{TarFullArgs},
1419$Conf{TarIncrArgs}, and $Conf{TarClientRestoreCmd}.
1420
1421=item ftp
1422
1423FTP Xfer Method is supported in V4 but not recommended since it only
1424handles minimal metadata, it doesn't support hardlinks or special
1425files, and can only restore regular files (not symbolic links etc).
1426
1427You need to be running an ftp server on the client machine.
1428The relevant configuration settings are $Conf{FtpShareName},
1429$Conf{FtpUserName}, $Conf{FtpPasswd}, $Conf{FtpBlockSize},
1430$Conf{FtpPort}, $Conf{FtpTimeout}, and $Conf{FtpFollowSymlinks}.
1431
1432=back
1433
1434You need to set $Conf{ClientCharset} to the client's charset so that
1435filenames are correctly converted to utf8.  Use "locale charmap"
1436on the client to see its charset.  Note, however, that modern versions
1437of smbclient and rsync handle this conversion automatically, so in
1438most cases you won't need to set $Conf{ClientCharset}.
1439
For linux/unix machines you should not backup "/proc".  This directory
contains a variety of files that look like regular files but are really
special files that don't need to be backed up (eg: /proc/kcore appears
as a regular file but exposes physical memory).  See $Conf{BackupFilesExclude}.
It is safe to backup /dev since it contains mostly character-special
and block-special files, which are correctly handled by BackupPC
1446(eg: backing up /dev/hda5 just saves the block-special file information,
1447not the contents of the disk).  Similarly, on many linux systems,
1448/var/log/lastlog is a sparse file, with a very large apparent size,
1449so you should exclude that too.
1450
1451Alternatively, rather than backup all the file systems as a single
1452share ("/"), it is easier to restore a single file system if you backup
1453each file system separately.  To do this you should list each file system
1454mount point in $Conf{TarShareName} or $Conf{RsyncShareName}, and add the
1455--one-file-system option to $Conf{TarClientCmd} or $Conf{RsyncArgs}.
1456In this case there is no need to exclude /proc explicitly since it looks
1457like a different file system.
1458
Ssh allows BackupPC to run as a privileged user on the client (eg:
root), since it needs sufficient permissions to read all the backup
files.  Ssh is set up so that BackupPC on the server (an otherwise low
privileged user) can ssh as root on the client, without being prompted
for a password.  However, directly enabling ssh root logins is not
good practice.  A better approach is to ssh as a regular user, and
then configure sudo to allow just rsync to be executed.
1466
There are two common versions of ssh: v1 and v2.  (Check which version
of SSH you have by typing "ssh" or "man ssh".)  See the SSH Setup item
below for pointers to setup instructions.
1470
1471=item MacOSX
1472
1473In general this should be similar to Linux/Unix machines.
1474In versions 10.4 and later, the native MacOSX tar works,
1475and also supports resource forks.  xtar is another option,
1476and rsync works too (although the MacOSX-supplied rsync
1477has an extension for extended attributes that is not
1478compatible with standard rsync).
1479
1480=item SSH Setup
1481
1482SSH is a secure way to run tar or rsync on a backup client to extract
1483the data.  SSH provides strong authentication and encryption of
1484the network data.
1485
1486Note that if you run rsyncd (rsync daemon), ssh is not used.
1487In this case, rsyncd provides its own authentication, but there
1488is no encryption of network data.  If you want encryption of
1489network data you can use ssh to create a tunnel, or use a
1490program like stunnel.
1491
1492Setup instructions for ssh can be found on the
1493Wiki at L<https://github.com/backuppc/backuppc/wiki>.
1494
1495=item Clients that use DHCP
1496
If a client machine uses DHCP, BackupPC needs some way to find the
IP address given the hostname.  One alternative is to set dhcp
to 1 in the hosts file, and BackupPC will search a pool of IP
addresses looking for hosts.  It is more efficient to
set dhcp = 0 and provide a mechanism for BackupPC to find the
IP address given the hostname.
1503
1504For WinXX machines BackupPC uses the NetBios name server to determine
1505the IP address given the hostname.
1506For unix machines you can run nmbd (the NetBios name server) from
1507the Samba distribution so that the machine responds to a NetBios
1508name request. See the manual page and Samba documentation for more
1509information.
1510
1511Alternatively, you can set $Conf{NmbLookupFindHostCmd} to any command
1512that returns the IP address given the hostname.
1513
1514Please read the section L<How BackupPC Finds Hosts>
1515for more details.
1516
1517=back
1518
1519=head2 Step 6: Running BackupPC
1520
1521The installation contains an init.d backuppc script that can be copied
1522to /etc/init.d so that BackupPC can auto-start on boot.
1523See init.d/README for further instructions.
1524
1525BackupPC should be ready to start.  If you installed the init.d script,
1526then you should be able to run BackupPC with:
1527
1528    /etc/init.d/backuppc start
1529
1530(This script can also be invoked with "stop" to stop BackupPC and "reload"
1531to tell BackupPC to reload config.pl and the hosts file.)
1532
1533Otherwise, just run
1534
1535     __INSTALLDIR__/bin/BackupPC -d
1536
1537as user __BACKUPPCUSER__.  The -d option tells BackupPC to run as a daemon
1538(ie: it does an additional fork).
1539
1540Any immediate errors will be printed to stderr and BackupPC will quit.
1541Otherwise, look in __LOGDIR__/LOG and verify that BackupPC reports
1542it has started and all is ok.
1543
1544=head2 Step 7: Talking to BackupPC
1545
1546You should verify that BackupPC is running by using BackupPC_serverMesg.
1547This sends a message to BackupPC via the unix (or TCP) socket and prints
1548the response.  Like all BackupPC programs, BackupPC_serverMesg
1549should be run as the BackupPC user (__BACKUPPCUSER__), so you
1550should
1551
1552    su __BACKUPPCUSER__
1553
1554before running BackupPC_serverMesg.  If the BackupPC user is
1555configured with /bin/false as the shell, you can use the -s
1556option to su to explicitly run a shell, eg:
1557
1558    su -s /bin/bash __BACKUPPCUSER__
1559
1560Depending upon your configuration you might also need
1561the -l option.
1562
If the -s option is not available on your operating system, you can
specify the -m option to use your login shell as the invoked shell:
1565
1566    su -m __BACKUPPCUSER__
1567
1568You can request status information and start and stop backups using this
1569interface. This socket interface is mainly provided for the CGI interface
1570(and some of the BackupPC subprograms use it too).  But right now we just
1571want to make sure BackupPC is happy.  Each of these commands should
1572produce some status output:
1573
1574    __INSTALLDIR__/bin/BackupPC_serverMesg status info
1575    __INSTALLDIR__/bin/BackupPC_serverMesg status jobs
1576    __INSTALLDIR__/bin/BackupPC_serverMesg status hosts
1577
1578The output should be some hashes printed with Data::Dumper.  If it
1579looks cryptic and confusing, and doesn't look like an error message,
1580then all is ok.
1581
1582The hosts status should produce a list of every host you have listed
1583in __CONFDIR__/hosts as part of a big cryptic output line.
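
If you want to script this check, a minimal sketch follows.  It runs the
three status queries in turn and reports any that fail.  The function name
is made up for this example, and it assumes that BackupPC_serverMesg exits
with a nonzero status (or prints nothing) when it cannot talk to the server.

```shell
# Run the three status queries through a given BackupPC_serverMesg command
# and report any that fail or print nothing.  Returns nonzero on any failure.
check_server_status() {
    mesg_cmd=$1
    rc=0
    for what in info jobs hosts; do
        if out=$("$mesg_cmd" status "$what" 2>&1) && [ -n "$out" ]; then
            echo "status $what: ok"
        else
            echo "status $what: FAILED" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

As user __BACKUPPCUSER__ you would call it as
"check_server_status __INSTALLDIR__/bin/BackupPC_serverMesg".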
1584
1585You can also request that all hosts be queued:
1586
1587    __INSTALLDIR__/bin/BackupPC_serverMesg backup all
1588
1589At this point you should make sure the CGI interface works since
1590it will be much easier to see what is going on.  We'll get to that
1591shortly.
1592
1593=head2 Step 8: Checking email delivery
1594
1595The script BackupPC_sendEmail sends status and error emails to
1596the administrator and users.  It is usually run each night
1597by BackupPC_nightly.
1598
1599To verify that it can run sendmail and deliver email correctly
1600you should ask it to send a test email to you:
1601
1602    su __BACKUPPCUSER__
1603    __INSTALLDIR__/bin/BackupPC_sendEmail -u MYNAME@MYDOMAIN.COM
1604
1605BackupPC_sendEmail also takes a -c option that checks if BackupPC
1606is running, and it sends an email to $Conf{EMailAdminUserName}
1607if it is not.  That can be used as a keep-alive check by adding
1608
1609    __INSTALLDIR__/bin/BackupPC_sendEmail -c
1610
1611to __BACKUPPCUSER__'s cron.
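
As an illustration of such a cron entry, the helper below prints a crontab
line for the keep-alive check.  The helper name and the default schedule
(every 20 minutes, using the step syntax most cron implementations accept)
are assumptions for this sketch, not BackupPC defaults.

```shell
# Print a crontab line that runs the keep-alive check.  The schedule
# (default: every 20 minutes) is an illustrative choice, not a BackupPC default.
keepalive_cron_line() {
    installdir=$1
    schedule=$2
    [ -n "$schedule" ] || schedule='*/20 * * * *'
    printf '%s %s/bin/BackupPC_sendEmail -c\n' "$schedule" "$installdir"
}
```

The printed line can be pasted into the crontab of __BACKUPPCUSER__
(eg: with crontab -e).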
1612
1613The -t option to BackupPC_sendEmail causes it to print the email
1614message instead of invoking sendmail to deliver the message.
1615
1616=head2 Step 9: CGI interface
1617
1618The CGI interface script, BackupPC_Admin, is a powerful and flexible
1619way to see and control what BackupPC is doing.  It is written for an
1620Apache server.  If you don't have Apache, see L<http://www.apache.org>.
1621
1622There are three options for setting up the CGI interface:
1623
1624=over 4
1625
1626=item SCGI
1627
1628New to 4.x, SCGI uses the SCGI interface to Apache, which requires
1629the mod_scgi.so module to be installed and loaded by Apache.  This
1630allows Apache to run as any unprivileged user.  The actual SCGI
server runs as the BackupPC user (__BACKUPPCUSER__), and
1632handles the requests from Apache via a TCP socket.
1633
1634=item mod_perl
1635
Mod_perl requires the mod_perl module to be loaded by Apache.  This
1637allows BackupPC_Admin to be run from inside Apache.  Unlike SCGI,
1638using mod_perl with BackupPC_Admin requires a dedicated Apache to
1639be run as the BackupPC user (__BACKUPPCUSER__).  This is because
1640BackupPC_Admin needs permission to access various files in BackupPC's
1641data directories.
1642
1643=item standard
1644
1645The standard mode, which is significantly slower than SCGI or
1646mod_perl, is where Apache runs BackupPC_Admin as a separate process
1647for every request.  This adds significant startup overhead for every
1648request, and also requires that BackupPC_Admin be run as setuid to
1649the BackupPC user (__BACKUPPCUSER__), if Apache isn't being run as
1650that user.  Setuid scripts are discouraged, so the preference is to
1651use SCGI or mod_perl.
1652
1653=back
1654
1655Here are some specifics for each setup:
1656
1657=over 4
1658
1659=item SCGI Setup
1660
1661First you need to install mod_scgi.  If you can't find a pre-built
1662package, the source is available at L<http://python.ca/scgi>.  The
1663release has subdirectories for apache1 and apache2.  Pick your
1664matching version (nowadays most likely apache2).  You'll need apxs,
1665the Apache Extension Tool, installed to build from source.  Once
1666compiled, the module mod_scgi.so should be installed via the Makefile.
1667
1668To enable the SCGI server, set $Conf{SCGIServerPort} to an available
1669non-privileged TCP port number, eg: 10268.  The matching port number
1670has to appear in the Apache configuration file.  Typical Apache
1671configuration entries will look like this:
1672
1673    LoadModule scgi_module modules/mod_scgi.so
1674    SCGIMount /BackupPC_Admin 127.0.0.1:10268
1675    <Location /BackupPC_Admin>
1676        AuthUserFile /etc/httpd/conf/passwd
1677        AuthType basic
1678        AuthName "access"
1679        require valid-user
1680    </Location>
1681
1682Or a typical Nginx configuration file:
1683
1684    server {
1685        listen 80;
1686        server_name yourBackupPCServerHost;
1687
1688        root  /var/www/backuppc;
1689
1690        access_log  /var/log/nginx/backuppc.access.log;
1691        error_log   /var/log/nginx/backuppc.error.log;
1692
1693        location /BackupPC_Admin {
1694            auth_basic "BackupPC";
1695            auth_basic_user_file conf.d/backuppc.users;
1696
1697            include   scgi_params;
1698            scgi_pass 127.0.0.1:10268;
            scgi_param REMOTE_USER $remote_user;
            scgi_param SCRIPT_NAME $document_uri;
1701        }
1702    }
1703
1704This allows the SCGI interface to be accessed with a URL:
1705
1706    http://yourBackupPCServerHost/BackupPC_Admin
1707
1708You can use a different path or name if you prefer a different URL.
1709Unlike traditional CGI, there is no need to specify a valid path to
1710a CGI script.
1711
1712Important security warning!!  The SCGIServerPort must not be
1713accessible by anyone untrusted.  That means you can't allow
1714untrusted users access to the BackupPC server, and you should
1715block the SCGIServerPort TCP port on the BackupPC server.  If you
1716don't understand what that means, or can't confirm you have
1717configured SCGI securely, then don't enable SCGI - use one of
1718the following two methods!!
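
One way to check part of this is to look at which address the SCGI port
is bound to.  The sketch below reads listening-socket lines on stdin and
succeeds if the port is bound to anything other than a loopback address.
The function name is hypothetical, and the column layout it expects (state
in the first field, local address:port in the fourth, as printed by
"ss -ltn" on many Linux systems) is an assumption.  A port bound only to
the loopback address is still reachable by every local user.

```shell
# Read listening-socket lines (in the style of "ss -ltn" output) on stdin and
# succeed (exit 0) if the given port is bound to a non-loopback address.
scgi_port_exposed() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            if ($4 !~ /^127\./ && $4 !~ /^\[::1\]/) exposed = 1
        }
        END { exit exposed ? 0 : 1 }
    '
}
```

For example, "ss -ltn | scgi_port_exposed 10268" returning success means
the port needs to be blocked by a firewall rule.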
1719
1720=item Mod_perl Setup
1721
1722The advantage of the mod_perl setup is that no setuid script is
1723needed (like in the standard method below), and there is a significant
performance advantage.  Not only is the perl code parsed just once,
but the config.pl and hosts files, plus the connection to the BackupPC
server, are cached between requests.  The typical
1727speedup is around 10-15x.
1728
1729To use mod_perl you need to run Apache as user __BACKUPPCUSER__.
1730If you need to run multiple Apaches for different services then
1731you need to create multiple top-level Apache directories, each
1732with their own config file.  You can make copies of /etc/init.d/httpd
and use the -d option to httpd to point each httpd to a different
top-level directory.  Or you can use the -f option to explicitly
point to the config file.  Multiple Apaches will run on different
ports (eg: 80 is standard, 8080 is a typical alternative port accessed
via http://yourhost.com:8080).
1738
Inside BackupPC's Apache httpd.conf file you should check the
1740settings for ServerRoot, DocumentRoot, User, Group, and Port.  See
1741L<http://httpd.apache.org/docs/server-wide.html> for more details.
1742
1743For mod_perl, BackupPC_Admin should not have setuid permission, so
1744you should turn it off:
1745
1746    chmod u-s __CGIDIR__/BackupPC_Admin
1747
To tell Apache to use mod_perl to execute BackupPC_Admin, add this
to the httpd.conf file for Apache 1.x (change the Location path as
needed):

    <IfModule mod_perl.c>
        PerlModule Apache::Registry
        PerlTaintCheck On
        <Location /cgi-bin/BackupPC/BackupPC_Admin>
           SetHandler perl-script
           PerlHandler Apache::Registry
           Options ExecCGI
           PerlSendHeader On
        </Location>
    </IfModule>
1761
For Apache 2.0.44 with Perl 5.8.0 on RedHat 7.1, Don Silvia reports that
this works (with tweaks from Michael Tuzi):
1764
1765    LoadModule perl_module modules/mod_perl.so
1766    PerlModule Apache2
1767
1768    <Directory /path/to/cgi/>
1769	SetHandler perl-script
1770	PerlResponseHandler ModPerl::Registry
1771	PerlOptions +ParseHeaders
1772	Options +ExecCGI
1773	Order deny,allow
1774	Deny from all
1775	Allow from 192.168.0
1776	AuthName "Backup Admin"
1777	AuthType Basic
1778	AuthUserFile /path/to/user_file
1779	Require valid-user
1780    </Directory>
1781
1782There are other optimizations and options with mod_perl.  For
1783example, you can tell mod_perl to preload various perl modules,
1784which saves memory compared to loading separate copies in every
1785Apache process after they are forked.  See Stas's definitive
1786mod_perl guide at L<http://perl.apache.org/guide>.
1787
1788=item Standard Setup
1789
1790The CGI interface should have been installed by the configure.pl script
1791in __CGIDIR__/BackupPC_Admin.  BackupPC_Admin should have been installed
1792as setuid to the BackupPC user (__BACKUPPCUSER__), in addition to user
1793and group execute permission.
1794
1795You should be very careful about permissions on BackupPC_Admin and
1796the directory __CGIDIR__: it is important that normal users cannot
1797directly execute or change BackupPC_Admin, otherwise they can access
1798backup files for any PC. You might need to change the group ownership
1799of BackupPC_Admin to a group that Apache belongs to so that Apache
1800can execute it (don't add "other" execute permission!).
1801The permissions should look like this:
1802
1803    ls -l __CGIDIR__/BackupPC_Admin
    -rwsr-x---    1 __BACKUPPCUSER__   web      82406 Jun 17 22:58 __CGIDIR__/BackupPC_Admin
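
For the standard setup, the two points that matter most (the setuid bit is
set, and "other" has no permissions at all) can be checked with a short
script.  This is only a sketch: the function name is made up, and it does
not check the owner or group of the file.

```shell
# Check that a file is setuid and grants nothing to "other".
# Prints a reason on stderr and returns nonzero if either check fails.
admin_perms_ok() {
    file=$1
    if [ ! -u "$file" ]; then
        echo "$file: setuid bit is not set" >&2
        return 1
    fi
    other=$(ls -ld "$file" | cut -c8-10)   # the three "other" permission characters
    if [ "$other" != "---" ]; then
        echo "$file: other permissions are '$other', expected '---'" >&2
        return 1
    fi
}
```

You would run it as "admin_perms_ok __CGIDIR__/BackupPC_Admin".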
1805
1806The setuid script won't work unless perl on your machine was installed
with setuid emulation.  This is likely the problem if you get an error
such as "Wrong user: my userid is 25, instead of 150", meaning
1809the script is running as the httpd user, not the BackupPC user.
1810This is because setuid scripts are disabled by the kernel in most
1811flavors of unix and linux.
1812
1813To see if your perl has setuid emulation, see if there is a program
1814called sperl5.8.0 (or sperl5.8.2 etc, based on your perl version)
1815in the place where perl is installed. If you can't find this program,
1816then you have two options: rebuild and reinstall perl with the setuid
1817emulation turned on (answer "y" to the question "Do you want to do
1818setuid/setgid emulation?" when you run perl's configure script), or
1819switch to the mod_perl alternative for the CGI script (which doesn't
1820need setuid to work).
1821
1822=back
1823
1824BackupPC_Admin requires that users are authenticated by Apache.
1825Specifically, it expects that Apache sets the REMOTE_USER environment
variable when it runs.  There are several ways to do this.  One way
is to create a .htaccess file in the cgi-bin directory that looks like
this (change the file paths as needed):

    AuthGroupFile /etc/httpd/conf/group
    AuthUserFile /etc/httpd/conf/passwd
    AuthType basic
    AuthName "access"
    require valid-user
1834
1835You will also need "AllowOverride Indexes AuthConfig" in the Apache
1836httpd.conf file to enable the .htaccess file. Alternatively, everything
1837can go in the Apache httpd.conf file inside a Location directive. The
1838list of users and password file above can be extracted from the NIS
1839passwd file.
1840
One alternative is to use LDAP.  In Apache's httpd.conf add these lines
(change the Location path as needed):

    LoadModule auth_ldap_module   modules/auth_ldap.so
    AddModule auth_ldap.c

    # cgi-bin - auth via LDAP (for BackupPC)
    <Location /cgi-bin/BackupPC/BackupPC_Admin>
1848      AuthType Basic
1849      AuthName "BackupPC login"
1850      # replace MYDOMAIN, PORT, ORG and CO as needed
1851      AuthLDAPURL ldap://ldap.MYDOMAIN.com:PORT/o=ORG,c=CO?uid?sub?(objectClass=*)
1852      require valid-user
1853    </Location>
1854
1855If you want to disable the user authentication you can set
1856$Conf{CgiAdminUsers} to '*', which allows any user to have
1857full access to all hosts and backups.  In this case the REMOTE_USER
1858environment variable does not have to be set by Apache.
1859
Alternatively, you can force a particular username by getting Apache
to set REMOTE_USER, eg, to hard code the user to www you could add
this to Apache's httpd.conf (change the Location path as needed):

    <Location /cgi-bin/BackupPC/BackupPC_Admin>
        SetEnv REMOTE_USER www
    </Location>
1867
1868Finally, you should also edit the config.pl file and adjust, as necessary,
1869the CGI-specific settings.  They're near the end of the config file. In
1870particular, you should specify which users or groups have administrator
1871(privileged) access: see the config settings $Conf{CgiAdminUserGroup}
1872and $Conf{CgiAdminUsers}.  Also, the configure.pl script placed various
1873images into $Conf{CgiImageDir} that BackupPC_Admin needs to serve
1874up.  You should make sure that $Conf{CgiImageDirURL} is the correct
1875URL for the image directory.
1876
1877See the section L<Fixing installation problems> for suggestions on debugging the Apache authentication setup.
1878
1879=head2 How BackupPC Finds Hosts
1880
1881Starting with v2.0.0 the way hosts are discovered has changed.  In most
1882cases you should specify 0 for the DHCP flag in the conf/hosts file,
1883even if the host has a dynamically assigned IP address.
1884
1885BackupPC (starting with v2.0.0) looks up hosts with DHCP = 0 in this manner:
1886
1887=over 4
1888
1889=item *
1890
First DNS is used to look up the IP address given the client's name
using perl's gethostbyname() function.  This should succeed for machines
that have fixed IP addresses that are known via DNS.  You can manually
check whether a given host has a DNS entry according to perl's
gethostbyname function with this command:
1896
1897    perl -e 'print(gethostbyname("myhost") ? "ok\n" : "not found\n");'
1898
1899=item *
1900
1901If gethostbyname() fails, BackupPC then attempts a NetBios multicast to
1902find the host.  Provided your client machine is configured properly,
1903it should respond to this NetBios multicast request.  Specifically,
1904BackupPC runs a command of this form:
1905
1906    nmblookup myhost
1907
1908If this fails you will see output like:
1909
1910    querying myhost on 10.10.255.255
1911    name_query failed to find name myhost
1912
1913If it is successful you will see output like:
1914
1915    querying myhost on 10.10.255.255
1916    10.10.1.73 myhost<00>
1917
1918Depending on your netmask you might need to specify the -B option to
1919nmblookup.  For example:
1920
1921    nmblookup -B 10.10.1.255 myhost
1922
1923If necessary, experiment with the nmblookup command which will return the
1924IP address of the client given its name.  Then update
1925$Conf{NmbLookupFindHostCmd} with any necessary options to nmblookup.
1926
1927=back
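
The lookup order above can be sketched in shell for experimenting from
the command line.  This is not BackupPC's own code: the function names are
invented, getent stands in for perl's gethostbyname(), and the parser
only recognizes success lines of the "IP name" form shown in the
nmblookup output above.

```shell
# Read nmblookup output on stdin and print the first IP address reported
# for the given host; return nonzero if the lookup found nothing.
parse_nmblookup() {
    awk -v host="$1" '
        # A hit starts with a dotted-quad address followed by the host name
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && index(tolower($2), tolower(host)) == 1 {
            print $1; found = 1; exit
        }
        END { exit found ? 0 : 1 }
    '
}

# Try DNS first, then fall back to a NetBios lookup.
# Needs getent and nmblookup to be installed; both are assumptions here.
find_host_ip() {
    getent hosts "$1" | awk '{ print $1; exit }' | grep . && return 0
    nmblookup "$1" | parse_nmblookup "$1"
}
```

If "find_host_ip myhost" prints nothing, try adding the -B option to the
nmblookup call as described above.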
1928
Hosts that have the DHCP flag set to 1 are
discovered as follows:
1931
1932=over 4
1933
1934=item *
1935
1936A DHCP address pool ($Conf{DHCPAddressRanges}) needs to be specified.
1937BackupPC will check the NetBIOS name of each machine in the range using
1938a command of the form:
1939
1940    nmblookup -A W.X.Y.Z
1941
1942where W.X.Y.Z is each candidate address from $Conf{DHCPAddressRanges}.
1943Any host that has a valid NetBIOS name returned by this command (ie:
1944matching an entry in the hosts file) will be backed up.  You can
1945modify the specific nmblookup command if necessary via $Conf{NmbLookupCmd}.
1946
1947=item *
1948
1949You only need to use this DHCP feature if your client machine doesn't
1950respond to the NetBios multicast request:
1951
1952    nmblookup myHost
1953
1954but does respond to a request directed to its IP address:
1955
1956    nmblookup -A W.X.Y.Z
1957
1958=back
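
To see by hand what this scan would find, the sketch below walks an
address range and runs the same nmblookup probe on each address.  The
function names are hypothetical, and the base/first/last arguments are
only meant to mirror the way a range is described in
$Conf{DHCPAddressRanges} (see config.pl for the exact format).

```shell
# Print every candidate address W.X.Y.first .. W.X.Y.last, one per line.
dhcp_candidates() {
    base=$1 i=$2 last=$3
    while [ "$i" -le "$last" ]; do
        echo "$base.$i"
        i=$((i + 1))
    done
}

# Probe each candidate with "nmblookup -A"; nmblookup must be installed.
scan_dhcp_range() {
    dhcp_candidates "$@" | while read -r addr; do
        echo "== $addr"
        nmblookup -A "$addr"
    done
}
```

For example, "scan_dhcp_range 10.10.1 20 250" probes 10.10.1.20 through
10.10.1.250.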
1959
1960=head2 Other installation topics
1961
1962=over 4
1963
1964=item Removing a client
1965
1966If there is a machine that no longer needs to be backed up (eg: a retired
1967machine) you have two choices.  First, you can keep the backups accessible
1968and browsable, but disable all new backups.  Alternatively, you can
1969completely remove the client and all its backups.
1970
1971To disable backups for a client $Conf{BackupsDisable} can be
1972set to two different values in that client's per-PC config.pl file:
1973
1974=over 4
1975
1976=item 1
1977
1978Don't do any regular backups on this machine.  Manually
1979requested backups (via the CGI interface) will still occur.
1980
1981=item 2
1982
1983Don't do any backups on this machine.  Manually requested
1984backups (via the CGI interface) will be ignored.
1985
1986=back
1987
1988This will still allow the client's old backups to be browsable
1989and restorable.
1990
1991To completely remove a client and all its backups, you should remove its
1992entry in the conf/hosts file, and then delete the __TOPDIR__/pc/$host
1993directory.  Whenever you change the hosts file, you should send
1994BackupPC a HUP (-1) signal so that it re-reads the hosts file.
1995If you don't do this, BackupPC will automatically re-read the
1996hosts file at the next regular wakeup.
1997
1998Note that when you remove a client's backups you won't initially
1999recover much disk space.  That's because the client's files are
2000still in the pool.  Overnight, when BackupPC_nightly next runs,
2001all the unused pool files will be deleted and this will recover
2002the disk space used by the client's backups.
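
The hosts file edit can be scripted.  The sketch below is one way to do it,
assuming the host name is the first whitespace-separated field on each
line of the hosts file and that names are matched exactly; the function
name is made up for this example.

```shell
# Print the hosts file from stdin without the entry for the given host.
# Comment lines and blank lines are passed through unchanged.
hosts_without() {
    awk -v host="$1" '
        /^[ \t]*(#|$)/ { print; next }     # keep comments and blank lines
        $1 != host { print }
    '
}
```

For example, write the filtered file somewhere safe and review it before
replacing the real one:

    hosts_without oldpc < __CONFDIR__/hosts > /tmp/hosts.new

After moving the new file into place you can delete __TOPDIR__/pc/oldpc
and send BackupPC a HUP signal as described above.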
2003
2004=item Copying the pool
2005
2006If the pool disk requirements grow you might need to copy the entire
2007data directory to a new (bigger) file system.  Hopefully you are lucky
2008enough to avoid this by having the data directory on a RAID file system
2009or LVM that allows the capacity to be grown in place by adding disks.
2010
Backups prior to V4 make extensive use of hardlinks.  So unless you have
a pure V4 installation (ie: not a V3 installation upgraded to V4), the
backup data directories will contain large numbers of hardlinks.  This
makes the pool hard to copy: the target directory will occupy a lot more
space if the hardlinks aren't re-established.
2019
2020Unless you have a pure V4 installation, the best way to copy a pool
2021file system, if possible, is by copying the raw device at the block
2022level (eg: using dd).  Application level programs that understand
2023hardlinks include the GNU cp program with the -a option and rsync -H.
2024However, the large number of hardlinks in the pool will make the
2025memory usage large and the copy very slow.  Don't forget to stop
2026BackupPC while the copy runs.
2027
2028If you have a pure V4 installation, copying the pool and PC backup
2029directories should be quite easy.  Rsync 3.x should work well.
2030
2031=back
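
If you are not sure whether a data directory still contains hardlinked
files when choosing how to copy the pool, a quick count can help.  The
function below is only a sketch with a made-up name; on a large pool the
scan itself can take a long time, and file names containing newlines
would be miscounted.

```shell
# Count regular files under a directory that have more than one hard link.
count_hardlinked_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

A result of 0 for __TOPDIR__/pc suggests that a plain rsync copy is
enough; a large number means the hardlinks need to be preserved (eg: by a
block-level copy or rsync -H).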
2032
2033=head2 Fixing installation problems
2034
2035If you find a solution to your problem that could help other users
2036please add it to the Wiki at L<https://github.com/backuppc/backuppc/wiki>.
2037
2038=head1 Restore functions
2039
2040BackupPC supports several different methods for restoring files. The
2041most convenient restore options are provided via the CGI interface.
2042Alternatively, backup files can be restored using manual commands.
2043
2044=head2 CGI restore options
2045
2046By selecting a host in the CGI interface, a list of all the backups
2047for that machine will be displayed.  By selecting the backup number
2048you can navigate the shares and directory tree for that backup.
2049
2050BackupPC's CGI interface automatically fills incremental backups
2051with the corresponding full backup, which means each backup has
2052a filled appearance.  Therefore, there is no need to do multiple
2053restores from the incremental and full backups: BackupPC does all
2054the hard work for you.  You simply select the files and directories
2055you want from the correct backup vintage in one step.
2056
2057You can download a single backup file at any time simply by selecting
2058it.  Your browser should prompt you with the filename and ask you
2059whether to open the file or save it to disk.
2060
2061Alternatively, you can select one or more files or directories in
2062the currently selected directory and select "Restore selected files".
2063(If you need to restore selected files and directories from several
2064different parent directories you will need to do that in multiple
2065steps.)
2066
2067If you select all the files in a directory, BackupPC will replace
2068the list of files with the parent directory.  You will be presented
2069with a screen that has three options:
2070
2071=over 4
2072
2073=item Option 1: Direct Restore
2074
2075With this option the selected files and directories are restored
2076directly back onto the host, by default in their original location.
2077Any old files with the same name will be overwritten, so use caution.
2078You can optionally change the target hostname, target share name,
2079and target path prefix for the restore, allowing you to restore the
2080files to a different location.
2081
2082Once you select "Start Restore" you will be prompted one last time
2083with a summary of the exact source and target files and directories
2084before you commit.  When you give the final go ahead the restore
2085operation will be queued like a normal backup job, meaning that it
2086will be deferred if there is a backup currently running for that host.
2087When the restore job is run, smbclient, tar, rsync or rsyncd is used
2088(depending upon $Conf{XferMethod}) to actually restore the files.
2089Sorry, there is currently no option to cancel a restore that has been
2090started.  Currently ftp restores are not fully implemented.
2091
2092A record of the restore request, including the result and list of
2093files and directories, is kept.  It can be browsed from the host's
2094home page.  $Conf{RestoreInfoKeepCnt} specifies how many old restore
2095status files to keep.
2096
2097Note that for direct restore to work, the $Conf{XferMethod} must
2098be able to write to the client.  For example, that means an SMB
2099share for smbclient needs to be writable, and the rsyncd module
2100needs "read only" set to "false".  This creates additional security
2101risks.  If you only create read-only SMB shares (which is a good
2102idea), then the direct restore will fail.  You can disable the
2103direct restore option by setting $Conf{SmbClientRestoreCmd},
2104$Conf{TarClientRestoreCmd} and $Conf{RsyncRestoreArgs} to undef.
2105
2106=item Option 2: Download Zip archive
2107
2108With this option a zip file containing the selected files and directories
2109is downloaded.  The zip file can then be unpacked or individual files
2110extracted as necessary on the host machine. The compression level can be
2111specified.  A value of 0 turns off compression.
2112
2113When you select "Download Zip File" you should be prompted where to
2114save the restore.zip file.
2115
2116BackupPC does not consider downloading a zip file as an actual
2117restore operation, so the details are not saved for later browsing
2118as in the first case.  However, a mention that a zip file was
2119downloaded by a particular user, and a list of the files, does
2120appear in BackupPC's log file.
2121
2122=item Option 3: Download Tar archive
2123
2124This is identical to the previous option, except a tar file is downloaded
2125rather than a zip file (and there is currently no compression option).
2126
2127=back
2128
2129=head2 Command-line restore options
2130
2131Apart from the CGI interface, BackupPC allows you to restore files
2132and directories from the command line.  The following programs can
2133be used:
2134
2135=over 4
2136
2137=item BackupPC_zcat
2138
2139For each filename argument it inflates (uncompresses) the file and
2140writes it to stdout.  To use BackupPC_zcat you could give it the
2141full filename, eg:
2142
2143    __INSTALLDIR__/bin/BackupPC_zcat __TOPDIR__/pc/host/5/fc/fcraig/fexample.txt > example.txt
2144
2145It's your responsibility to make sure the file is really compressed:
2146BackupPC_zcat doesn't check which backup the requested file is from.
2147BackupPC_zcat returns a nonzero status if it fails to uncompress
2148a file.
2149
2150In V4, BackupPC_zcat can be invoked in several other ways:
2151
2152    BackupPC_zcat file...
2153    BackupPC_zcat MD5_digest...
2154    BackupPC_zcat $TopDir/pc/host/num/share/mangledPath...
2155    BackupPC_zcat [-h host] [-n num] [-s share] clientPath...
2156
2157For example, you can do this:
2158
2159    BackupPC_zcat d73955e08410dfc5ea8069b05d2f43b2
2160
2161That digest can be pasted from the output of BackupPC_ls.
2162
2163The last form uses unmangled paths, so you can do this:
2164
2165    BackupPC_zcat -h HOST -n 10 -s / /home/craig/file
2166
2167You can also mix real paths with unmangled paths.  Both of these versions work:
2168
2169    BackupPC_zcat /data/BackupPC/pc/HOST/10/fhome/fcraig/ffile
2170    BackupPC_zcat /data/BackupPC/pc/HOST/10/home/craig/file
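
To make the relation between the two path forms concrete, here is a
simplified sketch of the mangling: each path component gets an "f"
prefix.  The function name is invented, and real BackupPC mangling also
escapes some special characters, which this sketch does not attempt (so
it does not handle a share named "/").

```shell
# Simplified mangling: prefix every path component with "f".
# (Real BackupPC mangling also escapes some special characters; not handled here.)
mangle_path() {
    printf '%s\n' "$1" | awk -F/ '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue          # skip empty components (leading or doubled slashes)
            out = out (out == "" ? "" : "/") "f" $i
        }
        print out
    }'
}
```

With this, "mangle_path home/craig/file" prints the mangled form used in
the first of the two commands above.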
2171
2172=item BackupPC_tarCreate
2173
2174BackupPC_tarCreate creates a tar file for any files or directories in
2175a particular backup.  Merging of incrementals is done automatically,
2176so you don't need to worry about whether certain files appear in the
2177incremental or full backup.
2178
2179The usage is:
2180
2181    BackupPC_tarCreate [options] files/directories...
2182    Required options:
2183       -h host         host from which the tar archive is created
2184       -n dumpNum      dump number from which the tar archive is created
2185                       A negative number means relative to the end (eg -1
2186                       means the most recent dump, -2 2nd most recent etc).
2187       -s shareName    share name from which the tar archive is created;
2188                       can be "*" to mean all shares.
2189
2190    Other options:
2191       -t              print summary totals
2192       -r pathRemove   path prefix that will be replaced with pathAdd
2193       -p pathAdd      new path prefix
2194       -b BLOCKS       BLOCKS x 512 bytes per record (default 20; same as tar)
2195       -w writeBufSz   write buffer size (default 1048576 = 1MB)
2196       -e charset      charset for encoding filenames (default: value of
2197                       $Conf{ClientCharset} when backup was done)
2198       -l              just print a file listing; don't generate an archive
2199       -L              just print a detailed file listing; don't generate an archive
2200
2201The command-line files and directories are relative to the specified
2202shareName.  The tar file is written to stdout.
2203
2204The -h, -n and -s options specify which dump is used to generate
2205the tar archive.  The -r and -p options can be used to relocate
2206the paths in the tar archive so extracted files can be placed
2207in a location different from their original location.
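
A typical use is to pipe the archive straight into tar.  The helper below
is a sketch of that pattern: it unpacks one path from the most recent
backup (-n -1) into a scratch directory.  The function name and the
BPC_TARCREATE variable are hypothetical; only the -h, -n and -s options
come from the usage text above.

```shell
# Hypothetical helper: unpack a path from the most recent backup (-n -1)
# of a host into a scratch directory.  BPC_TARCREATE is an illustrative
# override so the helper can be pointed at a different program.
restore_latest_to_dir() {
    host=$1 share=$2 path=$3 dest=$4
    mkdir -p "$dest" || return 1
    "${BPC_TARCREATE:-__INSTALLDIR__/bin/BackupPC_tarCreate}" \
        -h "$host" -n -1 -s "$share" "$path" |
        (cd "$dest" && tar -xf -)
}
```

Run as user __BACKUPPCUSER__, for example
"restore_latest_to_dir HOST / /home/craig /tmp/restore".  Extracting into
a scratch directory first lets you inspect the files before copying them
back to the client.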
2208
2209=item BackupPC_zipCreate
2210
2211BackupPC_zipCreate creates a zip file for any files or directories in
2212a particular backup.  Merging of incrementals is done automatically,
2213so you don't need to worry about whether certain files appear in the
2214incremental or full backup.
2215
2216The usage is:
2217
2218    BackupPC_zipCreate [options] files/directories...
2219    Required options:
2220       -h host         host from which the zip archive is created
       -n dumpNum      dump number from which the zip archive is created
2222                       A negative number means relative to the end (eg -1
2223                       means the most recent dump, -2 2nd most recent etc).
2224       -s shareName    share name from which the zip archive is created
2225
2226    Other options:
2227       -t              print summary totals
2228       -r pathRemove   path prefix that will be replaced with pathAdd
2229       -p pathAdd      new path prefix
2230       -c level        compression level (default is 0, no compression)
2231       -e charset      charset for encoding filenames (default: utf8)
2232
2233The command-line files and directories are relative to the specified
2234shareName.  The zip file is written to stdout. The -h, -n and -s
2235options specify which dump is used to generate the zip archive.  The
2236-r and -p options can be used to relocate the paths in the zip archive
2237so extracted files can be placed in a location different from their
2238original location.
2239
2240=item BackupPC_ls
2241
2242In V3, a full (or filled) backup tree contains all the files, albeit with "mangled"
2243names, and the file contents are compressed.  Some users found it convenient to
2244directly navigate a PC's backup tree to check for files.
2245
2246In V4 that is not possible, since only a single attrib file is stored per directory
2247in the PC backup tree, so the directory contents aren't visible without looking in
2248the attrib file.
2249
2250A new utility BackupPC_ls (like "ls") can be used to view PC backup trees.  It shows file digests,
2251which can be pasted to BackupPC_zcat if you want to view the file contents.  The arguments
2252are similar to BackupPC_zcat.  The usage is:
2253
2254    BackupPC_ls [-iR] [-h host] [-n bkupNum] [-s shareName] dirs/files...
2255
2256The -i option will show inodes (inode number and number of links).  The -R option recurses into
2257directories.
2258
2259If you don't specify -h, -n and -s, then you can specify the real file system path instead.
2260For example, the following three commands are equivalent:
2261
2262    BackupPC_ls -h HOST -n 10 -s cDrive /home/craig/file
2263    BackupPC_ls /data/BackupPC/pc/HOST/10/fcDrive/fhome/fcraig/ffile
2264    BackupPC_ls /data/BackupPC/pc/HOST/10/cDrive/home/craig/file
2265
2266As you can see, the portion of the full path after the backup number can
2267be either mangled or not.  Note that using the mangled form allows directory-name
2268completion via the shell, since those directories actually exist.
2269
2270It would be great if someone would like to volunteer to add features to BackupPC_ls
2271to make file and directory completion work with unmangled names via the shell.  In
2272tcsh you can specify a completion program to run - BackupPC_ls could be given special
2273arguments to spit out the potential (unmangled) completions.  I'm not sure how bash
2274does this.
2275
2276=back
2277
Each of these programs resides in __INSTALLDIR__/bin.
2279
2280=head1 Archive functions
2281
2282BackupPC supports archiving to removable media. For users that require
2283offsite backups, BackupPC can create archives that stream to tape
2284devices, or create files of specified sizes to fit onto cd or dvd media.
2285
2286Each archive type is specified by a BackupPC host with its XferMethod
2287set to 'archive'. This allows for multiple configurations at sites where
2288there might be a combination of tape and cd/dvd backups being made.
2289
2290BackupPC provides a menu that allows one or more hosts to be archived.
2291The most recent backup of each host is archived using BackupPC_tarCreate,
2292and the output is optionally compressed and split into fixed-sized
2293files (eg: 650MB).
2294
2295The archive for each host is done by default using
2296__INSTALLDIR__/bin/BackupPC_archiveHost.  This script can be copied
2297and customized as needed.
2298
2299=head2 Configuring an Archive Host
2300
2301To create an Archive Host, add it to the hosts file just as any other host
2302and call it a name that best describes the type of archive, e.g. ArchiveDLT
2303
To tell BackupPC that the host is for archives, create a config.pl file in
the Archive Host's pc directory, adding the following line:

    $Conf{XferMethod} = 'archive';
2308
To further customize the archive's parameters, add the changed
parameters to the host's config.pl file. The parameters are explained in
the config.pl file.  Parameters may be fixed, or the user can be allowed
to change them (eg: output device).
2313
2314The per-host archive command is $Conf{ArchiveClientCmd}.  By default
2315this invokes
2316
2317     __INSTALLDIR__/bin/BackupPC_archiveHost
2318
2319which you can copy and customize as necessary.
2320
2321=head2 Starting an Archive
2322
In the web interface, click on the Archive Host you wish to use. You will see a
list of previous archives and a summary of each. Clicking the "Start Archive"
button presents the list of hosts and the approximate backup size of each
(note this is the raw size, not the projected compressed size). Select the hosts
you wish to archive and press the "Archive Selected Hosts" button.
2328
The next screen allows you to adjust the parameters for this archive run.
Press the "Start the Archive" button to start archiving the selected hosts
with the parameters displayed.
2332
2333=head2 Starting an Archive from the command line
2334
2335The script BackupPC_archiveStart can be used to start an archive from
2336the command line (or cron etc).  The usage is:
2337
2338    BackupPC_archiveStart archiveHost userName hosts...
2339
2340This creates an archive of the most recent backup of each of
2341the specified hosts.  The first two arguments are the archive
2342host and the username making the request.
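
For scripted use (for example from a cron wrapper), the command line can be assembled programmatically.  The Python sketch below is only an illustration of the argument order given in the usage line.  The install directory /usr/local/BackupPC, the user name backuppc and the host names are hypothetical placeholders, and the helper names are not part of BackupPC.

```python
import subprocess


def archive_start_argv(install_dir, archive_host, user_name, hosts):
    """Build the BackupPC_archiveStart command line as an argument list."""
    if not hosts:
        raise ValueError("at least one host to archive is required")
    program = f"{install_dir}/bin/BackupPC_archiveStart"
    return [program, archive_host, user_name, *hosts]


def start_archive(argv):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(argv, check=True)


# Hypothetical values: adjust the install directory, user and hosts.
argv = archive_start_argv("/usr/local/BackupPC", "ArchiveDLT", "backuppc",
                          ["host1", "host2"])
print(" ".join(argv))
```

Passing the arguments as a list avoids shell quoting problems with unusual hostnames.  Call start_archive(argv) as the BackupPC user to actually submit the request.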
2343
2344=head1 Other Command Line Utilities
2345
Most of these utilities are run automatically by BackupPC when needed, so
you normally don't need to run them manually.
2348
2349=over
2350
2351=item BackupPC_attribPrint
2352
2353BackupPC_attribPrint prints the contents of an attrib file.  Usage:
2354
2355	BackupPC_attribPrint attribPath
2356	BackupPC_attribPrint inodePath/inodeNum
2357
2358=item BackupPC_backupDelete
2359
2360BackupPC_backupDelete deletes an entire backup, or a directory path within a backup.  Usage:
2361
2362    BackupPC_backupDelete -h host -n num [-p] [-l] [-r] [-s shareName [dirs...]]
2363    Options:
2364       -h host         hostname
2365       -n num          backup number to delete
2366       -s shareName    don't delete the backup; delete just this share
2367                       (or only dirs below this share if specified)
2368       -p              don't print progress information
2369       -l              don't remove XferLOG files
2370       -r              do a ref count update (default: none)
2371    If a shareName is specified, just that share (or share/dirs) are deleted.
2372    The backup itself is not deleted, nor is the log file removed.
2373
2374=item BackupPC_backupDuplicate
2375
2376BackupPC_backupDuplicate duplicates the last backup, which is used to create a filled backup
2377copy, and also to convert a V3 backup to a new V4 starting point.  Usage:
2378
2379    BackupPC_backupDuplicate -h host [-p]
2380    Options:
2381       -h host         hostname
2382       -p              don't print progress information
2383
2384=item BackupPC_fixupBackupSummary
2385
2386BackupPC_fixupBackupSummary is used to re-create the backups file for all the hosts if it
2387is damaged or deleted.  Usage:
2388
2389    BackupPC_fixupBackupSummary [-l]
2390    Options:
2391      -l    legacy mode: try to reconstruct backups from LOG
2392            files for backups prior to BackupPC v3.0.
2393
2394=item BackupPC_fsck
2395
2396BackupPC_fsck can only be run manually, and only while BackupPC isn't running.  It updates
2397the host reference counts, the overall pool reference counts and stats.  Usage:
2398
2399    BackupPC_fsck [options]
2400    Options:
2401       -f              force regeneration of per-host reference counts
2402       -n              don't remove zero count pool files - print only
2403       -s              recompute pool stats
2404
2405=item BackupPC_migrateV3toV4
2406
If you upgraded an existing 3.x installation, BackupPC 4.x is backward compatible with 3.x backups:
it can browse, view and restore files.  However, the existing 3.x backups will continue to use
hardlinks for storage until they eventually expire.
2411
BackupPC_migrateV3toV4 is an optional utility that can migrate existing 3.x backups to 4.x storage
2413format, eliminating hardlinks.  This allows you to eliminate the old V3 pool and you can then
2414set $Conf{PoolV3Enabled} to 0.
2415
2416    BackupPC_migrateV3toV4 -a [-m] [-p] [-v]
2417    BackupPC_migrateV3toV4 -h host [-n V3backupNum] [-m] [-p] [-v]
2418    Options:
2419       -a              migrate all hosts and all backups
2420       -h host         migrate just a specific host
       -n V3backupNum  migrate specific host backup; does all V3 backups
                       for that host if not specified
2423       -m              don't migrate anything; just print what would be done
2424       -p              don't print progress information
2425       -v              verbose
2426
2427The BackupPC server should not be running when you run BackupPC_migrateV3toV4.
2428It will check and exit if the BackupPC server is running.
2429
2430If you want to test BackupPC_migrateV3toV4, a cautious approach is to make
2431backup copies of the V3 backups, allowing you to restore them if there is
2432any issue.  For example, if exampleHost has three 3.x backups numbered 5,
24336, 7, you can use cp -prl (preserving hardlinks) to make copies:
2434
2435    cd /data/BackupPC/pc/exampleHost
2436    mv 5 5.orig ; cp -prl 5.orig 5
2437    mv 6 6.orig ; cp -prl 6.orig 6
2438    mv 7 7.orig ; cp -prl 7.orig 7
2439    cp backups backups.save
2440
2441    BackupPC_migrateV3toV4 -h exampleHost -n 5
2442    BackupPC_migrateV3toV4 -h exampleHost -n 6
2443    BackupPC_migrateV3toV4 -h exampleHost -n 7
2444
2445If you want to put things back the way they were:
2446
2447    rm -rf 5 ; mv 5.orig 5
2448    rm -rf 6 ; mv 6.orig 6
2449    rm -rf 7 ; mv 7.orig 7
2450    # copy the [567] lines from backups.save into backups;
2451    # only do "cp backups.save backups" if you are sure no
2452    # new backups have been done
2453
There are two important things to note about BackupPC_migrateV3toV4.  First, V4
2455storage does use more filesystem inodes than V3 (that's the small cost
2456of getting rid of hardlinks).  In particular, each directory in a backup
2457tree uses two inodes in V4 (one for the directory, and one for the (empty)
2458attrib file), and only one inode in V3 (one for the directory, and the
2459attrib and all other files are hardlinked to the pool).  So before you run
2460BackupPC_migrateV3toV4, make sure you have enough inodes in __TOPDIR__;
2461use df -i to make sure you are under 45% inode usage.
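
The inode check can also be scripted.  The following Python sketch approximates what df -i reports by reading the filesystem counters with os.statvfs (Unix only).  The function names are hypothetical, the 45% limit is the figure given above, and "." is used only as a stand-in for __TOPDIR__ so the example runs anywhere.

```python
import os


def percent_used(total, free):
    """Percentage of inodes in use, given total and free inode counts."""
    return 100.0 * (total - free) / total


def inode_usage_percent(path):
    """Inode usage of the filesystem holding path, or None if the
    filesystem does not report an inode total."""
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    return percent_used(st.f_files, st.f_ffree)


def safe_to_migrate(path, limit=45.0):
    """True only when inode usage is known and below the limit."""
    pct = inode_usage_percent(path)
    return pct is not None and pct < limit


# Replace "." with your __TOPDIR__ when checking a real installation.
usage = inode_usage_percent(".")
print("inode usage:", usage, "safe to migrate:", safe_to_migrate("."))
```

Some filesystems allocate inodes dynamically and report a total of zero; the sketch returns None in that case rather than guessing.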
2462
Second, if you run BackupPC_migrateV3toV4 on all your backups, the
2464old V3 pool should be empty, except for old-style attrib files, which
2465should all have only one link since no backups should reference them any
2466longer.  Before you turn off the V3 pool by setting $Conf{PoolV3Enabled}
2467to 0, make sure BackupPC_nightly has run enough times (specifically,
2468$Conf{PoolSizeNightlyUpdatePeriod} times) so that the V3 pool can be
2469emptied.  You could do this manually, but only if you are very careful
2470to check that the remaining files only have one link.
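
One way to perform that check is to walk the old pool and report every file that still has more than one hard link.  The Python sketch below is a hedged illustration, not a BackupPC tool: the function name is hypothetical, and the demonstration runs on a scratch directory rather than a real pool.

```python
import os
import tempfile


def files_with_extra_links(pool_dir):
    """Yield (path, link_count) for files below pool_dir that still have
    more than one hard link, i.e. something else still references them."""
    for dirpath, _dirnames, filenames in os.walk(pool_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            link_count = os.lstat(full).st_nlink
            if link_count > 1:
                yield full, link_count


# Demonstration on a scratch directory (not a real pool).
with tempfile.TemporaryDirectory() as scratch:
    single = os.path.join(scratch, "single")
    shared = os.path.join(scratch, "shared")
    for path in (single, shared):
        with open(path, "w") as fh:
            fh.write("x")
    # A second hard link simulates a V3 backup that still uses the file.
    os.link(shared, os.path.join(scratch, "shared_link"))
    demo = sorted(os.path.basename(p) for p, _n in files_with_extra_links(scratch))
    print(demo)
```

If the generator yields nothing for the V3 pool directories, no remaining file is referenced by a backup through a hard link.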
2471
2472=item BackupPC_poolCntPrint
2473
2474BackupPC_poolCntPrint is used to print reference count information, either per-backup,
2475per-host or for the entire pool depending on the file path you use.
2476
2477If you provide a hex md5 digest, the entire pool count for that digest is printed.
2478Usage:
2479
2480    BackupPC_poolCntPrint [poolCntFilePath|hexDigest]...
2481
2482=item BackupPC_refCountUpdate
2483
2484BackupPC_refCountUpdate is used to either update the per-backup and
2485per-host reference counts, or the system-wide reference counts. It
2486is used by BackupPC_dump, BackupPC_nightly, BackupPC_backupDelete,
2487BackupPC_backupDuplicate and BackupPC_fsck.  Usage:
2488
2489    BackupPC_refCountUpdate -h HOST [-c] [-f] [-F] [-o N] [-p] [-v]
2490        With no other args, updates count db on backups with poolCntDelta files
        and computes the host's total reference counts.  Also builds refCnt for
2492        any >=4.0 backups without refCnts.
2493          -f     - do an fsck on this HOST, which involves a rebuild of the
2494                   last two backup refCnts.  poolCntDelta files are ignored.
2495                   Also forces fsck if requested by needFsck flag files
2496                   in TopDir/pc/HOST/refCnt.  Equivalent to -o 2.
2497          -F     - rebuild all the >=4.0 per-backup refCnt files for this
2498                   host.  Equivalent to -o 3.
2499          -c     - compare current count db to new db before replacing
2500          -o N   - override $Conf{RefCntFsck}.
2501          -p     - don't show progress
2502          -v     - verbose
2503      Notes: in case there are legacy (ie: <=4.0.0alpha3) unapplied poolCntDelta
2504      files in TopDir/pc/HOST/refCnt then the -f flag is turned on.
2505
2506    BackupPC_refCountUpdate -m [-f] [-p] [-c] [-r N-M] [-s] [-v] [-P phase]
2507          -m       Updates main count db, based on each HOST
2508          -f     - do an fsck on all the hosts, ignoring poolCntDelta files,
2509                   and replacing each host's count db.  Will wait for backups
2510                   to finish if any are running.
2511          -F     - rebuild all the >=4.0 per-backup refCnt files.
2512          -p     - don't show progress
2513          -c     - clean pool files
2514          -r N-M - process a subset of the main count db, 0 <= N <= M <= 255
2515          -s     - prints stats
2516          -v     - verbose
2517          -P phase Phase from 0..15 each time we run BackupPC_nightly.  Used
2518                   to compute exact pool size for portions of the pool based
2519                   on the phase and $Conf{PoolSizeNightlyUpdatePeriod}.
2520
2521=back
2522
2523=head1 Other CGI Functions
2524
2525=head2 Configuration and Host Editor
2526
2527The CGI interface has a complete configuration and host editor.
2528Only the administrator can edit the main configuration settings
2529and hosts.  The edit links are in the left navigation bar.
2530
2531When changes are made to any parameter a "Save" button appears
2532at the top of the page.  If you are editing a text box you will
2533need to click outside of the text box to make the Save button
2534appear.  If you don't select Save then the changes won't be saved.
2535
2536The host-specific configuration can be edited from the host
2537summary page using the link in the left navigation bar.
2538The administrator can edit any of the host-specific
2539configuration settings.
2540
2541When editing the host-specific configuration, each parameter has
2542an "override" setting that denotes the value is host-specific,
2543meaning that it overrides the setting in the main configuration.
If you deselect "override" then the setting is removed from
the host-specific configuration, and the value from the main
configuration file is displayed.
2547
Users can edit their host-specific configuration if enabled
via $Conf{CgiUserConfigEditEnable}.  The specific subset
of configuration settings that a user can edit is specified
with $Conf{CgiUserConfigEdit}.  It is recommended to make this
list as short as possible (you probably don't want your users saving
dozens of backups) and it is essential that they can't edit any
of the Cmd configuration settings, otherwise they can specify
an arbitrary command that will be executed as the BackupPC
user.
2557
2558=head2 RSS
2559
2560BackupPC supports a very basic RSS feed.  Provided you have the
2561XML::RSS perl module installed, a URL similar to this will
2562provide RSS information:
2563
2564    http://localhost/cgi-bin/BackupPC/BackupPC_Admin?action=rss
2565
2566This feature is experimental.  The information included will
2567probably change.
2568
2569=head1 BackupPC Design
2570
2571=head2 Some design issues
2572
2573=over 4
2574
2575=item Pooling common files
2576
To see if a file is already in the pool, an MD5 digest of the file
contents is used.  A matching digest can't guarantee that two files are
identical: it just narrows the search, often to a single file or a
handful of files.
2580
2581Depending on the Xfer method and settings, a complete file comparison
2582is done to verify if two files are really the same.
2583
Prior to V4, identical files on multiple backups are represented
2585by hard links.  Hardlinks are used so that identical files all refer
2586to the same physical file on the server's disk. Also, hard links
2587maintain reference counts so that BackupPC knows when to delete
2588unused files from the pool.
2589
2590In V4+, hardlinks are not used and reference counting is done at the
2591application level.  It is done in a batch manner, which simplifies
2592the implementation.
2593
2594For the computer-science majors among you, you can think of the pooling
2595system used by BackupPC as just a chained hash table stored on a (big)
2596file system.
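
The chained hash table analogy can be made concrete with a small in-memory model.  The Python sketch below is an illustration only: the class and attribute names are hypothetical, contents are held in memory instead of on disk, and reference counts are updated immediately for brevity, whereas V4 applies them in batches.

```python
import hashlib
from collections import defaultdict


class Pool:
    """Toy pool: each MD5 digest maps to a chain of distinct contents."""

    def __init__(self):
        self.chains = defaultdict(list)     # digest -> list of file contents
        self.ref_counts = defaultdict(int)  # (digest, index) -> references

    def add(self, data):
        digest = hashlib.md5(data).hexdigest()
        chain = self.chains[digest]
        for index, existing in enumerate(chain):
            if existing == data:   # full comparison: a digest alone is no proof
                break
        else:
            chain.append(data)     # no match in the chain: store a new entry
            index = len(chain) - 1
        self.ref_counts[(digest, index)] += 1
        return digest, index


pool = Pool()
first = pool.add(b"same contents")
second = pool.add(b"same contents")   # e.g. the same file on another PC
other = pool.add(b"different contents")
print(first == second, pool.ref_counts[first])
```

An entry whose count drops to zero is no longer referenced by any backup and can be removed from the pool.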
2597
2598=item The hashing function
2599
2600In V4+, the file digest is the MD5 digest of the complete file.
2601While MD5 collisions are now well known, and can be easily constructed,
2602in real use collisions will be extremely unlikely.
2603
Prior to V4, just a portion of all but the smallest files was used
for the digest.  That decision was made long ago when CPUs were a
lot slower.  For files less than 256K, the digest is the MD5 digest
of the file size and the full file.  For files up to 1MB, the file size
and the first and last 128K of the file are used.  For files over 1MB,
the file size and the first and eighth 128K chunks are used.
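
The pre-V4 rule can be written out as a short Python sketch.  It follows the size thresholds described above, but it is not the original implementation: the function name is hypothetical, and feeding the file size to MD5 as decimal text is an assumption made for illustration.

```python
import hashlib

CHUNK = 128 * 1024         # 128K
SMALL_LIMIT = 256 * 1024   # below this, the whole file is digested
BIG_LIMIT = 1024 * 1024    # 1MB


def v3_style_digest(data):
    """Partial-file digest following the pre-V4 description."""
    size = len(data)
    md5 = hashlib.md5()
    md5.update(str(size).encode("ascii"))  # assumption: size as decimal text
    if size < SMALL_LIMIT:
        md5.update(data)                   # small file: full contents
    else:
        md5.update(data[:CHUNK])           # first 128K
        # Last 128K up to 1MB; beyond that, the eighth 128K chunk.
        start = min(size, BIG_LIMIT) - CHUNK
        md5.update(data[start:start + CHUNK])
    return md5.hexdigest()


# Two 2MB files that differ only in their last byte get the same digest.
big_a = bytes(2 * 1024 * 1024)
big_b = big_a[:-1] + b"\x01"
print(v3_style_digest(big_a) == v3_style_digest(big_b))
```

This is why pre-V4 digests collide more often for large files, and why a full comparison is still needed after a digest match.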
2610
2611=item Compression
2612
2613BackupPC supports compression. It uses the deflate and inflate methods
2614in the Compress::Zlib module, which is based on the zlib compression
2615library (see L<http://www.gzip.org/zlib/>).
2616
2617The $Conf{CompressLevel} setting specifies the compression level to use.
2618Zero (0) means no compression. Compression levels can be from 1 (least
2619cpu time, slightly worse compression) to 9 (most cpu time, slightly
2620better compression). The recommended value is 3. Changing it to 5, for
2621example, will take maybe 20% more cpu time and will get another 2-3%
2622additional compression.  Diminishing returns set in above 5.  See the zlib
2623documentation for more information about compression levels.
2624
2625BackupPC implements compression with minimal CPU load. Rather than
2626compressing every incoming backup file and then trying to match it
2627against the pool, BackupPC computes the MD5 digest based on the
2628uncompressed file, and matches against the candidate pool files by
2629comparing each uncompressed pool file against the incoming backup file.
2630Since inflating a file takes roughly a factor of 10 less CPU time than
2631deflating there is a big saving in CPU time.
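
The matching strategy can be sketched in Python with the standard zlib module.  This is a simplified model, not BackupPC's code: the store function and the dictionary-based pool are hypothetical, and plain zlib streams are used rather than BackupPC's own compressed file format.

```python
import hashlib
import zlib


def store(pool, data, level=3):
    """Add data to a compressed pool; return (digest, was_new)."""
    digest = hashlib.md5(data).hexdigest()    # digest of the uncompressed data
    chain = pool.setdefault(digest, [])
    for compressed in chain:
        # Inflate each candidate and compare with the incoming data.
        if zlib.decompress(compressed) == data:
            return digest, False              # match: nothing to deflate
    chain.append(zlib.compress(data, level))  # miss: deflate exactly once
    return digest, True


pool = {}
payload = b"backup data " * 1000
digest, new_first = store(pool, payload)
_, new_second = store(pool, payload)
print(new_first, new_second, len(pool[digest][0]) < len(payload))
```

Only the first call pays for compression; the second call finds the match by inflating the existing pool entry.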
2632
2633The combination of pooling common files and compression can yield
2634a factor of 8 or more overall saving in backup storage.
2635
Note that you should not turn compression on and off after you have
started running BackupPC.  It will result in double the storage needs,
since all the files will be stored in both the compressed and uncompressed
pools.
2640
2641=back
2642
2643=head2 BackupPC operation
2644
2645BackupPC reads the configuration information from
2646__CONFDIR__/config.pl. It then runs and manages all the backup
2647activity. It maintains queues of pending backup requests, user backup
2648requests and administrative commands. Based on the configuration various
2649requests will be executed simultaneously.
2650
2651As specified by $Conf{WakeupSchedule}, BackupPC wakes up periodically
to queue backups on all the PCs.  This is a three step process:
2653
2654=over 4
2655
2656=item 1
2657
For each host and DHCP address, backup requests are queued on the
background command queue.
2660
2661=item 2
2662
2663For each PC, BackupPC_dump is forked. Several of these may be run in
2664parallel, based on the configuration. First a ping is done to see if
2665the machine is alive. If this is a DHCP address, nmblookup is run to
2666get the netbios name, which is used as the hostname. If DNS lookup
2667fails, $Conf{NmbLookupFindHostCmd} is run to find the IP address from
2668the hostname.  The file __TOPDIR__/pc/$host/backups is read to decide
2669whether a full or incremental backup needs to be run. If no backup is
2670scheduled, or the ping to $host fails, then BackupPC_dump exits.
2671
The backup is done using the specified XferMethod.  Depending on the method,
either samba's smbclient or tar over ssh/rsh/nfs is run and piped into
BackupPC_tarExtract, or rsync is run over ssh/rsh, or a connection is made
to rsyncd.  The incoming data is extracted to __TOPDIR__/pc/$host/new.
The XferMethod output is put into __TOPDIR__/pc/$host/XferLOG.
2677
2678The letter in the XferLOG file shows the type of object, similar to the
2679first letter of the modes displayed by ls -l:
2680
2681    d -> directory
2682    l -> symbolic link
2683    b -> block special file
2684    c -> character special file
2685    p -> pipe file (fifo)
2686    nothing -> regular file
2687
2688The words mean:
2689
2690=over 4
2691
2692=item create
2693
2694new for this backup (ie: directory or file not in pool)
2695
2696=item pool
2697
2698found a match in the pool
2699
2700=item same
2701
2702file is identical to previous backup (contents were
2703checksummed and verified during full dump).
2704
2705=item skip
2706
2707file skipped in incremental because attributes are the
2708same (only displayed if $Conf{XferLogLevel} >= 2).
2709
2710=back
2711
2712As BackupPC_tarExtract extracts the files from smbclient or tar, or as
2713rsync or ftp runs, it checks each file in the backup to see if it is
2714identical to an existing file from any previous backup of any PC. It
does this without needing to write the file to disk. If the file matches
2716an existing file, a hardlink is created to the existing file in the
2717pool. If the file does not match any existing files, the file is written
2718to disk and inserted into the pool.
2719
2720BackupPC_tarExtract and rsync can handle arbitrarily large files
2721and multiple candidate matching files without needing to write the
2722file to disk in the case of a match.  This significantly reduces
2723disk writes (and also reads, since the pool file comparison is done
2724disk to memory, rather than disk to disk).
2725
2726Based on the configuration settings, BackupPC_dump checks each
2727old backup to see if any should be removed.
2728
2729=item 3
2730
2731Once each night, BackupPC_nightly is run to complete some additional
2732administrative tasks, such as cleaning the pool.  This involves
2733removing any files in the pool that only have a single hard link
2734(meaning no backups are using that file).
2735
2736If BackupPC_nightly takes too long to run, the settings
2737$Conf{MaxBackupPCNightlyJobs} and $Conf{BackupPCNightlyPeriod} can
2738be used to run several BackupPC_nightly processes in parallel, and
2739to split its job over several nights.
2740
2741=back
2742
2743BackupPC also listens for TCP connections on $Conf{ServerPort}, which
2744is used by the CGI script BackupPC_Admin for status reporting and
2745user-initiated backup or backup cancel requests.
2746
2747=head2 Storage layout
2748
2749BackupPC resides in several directories:
2750
2751=over 4
2752
2753=item __INSTALLDIR__
2754
2755Perl scripts comprising BackupPC reside in __INSTALLDIR__/bin,
2756libraries are in __INSTALLDIR__/lib and documentation
2757is in __INSTALLDIR__/doc.
2758
2759=item __CGIDIR__
2760
2761The CGI script BackupPC_Admin resides in this cgi binary directory.
2762
2763=item __CONFDIR__
2764
2765All the configuration information resides below __CONFDIR__.
2766This directory contains:
2769
2770=over 4
2771
2772=item config.pl
2773
2774Configuration file. See L<Configuration File> below for more details.
2775
2776=item hosts
2777
Hosts file, which lists all the PCs to back up.
2779
2780=item pc
2781
2782The directory __CONFDIR__/pc contains per-client configuration files
2783that override settings in the main configuration file.  Each file
2784is named __CONFDIR__/pc/HOST.pl, where HOST is the hostname.
2785
2786In pre-FHS versions of BackupPC these files were located in
2787__TOPDIR__/pc/HOST/config.pl.
2788
2789=back
2790
2791=item __LOGDIR__
2792
2793The directory __LOGDIR__ (__TOPDIR__/log on pre-FHS versions
2794of BackupPC) contains:
2795
2796=over 4
2797
2798=item LOG
2799
2800Current (today's) log file output from BackupPC.
2801
2802=item LOG.0 or LOG.0.z
2803
2804Yesterday's log file output.  Log files are aged daily and compressed
2805(if compression is enabled), and old LOG files are deleted.
2806
2807=item status.pl
2808
2809A summary of BackupPC's status written periodically by BackupPC so
2810that certain state information can be maintained if BackupPC is
2811restarted.  Should not be edited.
2812
2813=item UserEmailInfo.pl
2814
2815A summary of what email was last sent to each user, and when the
2816last email was sent.  Should not be edited.
2817
2818=back
2819
2820=item __RUNDIR__
2821
2822The directory __RUNDIR__ (__TOPDIR__/log on pre-FHS versions
2823of BackupPC) contains:
2824
2825=over 4
2826
2827=item BackupPC.pid
2828
2829Contains BackupPC's process id.
2830
2831=item BackupPC.sock
2832
2833A unix domain socket for communicating to the BackupPC server.
2834
2835=back
2836
2837=item __TOPDIR__
2838
2839All of BackupPC's data (PC backup images, logs, configuration information)
2840is stored below this directory.
2841
2842Below __TOPDIR__ are several directories:
2843
2844=over 4
2845
2846=item __TOPDIR__/pool
2847
2848All uncompressed files from PC backups are stored below __TOPDIR__/pool.
2849Each file's name is based on the MD5 hex digest of the file contents.
2850
For V4+, the digest is the MD5 digest of the full file contents (the length
is not used).  For V4+ the pool files are stored in a 2 level tree, using
7 bits from the top of each of the first two bytes of the digest.  So there
are 128 directories at each level, numbered evenly in hex from 0x00, 0x02,
up to 0xfe.
2855
2856For example, if a file has an MD5 digest of 123456789abcdef0123456789abcdef0,
2857the uncompressed file is stored in __TOPDIR__/pool/12/34/123456789abcdef0123456789abcdef0.
2858
Duplicate digests are represented with one (or more) hex byte extensions.
2860So three colliding files would be stored as
2861
2862	__TOPDIR__/pool/12/34/123456789abcdef0123456789abcdef0
2863	__TOPDIR__/pool/12/34/123456789abcdef0123456789abcdef000
2864	__TOPDIR__/pool/12/34/123456789abcdef0123456789abcdef001
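
The V4 path rule can be expressed as a small Python function.  This is a sketch based on the description above (the top 7 bits of each of the first two digest bytes), not code taken from BackupPC.  The function name and its parameters are hypothetical, and /data/BackupPC stands in for __TOPDIR__.

```python
def v4_pool_path(top_dir, digest_hex, compressed=False, ext=None):
    """Return the pool path for a 32-character hex MD5 digest.

    ext is the optional one-byte extension that tells apart files whose
    digests collide (None for the first file with that digest).
    """
    first = int(digest_hex[0:2], 16) & 0xFE    # top 7 bits of byte 0
    second = int(digest_hex[2:4], 16) & 0xFE   # top 7 bits of byte 1
    name = digest_hex if ext is None else f"{digest_hex}{ext:02x}"
    pool = "cpool" if compressed else "pool"
    return f"{top_dir}/{pool}/{first:02x}/{second:02x}/{name}"


digest = "123456789abcdef0123456789abcdef0"
for extension in (None, 0, 1):
    print(v4_pool_path("/data/BackupPC", digest, ext=extension))
```

Masking with 0xFE clears the lowest bit, so a digest starting with 13 35 lands in the same 12/34 directory as one starting with 12 34.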
2865
2866The rest of this section describes the old pool layout.  Note that both V3 and V4
2867pools can exist together, since they use different names for their directory trees.
2868
As explained earlier, prior to V4 the digest is computed as follows.
2870For files less than 256K, the file length and the entire
2871file is used. For files up to 1MB, the file length and the first and
2872last 128K are used. Finally, for files longer than 1MB, the file length,
2873and the first and eighth 128K chunks for the file are used.
2874
BackupPC_dump (actually, BackupPC_tarExtract or rsync_bpc) is
responsible for checking newly backed up files against the pool. For
each file, the MD5 digest is used to generate a filename in the pool
directory.
2879
2880If the file exists in the pool, the contents are compared.
2881If there is no match, additional files in the chain are checked (if any).
2882(Actually, multiple candidate files are compared in parallel.)
2883
2884If $Conf{PoolV3Enabled} is set, then the V3 pool is checked
2885if there are no matches in the V4 pool.  If a V3 file matches, it is
simply moved (renamed) to the V4 pool with its new filename based on
2887the V4 digest.  That still allows the V3 backups to be browsed etc, since
2888those backups are still based on hardlinks.
2889
2890If the file contents exactly match, a reference count is incremented.
2891Otherwise, the file is added to the pool by using an atomic link operation,
2892followed by unlinking the temporary file.
2893
2894One other issue: zero length files are not pooled, since there are a lot
2895of these files and on most file systems it doesn't save any disk space
2896to turn these files into hard links.
2897
2898Prior to V4, each pool file is stored in a subdirectory X/Y/Z, where X,
2899Y, Z are the first 3 hex digits of the MD5 digest.
2900
2901For example, if a file has an MD5 digest of 123456789abcdef0123456789abcdef0,
2902the file is stored in __TOPDIR__/pool/1/2/3/123456789abcdef0123456789abcdef0.
2903
2904The MD5 digest might not be unique (especially since not all the file's
2905contents are used for files bigger than 256K). Different files that have
2906the same MD5 digest are stored with a trailing suffix "_n" where n is
2907an incrementing number starting at 0. So, for example, if two additional
2908files were identical to the first, except the last byte was different,
2909and assuming the file was larger than 1MB (so the MD5 digests are the
2910same but the files are actually different), the three files would be
2911stored as:
2912
2913	__TOPDIR__/pool/1/2/3/123456789abcdef0123456789abcdef0
2914	__TOPDIR__/pool/1/2/3/123456789abcdef0123456789abcdef0_0
2915	__TOPDIR__/pool/1/2/3/123456789abcdef0123456789abcdef0_1
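
For comparison with the V4 layout, the pre-V4 naming rule can be sketched the same way in Python.  The function name and the collision parameter are hypothetical, and /data/BackupPC stands in for __TOPDIR__.

```python
def v3_pool_path(top_dir, digest_hex, collision=None):
    """Return the pre-V4 pool path for a hex MD5 digest.

    collision is None for the first file with this digest, otherwise the
    number used in the trailing "_n" suffix (starting at 0).
    """
    x, y, z = digest_hex[0], digest_hex[1], digest_hex[2]
    suffix = "" if collision is None else f"_{collision}"
    return f"{top_dir}/pool/{x}/{y}/{z}/{digest_hex}{suffix}"


digest = "123456789abcdef0123456789abcdef0"
for n in (None, 0, 1):
    print(v3_pool_path("/data/BackupPC", digest, n))
```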
2916
2917=item __TOPDIR__/cpool
2918
2919All compressed files from PC backups are stored below __TOPDIR__/cpool.
2920Its layout is the same as __TOPDIR__/pool, and the hashing function
2921is the same (and, importantly, based on the uncompressed file, not
2922the compressed file).
2923
2924=item __TOPDIR__/pc/$host
2925
2926For each PC $host, all the backups for that PC are stored below
2927the directory __TOPDIR__/pc/$host.  This directory contains the
2928following files:
2929
2930=over 4
2931
2932=item LOG
2933
2934Current log file for this PC from BackupPC_dump.
2935
2936=item LOG.MMYYYY or LOG.MMYYYY.z
2937
2938Last month's log file.  Log files are aged monthly and compressed
2939(if compression is enabled), and old LOG files are deleted.
2940In earlier versions of BackupPC these files used to have
2941a suffix of 0, 1, ....
2942
2943=item XferERR or XferERR.z
2944
2945Output from the transport program (ie: smbclient, tar, rsync or ftp)
2946for the most recent failed backup.
2947
2948=item XferLOG or XferLOG.z
2949
2950Output from the transport program (ie: smbclient, tar, rsync or ftp)
2951for the current backup.
2952
2953=item nnn (an integer)
2954
Backups are in directories numbered sequentially starting at 0.  Below
each backup directory, the inodes are stored in nnn/inode and the
reference counts for this backup are stored in nnn/refCnt.
2958
2959=item refCnt
2960
2961The host's reference count database is stored below the refCnt directory.
2962
2963=item XferLOG.nnn or XferLOG.nnn.z
2964
2965Output from the transport program (ie: smbclient, tar, rsync or ftp)
2966corresponding to backup number nnn.
2967
2968=item RestoreInfo.nnn
2969
2970Information about restore request #nnn including who, what, when, and
2971why. This file is in Data::Dumper format.  (Note that the restore
2972numbers are not related to the backup number.)
2973
2974=item RestoreLOG.nnn.z
2975
2976Output from smbclient, tar or rsync during restore #nnn.  (Note that the restore
2977numbers are not related to the backup number.)
2978
2979=item ArchiveInfo.nnn
2980
2981Information about archive request #nnn including who, what, when, and
2982why. This file is in Data::Dumper format.  (Note that the archive
2983numbers are not related to the restore or backup number.)
2984
2985=item ArchiveLOG.nnn.z
2986
2987Output from archive #nnn.  (Note that the archive numbers are not related
2988to the backup or restore number.)
2989
2990=item config.pl
2991
2992Old location of optional configuration settings specific to this host.
2993Settings in this file override the main configuration file.
2994In new versions of BackupPC the per-host configuration files are
2995stored in __CONFDIR__/pc/HOST.pl.
2996
2997=item backups
2998
2999A tab-delimited ascii table listing information about each successful
3000backup, one per row.  The columns are:
3001
3002=over 4
3003
3004=item num
3005
3006The backup number, an integer that starts at 0 and increments
3007for each successive backup.  The corresponding backup is stored
3008in the directory num (eg: if this field is 5, then the backup is
3009stored in __TOPDIR__/pc/$host/5).
3010
3011=item type
3012
3013Set to "full" or "incr" for full or incremental backup.
3014
3015=item startTime
3016
3017Start time of the backup in unix seconds.
3018
3019=item endTime
3020
3021Stop time of the backup in unix seconds.
3022
3023=item nFiles
3024
3025Number of files backed up (as reported by smbclient, tar, rsync or ftp).
3026
3027=item size
3028
3029Total file size backed up (as reported by smbclient, tar, rsync or ftp).
3030
3031=item nFilesExist
3032
3033Number of files that were already in the pool
3034(as determined by BackupPC_dump).
3035
3036=item sizeExist
3037
3038Total size of files that were already in the pool
3039(as determined by BackupPC_dump).
3040
3041=item nFilesNew
3042
3043Number of files that were not in the pool
3044(as determined by BackupPC_dump).
3045
3046=item sizeNew
3047
3048Total size of files that were not in the pool
3049(as determined by BackupPC_dump).
3050
3051=item xferErrs
3052
3053Number of errors or warnings from smbclient, tar, rsync or ftp.
3054
3055=item xferBadFile
3056
3057Number of errors from smbclient that were bad file errors (zero otherwise).
3058
3059=item xferBadShare
3060
3061Number of errors from smbclient that were bad share errors (zero otherwise).
3062
3063=item tarErrs
3064
3065Number of errors from BackupPC_tarExtract.
3066
3067=item compress
3068
3069The compression level used on this backup.  Zero or empty means no
3070compression.
3071
3072=item sizeExistComp
3073
3074Total compressed size of files that were already in the pool
3075(as determined by BackupPC_dump).
3076
3077=item sizeNewComp
3078
3079Total compressed size of files that were not in the pool
3080(as determined by BackupPC_dump).
3081
3082=item noFill
3083
3084Set if this backup has not been filled - it just includes the
3085deltas from the next backup necessary to reconstruct this backup.
3086
3087=item fillFromNum
3088
If this backup was filled (ie: noFill is 0) then this is the
number of the backup that it was filled from.
3091
3092=item mangle
3093
3094Set if this backup has mangled filenames and attributes.  Always
3095true for backups in v1.4.0 and above.  False for all backups prior
3096to v1.4.0.
3097
3098=item xferMethod
3099
3100Set to the value of $Conf{XferMethod} when this dump was done.
3101
3102=item level
3103
3104The level of this dump.  A full dump is level 0.  Currently incrementals
3105are 1.  In V4+ multi-level incrementals are no longer supported, so this
3106is just a 0 or 1.
3107
3108=back
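
Since the backups file is a plain tab-delimited table, it is easy to read from a script.  The Python sketch below maps one row onto the column names listed above, in the order listed.  It is an illustration only: the function name and the sample row are made up, and any additional trailing columns that a real file may carry are simply ignored.

```python
BACKUP_FIELDS = [
    "num", "type", "startTime", "endTime", "nFiles", "size",
    "nFilesExist", "sizeExist", "nFilesNew", "sizeNew", "xferErrs",
    "xferBadFile", "xferBadShare", "tarErrs", "compress",
    "sizeExistComp", "sizeNewComp", "noFill", "fillFromNum", "mangle",
    "xferMethod", "level",
]


def parse_backups_line(line):
    """Turn one tab-delimited row into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    row = {}
    for name, value in zip(BACKUP_FIELDS, values):
        # Numeric columns become ints; text and empty columns stay strings.
        row[name] = int(value) if value.isdigit() else value
    return row


# Made-up sample row for a full rsync backup numbered 5.
sample = "\t".join([
    "5", "full", "1700000000", "1700000600", "1200", "3456789",
    "1100", "3000000", "100", "456789", "0", "0", "0", "0", "3",
    "1500000", "200000", "0", "", "1", "rsync", "0",
]) + "\n"

row = parse_backups_line(sample)
print(row["num"], row["type"], row["endTime"] - row["startTime"])
```

The file should only be read this way; it is maintained by BackupPC and should not be edited by hand while the server is running.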
3109
3110=item restores
3111
3112A tab-delimited ascii table listing information about each requested
3113restore, one per row.  The columns are:
3114
3115=over 4
3116
3117=item num
3118
3119Restore number (matches the suffix of the RestoreInfo.nnn and
3120RestoreLOG.nnn.z file), unrelated to the backup number.
3121
3122=item startTime
3123
3124Start time of the restore in unix seconds.
3125
3126=item endTime
3127
3128End time of the restore in unix seconds.
3129
3130=item result
3131
3132Result (ok or failed).
3133
3134=item errorMsg
3135
3136Error message if restore failed.
3137
3138=item nFiles
3139
3140Number of files restored.
3141
3142=item size
3143
3144Size in bytes of the restored files.
3145
3146=item tarCreateErrs
3147
3148Number of errors from BackupPC_tarCreate during restore.
3149
3150=item xferErrs
3151
3152Number of errors from smbclient, tar, rsync or ftp during restore.
3153
3154=back
3155
3156=item archives
3157
3158A tab-delimited ascii table listing information about each requested
3159archive, one per row.  The columns are:
3160
3161=over 4
3162
3163=item num
3164
Archive number (matches the suffix of the ArchiveInfo.nnn and
ArchiveLOG.nnn.z files), unrelated to the backup or restore number.
3167
3168=item startTime
3169
Start time of the archive in unix seconds.
3171
3172=item endTime
3173
End time of the archive in unix seconds.
3175
3176=item result
3177
3178Result (ok or failed).
3179
3180=item errorMsg
3181
3182Error message if archive failed.
3183
3184=back
3185
3186=back
3187
3188=back
3189
3190=back
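
As an illustration of the restores table described above, here is a
minimal sketch that reads its rows into dictionaries.  It assumes the
columns appear in the order they are documented above and that fields
contain no embedded tabs; the names RESTORE_COLUMNS and parse_restores
are hypothetical and are not part of BackupPC.

```python
# Column order is assumed to follow the order in which the fields are
# documented above; check the BackupPC source before relying on it.
RESTORE_COLUMNS = [
    "num", "startTime", "endTime", "result", "errorMsg",
    "nFiles", "size", "tarCreateErrs", "xferErrs",
]

def parse_restores(text):
    """Parse tab-delimited restores rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Pad short rows so missing trailing columns become "".
        fields += [""] * (len(RESTORE_COLUMNS) - len(fields))
        rows.append(dict(zip(RESTORE_COLUMNS, fields)))
    return rows

# Made-up sample row, for illustration only.
sample = "0\t1700000000\t1700000060\tok\t\t12\t34567\t0\t0\n"
for row in parse_restores(sample):
    duration = int(row["endTime"]) - int(row["startTime"])
    print(row["num"], row["result"], row["nFiles"], duration)
```

All values are kept as strings, so a caller converts only the columns
it needs.  The same approach should work for the archives table with
its shorter column list.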
3191
3192=head2 Compressed file format
3193
3194The compressed file format is as generated by Compress::Zlib::deflate
3195with one minor, but important, tweak. Since Compress::Zlib::inflate
3196fully inflates its argument in memory, it could take large amounts of
3197memory if it was inflating a highly compressed file. For example, a
3198200MB file of 0x0 bytes compresses to around 200K bytes. If
3199Compress::Zlib::inflate was called with this single 200K buffer, it
3200would need to allocate 200MB of memory to return the result.
3201
BackupPC watches how efficiently a file is compressing. If a big file
has very high compression (meaning it will use too much memory when it
is inflated), BackupPC calls the flush() method, which gracefully
completes the current compression.  BackupPC then starts another
deflate and simply appends its output to the same file.  So the BackupPC
compressed file format is one or more concatenated deflations/flushes.
The specific threshold that BackupPC uses is that if a 6MB chunk
compresses to less than 64K then a flush will be done.
3210
3211Back to the example of the 200MB file of 0x0 bytes.  Adding flushes
3212every 6MB adds only 200 or so bytes to the 200K output.  So the
3213storage cost of flushing is negligible.
3214
To easily decompress a BackupPC compressed file, use the script
BackupPC_zcat, which can be found in __INSTALLDIR__/bin.  For each
filename argument it inflates the file and writes it to stdout.
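
The idea of concatenated deflations can be sketched in Python with the
standard zlib module.  This is only an illustration of the technique,
not a replacement for BackupPC_zcat and not BackupPC's actual code: the
function names are hypothetical, the way the per-chunk compressed size
is measured (a sync flush after every chunk) is an assumption, and real
BackupPC files may carry details that this sketch ignores.

```python
import zlib

def deflate_with_flushes(data, chunk_size=6 * 1024 * 1024,
                         flush_below=64 * 1024):
    """Compress data as one or more concatenated zlib streams.

    A stream is ended (and a new one started) whenever a chunk of
    chunk_size input bytes compresses to fewer than flush_below bytes.
    """
    out = bytearray()
    comp = zlib.compressobj()
    pending = False  # True while the current stream is unfinished
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Z_SYNC_FLUSH forces out everything produced for this chunk
        # so that its compressed size can be measured.
        piece = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        out += piece
        pending = True
        if len(piece) < flush_below:
            out += comp.flush()        # finish this stream
            comp = zlib.compressobj()  # start a fresh one
            pending = False
    if pending or not out:
        out += comp.flush()
    return bytes(out)

def inflate_concatenated(blob):
    """Yield the inflated contents of each concatenated zlib stream."""
    while blob:
        decomp = zlib.decompressobj()
        piece = decomp.decompress(blob) + decomp.flush()
        if not decomp.eof:
            raise ValueError("truncated zlib stream")
        yield piece
        # Whatever follows the end of this stream is the next stream.
        blob = decomp.unused_data

# Small thresholds so the effect is visible on a tiny input.
blob = deflate_with_flushes(bytes(5000), chunk_size=1024, flush_below=64)
pieces = list(inflate_concatenated(blob))
print(len(pieces), sum(len(p) for p in pieces))
```

Because each highly compressible chunk ends up in its own stream, the
reader inflates one chunk's worth of data at a time instead of the
whole file at once.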
3218
3219=head2 Rsync checksum caching
3220
Rsync checksum caching is not implemented in V4. That's because a full
backup with rsync in V4 uses client-side whole-file checksums, meaning
that the server doesn't need to send block-level digests on every full
backup.
3225
3226The rest of this section applies to V3.
3227
3228An incremental backup with rsync compares attributes on the client
3229with the last full backup.  Any files with identical attributes
3230are skipped.  In V3, a full backup with rsync sets the --ignore-times
3231option, which causes every file to be examined independent of
3232attributes.
3233
3234Each file is examined by generating block checksums (default 2K
3235blocks) on the receiving side (that's the BackupPC side), sending
3236those checksums to the client, where the remote rsync matches those
checksums with the corresponding file.  The matching blocks and new
data are sent back, allowing the client file to be reassembled.
A checksum for the entire file is sent too, as an extra check that
the reconstructed file is correct.
3241
3242This results in significant disk IO and computation for BackupPC:
3243every file in a full backup, or any file with non-matching attributes
3244in an incremental backup, needs to be uncompressed, block checksums
3245computed and sent.  Then the receiving side reassembles the file and
3246has to verify the whole-file checksum.  Even if the file is identical,
3247prior to 2.1.0, BackupPC had to read and uncompress the file twice,
3248once to compute the block checksums and later to verify the whole-file
3249checksum.
3250
3251=head2 Filename mangling
3252
3253Backup filenames are stored in "mangled" form. Each node of
3254a path is preceded by "f" (mnemonic: file), and special characters
3255(\n, \r, % and /) are URI-encoded as "%xx", where xx is the ascii
3256character's hex value.  So c:/craig/example.txt is now stored as
3257fc/fcraig/fexample.txt.
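
The mangling rule above can be sketched as follows.  This is a minimal
illustration, not BackupPC's own code: the function names are
hypothetical, and the use of lowercase hex digits in the "%xx" escapes
is an assumption.

```python
import re

def mangle_element(name):
    """Mangle one path node: escape special characters, prefix "f"."""
    if name == "":
        return ""
    # Escape %, /, newline and carriage return as %xx.
    escaped = re.sub(r"[%/\n\r]",
                     lambda m: "%%%02x" % ord(m.group(0)), name)
    return "f" + escaped

def mangle_path(path):
    """Mangle every node of a /-separated path."""
    return "/".join(mangle_element(node) for node in path.split("/"))

def unmangle_element(name):
    """Reverse mangle_element for a single path node."""
    if not name.startswith("f"):
        raise ValueError("not a mangled file name: %r" % name)
    return re.sub(r"%([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), name[1:])

print(mangle_path("c/craig/example.txt"))
print(mangle_element("/home/craig"))
```

The second call shows why a share name that contains "/" can be stored
as a single level in the storage tree: the slashes are escaped rather
than treated as path separators.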
3258
3259This was done mainly so metadata could be stored alongside the backup
3260files without name collisions. In particular, the attributes for the
3261files in a directory are stored in a file called "attrib", and mangling
3262avoids filename collisions (I discarded the idea of having a duplicate
3263directory tree for every backup just to store the attributes). Other
3264metadata (eg: rsync checksums) could be stored in filenames preceded
by, eg, "c". Mangling has another benefit: the share name
might contain "/" (eg: "/home/craig" for tar transport), and I wanted
that represented as a single level in the storage tree.
3268
3269The CGI script undoes the mangling, so it is invisible to the user.
3270
3271=head2 Special files
3272
3273Linux/unix file systems support several special file types: symbolic
3274links, character and block device files, fifos (pipes) and unix-domain
3275sockets. All except unix-domain sockets are supported by BackupPC
3276(there's no point in backing up or restoring unix-domain sockets since
3277they only have meaning after a process creates them). Symbolic links are
3278stored as a plain file whose contents are the contents of the link (not
3279the file it points to). This file is compressed and pooled like any
3280normal file. Character and block device files are also stored as plain
3281files, whose contents are two integers separated by a comma; the numbers
3282are the major and minor device number. These files are compressed and
3283pooled like any normal file. Fifo files are stored as empty plain files
3284(which are not pooled since they have zero size). In all cases, the
3285original file type is stored in the attrib file so it can be correctly
3286restored.
3287
3288Hardlinks are supported.  In V4, file metadata include an inode number
3289and a link count.  Any file with more than one link points at the inode
3290information stored below the backup directory in the inode directory.
3291That directory contains a tree of up to 16K attrib files based on bits
329210-23 of the inode number.  In particular, the directory name uses bits
329317-23, and the attrib filename includes bits 10-16.  The key (index) in
3294the attrib file is the hex inode number.  The original file metadata's
3295link count might not be accurate; it's more a flag (>1) for when to look
3296up the inode information.  The correct link count is stored in the inode.
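
The bit arithmetic for the V4 inode tree can be worked through with a
short sketch.  It assumes that bit 0 is the least significant bit and
that each 7-bit field is written as two zero-padded hex digits; the
function name is hypothetical.

```python
def inode_attrib_location(inode):
    """Return (directory name, attrib file index, key) for an inode."""
    dir_bits = (inode >> 17) & 0x7F   # bits 17-23: directory name
    file_bits = (inode >> 10) & 0x7F  # bits 10-16: attrib file index
    key = "%x" % inode                # hex inode number used as the key
    return "%02x" % dir_bits, "%02x" % file_bits, key

# 7 bits for the directory and 7 bits for the file give at most
# 128 * 128 = 16384 (16K) attrib files.
print(128 * 128)
print(inode_attrib_location(0x123456))
```

Inodes that differ only in bits 0-9 share the same attrib file, so up
to 1024 consecutive inode numbers are grouped together.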
3297
3298In V3, hardlinks are stored in a similar manner to symlinks.  When GNU
3299tar first encounters a file with more than one link (ie: hardlinks)
3300it dumps it as a regular file.  When it sees the second and subsequent
3301hardlinks to the same file, it dumps just the hardlink information.
3302BackupPC correctly recognizes these hardlinks and stores them just like
symlinks: a regular text file whose contents are the path of the file
3304linked to.  The CGI script will download the original file when you
3305click on a hardlink.
3306
3307Also, BackupPC_tarCreate has enough magic to re-create the hardlinks
3308dynamically based on whether or not the original file and hardlinks
3309are both included in the tar file.  For example, imagine a/b/x is a
3310hardlink to a/c/y.  If you use BackupPC_tarCreate to restore directory
3311a, then the tar file will include a/b/x as the original file and a/c/y
will be a hardlink to a/b/x.  If, instead, you restore a/c, then the
3313tar file will include a/c/y as the original file, not a hardlink.
3314
3315=head2 Attribute file format
3316
3317=over 4
3318
3319=item V4 attrib files
3320
The attribute file format is new in V4.  Every backup directory contains
an attrib file, which is zero length and whose name includes the MD5 pool
digest, eg:
3324
3325    attrib_33fe8f9ae2f5cedbea63b9d3ea767ac0
3326
3327The digest is used to look up the contents in the V4 cpool, eg:
3328
3329    __TOPDIR__/cpool/32/fe/33fe8f9ae2f5cedbea63b9d3ea767ac0
3330
3331For inode attrib files, bits 17-23 (XX in hex) of the inode number are used for the
3332directory name, and the attrib filename includes bits 10-16 (YY in hex), so
3333relative to the backup directory:
3334
3335    inode/XX/attribYY_33fe8f9ae2f5cedbea63b9d3ea767ac0
3336
3337An empty attrib file has the name "attrib_0" (or "attribYY_0" for inodes).
3338
3339The attrib file starts with a magic number, followed by the concatenation
3340of the following information for each file (all integers are stored in
3341perl's pack "w" format (variable length base 128)):
3342
3343=over 4
3344
3345=item *
3346
3347Filename length, followed by the filename
3348
3349=item *
3350
3351Count of extended attributes
3352
3353=item *
3354
3355The unix file type, mtime, mode, uid, gid, size, inode number, compress,
3356number of links
3357
3358=item *
3359
3360MD5 digest length, followed by the digest contents
3361
3362=item *
3363
3364Each extended attribute (length of xattr name, length of xattr value, name, value)
3365
3366=back
3367
3368=item V3 attrib files
3369
3370The unix attributes for the contents of a directory (all the files and
3371directories in that directory) are stored in a file called attrib.
3372There is a single attrib file for each directory in a backup.
3373For example, if c:/craig contains a single file c:/craig/example.txt,
3374that file would be stored as fc/fcraig/fexample.txt and there would be an
3375attribute file in fc/fcraig/attrib (and also fc/attrib and ./attrib).
3376The file fc/fcraig/attrib would contain a single entry containing the
3377attributes for fc/fcraig/fexample.txt.
3378
3379The attrib file starts with a magic number, followed by the
3380concatenation of the following information for each file:
3381
3382=over 4
3383
3384=item *
3385
3386Filename length in perl's pack "w" format (variable length base 128).
3387
3388=item *
3389
3390Filename.
3391
3392=item *
3393
3394The unix file type, mode, uid, gid and file size divided by 4GB and
3395file size modulo 4GB (type mode uid gid sizeDiv4GB sizeMod4GB),
3396in perl's pack "w" format (variable length base 128).
3397
3398=item *
3399
3400The unix mtime (unix seconds) in perl's pack "N" format (32 bit integer).
3401
3402=back
3403
3404The attrib file is also compressed if compression is enabled.
3405See the lib/BackupPC/Attrib.pm module for full details.
3406
3407Attribute files are pooled just like normal backup files.  This saves
3408space if all the files in a directory have the same attributes across
3409multiple backups, which is common.
3410
3411=back
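
Both attrib formats rely on perl's pack "w" encoding, which stores an
unsigned integer in base 128, most significant digit first, with the
high bit set on every byte except the last.  The sketch below shows
that encoding and uses it to decode a single V3 entry laid out as
described above.  It is an illustration only: the function names are
hypothetical, the magic number and any compression are assumed to have
been handled already, and lib/BackupPC/Attrib.pm remains the
authoritative reference.

```python
import struct

def encode_w(value):
    """Encode a non-negative int like perl's pack("w", value)."""
    if value < 0:
        raise ValueError("pack 'w' cannot encode negative integers")
    digits = [value & 0x7F]          # last byte: high bit clear
    value >>= 7
    while value:
        digits.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(digits))

def decode_w(buf, pos=0):
    """Decode one integer starting at buf[pos]; return (value, next_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos

def decode_v3_entry(buf, pos=0):
    """Decode one V3 attrib entry; return (entry dict, next_pos)."""
    name_len, pos = decode_w(buf, pos)
    name = buf[pos:pos + name_len]
    pos += name_len
    fields = []
    for _ in range(6):  # type mode uid gid sizeDiv4GB sizeMod4GB
        value, pos = decode_w(buf, pos)
        fields.append(value)
    ftype, mode, uid, gid, size_div, size_mod = fields
    (mtime,) = struct.unpack_from(">I", buf, pos)  # pack "N"
    pos += 4
    entry = {"name": name, "type": ftype, "mode": mode, "uid": uid,
             "gid": gid, "size": size_div * 2**32 + size_mod,
             "mtime": mtime}
    return entry, pos

print(encode_w(300).hex())
```

Storing the size as a quotient and remainder of 4GB lets files larger
than 4GB be represented; the sketch recombines the two parts into a
single size.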
3412
3413=head2 Optimizations
3414
3415BackupPC doesn't care about the access time of files in the pool
3416since it saves attribute metadata separate from the files.  Since
3417BackupPC mostly does reads from disk, maintaining the access time of
3418files generates a lot of unnecessary disk writes.  So, provided
3419BackupPC has a dedicated data disk, you should consider mounting
3420BackupPC's data directory with the noatime (or, with Linux kernels
3421>=2.6.20, relatime) attribute (see mount(1)).
3422
3423=head2 Some Limitations
3424
3425BackupPC isn't perfect (but it is getting better). Please see
3426L<http://backuppc.sourceforge.net/faq/limitations.html> for a
3427discussion of some of BackupPC's limitations.
3428(Note, this is old and we should move this to the Github Wiki.)
3429
3430=head2 Security issues
3431
3432Please see L<http://backuppc.sourceforge.net/faq/security.html> for a
discussion of various security issues.
3434(Note, this is old and we should move this to the Github Wiki.)
3435
3436=head1 Configuration File
3437
3438The BackupPC configuration file resides in __CONFDIR__/config.pl.
3439Optional per-PC configuration files reside in __CONFDIR__/pc/$host.pl
3440(or __TOPDIR__/pc/$host/config.pl in non-FHS versions of BackupPC).
A per-PC file can be used to override settings just for that particular PC.
3442
3443=head2 Modifying the main configuration file
3444
3445The configuration file is a perl script that is executed by BackupPC, so
3446you should be careful to preserve the file syntax (punctuation, quotes
3447etc) when you edit it. Specifically, preserving quotes means you should never
3448use undef for configuration parameters that expect string values. An empty
3449string ('') should be used in this case.
3450It is recommended that you use CVS, RCS or some
3451other method of source control for changing config.pl.
3452
3453BackupPC reads or re-reads the main configuration file and
3454the hosts file in three cases:
3455
3456=over 4
3457
3458=item *
3459
3460Upon startup.
3461
3462=item *
3463
3464When BackupPC is sent a HUP (-1) signal.  Assuming you installed the
3465init.d script, you can also do this with "/etc/init.d/backuppc reload".
3466
3467=item *
3468
When the modification time of the config.pl file changes.  BackupPC
3470checks the modification time once during each regular wakeup.
3471
3472=back
3473
3474Whenever you change the configuration file you can either do
3475a kill -HUP BackupPC_pid or simply wait until the next regular
3476wakeup period.
3477
3478Each time the configuration file is re-read a message is reported in the
3479LOG file, so you can tail it (or view it via the CGI interface) to make
3480sure your kill -HUP worked. Errors in parsing the configuration file are
3481also reported in the LOG file.
3482
3483The optional per-PC configuration file (__CONFDIR__/pc/$host.pl or
3484__TOPDIR__/pc/$host/config.pl in non-FHS versions of BackupPC)
3485is read whenever it is needed by BackupPC_dump, BackupPC_restore and others.
3486
3487=head1 Configuration Parameters
3488
The configuration parameters are divided into five general groups.
The first group (general server configuration) provides general
configuration for BackupPC.  The next two groups describe what to
back up, when to do it, and how long to keep it.  The fourth group
contains settings for email reminders, and the final group contains
settings for the CGI interface.
3495
3496All configuration settings in the second through fifth groups can
3497be overridden by the per-PC config.pl file.
3498
3499__CONFIGPOD__
3500
3501=head1 Version Numbers
3502
BackupPC uses an X.Y.Z version numbering system.  The first digit is for
major new releases, the middle digit is for significant feature releases
and improvements (most of the releases have been in this category),
and the last digit is for bug fixes.
3506
3507=head1 Author
3508
3509Craig Barratt  <cbarratt@users.sourceforge.net>
3510
3511See L<https://backuppc.github.io/backuppc/BackupPC.html>.
3512
3513=head1 Copyright
3514
3515Copyright (C) 2001-2020 Craig Barratt
3516
3517=head1 Credits
3518
3519Ryan Kucera contributed the directory navigation code and images
3520for v1.5.0.  He contributed the first skeleton of BackupPC_restore.
3521He also added a significant revision to the CGI interface, including
3522CSS tags, in v2.1.0, and designed the BackupPC logo.
3523
3524Xavier Nicollet, with additions from Guillaume Filion, added the
3525internationalization (i18n) support to the CGI interface for v2.0.0.
3526Xavier provided the French translation fr.pm, with additions from
3527Guillaume.
3528
3529Guillaume Filion wrote BackupPC_zipCreate and added the CGI support
3530for zip download, in addition to some CGI cleanup, for v1.5.0.
3531Guillaume continues to support fr.pm updates for each new version.
3532
3533Josh Marshall implemented the Archive feature in v2.1.0.
3534
3535Ludovic Drolez supports the BackupPC Debian package.
3536
3537Javier Gonzalez provided the Spanish translation, es.pm for v2.0.0.
3538
3539Manfred Herrmann provided the German translation, de.pm for v2.0.0.
3540Manfred continues to support de.pm updates for each new version,
3541together with some help from Ralph Paßgang.
3542
3543Lorenzo Cappelletti provided the Italian translation, it.pm for v2.1.0.
3544Giuseppe Iuculano and Vittorio Macchi updated it for 3.0.0.
3545
3546Lieven Bridts provided the Dutch translation, nl.pm, for v2.1.0,
3547with some tweaks from Guus Houtzager, and updates for 3.0.0.
3548
3549Reginaldo Ferreira provided the Portuguese Brazilian translation
3550pt_br.pm for v2.2.0.
3551
3552Rich Duzenbury provided the RSS feed option to the CGI interface.
3553
3554Jono Woodhouse from CapeSoft Software (www.capesoft.com) provided a
3555new CSS skin for 3.0.0 with several layout improvements.  Sean Cameron
3556(also from CapeSoft) designed new and more compact file icons for 3.0.0.
3557
3558Youlin Feng provided the Chinese translation for 3.1.0.
3559
3560Karol 'Semper' Stelmaczonek provided the Polish translation for 3.1.0.
3561
3562Jeremy Tietsort provided the host summary table sorting feature for 3.1.0.
3563
3564Paul Mantz contributed the ftp Xfer method for 3.2.0.
3565
3566Petr Pokorny provided the Czech translation for 3.2.1.
3567
3568Rikiya Yamamoto provided the Japanese translation for 3.3.0.
3569
3570Yakim provided the Ukrainian translation for 3.3.0.
3571
3572Sergei Butakov provided the Russian translation for 3.3.0.
3573
3574Alexander Moisseev provided the rrdtool graphing code in 4.0.0 and has provided
3575many fixes and improvements in 3.x and 4.x.
3576
3577Many people have provided user support on the mail lists, reported bugs,
3578made useful suggestions, and helped with testing; see the ChangeLog
3579and the mailing lists.
3580
3581Your name could appear here in the next version!
3582
3583=head1 License
3584
3585This program is free software: you can redistribute it and/or modify
3586it under the terms of the GNU General Public License as published by
3587the Free Software Foundation, either version 3 of the License, or
3588(at your option) any later version.
3589
3590This program is distributed in the hope that it will be useful,
3591but WITHOUT ANY WARRANTY; without even the implied warranty of
3592MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
3593GNU General Public License for more details.
3594
3595You should have received a copy of the GNU General Public License
3596along with this program.  If not, see <http://www.gnu.org/licenses/>.
3597