bup: It backs things up
=======================

bup is a program that backs things up. It's short for "backup." Can you
believe that nobody else has named an open source program "bup" after all
this time? Me neither.

Despite its unassuming name, bup is pretty cool. To give you an idea of
just how cool it is, I wrote you this poem:

    Bup is teh awesome
    What rhymes with awesome?
    I guess maybe possum
    But that's irrelevant.

Hmm. Did that help? Maybe prose is more useful after all.


Reasons bup is awesome
----------------------

bup has a few advantages over other backup software:

 - It uses a rolling checksum algorithm (similar to rsync) to split large
   files into chunks. The most useful result of this is that you can back up
   huge virtual machine (VM) disk images, databases, and XML files
   incrementally, even though they're typically all in one huge file, and
   not use tons of disk space for multiple versions.

 - It uses the packfile format from git (the open source version control
   system), so you can access the stored data even if you don't like bup's
   user interface.

 - Unlike git, it writes packfiles *directly* (instead of having a separate
   garbage collection / repacking stage) so it's fast even with gratuitously
   huge amounts of data. bup's improved index formats also allow you to
   track far more filenames than git (millions) and keep track of far more
   objects (hundreds or thousands of gigabytes).

 - Data is "automagically" shared between incremental backups without having
   to know which backup is based on which other one - even if the backups
   are made from two different computers that don't even know about each
   other. You just tell bup to back stuff up, and it saves only the minimum
   amount of data needed.

 - You can back up directly to a remote bup server, without needing tons of
   temporary disk space on the computer being backed up.
   And if your backup is interrupted halfway through, the next run will
   pick up where you left off. And it's easy to set up a bup server: just
   install bup on any machine where you have ssh access.

 - Bup can use "par2" redundancy to recover corrupted backups even if your
   disk has undetected bad sectors.

 - Even when a backup is incremental, you don't have to worry about
   restoring the full backup, then each of the incrementals in turn. An
   incremental backup *acts* as if it's a full backup; it just takes less
   disk space.

 - You can mount your bup repository as a FUSE filesystem and access the
   content that way, and even export it over Samba.

 - It's written in python (with some C parts to make it faster) so it's easy
   for you to extend and maintain.


Reasons you might want to avoid bup
-----------------------------------

 - It's not remotely as well tested as something like tar, so it's
   more likely to eat your data. It's also missing some
   probably-critical features, though fewer than it used to be.

 - It requires python >= 2.6, a C compiler, and an installed git
   version >= 1.5.6. It also requires par2 if you want fsck to be
   able to generate the information needed to recover from some types
   of corruption.

 - It currently only works on Linux, FreeBSD, NetBSD, OS X >= 10.4,
   Solaris, or Windows (with Cygwin, and maybe with WSL). Patches to
   support other platforms are welcome.

 - Until resolved, a [glibc bug](https://sourceware.org/bugzilla/show_bug.cgi?id=26034)
   might cause bup to crash on startup for some (unusual) command line
   argument values, when bup is configured to use Python 3.

 - Any items in "Things that are stupid" below.
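
If you want to check a machine against the requirements above before building, here is a rough Python sketch. Python is bup's own implementation language. This script isn't part of bup. `parse_git_version` and `check_prerequisites` are hypothetical names, the script assumes Python 3.7 or newer to run, and it treats `cc`, `gcc`, and `clang` as the likely compiler names. It doesn't check the Python version, because ./configure chooses the interpreter bup will actually use.

```python
import re
import shutil
import subprocess

# Minimum git version, taken from the requirements listed above.
MIN_GIT = (1, 5, 6)


def parse_git_version(text):
    """Return (major, minor, patch) from `git --version` output.

    Handles strings such as "git version 2.30.2" and
    "git version 2.39.3 (Apple Git-145)"; a missing patch level counts as 0.
    """
    m = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        raise ValueError("unrecognized git version string: %r" % text)
    return tuple(int(part or 0) for part in m.groups())


def check_prerequisites():
    """Return a list of problems found on this machine (empty if none)."""
    problems = []

    git = shutil.which("git")
    if git is None:
        problems.append("git is not installed")
    else:
        out = subprocess.run([git, "--version"], capture_output=True,
                             text=True, check=True).stdout
        if parse_git_version(out) < MIN_GIT:
            problems.append("git >= 1.5.6 is required, found: " + out.strip())

    # These are only common compiler names; other compilers may work too.
    if not any(shutil.which(name) for name in ("cc", "gcc", "clang")):
        problems.append("no C compiler found on PATH")

    # par2 is optional: it is only needed for 'bup fsck -g'.
    if shutil.which("par2") is None:
        problems.append("par2 not found (optional, used by 'bup fsck -g')")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```

Empty output only means these particular checks found nothing. It doesn't guarantee that the build will succeed.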


Notable changes introduced by a release
=======================================

 - <a href="note/0.31-from-0.30.1.md">Changes in 0.31 as compared to 0.30.1</a>
 - <a href="note/0.30.1-from-0.30.md">Changes in 0.30.1 as compared to 0.30</a>
 - <a href="note/0.30-from-0.29.3.md">Changes in 0.30 as compared to 0.29.3</a>
 - <a href="note/0.29.3-from-0.29.2.md">Changes in 0.29.3 as compared to 0.29.2</a>
 - <a href="note/0.29.2-from-0.29.1.md">Changes in 0.29.2 as compared to 0.29.1</a>
 - <a href="note/0.29.1-from-0.29.md">Changes in 0.29.1 as compared to 0.29</a>
 - <a href="note/0.29-from-0.28.1.md">Changes in 0.29 as compared to 0.28.1</a>
 - <a href="note/0.28.1-from-0.28.md">Changes in 0.28.1 as compared to 0.28</a>
 - <a href="note/0.28-from-0.27.1.md">Changes in 0.28 as compared to 0.27.1</a>
 - <a href="note/0.27.1-from-0.27.md">Changes in 0.27.1 as compared to 0.27</a>


Test status
===========

| branch | Debian | FreeBSD | macOS |
|--------|--------|---------|-------|
| master | [![Debian test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=master&task=debian)](https://cirrus-ci.com/github/bup/bup) | [![FreeBSD test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=master&task=freebsd)](https://cirrus-ci.com/github/bup/bup) | [![macOS test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=master&task=macos)](https://cirrus-ci.com/github/bup/bup) |
| 0.30.x | [![Debian test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=0.30.x&task=debian)](https://cirrus-ci.com/github/bup/bup) | [![FreeBSD test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=0.30.x&task=freebsd)](https://cirrus-ci.com/github/bup/bup) | [![macOS test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=0.30.x&task=macos)](https://cirrus-ci.com/github/bup/bup) |
| 0.29.x | [![Debian test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=0.29.x&task=debian)](https://cirrus-ci.com/github/bup/bup) | [![FreeBSD test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=0.29.x&task=freebsd)](https://cirrus-ci.com/github/bup/bup) | [![macOS test status](https://api.cirrus-ci.com/github/bup/bup.svg?branch=0.29.x&task=macos)](https://cirrus-ci.com/github/bup/bup) |

Getting started
===============

From source
-----------

 - Check out the bup source code using git:

    ```sh
    git clone https://github.com/bup/bup
    ```

 - This will leave you on the master branch, which is perfect if you
   would like to help with development, but if you'd just like to use
   bup, please check out the latest stable release like this:

    ```sh
    git checkout 0.29.1
    ```

   You can see the latest stable release here:
   https://github.com/bup/bup/releases.

 - Install the required python libraries (including the development
   libraries).

   On very recent Debian/Ubuntu versions, this may be sufficient (run
   as root):

    ```sh
    apt-get build-dep bup
    ```

   Otherwise try this (substitute python2.6-dev if you have an older
   system):

    ```sh
    apt-get install python2.7-dev python-fuse
    apt-get install python-pyxattr
    apt-get install pkg-config linux-libc-dev libacl1-dev
    apt-get install acl attr
    apt-get install libreadline-dev # optional (bup ftp)
    apt-get install python-tornado # optional (bup web)
    ```

   On CentOS (for CentOS 6, at least), this should be sufficient (run
   as root):

    ```sh
    yum groupinstall "Development Tools"
    yum install python python-devel libacl-devel
    yum install fuse-python pyxattr
    yum install perl-Time-HiRes
    yum install readline-devel # optional (bup ftp)
    yum install python-tornado # optional (bup web)
    ```

   In addition to the default CentOS repositories, you may need to add
   RPMForge (for fuse-python) and EPEL (for pyxattr).

   On Cygwin, install python, make, rsync, and gcc4.

   If you would like to use the optional bup web server on systems
   without a tornado package, you may want to try this:

    ```sh
    pip install tornado
    ```

 - Build the python module and symlinks:

    ```sh
    make
    ```

 - Run the tests:

    ```sh
    make long-check
    ```

   or if you're in a bit more of a hurry:

    ```sh
    make check
    ```

   The tests should pass. If they don't pass for you, stop here and
   send an email to bup-list@googlegroups.com. Note, however, that the
   tests may fail if there are symbolic links along the current working
   directory path. Running something like this before running the tests
   should sidestep the problem:

    ```sh
    cd "$(pwd -P)"
    ```

 - You can install bup via "make install", and override the default
   destination with DESTDIR and PREFIX.

   Files are normally installed to "$DESTDIR/$PREFIX", where DESTDIR is
   empty by default and PREFIX is set to /usr/local. So if you wanted to
   install bup to /opt/bup, you might do something like this:

    ```sh
    make install DESTDIR=/opt/bup PREFIX=''
    ```

 - The Python executable that bup will use is chosen by ./configure,
   which will search for a reasonable version unless PYTHON is set in
   the environment, in which case bup will use that path. You can
   see which Python executable was chosen by looking at the
   configure output or examining cmd/python-cmd.sh, and you can
   change the selection by re-running ./configure.

From binary packages
--------------------

Binary packages of bup are known to be built for the following OSes:

 - Debian:
   http://packages.debian.org/search?searchon=names&keywords=bup
 - Ubuntu:
   http://packages.ubuntu.com/search?searchon=names&keywords=bup
 - pkgsrc (NetBSD, Dragonfly, and others):
   http://pkgsrc.se/sysutils/bup
   http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/bup/
 - Arch Linux:
   https://www.archlinux.org/packages/?sort=&q=bup
 - Fedora:
   https://apps.fedoraproject.org/packages/bup


Using bup
---------

 - Get help for any bup command:

    ```sh
    bup help
    bup help init
    bup help index
    bup help save
    bup help restore
    ...
    ```

 - Initialize the default BUP_DIR (~/.bup -- you can choose another by
   either specifying `bup -d DIR ...` or setting the `BUP_DIR`
   environment variable for a command):

    ```sh
    bup init
    ```

 - Make a local backup (-v or -vv will increase the verbosity):

    ```sh
    bup index /etc
    bup save -n local-etc /etc
    ```

 - Restore a local backup to ./dest:

    ```sh
    bup restore -C ./dest local-etc/latest/etc
    ls -l dest/etc
    ```

 - Look at how much disk space your backup took:

    ```sh
    du -s ~/.bup
    ```

 - Make another backup (which should be mostly identical to the last one;
   notice that you don't have to *specify* that this backup is incremental,
   it just saves space automatically):

    ```sh
    bup index /etc
    bup save -n local-etc /etc
    ```

 - Look how little extra space your second backup used (on top of the first):

    ```sh
    du -s ~/.bup
    ```

 - Get a list of your previous backups:

    ```sh
    bup ls local-etc
    ```

 - Restore your first backup again (substitute one of the save names
   listed by `bup ls local-etc`):

    ```sh
    bup restore -C ./dest-2 local-etc/2013-11-23-11195/etc
    ```

 - Make a backup to a remote server that must already have the 'bup' command
   somewhere in its PATH (see /etc/profile, /etc/environment, ~/.profile, or
   ~/.bashrc), and be accessible via ssh.
   Make sure to replace SERVERNAME with the actual hostname of your server:

    ```sh
    bup init -r SERVERNAME:path/to/remote-bup-dir
    bup index /etc
    bup save -r SERVERNAME:path/to/remote-bup-dir -n local-etc /etc
    ```

 - Make a remote backup to ~/.bup on SERVER:

    ```sh
    bup index /etc
    bup save -r SERVER: -n local-etc /etc
    ```

 - See what saves are available in ~/.bup on SERVER:

    ```sh
    bup ls -r SERVER:
    ```

 - Restore the remote backup to ./dest:

    ```sh
    bup restore -r SERVER: -C ./dest local-etc/latest/etc
    ls -l dest/etc
    ```

 - Defend your backups from death rays (OK fine, more likely from the
   occasional bad disk block). This writes parity information
   (currently via par2) for all of the existing data so that bup may
   be able to recover from some amount of repository corruption:

    ```sh
    bup fsck -g
    ```

 - Use split/join instead of index/save/restore. Try making a local
   backup using tar:

    ```sh
    tar -cvf - /etc | bup split -n local-etc -vv
    ```

 - Try restoring the tarball (`tar -tf -` just lists its contents; use
   `tar -xf -` to actually extract them):

    ```sh
    bup join local-etc | tar -tf -
    ```

 - Look at how much disk space your backup took:

    ```sh
    du -s ~/.bup
    ```

 - Make another tar backup:

    ```sh
    tar -cvf - /etc | bup split -n local-etc -vv
    ```

 - Look at how little extra space your second backup used on top of
   the first:

    ```sh
    du -s ~/.bup
    ```

 - Restore the first tar backup again (the ~1 is git notation for "one
   older than the most recent"):

    ```sh
    bup join local-etc~1 | tar -tf -
    ```

 - Get a list of your previous split-based backups:

    ```sh
    GIT_DIR=~/.bup git log local-etc
    ```

 - Save a tar archive to a remote server (without tar -z, since compressing
   the stream would largely defeat deduplication):

    ```sh
    tar -cvf - /etc | bup split -r SERVERNAME: -n local-etc -vv
    ```

 - Restore the archive:

    ```sh
    bup join -r SERVERNAME: local-etc | tar -tf -
    ```

That's all there is to it!


Notes on FreeBSD
----------------

- FreeBSD's default 'make' command doesn't like bup's Makefile. In order to
  compile the code, run the tests, and install bup, you need to install GNU
  Make from the port named 'gmake' and use its executable instead in the
  commands seen above (e.g. 'gmake check' runs bup's test suite).

- Python's development headers are automatically installed with the 'python'
  port, so there's no need to install them separately.

- To use the 'bup fuse' command, you need to install the fuse kernel module
  from the 'fusefs-kmod' port in the 'sysutils' section and the libraries from
  the port named 'py-fusefs' in the 'devel' section.

- The 'par2' command can be found in the port named 'par2cmdline'.

- In order to compile the documentation, you need pandoc, which can be found
  in the port named 'hs-pandoc' in the 'textproc' section.


Notes on NetBSD/pkgsrc
----------------------

 - See pkgsrc/sysutils/bup, which should be the most recent stable
   release and includes man pages. It also has a reasonable set of
   dependencies (git, par2, py-fuse-bindings).

 - The "fuse-python" package referred to above is hard to locate, and is a
   separate tarball for the python language binding distributed by the
   fuse project on sourceforge. It is available as
   pkgsrc/filesystems/py-fuse-bindings, and on NetBSD 5, "bup fuse"
   works with it.

 - "bup fuse" presents every directory/file as inode 0. The directory
   traversal code ("fts") in NetBSD's libc will interpret this as a
   cycle and error out, so "ls -R" and "find" will not work.

 - There is no support for ACLs. If/when some enterprising person
   fixes this, adjust t/compare-trees.
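
To see why inode 0 confuses those tools, here is a simplified Python model of the loop check a directory walker typically performs. It assumes that a directory whose (device, inode) pair matches one of its ancestors is reported as a cycle and skipped. This is an illustration under that assumption, not NetBSD's actual fts code. `Dir`, `find_cycles`, and `make_tree` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical in-memory stand-in for a mounted filesystem: each directory
# carries the (st_dev, st_ino) pair that stat() would report for it.
Dir = namedtuple("Dir", ["name", "dev", "ino", "children"])


def find_cycles(root):
    """Walk `root` depth-first and return the paths that look like cycles.

    A directory counts as a cycle if its (dev, ino) pair equals that of one
    of its ancestors on the current path; the walker reports it and refuses
    to descend into it.
    """
    cycles = []

    def walk(node, path, ancestors):
        ident = (node.dev, node.ino)
        if ident in ancestors:
            cycles.append(path)
            return
        for child in node.children:
            walk(child, path + "/" + child.name, ancestors | {ident})

    walk(root, root.name, frozenset())
    return cycles


def make_tree(ino_for):
    """Build mnt/a/b on device 1, asking ino_for(name) for each inode."""
    b = Dir("b", 1, ino_for("b"), [])
    a = Dir("a", 1, ino_for("a"), [b])
    return Dir("mnt", 1, ino_for("mnt"), [a])


# Distinct inode numbers: nothing looks like a loop.
print(find_cycles(make_tree({"mnt": 2, "a": 3, "b": 4}.get)))  # []
# Every entry reports inode 0: the first subdirectory already "is" its parent.
print(find_cycles(make_tree(lambda name: 0)))  # ['mnt/a']
```

With distinct inode numbers, nothing is flagged. Once every entry reports inode 0, though, the very first subdirectory looks like its own parent. That is roughly what "ls -R" and "find" run into under "bup fuse".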


Notes on Cygwin
---------------

 - There is no support for ACLs. If/when some enterprising person
   fixes this, adjust t/compare-trees.

 - In t/test.sh, two tests have been disabled. These tests check to
   see that repeated saves produce identical trees and that an
   intervening index doesn't change the SHA1. Apparently Cygwin has
   some unusual behaviors with respect to access times (that probably
   warrant further investigation). Possibly related:
   http://cygwin.com/ml/cygwin/2007-06/msg00436.html


Notes on OS X
-------------

 - There is no support for ACLs. If/when some enterprising person
   fixes this, adjust t/compare-trees.


How it works
============

Basic storage:
--------------

bup stores its data in a git-formatted repository. Unfortunately, git
itself doesn't actually behave very well for bup's use case (huge numbers
of files, very large files, and the need to retain file permissions and
ownership), so we mostly don't use git's *code* except for a few helper
programs. For example, bup has its own git packfile writer written in
python.

Basically, 'bup split' reads the data on stdin (or from files specified on
the command line), breaks it into chunks using a rolling checksum (similar to
rsync), and saves those chunks into a new git packfile. There is at least one
git packfile per backup.

When deciding whether to write a particular chunk into the new packfile, bup
first checks all the other packfiles that exist to see if they already have that
chunk. If they do, the chunk is skipped.

git packs come in two parts: the pack itself (`*.pack`) and the index (`*.idx`).
The index is pretty small, and contains a list of all the objects in the
pack.
Thus, when generating a remote backup, we don't have to have a copy
of the packfiles from the remote server: the local end just downloads a copy
of the server's *index* files, and compares objects against those when
generating the new pack, which it sends directly to the server.

The "-n" option to 'bup split' and 'bup save' is the name of the backup you
want to create, but it's actually implemented as a git branch. So you can
do cute things like check out a particular branch using git, and receive a
bunch of chunk files corresponding to the file you split.

If you use '-b' or '-t' or '-c' instead of '-n', bup split will output a
list of blobs, a tree containing that list of blobs, or a commit containing
that tree, respectively, to stdout. You can use this to construct your own
scripts that do something with those values.

The bup index:
--------------

'bup index' walks through your filesystem and updates a file (whose name is,
by default, ~/.bup/bupindex) to contain the name, attributes, and an
optional git SHA1 (blob id) of each file and directory.

'bup save' basically just runs the equivalent of 'bup split' a whole bunch
of times, once per file in the index, and assembles a git tree
that contains all the resulting objects. Among other things, that makes
'git diff' much more useful (compared to splitting a tarball, which is
essentially a big binary blob). However, since bup splits large files into
smaller chunks, the resulting tree structure doesn't *exactly* correspond to
what git itself would have stored. Also, the tree format used by 'bup save'
will probably change in the future to support storing file ownership, more
complex file permissions, and so on.

If a file has previously been written by 'bup save', then its git blob/tree
id is stored in the index.
This lets 'bup save' avoid reading that file to
produce future incremental backups, which means it can go *very* fast unless
a lot of files have changed.


Things that are stupid for now but which we'll fix later
========================================================

Help with any of these problems, or others, is very welcome. Join the
mailing list (see below) if you'd like to help.

 - 'bup save' and 'bup restore' have immature metadata support.

   On the plus side, they actually do have support now, but it's new,
   and not remotely as well tested as tar/rsync/whatever's. However,
   you have to start somewhere, and as of 0.25, we think it's ready
   for more general use. Please let us know if you have any trouble.

   Also, if any strip or graft-style options are specified to
   'bup save', then no metadata will be written for the root directory.
   That's obviously less than ideal.

 - bup is overly optimistic about mmap. Right now bup just assumes
   that it can mmap as large a block as it likes, and that mmap will
   never fail. Yeah, right... If nothing else, this has failed on
   32-bit architectures (and 31-bit is even worse -- looking at you,
   s390).

   To fix this, we might just implement a `FakeMmap` [1] class that uses
   normal file IO and handles all of the mmap methods [2] that bup
   actually calls. Then we'd swap in one of those whenever mmap
   fails.

   This would also require implementing some of the methods needed to
   support "[]" array access, probably at a minimum `__getitem__`,
   `__setitem__`, and `__setslice__` [3].

   [1] http://comments.gmane.org/gmane.comp.sysutils.backup.bup/613
   [2] http://docs.python.org/2/library/mmap.html
   [3] http://docs.python.org/2/reference/datamodel.html#emulating-container-types

 - 'bup index' is slower than it should be.

   It's still rather fast: it can iterate through all the filenames on my
   600,000-file filesystem in a few seconds. But it still needs to rewrite
   the entire index file just to add a single filename, which is pretty
   nasty; it should just leave the new files in a second "extra index" file
   or something.

 - bup could use inotify for *really* efficient incremental backups.

   You could even have your system doing "continuous" backups: whenever a
   file changes, we immediately send an image of it to the server. We could
   give the continuous-backup process a really low CPU and I/O priority so
   you wouldn't even know it was running.

 - bup only has experimental support for pruning old backups.

   While you should now be able to drop old saves and branches with
   `bup rm`, and reclaim the space occupied by data that's no longer
   needed by other backups with `bup gc`, these commands are
   experimental, and should be handled with great care. See the
   man pages for more information.

   Unless you want to help test the new commands, one possible
   workaround is to just start a new BUP_DIR occasionally,
   e.g. bup-2013, bup-2014, ...

 - bup has never been tested on anything but Linux, FreeBSD, NetBSD,
   OS X, and Windows+Cygwin.

   There's nothing that makes it *inherently* non-portable, though, so
   that's mostly a matter of someone putting in some effort. (For a
   "native" Windows port, the most annoying thing is the absence of ssh in
   a default Windows installation.)

 - bup needs better documentation.

   According to an article about bup in Linux Weekly News
   (https://lwn.net/Articles/380983/), "it's a bit short on examples and
   a user guide would be nice." Documentation is the sort of thing that
   will never be great unless someone from outside contributes it (since
   the developers can never remember which parts are hard to understand).

 - bup is "relatively speedy" and has "pretty good" compression.

   ...according to the same LWN article. Clearly neither of those is good
   enough. We should have awe-inspiring speed and crazy-good compression.
   Must work on that. Writing more parts in C might help with the speed.

 - bup has no GUI.

   Actually, that's not stupid, but you might consider it a
   limitation. See the ["Related Projects"](https://bup.github.io/)
   list for some possible options.

More Documentation
==================

bup has an extensive set of man pages. Try using 'bup help' to get
started, or use 'bup help SUBCOMMAND' for any bup subcommand (like split,
join, index, save, etc.) to get details on that command.

For further technical details, please see ./DESIGN.


How you can help
================

bup is a work in progress and there are many ways it can still be improved.
If you'd like to contribute patches, ideas, or bug reports, please join the
bup mailing list.

You can find the mailing list archives here:

    http://groups.google.com/group/bup-list

and you can subscribe by sending a message to:

    bup-list+subscribe@googlegroups.com

Please see <a href="HACKING">./HACKING</a> for
additional information, e.g. how to submit patches (hint - no pull
requests), how we handle branches, etc.


Have fun,

Avery