1.. _cyrus-backups:
2
3=============
4Cyrus Backups
5=============
6
7.. contents::
8
9
Introduction
============
12
13Cyrus Backups are a replication-based backup service for Cyrus IMAP servers.
14This is currently an experimental feature. If you have the resources to try it
15out alongside your existing backup solutions, feedback would be appreciated.
16
17This document is intended to be a guide to the configuration and
18administration of Cyrus Backups.
19
20This document is a work in progress and at this point is incomplete.
21
22This document assumes that you are familiar with compiling, installing,
23configuring and maintaining Cyrus IMAP servers generally, and will only discuss
24backup-related portions in detail.
25
26This document assumes a passing familiarity with
27:ref:`Cyrus Replication <replication>`.
28
29Limitations
30===========
31
32Cyrus Backups are experimental and incomplete.
33
34The following components exist and appear to work:
35
36-  backupd, and therefore inbound replication
37-  autovivification of backup storage for new users, with automatic partition
38   selection
39-  rebuilding of backup indexes from backup data files
40-  compaction of backup files to remove stale data and combine chunks for
41   better compression
42-  deep verification of backup file/index state
43-  examination of backup data
44-  locking tool, for safe non-cyrus operations on backup files
45-  recovery of data back into a Cyrus IMAP server
46
47The following components don't yet exist in a workable state -- these tasks
48must be massaged through manually (with care):
49
-  reconstruction of backups.db from backup files
51
The following types of information are currently backed up and recoverable:
53
54-  mailbox state and annotations
55-  messages
56-  mailbox message records, flags, and annotations
57
58The following types of information are currently backed up, but tools to
59recover them don't yet exist:
60
61-  sieve scripts (but not active script status)
62-  subscriptions
63-  seen data
64
The following types of information are not currently backed up:
66
67-  quota information
68
69Architecture
70============
71
72Cyrus Backups are designed to run on one or more standalone, dedicated backup
73servers, with suitably-sized storage partitions. These servers generally do
74not run an IMAP daemon, nor do they have conventional mailbox storage.
75
76Your Cyrus IMAP servers synchronise mailbox state to the Cyrus Backup server(s)
77using the Cyrus replication (aka sync, aka csync) protocol.
78
79Backup data is stored in two files per user: a data file, containing gzipped
80chunks of replication commands; and an SQLite database, which indexes the
81current state of the backed up data. User backup files are stored in a hashed
82subdirectory of their containing partition.
83
A twoskip database, backups.db, stores mappings of users to their backup file
locations.
86
87Installation
88============
89
90Requirements
91------------
92
93-  At least one Cyrus IMAP server, serving and storing user data.
94-  At least one machine which will become the first backup server.
95
96Cyrus Backups server
97--------------------
98
#. Compile Cyrus with the ``--enable-backup`` configure option and install it.
100#. Set up an :cyrusman:`imapd.conf(5)` file for it with the following options
101   (default values shown):
102
103    backup\_db: twoskip
104        The twoskip database format is recommended for backups.db
105    backup\_db\_path: {configdirectory}/backups.db
106        The backups db contains a mapping of user ids to their backup locations
107    backup\_staging\_path: {temp\_path}/backup
108        Directory to use for staging message files during backup operations.
109        The replication protocol will transfer as many as 1024 messages in a
110        single sync operation, so, conservatively, this directory needs to
111        contain enough storage for 1024 \* your maximum message size \* number
112        of running backupd's, plus some wiggle room.
113    backup\_retention\_days: 7
114        Number of days for which backup data (messages etc) should be kept
115        within the backup storage after the corresponding item has been
116        deleted/expunged from the Cyrus IMAP server.
117    backuppartition-\ *name*: /path/to/this/partition
118        You need at least one backuppartition-\ *name* to store backup data.
119        These work similarly to regular/archive IMAP partitions, but note that
120        there is no relationship between backup partition names and
        regular/archive partition names. New users will have their backup
        storage provisioned according to the usual partition selection rules.
123    backup\_compact\_minsize: 0
124        The ideal minimum data chunk size within backup files, in kB. The
125        compact tool will try to combine chunks that are smaller than this
126        into neighbouring chunks. Larger values tend to yield better
127        compression ratios, but if the data is corrupted on disk, the entire
128        chunk will become unreadable. Zero turns this behaviour off.
129    backup\_compact\_maxsize: 0
130        The ideal maximum data chunk size within backup files, in kB. The
131        compact tool will try to split chunks that are larger than this into
132        multiple smaller chunks. Zero turns this behaviour off.
133    backup\_compact\_work\_threshold: 1
134        The number of chunks within a backup file that must obviously need
135        compaction before the compact tool will attempt to compact the file.
136        Larger values are expected to reduce compaction I/O load at the expense
137        of delayed recovery of storage space.
138
#. Create a user for authenticating to the backup system, and add it to the
   ``admins`` setting in :cyrusman:`imapd.conf(5)`.
#. Add appropriate ``sasl_*`` settings for your authentication method to
   :cyrusman:`imapd.conf(5)`.
143#. Set up a :cyrusman:`cyrus.conf(5)` file for it::
144
145    START {
146        # this is required
147        recover cmd="ctl_cyrusdb -r"
148    }
149
150    SERVICES {
151        # backupd is probably the only service entry your backup server needs
152        backupd cmd="backupd" listen="csync" prefork=0
153    }
154
155    EVENTS {
156        # this is required
157        checkpoint cmd="ctl_cyrusdb -c" period=30
158
159        # arrange for compact to run at some interval
160        compact cmd="ctl_backups compact -A" at=0400
161    }
162
#. Start up the server, and use :cyrusman:`synctest(1)` to verify that you can
   authenticate to backupd.
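
As a rough illustration of step 2, a backup server's :cyrusman:`imapd.conf(5)`
might contain a fragment like the one below. The paths, the partition name
``default`` and the admin user ``backupadmin`` are hypothetical placeholders,
and the compaction sizes are arbitrary values chosen for illustration rather
than recommendations::

    # format and location of backups.db
    backup_db: twoskip
    backup_db_path: /var/lib/cyrus/backups.db

    # staging area for message files during backup operations
    backup_staging_path: /var/spool/cyrus/backup-staging

    # keep deleted/expunged data in the backups for 7 days
    backup_retention_days: 7

    # at least one backup partition is required
    backuppartition-default: /var/spool/cyrus/backup

    # illustrative compaction tuning, sizes in kB
    backup_compact_minsize: 1024
    backup_compact_maxsize: 65536
    backup_compact_work_threshold: 1

    # the user that the IMAP servers will authenticate as
    admins: backupadmin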
165
166Cyrus IMAP servers
167------------------
168
Your Cyrus IMAP servers must be running version 3 or later of Cyrus, and must
have been compiled with the ``--enable-replication`` configure option.  They do
*not* need to be recompiled with the ``--enable-backup`` option.
172
It's recommended to set up a dedicated replication channel for backups, so that
your backup replication can coexist independently of your other replication
configurations.
176
Add settings like the following to :cyrusman:`imapd.conf(5)` (example values
shown):
178
179*channel*\ \_sync\_host: backup-server.example.com
180    The host name of your Cyrus Backup server
181*channel*\ \_sync\_port: csync
182    The port on which your Cyrus Backup server's backupd process listens
183*channel*\ \_sync\_authname: ...
184    Credentials for authenticating to the Cyrus Backup server
185*channel*\ \_sync\_password: ...
186    Credentials for authenticating to the Cyrus Backup server
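
For instance, with a channel named ``backup`` (a suggested name, not a
requirement), these settings might look as follows. The authname and password
shown are hypothetical placeholders::

    backup_sync_host: backup-server.example.com
    backup_sync_port: csync
    backup_sync_authname: backupadmin
    backup_sync_password: secret-goes-here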
187
188Using rolling replication
189+++++++++++++++++++++++++
190
191You can configure backups to use rolling replication.  Depending on the sync
192repeat interval you configure, this can be used to keep your backups very
193current -- potentially as current as your other replicas.
194
195To configure rolling replication, add additional settings to
196:cyrusman:`imapd.conf(5)` like:
197
198sync\_log: 1
199    Enable sync log if it wasn't already.
200sync\_log\_channels: *channel*
    Add a new channel "*channel*" to whatever was already here. We suggest
    calling this channel "backup".
203*channel*\ \_sync\_repeat\_interval: 1
    Minimum time in seconds between rolling replication runs. A smaller value
    keeps backups more current at the cost of more network I/O; a larger value
    reduces I/O.
206
207Update :cyrusman:`cyrus.conf(5)` to add a :cyrusman:`sync_client(8)` invocation
208to the DAEMON section specifying (at least) the ``-r`` and ``-n channel``
209options.
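
Putting these together, a rolling setup for a channel named ``backup`` could be
sketched as follows. The entry name ``backupsync`` is arbitrary, and if
``sync_log_channels`` already lists other channels, ``backup`` would be
appended to that list rather than replacing it.

In :cyrusman:`imapd.conf(5)`::

    sync_log: 1
    sync_log_channels: backup
    backup_sync_repeat_interval: 1

In :cyrusman:`cyrus.conf(5)`::

    DAEMON {
        # rolling replication to the backup server over the "backup" channel
        backupsync cmd="sync_client -r -n backup"
    }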
210
See :cyrusman:`imapd.conf(5)` for additional *sync\_* settings that can
be used to affect the replication behaviour.  Many can be prefixed with
a channel to limit their effect to only backups, if necessary.
214
215Using scheduled replication (push)
216++++++++++++++++++++++++++++++++++
217
218You can configure backups to occur on a schedule determined by the IMAP
219server.
220
221To do this, add :cyrusman:`sync_client(8)` invocations to the EVENTS section
222of :cyrusman:`cyrus.conf(5)` (or cron, etc), specifying at least the
223``-n channel`` option (to use the channel-specific configuration), plus
224whatever other options you need for selecting users to back up. See the
225:cyrusman:`sync_client(8)` manpage for details.
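
A nightly push of all users might be sketched in :cyrusman:`cyrus.conf(5)` as
below. The entry name ``backupall``, the time, and the channel name ``backup``
are illustrative. ``-A`` selects all users, but check
:cyrusman:`sync_client(8)` for the selection options available in your
version::

    EVENTS {
        # push every user to the "backup" channel at 01:00 each day
        backupall cmd="sync_client -n backup -A" at=0100
    }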
226
227You could also invoke :cyrusman:`sync_client(8)` in a similar way from a
228custom script running on the IMAP server.
229
230Using scheduled replication (pull)
231++++++++++++++++++++++++++++++++++
232
233You can configure backups to occur on a schedule determined by the
234backup server.  For example, you may have a custom script that examines
235the existing backups, and provokes fresh backups to occur if they are
236determined to be out of date.
237
To do this, enable XBACKUP on your IMAP server by adding the following
239setting to :cyrusman:`imapd.conf(5)`:
240
241xbackup\_enabled: yes
242    Enables the XBACKUP command in imapd.
243
244Your custom script can then authenticate to the IMAP server as an admin
245user, and invoke the command ``XBACKUP pattern [channel]``.  A replication
246of the users or shared mailboxes matching the specified pattern will occur
247to the backup server defined by the named channel.  If no channel is
248specified, default sync configuration will be used.
249
250For example::
251
252    C: 1 XBACKUP user.* backup
253    S: * OK USER anne
254    S: * OK USER bethany
255    S: * NO USER cassandane (Operation is not supported on mailbox)
256    S: * OK USER demi
257    S: * OK USER ellie
258    S: 1 OK Completed
259
260This replicates all users to the channel *backup*.
261
262
263Administration
264==============
265
266Storage requirements
267--------------------
268
It's not really known yet how to predict the storage requirements for a backup
server. Experimentation in a development environment suggests a compressed
backup file size of around 20-40% of the backed up data, depending on compact
settings, but this is with relatively tiny mailboxes and non-pathological data.
273
The backup staging spool conservatively needs to be large enough to hold an
entire sync's worth of message files at once: your maximum message size \*
1024 messages \* the number of backupd processes you're running, plus some
wiggle room. In practice it is unlikely to reach this limit unless someone is
deliberately trying to. (Most users probably don't have 1024 maximum-sized
messages in their account, or don't receive them all at once anyway.)
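
As a worked example of this estimate, the following shell sketch multiplies
out the three factors. The function name ``staging_spool_mb`` and the figures
used (a 50 MB maximum message size and 4 backupd processes) are hypothetical;
substitute your own values.

.. code-block:: shell

    # Conservative staging spool estimate, in MB.
    # usage: staging_spool_mb MAX_MESSAGE_MB BACKUPD_PROCESSES
    staging_spool_mb() {
        max_message_mb=$1
        backupd_procs=$2
        # up to 1024 messages per sync operation, per backupd process
        echo $(( 1024 * max_message_mb * backupd_procs ))
    }

    staging_spool_mb 50 4    # prints 204800 (about 200 GB)

Add your own margin on top of the result.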
281
Certain invocations of ctl\_backups and cyr\_backup also require staging spool
space, due to the way replication protocol (and thus backup data) parsing
handles messages. Keep this in mind when sizing the staging spool.
285
286Initial backups
287---------------
288
289Once a Cyrus Backup system is configured and running, new users that are
290created on the IMAP servers will be backed up seamlessly without administrator
291intervention.
292
The very first backup taken of a pre-existing mailbox will be big -- the entire
mailbox in one hit. When initially provisioning a Cyrus Backup server for an
existing Cyrus IMAP environment, it's suggested that the
:cyrusman:`sync_client(8)` commands be run carefully, for a small group of
mailboxes at a time, until all/most of your mailboxes have been backed up at
least once. Also run the :cyrusman:`ctl_backups(8)` ``compact`` command on the
backups, to break up big chunks, if you wish.  Only then should you enable
rolling/scheduled replication.
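
One way to work through existing users in small groups is to feed a list of
userids to :cyrusman:`sync_client(8)` in batches. The sketch below is only an
illustration: the function name ``backup_in_batches``, the channel name
``backup`` and the file ``users.txt`` (one userid per line) are all
hypothetical. It uses the ``-u`` (user) mode of sync_client, and setting
``DRY_RUN`` to any non-empty value prints the commands instead of running
them.

.. code-block:: shell

    # usage: backup_in_batches CHANNEL BATCH_SIZE < file-with-one-userid-per-line
    backup_in_batches() {
        channel=$1
        batch_size=$2
        # xargs appends at most BATCH_SIZE userids to each sync_client call;
        # with DRY_RUN set, "echo" is prepended so commands are only printed
        xargs -n "$batch_size" ${DRY_RUN:+echo} sync_client -n "$channel" -u
    }

    # run on the IMAP server as the cyrus user, e.g. 20 users at a time:
    #   backup_in_batches backup 20 < users.txt

Once a batch has completed, the ``compact`` command can be run for those users
on the backup server before moving on to the next batch.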
301
302Restoring from backups
303----------------------
304
305The :cyrusman:`restore(8)` tool will restore mailboxes and messages from a
306specified backup to a specified destination server. The destination server must
307be running a replication-capable :cyrusman:`imapd(8)` or
308:cyrusman:`sync_server(8)`. The restore tool should be run from the backup
309server containing the specified backup.
310
311File locking
312------------
313
314All :cyrusman:`backupd(8)`/:cyrusman:`ctl_backups(8)`/:cyrusman:`cyr_backup(8)`
315operations first obtain a lock on the relevant backup file.  ctl\_backups and
316cyr\_backup will try to do this without blocking (unless told otherwise),
317whereas backupd will never block.
318
319Moving backup files to different backup partitions
320--------------------------------------------------
321
322There's no tool for this (yet). To do it manually, stop backupd, copy the files
323to the new partition, then use :cyrusman:`cyr_dbtool(8)` to update the user's
324backups.db entry to point to the new location. Run the
325:cyrusman:`ctl_backups(8)` ``verify`` command on both the new filename (``-f``
326mode) and the user's userid (``-u`` mode) to ensure everything is okay, then
327restart backupd.
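
The database update and verification steps might be scripted roughly as
follows, once backupd has been stopped and the files copied. This is a sketch
under several assumptions: the helper names ``run`` and ``move_backup`` are
made up for illustration, the paths are hypothetical, ``twoskip`` must match
your ``backup_db`` setting, and the value stored in backups.db for a user is
assumed to be the path of that user's backup data file (inspect an existing
entry with the ``get`` action of :cyrusman:`cyr_dbtool(8)` before changing
anything).

.. code-block:: shell

    # Print commands instead of running them when DRY_RUN is non-empty.
    run() {
        if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
    }

    # usage: move_backup USERID NEW_DATA_FILE BACKUPS_DB
    # Assumes backupd is stopped and the files are already copied.
    move_backup() {
        userid=$1
        new_data_file=$2
        backups_db=$3
        # point the user's backups.db entry at the new location
        run cyr_dbtool "$backups_db" twoskip set "$userid" "$new_data_file"
        # verify by filename, then by userid
        run ctl_backups verify -f "$new_data_file"
        run ctl_backups verify -u "$userid"
    }

    # preview the commands for a hypothetical user and paths:
    #   DRY_RUN=1 move_backup anne /srv/backup2/anne /var/lib/cyrus/backups.db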
328
329Provoking a backup for a particular user/user group/everyone/etc right now
330--------------------------------------------------------------------------
331
Just run :cyrusman:`sync_client(8)` by hand with appropriate options (as the
cyrus user, of course). See its man page for ways of specifying items to
replicate.
334
335If the IMAP server with the user's mail has been configured with the
336``xbackup_enabled: yes`` option in :cyrusman:`imapd.conf(5)`, then an admin
337user can cause a backup to occur by sending the IMAP server an ``XBACKUP``
338command.
339
340What about tape backups?
341------------------------
342
343As long as backupd, ctl\_backups and cyr\_backup are not currently running (and
344assuming no-one's poking around in things otherwise), it's safe to take/restore
345a filesystem snapshot of backup partitions. So to schedule, say, a nightly tape
346dump of your Cyrus Backup server, make your cron job shut down Cyrus, make the
347copy, then restart Cyrus.
348
349Meanwhile, your Cyrus IMAP servers are still online and available.  Regular
350backups will resume once your backupd is running again.
351
If you can work at a finer granularity than the file system, you don't need to
shut down backupd. Just use the :cyrusman:`ctl_backups(8)` ``lock`` command to
hold a lock on each backup while you work with its files, and the rest of the
backup system will work around that.
356
Restoring is more complicated, depending on what you actually need to do.
When you restart backupd after restoring a filesystem snapshot, the next
time your Cyrus IMAP server replicates to it, the restored backups will be
brought up to date. That is probably not what you wanted, so don't restart
backupd until you've finished working with the restored data.
362
363Multiple IMAP servers, one backup server
364----------------------------------------
365
366This is fine, as long as each user being backed up is only being backed up by
367one server (or they are otherwise synchronised). If IMAP servers have different
368ideas about the state of a user's mailboxes, one of those will be in sync with
369the backup server and the other will get a lot of replication failures.
370
371Multiple IMAP servers, multiple backup servers
372----------------------------------------------
373
374Make sure your :cyrusman:`sync_client(8)` configuration(s) on each IMAP server
375knows which users are being backed up to which backup servers, and selects
376them appropriately. See the :cyrusman:`sync_client(8)` man page for options for
377specifying users, and run it as an event (rather than rolling).
378
379Or just distribute it at server granularity, such that backup server A serves
380IMAP servers A, B and C, and backup server B serves IMAP servers D, E, F, etc.
381
382One IMAP server, multiple backup servers
383----------------------------------------
384
385Configure one channel plus one rolling :cyrusman:`sync_client(8)` per backup
386server, and your IMAP server can be more or less simultaneously backed up to
387multiple backup destinations.
388
389Reducing load
390-------------
391
392To reduce load on your client-facing IMAP servers, configure sync log chaining
393on their replicas and let those take the load of replicating to the backup
394servers.
395
396To reduce network traffic, do the same thing, specifically using replicas that
397are already co-located with the backup server.
398
399Other setups
400------------
401
402The use of the replication protocol and :cyrusman:`sync_client(8)` allows a lot
403of interesting configuration possibilities to shake out. Have a rummage in the
404:cyrusman:`sync_client(8)` man page for inspiration.
405
406Tools
407=====
408
409ctl\_backups
410------------
411
This tool is generally for mass operations that require few/fixed arguments
across multiple/all backups.
414
415Supported operations:
416
417compact
418    Reduce backups' disk usage by:
419
420    * combining small chunks for better gzip compression -- especially
421      important for hot backups, which produce many tiny chunks
422    * removing deleted content that has passed its retention period
423list
424    List known backups.
425lock
426    Lock a single backup, so you can safely work on it with non-cyrus tools.
427reindex
    Regenerate indexes for backups from their data files. Useful if an index
    becomes corrupted by some bug, or invalidated by working on data with
    non-cyrus tools.
431stat
432    Show statistics about backups -- disk usage, compression ratio, etc.
433verify
434    Deep verification of backups. Verifies that:
435
436    * Checksums for each chunk in index match data
437    * Mailbox states are in the chunk that the index says they're in
438    * Mailbox states match indexed states
439    * Messages are in the chunk the index says they're in
440    * Message data checksum matches indexed checksums
441
442See the :cyrusman:`ctl_backups(8)` man page for more information.
443
444cyr\_backup
445-----------
446
This tool is generally for operations on a single mailbox that require multiple
additional arguments.
449
Supported operations:
451
452list [ chunks \| mailboxes \| messages \| all ]
453    Line-per-item listing of information stored in a backup.
454show [ chunks \| mailboxes \| messages ] items...
455    Paragraph-per-item listing of information for specified items. Chunk items
456    are specified by id, mailboxes by mboxname or uniqueid, messages by guid.
457dump [ chunk \| message ] item
    Full dump of one item. ``chunk`` dumps the uncompressed content of a chunk
    (i.e. a bunch of sync protocol commands). ``message`` dumps a raw rfc822
    message (useful for manually restoring).
461
462See the :cyrusman:`cyr_backup(8)` man page for more information.
463
464restore
465-------
466
467This tool is for restoring mail from backup files.
468
469Required arguments are a destination server (in ip:port or host:port format),
470a backup file, and mboxnames, uniqueids or guids specifying the mailboxes or
471messages to be restored.
472
473If the target mailbox does not already exist on the destination server, options
474are available to preserve the mailbox and message properties as they existed
475in the backup. This is useful for rebuilding a lost server from backups, such
476that client state remains consistent.
477
478If the target mailbox already exists on the destination server, restored
479messages will be assigned new, unused uids and will appear to the client as new
480messages.
481
482See the :cyrusman:`restore(8)` man page for more information.
483