Notes for backup implementation

Backup index database (one per user):

chunk:
    int id
    timestamp ts
    int offset
    int length
    text file_sha1              -> sha1 of (compressed) data prior to this chunk
    text data_sha1              -> sha1 of (uncompressed) data contained in this chunk

mailbox:
    int id
    int last_chunk_id           -> chunk that knows the current state
    char uniqueid               -> unique
    char mboxname               -> altered by a rename
    char mboxtype
    int last_uid
    int highestmodseq
    int recentuid
    timestamp recenttime
    timestamp last_appenddate
    timestamp pop3_last_login
    timestamp pop3_show_after
    timestamp uidvalidity
    char partition
    char acl
    char options
    int sync_crc
    int sync_crc_annot
    char quotaroot
    int xconvmodseq
    char annotations
    timestamp deleted           -> time that it was unmailbox'd, or NULL if still alive

message:
    int id
    char guid
    char partition              -> this is used to set the spool directory for the temp file - we might not need it
    int chunk_id
    int offset                  -> offset within chunk of dlist containing this message
    int size                    -> size of this message (n.b. not length of dlist)

mailbox_message:
    int mailbox_id
    int message_id
    int last_chunk_id           -> chunk that has a RECORD in a MAILBOX for this
    int uid
    int modseq
    timestamp last_updated
    char flags
    timestamp internaldate
    int size
    char annotations
    timestamp expunged          -> time that it was expunged, or NULL if still alive

subscription:
    int last_chunk_id           -> chunk that knows the current state
    char mboxname               -> no linkage to mailbox table, users can be sub'd to nonexistent
    timestamp unsubscribed      -> time that it was unsubscribed, or NULL if still alive

seen:
    int last_chunk_id           -> chunk that knows the current state
    char uniqueid               -> mailbox (not necessarily ours) this applies to
    timestamp lastread
    int lastuid
    timestamp lastchange
    char seenuids               -> a uid sequence encoded as a string

sieve:
    int chunk_id
    timestamp last_update
    char filename
    char guid
    int offset                  -> offset within chunk of the dlist containing this script
    timestamp deleted           -> time that it was deleted, or NULL if still alive


sieve scripts and messages are both identified by a GUID,
but APPLY SIEVE doesn't take a GUID; it seems to be generated locally?
the GUID in the response to APPLY SIEVE is generated in the process of
reading the script from disk (sync_sieve_list_generate)

can't activate scripts, because only bytecode files are activated, and
we neither receive bytecode files over the sync protocol nor compile
them ourselves.

possibly reduce index size by breaking the deleted/expunged values into their
own tables, such that we only store a deleted value for things that are
actually deleted.  use left join + is null to find undeleted content
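the left join + is null trick can be sketched with a throwaway in-memory
database.  the table and column names below are hypothetical stand-ins, not
the real index schema:

```python
import sqlite3

# Hypothetical stand-in schema: 'deleted' timestamps split out of the mailbox
# table into their own table, so a value is only stored for deleted rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mailbox (id INTEGER PRIMARY KEY, mboxname TEXT);
    CREATE TABLE mailbox_deleted (mailbox_id INTEGER PRIMARY KEY, deleted INTEGER);
    INSERT INTO mailbox VALUES (1, 'user.jsmith'), (2, 'user.jsmith.trash');
    INSERT INTO mailbox_deleted VALUES (2, 1437709300);  -- only mailbox 2 is deleted
""")

# left join + is null: rows with no matching deletion record are still alive
alive = conn.execute("""
    SELECT m.mboxname
      FROM mailbox AS m
      LEFT JOIN mailbox_deleted AS d ON d.mailbox_id = m.id
     WHERE d.mailbox_id IS NULL
""").fetchall()
print(alive)  # -> [('user.jsmith',)]
```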


messages
--------

APPLY MESSAGE is a list of messages, not necessarily only one message.
Actually, it's a list of messages for potentially multiple users, but we avoid
this by rejecting GET MESSAGES requests that span multiple users (so that
sync_client retries at USER level, and so we only see APPLY MESSAGE requests
for a single user).

Cheap first implementation is to index the start/end of the entire APPLY
MESSAGE command identically for each message within it; at restore time
we grab that chunk and loop over it looking for the correct guid.

Ideal implementation would be to index the offset and length of each message
exactly (even excluding the dlist wrapper), but this is rather complicated
by the dlist API.

For now, we just index the offset of the dlist entry for the message,
and we can parse the pure message data back out from that later, when
we need to.  Slightly less efficient on reads, but works->good->fast.  We
need to loop over the entries in the MESSAGE dlist to find the one with the
desired GUID.

The indexed length needs to be the length of the message, not the length of the
dlist wrapper, because we need to know this cheaply to supply RECORDs in
MAILBOX responses.


renames
-------

APPLY RENAME %(OLDMBOXNAME old NEWMBOXNAME new PARTITION p UIDVALIDITY 123)

We identify mboxes by uniqueid, so when we start seeing sync data for the same
uniqueid with a new mboxname we just transparently update it anyway, without
needing to handle the APPLY RENAME.  Not sure if this is a problem...  Do we
need to record an mbox's previous names somehow?

I think it's possible to use this to rename a USER though, something like:

APPLY RENAME %(OLDMBOXNAME example.com!user.smithj NEWMBOXNAME example.com!user.jsmith ...)

-- in which case, without special handling of the RENAME command itself, there
will be a backup for the old user that ends with the RENAME, and a backup of
the new user that (probably) duplicates everything again (except for stuff
that's been expunged).

And if someone else gets given the original name, like

APPLY RENAME %(OLDMBOXNAME example.com!user.samantha-mithj NEWMBOXNAME example.com!user.smithj ...)

then anything that was expunged from the original user but still available in
backup disappears?  Or the two backups get conflated, and samantha can
"restore" the original smithj's old mail?

Uggh.

if there's a mailboxes database pointing to the backup files, then the backup
file names don't need to be based on the userid; they could e.g. be based on
the user's inbox's uniqueid.  this would make it easier to deal with user
renames, because the backup filename wouldn't need to change.  but this depends
on the uniqueid(s) in question being present in most areas of the sync
protocol, otherwise when starting a backup of a brand new user we won't be
able to tell where to store it.  a workaround in the meantime could be to make
some kind of backup id from the mailboxes database, and base the filename on
that.

actually, using "some kind of backup id from the mailboxes database" is probably
the best solution.  otherwise the lock complexity of renaming a user while making
sure their new backup filename doesn't already exist is frightful.

maybe do something with mkstemp()?
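mkstemp() would sidestep the collision problem by letting the filesystem pick
an unused suffix.  a rough sketch; the userid and directory here are
illustrative, not the real layout:

```python
import os
import tempfile

# Illustrative only: let mkstemp pick an unpredictable suffix so a renamed
# user's new backup file can never collide with an existing filename.
userid = "jsmith"                 # assumed example userid
backupdir = tempfile.mkdtemp()    # stand-in for a backup partition directory

fd, path = tempfile.mkstemp(prefix=userid + "_", dir=backupdir)
os.close(fd)
# path is something like <backupdir>/jsmith_k3j2x9
```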

furthermore: what if a mailbox is moved from one user to another?  like:

APPLY RENAME %(OLD... example.com!user.foo.something NEW... example.com!user.bar.something ...)

when a different-user rename IS a rename of a user (and not just a folder
being moved to a different user), what does it look like?
* does it do a single APPLY RENAME for the user, and expect their folders to
  shake out of that?
* does it do an APPLY RENAME for each of their folders?

in the latter case, we need to append each of those RENAMEs to the old backup
so they can take effect correctly, and THEN rename the backup file itself.  but
how do we tell when the appends are finished?

how can we tell the difference between folder(s) moved to a different user vs
the user having been renamed?

there is a setting, 'allowusermoves', which when enabled allows users to be
renamed via IMAP RENAME/XFER commands.  but it is disabled by default.  we
could initially require it to be disabled while using backups...

not sure what the workflow looks like for renaming a user if this is not enabled.

not sure what the sync flow looks like in either case.

looking at sync_apply_rename and mboxlist_renamemailbox, it seems like we'll
see an APPLY RENAME for each affected mbox when a recursive rename is occurring.

there doesn't seem to be anything preventing user/a/foo -> user/b/foo in the
general (non-INBOX) case.

renames might be a little easier to handle if the index replicated the mailbox
hierarchy rather than just being a flat structure.  though this adds complexity
wrt hiersep handling.  something like:

 mailbox:
    mboxname        # just the name of this mbox
    parent_id       # fk to parent mailbox
    full_mboxname   # cached value, parent.full_mboxname + mboxname


locking
-------

just use a normal flock/fcntl lock on the data file, and only open the index
if that lock succeeded

backup:   needs to append foo and update foo.index
reindex:  only needs to read foo, but needs a write lock to prevent writes
          while it does so.  needs to write to (replace) foo.index
compact:  needs to re-write foo and foo.index
restore:  needs to read
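a minimal sketch of that rule with Python's fcntl wrapper (filenames are
placeholders): take an exclusive lock on the data file first, and only touch
the index while it is held.

```python
import fcntl
import os
import tempfile

# stand-in for the backup data file "foo"
with tempfile.NamedTemporaryFile(delete=False) as f:
    datapath = f.name

fd = os.open(datapath, os.O_RDWR)
try:
    # exclusive, non-blocking: raises BlockingIOError if another process holds it
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    locked = True
    # ... only now is it safe to open (or replace) foo.index ...
finally:
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```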


verifying index
---------------

how to tell whether the .index file is the correct one for the backup data it
ostensibly represents?

one way to do this would be to have backup_index_end() store a checksum of
the corresponding data contents in the index.

when opening a backup, verify this checksum against the data, and refuse to
load the index if it doesn't match.

- sha1sum of (compressed) contents of file prior to each chunk

how to tell whether the chunk data is any good?  store a checksum of the chunk
contents along with the rest of the chunk index

- sha1sum of (uncompressed) contents of each chunk
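the two checksums can be sketched like so, assuming (purely as an
illustration) that each chunk is an independent gzip member appended to the
data file:

```python
import gzip
import hashlib

# two illustrative chunks, each compressed as its own gzip member
chunk1 = gzip.compress(b"APPLY MAILBOX ...\n")
chunk2 = gzip.compress(b"APPLY MESSAGE ...\n")
data_file = chunk1 + chunk2

# file_sha1 for chunk 2: sha1 of the (compressed) file contents prior to it
file_sha1 = hashlib.sha1(data_file[:len(chunk1)]).hexdigest()

# data_sha1 for chunk 2: sha1 of the (uncompressed) contents of the chunk
data_sha1 = hashlib.sha1(gzip.decompress(chunk2)).hexdigest()

# on open: recompute both and refuse to load the index if either mismatches
assert file_sha1 == hashlib.sha1(chunk1).hexdigest()
assert data_sha1 == hashlib.sha1(b"APPLY MESSAGE ...\n").hexdigest()
```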


mailboxes database
------------------

bron reckons use twoskip for this
userid -> backup_filename

lib/cyrusdb module implements this, look into that

look at conversations db code to see how to use it

need a tool:
    * given a user, show their backup filename
    * dump/undump
    * rebuild based on files discovered in backup directory

where does this fit into the locking scheme?


reindex
-------

* convert user mailbox name to backup name
* complain if there's no backup data file?
* lock, rename .index to .index.old, init new .index
* foreach file chunk:
*   timestamp is from first line in chunk
*   complain if timestamp has gone backwards?
*   index records from chunk
* unlock
* clean up .index.old

on error:
* discard partial new index
* restore .index.old
* bail out
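the rename/rollback dance above might look like this sketch (paths and the
index-building step are placeholders; the real code would hold the backup
lock throughout):

```python
import os
import tempfile

def reindex(index_path, build_new_index):
    old = index_path + ".old"
    os.rename(index_path, old)          # rename .index to .index.old
    try:
        build_new_index(index_path)     # init new .index, index each chunk
    except Exception:
        if os.path.exists(index_path):
            os.unlink(index_path)       # discard partial new index
        os.rename(old, index_path)      # restore .index.old
        raise                           # bail out
    os.unlink(old)                      # success: clean up .index.old

# simulate a failure partway through and check the old index survives
d = tempfile.mkdtemp()
idx = os.path.join(d, "backup.index")
with open(idx, "wb") as f:
    f.write(b"v1")

def bad_builder(path):
    with open(path, "wb") as f:
        f.write(b"partial")
    raise RuntimeError("timestamp went backwards")

try:
    reindex(idx, bad_builder)
except RuntimeError:
    pass

with open(idx, "rb") as f:
    restored = f.read()
```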


backupd
-------

cmdloop:
* (periodic cleanup)
* read command, determine backup name
* already holding lock ? bump timestamp : obtain lock
* write data to gzname, flush immediately
* index data

periodic cleanup:
* check timestamp of each held lock
* if stale (define: stale?), release
* FIXME if we've appended more than the chunk size we would compact to, release

sync restart:
* release each held lock

exit:
* release each held lock

need a "backup_index_abort" to complete the backup_index_start/end set.
_start should create a transaction, _end should commit it, and _abort should
roll it back.  then, if backupd fails to write to the gzip file for some
reason, the (now invalid) index info we added can be discarded too.

flushing immediately on write results in poor gzip compression, but for
incremental backups that's not a problem.  when the compact process hits the
file it will recompress the data more efficiently.
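the trade-off can be seen with zlib directly: a full flush after every write
keeps everything written so far recoverable, at the cost of compression
ratio.  a sketch:

```python
import zlib

# wbits=31 selects the gzip container, matching a .gz data file
co = zlib.compressobj(wbits=31)
flushed = b""
for line in [b"APPLY MAILBOX ...\n", b"APPLY MESSAGE ...\n"]:
    flushed += co.compress(line)
    flushed += co.flush(zlib.Z_FULL_FLUSH)  # everything so far hits the file

# even without the final trailer (e.g. backupd died), the data decompresses:
d = zlib.decompressobj(wbits=31)
recovered = d.decompress(flushed)
```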


questions
---------
* what does it look like when uidvalidity changes?


restore
-------

restoration is effectively a reverse-direction replication (replicating TO the
master), which means we can't necessarily supply things like uid, modseq, etc
without racing against normal message arrivals.  so instead we add an extra
command to the protocol to restore a message to a folder, but let the
destination determine the tasty bits.

protocol flow looks something like:

c: APPLY RESERVE ... # as usual
s: * MISSING (foo bar)
s: OK
c: APPLY MESSAGE ... # as usual
s: OK
c: RESTORE MAILBOX ... # new sync proto command
s: OK

we introduce a new command, RESTORE MAILBOX, which is similar to the existing
APPLY MAILBOX.  it specifies, for a mailbox, the mailbox state plus the message
records relevant to the restore.

the imapd/sync_server receiving the RESTORE command creates the mailbox if necessary,
and then adds the message records to it as new records (i.e. generating new uids etc).
this will end up generating new events in the backup channel's sync log, and then the
messages will be backed up again with their new uids, etc.  additional wire transfer
of message data should be avoided by keeping the same guid.

if the mailbox already exists but its uniqueid does not match the one from the backup,
then what?  this probably means the user has deleted the folder and its contents, then
made a new folder with the same name.  so it's probably very common for the mailbox
uniqueid to not match like this, and we don't care about special handling for this
case.  just add any messages that aren't already there.

if the mailbox doesn't already exist on the destination (e.g. if rebuilding a server
from backups) then it's safe and good to reuse uidvalidity, uniqueid, uid, modseq etc,
such that connecting clients can preserve their state.  so the imapd/sync_server
receiving the restore request accepts these fields as optional, but only preserves
them if it's safe to do so.

* restore: sbin program for selecting and restoring messages

restore command needs options:
+ whether or not to trim deletedprefix off mailbox names to be restored
+ whether or not to restore uniqueid, highestmodseq, uid and so on
+ whether or not to limit to/exclude expunged messages
+ whether or not to restore sub-mailboxes
+ sync_client-like options (servername, local_only, partition, ...)
+ user/mailbox/backup file(s) to restore from
+ mailbox to restore to (override location in backup)
+ override acl?

can we heuristically determine whether an argument is an mboxname, uniqueid or guid?
    => libuuid uniqueid is 36 bytes of hyphens (at fixed positions) and hex digits
    => non-libuuid uniqueid is 24 bytes of hex digits
    => mboxname usually contains at least one . somewhere
    => guid is 40 bytes of hex digits
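that heuristic could be sketched as below.  the patterns mirror the byte
counts above and are an assumption, not a tested rule (e.g. real uniqueids
might use a different hex case):

```python
import re

def classify(arg):
    if re.fullmatch(r"[0-9a-f]{40}", arg):
        return "guid"        # 40 hex digits
    if re.fullmatch(r"[0-9a-f]{24}", arg):
        return "uniqueid"    # non-libuuid style: 24 hex digits
    if re.fullmatch(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-"
                    r"[0-9a-f]{4}-[0-9a-f]{12}", arg):
        return "uniqueid"    # libuuid style: 36 bytes, hyphens at fixed positions
    return "mboxname"        # fallback: assume it's a mailbox name

print(classify("1c7cca361502dfed2d918da97e506f1c1e97dfbe"))  # guid
print(classify("user.jsmith"))                               # mboxname
```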

usage:
    restore [options] server [mode] backup [mboxname | uniqueid | guid]...

options:
    -A acl              # apply specified acl to restored mailboxes
    -C alt_config       # alternate config file
    -D                  # don't trim deletedprefix before restoring
    -F input-file       # read mailboxes/messages from file rather than argv
    -L                  # local mailbox operations only (no mupdate)
    -M mboxname         # restore messages to specified mailbox
    -P partition        # restore mailboxes to specified partition
    -U                  # try to preserve uniqueid, uid, modseq, etc
    -X                  # don't restore expunged messages
    -a                  # try to restore all mailboxes in backup
    -n                  # calculate work required but don't perform restoration
    -r                  # recurse into submailboxes
    -v                  # verbose
    -w seconds          # wait before starting (useful for attaching a debugger)
    -x                  # only restore expunged messages (not sure if useful?)
    -z                  # require compression (abort if compression unavailable)

mode:
    -f                  # specified backup interpreted as filename
    -m                  # specified backup interpreted as mboxname
    -u                  # specified backup interpreted as userid (default)



compact
-------

# finding messages that are to be kept (either exist as unexpunged somewhere,
# or exist as expunged but more recently than the threshold)
# (to get unique rows, add "distinct" and remove mm.expunged from the fields)
sqlite> select m.*, mm.expunged from message as m join mailbox_message as mm on m.id = mm.message_id and (mm.expunged is null or mm.expunged > 1437709300);
id|guid|partition|chunk_id|offset|length|expunged
1|1c7cca361502dfed2d918da97e506f1c1e97dfbe|default|1|458|2159|
1|1c7cca361502dfed2d918da97e506f1c1e97dfbe|default|1|458|2159|1446179047
1|1c7cca361502dfed2d918da97e506f1c1e97dfbe|default|1|458|2159|1446179047

# finding chunks that are still needed (due to containing the last state
# of a mailbox or mailbox_message, or containing a message)
sqlite> select * from chunk where id in (select last_chunk_id from mailbox where deleted is null or deleted > 1437709300 union select last_chunk_id from mailbox_message where expunged is null or expunged > 1437709300 union select chunk_id from message as m join mailbox_message as mm on m.id = mm.message_id and (mm.expunged is null or mm.expunged > 1437709300));
id|timestamp|offset|length|file_sha1|data_sha1
1|1437709276|0|3397|da39a3ee5e6b4b0d3255bfef95601890afd80709|6836d0110252d08a0656c14c2d2d314124755491
3|1437709355|1977|2129|fee183c329c011ead7757f59182116500776eaaf|a5677cfa1f5f7b627763652f4bb9b99f5970748c
4|1437709425|2746|1719|3d9f02135bf964ff0b6a917921b862c3420e48f0|7b64ec321457715ee61fe238f178f5d72adaef64
5|1437709508|3589|2890|0cee599b1573110fee428f8323690cbcb9589661|90d104346ef3cba9e419461dd26045035f4cba02

remember: a single APPLY MESSAGE line can contain many messages!

thoughts:
* need a heuristic for quickly determining whether a backup needs to be compacted
    -> sum(chunks to discard, chunks to combine, chunks to split) > threshold
    -> can we detect chunks that are going to significantly reduce in size as a
       result of discarding individual lines?
* "quick" vs "full" compaction

settings:

* backup retention period
* chunk combination size (byte length or elapsed time)

combining chunks:
* size threshold below which adjacent chunks can be joined
* size threshold above which chunks should be split
* duration threshold below which adjacent chunks can be joined
* duration threshold above which chunks should be split
backup_min_chunk_size: 0 for no minimum
backup_max_chunk_size: 0 for no maximum
backup_min_chunk_duration: 0 for no minimum
backup_max_chunk_duration: 0 for no maximum
priority: size or duration??

data we absolutely need to keep:

* the most recent APPLY MAILBOX for each mailbox we're keeping (mailbox state)
* the APPLY MAILBOX containing the most recent RECORD for each message we're keeping (record state)
* the APPLY MESSAGE for each message we're keeping (message data)

data that we should practically keep:

* all APPLY MAILBOXes for a given mailbox from the chunk identified as its last
* all APPLY MAILBOXes containing a RECORD for a given message from the chunk identified as its last
* the APPLY MESSAGE for each message we're keeping

four kinds of compaction (probably at least two simultaneously):

* removing unused chunks
* combining adjacent chunks into a single chunk (for better gz compression)
* removing unused message lines from within a chunk (important after combining)
* removing unused messages from within a message line

"unused messages"
    messages for which all records have been expunged for longer
    than the retention period
"unused chunks"
    chunks which contain only unused messages

algorithm:

*   open (and lock) backup and backup.new (or bail out)
*   use backup index to identify chunks we still need
*   create a chunk in backup.new
*   foreach chunk we still need:
*       foreach line in the chunk:
*           next line if we don't need to keep it
*           create new line
*           foreach message in line:
*               if we still need the message, or if we're not doing message granularity
*                   add the message to the new line
*           write and index tmp line to backup.new
*       if the new chunk is big enough, or if we're not combining
*           end chunk and start a new one
*   end the new chunk
*   rename backup->backup.old, backup.new->backup
*   close (and unlock) backup.old and backup
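a toy model of that loop, with chunks reduced to lists of lines and the index
reduced to two keep-predicates (all names illustrative; message-granularity
pruning omitted):

```python
# Toy compaction: drop unneeded chunks and lines, combine survivors into
# new chunks, splitting whenever a chunk grows past max_chunk_size bytes.
def compact(chunks, keep_chunk, keep_line, max_chunk_size=4096):
    new_chunks, current, size = [], [], 0
    for chunk in filter(keep_chunk, chunks):
        for line in chunk:
            if not keep_line(line):
                continue                # next line if we don't need it
            current.append(line)
            size += len(line)
            if size >= max_chunk_size:  # chunk big enough: end it, start anew
                new_chunks.append(current)
                current, size = [], 0
    if current:
        new_chunks.append(current)      # end the final chunk
    return new_chunks

old = [["keep-a\n", "drop-b\n"], ["drop-c\n"], ["keep-d\n"]]
new = compact(old,
              keep_chunk=lambda c: any(l.startswith("keep") for l in c),
              keep_line=lambda l: l.startswith("keep"))
print(new)  # -> [['keep-a\n', 'keep-d\n']]
```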


command line locking utility
----------------------------

command line utility to lock a backup (for e.g. safely poking around in the
.index on a live system).

example failure:
$ ctl_backups lock -f /path/to/backup
* Trying to obtain lock on /path/to/backup...
NO some error
<EOF>

example success:
$ ctl_backups lock -f /path/to/backup
* Trying to obtain lock on /path/to/backup...
[potentially a delay here if we need to wait for another process to release the lock]
OK locked
[waits for its stdin to close, then unlocks and exits]

if you need to rummage around in backup.index, run this program in another
shell, do your work, then ^D it when you're finished.

you could also call this from e.g. perl over a bidirectional pipe - wait to
read "OK locked", then you've got your lock.  close the pipe to unlock when
you're finished working.  if you don't read "OK locked" before the pipe closes
then something went wrong and you didn't get the lock.
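driving the lock over a bidirectional pipe might look like this in Python;
since ctl_backups doesn't exist yet, a tiny inline child process stands in
for it here:

```python
import subprocess
import sys

# Stand-in for "ctl_backups lock": prints OK, then holds the "lock" until
# its stdin closes.
child_src = "import sys; print('OK locked', flush=True); sys.stdin.read()"
proc = subprocess.Popen([sys.executable, "-c", child_src],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        text=True)

line = proc.stdout.readline().strip()
got_lock = line.startswith("OK")   # anything else: we never got the lock
# ... rummage around in backup.index here ...
proc.stdin.close()                 # closing the pipe releases the lock
proc.wait()
```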

specify backups by -f filename, -m mailbox, -u userid
default run mode as above
-s to fork an sqlite of the index (and unlock when it exits)
-x to fork a command of your choosing (and unlock when it exits)


reconstruct
-----------

rebuilding backups.db from on-disk files

scan each backup partition for backup files:
  * skip timestamped files (i.e. backups from compact/reindex)
  * skip .old files (old backups from reindex)
  * .index files => skip???
  * skip unreadable files
  * skip empty files
  * skip directories etc

what's the correct procedure for repopulating a cyrus database?
keep a copy of the previous (presumably broken) one?

trim off the mkstemp suffix (if any) to find the userid
can we use a recognisable character to delimit the mkstemp suffix?

what if there are multiple backup files for a given userid?  precedence?

verify found backups before recording them.  reindex?

locking?  what if something has a filename and does stuff with it while
reconstruct runs?

backupd always uses the db for opens, so as long as reconstruct keeps the db
locked while it works, the db won't clash.  but backupd might have backups
still open from before reconstruct started, which it will write to quite
happily, even though reconstruct might decide that some other file is the
correct one for that user...

a backup server would generally be used only for backups, and sync_client
is quite resilient when the destination isn't there, so it's actually
no problem to just shut down cyrus while reconstruct runs.  no outage to
user-facing services, just maybe some sync backlog to catch up on once
cyrus is restarted.


ctl_backups
-----------

sbin tool for mass backup/index/database operations

needs:
    * rebuild backups.db from disk contents
    * list backups/info
    * rename a backup
    * delete a backup
    * verify a backup (check all sha1's, not just the most recent)

not sure if these should be included, or separate tools:
    * reindex a backup (or more)
    * compact a backup (or more)
    * lock a backup
    * some sort of rolling compaction?

usage:
    ctl_backups [options] reconstruct                         # reconstruct backups.db from disk files
    ctl_backups [options] list [list_opts] [[mode] backup...] # list backup info for given/all users
    ctl_backups [options] move new_fname [mode] backup        # rename a backup (think about this more)
    ctl_backups [options] delete [mode] backup                # delete a backup
    ctl_backups [options] verify [mode] backup...             # verify specified backups
    ctl_backups [options] reindex [mode] backup...            # reindex specified backups
    ctl_backups [options] compact [mode] backup...            # compact specified backups
    ctl_backups [options] lock [lock_opts] [mode] backup      # lock specified backup

options:
    -C alt_config       # alternate config file
    -F                  # force (run command even if not needed)
    -S                  # stop on error
    -v                  # verbose
    -w                  # wait for locks (i.e. don't skip locked backups)

mode:
    -A                  # all known backups (not valid for single backup commands)
    -D                  # specified backups interpreted as domains (nvfsbc)
    -P                  # specified backups interpreted as userid prefixes (nvfsbc)
    -f                  # specified backups interpreted as filenames
    -m                  # specified backups interpreted as mboxnames
    -u                  # specified backups interpreted as userids (default)

lock_opts:
    -c                  # exclusively create backup
    -s                  # lock backup and open index in sqlite
    -x cmd              # lock backup and execute cmd
    -p                  # lock backup and wait for eof on stdin (default)

list_opts:
    -t [hours]          # "stale" (no update in hours) backups only (default: 24)


cyr_backup
----------

sbin tool for inspecting backups

needs:
    * better name?
    * list stuff
    * show stuff
    * dump stuff
    * restore?

* should lock/move/delete (single backup commands) from ctl_backups be moved here?

usage:
    cyr_backup [options] [mode] backup list [all | chunks | mailboxes | messages]...
    cyr_backup [options] [mode] backup show chunks [id...]
    cyr_backup [options] [mode] backup show messages [guid...]
    cyr_backup [options] [mode] backup show mailboxes [mboxname | uniqueid]...
    cyr_backup [options] [mode] backup dump [dump_opts] chunk id
    cyr_backup [options] [mode] backup dump [dump_opts] message guid
    cyr_backup [options] [mode] backup json [chunks | mailboxes | messages]...

options:
    -C alt_config       # alternate config file
    -v                  # verbose

mode:
    -f                  # backup interpreted as filename
    -m                  # backup interpreted as mboxname
    -u                  # backup interpreted as userid (default)

commands:
    list: table of contents, one per line
    show: indexed details of listed items, one per paragraph, detail per line
    dump: relevant contents from backup stream
    json: indexed details of listed items in json format

dump options:
    -o filename         # dump to named file instead of stdout


partitions
----------

not enough information in the sync protocol to handle partitions easily?

we know what the partition is when we do an APPLY operation (mailbox, message,
etc), but the initial GET operations don't include it.  so we need to already
know where the appropriate backup is partitioned in order to find the backup
file, in order to look inside it to respond to the GET request

if we have a mailboxes database (indexed by mboxname, uniqueid and userid) then
maybe that would make it feasible?  if it's not in the mailboxes database then
we don't have a backup for it yet, so we respond accordingly, and get sent
enough information to create it.

does that mean the backup api needs to take an mbname on open, and it handles
the job of looking it up in the mailboxes database to find the appropriate
thing to open?

can we use sqlite for such a database, or is the load on it going to be too
heavy?  locking?  we have lots of database formats up our sleeves here, so
even though we use sqlite for the backup index there isn't any particular
reason we're beholden to it for the mailboxes db too

if we have a mailboxes db then we need a reconstruct tool for that, too

what if we support multiple backup partitions, but don't expect these
to necessarily correspond with mailbox partitions.  they're just for spreading
disk usage around.

* when creating a backup for a previously-unseen user we'd pick a random
  partition to put them on
* ctl_backups would need a command to move an existing backup to a
  given partition
* ctl_backups would need a command to pre-create a user backup on a
  given partition for initial distribution
* instead of a "backup_data_path" setting, have one-or-more
  "backuppartition-<name>" settings, ala partition- and friends

see imap/partlist.[ch] for partition list management stuff.  it's complicated
and doesn't have a test suite, so maybe save this implementation until needed.

but... maybe rename backup_data_path to backuppartition-default in the meantime,
so that when we do add this it's not a complicated reconfig to update?

partlist_local_select (and the lazy-loaded partlist_local_init) are where the
mailbox partitions come from (see also mboxlist_create_partition); do something
similar for backup partitions


data corruption
---------------

backups.db:
    * can be reconstructed from on disk files at any time
    * how to detect corruption? does cyrus_db detect/repair on its own?

backup indexes:
    * can be reindexed at any time from backup data
    * how to detect corruption? assume sqlite will notice, complain?

backup data:
    * what's zlib's failure mode? do we lose the entire chunk or just the corrupt bit?
    * verify will notice sha1sum mismatches
    * dlist format will reject some kinds of corruption (but not all)
    * reindex: should skip unparseable dlist lines
    * message data has its own checksums (guid)
    * reindex: should skip messages that don't match their own checksums
    * compact: "full" compact will only keep useful data according to index
    * backupd: will sync anything that's in the user mailbox but not in the backup index

i think this means that if a message or mailbox state becomes corrupted in
the backup data file, and it still exists in the user's real mailbox, you
recover from the corruption by reindexing and then letting the sync process
copy the missing data back in again.  and you can tidy up the data file by
running a compact over it.

you detect data corruption in the most recent chunk reactively, as soon as the
backup system needs to open it again (quick verify on open)

you detect data corruption in older chunks reactively by trying to restore from
them.  this may be too late: if a message needs restoring, it's because the user
mailbox no longer has it

you detect data corruption preemptively by running the verify tool over it.
recommend scheduling this in EVENTS/cron?

if data corruption occurs in a message that's no longer in the user's mailbox,
that message is lost.  it was going to be deleted from the backup after the
retention period anyway (by compact), but if it needs restoring in the
meantime, sorry

748
749installation instructions
750-------------------------
751
752(obviously, most of this won't work at this point, because the code doesn't
753exist.  but this is, approximately, where things are heading.)
754
755on your backup server:
756    * compile with --enable-backup configure option and install
757    * imapd.conf:
758        backuppartition-default: /var/spool/backup  # FIXME better example
759        backup_db: twoskip
760        backup_db_path: /var/imap/backups.db
761        backup_staging_path: /var/spool/backup
762        backup_retention_days: 7
763    * cyrus.conf SERVICES:
764        backupd cmd="backupd" listen="csync" prefork=0
765        (remove other services, most likely)
766        (should i create a master/conf/backup.conf example file?)
767    * cyrus.conf EVENTS:
768        compact cmd="ctl_backups compact -A" at=0400
769    * start server as usual
770    * do i want a special port for backupd?
771
772on your imap server:
773    * imapd.conf:
774        sync_log_channels: backup
775        sync_log: 1
776        backup_sync_host: backup-server.example.com
777        backup_sync_port: csync
778        backup_sync_authname: ...
779        backup_sync_password: ...
780        backup_sync_repeat_interval: ... # seconds, smaller value = livelier backups but more i/o
781        backup_sync_shutdown_file: ....
782    * cyrus.conf STARTUP:
783        backup_sync cmd="sync_client -r -n backup"
784    * cyrus.conf SERVICES:
785        restored cmd="restored" [...]
786    * start/restart master
787
788files and such:
789    {configdirectory}/backups.db                        - database mapping userids to backup locations
790    {backuppartition-name}/<hash>/<userid>_XXXXXX       - backup data stream for userid
791    {backuppartition-name}/<hash>/<userid>_XXXXXX.index - index into userid's backup data stream
792
793do i want rhost in the path?
    * protects against the case where multiple servers each try to back up their own version of the same user
      (though this is its own problem that the backup system shouldn't have to compensate for)
796    * but makes location of undifferentiated user unpredictable
797    * so probably not, actually
798
799
800chatting about implementation 20/10
801-----------------------------------
80209:54 elliefm_
803here's a fun sync question
804APPLY MESSAGE provides a list of messages
805can a single APPLY MESSAGE contain messages for multiple mailboxes and/or users?
806my first hunch is that it doesn't cross users, since the broadest granularity for a single sync run is USER
80710:06 kmurchison
808We'd have to check with Bron, but I *think* messages can cross mailboxes for a single user
80910:06 brong_
810yes
811APPLY MESSAGE just adds it to the reserve list
81210:07 elliefm_
813nah apply message uploads the message, APPLY RESERVE adds it to the reserve list :P
81410:07 brong_
815same same
816APPLY RESERVE copies it from a local mailbox
817APPLY MESSAGE uploads it
81810:07 elliefm_
819yep
82010:07 brong_
821they both wind up in the reserve list
82210:07 elliefm_
823ahh i see what you mean, gotcha
82410:07 brong_
825until you send a RESTART
ideally you want to reserve it in the same partition, but it will copy the message over if it's not on the same partition
827there's no restriction on which mailbox it came from/went to
828good for user renames, and good for an append to a bunch of mailboxes in different users / shared space all at once
829(which LMTP can do)
83010:10 elliefm_
831i can handle the case where a single APPLY MESSAGE contains messages for multiple mailboxes belonging to the same user
832but i'm in trouble if a single APPLY MESSAGE can contain messages belonging to different users
83310:14 brong_
834elliefm_: why?
83510:14 brong_
836you don't have to keep them if they aren't used
83710:15 elliefm_
838for backups - when i see the apply, i need to know which user's backup to add it to.  that's easy enough if it doesn't cross users but gets mega fiddly if it does
839i'm poking around in sync client to see if it's likely to be an issue or not
84011:00 brong__
841elliefm_: I would stage it, and add it to users as it gets refcounted in by an index file
84211:07 elliefm_
843that's pretty much what we do for ordinary sync and delivery stuff yeah?
84411:08 brong__
845yep
846and it's what the backup thing does
84711:09 elliefm_
848i'm pretty sure that APPLY RESERVE and APPLY MESSAGE don't give a damn about users, they're just "here's every message you might not have already had since last time we spoke" and it lets the APPLY MAILBOX work out where to attach them later
84911:09 brong__
850yep
85111:09 elliefm_
852so yeah, i'll need to do something here
853i've been working so far on the idea that a single user's backup consists of 1) an append-only gzip stream of the sync protocol chat that built it, and 2) an index that tracks current state of mailboxes, and offsets within (1) of message data
854that gets us good compression (file per user, not file per message), and if the index gets corrupted or lost, it's rebuildable purely from (1), it doesn't need a live copy of the original mailbox
85511:12 brong_
856yep, that all works
85711:12 elliefm_
858(so if you lose your imap server, you're not unable to rebuild a broken index on the backup)
85911:13 brong_
860it's easy enough to require the sync protocol stream to only contain messages per user
861though "apply reserve" is messy
862because you need to return "yes, I have that message"
86311:13 elliefm_
with that implementation i can't (easily) keep user.a's messages from ending up in user.b's data stream (though they won't be indexed)
86511:14 brong_
I'm not too averse to the idea of just unpacking each message as it comes off the wire into a temporary directory
86711:14 elliefm_
868(because at the time i'm receiving the sync data i don't know which it needs to go in, so if they come in in the same reserve i'd need to append them to both data streams)
869which isn't a huge problem, just… irks me a bit
87011:14 brong_
871and then reading the indexes as they come in, checking against the state DB to see if we already have them, and streaming them into the gzip if they aren't there yet
872what we can do is something like the current format, where files go into a tar
87311:16 elliefm_
874i guess the fiddly bit there is that there's one more moving part to keep synchronised across failure states
875a backup for a single user becomes 1) data stream + 2) any messages that were uploaded but not yet added to a mailbox + 3) index (which doesn't know what to do with (2))
876which in the general case is fine, the next sync will update the mailboxes, which will push (2) into (1) and index it nicely, and on we go
877but it's just a little bit more mess if there's a failure that you need to recover from between those states — it's no longer a simple case of "it's in the backup and we know everything about it" or "it doesn't exist", there's a third case of "well we might have the data but don't really know what to do with it"
878the other fiddly bit is that the process of appending to the data stream is suddenly in the business of crafting output rather than simply dumping what it gets, which isn't really burdensome, but it is one more little crack for bugs to crawl into
879i guess in terms of sync protocol, one thing i could do on my end is identify apply operations that seem to contain multiple users' data, and just return an error on those.  the sync client on the other end will promote them until they're eventually user syncs, which i think are always user granularity
88011:50 elliefm_
881i think for now, first stage implementation will be to stream the reserve/message commands in full to every user backup they might apply to.  and optimising that down so that each stream only contains messages belonging to that user can be a future optimisation
882
883
884todo list
885---------
886
887* clean up error handling
888* perl tool to anonymise sync proto talk
889* verification step to check entire data stream for errors (even chunks that aren't indexed)
890* prot_fill_cb: extra argument to pass back an error string to prot_fill
891* ctl_backups verify: set level
892* backupd: don't block on locked backups, return mailbox locked -- but sync_client doesn't handle this
893* test multiple backup partitions
894* configure: error if backups requested and we don't have zlib
895* valgrind
896* finish reconstruct
897* compact: split before append?
898
899compact implementation steps:
900    1 remove unused chunks, keep everything else as is
901    2 join adjacent chunks if small enough, split large chunks
902    3 parse/rebuild message lines
903    4 discard unused mailbox lines
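steps 1 and 2 amount to a chunk-layout planning pass.  a sketch, with made-up size thresholds (the real implementation would pick its own limits and would rewrite actual data, not just lengths):

```python
# illustrative thresholds, not tuned values from the implementation
MIN_CHUNK = 64 * 1024
MAX_CHUNK = 1024 * 1024

def plan_chunks(chunks):
    """Sketch of compact steps 1 and 2: drop chunks nothing in the index
    references, then join adjacent small chunks and split oversized ones.
    `chunks` is a list of (length, refcount) pairs; returns the planned
    output-chunk lengths.
    """
    kept = [length for length, refs in chunks if refs > 0]  # step 1
    plan = []
    for length in kept:  # step 2a: join adjacent chunks if small enough
        if plan and plan[-1] + length <= MAX_CHUNK and (
                plan[-1] < MIN_CHUNK or length < MIN_CHUNK):
            plan[-1] += length
        else:
            plan.append(length)
    out = []
    for length in plan:  # step 2b: split large chunks
        while length > MAX_CHUNK:
            out.append(MAX_CHUNK)
            length -= MAX_CHUNK
        if length:
            out.append(length)
    return out
```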
904