1<title>Fossil Delta Format</title>
2<h1>1.0 Overview</h1>
3
4<p>Fossil achieves efficient storage and low-bandwidth synchronization
5through the use of delta-compression.  Instead of storing
6or transmitting the complete content of an artifact, Fossil stores or
7transmits only the changes relative to a related artifact.
8</p>
9
10<p>This document describes the delta-encoding format used by Fossil.
11The intended audience is developers working on either
12<a href="index.wiki">Fossil</a> itself, or on tools compatible with
13Fossil. Understanding of this document is <em>not</em> required for
14ordinary users of Fossil.  This document is an implementation detail.</p>
15
16<p>This document only describes the delta file format.  A
17[./delta_encoder_algorithm.wiki|separate document] describes one possible
18algorithm for generating deltas in this format.</p>
19
20<h2>1.1 Sample Software And Analysis Tools</h2>
21
22<p>The core routines used to generate and apply deltas in this format
23are contained in the [../src/delta.c|delta.c] source file.  Interface
24logic, including "test-*" commands to directly manipulate deltas are
25contained in the [../src/deltacmd.c|deltacmd.c] source file.  SQL functions
26to create, apply, and analyze deltas are implemented by code in the
27[../src/deltafunc.c|deltafunc.c] source file.
28
29<p>The following command-line tools are available to create and apply
30deltas and to test the delta logic:
31
32   *   [/help?cmd=test-delta|fossil test-delta] &rarr; Run self-tests of
33       the delta logic
34
35   *   [/help?cmd=test-delta-create|fossil test-delta-create X Y] &rarr; compute
36       a delta that converts file X into file Y.  Output that delta.
37
38   *   [/help?cmd=test-delta-apply|fossil test-delta-apply X D] &rarr; apply
39       delta D to input file X and output the result.
40
41   *   [/help?cmd=test-delta-analyze|fossil test-delta-analyze X Y] &rarr; compute
42       and delta that converts file X into file Y but instead of writing the
43       delta to output, write performance information about the delta.
44
45<p>When running the [/help?cmd=sqlite3|fossil sql] command to get an
46interactive SQL session connected to the repository, the following
47additional SQL functions are provided:
48
49   *   <b>delta_create(</b><i>X</i><b>,</b><i>Y</i><b>)</b> &rarr;
50       Compute a data that carries blob X into blob Y and return that delta
51       as a blob.
52
53   *   <b>delta_apply(</b><i>X</i><b>,</b><i>D</i><b>)</b> &rarr;
54       Apply delta D to input blob X return a new blob which is the result.
55
56
57   *   <b>delta_output_size(</b><i>D</i>)</b> &rarr;
58       Return the size of the output that would result from applying delta D.
59
60   *   <b>delta_parse(</b><i>D</i>)</b> &rarr; This is a table-valued function
61       that returns one row for the header, for the trailer, and for each segment
62       in delta D.
63
64
65<h1 id="structure">2.0 Structure</h1>
66<verbatim type="pikchr">
67    leftmargin = 0.1
68    box height 50% "Header"
69    box same "Segments"
70    box same "Trailer"
71</verbatim>
72
73<p>A delta consists of three parts, a "header", a "trailer", and a
74"segment-list" between them.</p>
75
76<p>Both header and trailer provide information about the target
77helping the decoder, and the segment-list describes how the target can
78be constructed from the original.</p>
79
80<h2 id="header">2.1 Header</h2>
81<verbatim type="pikchr">
82    leftmargin = 0.1
83    box height 50% "Size"
84    box same "\"\\n\""
85</verbatim>
86
87<p>The header consists of a single number followed by a newline
88character (ASCII 0x0a). The number is the length of the target in
89bytes.</p>
90
91<p>This means that, given a delta, the decoder can compute the size of
92the target (and allocate any necessary memory based on that) by simply
93reading the first line of the delta and decoding the number found
94there. In other words, before it has to decode everything else.</p>
95
96<h2 id="trailer">2.2 Trailer</h2>
97<verbatim type="pikchr">
98    leftmargin = 0.1
99    box height 50% "Checksum"
100    box same "\";\""
101</verbatim>
102
103<p>The trailer consists of a single number followed by a semicolon (ASCII
1040x3b). This number is a checksum of the target and can be used by a
105decoder to verify that the delta applied correctly, reconstructing the
106target from the original.</p>
107
108<p>The checksum is computed by treating the target as a series of
10932-bit integer numbers (MSB first), and summing these up, modulo
1102^32-1. A target whose length is not a multiple of 4 is padded with
1110-bytes (ASCII 0x00) at the end.</p>
112
113<p>By putting this information at the end of the delta a decoder has
114it available immediately after the target has been reconstructed
115fully.</p>
116
117<h2 id="slist">2.3 Segment-List</h2>
118<verbatim type="pikchr">
119    leftmargin = 0.1
120    PART1: [
121        B1: box height 50% width 15% ""
122        B2: box same ""
123        B3: box same ""
124            "***"
125            box height 50% width 15% ""
126        I1: line down 50% from B2 .s
127            arrow right until even with B3.e
128            box "Insert Literal" height 50%
129            line down 75% from I1 .s
130            arrow right until even with B3.e
131            box "Copy Range" height 50%
132    ]
133    down
134    PART2: [
135        ""
136        box "Length" height 50%
137        right
138        box "\":\"" same
139        box "Bytes" same
140    ] with .nw at previous.sw
141</verbatim>
142
143<p>The segment-list of a delta describes how to create the target from
144the original by a combination of inserting literal byte-sequences and
145copying ranges of bytes from the original. This is where the
146compression takes place, by encoding the large common parts of
147original and target in small copy instructions.</p>
148
149<p>The target is constructed from beginning to end, with the data
150generated by each instruction appended after the data of all previous
151instructions, with no gaps.</p>
152
153<h3 id="insertlit">2.3.1 Insert Literal</h3>
154
155<p>A literal is specified by two elements, the size of the literal in
156bytes, and the bytes of the literal itself.</p>
157
158<verbatim type="pikchr">
159    leftmargin = 0.1
160    box "Length" height 50%
161    box "\":\"" same
162    box "Bytes" same
163</verbatim>
164
165<p>The length is written first, followed by a colon character (ASCII
1660x3a), followed by the bytes of the literal.</p>
167
168<h3 id="copyrange">2.3.2 Copy Range</h3>
169
170<p>A range to copy is specified by two numbers, the offset of the
171first byte in the original to copy, and the size of the range, in
172bytes. The size zero is special, its usage indicates that the range
173extends to the end of the original.</p>
174
175<verbatim type="pikchr">
176    leftmargin = 0.1
177    box "Length" height 50%
178    box "\"@\"" same
179    box "Offset" same
180    box "\",\"" same
181</verbatim>
182
183
184<p>The length is written first, followed by an "at" character (ASCII
1850x40), then the offset, followed by a comma (ASCII 0x2c).</p>
186
187<h1 id="intcoding">3.0 Encoding of integers</h1>
188
189<p>
190The format currently handles only 32 bit integer numbers. They are
191written base-64 encoded, MSB first, and without leading
192"0"-characters, except if they are significant (i.e. 0 => "0").
193</p>
194
195<p>
196The base-64 encoding uses one character for each 6 bits of
197the integer to be encoded.  The encoding characters are:
198</p>
199
200<blockquote><pre>
2010123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
202</pre></blockquote>
203
204<p>The least significant 6 bits of the integer are encoded by
205the first character, followed by
206the next 6 bits, and so on until all non-zero bits of the integer
207are encoded.  The minimum number of encoding characters is used.
208Note that for integers less than 10, the base-64 coding is a
209ASCII decimal rendering of the number itself.
210</p>
211
212<h1 id="examples">4.0 Examples</h1>
213
214<h2 id="examplesint">4.1 Integer encoding</h2>
215
216<table border=1>
217<tr>
218<th>Value</th>
219<th>Encoding</th>
220</tr>
221<tr>
222<td>0</td>
223<td>0</td>
224</tr>
225<tr>
226<td>6246</td>
227<td>1Xb</td>
228</tr>
229<tr>
230<td>-1101438770</td>
231<td>2zMM3E</td>
232</tr>
233</table>
234
235<h2 id="examplesdelta">4.2 Delta encoding</h2>
236
237<p>An example of a delta using the specified encoding is:</p>
238
239<table border=1><tr><td><pre>
2401Xb
2414E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre>
242</td></tr></table>
243
244<p>This can be taken apart into the following parts:</p>
245
246<table border=1>
247<tr><th>What  </th> <th>Encoding         </th><th>Meaning </th><th>Details</th></tr>
248<tr><td>Header</td> <td>1Xb              </td><td>Size    </td><td> 6246         </td></tr>
249<tr><td>S-List</td> <td>4E@0,            </td><td>Copy    </td><td> 270 @ 0      </td></tr>
250<tr><td>&nbsp;</td> <td>2:th             </td><td>Literal </td><td> 2 'th'       </td></tr>
251<tr><td>&nbsp;</td> <td>FN@4C,           </td><td>Copy    </td><td> 983 @ 268        </td></tr>
252<tr><td>&nbsp;</td> <td>6:scenda         </td><td>Literal </td><td> 6 'scenda'       </td></tr>
253<tr><td>&nbsp;</td> <td>1B@Jd,           </td><td>Copy    </td><td> 75 @ 1256        </td></tr>
254<tr><td>&nbsp;</td> <td>6:scenda         </td><td>Literal </td><td> 6 'scenda'       </td></tr>
255<tr><td>&nbsp;</td> <td>5x@Kt,           </td><td>Copy    </td><td> 380 @ 1336       </td></tr>
256<tr><td>&nbsp;</td> <td>6:pieces     </td><td>Literal </td><td> 6 'pieces'       </td></tr>
257<tr><td>&nbsp;</td> <td>79@Qt,           </td><td>Copy    </td><td> 457 @ 1720     </td></tr>
258<tr><td>&nbsp;</td> <td>F: Example: eskil</td><td>Literal </td><td> 15 ' Example: eskil'</td></tr>
259<tr><td>&nbsp;</td> <td>~E@Y0,           </td><td>Copy    </td><td>  4046 @ 2176        </td></tr>
260<tr><td>Trailer</td><td>2zMM3E           </td><td>Checksum</td><td> -1101438770         </td></tr>
261</table>
262
263<p>The unified diff behind the above delta is</p>
264
265<table border=1><tr><td><pre>
266bluepeak:(761) ~/Projects/Tcl/Fossil/Devel/devel > diff -u ../DELTA/old ../DELTA/new
267--- ../DELTA/old        2007-08-23 21:14:40.000000000 -0700
268+++ ../DELTA/new        2007-08-23 21:14:33.000000000 -0700
269@@ -5,7 +5,7 @@
270
271  *  If the server does not have write permission on the database
272     file, or on the directory containing the database file (and
273-    it is thus unable to update database because it cannot create
274+    it is thus unable to update the database because it cannot create
275     a rollback journal) then it currently fails silently on a push.
276     It needs to return a helpful error.
277
278@@ -27,8 +27,8 @@
279  *  Additional information displayed for the "vinfo" page:
280
281      +  All leaves of this version that are not included in the
282-        descendant list.  With date, user, comment, and hyperlink.
283-        Leaves in the descendant table should be marked as such.
284+        descendant list.  With date, user, comment, and hyperlink.
285+        Leaves in the descendant table should be marked as such.
286         See the compute_leaves() function to see how to find all
287         leaves.
288      +  Add file diff links to the file change list.
289@@ -37,7 +37,7 @@
290
291  *  The /xfer handler (for push, pull, and clone) does not do
292     delta compression.  This results in excess bandwidth usage.
293-    There are some code in xfer.c that are sketches of ideas on
294+    There are some pieces in xfer.c that are sketches of ideas on
295     how to do delta compression, but nothing has been implemented.
296
297  *  Enhancements to the diff and tkdiff commands in the cli.
298@@ -45,7 +45,7 @@
299     single file.  Allow diffs against any two arbitrary versions,
300     not just diffs against the current check-out.  Allow
301     configuration options to replace tkdiff with some other
302-    visual differ of the users choice.
303+    visual differ of the users choice. Example: eskil.
304
305  *  Ticketing interface (expand this bullet)
306
307</pre></td></tr></table>
308
309
310
311<h1 id="notes">Notes</h1>
312
313<ul>
314<li>Pure text files generate a pure text delta.
315</li>
316<li>Binary files generate a delta that may contain some binary data.
317</li>
318<li>The delta encoding does not attempt to compress the content.
319It was considered to be much
320more sensible to do compression using a separate general-purpose
321compression library, like <a href="http://www.zlib.net">zlib</a>.
322</li>
323</ul>
324