1<title>Fossil Delta Format</title> 2<h1>1.0 Overview</h1> 3 4<p>Fossil achieves efficient storage and low-bandwidth synchronization 5through the use of delta-compression. Instead of storing 6or transmitting the complete content of an artifact, Fossil stores or 7transmits only the changes relative to a related artifact. 8</p> 9 10<p>This document describes the delta-encoding format used by Fossil. 11The intended audience is developers working on either 12<a href="index.wiki">Fossil</a> itself, or on tools compatible with 13Fossil. Understanding of this document is <em>not</em> required for 14ordinary users of Fossil. This document is an implementation detail.</p> 15 16<p>This document only describes the delta file format. A 17[./delta_encoder_algorithm.wiki|separate document] describes one possible 18algorithm for generating deltas in this format.</p> 19 20<h2>1.1 Sample Software And Analysis Tools</h2> 21 22<p>The core routines used to generate and apply deltas in this format 23are contained in the [../src/delta.c|delta.c] source file. Interface 24logic, including "test-*" commands to directly manipulate deltas are 25contained in the [../src/deltacmd.c|deltacmd.c] source file. SQL functions 26to create, apply, and analyze deltas are implemented by code in the 27[../src/deltafunc.c|deltafunc.c] source file. 28 29<p>The following command-line tools are available to create and apply 30deltas and to test the delta logic: 31 32 * [/help?cmd=test-delta|fossil test-delta] → Run self-tests of 33 the delta logic 34 35 * [/help?cmd=test-delta-create|fossil test-delta-create X Y] → compute 36 a delta that converts file X into file Y. Output that delta. 37 38 * [/help?cmd=test-delta-apply|fossil test-delta-apply X D] → apply 39 delta D to input file X and output the result. 40 41 * [/help?cmd=test-delta-analyze|fossil test-delta-analyze X Y] → compute 42 and delta that converts file X into file Y but instead of writing the 43 delta to output, write performance information about the delta. 44 45<p>When running the [/help?cmd=sqlite3|fossil sql] command to get an 46interactive SQL session connected to the repository, the following 47additional SQL functions are provided: 48 49 * <b>delta_create(</b><i>X</i><b>,</b><i>Y</i><b>)</b> → 50 Compute a data that carries blob X into blob Y and return that delta 51 as a blob. 52 53 * <b>delta_apply(</b><i>X</i><b>,</b><i>D</i><b>)</b> → 54 Apply delta D to input blob X return a new blob which is the result. 55 56 57 * <b>delta_output_size(</b><i>D</i>)</b> → 58 Return the size of the output that would result from applying delta D. 59 60 * <b>delta_parse(</b><i>D</i>)</b> → This is a table-valued function 61 that returns one row for the header, for the trailer, and for each segment 62 in delta D. 63 64 65<h1 id="structure">2.0 Structure</h1> 66<verbatim type="pikchr"> 67 leftmargin = 0.1 68 box height 50% "Header" 69 box same "Segments" 70 box same "Trailer" 71</verbatim> 72 73<p>A delta consists of three parts, a "header", a "trailer", and a 74"segment-list" between them.</p> 75 76<p>Both header and trailer provide information about the target 77helping the decoder, and the segment-list describes how the target can 78be constructed from the original.</p> 79 80<h2 id="header">2.1 Header</h2> 81<verbatim type="pikchr"> 82 leftmargin = 0.1 83 box height 50% "Size" 84 box same "\"\\n\"" 85</verbatim> 86 87<p>The header consists of a single number followed by a newline 88character (ASCII 0x0a). The number is the length of the target in 89bytes.</p> 90 91<p>This means that, given a delta, the decoder can compute the size of 92the target (and allocate any necessary memory based on that) by simply 93reading the first line of the delta and decoding the number found 94there. In other words, before it has to decode everything else.</p> 95 96<h2 id="trailer">2.2 Trailer</h2> 97<verbatim type="pikchr"> 98 leftmargin = 0.1 99 box height 50% "Checksum" 100 box same "\";\"" 101</verbatim> 102 103<p>The trailer consists of a single number followed by a semicolon (ASCII 1040x3b). This number is a checksum of the target and can be used by a 105decoder to verify that the delta applied correctly, reconstructing the 106target from the original.</p> 107 108<p>The checksum is computed by treating the target as a series of 10932-bit integer numbers (MSB first), and summing these up, modulo 1102^32-1. A target whose length is not a multiple of 4 is padded with 1110-bytes (ASCII 0x00) at the end.</p> 112 113<p>By putting this information at the end of the delta a decoder has 114it available immediately after the target has been reconstructed 115fully.</p> 116 117<h2 id="slist">2.3 Segment-List</h2> 118<verbatim type="pikchr"> 119 leftmargin = 0.1 120 PART1: [ 121 B1: box height 50% width 15% "" 122 B2: box same "" 123 B3: box same "" 124 "***" 125 box height 50% width 15% "" 126 I1: line down 50% from B2 .s 127 arrow right until even with B3.e 128 box "Insert Literal" height 50% 129 line down 75% from I1 .s 130 arrow right until even with B3.e 131 box "Copy Range" height 50% 132 ] 133 down 134 PART2: [ 135 "" 136 box "Length" height 50% 137 right 138 box "\":\"" same 139 box "Bytes" same 140 ] with .nw at previous.sw 141</verbatim> 142 143<p>The segment-list of a delta describes how to create the target from 144the original by a combination of inserting literal byte-sequences and 145copying ranges of bytes from the original. This is where the 146compression takes place, by encoding the large common parts of 147original and target in small copy instructions.</p> 148 149<p>The target is constructed from beginning to end, with the data 150generated by each instruction appended after the data of all previous 151instructions, with no gaps.</p> 152 153<h3 id="insertlit">2.3.1 Insert Literal</h3> 154 155<p>A literal is specified by two elements, the size of the literal in 156bytes, and the bytes of the literal itself.</p> 157 158<verbatim type="pikchr"> 159 leftmargin = 0.1 160 box "Length" height 50% 161 box "\":\"" same 162 box "Bytes" same 163</verbatim> 164 165<p>The length is written first, followed by a colon character (ASCII 1660x3a), followed by the bytes of the literal.</p> 167 168<h3 id="copyrange">2.3.2 Copy Range</h3> 169 170<p>A range to copy is specified by two numbers, the offset of the 171first byte in the original to copy, and the size of the range, in 172bytes. The size zero is special, its usage indicates that the range 173extends to the end of the original.</p> 174 175<verbatim type="pikchr"> 176 leftmargin = 0.1 177 box "Length" height 50% 178 box "\"@\"" same 179 box "Offset" same 180 box "\",\"" same 181</verbatim> 182 183 184<p>The length is written first, followed by an "at" character (ASCII 1850x40), then the offset, followed by a comma (ASCII 0x2c).</p> 186 187<h1 id="intcoding">3.0 Encoding of integers</h1> 188 189<p> 190The format currently handles only 32 bit integer numbers. They are 191written base-64 encoded, MSB first, and without leading 192"0"-characters, except if they are significant (i.e. 0 => "0"). 193</p> 194 195<p> 196The base-64 encoding uses one character for each 6 bits of 197the integer to be encoded. The encoding characters are: 198</p> 199 200<blockquote><pre> 2010123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~ 202</pre></blockquote> 203 204<p>The least significant 6 bits of the integer are encoded by 205the first character, followed by 206the next 6 bits, and so on until all non-zero bits of the integer 207are encoded. The minimum number of encoding characters is used. 208Note that for integers less than 10, the base-64 coding is a 209ASCII decimal rendering of the number itself. 210</p> 211 212<h1 id="examples">4.0 Examples</h1> 213 214<h2 id="examplesint">4.1 Integer encoding</h2> 215 216<table border=1> 217<tr> 218<th>Value</th> 219<th>Encoding</th> 220</tr> 221<tr> 222<td>0</td> 223<td>0</td> 224</tr> 225<tr> 226<td>6246</td> 227<td>1Xb</td> 228</tr> 229<tr> 230<td>-1101438770</td> 231<td>2zMM3E</td> 232</tr> 233</table> 234 235<h2 id="examplesdelta">4.2 Delta encoding</h2> 236 237<p>An example of a delta using the specified encoding is:</p> 238 239<table border=1><tr><td><pre> 2401Xb 2414E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre> 242</td></tr></table> 243 244<p>This can be taken apart into the following parts:</p> 245 246<table border=1> 247<tr><th>What </th> <th>Encoding </th><th>Meaning </th><th>Details</th></tr> 248<tr><td>Header</td> <td>1Xb </td><td>Size </td><td> 6246 </td></tr> 249<tr><td>S-List</td> <td>4E@0, </td><td>Copy </td><td> 270 @ 0 </td></tr> 250<tr><td> </td> <td>2:th </td><td>Literal </td><td> 2 'th' </td></tr> 251<tr><td> </td> <td>FN@4C, </td><td>Copy </td><td> 983 @ 268 </td></tr> 252<tr><td> </td> <td>6:scenda </td><td>Literal </td><td> 6 'scenda' </td></tr> 253<tr><td> </td> <td>1B@Jd, </td><td>Copy </td><td> 75 @ 1256 </td></tr> 254<tr><td> </td> <td>6:scenda </td><td>Literal </td><td> 6 'scenda' </td></tr> 255<tr><td> </td> <td>5x@Kt, </td><td>Copy </td><td> 380 @ 1336 </td></tr> 256<tr><td> </td> <td>6:pieces </td><td>Literal </td><td> 6 'pieces' </td></tr> 257<tr><td> </td> <td>79@Qt, </td><td>Copy </td><td> 457 @ 1720 </td></tr> 258<tr><td> </td> <td>F: Example: eskil</td><td>Literal </td><td> 15 ' Example: eskil'</td></tr> 259<tr><td> </td> <td>~E@Y0, </td><td>Copy </td><td> 4046 @ 2176 </td></tr> 260<tr><td>Trailer</td><td>2zMM3E </td><td>Checksum</td><td> -1101438770 </td></tr> 261</table> 262 263<p>The unified diff behind the above delta is</p> 264 265<table border=1><tr><td><pre> 266bluepeak:(761) ~/Projects/Tcl/Fossil/Devel/devel > diff -u ../DELTA/old ../DELTA/new 267--- ../DELTA/old 2007-08-23 21:14:40.000000000 -0700 268+++ ../DELTA/new 2007-08-23 21:14:33.000000000 -0700 269@@ -5,7 +5,7 @@ 270 271 * If the server does not have write permission on the database 272 file, or on the directory containing the database file (and 273- it is thus unable to update database because it cannot create 274+ it is thus unable to update the database because it cannot create 275 a rollback journal) then it currently fails silently on a push. 276 It needs to return a helpful error. 277 278@@ -27,8 +27,8 @@ 279 * Additional information displayed for the "vinfo" page: 280 281 + All leaves of this version that are not included in the 282- descendant list. With date, user, comment, and hyperlink. 283- Leaves in the descendant table should be marked as such. 284+ descendant list. With date, user, comment, and hyperlink. 285+ Leaves in the descendant table should be marked as such. 286 See the compute_leaves() function to see how to find all 287 leaves. 288 + Add file diff links to the file change list. 289@@ -37,7 +37,7 @@ 290 291 * The /xfer handler (for push, pull, and clone) does not do 292 delta compression. This results in excess bandwidth usage. 293- There are some code in xfer.c that are sketches of ideas on 294+ There are some pieces in xfer.c that are sketches of ideas on 295 how to do delta compression, but nothing has been implemented. 296 297 * Enhancements to the diff and tkdiff commands in the cli. 298@@ -45,7 +45,7 @@ 299 single file. Allow diffs against any two arbitrary versions, 300 not just diffs against the current check-out. Allow 301 configuration options to replace tkdiff with some other 302- visual differ of the users choice. 303+ visual differ of the users choice. Example: eskil. 304 305 * Ticketing interface (expand this bullet) 306 307</pre></td></tr></table> 308 309 310 311<h1 id="notes">Notes</h1> 312 313<ul> 314<li>Pure text files generate a pure text delta. 315</li> 316<li>Binary files generate a delta that may contain some binary data. 317</li> 318<li>The delta encoding does not attempt to compress the content. 319It was considered to be much 320more sensible to do compression using a separate general-purpose 321compression library, like <a href="http://www.zlib.net">zlib</a>. 322</li> 323</ul> 324