<title>Technical Overview</title>
<h2 align="center">
A Technical Overview<br>Of The Design And Implementation<br>Of Fossil
</h2>

<h2>1.0 Introduction</h2>

At its lowest level, a Fossil repository consists of an unordered set
of immutable "artifacts". You might think of these artifacts as "files",
since in many cases the artifacts are exactly that.
But other "structural artifacts" are also included in the mix.
These structural artifacts define the relationships
between artifacts - which files go together to form a particular
version of the project, who checked in that version and when, what was
the check-in comment, what wiki pages are included with the project, what
are the edit histories of each wiki page, what bug reports or tickets are
included, who contributed to the evolution of each ticket, and so forth.
This low-level file format is called the "global state" of
the repository, since this is the information that is synced to peer
repositories using push and pull operations. The low-level file format
is also called "enduring" since it is intended to last for many years.
The details of the low-level, enduring, global file format
are [./fileformat.wiki | described separately].

This article is about how Fossil is currently implemented. Instead of
dealing with vague abstractions of "enduring file formats" as the
[./fileformat.wiki | other document] does, this article provides
some detail on how Fossil actually stores information on disk.

<h2>2.0 Three Databases</h2>

Fossil stores state information in
[http://www.sqlite.org/ | SQLite] database files.
SQLite keeps an entire relational database, including multiple tables and
indices, in a single disk file. The SQLite library allows the database
files to be efficiently queried and updated using the industry-standard
SQL language.
SQLite updates are atomic, so even in the event of
a system crash or power failure the repository content is protected.

Fossil uses three separate classes of SQLite databases:

<ol>
<li>The configuration database
<li>Repository databases
<li>Checkout databases
</ol>

The configuration database is a one-per-user database that holds
global configuration information used by Fossil. There is one
repository database per project. The repository database is the
file that people are normally referring to when they say
"a Fossil repository". The checkout database is found in the working
checkout for a project and contains state information that is unique
to that working checkout.

Fossil does not always use all three database files. The web interface,
for example, typically only uses the repository database. And the
[/help/settings | fossil settings] command opens only the configuration
database when the --global option is used. But other commands use all three
databases at once. For example, the [/help/status | fossil status]
command will first locate the checkout database, then use the checkout
database to find the repository database, then open the configuration
database. Whenever multiple databases are used at the same time,
they are all opened on the same SQLite database connection using
SQLite's [http://www.sqlite.org/lang_attach.html | ATTACH] command.

The chart below provides a quick summary of how each of these
database files is used by Fossil, with detailed discussion following.
<table border="1" width="80%" cellpadding="0" align="center">
<tr>
<td width="33%" valign="top">
<h3 align="center">Configuration Database<br>"~/.fossil" or<br>
"~/.config/fossil.db"</h3>
<ul>
<li>Global [/help/settings | settings]
<li>List of active repositories used by the [/help/all | all] command
</ul>
</td>
<td width="34%" valign="top">
<h3 align="center">Repository Database<br>"<i>project</i>.fossil"</h3>
<ul>
<li>[./fileformat.wiki | Global state of the project]
    encoded using delta-compression
<li>Local [/help/settings | settings]
<li>Web interface display preferences
<li>User credentials and permissions
<li>Metadata about the global state to facilitate rapid
    queries
</ul>
</td>
<td width="33%" valign="top">
<h3 align="center">Checkout Database<br>"_FOSSIL_" or ".fslckout"</h3>
<ul>
<li>The repository database used by this checkout
<li>The version currently checked out
<li>Other versions [/help/merge | merged] in but not
    yet [/help/commit | committed]
<li>Changes from the [/help/add | add], [/help/delete | delete],
    and [/help/rename | rename] commands that have not yet been committed
<li>"mtime" values and other information used to efficiently detect
    local edits
<li>The "[/help/stash | stash]"
<li>Information needed to "[/help/undo | undo]" or "[/help/redo | redo]"
</ul>
</td>
</tr>
</table>

<h3 id="configdb">2.1 The Configuration Database</h3>

The configuration database holds cross-repository preferences and a list of all
repositories for a single user.

The [/help/settings | fossil settings] command can be used to specify various
operating parameters and preferences for Fossil repositories. Settings can
apply to a single repository, or they can apply globally to all repositories
for a user. If both a global and a repository value exist for a setting,
then the repository-specific value takes precedence.
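Because Fossil opens all of its databases on a single connection using ATTACH, a precedence rule like this one can be expressed as a single SQL query. The following is a minimal Python sketch using the standard <tt>sqlite3</tt> module, under stated assumptions: the table names <tt>config</tt> and <tt>global_config</tt>, the schema alias <tt>configdb</tt>, and both helper functions are hypothetical and are not taken from Fossil's source.

```python
import sqlite3


def open_databases(repo_path: str, config_path: str) -> sqlite3.Connection:
    """Open the repository database and ATTACH the per-user configuration
    database to the same connection under the hypothetical alias "configdb"."""
    conn = sqlite3.connect(repo_path)
    conn.execute("ATTACH DATABASE ? AS configdb", (config_path,))
    # Hypothetical, simplified settings tables; Fossil's real schemas may differ.
    conn.execute("CREATE TABLE IF NOT EXISTS main.config"
                 "(name TEXT PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS configdb.global_config"
                 "(name TEXT PRIMARY KEY, value TEXT)")
    return conn


def get_setting(conn: sqlite3.Connection, name: str, default=None):
    """Return the repository-specific value if one exists, otherwise the
    global value, otherwise the supplied default."""
    # COALESCE picks the first non-NULL scalar subquery, so the repository
    # value shadows the global one whenever both are present.
    (value,) = conn.execute(
        "SELECT COALESCE("
        "  (SELECT value FROM main.config WHERE name = ?),"
        "  (SELECT value FROM configdb.global_config WHERE name = ?))",
        (name, name),
    ).fetchone()
    return default if value is None else value


if __name__ == "__main__":
    conn = open_databases(":memory:", ":memory:")
    conn.execute("INSERT INTO configdb.global_config VALUES ('autosync', 'on')")
    conn.execute("INSERT INTO configdb.global_config VALUES ('editor', 'vi')")
    conn.execute("INSERT INTO main.config VALUES ('autosync', 'off')")
    print(get_setting(conn, "autosync"))  # off: the repository value wins
    print(get_setting(conn, "editor"))    # vi: falls back to the global value
```

Resolving the rule inside SQL rather than in two separate lookups keeps the precedence logic in one place and works the same no matter how many databases are attached.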
All of the settings
have reasonable defaults, so many users will never need to change them.
But if changes to settings are desired, the configuration database provides
a way to change settings for all repositories with a single command, rather
than having to change the setting individually on each repository.

The configuration database also maintains a list of repositories. This
list is used by the [/help/all | fossil all] command in order to run various
operations such as "sync" or "rebuild" on all repositories managed by a user.

<h4 id="configloc">2.1.1 Location Of The Configuration Database</h4>

On Unix systems, the configuration database is named by the following
algorithm:

<blockquote><table border="0">
<tr><td>1. if environment variable FOSSIL_HOME exists
<td> → <td>$FOSSIL_HOME/.fossil
<tr><td>2. if file ~/.fossil exists<td> →<td>~/.fossil
<tr><td>3. if environment variable XDG_CONFIG_HOME exists
    <td> →<td>$XDG_CONFIG_HOME/fossil.db
<tr><td>4. if the directory ~/.config exists
    <td> →<td>~/.config/fossil.db
<tr><td>5. Otherwise<td> →<td>~/.fossil
</table></blockquote>

Another way of thinking of this algorithm is the following:

  *  Use "$FOSSIL_HOME/.fossil" if the FOSSIL_HOME variable is defined
  *  Use the XDG-compatible name (usually ~/.config/fossil.db) on XDG systems
     if the ~/.fossil file does not already exist
  *  Otherwise, use the traditional Unix name of "~/.fossil"

This algorithm is complex due to the need for historical compatibility.
Originally, the database was always just "~/.fossil". Then support
for the FOSSIL_HOME environment variable was added. Later, support for the
[https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html|XDG-compatible configuration filenames]
was added. Each of these changes needed to continue to support legacy
installations.
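The five-step Unix rule in the table above can be restated as a small function. This is a minimal Python sketch, not Fossil's own code; the function name and its parameters are hypothetical, and the file-system checks are passed in as arguments so the logic can be exercised without touching a real home directory.

```python
import os
import posixpath
from typing import Callable, Mapping


def unix_config_db_path(
    env: Mapping[str, str],
    home: str,
    file_exists: Callable[[str], bool] = os.path.isfile,
    dir_exists: Callable[[str], bool] = os.path.isdir,
) -> str:
    """Apply the five-step Unix lookup for the configuration database.

    `env` and `home` stand in for the process environment and the user's
    home directory; the existence checks are parameters for testability.
    """
    # Step 1: FOSSIL_HOME always takes priority.
    if "FOSSIL_HOME" in env:
        return posixpath.join(env["FOSSIL_HOME"], ".fossil")
    legacy = posixpath.join(home, ".fossil")
    # Step 2: keep using an existing legacy ~/.fossil file.
    if file_exists(legacy):
        return legacy
    # Step 3: honor XDG_CONFIG_HOME when it is set.
    if "XDG_CONFIG_HOME" in env:
        return posixpath.join(env["XDG_CONFIG_HOME"], "fossil.db")
    # Step 4: use ~/.config/fossil.db if the ~/.config directory exists.
    xdg_default = posixpath.join(home, ".config")
    if dir_exists(xdg_default):
        return posixpath.join(xdg_default, "fossil.db")
    # Step 5: otherwise fall back to the traditional name.
    return legacy


if __name__ == "__main__":
    print(unix_config_db_path(os.environ, os.path.expanduser("~")))
```

Note how step 2 sits ahead of the XDG checks: an existing legacy file wins, which is what lets older installations keep working unchanged.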

On Windows, the configuration database is the first of the following
for which the corresponding environment variables exist:

  *  %FOSSIL_HOME%/_fossil
  *  %LOCALAPPDATA%/_fossil
  *  %APPDATA%/_fossil
  *  %USERPROFILE%/_fossil
  *  %HOMEDRIVE%%HOMEPATH%/_fossil

The second case is the one that usually determines the name. Note that the
FOSSIL_HOME environment variable can always be set to determine the
location of the configuration database. Note also that the configuration
database file itself is called ".fossil" or "fossil.db" on Unix but
"_fossil" on Windows.

The [/help/info | fossil info] command will show the location of
the configuration database on a line that starts with "config-db:".

<h3>2.2 Repository Databases</h3>

The repository database is the file that is commonly referred to as
"the repository". This is because the repository database contains,
among other things, the complete revision, ticket, and wiki history for
a project. It is customary to name the repository database after the
name of the project, with a ".fossil" suffix. For example, the repository
database for the self-hosting Fossil repository is called "fossil.fossil"
and the repository database for SQLite is called "sqlite.fossil".

<h4>2.2.1 Global Project State</h4>

The bulk of the repository database (typically 75 to 85%) consists
of the artifacts that comprise the
[./fileformat.wiki | enduring, global, shared state] of the project.
The artifacts are stored as BLOBs, compressed using
[http://www.zlib.net/ | zlib compression] and, where applicable,
using [./delta_encoder_algorithm.wiki | delta compression].
The combination of zlib and delta compression results in considerable
space savings.
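To make the storage scheme concrete, here is a toy Python sketch of a content-addressed artifact table in which each BLOB is zlib-compressed and may optionally be stored as a delta against another artifact. The table layout, the function names, and the prefix/suffix delta encoding are simplifications invented for illustration. They are not Fossil's actual schema, nor the delta algorithm described in the linked delta compression document.

```python
import hashlib
import sqlite3
import zlib
from typing import Optional

# Toy schema invented for this sketch; it is NOT Fossil's real layout.
SCHEMA = """
CREATE TABLE artifact(
  hash TEXT PRIMARY KEY,  -- SHA3-256 of the full, uncompressed content
  base TEXT,              -- hash of the delta base, or NULL for full text
  data BLOB NOT NULL      -- zlib-compressed full text or delta
);
"""


def make_delta(base: bytes, target: bytes) -> bytes:
    """Toy delta: 4-byte common-prefix length, 4-byte common-suffix length,
    then the bytes of `target` that lie between them."""
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix in either input.
    while suffix < limit - prefix and base[-1 - suffix] == target[-1 - suffix]:
        suffix += 1
    middle = target[prefix:len(target) - suffix]
    return prefix.to_bytes(4, "big") + suffix.to_bytes(4, "big") + middle


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild the target from `base` plus a delta made by make_delta()."""
    prefix = int.from_bytes(delta[0:4], "big")
    suffix = int.from_bytes(delta[4:8], "big")
    return base[:prefix] + delta[8:] + base[len(base) - suffix:]


def store(conn: sqlite3.Connection, content: bytes,
          base_hash: Optional[str] = None) -> str:
    """Store an artifact (optionally as a delta against base_hash)."""
    digest = hashlib.sha3_256(content).hexdigest()
    if base_hash is None:
        payload = content
    else:
        payload = make_delta(load(conn, base_hash), content)
    conn.execute("INSERT OR IGNORE INTO artifact VALUES (?, ?, ?)",
                 (digest, base_hash, zlib.compress(payload)))
    return digest


def load(conn: sqlite3.Connection, digest: str) -> bytes:
    """Return the full, uncompressed content of an artifact."""
    row = conn.execute("SELECT base, data FROM artifact WHERE hash = ?",
                       (digest,)).fetchone()
    if row is None:
        raise KeyError(digest)
    base_hash, data = row
    payload = zlib.decompress(data)
    if base_hash is None:
        return payload
    # Walk down the delta chain: rebuild the base first, then apply the delta.
    return apply_delta(load(conn, base_hash), payload)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    v1 = b"line one\nline two\nline three\n" * 200
    v2 = v1.replace(b"line two", b"line 2", 1)
    h1 = store(conn, v1)
    h2 = store(conn, v2, base_hash=h1)
    print(len(v2), len(zlib.compress(make_delta(v1, v2))))
```

Reading an artifact stored as a delta requires first reconstructing its base, which is why <tt>load()</tt> recurses down the delta chain. Which artifact serves as the base is a policy decision; this sketch simply uses whatever base the caller passes in.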
For the SQLite project (when this paragraph was last
updated on 2020-02-08)
the total size of all artifacts is over 7.1 GB, but thanks to the
combined zlib and delta compression, that content takes up less than
97 MB of space in the repository database, for a compression ratio
of about 74:1. The median size of all content BLOBs after delta
and zlib compression have been applied is 156 bytes.
The median size of BLOBs without compression is 45,312 bytes.

Note that the zlib and delta compression is not an inherent part of the
Fossil file format; it is just an optimization.
The enduring file format for Fossil is the unordered
set of artifacts. The compression techniques are just a detail of
how the current implementation of Fossil happens to store these artifacts
efficiently on disk.

All of the original uncompressed and un-delta'd artifacts can be extracted
from a Fossil repository database using
the [/help/deconstruct | fossil deconstruct]
command. Individual artifacts can be extracted using the
[/help/artifact | fossil artifact] command.
When accessing the repository database using raw SQL and the
[/help/sqlite3 | fossil sql] command, the extension function
"<tt>content()</tt>", given the SHA1 or SHA3-256 hash of an artifact
as its single argument, returns the complete uncompressed
content of that artifact.

Going the other way, the [/help/reconstruct | fossil reconstruct]
command will scan a directory hierarchy and add all files found to
a new repository database. The [/help/import | fossil import] command
works by reading a git-fast-export stream and using it to construct
corresponding artifacts, which are then written into the repository database.

<h4>2.2.2 Project Metadata</h4>

The global project state information in the repository database is
supplemented by computed metadata that makes querying the project state
more efficient.
Metadata includes information such as the following:

  *  The names for all files found in any check-in.
  *  All check-ins that modify a given file.
  *  Parents and children of each check-in.
  *  Potential timeline rows.
  *  The names of all symbolic tags and the check-ins they apply to.
  *  The names of all wiki pages and the artifacts that comprise each
     wiki page.
  *  Attachments and the wiki pages or tickets they apply to.
  *  Current content of each ticket.
  *  Cross-references between tickets, check-ins, and wiki pages.

The metadata is held in various SQL tables in the repository database.
The metadata is designed to facilitate queries for the various timelines and
reports that Fossil generates.
As the functionality of Fossil evolves,
the schema for the metadata can and does change.
But schema changes do not invalidate the repository. Remember that the
metadata contains no new information - only information that has been
extracted from the canonical artifacts and saved in a more useful form.
Hence, when the metadata schema changes, the prior metadata can be discarded
and the entire metadata corpus can be recomputed from the canonical
artifacts. That is what the
[/help/rebuild | fossil rebuild] command does.

<h4>2.2.3 Display And Processing Preferences</h4>

The repository database also holds information used to help format
the display of web pages and configuration settings that override the
global configuration settings for the specific repository. All of
this information (and the user credentials and privileges too) is
local to each repository database; it is not shared between repositories
by [/help/sync | fossil sync]. That is because it is entirely reasonable
that two different websites for the same project might have completely
different display preferences and user communities.
For example, one instance of the
project might be a fork that pulls from the other but never pushes,
and that extends the project in ways that the keepers of
the other website disapprove of.

Display and processing information includes the following:

  *  The name and description of the project
  *  The CSS file, header, and footer used by all web pages
  *  The project logo image
  *  Fields of tickets that are considered "significant" and which are
     therefore collected from artifacts and made available for display
  *  Templates for screens to view, edit, and create tickets
  *  Ticket report formats and display preferences
  *  Local values for [/help/settings | settings] that override the
     global values defined in the per-user configuration database

Though the display and processing preferences do not move between
repository instances using [/help/sync | fossil sync], this information
can be shared between repositories using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands.
The display and processing information is also copied into new
repositories when they are created using
[/help/clone | fossil clone].

<h4>2.2.4 User Credentials And Privileges</h4>

Just because two development teams are collaborating on a project and allow
push and/or pull between their repositories does not mean that they
trust each other enough to share passwords and access privileges.
Hence the names, email addresses, passwords, and privileges of users are
considered private information that is kept locally in each repository.

Each repository database has a table holding the username, privileges,
and login credentials for users authorized to interact with that particular
database. In addition, there is a table named "concealed" that maps the
SHA1 hash of each user's email address back to that user's true email address.
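The mapping kept in the concealed table can be illustrated with a short Python sketch. The table layout (<tt>hash</tt> and <tt>content</tt> columns), the helper names, and the choice to hash the UTF-8 bytes of the address exactly as given are assumptions for illustration; Fossil's actual schema and any normalization it applies may differ.

```python
import hashlib
import sqlite3
from typing import Optional

# Hypothetical layout for the concealed table; Fossil's real columns may differ.
CONCEALED_SCHEMA = "CREATE TABLE concealed(hash TEXT PRIMARY KEY, content TEXT)"


def conceal(conn: sqlite3.Connection, email: str) -> str:
    """Remember `email` locally and return the SHA1 hex digest that a ticket
    would carry in place of the real address."""
    digest = hashlib.sha1(email.encode("utf-8")).hexdigest()
    conn.execute("INSERT OR IGNORE INTO concealed(hash, content) VALUES (?, ?)",
                 (digest, email))
    return digest


def unconceal(conn: sqlite3.Connection, digest: str) -> Optional[str]:
    """Map a digest back to the true address, or return None if this
    repository has no record of it."""
    row = conn.execute("SELECT content FROM concealed WHERE hash = ?",
                       (digest,)).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(CONCEALED_SCHEMA)
    token = conceal(conn, "alice@example.com")
    print(token)                   # a 40-character hex digest
    print(unconceal(conn, token))  # alice@example.com
```

Only the digest would travel with a ticket. A clone whose concealed table lacks the matching row has nothing to look the digest up in, so <tt>unconceal()</tt> returns None there.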
The concealed table allows just the SHA1 hash of email addresses to
be stored in tickets, and thus prevents actual email addresses from falling
into the hands of spammers who happen to clone the repository.

The content of the user and concealed tables can be pushed and pulled using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands with "user" and
"email" as the AREA argument, but only if you have administrative
privileges on the remote repository.

<h4>2.2.5 Shunned Artifact List</h4>

The set of canonical artifacts for a project (the global state of the
project) is intended to be an append-only database. In other words,
new artifacts can be added but artifacts can never be removed. But
it sometimes happens that inappropriate content is mistakenly or
maliciously added to a repository. The only way to get rid of
the undesired content is to [./shunning.wiki | "shun"] it.
The "shun" table in the repository database records the hash values for
all shunned artifacts.

The shun table can be pushed or pulled using
the [/help/config | fossil config] command with the "shun" AREA argument.
The shun table is also copied during a [/help/clone | clone].

<h3 id="localdb">2.3 Checkout Databases</h3>

Fossil allows a single repository
to have multiple working checkouts. Each working checkout has a single
database in its root directory that records the state of that checkout.
The checkout database is named "_FOSSIL_" or ".fslckout".
The checkout database records information such as the following:

  *  The name of the repository database file.
  *  The version that is currently checked out.
  *  Files that have been [/help/add | added],
     [/help/rm | removed], or [/help/mv | renamed] but not
     yet committed.
  *  The mtime and size of files as they were originally checked out,
     in order to expedite checking which files have been edited.
  *  Other check-ins that have been [/help/merge | merged] into the
     working checkout but not yet committed.
  *  Copies of files prior to the most recent undoable operation, needed to
     implement the [/help/undo | undo] and [/help/redo | redo] commands.
  *  The [/help/stash | stash].
  *  State information for the [/help/bisect | bisect] command.

For Fossil commands that run from within a working checkout, the
first thing that happens is that Fossil locates the checkout database.
Fossil first looks in the current directory. If the checkout database is
not found there, Fossil looks in the parent directory, then in the parent
of the parent, and so forth, until either the checkout database is found
or the search reaches the root of the file system. (In the latter case,
Fossil returns an error.) Once the checkout database is
located, it is used to locate the repository database.

Notice that the checkout database contains a pointer to the repository
database but that the repository database has no record of the checkout
databases. That means that a working checkout directory tree can be
freely renamed, copied, or deleted without consequence. But the
repository database file, on the other hand, has to stay in the same
place with the same name or else the open checkout databases will not
be able to find it.

A checkout database is created by the [/help/open | fossil open] command.
A checkout database is deleted by [/help/close | fossil close]. The
fossil close command really isn't needed; one can accomplish the same
thing simply by deleting the checkout database.

Note that the stash, the undo stack, and the state of the bisect command
are all contained within the checkout database.
That means that the
fossil close command will delete all stash content, the undo stack, and
the bisect state. The close command is not undoable. Use it with care.

<h2>3.0 See Also</h2>

  *  [./makefile.wiki | The Fossil Build Process]
  *  [./contribute.wiki | How To Contribute Code To Fossil]
  *  [./adding_code.wiki | Adding New Features To Fossil]