<title>Technical Overview</title>
<h2 align="center">
A Technical Overview<br>Of The Design And Implementation<br>Of Fossil
</h2>

<h2>1.0 Introduction</h2>

At its lowest level, a Fossil repository consists of an unordered set
of immutable "artifacts".  You might think of these artifacts as "files",
since in many cases the artifacts are exactly that.
But other "structural artifacts" are also included in the mix.
These structural artifacts define the relationships
between artifacts - which files go together to form a particular
version of the project, who checked in that version and when, what was
the check-in comment, what wiki pages are included with the project, what
are the edit histories of each wiki page, what bug reports or tickets are
included, who contributed to the evolution of each ticket, and so forth.
This low-level file format is called the "global state" of
the repository, since this is the information that is synced to peer
repositories using push and pull operations.   The low-level file format
is also called "enduring" since it is intended to last for many years.
The details of the low-level, enduring, global file format
are [./fileformat.wiki | described separately].

This article is about how Fossil is currently implemented.  Instead of
dealing with vague abstractions of "enduring file formats" as the
[./fileformat.wiki | other document] does, this article provides
some detail on how Fossil actually stores information on disk.

<h2>2.0 Three Databases</h2>

Fossil stores state information in
[http://www.sqlite.org/ | SQLite] database files.
SQLite keeps an entire relational database, including multiple tables and
indices, in a single disk file.  The SQLite library allows the database
files to be efficiently queried and updated using the industry-standard
SQL language.  SQLite updates are atomic, so even in the event of
a system crash or power failure the repository content is protected.

Fossil uses three separate classes of SQLite databases:

<ol>
<li>The configuration database
<li>Repository databases
<li>Checkout databases
</ol>

The configuration database is a one-per-user database that holds
global configuration information used by Fossil.  There is one
repository database per project.  The repository database is the
file that people are normally referring to when they say
"a Fossil repository".  The checkout database is found in the working
checkout for a project and contains state information that is unique
to that working checkout.

Fossil does not always use all three database files.  The web interface,
for example, typically only uses the repository database.  And the
[/help/settings | fossil settings] command only opens the configuration database
when the --global option is used.  But other commands use all three
databases at once.  For example, the [/help/status | fossil status]
command will first locate the checkout database, then use the checkout
database to find the repository database, then open the configuration
database.  Whenever multiple databases are used at the same time,
they are all opened on the same SQLite database connection using
SQLite's [http://www.sqlite.org/lang_attach.html | ATTACH] command.
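
As a minimal sketch of the single-connection arrangement described above,
the following Python script opens one database file and attaches a second
file to the same connection with ATTACH.  The file names, the "ckout" alias,
and the table names are hypothetical choices made for this illustration;
they are not Fossil's actual schema.

```python
import os
import sqlite3
import tempfile

def open_attached(repo_path, ckout_path):
    """Open repo_path, then ATTACH ckout_path on the same connection."""
    conn = sqlite3.connect(repo_path)
    # The alias "ckout" is an illustrative name chosen for this sketch.
    conn.execute("ATTACH DATABASE ? AS ckout", (ckout_path,))
    return conn

# Demonstration using throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
conn = open_attached(
    os.path.join(workdir, "demo-repo.db"),
    os.path.join(workdir, "demo-ckout.db"),
)

# Hypothetical tables, one in each database file.
conn.execute("CREATE TABLE main.project(name TEXT)")
conn.execute("CREATE TABLE ckout.state(key TEXT, value TEXT)")
conn.execute("INSERT INTO main.project VALUES ('demo')")
conn.execute("INSERT INTO ckout.state VALUES ('checkout', 'trunk')")
conn.commit()

# A single query can now read from both files at once.
row = conn.execute(
    "SELECT p.name, s.value FROM main.project AS p, ckout.state AS s"
).fetchone()

# PRAGMA database_list reports every database on this connection.
attached = [r[1] for r in conn.execute("PRAGMA database_list")]
conn.close()
```

Because both files share one connection, a single SQL statement can join
tables that live in different database files.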

The chart below provides a quick summary of how each of these
database files is used by Fossil, with detailed discussion following.

<table border="1" width="80%" cellpadding="0" align="center">
<tr>
<td width="33%" valign="top">
<h3 align="center">Configuration Database<br>"~/.fossil" or<br>
"~/.config/fossil.db"</h3>
<ul>
<li>Global [/help/settings |settings]
<li>List of active repositories used by the [/help/all | all] command
</ul>
</td>
<td width="34%" valign="top">
<h3 align="center">Repository Database<br>"<i>project</i>.fossil"</h3>
<ul>
<li>[./fileformat.wiki | Global state of the project]
    encoded using delta-compression
<li>Local [/help/settings|settings]
<li>Web interface display preferences
<li>User credentials and permissions
<li>Metadata about the global state to facilitate rapid
    queries
</ul>
</td>
<td width="33%" valign="top">
<h3 align="center">Checkout Database<br>"_FOSSIL_" or ".fslckout"</h3>
<ul>
<li>The repository database used by this checkout
<li>The version currently checked out
<li>Other versions [/help/merge | merged] in but not
    yet [/help/commit | committed]
<li>Changes from the [/help/add | add], [/help/delete | delete],
    and [/help/rename | rename] commands that have not yet been committed
<li>"mtime" values and other information used to efficiently detect
     local edits
<li>The "[/help/stash | stash]"
<li>Information needed to "[/help/undo|undo]" or "[/help/redo|redo]"
</ul>
</td>
</tr>
</table>

<h3 id="configdb">2.1 The Configuration Database</h3>

The configuration database holds cross-repository preferences and a list of all
repositories for a single user.

The [/help/settings | fossil settings] command can be used to specify various
operating parameters and preferences for Fossil repositories.  Settings can
apply to a single repository, or they can apply globally to all repositories
for a user.  If both a global and a repository value exist for a setting,
then the repository-specific value takes precedence.  All of the settings
have reasonable defaults, and so many users will never need to change them.
But if changes to settings are desired, the configuration database provides
a way to change settings for all repositories with a single command, rather
than having to change the setting individually on each repository.
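
The precedence rule above can be sketched as a three-level lookup:
repository value first, then global value, then built-in default.  This is
only an illustration of the ordering; the function name, the dictionaries,
and the setting names "alpha", "beta", and "gamma" are hypothetical and do
not correspond to Fossil's internals.

```python
def effective_setting(name, repo_settings, global_settings, defaults):
    """Return the value that applies: repository, then global, then default."""
    if name in repo_settings:
        return repo_settings[name]      # repository-specific value wins
    if name in global_settings:
        return global_settings[name]    # otherwise the per-user global value
    return defaults.get(name)           # otherwise the built-in default

# Made-up example data; only the lookup order matters here.
defaults = {"alpha": "default-a", "beta": "default-b", "gamma": "default-c"}
global_settings = {"alpha": "global-a", "beta": "global-b"}
repo_settings = {"alpha": "repo-a"}
```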

The configuration database also maintains a list of repositories.  This
list is used by the [/help/all | fossil all] command in order to run various
operations such as "sync" or "rebuild" on all repositories managed by a user.

<h4 id="configloc">2.1.1 Location Of The Configuration Database</h4>

On Unix systems, the configuration database is named by the following
algorithm:

<blockquote><table border="0">
<tr><td>1. if environment variable FOSSIL_HOME exists
<td>&nbsp;&rarr;&nbsp;<td>$FOSSIL_HOME/.fossil
<tr><td>2. if file ~/.fossil exists<td>&nbsp;&rarr;<td>~/.fossil
<tr><td>3. if environment variable XDG_CONFIG_HOME exists
    <td>&nbsp;&rarr;<td>$XDG_CONFIG_HOME/fossil.db
<tr><td>4. if the directory ~/.config exists
    <td>&nbsp;&rarr;<td>~/.config/fossil.db
<tr><td>5. Otherwise<td>&nbsp;&rarr;<td>~/.fossil
</table></blockquote>

Another way of thinking of this algorithm is the following:

  *  Use "$FOSSIL_HOME/.fossil" if the FOSSIL_HOME variable is defined
  *  Use the XDG-compatible name (usually ~/.config/fossil.db) on XDG systems
     if the ~/.fossil file does not already exist
  *  Otherwise, use the traditional Unix name of "~/.fossil"
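
The five-step Unix lookup can be sketched in Python as follows.  This is an
illustration of the documented order of checks, not Fossil's own code; the
function name and its parameters (an environment mapping, a home directory,
and two file-system predicates that can be replaced for testing) are
assumptions of this sketch.

```python
import os
import posixpath

def unix_config_db(env=None, home=None,
                   exists=os.path.exists, isdir=os.path.isdir):
    """Return the configuration database path using the documented order."""
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    legacy = posixpath.join(home, ".fossil")

    # 1. FOSSIL_HOME wins whenever it is set.
    if "FOSSIL_HOME" in env:
        return posixpath.join(env["FOSSIL_HOME"], ".fossil")
    # 2. An existing ~/.fossil keeps legacy installations working.
    if exists(legacy):
        return legacy
    # 3. An explicit XDG_CONFIG_HOME selects the XDG-style name.
    if "XDG_CONFIG_HOME" in env:
        return posixpath.join(env["XDG_CONFIG_HOME"], "fossil.db")
    # 4. A ~/.config directory also selects the XDG-style name.
    config_dir = posixpath.join(home, ".config")
    if isdir(config_dir):
        return posixpath.join(config_dir, "fossil.db")
    # 5. Otherwise fall back to the traditional name.
    return legacy
```

Passing the environment and the file-system checks as arguments keeps the
function free of side effects, so each branch can be exercised without
touching the real home directory.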

This algorithm is complex due to the need for historical compatibility.
Originally, the database was always just "~/.fossil".  Then support
for the FOSSIL_HOME environment variable was added.  Later, support for the
[https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html|XDG-compatible configuration filenames]
was added.  Each of these changes needed to continue to support legacy
installations.

On Windows, the configuration database is the first of the following
for which the corresponding environment variables exist:

  *  %FOSSIL_HOME%/_fossil
  *  %LOCALAPPDATA%/_fossil
  *  %APPDATA%/_fossil
  *  %USERPROFILE%/_fossil
  *  %HOMEDRIVE%%HOMEPATH%/_fossil

The second case is the one that usually determines the name.  Note that the
FOSSIL_HOME environment variable can always be set to determine the
location of the configuration database.  Note also that the configuration
database file itself is called ".fossil" or "fossil.db" on Unix but
"_fossil" on Windows.
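
The Windows rule is a first-match search over a fixed list of environment
variables.  A minimal sketch, assuming the environment is passed in as a
plain dictionary and using a hypothetical function name:

```python
def windows_config_db(env):
    """Return the first candidate whose environment variables all exist."""
    candidates = [
        ("FOSSIL_HOME",),
        ("LOCALAPPDATA",),
        ("APPDATA",),
        ("USERPROFILE",),          # the standard Windows variable name
        ("HOMEDRIVE", "HOMEPATH"), # both must be set; values are concatenated
    ]
    for names in candidates:
        if all(name in env for name in names):
            return "".join(env[name] for name in names) + "/_fossil"
    return None  # none of the variables are defined
```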

The [/help?cmd=info|fossil info] command will show the location of
the configuration database on a line that starts with "config-db:".

<h3>2.2 Repository Databases</h3>

The repository database is the file that is commonly referred to as
"the repository".  This is because the repository database contains,
among other things, the complete revision, ticket, and wiki history for
a project.  It is customary to name the repository database after the
name of the project, with a ".fossil" suffix.  For example, the repository
database for the self-hosting Fossil repository is called "fossil.fossil"
and the repository database for SQLite is called "sqlite.fossil".

<h4>2.2.1 Global Project State</h4>

The bulk of the repository database (typically 75 to 85%) consists
of the artifacts that comprise the
[./fileformat.wiki | enduring, global, shared state] of the project.
The artifacts are stored as BLOBs, compressed using
[http://www.zlib.net/ | zlib compression] and, where applicable,
using [./delta_encoder_algorithm.wiki | delta compression].
The combination of zlib and delta compression results in considerable
space savings.  For the SQLite project (when this paragraph was last
updated on 2020-02-08)
the total size of all artifacts is over 7.1 GB but thanks to the
combined zlib and delta compression, that content takes less than
97 MB of space in the repository database, for a compression ratio
of about 74:1.  The median size of all content BLOBs after delta
and zlib compression have been applied is 156 bytes.
The median size of BLOBs without compression is 45,312 bytes.

Note that the zlib and delta compression are not an inherent part of the
Fossil file format; they are just an optimization.
The enduring file format for Fossil is the unordered
set of artifacts. The compression techniques are just a detail of
how the current implementation of Fossil happens to store these artifacts
efficiently on disk.
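
To illustrate why compression is only a storage detail, the sketch below
names an artifact by the SHA3-256 hash of its uncompressed bytes and stores
it zlib-compressed.  This is not Fossil's on-disk encoding, and delta
compression is omitted entirely; the function names and the sample content
are assumptions of this example.

```python
import hashlib
import zlib

def store(artifact):
    """Return (artifact_id, stored_bytes) for an uncompressed artifact."""
    # The identity depends only on the uncompressed content.
    artifact_id = hashlib.sha3_256(artifact).hexdigest()
    stored = zlib.compress(artifact)
    return artifact_id, stored

def load(stored):
    """Recover the original artifact bytes from their stored form."""
    return zlib.decompress(stored)

# Sample content made up for this demonstration.
artifact = b"one line of sample text\n" * 1000
artifact_id, stored = store(artifact)
restored = load(stored)
```

Changing or removing the compression step changes only the stored bytes;
the recovered artifact and its hash stay the same.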

All of the original uncompressed and un-delta'd artifacts can be extracted
from a Fossil repository database using
the [/help/deconstruct | fossil deconstruct]
command. Individual artifacts can be extracted using the
[/help/artifact | fossil artifact] command.
When accessing the repository database using raw SQL and the
[/help/sqlite3 | fossil sql] command, the extension function
"<tt>content()</tt>", called with a single argument that is the SHA1 or
SHA3-256 hash
of an artifact, returns the complete uncompressed
content of that artifact.

Going the other way, the [/help/reconstruct | fossil reconstruct]
command will scan a directory hierarchy and add all files found to
a new repository database.  The [/help/import | fossil import] command
works by reading the input git-fast-export stream and using it to construct
corresponding artifacts which are then written into the repository database.

<h4>2.2.2 Project Metadata</h4>

The global project state information in the repository database is
supplemented by computed metadata that makes querying the project state
more efficient.  Metadata includes information such as the following:

  *  The names for all files found in any check-in.
  *  All check-ins that modify a given file.
  *  Parents and children of each check-in.
  *  Potential timeline rows.
  *  The names of all symbolic tags and the check-ins they apply to.
  *  The names of all wiki pages and the artifacts that comprise each
     wiki page.
  *  Attachments and the wiki pages or tickets they apply to.
  *  Current content of each ticket.
  *  Cross-references between tickets, check-ins, and wiki pages.

The metadata is held in various SQL tables in the repository database.
The metadata is designed to facilitate queries for the various timelines and
reports that Fossil generates.
As the functionality of Fossil evolves,
the schema for the metadata can and does change.
But schema changes do not invalidate the repository.  Remember that the
metadata contains no new information - only information that has been
extracted from the canonical artifacts and saved in a more useful form.
Hence, when the metadata schema changes, the prior metadata can be discarded
and the entire metadata corpus can be recomputed from the canonical
artifacts.  That is what the
[/help/rebuild | fossil rebuild] command does.
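
The idea that metadata is disposable can be shown with a toy model.  In the
sketch below, an "artifact" table holds the canonical content and a
"filename_index" table is derived from it; dropping the derived table loses
nothing because it can be recomputed.  The table names, column names, and
the one-file-name-per-line artifact layout are all hypothetical and far
simpler than Fossil's real schema and artifact format.

```python
import sqlite3

def rebuild(conn):
    """Throw away the derived table and recompute it from the artifacts."""
    conn.execute("DROP TABLE IF EXISTS filename_index")
    conn.execute("CREATE TABLE filename_index(checkin_id INTEGER, name TEXT)")
    rows = conn.execute("SELECT id, listing FROM artifact").fetchall()
    for checkin_id, listing in rows:
        for name in listing.splitlines():
            if name:
                conn.execute(
                    "INSERT INTO filename_index VALUES (?, ?)",
                    (checkin_id, name),
                )
    conn.commit()

def snapshot(conn):
    """Return the derived rows in a stable order for comparison."""
    return conn.execute(
        "SELECT checkin_id, name FROM filename_index ORDER BY checkin_id, name"
    ).fetchall()

conn = sqlite3.connect(":memory:")
# Canonical content: each toy artifact lists file names, one per line.
conn.execute("CREATE TABLE artifact(id INTEGER PRIMARY KEY, listing TEXT)")
conn.executemany(
    "INSERT INTO artifact VALUES (?, ?)",
    [(1, "README\nsrc/main.c"), (2, "README\nsrc/main.c\nsrc/util.c")],
)
rebuild(conn)
before = snapshot(conn)

# Simulate a schema change by discarding the metadata, then recompute it.
conn.execute("DROP TABLE filename_index")
rebuild(conn)
after = snapshot(conn)
```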

<h4>2.2.3 Display And Processing Preferences</h4>

The repository database also holds information used to help format
the display of web pages and configuration settings that override the
global configuration settings for the specific repository.  All of
this information (and the user credentials and privileges too) is
local to each repository database; it is not shared between repositories
by [/help/sync | fossil sync].  That is because it is entirely reasonable
that two different websites for the same project might have completely
different display preferences and user communities.  One instance of the
project might be a fork of the other, for example, which pulls from the
other but never pushes and extends the project in ways that the keepers of
the other website disapprove of.

Display and processing information includes the following:

  *  The name and description of the project
  *  The CSS file, header, and footer used by all web pages
  *  The project logo image
  *  Fields of tickets that are considered "significant" and which are
     therefore collected from artifacts and made available for display
  *  Templates for screens to view, edit, and create tickets
  *  Ticket report formats and display preferences
  *  Local values for [/help/settings | settings] that override the
     global values defined in the per-user configuration database.

Though the display and processing preferences do not move between
repository instances using [/help/sync | fossil sync], this information
can be shared between repositories using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands.
The display and processing information is also copied into new
repositories when they are created using
[/help/clone | fossil clone].

<h4>2.2.4 User Credentials And Privileges</h4>

Just because two development teams are collaborating on a project and allow
push and/or pull between their repositories does not mean that they
trust each other enough to share passwords and access privileges.
Hence the names, emails, passwords, and privileges of users are
considered private information that is kept locally in each repository.

Each repository database has a table holding the username, privileges,
and login credentials for users authorized to interact with that particular
database.  In addition, there is a table named "concealed" that maps the
SHA1 hash of each user's email address back into their true email address.
The concealed table allows just the SHA1 hash of email addresses to
be stored in tickets, and thus prevents actual email addresses from falling
into the hands of spammers who happen to clone the repository.
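
The mapping performed by the concealed table can be sketched as follows.
The two-column layout, the function names, and the choice to hash the
UTF-8 bytes of the address exactly as typed are assumptions made for this
illustration and may differ from what Fossil actually does.

```python
import hashlib
import sqlite3

def conceal(conn, email):
    """Record the email locally and return the SHA1 hash that stands in for it."""
    digest = hashlib.sha1(email.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO concealed(hash, content) VALUES (?, ?)",
        (digest, email),
    )
    return digest

def reveal(conn, digest):
    """Map a hash back to the email address, if this repository knows it."""
    row = conn.execute(
        "SELECT content FROM concealed WHERE hash = ?", (digest,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the hash is the key, the real address is the value.
conn.execute("CREATE TABLE concealed(hash TEXT PRIMARY KEY, content TEXT)")
digest = conceal(conn, "alice@example.com")
```

Only the hash would appear in shared ticket content.  A clone that lacks
the matching row in its own concealed table cannot map the hash back to an
address.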

The content of the user and concealed tables can be pushed and pulled using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands with "user" and
"email" as the AREA argument, but only if you have administrative
privileges on the remote repository.

<h4>2.2.5 Shunned Artifact List</h4>

The set of canonical artifacts for a project - the global state for the
project - is intended to be an append-only database.  In other words,
new artifacts can be added but artifacts can never be removed.  But
it sometimes happens that inappropriate content is mistakenly or
maliciously added to a repository.  The only way to get rid of
the undesired content is to [./shunning.wiki | "shun"] it.
The "shun" table in the repository database records the hash values for
all shunned artifacts.

The shun table can be pushed or pulled using
the [/help/config | fossil config] command with the "shun" AREA argument.
The shun table is also copied during a [/help/clone | clone].

<h3 id="localdb">2.3 Checkout Databases</h3>

Fossil allows a single repository
to have multiple working checkouts.  Each working checkout has a single
database in its root directory that records the state of that checkout.
The checkout database is named "_FOSSIL_" or ".fslckout".
The checkout database records information such as the following:

  *  The name of the repository database file.
  *  The version that is currently checked out.
  *  Files that have been [/help/add | added],
     [/help/rm | removed], or [/help/mv | renamed] but not
     yet committed.
  *  The mtime and size of files as they were originally checked out,
     in order to expedite checking which files have been edited.
  *  Other check-ins that have been [/help/merge | merged] into the
     working checkout but not yet committed.
  *  Copies of files prior to the most recent undoable operation - needed to
     implement the [/help/undo | undo] and [/help/redo | redo] commands.
  *  The [/help/stash | stash].
  *  State information for the [/help/bisect | bisect] command.

For Fossil commands that run from within a working checkout, the
first thing that happens is that Fossil locates the checkout database.
Fossil first looks in the current directory.  If not found there, it
looks in the parent directory.  If not found there, it looks in the
parent of the parent.  And so forth until either the checkout database is
found or the search reaches the root of the file system.  (In the latter case,
Fossil returns an error, of course.)  Once the checkout database is
located, it is used to locate the repository database.
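
The upward search described above can be sketched in Python.  The function
name is hypothetical, and the order in which the two file names are tried
within a directory is an assumption of this sketch.

```python
import os
import tempfile

CHECKOUT_NAMES = ("_FOSSIL_", ".fslckout")

def find_checkout_db(start_dir):
    """Walk from start_dir toward the root looking for a checkout database."""
    current = os.path.abspath(start_dir)
    while True:
        for name in CHECKOUT_NAMES:
            candidate = os.path.join(current, name)
            if os.path.isfile(candidate):
                return candidate
        parent = os.path.dirname(current)
        if parent == current:
            return None  # reached the file system root without a match
        current = parent

# Demonstration: a fake checkout root with a nested working directory.
root = tempfile.mkdtemp()
nested = os.path.join(root, "src", "deep")
os.makedirs(nested)
expected = os.path.join(root, ".fslckout")
with open(expected, "w"):
    pass  # an empty placeholder file stands in for the real database
found = find_checkout_db(nested)
```

Here the search starts two directories below the checkout root and still
finds the placeholder file, so a command can be run from any subdirectory
of a checkout.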

Notice that the checkout database contains a pointer to the repository
database but that the repository database has no record of the checkout
databases.  That means that a working checkout directory tree can be
freely renamed or copied or deleted without consequence.  But the
repository database file, on the other hand, has to stay in the same
place with the same name or else the open checkout databases will not
be able to find it.

A checkout database is created by the [/help/open | fossil open] command.
A checkout database is deleted by [/help/close | fossil close].  The
fossil close command really isn't needed; one can accomplish the same
thing simply by deleting the checkout database.

Note that the stash, the undo stack, and the state of the bisect command
are all contained within the checkout database.  That means that the
fossil close command will delete all stash content, the undo stack, and
the bisect state.  The close command is not undoable.  Use it with care.

<h2>3.0 See Also</h2>

  *  [./makefile.wiki | The Fossil Build Process]
  *  [./contribute.wiki | How To Contribute Code To Fossil]
  *  [./adding_code.wiki | Adding New Features To Fossil]
