fossil-src-2.17/www/tech_overview.wiki

<title>Technical Overview</title>
<h2 align="center">
A Technical Overview<br>Of The Design And Implementation<br>Of Fossil
</h2>

<h2>1.0 Introduction</h2>

At its lowest level, a Fossil repository consists of an unordered set
of immutable "artifacts".  You might think of these artifacts as "files",
since in many cases the artifacts are exactly that.
But other "structural artifacts" are also included in the mix.
These structural artifacts define the relationships
between artifacts - which files go together to form a particular
version of the project, who checked in that version and when, what was
the check-in comment, what wiki pages are included with the project, what
are the edit histories of each wiki page, what bug reports or tickets are
included, who contributed to the evolution of each ticket, and so forth.
This low-level file format is called the "global state" of
the repository, since this is the information that is synced to peer
repositories using push and pull operations.   The low-level file format
is also called "enduring" since it is intended to last for many years.
The details of the low-level, enduring, global file format
are [./fileformat.wiki | described separately].

This article is about how Fossil is currently implemented.  Instead of
dealing with vague abstractions of "enduring file formats" as the
[./fileformat.wiki | other document] does, this article provides
some detail on how Fossil actually stores information on disk.

<h2>2.0 Three Databases</h2>

Fossil stores state information in
[http://www.sqlite.org/ | SQLite] database files.
SQLite keeps an entire relational database, including multiple tables and
indices, in a single disk file.  The SQLite library allows the database
files to be efficiently queried and updated using the industry-standard
SQL language.  SQLite updates are atomic, so even in the event of
a system crashes or power failure the repository content is protected.

Fossil uses three separate classes of SQLite databases:

<ol>
<li>The configuration database
<li>Repository databases
<li>Checkout databases
</ol>

The configuration database is a one-per-user database that holds
global configuration information used by Fossil.  There is one
repository database per project.  The repository database is the
file that people are normally referring to when they say
"a Fossil repository".  The checkout database is found in the working
checkout for a project and contains state information that is unique
to that working checkout.

Fossil does not always use all three database files.  The web interface,
for example, typically only uses the repository database.  And the
[/help/all | fossil settings] command only opens the configuration database
when the --global option is used.  But other commands use all three
databases at once.  For example, the [/help/status | fossil status]
command will first locate the checkout database, then use the checkout
database to find the repository database, then open the configuration
database.  Whenever multiple databases are used at the same time,
they are all opened on the same SQLite database connection using
SQLite's [http://www.sqlite.org/lang_attach.html | ATTACH] command.

The chart below provides a quick summary of how each of these
database files are used by Fossil, with detailed discussion following.

<table border="1" width="80%" cellpadding="0" align="center">
<tr>
<td width="33%" valign="top">
<h3 align="center">Configuration Database<br>"~/.fossil" or<br>
"~/.config/fossil.db"</h3>
<ul>
<li>Global [/help/settings |settings]
<li>List of active repositories used by the [/help/all | all] command
</ul>
</td>
<td width="34%" valign="top">
<h3 align="center">Repository Database<br>"<i>project</i>.fossil"</h3>
<ul>
<li>[./fileformat.wiki | Global state of the project]
    encoded using delta-compression
<li>Local [/help/settings|settings]
<li>Web interface display preferences
<li>User credentials and permissions
<li>Metadata about the global state to facilitate rapid
    queries
</ul>
</td>
<td width="33%" valign="top">
<h3 align="center">Checkout Database<br>"_FOSSIL_" or ".fslckout"</h3>
<ul>
<li>The repository database used by this checkout
<li>The version currently checked out
<li>Other versions [/help/merge | merged] in but not
    yet [/help/commit | committed]
<li>Changes from the [/help/add | add], [/help/delete | delete],
    and [/help/rename | rename] commands that have not yet been committed
<li>"mtime" values and other information used to efficiently detect
     local edits
<li>The "[/help/stash | stash]"
<li>Information needed to "[/help/undo|undo]" or "[/help/redo|redo]"
</ul>
</td>
</tr>
</table>

<h3 id="configdb">2.1 The Configuration Database</h3>

The configuration database holds cross-repository preferences and a list of all
repositories for a single user.

The [/help/settings | fossil settings] command can be used to specify various
operating parameters and preferences for Fossil repositories.  Settings can
apply to a single repository, or they can apply globally to all repositories
for a user.  If both a global and a repository value exists for a setting,
then the repository-specific value takes precedence.  All of the settings
have reasonable defaults, and so many users will never need to change them.
But if changes to settings are desired, the configuration database provides
a way to change settings for all repositories with a single command, rather
than having to change the setting individually on each repository.

The configuration database also maintains a list of repositories.  This
list is used by the [/help/all | fossil all] command in order to run various
operations such as "sync" or "rebuild" on all repositories managed by a user.

<h4 id="configloc">2.1.1 Location Of The Configuration Database</h4>

On Unix systems, the configuration database is named by the following
algorithm:

<blockquote><table border="0">
<tr><td>1. if environment variable FOSSIL_HOME exists
<td>&nbsp;&rarr;&nbsp;<td>$FOSSIL_HOME/.fossil
<tr><td>2. if file ~/.fossil exists<td>&nbsp;&rarr;<td>~/.fossil
<tr><td>3. if environment variable XDG_CONFIG_HOME exists
    <td>&nbsp;&rarr;<td>$XDG_CONFIG_HOME/fossil.db
<tr><td>4. if the directory ~/.config exists
    <td>&nbsp;&rarr;<td>~/.config/fossil.db
<tr><td>5. Otherwise<td>&nbsp;&rarr;<td>~/.fossil
</table></blockquote>

Another way of thinking of this algorithm is the following:

  *  Use "$FOSSIL_HOME/.fossil" if the FOSSIL_HOME variable is defined
  *  Use the XDG-compatible name (usually ~/.config/fossil.db) on XDG systems
     if the ~/.fossil file does not already exist
  *  Otherwise, use the traditional unix name of "~/.fossil"

This algorithm is complex due to the need for historical compatibility.
Originally, the database was always just "~/.fossil".  Then support
for the FOSSIL_HOME environment variable as added.  Later, support for the
[https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html|XDG-compatible configation filenames]
was added.  Each of these changes needed to continue to support legacy
installations.

On Windows, the configuration database is the first of the following
for which the corresponding environment variables exist:

  *  %FOSSIL_HOME%/_fossil
  *  %LOCALAPPDATA%/_fossil
  *  %APPDATA%/_fossil
  *  %USERPROFILES%/_fossil
  *  %HOMEDRIVE%%HOMEPATH%/_fossil

The second case is the one that usually determines the name  Note that the
FOSSIL_HOME environment variable can always be set to determine the
location of the configuration database.  Note also that the configuration
database file itself is called ".fossil" or "fossil.db" on unix but
"_fossil" on windows.

The [/help?cmd=info|fossil info] command will show the location of
the configuration database on a line that starts with "config-db:".

<h3>2.2 Repository Databases</h3>

The repository database is the file that is commonly referred to as
"the repository".  This is because the repository database contains,
among other things, the complete revision, ticket, and wiki history for
a project.  It is customary to name the repository database after then
name of the project, with a ".fossil" suffix.  For example, the repository
database for the self-hosting Fossil repository is called "fossil.fossil"
and the repository database for SQLite is called "sqlite.fossil".

<h4>2.2.1 Global Project State</h4>

The bulk of the repository database (typically 75 to 85%) consists
of the artifacts that comprise the
[./fileformat.wiki | enduring, global, shared state] of the project.
The artifacts are stored as BLOBs, compressed using
[http://www.zlib.net/ | zlib compression] and, where applicable,
using [./delta_encoder_algorithm.wiki | delta compression].
The combination of zlib and delta compression results in a considerable
space savings.  For the SQLite project (when this paragraph was last
updated on 2020-02-08)
the total size of all artifacts is over 7.1 GB but thanks to the
combined zlib and delta compression, that content only takes less than
97 MB of space in the repository database, for a compression ratio
of about 74:1.  The median size of all content BLOBs after delta
and zlib compression have been applied is 156 bytes.
The median size of BLOBs without compression is 45,312 bytes.

Note that the zlib and delta compression is not an inherent part of the
Fossil file format; it is just an optimization.
The enduring file format for Fossil is the unordered
set of artifacts. The compression techniques are just a detail of
how the current implementation of Fossil happens to store these artifacts
efficiently on disk.

All of the original uncompressed and un-delta'd artifacts can be extracted
from a Fossil repository database using
the [/help/deconstruct | fossil deconstruct]
command. Individual artifacts can be extracted using the
[/help/artifact | fossil artifact] command.
When accessing the repository database using raw SQL and the
[/help/sqlite3 | fossil sql] command, the extension function
"<tt>content()</tt>" with a single argument which is the SHA1 or
SHA3-256 hash
of an artifact will return the complete uncompressed
content of that artifact.

Going the other way, the [/help/reconstruct | fossil reconstruct]
command will scan a directory hierarchy and add all files found to
a new repository database.  The [/help/import | fossil import] command
works by reading the input git-fast-export stream and using it to construct
corresponding artifacts which are then written into the repository database.

<h4>2.2.2 Project Metadata</h4>

The global project state information in the repository database is
supplemented by computed metadata that makes querying the project state
more efficient.  Metadata includes information such as the following:

  *  The names for all files found in any check-in.
  *  All check-ins that modify a given file
  *  Parents and children of each check-in.
  *  Potential timeline rows.
  *  The names of all symbolic tags and the check-ins they apply to.
  *  The names of all wiki pages and the artifacts that comprise each
     wiki page.
  *  Attachments and the wiki pages or tickets they apply to.
  *  Current content of each ticket.
  *  Cross-references between tickets, check-ins, and wiki pages.

The metadata is held in various SQL tables in the repository database.
The metadata is designed to facilitate queries for the various timelines and
reports that Fossil generates.
As the functionality of Fossil evolves,
the schema for the metadata can and does change.
But schema changes do not invalidate the repository.  Remember that the
metadata contains no new information - only information that has been
extracted from the canonical artifacts and saved in a more useful form.
Hence, when the metadata schema changes, the prior metadata can be discarded
and the entire metadata corpus can be recomputed from the canonical
artifacts.  That is what the
[/help/rebuild | fossil rebuild] command does.

<h4>2.2.3 Display And Processing Preferences</h4>

The repository database also holds information used to help format
the display of web pages and configuration settings that override the
global configuration settings for the specific repository.  All of
this information (and the user credentials and privileges too) is
local to each repository database; it is not shared between repositories
by [/help/sync | fossil sync].  That is because it is entirely reasonable
that two different websites for the same project might have completely
different display preferences and user communities.  One instance of the
project might be a fork of the other, for example, which pulls from the
other but never pushes and extends the project in ways that the keepers of
the other website disapprove of.

Display and processing information includes the following:

  *  The name and description of the project
  *  The CSS file, header, and footer used by all web pages
  *  The project logo image
  *  Fields of tickets that are considered "significant" and which are
     therefore collected from artifacts and made available for display
  *  Templates for screens to view, edit, and create tickets
  *  Ticket report formats and display preferences
  *  Local values for [/help/settings | settings] that override the
     global values defined in the per-user configuration database.

Though the display and processing preferences do not move between
repository instances using [/help/sync | fossil sync], this information
can be shared between repositories using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands.
The display and processing information is also copied into new
repositories when they are created using
[/help/clone | fossil clone].

<h4>2.2.4 User Credentials And Privileges</h4>

Just because two development teams are collaborating on a project and allow
push and/or pull between their repositories does not mean that they
trust each other enough to share passwords and access privileges.
Hence the names and emails and passwords and privileges of users are
considered private information that is kept locally in each repository.

Each repository database has a table holding the username, privileges,
and login credentials for users authorized to interact with that particular
database.  In addition, there is a table named "concealed" that maps the
SHA1 hash of each users email address back into their true email address.
The concealed table allows just the SHA1 hash of email addresses to
be stored in tickets, and thus prevents actual email addresses from falling
into the hands of spammers who happen to clone the repository.

The content of the user and concealed tables can be pushed and pulled using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands with the "user" and
"email" as the AREA argument, but only if you have administrative
privileges on the remote repository.

<h4>2.2.5 Shunned Artifact List</h4>

The set of canonical artifacts for a project - the global state for the
project - is intended to be an append-only database.  In other words,
new artifacts can be added but artifacts can never be removed.  But
it sometimes happens that inappropriate content is mistakenly or
maliciously added to a repository.  The only way to get rid of
the undesired content is to [./shunning.wiki | "shun"] it.
The "shun" table in the repository database records the hash values for
all shunned artifacts.

The shun table can be pushed or pulled using
the [/help/config | fossil config] command with the "shun" AREA argument.
The shun table is also copied during a [/help/clone | clone].

<h3 id="localdb">2.3 Checkout Databases</h3>

Fossil allows a single repository
to have multiple working checkouts.  Each working checkout has a single
database in its root directory that records the state of that checkout.
The checkout database is named "_FOSSIL_" or ".fslckout".
The checkout database records information such as the following:

  *  The name of the repository database file.
  *  The version that is currently checked out.
  *  Files that have been [/help/add | added],
     [/help/rm | removed], or [/help/mv | renamed] but not
     yet committed.
  *  The mtime and size of files as they were originally checked out,
     in order to expedite checking which files have been edited.
  *  Other check-ins that have been [/help/merge | merged] into the
     working checkout but not yet committed.
  *  Copies of files prior to the most recent undoable operation - needed to
     implement the [/help/undo | undo] and [/help/redo | redo] commands.
  *  The [/help/stash | stash].
  *  State information for the [/help/bisect | bisect] command.

For Fossil commands that run from within a working checkout, the
first thing that happens is that Fossil locates the checkout database.
Fossil first looks in the current directory.  If not found there, it
looks in the parent directory.  If not found there, the parent of the
parent.  And so forth until either the checkout database is found
or the search reaches the root of the file system.  (In the latter case,
Fossil returns an error, of course.)  Once the checkout database is
located, it is used to locate the repository database.

Notice that the checkout database contains a pointer to the repository
database but that the repository database has no record of the checkout
databases.  That means that a working checkout directory tree can be
freely renamed or copied or deleted without consequence.  But the
repository database file, on the other hand, has to stay in the same
place with the same name or else the open checkout databases will not
be able to find it.

A checkout database is created by the [/help/open | fossil open] command.
A checkout database is deleted by [/help/close | fossil close].  The
fossil close command really isn't needed; one can accomplish the same
thing simply by deleting the checkout database.

Note that the stash, the undo stack, and the state of the bisect command
are all contained within the checkout database.  That means that the
fossil close command will delete all stash content, the undo stack, and
the bisect state.  The close command is not undoable.  Use it with care.

<h2>3.0 See Also</h2>

  *  [./makefile.wiki | The Fossil Build Process]
  *  [./contribute.wiki | How To Contribute Code To Fossil]
  *  [./adding_code.wiki | Adding New Features To Fossil]