1# File Name Glob Patterns
2
3
4A [glob pattern][glob] is a text expression that matches one or more
5file names using wild cards familiar to most users of a command line.
6For example, `*` is a glob that matches any name at all and
7`Readme.txt` is a glob that matches exactly one file.
8
9A glob should not be confused with a [regular expression][regexp] (RE),
10even though they use some of the same special characters for similar
11purposes, because [they are not fully compatible][greinc] pattern
12matching languages. Fossil uses globs when matching file names with the
13settings described in this document, not REs.
14
15[glob]:   https://en.wikipedia.org/wiki/Glob_(programming)
16[greinc]: https://unix.stackexchange.com/a/57958/138
17[regexp]: https://en.wikipedia.org/wiki/Regular_expression
18
The settings described in this document each hold one or more glob
patterns that cause Fossil to give files with matching names special
treatment.  Glob patterns are also accepted in options to certain
commands and as query parameters to certain Fossil UI web pages.
23
24Where Fossil also accepts globs in commands, this handling may interact
25with your OS’s command shell or its C runtime system, because they may
26have their own glob pattern handling. We will detail such interactions
27below.
28
29
30## Syntax
31
32Where Fossil accepts glob patterns, it will usually accept a *list* of
33such patterns, each individual pattern separated from the others
34by white space or commas. If a glob must contain white spaces or
35commas, it can be quoted with either single or double quotation marks.
36A list is said to match if any one glob in the list
37matches.
38
39A glob pattern matches a given file name if it successfully consumes and
40matches the *entire* name. Partial matches are failed matches.
41
42Most characters in a glob pattern consume a single character of the file
43name and must match it exactly. For instance, “a” in a glob simply
44matches the letter “a” in the file name unless it is inside a special
45character sequence.
46
Other characters have special meaning, and some of them enclose
otherwise normal characters to form a special sequence:
49
50:Pattern |:Effect
51---------------------------------------------------------------------
52`*`      | Matches any sequence of zero or more characters
53`?`      | Matches exactly one character
54`[...]`  | Matches one character from the enclosed list of characters
55`[^...]` | Matches one character *not* in the enclosed list
56
57Note that unlike [POSIX globs][pg], these special characters and
58sequences are allowed to match `/` directory separators as well as the
59initial `.` in the name of a hidden file or directory. This is because
60Fossil file names are stored as complete path names. The distinction
61between file name and directory name is “below” Fossil in this sense.
62
63[pg]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13
64
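The rules in the table above can be explored from Python, because the
matcher follows the same rules as the `GLOB` operator in SQLite (see the
implementation notes at the end of this document). The sketch below
assumes that SQLite's operator is a fair stand-in for Fossil's own
matcher; the helper name `glob_match` is made up for illustration, and
`fossil test-glob` remains the authority on what Fossil really does.

```python
import sqlite3

_conn = sqlite3.connect(":memory:")

def glob_match(pattern, name):
    """Return True if SQLite's GLOB says `pattern` matches all of `name`."""
    row = _conn.execute("SELECT ? GLOB ?", (name, pattern)).fetchone()
    return bool(row[0])

# `*` consumes any run of characters, including `/` and a leading `.`
print(glob_match("*", "src/.hidden/file.c"))
print(glob_match("*.c", "src/.hidden/file.c"))
# `?` consumes exactly one character, which may be a `/`
print(glob_match("src?main.c", "src/main.c"))
# A partial match is a failed match
print(glob_match("Readme", "Readme.txt"))
```

The first three calls report a match and the last does not, since
`Readme` leaves the `.txt` suffix unconsumed.
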
65The bracket expressions above require some additional explanation:
66
 *  A range of characters may be specified with `-`, so `[a-f]` matches
    exactly the same characters as `[abcdef]`. Ranges reflect Unicode
    code points without any locale-specific collation sequence.
    Therefore, this particular sequence never matches the Unicode
    pre-composed character `é` (U+00E9), for example.
72
73 *  This dependence on character/code point ordering may have other
74    effects to surprise you. For example, the glob `[A-z]` not only
75    matches upper and lowercase ASCII letters, it also matches several
76    punctuation characters placed between `Z` and `a` in both ASCII and
77    Unicode: `[`, `\`, `]`, `^`, `_`, and <tt>\`</tt>.
78
79 *  You may include a literal `-` in a list by placing it last, just
80    before the `]`.
81
 *  You may include a literal `]` in a list by making it the first
    character after the `[` or `[^`. At any other place, `]` ends the list.
84
85 *  You may include a literal `^` in a list by placing it anywhere
86    except after the opening `[`.
87
88 *  Beware that a range must be specified from low value to high
89    value: `[z-a]` does not match any character at all, preventing the
90    entire glob from matching.
91
92Some examples of character lists:
93
94:Pattern |:Effect
95---------------------------------------------------------------------
96`[a-d]`  | Matches any one of `a`, `b`, `c`, or `d` but not `ä`
97`[^a-d]` | Matches exactly one character other than `a`, `b`, `c`, or `d`
98`[0-9a-fA-F]` | Matches exactly one hexadecimal digit
99`[a-]`   | Matches either `a` or `-`
100`[][]`   | Matches either `]` or `[`
101`[^]]`   | Matches exactly one character other than `]`
102`[]^]`   | Matches either `]` or `^`
103`[^-]`   | Matches exactly one character other than `-`
104
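The character list examples above can be checked mechanically. This is
a sketch under the assumption that SQLite's `GLOB` operator, reached
here through Python's standard `sqlite3` module, treats bracket
expressions the same way Fossil does; `glob_match` is a hypothetical
helper, not part of Fossil.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def glob_match(pattern, name):
    """Ask SQLite whether `pattern` matches the whole of `name`."""
    return bool(conn.execute("SELECT ? GLOB ?", (name, pattern)).fetchone()[0])

# (pattern, candidate character, expected result per the table above)
cases = [
    ("[a-d]", "c", True),
    ("[a-d]", "ä", False),       # U+00E4 sorts after `d` by code point
    ("[^a-d]", "e", True),
    ("[0-9a-fA-F]", "F", True),
    ("[0-9a-fA-F]", "g", False),
    ("[a-]", "-", True),
    ("[][]", "[", True),
    ("[][]", "]", True),
    ("[^]]", "]", False),
    ("[]^]", "^", True),
    ("[^-]", "-", False),
    ("[A-z]", "_", True),        # punctuation between `Z` and `a`
]

for pattern, char, expected in cases:
    result = glob_match(pattern, char)
    assert result == expected, (pattern, char)
    print(f"{pattern!r:16} vs {char!r}: {result}")
```
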
105White space means the specific ASCII characters TAB, LF, VT, FF, CR,
106and SPACE.  Note that this does not include any of the many additional
107spacing characters available in Unicode such as
108U+00A0, NO-BREAK SPACE.
109
110Because both LF and CR are white space and leading and trailing spaces
111are stripped from each glob in a list, a list of globs may be broken
112into lines between globs when the list is stored in a file, as for a
113versioned setting.
114
115Note that 'single quotes' and "double quotes" are the ASCII straight
116quote characters, not any of the other quotation marks provided in
117Unicode and specifically not the "curly" quotes preferred by
118typesetters and word processors.
119
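The list syntax described in this section can be sketched as a small
tokenizer. This is a simplified illustration written for this document,
not Fossil's actual parser: the function name `split_glob_list` is
hypothetical, and details such as the handling of an unterminated quote
are assumptions.

```python
# The six ASCII white space characters named above: SPACE, TAB, LF, VT, FF, CR
WHITESPACE = " \t\n\v\f\r"

def split_glob_list(text):
    """Split a glob list on white space or commas, honoring straight quotes."""
    globs = []
    i, n = 0, len(text)
    while i < n:
        # Skip separators between patterns.
        while i < n and (text[i] in WHITESPACE or text[i] == ","):
            i += 1
        if i >= n:
            break
        if text[i] in "'\"":
            # Quoted pattern: runs to the matching quote (or end of text).
            quote = text[i]
            end = text.find(quote, i + 1)
            if end == -1:
                end = n
            globs.append(text[i + 1:end])
            i = end + 1
        else:
            # Bare pattern: runs to the next separator.
            start = i
            while i < n and text[i] not in WHITESPACE and text[i] != ",":
                i += 1
            globs.append(text[start:i])
    return globs

print(split_glob_list("*.o, *.obj\n'REALLY SECRET STUFF.txt' \"a,b\""))
```

Because only the six ASCII characters count as white space, a name
containing U+00A0 stays in one piece without any quoting.
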
120
121## File Names to Match
122
123Before it is compared to a glob pattern, each file name is transformed
124to a canonical form:
125
126  *  all directory separators are changed to `/`
127  *  redundant slashes are removed
128  *  all `.` path components are removed
129  *  all `..` path components are resolved
130
131(There are additional details we are ignoring here, but they cover rare
132edge cases and follow the principle of least surprise.)
133
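A rough approximation of this canonical form is available in Python's
standard library. The sketch below is not Fossil's implementation: the
function name `canonical_name` is hypothetical, it treats every
backslash as a directory separator, and it ignores the rare edge cases
mentioned above.

```python
import posixpath

def canonical_name(name):
    """Approximate the canonical form used before glob matching."""
    unified = name.replace("\\", "/")    # all separators become `/`
    return posixpath.normpath(unified)   # drops `//` and `.`, resolves `..`

print(canonical_name("src\\README"))            # src/README
print(canonical_name("src//./lib/../README"))   # src/README
print(canonical_name("./README"))               # README
```
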
134The glob must match the *entire* canonical file name to be considered a
135match.
136
137The goal is to have a name that is the simplest possible for each
138particular file, and that will be the same regardless of the platform
139you run Fossil on. This is important when you have a repository cloned
140from multiple platforms and have globs in versioned settings: you want
141those settings to be interpreted the same way everywhere.
142
143Beware, however, that all glob matching in Fossil is case sensitive
144regardless of host platform and file system. This will not be a surprise
145on POSIX platforms where file names are usually treated case
146sensitively. However, most Windows file systems are case preserving but
147case insensitive. That is, on Windows, the names `ReadMe` and `README`
are usually names of the same file. The same is true in other cases,
such as by default on macOS file systems and in the file system drivers
for Windows file systems running on non-Windows systems (e.g. exFAT on
Linux). Therefore, write your Fossil glob patterns to match the name of
the file as checked into the repository.
153
154Some example cases:
155
156:Pattern     |:Effect
157--------------------------------------------------------------------------------
158`README`     | Matches only a file named `README` in the root of the tree. It does not match a file named `src/README` because it does not include any characters that consume (and match) the `src/` part.
159`*/README`   | Matches `src/README`. Unlike Unix file globs, it also matches `src/library/README`. However it does not match the file `README` in the root of the tree.
160`*README`    | Matches `src/README` as well as the file `README` in the root of the tree as well as `foo/bar/README` or any other file named `README` in the tree. However, it also matches `A-DIFFERENT-README` and `src/DO-NOT-README`, or any other file whose name ends with `README`.
161`src/README` | Matches `src\README` on Windows because all directory separators are rewritten as `/` in the canonical name before the glob is matched. This makes it much easier to write globs that work on both Unix and Windows.
162`*.[ch]`     | Matches every C source or header file in the tree at the root or at any depth. Again, this is (deliberately) different from Unix file globs and Windows wild cards.
163
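The example cases above, along with the case sensitivity rule, can be
reproduced with SQLite's `GLOB` operator from Python. As before, this
assumes SQLite's operator behaves like Fossil's matcher on canonical
names, and `glob_match` is a helper invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def glob_match(pattern, name):
    """Ask SQLite whether `pattern` matches the whole canonical `name`."""
    return bool(conn.execute("SELECT ? GLOB ?", (name, pattern)).fetchone()[0])

# (pattern, canonical file name, expected result per the table above)
cases = [
    ("README", "README", True),
    ("README", "src/README", False),
    ("*/README", "src/README", True),
    ("*/README", "src/library/README", True),
    ("*/README", "README", False),
    ("*README", "README", True),
    ("*README", "src/DO-NOT-README", True),
    ("*.[ch]", "src/deep/tree/main.c", True),
    ("README", "ReadMe", False),   # matching is always case sensitive
]

for pattern, name, expected in cases:
    result = glob_match(pattern, name)
    assert result == expected, (pattern, name)
    print(f"{pattern:10} {name:22} {result}")
```
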
164## Where Globs are Used
165
166### Settings that are Globs
167
168These settings are all lists of glob patterns:
169
170:Setting        |:Description
171--------------------------------------------------------------------------------
172`binary-glob`   | Files that should be treated as binary files for committing and merging purposes
173`clean-glob`    | Files that the [`clean`][] command will delete without prompting or allowing undo
174`crlf-glob`     | Files in which it is okay to have `CR`, `CR`+`LF` or mixed line endings.  Set to "`*`" to disable CR+LF checking
175`crnl-glob`     | Alias for the `crlf-glob` setting
176`encoding-glob` | Files that the [`commit`][] command will ignore when issuing warnings about text files that may use another encoding than ASCII or UTF-8.  Set to "`*`" to disable encoding checking
177`ignore-glob`   | Files that the [`add`][], [`addremove`][], [`clean`][], and [`extras`][] commands will ignore
178`keep-glob`     | Files that the [`clean`][] command will keep
179
All may be [versioned, local, or global](settings.wiki). Use `fossil
settings` to manage local and global settings, or, for a versioned
setting, a file named after the setting in the `.fossil-settings/`
folder at the root of the tree.
184
185Using versioned settings for these not only has the advantage that
186they are tracked in the repository just like the rest of your project,
187but you can more easily keep longer lists of more complicated glob
188patterns than would be practical in either local or global settings.
189
The `ignore-glob` setting is one that frequently grows
to be an elaborate list of files that should be ignored by most
192commands. This is especially true when one (or more) IDEs are used in
193a project because each IDE has its own ideas of how and where to cache
194information that speeds up its browsing and building tasks but which
195need not be preserved in your project's history.
196
197
198### Commands that Refer to Globs
199
200Many of the commands that respect the settings containing globs have
201options to override some or all of the settings. These options are
202usually named to correspond to the setting they override, such as
203`--ignore` to override the `ignore-glob` setting. These commands are:
204
205 *  [`add`][]
206 *  [`addremove`][]
207 *  [`changes`][]
208 *  [`clean`][]
209 *  [`commit`][]
210 *  [`extras`][]
211 *  [`merge`][]
212 *  [`settings`][]
213 *  [`status`][]
214 *  [`touch`][]
215 *  [`unset`][]
216
217The commands [`tarball`][] and [`zip`][] produce compressed archives of a
218specific checkin. They may be further restricted by options that
219specify glob patterns that name files to include or exclude rather
220than archiving the entire checkin.
221
The commands [`http`][], [`cgi`][], [`server`][], and [`ui`][], which
implement or work with web servers, provide a mechanism to serve some
files as static content, where a list of glob patterns specifies what
content may be served.
226
227[`add`]: /help?cmd=add
228[`addremove`]: /help?cmd=addremove
229[`changes`]: /help?cmd=changes
230[`clean`]: /help?cmd=clean
231[`commit`]: /help?cmd=commit
232[`extras`]: /help?cmd=extras
233[`merge`]: /help?cmd=merge
234[`settings`]: /help?cmd=settings
235[`status`]: /help?cmd=status
236[`touch`]: /help?cmd=touch
237[`unset`]: /help?cmd=unset
238
239[`tarball`]: /help?cmd=tarball
240[`zip`]: /help?cmd=zip
241
242[`http`]: /help?cmd=http
243[`cgi`]: /help?cmd=cgi
244[`server`]: /help?cmd=server
245[`ui`]: /help?cmd=ui
246
247
248### Web Pages that Refer to Globs
249
The [`/timeline`][] page supports the query parameter `chng=GLOBLIST`,
which names a list of glob patterns defining which files to focus the
timeline on. It also has the query parameters `t=TAG` and `r=TAG`, which
name a tag to focus on; these can be configured with `ms=STYLE` to
use a glob pattern to match tag names instead of the default exact
match or a couple of other comparison styles.
256
257The pages [`/tarball`][] and [`/zip`][] generate compressed archives
258of a specific checkin. They may be further restricted by query
259parameters that specify glob patterns that name files to include or
260exclude rather than taking the entire checkin.
261
262[`/timeline`]: /help?cmd=/timeline
263[`/tarball`]: /help?cmd=/tarball
264[`/zip`]: /help?cmd=/zip
265
266
267## Platform Quirks
268
269Fossil glob patterns are based on the glob pattern feature of POSIX
270shells. Fossil glob patterns also have a quoting mechanism, discussed
271above. Because other parts of your operating system may interpret glob
272patterns and quotes separately from Fossil, it is often difficult to
273give glob patterns correctly to Fossil on the command line. Quotes and
274special characters in glob patterns are likely to be interpreted when
275given as part of a `fossil` command, causing unexpected behavior.
276
277These problems do not affect [versioned settings files](settings.wiki)
278or Admin &rarr; Settings in Fossil UI. Consequently, it is better to
279set long-term `*-glob` settings via these methods than to use `fossil
280settings` commands.
281
282That advice does not help you when you are giving one-off glob patterns
283in `fossil` commands. The remainder of this section gives remedies and
284workarounds for these problems.
285
286
287### <a id="posix"></a>POSIX Systems
288
289If you are using Fossil on a system with a POSIX-compatible shell
290&mdash; Linux, macOS, the BSDs, Unix, Cygwin, WSL etc. &mdash; the shell
291may expand the glob patterns before passing the result to the `fossil`
292executable.
293
294Sometimes this is exactly what you want.  Consider this command for
295example:
296
297    $ fossil add RE*
298
299If you give that command in a directory containing `README.txt` and
300`RELEASE-NOTES.txt`, the shell will expand the command to:
301
302    $ fossil add README.txt RELEASE-NOTES.txt
303
304…which is compatible with the `fossil add` command's argument list,
305which allows multiple files.
306
307Now consider what happens instead if you say:
308
309    $ fossil add --ignore RE* src/*.c
310
311This *does not* do what you want because the shell will expand both `RE*`
312and `src/*.c`, causing one of the two files matching the `RE*` glob
313pattern to be ignored and the other to be added to the repository. You
314need to say this in that case:
315
316    $ fossil add --ignore 'RE*' src/*.c
317
318The single quotes force a POSIX shell to pass the `RE*` glob pattern
319through to Fossil untouched, which will do its own glob pattern
320matching. There are other methods of quoting a glob pattern or escaping
321its special characters; see your shell's manual.
322
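The difference between the quoted and unquoted forms can be modeled in
Python. This is only a sketch of what a POSIX shell does: the standard
`glob` module stands in for the shell's pathname expansion, `shlex`
stands in for its quote removal, and the helper `expand_like_shell` is a
hypothetical name introduced here.

```python
import glob
import os
import shlex
import tempfile

def expand_like_shell(word, cwd):
    """Model pathname expansion of one unquoted word in directory `cwd`."""
    pattern = os.path.join(glob.escape(cwd), word)
    matches = sorted(os.path.relpath(p, cwd) for p in glob.glob(pattern))
    # Most POSIX shells pass an unmatched pattern through as typed.
    return matches or [word]

with tempfile.TemporaryDirectory() as cwd:
    for name in ("README.txt", "RELEASE-NOTES.txt"):
        open(os.path.join(cwd, name), "w").close()
    # Unquoted: the shell replaces RE* before Fossil ever sees it.
    unquoted_argv = ["fossil", "add", "--ignore"] + expand_like_shell("RE*", cwd)

# Quoted: shlex removes the quotes but performs no expansion.
quoted_argv = shlex.split("fossil add --ignore 'RE*'")

print(unquoted_argv)
print(quoted_argv)
```

In the unquoted case, `--ignore` receives `README.txt` as its value and
`RELEASE-NOTES.txt` becomes a file to add; in the quoted case, Fossil
receives the pattern `RE*` itself.
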
323Beware that Fossil's `--ignore` option does not override explicit file
324mentions:
325
326    $ fossil add --ignore 'REALLY SECRET STUFF.txt' RE*
327
328You might think that would add everything beginning with `RE` *except*
329for `REALLY SECRET STUFF.txt`, but when a file is both given
330explicitly to Fossil and also matches an ignore rule, Fossil asks what
331you want to do with it in the default case; and it does not even ask
332if you gave the `-f` or `--force` option along with `--ignore`.
333
The spaces in the ignored file name above bring us to another point:
such file names must be quoted in Fossil glob patterns, lest Fossil
interpret each one as multiple glob patterns, but the shell interprets
quotation marks itself.
338
339One way to fix both this and the previous problem is:
340
341    $ fossil add --ignore "'REALLY SECRET STUFF.txt'" READ*
342
343The nested quotation marks cause the inner set to be passed through to
344Fossil, and the more specific glob pattern at the end &mdash; that is,
345`READ*` vs `RE*` &mdash; avoids a conflict between explicitly-listed
346files and `--ignore` rules in the `fossil add` command.
347
348Another solution would be to use shell escaping instead of nested
349quoting:
350
351    $ fossil add --ignore "\"REALLY SECRET STUFF.txt\"" READ*
352
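To see which quotation marks survive the shell and reach Fossil,
Python's standard `shlex` module can serve as a model of POSIX quote
removal. This is a sketch: `shlex` follows POSIX quoting rules but does
not expand `READ*`, which a real shell would do.

```python
import shlex

# One layer of quotes: the shell consumes them, Fossil sees bare spaces.
plain = shlex.split("fossil add --ignore 'REALLY SECRET STUFF.txt' READ*")

# Nested quotes: the outer pair is consumed, the inner pair reaches Fossil.
nested = shlex.split("""fossil add --ignore "'REALLY SECRET STUFF.txt'" READ*""")

# Escaped quotes: the backslashes are consumed, the inner pair reaches Fossil.
escaped = shlex.split(r'''fossil add --ignore "\"REALLY SECRET STUFF.txt\"" READ*''')

for argv in (plain, nested, escaped):
    print(repr(argv[3]))
```

Only the last two forms deliver a quoted pattern to Fossil. With the
first, Fossil receives `REALLY SECRET STUFF.txt` without quotes and
treats it as a list of three separate globs.
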
353It bears repeating that the two glob patterns here are not interpreted
354the same way when running this command from a *subdirectory* of the top
355checkout directory as when running it at the top of the checkout tree.
356If these files were in a subdirectory of the checkout tree called `doc`
357and that was your current working directory, the command would have to
358be:
359
360    $ fossil add --ignore "'doc/REALLY SECRET STUFF.txt'" READ*
361
362instead. The Fossil glob pattern still needs the `doc/` prefix because
363Fossil always interprets glob patterns from the base of the checkout
364directory, not from the current working directory as POSIX shells do.
365
When in doubt, use `fossil status` after running commands like the
above to make sure the right set of files was scheduled for insertion
into the repository before checking the changes in. You never want to
accidentally check something like a password, an API key, or the
private half of a public cryptographic key into a Fossil repository that
can be read by people who should not have such secrets.
372
373
374### <a id="windows"></a>Windows
375
376Before we get into Windows-specific details here, beware that this
377section does not apply to the several Microsoft Windows extensions that
378provide POSIX semantics to Windows, for which you want to use the advice
379in [the POSIX section above](#posix) instead:
380
381  *  the ancient and rarely-used [Microsoft POSIX subsystem][mps];
382  *  its now-discontinued replacement feature, [Services for Unix][sfu]; or
383  *  their modern replacement, the [Windows Subsystem for Linux][wsl]
384
385[mps]: https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem
386[sfu]: https://en.wikipedia.org/wiki/Windows_Services_for_UNIX
387[wsl]: https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux
388
(The last of these is sometimes incorrectly called "Bash on Windows" or
"Ubuntu on Windows," but the feature provides much more than just Bash
or Ubuntu for Windows.)
392
393Neither standard Windows command shell &mdash; `cmd.exe` or PowerShell
394&mdash; expands glob patterns the way POSIX shells do. Windows command
395shells rely on the command itself to do the glob pattern expansion. The
396way this works depends on several factors:
397
398 *  the version of Windows you are using
399 *  which OS upgrades have been applied to it
400 *  the compiler that built your Fossil executable
401 *  whether you are running the command interactively
402 *  whether the command is built against a runtime system that does this
403    at all
404 *  whether the Fossil command is being run from a file named `*.BAT` vs
405    being named `*.CMD`
406
407Usually (but not always!) the C runtime library that your `fossil.exe`
408executable is built against does this glob expansion on Windows so the
409program proper does not have to. This may then interact with the way the
410Windows command shell you’re using handles argument quoting. Because of
411these differences, it is common to find perfectly valid Fossil command
412examples that were written and tested on a POSIX system which then fail
413when tried on Windows.
414
415The most common problem is figuring out how to get a glob pattern passed
416on the command line into `fossil.exe` without it being expanded by the C
417runtime library that your particular Fossil executable is linked to,
418which tries to act like [the POSIX systems described above](#posix). Windows is
419not strongly governed by POSIX, so it has not historically hewed closely
420to its strictures.
421
422For example, consider how you would set `crlf-glob` to `*` in order to
423get normal Windows text files with CR+LF line endings past Fossil's
424"looks like a binary file" check. The na&iuml;ve approach will not work:
425
426    C:\...> fossil setting crlf-glob *
427
428The C runtime library will expand that to the list of all files in the
429current directory, which will probably cause a Fossil error because
430Fossil expects either nothing or option flags after the setting's new
431value, not a list of file names. (To be fair, the same thing will happen
432on POSIX systems, only at the shell level, before `.../bin/fossil` even
433gets run by the shell.)
434
435Let's try again:
436
437    C:\...> fossil setting crlf-glob '*'
438
439Quoting the argument like that will work reliably on POSIX, but it may
440or may not work on Windows. If your Windows command shell interprets the
quotes, it means `fossil.exe` will see only the bare `*` so the C
runtime library it is linked to will likely expand it to the list of
files in the current directory before the `setting` command gets a
chance to parse the command line arguments, causing the same failure as
above.
445This alternative only works if you’re using a Windows command shell that
446passes the quotes through to the executable *and* you have linked Fossil
447to a C runtime library that interprets the quotes properly itself,
448resulting in a bare `*` getting clear down to Fossil’s `setting` command
449parser.
450
451An approach that *will* work reliably is:
452
453    C:\...> echo * | fossil setting crlf-glob --args -
454
455This works because the built-in Windows command `echo` does not expand its
456arguments, and the `--args -` option makes Fossil read further command
457arguments from its standard input, which is connected to the output
458of `echo` by the pipe. (`-` is a common Unix convention meaning
459"standard input," which Fossil obeys.) A [batch script][fng.cmd] to automate this trick was
460posted on the now-inactive Fossil Mailing List.
461
462[fng.cmd]: https://www.mail-archive.com/fossil-users@lists.fossil-scm.org/msg25099.html
463
464(Ironically, this method will *not* work on POSIX systems because it is
465not up to the command to expand globs. The shell will expand the `*` in
466the `echo` command, so the list of file names will be passed to the
467`fossil` standard input, just as with the first example above!)
468
469Another (usually) correct approach which will work on both Windows and
470POSIX systems:
471
472    C:\...> fossil setting crlf-glob *,
473
474This works because the trailing comma prevents the glob pattern from
475matching any files, unless you happen to have files named with a
476trailing comma in the current directory. If the pattern matches no
477files, it is passed into Fossil's `main()` function as-is by the C
478runtime system. Since Fossil uses commas to separate multiple glob
479patterns, this means "all files from the root of the Fossil checkout
480directory downward and nothing else," which is of course equivalent to
481"all managed files in this repository," our original goal.
482
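The reasoning behind the trailing comma can be sketched in Python. The
standard `glob` module is used here as a stand-in for the expansion done
by a shell or C runtime, which is an assumption, and the final split on
commas is a simplification of how Fossil reads a glob list.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as cwd:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(cwd, name), "w").close()
    base = glob.escape(cwd)
    # A bare `*` expands to every visible file in the directory.
    star = sorted(os.path.basename(p) for p in glob.glob(os.path.join(base, "*")))
    # `*,` matches nothing here, so it would be passed through unchanged.
    star_comma = glob.glob(os.path.join(base, "*,"))

# Fossil then splits the surviving argument on commas, leaving just `*`.
globs_seen_by_fossil = [g for g in "*,".split(",") if g]

print(star)
print(star_comma)
print(globs_seen_by_fossil)
```
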
483
484## Experimenting
485
486To preview the effects of command line glob pattern expansion for
487various glob patterns (unquoted, quoted, comma-terminated), for any
488combination of command shell, OS, C run time, and Fossil version,
489precede the command you want to test with [`test-echo`][] like so:
490
491    $ fossil test-echo setting crlf-glob "*"
492    C:\> echo * | fossil test-echo setting crlf-glob --args -
493
494The [`test-glob`][] command is also handy to test if a string
495matches a glob pattern.
496
497[`test-echo`]: /help?cmd=test-echo
498[`test-glob`]: /help?cmd=test-glob
499
500
501## Converting `.gitignore` to `ignore-glob`
502
503Many other version control systems handle the specific case of
504ignoring certain files differently from Fossil: they have you create
505individual "ignore" files in each folder, which specify things ignored
506in that folder and below. Usually some form of glob patterns are used
507in those files, but the details differ from Fossil.
508
509In many simple cases, you can just store a top level "ignore" file in
510`.fossil-settings/ignore-glob`. But as usual, there will be lots of
511edge cases.
512
513[Git has a rich collection of ignore files][gitignore] which
514accumulate rules that affect the current command. There are global
515files, per-user files, per workspace unmanaged files, and fully
516version controlled files. Some of the files used have no set name, but
517are called out in configuration files.
518
519[gitignore]: https://git-scm.com/docs/gitignore
520
521In contrast, Fossil has a global setting and a local setting, but the local setting
522overrides the global rather than extending it. Similarly, a Fossil
523command's `--ignore` option replaces the `ignore-glob` setting rather
524than extending it.
525
With that in mind, translating a `.gitignore` file into
`.fossil-settings/ignore-glob` may be possible in many cases. Here are
some of the features of `.gitignore` and comments on how they relate to
Fossil:
530
531 *  "A blank line matches no files...": same in Fossil.
532 *  "A line starting with # serves as a comment....": not in Fossil.
533 *  "Trailing spaces are ignored unless they are quoted..." is similar
534    in Fossil. All whitespace before and after a glob is trimmed in
535    Fossil unless quoted with single or double quotes. Git uses
536    backslash quoting instead, which Fossil does not.
537 *  "An optional prefix "!" which negates the pattern...": not in
538    Fossil.
539 *  Git's globs are relative to the location of the `.gitignore` file:
540    Fossil's globs are relative to the root of the workspace.
541 *  Git's globs and Fossil's globs treat directory separators
542    differently. Git includes a notation for zero or more directories
543    that is not needed in Fossil.
544
545### Example
546
547In a project with source and documentation:
548
549    work
550      +-- doc
551      +-- src
552
553The file `doc/.gitignore` might contain:
554
555    # Finished documents by pandoc via LaTeX
556    *.pdf
557    # Intermediate files
558    *.tex
559    *.toc
560    *.log
561    *.out
562    *.tmp
563
564Entries in `.fossil-settings/ignore-glob` with similar effect, also
565limited to the `doc` folder:
566
567    doc/*.pdf
568    doc/*.tex, doc/*.toc, doc/*.log, doc/*.out, doc/*.tmp
569
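For simple files like the one above, the translation can be automated.
The following is a minimal sketch, not a general converter: the function
name `gitignore_to_fossil` is hypothetical, and it deliberately refuses
anything other than plain patterns beginning with `*`, since negation,
embedded slashes, backslash quoting, and names needing quotes all call
for manual translation.

```python
def gitignore_to_fossil(folder, gitignore_text):
    """Translate simple .gitignore lines found in `folder` into Fossil globs."""
    prefix = folder.strip("/")
    globs = []
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Fossil has no comment syntax, so comments are dropped
        unsupported = (
            not line.startswith("*")
            or "**" in line
            or any(ch in line for ch in "/!\\, \t")
        )
        if unsupported:
            raise ValueError(f"needs manual translation: {line!r}")
        # Fossil globs are relative to the root of the checkout.
        globs.append(f"{prefix}/{line}" if prefix else line)
    return globs

doc_gitignore = """\
# Finished documents by pandoc via LaTeX
*.pdf
# Intermediate files
*.tex
*.toc
*.log
*.out
*.tmp
"""

print("\n".join(gitignore_to_fossil("doc", doc_gitignore)))
```

The output has one glob per line, which is an acceptable layout for a
versioned `ignore-glob` file because line breaks count as white space
between globs.
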
570
571
572
573
574## Implementation and References
575
576The implementation of the Fossil-specific glob pattern handling is here:
577
578:File            |:Description
579--------------------------------------------------------------------------------
580[`src/glob.c`][] | pattern list loading, parsing, and generic matching code
581[`src/file.c`][] | application of glob patterns to file names
582
583[`src/glob.c`]: https://fossil-scm.org/home/file/src/glob.c
584[`src/file.c`]: https://fossil-scm.org/home/file/src/file.c
585
586See the [Adding Features to Fossil][aff] document for broader details
587about finding and working with such code.
588
589The actual pattern matching leverages the `GLOB` operator in SQLite, so
590you may find [its documentation][gdoc], [source code][gsrc] and [test
591harness][gtst] helpful.
592
593[aff]:  ./adding_code.wiki
594[gdoc]: https://sqlite.org/lang_expr.html#like
595[gsrc]: https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768
596[gtst]: https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673
597