# File Name Glob Patterns


A [glob pattern][glob] is a text expression that matches one or more
file names using wild cards familiar to most users of a command line.
For example, `*` is a glob that matches any name at all and
`Readme.txt` is a glob that matches exactly one file.

A glob should not be confused with a [regular expression][regexp] (RE),
even though they use some of the same special characters for similar
purposes, because [they are not fully compatible][greinc] pattern
matching languages. Fossil uses globs, not REs, when matching file
names with the settings described in this document.

[glob]: https://en.wikipedia.org/wiki/Glob_(programming)
[greinc]: https://unix.stackexchange.com/a/57958/138
[regexp]: https://en.wikipedia.org/wiki/Regular_expression

Each of the settings described below holds one or more glob patterns
that cause Fossil to give special treatment to the files they match.
Glob patterns are also accepted in options to certain commands and as
query parameters to certain Fossil UI web pages.

Where Fossil also accepts globs in commands, this handling may interact
with your OS’s command shell or its C runtime system, because those may
have their own glob pattern handling. We detail such interactions
below.


## Syntax

Where Fossil accepts glob patterns, it will usually accept a *list* of
such patterns, each individual pattern separated from the others
by white space or commas. If a glob must contain white space or
commas, it can be quoted with either single or double quotation marks.
A list is said to match if any one glob in the list matches.

A glob pattern matches a given file name if it successfully consumes and
matches the *entire* name. Partial matches are failed matches.

Most characters in a glob pattern consume a single character of the file
name and must match it exactly.
For instance, “a” in a glob simply
matches the letter “a” in the file name unless it is inside a special
character sequence.

Other characters have special meaning, and some special sequences
enclose otherwise normal characters to give them special meaning:

:Pattern |:Effect
---------------------------------------------------------------------
`*`      | Matches any sequence of zero or more characters
`?`      | Matches exactly one character
`[...]`  | Matches one character from the enclosed list of characters
`[^...]` | Matches one character *not* in the enclosed list

Note that unlike [POSIX globs][pg], these special characters and
sequences are allowed to match `/` directory separators as well as the
initial `.` in the name of a hidden file or directory. This is because
Fossil stores file names as complete path names, so the distinction
between file name and directory name is “below” Fossil in this sense.

[pg]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13

The bracket expressions above require some additional explanation:

 * A range of characters may be specified with `-`, so `[a-f]` matches
   exactly the same characters as `[abcdef]`. Ranges reflect Unicode
   code points without any locale-specific collation sequence.
   Therefore, this particular sequence never matches the Unicode
   precomposed character `é` (U+00E9), for example.

 * This dependence on character/code point ordering may have other
   surprising effects. For example, the glob `[A-z]` not only
   matches upper and lowercase ASCII letters, it also matches several
   punctuation characters placed between `Z` and `a` in both ASCII and
   Unicode: `[`, `\`, `]`, `^`, `_`, and <tt>\`</tt>.

 * You may include a literal `-` in a list by placing it last, just
   before the `]`.

 * You may include a literal `]` in a list by making it the first
   character after the `[` or `[^`. At any other place, `]` ends the list.

 * You may include a literal `^` in a list by placing it anywhere
   except immediately after the opening `[`.

 * Beware that a range must be specified from low value to high
   value: `[z-a]` does not match any character at all, preventing the
   entire glob from matching.

Some examples of character lists:

:Pattern      |:Effect
---------------------------------------------------------------------
`[a-d]`       | Matches any one of `a`, `b`, `c`, or `d` but not `ä`
`[^a-d]`      | Matches exactly one character other than `a`, `b`, `c`, or `d`
`[0-9a-fA-F]` | Matches exactly one hexadecimal digit
`[a-]`        | Matches either `a` or `-`
`[][]`        | Matches either `]` or `[`
`[^]]`        | Matches exactly one character other than `]`
`[]^]`        | Matches either `]` or `^`
`[^-]`        | Matches exactly one character other than `-`

White space means the specific ASCII characters TAB, LF, VT, FF, CR,
and SPACE. Note that this does not include any of the many additional
spacing characters available in Unicode, such as
U+00A0, NO-BREAK SPACE.

Because both LF and CR are white space, and leading and trailing white
space is stripped from each glob in a list, a list of globs may be
broken into lines between globs when the list is stored in a file, as
for a versioned setting.

Note that 'single quotes' and "double quotes" are the ASCII straight
quote characters, not any of the other quotation marks provided in
Unicode, and specifically not the "curly" quotes preferred by
typesetters and word processors.


## File Names to Match

Before it is compared to a glob pattern, each file name is transformed
to a canonical form:

 * all directory separators are changed to `/`
 * redundant slashes are removed
 * all `.` path components are removed
 * all `..` path components are resolved

(There are additional details we are ignoring here, but they cover rare
edge cases and follow the principle of least surprise.)
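
To make the rules above concrete, here is a minimal Python sketch that canonicalizes a name and then matches it. It assumes the semantics described in this document, which the document attributes to SQLite's `GLOB` operator: matching is case sensitive, `*` and `?` can cross `/`, and bracket expressions follow the rules listed earlier. `canonical_name()` and `glob_match()` are hypothetical helpers for illustration, not Fossil's API. The canonicalization also skips the rare edge cases mentioned above. For example, a `..` that would climb above the checkout root is simply dropped.

```python
def canonical_name(path: str, windows: bool = True) -> str:
    """Simplified canonical form of a checkout-relative path (a sketch)."""
    if windows:
        path = path.replace("\\", "/")      # all separators become "/"
    parts = []
    for comp in path.split("/"):
        if comp in ("", "."):
            continue                        # redundant slash or "." component
        if comp == "..":
            if parts:
                parts.pop()                 # resolve ".." against its parent
            continue                        # assumption: ".." at the top is dropped
        parts.append(comp)
    return "/".join(parts)


def _match_set(pattern: str, k: int, ch: str) -> int:
    """Match ch against the bracket expression that starts just after "[".

    Returns the index just past the closing "]" when ch is accepted,
    or -1 when ch is rejected or the expression is unterminated.
    """
    invert = seen = False
    if k < len(pattern) and pattern[k] == "^":
        invert, k = True, k + 1
    if k < len(pattern) and pattern[k] == "]":
        seen, k = (ch == "]"), k + 1        # a leading "]" is a literal
    prior = None                            # previous literal, start of a range
    while k < len(pattern) and pattern[k] != "]":
        c = pattern[k]
        nxt = pattern[k + 1] if k + 1 < len(pattern) else "]"
        if c == "-" and prior is not None and nxt != "]":
            if prior <= ch <= nxt:          # plain code-point range, no collation
                seen = True
            prior, k = None, k + 2
        else:
            seen = seen or ch == c          # a trailing "-" is handled here too
            prior, k = c, k + 1
    if k >= len(pattern) or seen == invert:
        return -1
    return k + 1


def glob_match(pattern: str, name: str) -> bool:
    """True only if pattern consumes and matches the *entire* name."""
    def match(i: int, j: int) -> bool:
        while i < len(pattern):
            c = pattern[i]
            if c == "*":
                while i < len(pattern) and pattern[i] == "*":
                    i += 1                  # collapse runs of "*"
                # "*" may swallow any run of characters, "/" included
                return any(match(i, k) for k in range(j, len(name) + 1))
            if j >= len(name):
                return False                # pattern left over, name used up
            if c == "?":
                i += 1
            elif c == "[":
                i = _match_set(pattern, i + 1, name[j])
                if i < 0:
                    return False
            elif c == name[j]:
                i += 1
            else:
                return False                # ordinary characters match exactly
            j += 1
        return j == len(name)               # a partial match is a failed match
    return match(0, 0)


name = canonical_name(r"src\lib\..\README")
print(name, glob_match("src/README", name))  # prints: src/README True
```

The `*` case simply tries every possible split point. That backtracking is easy to verify but can be slow for patterns with many `*` characters, which is acceptable for an illustration.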

The glob must match the *entire* canonical file name to be considered a
match.

The goal is to have a name that is the simplest possible for each
particular file, and that will be the same regardless of the platform
you run Fossil on. This is important when you have a repository cloned
from multiple platforms and have globs in versioned settings: you want
those settings to be interpreted the same way everywhere.

Beware, however, that all glob matching in Fossil is case sensitive
regardless of host platform and file system. This will not be a surprise
on POSIX platforms, where file names are usually treated case
sensitively. However, most Windows file systems are case preserving but
case insensitive. That is, on Windows, the names `ReadMe` and `README`
are usually names of the same file. The same is true in other cases,
for example on macOS file systems by default, and with the drivers for
Windows file systems running on non-Windows systems (exFAT on Linux, for
instance). Therefore, write your Fossil glob patterns to match the name
of the file as checked into the repository.

Some example cases:

:Pattern     |:Effect
--------------------------------------------------------------------------------
`README`     | Matches only a file named `README` in the root of the tree. It does not match a file named `src/README` because it does not include any characters that consume (and match) the `src/` part.
`*/README`   | Matches `src/README`. Unlike Unix file globs, it also matches `src/library/README`. However, it does not match the file `README` in the root of the tree.
`*README`    | Matches `src/README`, the file `README` in the root of the tree, `foo/bar/README`, and any other file named `README` in the tree. However, it also matches `A-DIFFERENT-README`, `src/DO-NOT-README`, and any other file whose name ends with `README`.
`src/README` | Matches `src\README` on Windows because all directory separators are rewritten as `/` in the canonical name before the glob is matched. This makes it much easier to write globs that work on both Unix and Windows.
`*.[ch]`     | Matches every C source or header file in the tree, at the root or at any depth. Again, this is (deliberately) different from Unix file globs and Windows wild cards.

## Where Globs are Used

### Settings that are Globs

These settings are all lists of glob patterns:

:Setting        |:Description
--------------------------------------------------------------------------------
`binary-glob`   | Files that should be treated as binary files for committing and merging purposes
`clean-glob`    | Files that the [`clean`][] command will delete without prompting or allowing undo
`crlf-glob`     | Files in which it is okay to have `CR`, `CR`+`LF` or mixed line endings. Set to "`*`" to disable CR+LF checking
`crnl-glob`     | Alias for the `crlf-glob` setting
`encoding-glob` | Files that the [`commit`][] command will ignore when issuing warnings about text files that may use an encoding other than ASCII or UTF-8. Set to "`*`" to disable encoding checking
`ignore-glob`   | Files that the [`add`][], [`addremove`][], [`clean`][], and [`extras`][] commands will ignore
`keep-glob`     | Files that the [`clean`][] command will keep

All may be [versioned, local, or global](settings.wiki). Use
`fossil settings` to manage local and global settings. To version a
setting, store it in a file named after the setting in the repository's
`.fossil-settings/` folder at the root of the tree.

Versioned settings have the advantage that they are tracked in the
repository just like the rest of your project, and they also make it
easier to keep longer lists of more complicated glob patterns than
would be practical in either local or global settings.
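
As an illustration of the list syntax from the Syntax section applied to a versioned settings file such as `.fossil-settings/ignore-glob`, here is a sketch of splitting the file's contents into individual globs. `parse_glob_list()` is a hypothetical helper. Fossil's real parser lives in `src/glob.c` and may differ in edge cases, such as an unterminated quote, whose handling below is an assumption.

```python
# The six ASCII white space characters listed in the Syntax section.
WHITE_SPACE = " \t\n\v\f\r"


def parse_glob_list(text: str) -> list:
    """Split a glob list (e.g. a versioned settings file) into globs.

    Sketch only: globs are separated by ASCII white space or commas, and a
    glob may be wrapped in single or double quotes to keep such characters.
    """
    globs = []
    i, n = 0, len(text)
    while i < n:
        # Skip separators: white space (including line breaks) and commas.
        while i < n and (text[i] in WHITE_SPACE or text[i] == ","):
            i += 1
        if i >= n:
            break
        if text[i] in "'\"":
            quote = text[i]
            end = text.find(quote, i + 1)
            if end < 0:
                end = n  # assumption: an unterminated quote runs to the end
            globs.append(text[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < n and text[i] not in WHITE_SPACE and text[i] != ",":
                i += 1
            globs.append(text[start:i])
    return globs


settings_file = """
*.o, *.obj
  'My Notes.txt'
"build, old/*"
"""
print(parse_glob_list(settings_file))
```

Line breaks are treated like any other white space, which is why a long list can be spread over many lines in a versioned settings file.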

The `ignore-glob` setting in particular frequently grows
to be an elaborate list of files that should be ignored by most
commands. This is especially true when one or more IDEs are used in
a project, because each IDE has its own ideas of how and where to cache
information that speeds up its browsing and building tasks but which
need not be preserved in your project's history.


### Commands that Refer to Globs

Many of the commands that respect the settings containing globs have
options to override some or all of the settings. These options are
usually named to correspond to the setting they override, such as
`--ignore` to override the `ignore-glob` setting. These commands are:

 * [`add`][]
 * [`addremove`][]
 * [`changes`][]
 * [`clean`][]
 * [`commit`][]
 * [`extras`][]
 * [`merge`][]
 * [`settings`][]
 * [`status`][]
 * [`touch`][]
 * [`unset`][]

The commands [`tarball`][] and [`zip`][] produce compressed archives of a
specific checkin. They may be further restricted by options that
specify glob patterns that name files to include or exclude rather
than archiving the entire checkin.

The commands [`http`][], [`cgi`][], [`server`][], and [`ui`][], which
implement or support web servers, accept a list of glob patterns that
specifies which files may be served as static content.

[`add`]: /help?cmd=add
[`addremove`]: /help?cmd=addremove
[`changes`]: /help?cmd=changes
[`clean`]: /help?cmd=clean
[`commit`]: /help?cmd=commit
[`extras`]: /help?cmd=extras
[`merge`]: /help?cmd=merge
[`settings`]: /help?cmd=settings
[`status`]: /help?cmd=status
[`touch`]: /help?cmd=touch
[`unset`]: /help?cmd=unset

[`tarball`]: /help?cmd=tarball
[`zip`]: /help?cmd=zip

[`http`]: /help?cmd=http
[`cgi`]: /help?cmd=cgi
[`server`]: /help?cmd=server
[`ui`]: /help?cmd=ui


### Web Pages that Refer to Globs

The [`/timeline`][] page supports the query parameter `chng=GLOBLIST`,
which names a list of glob patterns defining which files to focus the
timeline on. It also has the query parameters `t=TAG` and `r=TAG`, which
name a tag to focus on. The `ms=STYLE` parameter can make these match
tag names with a glob pattern instead of the default exact match or one
of a couple of other comparison styles.

The pages [`/tarball`][] and [`/zip`][] generate compressed archives
of a specific checkin. They may be further restricted by query
parameters that specify glob patterns that name files to include or
exclude rather than taking the entire checkin.

[`/timeline`]: /help?cmd=/timeline
[`/tarball`]: /help?cmd=/tarball
[`/zip`]: /help?cmd=/zip


## Platform Quirks

Fossil glob patterns are based on the glob pattern feature of POSIX
shells. Fossil glob patterns also have a quoting mechanism, discussed
above. Because other parts of your operating system may interpret glob
patterns and quotes separately from Fossil, it is often difficult to
give glob patterns correctly to Fossil on the command line. Quotes and
special characters in glob patterns given as part of a `fossil` command
are likely to be interpreted before Fossil ever sees them, causing
unexpected behavior.

These problems do not affect [versioned settings files](settings.wiki)
or Admin → Settings in Fossil UI. Consequently, it is better to
set long-term `*-glob` settings via these methods than to use
`fossil settings` commands.

That advice does not help you when you are giving one-off glob patterns
in `fossil` commands. The remainder of this section gives remedies and
workarounds for these problems.


### <a id="posix"></a>POSIX Systems

If you are using Fossil on a system with a POSIX-compatible shell
(Linux, macOS, the BSDs, Unix, Cygwin, WSL, etc.), the shell
may expand the glob patterns before passing the result to the `fossil`
executable.

Sometimes this is exactly what you want. Consider this command, for
example:

    $ fossil add RE*

If you give that command in a directory containing `README.txt` and
`RELEASE-NOTES.txt`, the shell will expand the command to:

    $ fossil add README.txt RELEASE-NOTES.txt

…which is compatible with the `fossil add` command's argument list,
which allows multiple files.

Now consider what happens instead if you say:

    $ fossil add --ignore RE* src/*.c

This *does not* do what you want, because the shell expands both `RE*`
and `src/*.c` before Fossil runs. Fossil then receives `README.txt` as
the argument to `--ignore` and `RELEASE-NOTES.txt` as just another file
to add, so one of the two files matching `RE*` is ignored and the other
is added to the repository. You need to say this in that case:

    $ fossil add --ignore 'RE*' src/*.c

The single quotes force a POSIX shell to pass the `RE*` glob pattern
through to Fossil untouched, so that Fossil does its own glob pattern
matching. There are other methods of quoting a glob pattern or escaping
its special characters; see your shell's manual.

Beware that Fossil's `--ignore` option does not override explicit file
mentions:

    $ fossil add --ignore 'REALLY SECRET STUFF.txt' RE*

You might think that would add everything beginning with `RE` *except*
for `REALLY SECRET STUFF.txt`, but when a file is both given
explicitly to Fossil and also matches an ignore rule, Fossil asks what
you want to do with it by default, and it does not even ask if you gave
the `-f` or `--force` option along with `--ignore`.

The spaces in the ignored file name above bring us to another point:
such file names must be quoted in Fossil glob patterns, lest Fossil
interpret them as multiple glob patterns, but the shell interprets
quotation marks itself.

One way to fix both this and the previous problem is:

    $ fossil add --ignore "'REALLY SECRET STUFF.txt'" READ*

The nested quotation marks cause the inner set to be passed through to
Fossil, and the more specific glob pattern at the end (`READ*` rather
than `RE*`) avoids a conflict between explicitly listed files and
`--ignore` rules in the `fossil add` command.

Another solution would be to use shell escaping instead of nested
quoting:

    $ fossil add --ignore "\"REALLY SECRET STUFF.txt\"" READ*

Keep in mind that the two glob patterns here are not interpreted the
same way when you run this command from a *subdirectory* of the
checkout as when you run it at the top of the checkout tree. If these
files were in a subdirectory of the checkout tree called `doc` and that
was your current working directory, the command would have to be:

    $ fossil add --ignore "'doc/REALLY SECRET STUFF.txt'" READ*

instead. The Fossil glob pattern still needs the `doc/` prefix because
Fossil always interprets glob patterns from the base of the checkout
directory, not from the current working directory as POSIX shells do.
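
Python's standard `shlex.split()` tokenizes a command line using POSIX-shell quoting and escaping rules. That makes it a rough, non-authoritative way to preview which arguments Fossil would receive from the commands above. It is only an approximation of a real shell because it performs no glob expansion. Unquoted `src/*.c` and `READ*` therefore appear literally here, even though your shell would expand them.

```python
import shlex

# The commands from this section, exactly as you would type them.
commands = [
    "fossil add --ignore 'RE*' src/*.c",
    "fossil add --ignore \"'REALLY SECRET STUFF.txt'\" READ*",
    r'fossil add --ignore "\"REALLY SECRET STUFF.txt\"" READ*',
]

for cmd in commands:
    # shlex.split() applies POSIX-style quoting and escaping only; unlike a
    # real shell it does NOT expand unquoted patterns such as src/*.c.
    print(shlex.split(cmd))
```

The second and third commands both deliver the file name to Fossil with one layer of quotation marks still attached. Those remaining quotes are what let Fossil treat a name containing spaces as a single glob.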

When in doubt, use `fossil status` after running commands like the
above to make sure the right set of files was scheduled for insertion
into the repository before checking the changes in. You never want to
accidentally check something like a password, an API key, or the
private half of a public cryptographic key into a Fossil repository that
can be read by people who should not have such secrets.


### <a id="windows"></a>Windows

Before we get into Windows-specific details here, beware that this
section does not apply to the several Microsoft Windows extensions that
provide POSIX semantics to Windows, for which you want to use the advice
in [the POSIX section above](#posix) instead:

 * the ancient and rarely-used [Microsoft POSIX subsystem][mps];
 * its now-discontinued replacement feature, [Services for Unix][sfu]; or
 * their modern replacement, the [Windows Subsystem for Linux][wsl].

[mps]: https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem
[sfu]: https://en.wikipedia.org/wiki/Windows_Services_for_UNIX
[wsl]: https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux

(The last of these is sometimes incorrectly called "Bash on Windows" or
"Ubuntu on Windows," but the feature provides much more than just Bash
or Ubuntu for Windows.)

Neither standard Windows command shell (`cmd.exe` or PowerShell)
expands glob patterns the way POSIX shells do. Windows command
shells rely on the command itself to do the glob pattern expansion.
The way this works depends on several factors:

 * the version of Windows you are using
 * which OS upgrades have been applied to it
 * the compiler that built your Fossil executable
 * whether you are running the command interactively
 * whether the command is built against a runtime system that does this
   at all
 * whether the Fossil command is being run from a batch file named
   `*.BAT` or from one named `*.CMD`

Usually (but not always!) the C runtime library that your `fossil.exe`
executable is built against does this glob expansion on Windows so the
program proper does not have to. This may then interact with the way the
Windows command shell you’re using handles argument quoting. Because of
these differences, it is common to find perfectly valid Fossil command
examples that were written and tested on a POSIX system which then fail
when tried on Windows.

The most common problem is figuring out how to get a glob pattern passed
on the command line into `fossil.exe` without it being expanded by the C
runtime library that your particular Fossil executable is linked to,
which tries to act like [the POSIX systems described above](#posix).
Windows is not strongly governed by POSIX, so it has not historically
hewed closely to its strictures.

For example, consider how you would set `crlf-glob` to `*` in order to
get normal Windows text files with CR+LF line endings past Fossil's
"looks like a binary file" check. The naïve approach will not work:

    C:\...> fossil setting crlf-glob *

The C runtime library will expand that to the list of all files in the
current directory, which will probably cause a Fossil error because
Fossil expects either nothing or option flags after the setting's new
value, not a list of file names. (To be fair, the same thing will happen
on POSIX systems, only at the shell level, before `.../bin/fossil` even
gets run by the shell.)

Let's try again:

    C:\...> fossil setting crlf-glob '*'

Quoting the argument like that will work reliably on POSIX, but it may
or may not work on Windows. If your Windows command shell interprets the
quotes, `fossil.exe` will see only the bare `*`, so the C runtime
library it is linked to will likely expand it to the list of files in
the current directory before the `setting` command gets a chance to
parse the command line arguments, causing the same failure as above.
This alternative only works if you’re using a Windows command shell that
passes the quotes through to the executable *and* you have linked Fossil
to a C runtime library that interprets the quotes properly itself,
resulting in a bare `*` getting all the way down to Fossil’s `setting`
command parser.

An approach that *will* work reliably is:

    C:\...> echo * | fossil setting crlf-glob --args -

This works because the built-in Windows command `echo` does not expand
its arguments, and the `--args -` option makes Fossil read further
command arguments from its standard input, which is connected to the
output of `echo` by the pipe. (`-` is a common Unix convention meaning
"standard input," which Fossil obeys.) A [batch script][fng.cmd] to
automate this trick was posted on the now-inactive Fossil Mailing List.

[fng.cmd]: https://www.mail-archive.com/fossil-users@lists.fossil-scm.org/msg25099.html

(Ironically, this method will *not* work on POSIX systems because it is
not up to the command to expand globs. The shell will expand the `*` in
the `echo` command, so the list of file names will be passed to the
`fossil` standard input, just as with the first example above!)

Another (usually) correct approach, which will work on both Windows and
POSIX systems:

    C:\...> fossil setting crlf-glob *,

This works because the trailing comma prevents the glob pattern from
matching any files, unless you happen to have files named with a
trailing comma in the current directory. If the pattern matches no
files, it is passed into Fossil's `main()` function as-is by the C
runtime system. Since Fossil uses commas to separate multiple glob
patterns, this means "all files from the root of the Fossil checkout
directory downward and nothing else," which is of course equivalent to
"all managed files in this repository," our original goal.


## Experimenting

To preview the effects of command line glob pattern expansion for
various glob patterns (unquoted, quoted, comma-terminated), for any
combination of command shell, OS, C runtime, and Fossil version,
precede the command you want to test with [`test-echo`][] like so:

    $ fossil test-echo setting crlf-glob "*"
    C:\> echo * | fossil test-echo setting crlf-glob --args -

The [`test-glob`][] command is also handy for testing whether a string
matches a glob pattern.

[`test-echo`]: /help?cmd=test-echo
[`test-glob`]: /help?cmd=test-glob


## Converting `.gitignore` to `ignore-glob`

Many other version control systems handle the specific case of
ignoring certain files differently from Fossil: they have you create
individual "ignore" files in each folder, which specify things ignored
in that folder and below. Usually some form of glob pattern is used
in those files, but the details differ from Fossil.

In many simple cases, you can just store a top-level "ignore" file in
`.fossil-settings/ignore-glob`. But as usual, there will be lots of
edge cases.

[Git has a rich collection of ignore files][gitignore] which
accumulate rules that affect the current command.
There are global
files, per-user files, per-workspace unmanaged files, and fully
version-controlled files. Some of the files used have no set name but
are called out in configuration files.

[gitignore]: https://git-scm.com/docs/gitignore

In contrast, Fossil has a global setting and a local setting, but the
local setting overrides the global one rather than extending it.
Similarly, a Fossil command's `--ignore` option replaces the
`ignore-glob` setting rather than extending it.

With that in mind, translating a `.gitignore` file into
`.fossil-settings/ignore-glob` may be possible in many cases. Here are
some of the features of `.gitignore` and comments on how they relate to
Fossil:

 * "A blank line matches no files...": same in Fossil.
 * "A line starting with # serves as a comment...": not in Fossil.
 * "Trailing spaces are ignored unless they are quoted...": similar
   in Fossil. All white space before and after a glob is trimmed in
   Fossil unless quoted with single or double quotes. Git uses
   backslash quoting instead, which Fossil does not support.
 * "An optional prefix "!" which negates the pattern...": not in
   Fossil.
 * Git's globs are relative to the location of the `.gitignore` file;
   Fossil's globs are relative to the root of the workspace.
 * Git's globs and Fossil's globs treat directory separators
   differently. Git includes a notation for zero or more directories
   (`**`) that is not needed in Fossil.
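
The correspondences above can be sketched as a small Python converter for simple `.gitignore` files. `gitignore_to_fossil()` is a hypothetical helper, not part of Fossil or Git, and it assumes plain patterns only:

 * It drops blank lines and comments.
 * It reports `!` negations, which it cannot translate.
 * It prefixes each glob with the `.gitignore` file's folder.
 * It turns a trailing `/` into `/*`.
 * It quotes globs that contain spaces or commas.

Git's backslash escapes and `**` receive no special handling, so review the output before committing it.

```python
def gitignore_to_fossil(lines, directory=""):
    """Translate simple .gitignore lines into Fossil ignore-glob entries.

    `directory` is where the .gitignore lives, relative to the checkout
    root ("doc" for doc/.gitignore, "" for the top level).  Returns
    (globs, untranslated); untranslated holds "!" negations, which
    ignore-glob cannot express.  Sketch only: no backslash escapes or "**".
    """
    directory = directory.strip("/")
    prefix = directory + "/" if directory else ""
    globs, untranslated = [], []
    for raw in lines:
        # Git ignores unescaped trailing spaces; escapes are not handled here.
        line = raw.rstrip("\r\n").rstrip(" ")
        if not line or line.startswith("#"):
            continue                              # blank lines and comments
        if line.startswith("!"):
            untranslated.append(line)             # no negation in Fossil
            continue
        suffix = ""
        if line.endswith("/"):
            line, suffix = line.rstrip("/"), "/*"  # directory: match its contents
        if "/" in line:
            # Anchored pattern: relative to the .gitignore's folder.
            candidates = [prefix + line.lstrip("/") + suffix]
        else:
            # Unanchored: Git matches it at any depth below that folder.
            candidates = [prefix + line + suffix]
            if not line.startswith("*"):
                # A leading "*" already crosses "/" in Fossil; otherwise
                # add a second glob to reach deeper levels.
                candidates.append(prefix + "*/" + line + suffix)
        for glob in candidates:
            if any(ch in glob for ch in " \t,"):
                glob = '"' + glob + '"'           # quote list separators
            globs.append(glob)
    return globs, untranslated


doc_gitignore = """\
# Finished documents by pandoc via LaTeX
*.pdf
# Intermediate files
*.tex
*.toc
""".splitlines()
print(gitignore_to_fossil(doc_gitignore, "doc"))
```

Git lets a pattern without a `/` match at any depth below the `.gitignore` file. For that reason the sketch emits an extra `*/` variant unless the pattern already begins with `*`, which in Fossil crosses directory separators anyway.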

### Example

In a project with source and documentation:

    work
      +-- doc
      +-- src

The file `doc/.gitignore` might contain:

    # Finished documents by pandoc via LaTeX
    *.pdf
    # Intermediate files
    *.tex
    *.toc
    *.log
    *.out
    *.tmp

Entries in `.fossil-settings/ignore-glob` with similar effect, also
limited to the `doc` folder:

    doc/*.pdf
    doc/*.tex, doc/*.toc, doc/*.log, doc/*.out, doc/*.tmp


## Implementation and References

The implementation of the Fossil-specific glob pattern handling is here:

:File            |:Description
--------------------------------------------------------------------------------
[`src/glob.c`][] | pattern list loading, parsing, and generic matching code
[`src/file.c`][] | application of glob patterns to file names

[`src/glob.c`]: https://fossil-scm.org/home/file/src/glob.c
[`src/file.c`]: https://fossil-scm.org/home/file/src/file.c

See the [Adding Features to Fossil][aff] document for broader details
about finding and working with such code.

The actual pattern matching leverages the `GLOB` operator in SQLite, so
you may find [its documentation][gdoc], [source code][gsrc] and
[test harness][gtst] helpful.

[aff]: ./adding_code.wiki
[gdoc]: https://sqlite.org/lang_expr.html#like
[gsrc]: https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768
[gtst]: https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673