Name             Date          Size         #Lines    LOC

..               03-May-2022   -
aspell/          11-Oct-2021   -            440       306
bincimapmime/    11-Oct-2021   -            2,350     1,593
common/          03-May-2022   -            5,984     4,059
desktop/         03-May-2022   -            204       128
doc/             11-Oct-2021   -            11,429    9,689
filters/         03-May-2022   -            11,702    8,025
index/           11-Oct-2021   -            6,290     4,195
internfile/      11-Oct-2021   -            7,402     4,932
kde/kioslave/    11-Oct-2021   -            2,512     1,847
m4/              11-Oct-2021   -            9,290     8,409
python/          03-May-2022   -            11,812    9,539
qtgui/           11-Oct-2021   -            117,862   76,971
query/           11-Oct-2021   -            7,663     5,382
rcldb/           11-Oct-2021   -            10,473    7,069
sampleconf/      11-Oct-2021   -            2,728     1,516
testmains/       11-Oct-2021   -            88        69
unac/            11-Oct-2021   -            15,956    4,229
utils/           03-May-2022   -            29,751    21,546
xaposix/         11-Oct-2021   -            284       107
COPYING          20-Aug-2020   17.6 KiB     341       281
ChangeLog        26-Feb-2021   326.1 KiB    10,553    7,167
INSTALL          26-Feb-2021   56.9 KiB     1,349     967
Makefile.am      11-Oct-2021   20.2 KiB     792       736
Makefile.in      03-May-2022   106 KiB      2,868     2,696
README           26-Feb-2021   199.1 KiB    4,613     3,359
aclocal.m4       07-Oct-2021   95.2 KiB     2,540     2,379
compile          11-Oct-2021   7.2 KiB      349       259
config.guess     11-Oct-2021   43.2 KiB     1,481     1,288
config.rpath     20-Aug-2020   18.1 KiB     685       588
config.sub       11-Oct-2021   35.3 KiB     1,802     1,661
configure        03-May-2022   672.4 KiB    22,202    18,984
configure.ac     07-Oct-2021   19.3 KiB     600       518
depcomp          11-Oct-2021   23 KiB       792       502
install-sh       11-Oct-2021   15 KiB       519       337
ltmain.sh        06-Jun-2018   316.8 KiB    11,157    7,986
missing          11-Oct-2021   6.7 KiB      216       143
ylwrap           11-Oct-2021   1.5 KiB      58        33

README

1
2More documentation can be found in the doc/ directory or at http://www.recoll.org
3
4
5                               Recoll user manual
6
7  Jean-Francois Dockes
8
9   <jfd@recoll.org>
10
11   Copyright (c) 2005-2015 Jean-Francois Dockes
12
13   Permission is granted to copy, distribute and/or modify this document
14   under the terms of the GNU Free Documentation License, Version 1.3 or any
15   later version published by the Free Software Foundation; with no Invariant
16   Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the
17   license can be found at the following location: GNU web site.
18
19   This document introduces full text search notions and describes the
20   installation and use of the Recoll application. This version describes
21   Recoll 1.21.
22
23     ----------------------------------------------------------------------
24
25   Table of Contents
26
27   1. Introduction
28
29                1.1. Giving it a try
30
31                1.2. Full text search
32
33                1.3. Recoll overview
34
35   2. Indexing
36
37                2.1. Introduction
38
39                             2.1.1. Indexing modes
40
41                             2.1.2. Configurations, multiple indexes
42
43                             2.1.3. Document types
44
45                             2.1.4. Indexing failures
46
47                             2.1.5. Recovery
48
49                2.2. Index storage
50
51                             2.2.1. Xapian index formats
52
53                             2.2.2. Security aspects
54
55                2.3. Index configuration
56
57                             2.3.1. Multiple indexes
58
59                             2.3.2. Index case and diacritics sensitivity
60
61                             2.3.3. The index configuration GUI
62
63                2.4. Indexing WEB pages you visit
64
65                2.5. Extended attributes data
66
67                2.6. Importing external tags
68
69                2.7. Periodic indexing
70
71                             2.7.1. Running indexing
72
73                             2.7.2. Using cron to automate indexing
74
75                2.8. Real time indexing
76
77                             2.8.1. Slowing down the reindexing rate for fast
78                             changing files
79
80   3. Searching
81
82                3.1. Searching with the Qt graphical user interface
83
84                             3.1.1. Simple search
85
86                             3.1.2. The default result list
87
88                             3.1.3. The result table
89
90                             3.1.4. Running arbitrary commands on result
91                             files (1.20 and later)
92
93                             3.1.5. Displaying thumbnails
94
95                             3.1.6. The preview window
96
97                             3.1.7. The Query Fragments window
98
99                             3.1.8. Complex/advanced search
100
101                             3.1.9. The term explorer tool
102
103                             3.1.10. Multiple indexes
104
105                             3.1.11. Document history
106
107                             3.1.12. Sorting search results and collapsing
108                             duplicates
109
110                             3.1.13. Search tips, shortcuts
111
112                             3.1.14. Saving and restoring queries (1.21 and
113                             later)
114
115                             3.1.15. Customizing the search interface
116
117                3.2. Searching with the KDE KIO slave
118
119                             3.2.1. What's this
120
121                             3.2.2. Searchable documents
122
123                3.3. Searching on the command line
124
125                3.4. Path translations
126
127                3.5. The query language
128
129                             3.5.1. Modifiers
130
131                3.6. Search case and diacritics sensitivity
132
133                3.7. Anchored searches and wildcards
134
135                             3.7.1. More about wildcards
136
137                             3.7.2. Anchored searches
138
139                3.8. Desktop integration
140
141                             3.8.1. Hotkeying recoll
142
143                             3.8.2. The KDE Kicker Recoll applet
144
145   4. Programming interface
146
147                4.1. Writing a document input handler
148
149                             4.1.1. Simple input handlers
150
151                             4.1.2. "Multiple" handlers
152
153                             4.1.3. Telling Recoll about the handler
154
155                             4.1.4. Input handler HTML output
156
157                             4.1.5. Page numbers
158
159                4.2. Field data processing
160
161                4.3. API
162
163                             4.3.1. Interface elements
164
165                             4.3.2. Python interface
166
167   5. Installation and configuration
168
169                5.1. Installing a binary copy
170
171                5.2. Supporting packages
172
173                5.3. Building from source
174
175                             5.3.1. Prerequisites
176
177                             5.3.2. Building
178
179                             5.3.3. Installation
180
181                5.4. Configuration overview
182
183                             5.4.1. Environment variables
184
185                             5.4.2. The main configuration file, recoll.conf
186
187                             5.4.3. The fields file
188
189                             5.4.4. The mimemap file
190
191                             5.4.5. The mimeconf file
192
193                             5.4.6. The mimeview file
194
195                             5.4.7. The ptrans file
196
197                             5.4.8. Examples of configuration adjustments
198
199Chapter 1. Introduction
200
2011.1. Giving it a try
202
203   If you do not like reading manuals (who does?) but wish to give Recoll a
204   try, just install the application and start the recoll graphical user
205   interface (GUI), which will ask permission to index your home directory by
206   default, allowing you to search immediately after indexing completes.
207
208   Do not do this if your home directory contains a huge number of documents
209   and you do not want to wait or are very short on disk space. In this case,
210   you may first want to customize the configuration to restrict the indexed
211   area (for the very impatient with a completed package install, from the
212   recoll GUI: Preferences -> Indexing configuration, then adjust the Top
213   directories section).
214
215   Also be aware that you may need to install the appropriate supporting
216   applications for document types that need them (for example antiword for
217   Microsoft Word files).
218
2191.2. Full text search
220
221   Recoll is a full text search application. Full text search finds your data
222   by content rather than by external attributes (like a file name). You
223   specify words (terms) which should or should not appear in the text you
224   are looking for, and receive in return a list of matching documents,
225   ordered so that the most relevant documents will appear first.
226
227   You do not need to remember in what file or email message you stored a
228   given piece of information. You just ask for related terms, and the tool
229   will return a list of documents where these terms are prominent, in a
230   similar way to Internet search engines.
231
232   Full text search applications try to determine which documents are most
233   relevant to the search terms you provide. Computer algorithms for
234   determining relevance can be very complex, and in general are inferior to
235   the power of the human mind to rapidly determine relevance. The quality of
236   relevance guessing is probably the most important aspect when evaluating a
237   search application.
238
239   In many cases, you are looking for all the forms of a word, including
240   plurals, different tenses for a verb, or terms derived from the same root
241   or stem (example: floor, floors, floored, flooring...). Queries are
242   usually automatically expanded to all such related terms (words that
243   reduce to the same stem). This can be prevented for searching for a
244   specific form.
245
246   Stemming, by itself, does not accommodate misspellings or phonetic
247   searches. A full text search application may also support this form of
248   approximation. For example, a search for aliterattion returning no result
249   may propose, depending on index contents, alliteration, alteration,
250   alterations or altercation as possible replacement terms.
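
   The replacement-term idea above can be sketched in a few lines of Python.
   This is only an illustration under stated assumptions: the term list is a
   made-up sample, and the similarity measure is the one from the Python
   standard library (difflib), not the aspell speller that Recoll uses.

```python
import difflib

# Hypothetical sample of terms present in an index (illustration only).
INDEX_TERMS = ["alliteration", "alteration", "alterations", "altercation",
               "floor", "flooring", "literature"]


def suggest(term, index_terms, limit=4):
    """Return up to `limit` index terms that look similar to `term`.

    Uses difflib's similarity ratio from the Python standard library;
    Recoll itself relies on the aspell speller instead.
    """
    return difflib.get_close_matches(term, index_terms, n=limit, cutoff=0.6)


if __name__ == "__main__":
    # Candidates are ranked by similarity to the misspelled query term.
    print(suggest("aliterattion", INDEX_TERMS))
```

   Which candidates are proposed depends entirely on what the index contains,
   which is why the same misspelling can yield different suggestions on
   different document sets.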
251
2521.3. Recoll overview
253
254   Recoll uses the Xapian information retrieval library as its storage and
255   retrieval engine. Xapian is a very mature package using a sophisticated
256   probabilistic ranking model.
257
258   The Xapian library manages an index database which describes where terms
259   appear in your document files. It efficiently processes the complex
260   queries which are produced by the Recoll query expansion mechanism, and is
261   in charge of the all-important relevance computation task.
262
263   Recoll provides the mechanisms and interface to get data into and out of
264   the index. This includes translating the many possible document formats
265   into pure text, handling term variations (using Xapian stemmers), and
266   spelling approximations (using the aspell speller), interpreting user
267   queries and presenting results.
268
269   In short, Recoll does the dirty footwork, while Xapian deals with the
270   intelligent parts of the process.
271
272   The Xapian index can be big (roughly the size of the original document
273   set), but it is not a document archive. Recoll can only display documents
274   that still exist at the place from which they were indexed. (Actually,
275   there is a way to reconstruct a document from the information in the
276   index, but the result is not nice, as all formatting, punctuation and
277   capitalization are lost).
278
279   Recoll stores all internal data in Unicode UTF-8 format, and it can index
280   files of many types with different character sets, encodings, and
281   languages into the same index. It can process documents embedded inside
282   other documents (for example a pdf document stored inside a Zip archive
283   sent as an email attachment...), down to an arbitrary depth.
284
285   Stemming is the process by which Recoll reduces words to their radicals so
286   that searching does not depend, for example, on a word being singular or
287   plural (floor, floors), or on a verb tense (flooring, floored). Because
288   the mechanisms used for stemming depend on the specific grammatical rules
289   for each language, there is a separate Xapian stemmer module for most
290   common languages where stemming makes sense.
291
292   Recoll stores the unstemmed versions of terms in the main index and uses
293   auxiliary databases for term expansion (one for each stemming language),
294   which means that you can switch stemming languages between searches, or
295   add a language without needing a full reindex.
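
   A minimal sketch of this arrangement, assuming a toy suffix-stripping rule
   in place of a real Xapian stemmer: the index keeps the unstemmed terms,
   and a separate table (one per stemming language) maps each stem to the
   terms that reduce to it. All function names here are hypothetical.

```python
from collections import defaultdict

SUFFIXES = ("ing", "ed", "s")  # toy rule set, far cruder than a real stemmer


def toy_stem(word):
    """Strip one common English suffix (illustration only)."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_expansion_table(index_terms):
    """Map each stem to the unstemmed index terms that reduce to it."""
    table = defaultdict(set)
    for term in index_terms:
        table[toy_stem(term)].add(term)
    return table


def expand(query_term, table):
    """Return every index term sharing a stem with the query term."""
    return table.get(toy_stem(query_term), {query_term})
```

   Because the table is derived from the terms already in the index, it can
   be rebuilt for another language without reprocessing the documents, which
   is the property the paragraph above describes.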
296
297   Storing documents written in different languages in the same index is
298   possible, and commonly done. In this situation, you can specify several
299   stemming languages for the index.
300
301   Recoll currently makes no attempt at automatic language recognition, which
302   means that the stemmer will sometimes be applied to terms from other
303   languages with potentially strange results. In practice, even if this
304   introduces possibilities of confusion, this approach has proven quite
305   useful, and it is much less cumbersome than separating your documents
306   according to what language they are written in.
307
308   Before version 1.18, Recoll stripped most accents and diacritics from
309   terms, and converted them to lower case before either storing them in the
310   index or searching for them. As a consequence, it was impossible to search
311   for a particular capitalization of a term (US / us), or to discriminate
312   two terms based on diacritics (sake / saké, mate / maté).
313
314   As of version 1.18, Recoll can optionally store the raw terms, without
315   accent stripping or case conversion. In this configuration, it is still
316   possible (and most common) for a query to be insensitive to case and/or
317   diacritics. Appropriate term expansions are performed before actually
318   accessing the main index. This is described in more detail in the section
319   about index case and diacritics sensitivity.
320
321   Recoll has many parameters which define exactly what to index, and how to
322   classify and decode the source documents. These are kept in configuration
323   files. A default configuration is copied into a standard location (usually
324   something like /usr/[local/]share/recoll/examples) during installation.
325   The default values set by the configuration files in this directory may be
326   overridden by values that you set inside your personal configuration,
327   found by default in the .recoll sub-directory of your home directory. The
328   default configuration will index your home directory with default
329   parameters and should be sufficient for giving Recoll a try, but you may
330   want to adjust it later, which can be done either by editing the text
331   files or by using configuration menus in the recoll GUI. Some other
332   parameters affecting only the recoll GUI are stored in the standard
333   location defined by Qt.
334
335   The indexing process is started automatically the first time you execute
336   the recoll GUI. Indexing can also be performed by executing the
337   recollindex command. Recoll indexing is multithreaded by default when
338   appropriate hardware resources are available, and can run several of the
339   text extraction, segmentation and index update tasks in parallel.
340
341   Searches are usually performed inside the recoll GUI, which has many
342   options to help you find what you are looking for. However, there are
343   other ways to perform Recoll searches: mostly a command line interface, a
344   Python programming interface, a KDE KIO slave module, and Ubuntu Unity
345   Lens (for older versions) or Scope (for current versions) modules.
346
347Chapter 2. Indexing
348
3492.1. Introduction
350
351   Indexing is the process by which the set of documents is analyzed and the
352   data entered into the database. Recoll indexing is normally incremental:
353   documents will only be processed if they have been modified since the last
354   run. On the first execution, all documents will need processing. A full
355   index build can be forced later by specifying an option to the indexing
356   command (recollindex -z or -Z).
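
   The incremental behaviour can be pictured with the following sketch. It
   assumes that a change is detected by comparing a recorded (modification
   time, size) pair with the current one; how recollindex actually decides
   is not described here, and the function names are hypothetical.

```python
import os


def file_signature(path):
    """Return a (modification time, size) pair describing the file state."""
    info = os.stat(path)
    return (info.st_mtime_ns, info.st_size)


def needs_indexing(path, recorded, force=False):
    """Decide whether `path` must be processed in this pass.

    `recorded` maps paths to the signature stored by the previous pass.
    `force=True` mimics a full rebuild, where everything is reprocessed.
    """
    return force or recorded.get(path) != file_signature(path)


def index_pass(paths, recorded, force=False):
    """Run one pass and return the list of files that were (re)processed."""
    processed = []
    for path in paths:
        if needs_indexing(path, recorded, force):
            processed.append(path)  # a real indexer would extract text here
            recorded[path] = file_signature(path)
    return processed
```

   On the first pass nothing is recorded, so every file is processed; later
   passes only pick up files whose signature changed, unless a full rebuild
   is forced.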
357
358   recollindex skips files which caused an error during a previous pass. This
359   is a performance optimization, and a new behaviour in version 1.21 (failed
360   files were always retried by previous versions). The command line option
361   -k can be set to retry failed files, for example after updating a filter.
362
363   The following sections give an overview of different aspects of the
364   indexing processes and configuration, with links to detailed sections.
365
366   Depending on your data, temporary files may be needed during indexing,
367   some of them possibly quite big. You can use the RECOLL_TMPDIR or TMPDIR
368   environment variables to determine where they are created (the default is
369   to use /tmp). Using TMPDIR has the nice property that it may also be taken
370   into account by auxiliary commands executed by recollindex.
371
372  2.1.1. Indexing modes
373
374   Recoll indexing can be performed in two different modes:
375
376     o Periodic (or batch) indexing: indexing takes place at discrete times,
377       by executing the recollindex command. The typical usage is to have a
378       nightly indexing run programmed into your cron file.
379
380     o Real time indexing: indexing takes place as soon as a file is created
381       or changed. recollindex runs as a daemon and uses a file system
382       alteration monitor such as inotify, Fam or Gamin to detect file
383       changes.
384
385   The choice between the two methods is mostly a matter of preference, and
386   they can be combined by setting up multiple indexes (for example, periodic
387   indexing on a big documentation directory, and real time indexing on a
388   small home directory). Monitoring a big file system tree can consume
389   significant system resources.
390
391   The choice of method and the parameters used can be configured from the
392   recoll GUI: Preferences -> Indexing schedule
393
394  2.1.2. Configurations, multiple indexes
395
396   The parameters describing what is to be indexed and local preferences are
397   defined in text files contained in a configuration directory.
398
399   All parameters have defaults, defined in system-wide files.
400
401   Without further configuration, Recoll will index all appropriate files
402   from your home directory, with a reasonable set of defaults.
403
404   A default personal configuration directory ($HOME/.recoll/) is created
405   when a Recoll program is first executed. It is possible to create other
406   configuration directories, and use them by setting the RECOLL_CONFDIR
407   environment variable, or giving the -c option to any of the Recoll
408   commands.
409
410   In some cases, it may be useful to index different areas of the file
411   system to separate databases. You can do this by using multiple
412   configuration directories, each indexing a file system area to a specific
413   database. Typically, this would be done to separate personal and shared
414   indexes, or to take advantage of the organization of your data to improve
415   search precision.
416
417   The generated indexes can be queried concurrently in a transparent manner.
418
419   For index generation, multiple configurations are totally independent from
420   each other. When multiple indexes need to be used for a single search,
421   some parameters should be consistent among the configurations.
422
423  2.1.3. Document types
424
425   Recoll knows about quite a few different document types. The parameters
426   for document types recognition and processing are set in configuration
427   files.
428
429   Most file types, like HTML or word processing files, only hold one
430   document. Some file types, like email folders or zip archives, can hold
431   many individually indexed documents, which may themselves be compound
432   ones. Such hierarchies can go quite deep, and Recoll can process, for
433   example, a LibreOffice document stored as an attachment to an email
434   message inside an email folder archived in a zip file...
435
436   Recoll indexing processes plain text, HTML, OpenDocument
437   (Open/LibreOffice), email formats, and a few others internally.
438
439   Other file types (e.g. PostScript, PDF, MS Word, RTF...) need external
440   applications for preprocessing. The list is in the installation section.
441   After every indexing operation, Recoll updates a list of commands that
442   would be needed for indexing existing file types. This list can be
443   displayed by selecting the menu option File -> Show Missing Helpers in the
444   recoll GUI. It is stored in the missing text file inside the configuration
445   directory.
446
447   By default, Recoll will try to index any file type that it has a way to
448   read. This is sometimes not desirable, and there are ways to either
449   exclude some types, or on the contrary to define a positive list of types
450   to be indexed. In the latter case, any type not in the list will be
451   ignored.
452
453   Excluding types can be done by adding wildcard name patterns to the
454   skippedNames list, which can be done from the GUI Index configuration
455   menu. For versions 1.20 and later, you can alternatively set the
456   excludedmimetypes list in the configuration file. This can be redefined
457   for subdirectories.
458
459   You can also define an exclusive list of MIME types to be indexed (no
460   others will be indexed), by setting the indexedmimetypes configuration
461   variable. Example:
462
463 indexedmimetypes = text/html application/pdf
464
465
466   It is possible to redefine this parameter for subdirectories. Example:
467
468 [/path/to/my/dir]
469 indexedmimetypes = application/pdf
470
471
472   (When using sections like this, don't forget that they remain in effect
473   until the end of the file or another section indicator).
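
   The section rule can be illustrated with a small parser. This is a
   simplified model written for this example, not Recoll's own configuration
   code: it only handles "name = value" lines and [/directory] section
   lines, and it assumes that the most specific enclosing section wins.

```python
SAMPLE = """\
indexedmimetypes = text/html application/pdf

[/path/to/my/dir]
indexedmimetypes = application/pdf
"""


def parse_conf(text):
    """Parse 'name = value' lines grouped under optional [/dir] sections.

    Returns {section: {name: value}}; the empty string is the global section.
    A section stays in effect until the next section line or end of input.
    """
    conf = {"": {}}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].rstrip("/")
            conf.setdefault(section, {})
        elif "=" in line:
            name, value = line.split("=", 1)
            conf[section][name.strip()] = value.strip()
    return conf


def lookup(conf, name, directory):
    """Return the value of `name` that applies to `directory`.

    Walks from the directory up towards the root and returns the first
    section that defines the name, falling back to the global section.
    """
    current = directory.rstrip("/")
    while current:
        if name in conf.get(current, {}):
            return conf[current][name]
        current = current.rpartition("/")[0]
    return conf[""].get(name)
```

   With the sample above, a lookup for /path/to/my/dir/sub yields only
   application/pdf, while any directory outside that subtree gets the global
   value.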
474
475   excludedmimetypes or indexedmimetypes can be set either by editing the
476   main configuration file (recoll.conf), or from the GUI index configuration
477   tool.
478
479  2.1.4. Indexing failures
480
481   Indexing may fail for some documents, for a number of reasons: a helper
482   program may be missing, the document may be corrupt, we may fail to
483   uncompress a file because no file system space is available, etc.
484
485   Recoll versions prior to 1.21 always tried again to index files which had
486   previously caused an error. This guaranteed that anything that may have
487   become indexable (for example because a helper had been installed) would
488   be indexed. However this was bad for performance because some indexing
489   failures may be quite costly (for example failing to uncompress a big file
490   because of insufficient disk space).
491
492   The indexer in Recoll versions 1.21 and later does not retry failed files by
493   default. Retrying will only occur if an explicit option (-k) is set on the
494   recollindex command line, or if a script executed when recollindex starts
495   up says so. The script is defined by a configuration variable
496   (checkneedretryindexscript), and makes a rather lame attempt at deciding
497   if a helper command may have been installed, by checking if any of the
498   common bin directories have changed.
499
500  2.1.5. Recovery
501
502   In the rare case where the index becomes corrupted (which can signal
503   itself by weird search results or crashes), the index files need to be
504   erased before restarting a clean indexing pass. Just delete the xapiandb
505   directory (see next section), or, alternatively, start the next
506   recollindex with the -z option, which will reset the database before
507   indexing.
508
5092.2. Index storage
510
511   The default location for the index data is the xapiandb subdirectory of
512   the Recoll configuration directory, typically $HOME/.recoll/xapiandb/.
513   This can be changed via two different methods (with different purposes):
514
515     o You can specify a different configuration directory by setting the
516       RECOLL_CONFDIR environment variable, or using the -c option to the
517       Recoll commands. This method would typically be used to index
518       different areas of the file system to different indexes. For example,
519       if you were to issue the following commands:
520
521 export RECOLL_CONFDIR=~/.indexes-email
522 recoll
523
524
525       Then Recoll would use configuration files stored in ~/.indexes-email/
526       and, (unless specified otherwise in recoll.conf) would look for the
527       index in ~/.indexes-email/xapiandb/.
528
529       Using multiple configuration directories and configuration options
530       allows you to tailor multiple configurations and indexes to handle
531       whatever subset of the available data you wish to make searchable.
532
533     o For a given configuration directory, you can specify a non-default
534       storage location for the index by setting the dbdir parameter in the
535       configuration file (see the configuration section). This method would
536       mainly be of use if you wanted to keep the configuration directory in
537       its default location, but desired another location for the index,
538       typically because of disk space concerns.
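
   The two methods combine as sketched below. The precedence of the -c
   option over RECOLL_CONFDIR and the handling of a relative dbdir value are
   assumptions made for this illustration, and both function names are
   hypothetical.

```python
import os


def resolve_confdir(cli_confdir=None, environ=None):
    """Pick the configuration directory.

    Assumed precedence: -c option value, then RECOLL_CONFDIR, then ~/.recoll.
    """
    environ = os.environ if environ is None else environ
    chosen = cli_confdir or environ.get("RECOLL_CONFDIR") or "~/.recoll"
    return os.path.expanduser(chosen)


def resolve_dbdir(confdir, settings):
    """Pick the index directory for a configuration directory.

    `settings` is a dict of values read from recoll.conf. A relative dbdir
    is taken relative to the configuration directory (an assumption here);
    an absolute dbdir is used as is.
    """
    return os.path.join(confdir, settings.get("dbdir", "xapiandb"))
```

   For example, with RECOLL_CONFDIR set to ~/.indexes-email and no dbdir in
   the configuration, the sketch resolves the index location to the xapiandb
   subdirectory of ~/.indexes-email, matching the example above.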
539
540   The size of the index is determined by the size of the set of documents,
541   but the ratio can vary a lot. For a typical mixed set of documents, the
542   index size will often be close to the data set size. In specific cases (a
543   set of compressed mbox files for example), the index can become much
544   bigger than the documents. It may also be much smaller if the documents
545   contain a lot of images or other non-indexed data (an extreme example
546   being a set of mp3 files where only the tags would be indexed).
547
548   Of course, images, sound and video do not increase the index size, which
549   means that, as of 2012, even a big index will typically be negligible
550   compared to the total amount of data on the computer.
551
552   The index data directory (xapiandb) only contains data that can be
553   completely rebuilt by an index run (as long as the original documents
554   exist), and it can always be destroyed safely.
555
556  2.2.1. Xapian index formats
557
558   Xapian versions usually support several formats for index storage. A given
559   major Xapian version will have a current format, used to create new
560   indexes, and will also support the format from the previous major version.
561
562   Xapian will not automatically convert an existing index from the older
563   format to the newer one. If you want to upgrade to the new format, or if a
564   very old index needs to be converted because its format is not supported
565   any more, you will have to explicitly delete the old index, then run a
566   normal indexing process.
567
568   Using the -z option to recollindex is not sufficient to change the format:
569   you will have to delete all files inside the index directory (typically
570   ~/.recoll/xapiandb) before starting the indexing.
571
572  2.2.2. Security aspects
573
574   The Recoll index does not hold copies of the indexed documents. But it
575   does hold enough data to allow for an almost complete reconstruction. If
576   confidential data is indexed, access to the database directory should be
577   restricted.
578
579   Recoll (since version 1.4) will create the configuration directory with a
580   mode of 0700 (access by owner only). As the index data directory is by
581   default a sub-directory of the configuration directory, this should result
582   in appropriate protection.
583
584   If you use another setup, you should think of the kind of protection you
585   need for your index, set the directory and files access modes
586   appropriately, and also maybe adjust the umask used during index updates.
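
   If the index lives outside the configuration directory, a check along the
   following lines may help. It is a sketch for POSIX systems only, with
   hypothetical function names; it creates a directory restricted to its
   owner and verifies that group and others have no access.

```python
import os
import stat


def create_private_dir(path):
    """Create `path` (if needed) and restrict it to its owner (mode 0700)."""
    os.makedirs(path, exist_ok=True)
    # Explicit chmod, because the mode passed to makedirs is subject to umask.
    os.chmod(path, 0o700)


def is_private(path):
    """True when neither group nor others have any access to `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

   The same is_private test can be applied to an existing index directory
   before indexing confidential data into it.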
587
5882.3. Index configuration
589
590   Variables set inside the Recoll configuration files control which areas of
591   the file system are indexed, and how files are processed. These variables
592   can be set either by editing the text files or by using the dialogs in the
593   recoll GUI.
594
595   The first time you start recoll, you will be asked whether or not you
596   would like it to build the index. If you want to adjust the configuration
597   before indexing, just click Cancel at this point, which will get you into
598   the configuration interface. If you exit at this point, recoll will have
599   created a ~/.recoll directory containing empty configuration files, which
600   you can edit by hand.
601
602   The configuration is documented inside the installation chapter of this
603   document, or in the recoll.conf(5) man page, but the most current
604   information will most likely be the comments inside the sample file. The
605   most immediately useful variable you may be interested in is probably
606   topdirs, which determines what subtrees get indexed.
607
608   The applications needed to index file types other than text, HTML or email
609   (e.g. PDF, PostScript, MS Word...) are described in the external packages
610   section.
611
612   As of Recoll 1.18 there are two incompatible types of Recoll indexes,
613   depending on the treatment of character case and diacritics. The next
614   section describes the two types in more detail.
615
616  2.3.1. Multiple indexes
617
618   Multiple Recoll indexes can be created by using several configuration
619   directories which are usually set to index different areas of the file
620   system. A specific index can be selected for updating or searching, using
621   the RECOLL_CONFDIR environment variable or the -c option to recoll and
622   recollindex.
623
624   A typical usage scenario for the multiple index feature would be for a
625   system administrator to set up a central index for shared data, that you
626   choose to search or not in addition to your personal data. Of course,
627   there are other possibilities. There are many cases where you know the
628   subset of files that should be searched, and where narrowing the search
629   can improve the results. You can achieve approximately the same effect
630   with the directory filter in advanced search, but multiple indexes will
631   have much better performance and may be worth the trouble.
632
633   A recollindex program instance can only update one specific index.
634
635   The main index (defined by RECOLL_CONFDIR or -c) is always active. If this
636   is undesirable, you can set up your base configuration to index an empty
637   directory.
638
639   The different search interfaces (GUI, command line, ...) have different
640   methods to define the set of indexes to be used, see the appropriate
641   section.
642
643   If a set of multiple indexes is to be used together for searches, some
644   configuration parameters must be consistent among the set. These are
645   parameters which need to be the same when indexing and searching. As the
646   parameters come from the main configuration when searching, they need to
647   be compatible with what was set when creating the other indexes (which
648   came from their respective configuration directories).
649
650   Most importantly, all indexes to be queried concurrently must have the
651   same option concerning character case and diacritics stripping, but there
652   are other constraints. Most of the relevant parameters are described in
653   the linked section.
654
655  2.3.2. Index case and diacritics sensitivity
656
657   As of Recoll version 1.18 you have a choice of building an index with
658   terms stripped of character case and diacritics, or one with raw terms.
659   For a source term of Résumé, the former will store resume, the latter
660   Résumé.
661
662   Each type of index allows performing searches insensitive to case and
663   diacritics: with a raw index, the user entry will be expanded to match all
664   case and diacritics variations present in the index. With a stripped
665   index, the search term will be stripped before searching.
666
667   A raw index allows for another possibility which a stripped index cannot
668   offer: using case and diacritics to discriminate between terms, returning
669   different results when searching for US and us or resume and résumé. Read
670   the section about search case and diacritics sensitivity for more details.
671
672   The type of index to be created is controlled by the indexStripChars
673   configuration variable which can only be changed by editing the
674   configuration file. Any change implies an index reset (not automated by
675   Recoll), and all indexes in a search must be set in the same way (again,
676   not checked by Recoll).
677
678   If indexStripChars is not set, Recoll 1.18 creates a stripped index by
679   default, for compatibility with previous versions.
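
   For illustration, this sketch writes a configuration file requesting a
   raw index into a scratch directory. It assumes that a value of 0 selects
   a raw index and 1 a stripped one, and the topdirs line is only there to
   make the file look realistic. Check the configuration section before
   relying on either.

```shell
#!/bin/sh
# Scratch configuration directory, so that no real configuration is touched.
confdir=$(mktemp -d)

# Assumption: indexStripChars = 0 keeps case and diacritics (raw index),
# indexStripChars = 1 strips them.
cat > "$confdir/recoll.conf" <<'EOF'
topdirs = ~/Documents
indexStripChars = 0
EOF

# Show the setting that was written.
grep '^indexStripChars' "$confdir/recoll.conf"

# Changing the value on an existing index requires a reset, for example:
# recollindex -c "$confdir" -z
```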
680
681   As a cost for added capability, a raw index will be slightly bigger than a
682   stripped one (around 10%). Also, searches will be more complex, so
683   probably slightly slower, and the feature is still young, so that a
684   certain amount of weirdness cannot be excluded.
685
686   One of the most adverse consequences of using a raw index is that some
687   phrase and proximity searches may become impossible: because each term
688   needs to be expanded, and all combinations searched for, the
689   multiplicative expansion may become unmanageable.
690
691  2.3.3. The index configuration GUI
692
693   Most parameters for a given index configuration can be set from a recoll
694   GUI running on this configuration (either as default, or by setting
695   RECOLL_CONFDIR or the -c option.)
696
697   The interface is started from the Preferences -> Index Configuration menu
698   entry. It is divided in four tabs, Global parameters, Local parameters,
699   Web history (which is explained in the next section) and Search
700   parameters.
701
702   The Global parameters tab allows setting global variables, like the lists
703   of top directories, skipped paths, or stemming languages.
704
705   The Local parameters tab allows setting variables that can be redefined
706   for subdirectories. This second tab has an initially empty list of
707   customisation directories, to which you can add. The variables are then
708   set for the currently selected directory (or at the top level if the empty
709   line is selected).
710
711   The Search parameters section defines parameters which are used at query
712   time, but are global to an index and affect all search tools, not only the
713   GUI.
714
715   The meaning for most entries in the interface is self-evident and
716   documented by a ToolTip popup on the text label. For more detail, you will
717   need to refer to the configuration section of this guide.
718
719   The configuration tool normally respects the comments and most of the
720   formatting inside the configuration file, so that it is quite possible to
721   use it on hand-edited files, which you might nevertheless want to back up
722   first.
723
7242.4. Indexing WEB pages you visit
725
726   With the help of a Firefox extension, Recoll can index the Internet pages
727   that you visit. The extension was initially designed for the Beagle
728   indexer, but it has recently been renamed and better adapted to Recoll.
729
730   The extension works by copying visited WEB pages to an indexing queue
731   directory, which Recoll then processes, indexing the data, storing it into
732   a local cache, then removing the file from the queue.
733
734   This feature can be enabled in the GUI Index configuration panel, or by
735   editing the configuration file (set processwebqueue to 1).
736
737   A current pointer to the extension can be found, along with up-to-date
738   instructions, on the Recoll wiki.
739
740   A copy of the indexed WEB pages is retained by Recoll in a local cache
741   (from which previews can be fetched). The cache size can be adjusted from
742   the Index configuration / Web history panel. Once the maximum size is
743   reached, old pages are purged - both from the cache and the index - to
744   make room for new ones, so you need to explicitly archive in some other
745   place the pages that you want to keep indefinitely.
746
7472.5. Extended attributes data
748
749   User extended attributes are named pieces of information that most modern
750   file systems can attach to any file.
751
752   Recoll versions 1.19 and later process extended attributes as document
753   fields by default. For older versions, this has to be activated at build
754   time.
755
756   A freedesktop standard defines a few special attributes, which are handled
757   as such by Recoll:
758
759   mime_type
760
761           If set, this overrides any other determination of the file MIME
762           type.
763
764   charset
765           If set, this defines the file character set (mostly useful for
766           plain text files).
767
768   By default, other attributes are handled as Recoll fields. On Linux, the
769   user prefix is removed from the name. This can be configured more
770   precisely inside the fields configuration file.
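
   On Linux, extended attributes can be set with the setfattr command from
   the attr package. The sketch below is an assumption-laden example: the
   keywords attribute name and its value are invented, and the calls are
   skipped or fail quietly if the tools or file system support are missing.

```shell
#!/bin/sh
# Hypothetical sample file in a scratch location.
f=$(mktemp)
printf 'some plain text\n' > "$f"

# On Linux the "user." prefix is removed, so this attribute would be
# handled as a field named "keywords" (an invented example name).
attr="user.keywords"
field=${attr#user.}

if command -v setfattr >/dev/null 2>&1; then
    # Ordinary attribute, handled as a field.
    setfattr -n "$attr" -v "budget 2015" "$f" 2>/dev/null \
        && getfattr -d "$f" 2>/dev/null
    # Special attribute: overrides the MIME type determination.
    setfattr -n user.mime_type -v text/plain "$f" 2>/dev/null || true
fi

echo "attribute $attr -> field $field"
rm -f "$f"
```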
771
7722.6. Importing external tags
773
774   During indexing, it is possible to import metadata for each file by
775   executing commands. For example, this could extract user tag data for the
776   file and store it in a field for indexing.
777
778   See the section about the metadatacmds field in the main configuration
779   chapter for more detail.
780
7812.7. Periodic indexing
782
783  2.7.1. Running indexing
784
785   Indexing is always performed by the recollindex program, which can be
786   started either from the command line or from the File menu in the recoll
787   GUI program. When started from the GUI, the indexing will run on the same
788   configuration recoll was started on. When started from the command line,
789   recollindex will use the RECOLL_CONFDIR variable or accept a -c confdir
790   option to specify a non-default configuration directory.
791
792   If the recoll program finds no index when it starts, it will automatically
793   start indexing (except if canceled).
794
795   The recollindex indexing process can be interrupted by sending an
796   interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may
797   elapse before the process exits, because it needs to properly flush and
798   close the index. This can also be done from the recoll GUI File -> Stop
799   Indexing menu entry.
800
801   After such an interruption, the index will be somewhat inconsistent
802   because some operations which are normally performed at the end of the
803   indexing pass will have been skipped (for example, the stemming and
804   spelling databases will be nonexistent or out of date). You just need to
805   restart indexing at a later time to restore consistency. The indexing will
806   restart at the interruption point (the full file tree will be traversed,
807   but files that were indexed up to the interruption and for which the index
808   is still up to date will not need to be reindexed).
809
810   recollindex has a number of other options which are described in its man
811   page. Only a few will be described here.
812
813   Option -z will reset the index when starting. This is almost the same as
814   destroying the index files (the nuance is that the Xapian format version
815   will not be changed).
816
817   Option -Z will force the update of all documents without resetting the
818   index first. This will not have the "clean start" aspect of -z, but the
819   advantage is that the index will remain available for querying while it is
820   rebuilt, which can be a significant advantage if it is very big (some
821   installations need days for a full index rebuild).
822
823   Option -k will force retrying files which previously failed to be indexed,
824   for example because of a missing helper program.
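
   The three options above can be summarised by a small helper which maps a
   maintenance goal to the option to use. The goal names (reset, refresh,
   retry) and the function itself are hypothetical. The sketch only prints
   the resulting command.

```shell
#!/bin/sh
# Map a maintenance goal to the recollindex option described above.
reindex_option() {
    case "$1" in
        reset)   echo "-z" ;;  # reset the index, then rebuild it
        refresh) echo "-Z" ;;  # update all documents, index stays searchable
        retry)   echo "-k" ;;  # retry files which previously failed
        *)       echo "" ;;    # plain incremental pass
    esac
}

goal=${1:-refresh}
opt=$(reindex_option "$goal")
echo "recollindex $opt"
```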
825
826   Of special interest also, maybe, are the -i and -f options. -i allows
827   indexing an explicit list of files (given as command line parameters or
828   read on stdin). -f tells recollindex to ignore file selection parameters
829   from the configuration. Together, these options allow building a custom
830   file selection process for some area of the file system, by adding the top
831   directory to the skippedPaths list and using an appropriate file selection
832   method to build the file list to be fed to recollindex -if. Trivial
833   example:
834
835             find . -name indexable.txt -print | recollindex -if
836
837
838   recollindex -i will not descend into subdirectories specified as
839   parameters, but just add them as index entries. It is up to the external
840   file selection method to build the complete file list.
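
   Because -i does not descend into directories, the file list has to name
   every regular file. A sketch of such a selection, run on a scratch tree
   with invented file names:

```shell
#!/bin/sh
# Scratch tree standing in for an area listed in skippedPaths.
top=$(mktemp -d)
mkdir -p "$top/a/b"
touch "$top/a/notes.txt" "$top/a/b/deep.txt" "$top/a/b/image.png"

# List regular files only (-type f): find walks the whole tree, so that
# recollindex -i does not need to descend into directories.
list_files() {
    find "$1" -type f -name '*.txt' -print
}

count=$(list_files "$top" | wc -l | tr -d ' ')
echo "$count files selected"

# To actually index the selection:
# list_files "$top" | recollindex -if
rm -rf "$top"
```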
841
842  2.7.2. Using cron to automate indexing
843
844   The most common way to set up indexing is to have a cron task execute it
845   every night. For example the following crontab entry would do it every day
846   at 3:30AM (supposing recollindex is in your PATH):
847
848 30 3 * * * recollindex > /some/tmp/dir/recolltrace 2>&1
849
850   Or, using anacron:
851
852 1  15  recollindex  su mylogin -c "recollindex > /tmp/rcltraceme 2>&1"
853
854   As of version 1.17 the Recoll GUI has dialogs to manage crontab entries
855   for recollindex. You can reach them from the Preferences -> Indexing
856   Schedule menu. They only work with the good old cron, and do not give
857   access to all features of cron scheduling.
858
859   The usual command to edit your crontab is crontab -e (which will usually
860   start the vi editor to edit the file). You may have more sophisticated
861   tools available on your system.
862
863   Please be aware that there may be differences between your usual
864   interactive command line environment and the one seen by crontab commands.
865   Especially the PATH variable may be of concern. Please check the crontab
866   manual pages about possible issues.
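
   One way to avoid environment surprises is to have cron call a small
   wrapper which sets what it needs. This is only a sketch: the directories
   added to PATH and the default configuration directory are assumptions to
   be adapted to your system.

```shell
#!/bin/sh
# Hypothetical wrapper, to be called from the crontab entry instead of
# calling recollindex directly.

# cron usually provides a minimal PATH: extend it explicitly.
PATH="/usr/local/bin:/usr/bin:/bin:$PATH"
export PATH

# Select the index explicitly instead of relying on the login environment.
RECOLL_CONFDIR="${RECOLL_CONFDIR:-$HOME/.recoll}"
export RECOLL_CONFDIR

if command -v recollindex >/dev/null 2>&1; then
    recollindex
    status=$?
else
    echo "recollindex not found in PATH=$PATH" >&2
    status=127
fi
```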
867
8682.8. Real time indexing
869
870   Real time monitoring/indexing is performed by starting the recollindex -m
871   command. With this option, recollindex will detach from the terminal and
872   become a daemon, permanently monitoring file changes and updating the
873   index.
874
875   Under KDE, Gnome and some other desktop environments, the daemon can be
876   automatically started when you log in, by creating a desktop file inside
877   the ~/.config/autostart directory. This can be done for you by the Recoll
878   GUI. Use the Preferences->Indexing Schedule menu.
879
880   With older X11 setups, starting the daemon is normally performed as part
881   of the user session script.
882
883   The rclmon.sh script can be used to easily start and stop the daemon. It
884   can be found in the examples directory (typically
885   /usr/local/[share/]recoll/examples).
886
887   For example, my out of fashion xdm-based session has a .xsession script
888   with the following lines at the end:
889
890 recollconf=$HOME/.recoll-home
891 recolldata=/usr/local/share/recoll
892 RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
893
894 fvwm
895
896
897   The indexing daemon gets started, then the window manager, for which the
898   session waits.
899
900   By default the indexing daemon will monitor the state of the X11 session,
901   and exit when it finishes; it is not necessary to kill it explicitly. (The
902   X11 server monitoring can be disabled with option -x to recollindex).
903
904   If you use the daemon completely out of an X11 session, you need to add
905   option -x to disable X11 session monitoring (else the daemon will not
906   start).
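
   A start-up script can pick the options according to its environment. The
   sketch below assumes that a non-empty DISPLAY variable indicates an X11
   session, which is a simplification, and it only prints the command.

```shell
#!/bin/sh
# Choose the monitor options depending on whether an X11 display is set.
monitor_options() {
    if [ -n "$1" ]; then
        echo "-m"       # X11 session present: keep session monitoring
    else
        echo "-m -x"    # no X11 session: disable session monitoring
    fi
}

opts=$(monitor_options "${DISPLAY:-}")
echo "recollindex $opts"
```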
907
908   By default, the messages from the indexing daemon will be sent to the same
909   file as those from the interactive commands (logfilename). You may want to
910   change this by setting the daemlogfilename and daemloglevel configuration
911   parameters. Also the log file will only be truncated when the daemon
912   starts. If the daemon runs permanently, the log file may grow quite big,
913   depending on the log level.
914
915   When building Recoll, the real time indexing support can be customised
916   during package configuration with the --with[out]-fam or
917   --with[out]-inotify options. The default is currently to include inotify
918   monitoring on systems that support it, and, as of Recoll 1.17, gamin
919   support on FreeBSD.
920
921   While it is convenient that data is indexed in real time, repeated
922   indexing can generate a significant load on the system when files such as
923   email folders change. Also, monitoring large file trees by itself
924   significantly taxes system resources. You probably do not want to enable
925   it if your system is short on resources. Periodic indexing is adequate in
926   most cases.
927
928  Increasing resources for inotify
929
930   On Linux systems, monitoring a big tree may need increasing the resources
931   available to inotify, which are normally defined in /etc/sysctl.conf.
932
933 ### inotify
934 #
935 # cat  /proc/sys/fs/inotify/max_queued_events   - 16384
936 # cat  /proc/sys/fs/inotify/max_user_instances  - 128
937 # cat  /proc/sys/fs/inotify/max_user_watches    - 16384
938 #
939 # -- Change to:
940 #
941 fs.inotify.max_queued_events=32768
942 fs.inotify.max_user_instances=256
943 fs.inotify.max_user_watches=32768
944
945
946   In particular, you will need to trim your tree or adjust the max_user_watches
947   value if indexing exits with a message about errno ENOSPC (28) from
948   inotify_add_watch.
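
   To estimate whether the limit is likely to be reached, you can compare
   the number of directories in the tree with the current value. This is a
   rough sketch: TREE is a hypothetical variable naming the monitored tree,
   and other programs using inotify also consume watches.

```shell
#!/bin/sh
# Count the directories below a tree. inotify watches are not recursive,
# so each monitored directory needs its own watch.
count_dirs() {
    find "$1" -type d 2>/dev/null | wc -l | tr -d ' '
}

# TREE is a hypothetical variable naming the tree to be monitored.
needed=$(count_dirs "${TREE:-.}")
limit_file=/proc/sys/fs/inotify/max_user_watches

if [ -r "$limit_file" ]; then
    limit=$(cat "$limit_file")
    if [ "$needed" -gt "$limit" ]; then
        echo "$needed directories, limit $limit: raise max_user_watches"
    else
        echo "$needed directories, limit $limit: ok"
    fi
else
    echo "$needed directories (no inotify limit file on this system)"
fi
# After editing /etc/sysctl.conf, root can load the new values with: sysctl -p
```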
949
950  2.8.1. Slowing down the reindexing rate for fast changing files
951
952   When using the real time monitor, it may happen that some files need to be
953   indexed, but change so often that they impose an excessive load for the
954   system.
955
956   Recoll provides a configuration option to specify the minimum time before
957   which a file, specified by a wildcard pattern, cannot be reindexed. See
958   the mondelaypatterns parameter in the configuration section.
959
960Chapter 3. Searching
961
9623.1. Searching with the Qt graphical user interface
963
964   The recoll program provides the main user interface for searching. It is
965   based on the Qt library.
966
967   recoll has two search modes:
968
969     o Simple search (the default, on the main screen) has a single entry
970       field where you can enter multiple words.
971
972     o Advanced search (a panel accessed through the Tools menu or the
973       toolbox bar icon) has multiple entry fields, which you may use to
974       build a logical condition, with additional filtering on file type,
975       location in the file system, modification date, and size.
976
977   In most cases, you can enter the terms as you think them, even if they
978   contain embedded punctuation or other non-textual characters. For example,
979   Recoll can handle things like email addresses, or arbitrary cut and paste
980   from another text window, punctuation and all.
981
982   The main case where you should enter text differently from how it is
983   printed is for east-asian languages (Chinese, Japanese, Korean). Words
984   composed of single or multiple characters should be entered separated by
985   white space in this case (they would typically be printed without white
986   space).
987
988   Some searches can be quite complex, and you may want to re-use them later,
989   perhaps with some tweaking. Recoll versions 1.21 and later can save and
990   restore searches, using XML files. See Saving and restoring queries.
991
992  3.1.1. Simple search
993
994    1. Start the recoll program.
995
996    2. Possibly choose a search mode: Any term, All terms, File name or Query
997       language.
998
999    3. Enter search term(s) in the text field at the top of the window.
1000
1001    4. Click the Search button or hit the Enter key to start the search.
1002
1003   The initial default search mode is Query language. Without special
1004   directives, this will look for documents containing all of the search
1005   terms (the ones with more terms will get better scores), just like the All
1006   terms mode which will ignore such directives. Any term will search for
1007   documents where at least one of the terms appears.
1008
1009   The Query Language features are described in a separate section.
1010
1011   All search modes allow wildcards inside terms (*, ?, []). You may want to
1012   have a look at the section about wildcards for more information about
1013   this.
1014
1015   File name will specifically look for file names. The point of having a
1016   separate file name search is that wild card expansion can be performed
1017   more efficiently on a small subset of the index (allowing wild cards on
1018   the left of terms without excessive penalty). Things to know:
1019
1020     o White space in the entry should match white space in the file name,
1021       and is not treated specially.
1022
1023     o The search is insensitive to character case and accents, independently
1024       of the type of index.
1025
1026     o An entry without any wild card character and not capitalized will be
1027       prepended and appended with '*' (ie: etc -> *etc*, but Etc -> etc).
1028
1029     o If you have a big index (many files), excessively generic fragments
1030       may result in inefficient searches.
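
   The wildcard rule for file name searches can be sketched as follows. This
   is an illustration of the behaviour described above, not Recoll's actual
   code, and the filename_pattern function is hypothetical.

```shell
#!/bin/sh
# Turn a file name search entry into the pattern that would be searched.
filename_pattern() {
    entry=$1
    case "$entry" in
        *\**|*\?*|*\[*)
            # Contains a wildcard character: used as typed.
            printf '%s\n' "$entry" ;;
        [[:upper:]]*)
            # Capitalized: no wildcards added, and case is ignored.
            printf '%s\n' "$entry" | tr '[:upper:]' '[:lower:]' ;;
        *)
            # Plain entry: prepend and append '*'.
            printf '*%s*\n' "$entry" ;;
    esac
}

filename_pattern etc
filename_pattern Etc
filename_pattern 'rec?ll.conf'
```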
1031
1032   You can search for exact phrases (adjacent words in a given order) by
1033   enclosing the input inside double quotes. Ex: "virtual reality".
1034
1035   When using a stripped index, character case has no influence on search,
1036   except that you can disable stem expansion for any term by capitalizing
1037   it. Ie: a search for floor will also normally look for flooring, floored,
1038   etc., but a search for Floor will only look for floor, in any character
1039   case. Stemming can also be disabled globally in the preferences. When
1040   using a raw index, the rules are a bit more complicated.
1041
1042   Recoll remembers the last few searches that you performed. You can use the
1043   simple search text entry widget (a combobox) to recall them (click the
1044   drop-down arrow at the right of the text field). Please note, however, that only the
1045   search texts are remembered, not the mode (all/any/file name).
1046
1047   Typing Esc Space while entering a word in the simple search entry will
1048   open a window with possible completions for the word. The completions are
1049   extracted from the database.
1050
1051   Double-clicking on a word in the result list or a preview window will
1052   insert it into the simple search entry field.
1053
1054   You can cut and paste any text into an All terms or Any term search field,
1055   punctuation, newlines and all - except for wildcard characters (single ?
1056   characters are ok). Recoll will process it and produce a meaningful
1057   search. This is what most differentiates this mode from the Query Language
1058   mode, where you have to care about the syntax.
1059
1060   You can use the Tools -> Advanced search dialog for more complex searches.
1061
1062  3.1.2. The default result list
1063
1064   After starting a search, a list of results will instantly be displayed in
1065   the main list window.
1066
1067   By default, the document list is presented in order of relevance (how well
1068   the system estimates that the document matches the query). You can sort
1069   the result by ascending or descending date by using the vertical arrows in
1070   the toolbar.
1071
1072   Clicking on the Preview link for an entry will open an internal preview
1073   window for the document. Further Preview clicks for the same search will
1074   open tabs in the existing preview window. You can use Shift+Click to force
1075   the creation of another preview window, which may be useful to view the
1076   documents side by side. (You can also browse successive results in a
1077   single preview window by typing Shift+ArrowUp/Down in the window).
1078
1079   Clicking the Open link will start an external viewer for the document. By
1080   default, Recoll lets the desktop choose the appropriate application for
1081   most document types (there is a short list of exceptions, see further). If
1082   you prefer to completely customize the choice of applications, you can
1083   uncheck the Use desktop preferences option in the GUI preferences dialog,
1084   and click the Choose editor applications button to adjust the predefined
1085   Recoll choices. The tool accepts multiple selections of MIME types (e.g.
1086   to set up the editor for the dozens of office file types).
1087
1088   Even when Use desktop preferences is checked, there is a small list of
1089   exceptions, for MIME types where the Recoll choice should override the
1090   desktop one. These are applications which are well integrated with Recoll,
1091   especially evince for viewing PDF and Postscript files because of its
1092   support for opening the document at a specific page and passing a search
1093   string as an argument. Of course, you can edit the list (in the GUI
1094   preferences) if you would prefer to lose the functionality and use the
1095   standard desktop tool.
1096
1097   You may also change the choice of applications by editing the mimeview
1098   configuration file if you find this more convenient.
1099
1100   Each result entry also has a right-click menu with an Open With entry.
1101   This lets you choose an application from the list of those which
1102   registered with the desktop for the document MIME type.
1103
1104   The Preview and Open edit links may not be present for all entries,
1105   meaning that Recoll has no configured way to preview a given file type
1106   (which was indexed by name only), or no configured external editor for the
1107   file type. This can sometimes be adjusted simply by tweaking the mimemap
1108   and mimeview configuration files (the latter can be modified with the user
1109   preferences dialog).
1110
1111   The format of the result list entries is entirely configurable by using
1112   the preference dialog to edit an HTML fragment.
1113
1114   You can click on the Query details link at the top of the results page to
1115   see the query actually performed, after stem expansion and other
1116   processing.
1117
1118   Double-clicking on any word inside the result list or a preview window
1119   will insert it into the simple search text.
1120
1121   The result list is divided into pages (the size of which you can change in
1122   the preferences). Use the arrow buttons in the toolbar or the links at the
1123   bottom of the page to browse the results.
1124
1125    3.1.2.1. No results: the spelling suggestions
1126
1127   When a search yields no result, and if the aspell dictionary is
1128   configured, Recoll will try to check for misspellings among the query
1129   terms, and will propose lists of replacements. Clicking on one of the
1130   suggestions will replace the word and restart the search. You can hold any
1131   of the modifier keys (Ctrl, Shift, etc.) while clicking if you would
1132   rather stay on the suggestion screen because several terms need
1133   replacement.
1134
1135    3.1.2.2. The result list right-click menu
1136
1137   Apart from the preview and edit links, you can display a pop-up menu by
1138   right-clicking over a paragraph in the result list. This menu has the
1139   following entries:
1140
1141     o Preview
1142
1143     o Open
1144
1145     o Open With
1146
1147     o Run Script
1148
1149     o Copy File Name
1150
1151     o Copy Url
1152
1153     o Save to File
1154
1155     o Find similar
1156
1157     o Preview Parent document
1158
1159     o Open Parent document
1160
1161     o Open Snippets Window
1162
1163   The Preview and Open entries do the same thing as the corresponding links.
1164
1165   Open With lets you open the document with one of the applications claiming
1166   to be able to handle its MIME type (the information comes from the
1167   .desktop files in /usr/share/applications).
1168
1169   Run Script allows starting an arbitrary command on the result file. It
1170   will only appear for results which are top-level files. See further for a
1171   more detailed description.
1172
1173   The Copy File Name and Copy Url copy the relevant data to the clipboard,
1174   for later pasting.
1175
1176   Save to File allows saving the contents of a result document to a chosen
1177   file. This entry will only appear if the document does not correspond to
1178   an existing file, but is a subdocument inside such a file (ie: an email
1179   attachment). It is especially useful to extract attachments with no
1180   associated editor.
1181
1182   The Open/Preview Parent document entries allow working with the higher
1183   level document (e.g. the email message an attachment comes from). Recoll
1184   is sometimes not totally accurate as to what it can or can't do in this
1185   area. For example the Parent entry will also appear for an email which is
1186   part of an mbox folder file, but you can't actually visualize the mbox
1187   (there will be an error dialog if you try).
1188
1189   If the document is a top-level file, Open Parent will start the default
1190   file manager on the enclosing filesystem directory.
1191
1192   The Find similar entry will select a number of relevant terms from the
1193   current document and enter them into the simple search field. You can then
1194   start a simple search, with a good chance of finding documents related to
1195   the current result. I can't remember a single instance where this function
1196   was actually useful to me...
1197
1198   The Open Snippets Window entry will only appear for documents which
1199   support page breaks (typically PDF, Postscript, DVI). The snippets window
1200   lists extracts from the document, taken around search term occurrences,
1201   along with the corresponding page number, as links which can be used to
1202   start the native viewer on the appropriate page. If the viewer supports
1203   it, its search function will also be primed with one of the search terms.
1204
1205  3.1.3. The result table
1206
1207   In Recoll 1.15 and newer, the results can be displayed in spreadsheet-like
1208   fashion. You can switch to this presentation by clicking the table-like
1209   icon in the toolbar (this is a toggle, click again to restore the list).
1210
1211   Clicking on the column headers will allow sorting by the values in the
1212   column. You can click again to invert the order, and use the header
1213   right-click menu to reset sorting to the default relevance order (you can
1214   also use the sort-by-date arrows to do this).
1215
1216   Both the list and the table display the same underlying results. The sort
1217   order set from the table is still active if you switch back to the list
1218   mode. You can click twice on a date sort arrow to reset it from there.
1219
1220   The header right-click menu allows adding or deleting columns. The columns
1221   can be resized, and their order can be changed (by dragging). All the
1222   changes are recorded when you quit recoll.
1223
1224   Hovering over a table row will update the detail area at the bottom of the
1225   window with the corresponding values. You can click the row to freeze the
1226   display. The bottom area is equivalent to a result list paragraph, with
1227   links for starting a preview or a native application, and an equivalent
1228   right-click menu. Typing Esc (the Escape key) will unfreeze the display.
1229
1230  3.1.4. Running arbitrary commands on result files (1.20 and later)
1231
1232   Apart from the Open and Open With operations, which allow starting an
1233   application on a result document (or a temporary copy), based on its MIME
1234   type, it is also possible to run arbitrary commands on results which are
1235   top-level files, using the Run Script entry in the results pop-up menu.
1236
1237   The commands which will appear in the Run Script submenu must be defined
1238   by .desktop files inside the scripts subdirectory of the current
1239   configuration directory.
1240
1241   Here follows an example of a .desktop file, which could be named for
1242   example, ~/.recoll/scripts/myscript.desktop (the exact file name inside
1243   the directory is irrelevant):
1244
1245 [Desktop Entry]
1246 Type=Application
1247 Name=MyFirstScript
1248 Exec=/home/me/bin/tryscript %F
1249 MimeType=*/*
1250
1251
1252   The Name attribute defines the label which will appear inside the Run
1253   Script menu. The Exec attribute defines the program to be run, which does
1254   not need to actually be a script, of course. The MimeType attribute is not
1255   used, but needs to exist.
1256
1257   The commands defined this way can also be used from links inside the
1258   result paragraph.
1259
1260   As an example, it might make sense to write a script which would move the
1261   document to the trash and purge it from the Recoll index.
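
   A possible shape for such a script is sketched below. Several points are
   assumptions: the trash directory is an invented location (not the
   desktop trash), the trash_and_purge function is hypothetical, and the use
   of recollindex -e to purge the data for one file should be checked
   against the man page for your version.

```shell
#!/bin/sh
# Hypothetical script for the Exec= line of a .desktop file. Recoll passes
# the path of the result file in place of %F.

# Invented trash location: a plain directory, not the desktop trash.
trash_dir="${TRASH_DIR:-$HOME/.recoll-trash}"

trash_and_purge() {
    file=$1
    [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }
    mkdir -p "$trash_dir" || return 1
    mv -- "$file" "$trash_dir/" || return 1
    # Assumption: recollindex -e purges the index data for the given file.
    if command -v recollindex >/dev/null 2>&1; then
        recollindex -e "$file"
    fi
    return 0
}

if [ $# -gt 0 ]; then
    trash_and_purge "$1"
fi
```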
1262
1263  3.1.5. Displaying thumbnails
1264
1265   The default format for the result list entries and the detail area of the
1266   result table display an icon for each result document. The icon is either
1267   a generic one determined from the MIME type, or a thumbnail of the
1268   document appearance. Thumbnails are only displayed if found in the
1269   standard freedesktop location, where they would typically have been
1270   created by a file manager.
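
   To check whether a thumbnail exists for a given file, you can compute the
   name under which it would be stored. The sketch assumes the usual
   freedesktop convention (MD5 of the file URI, under the thumbnails/normal
   cache directory) and a path which needs no percent-encoding. The sample
   path is invented, and md5sum must be available.

```shell
#!/bin/sh
# Compute where a freedesktop-style thumbnail for a file would be stored.
# Assumptions: name is the MD5 of the file URI, and the path contains no
# characters which would need percent-encoding in a URI.
thumbnail_path() {
    uri="file://$1"
    sum=$(printf '%s' "$uri" | md5sum | cut -d ' ' -f 1)
    printf '%s/thumbnails/normal/%s.png\n' \
        "${XDG_CACHE_HOME:-$HOME/.cache}" "$sum"
}

# Hypothetical document path.
thumbnail_path "/home/me/Documents/report.pdf"
```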
1271
1272   Recoll has no capability to create thumbnails. A relatively simple trick
1273   is to use the Open parent document/folder entry in the result list popup
1274   menu. This should open a file manager window on the containing directory,
1275   which should in turn create the thumbnails (depending on your settings).
1276   Restarting the search should then display the thumbnails.
1277
1278   There are also some pointers about thumbnail generation on the Recoll
1279   wiki.
1280
1281  3.1.6. The preview window
1282
1283   The preview window opens when you first click a Preview link inside the
1284   result list.
1285
1286   Subsequent preview requests for a given search open new tabs in the
1287   existing window (except if you hold the Shift key while clicking, which
1288   will open a new window for side by side viewing).
1289
1290   Starting another search and requesting a preview will create a new preview
1291   window. The old one stays open until you close it.
1292
1293   You can close a preview tab by typing Ctrl-W (Ctrl + W) in the window.
1294   Closing the last tab for a window will also close the window.
1295
1296   Of course you can also close a preview window by using the window manager
1297   button in the top of the frame.
1298
1299   You can display successive or previous documents from the result list
1300   inside a preview tab by typing Shift+Down or Shift+Up (Down and Up are the
1301   arrow keys).
1302
1303   A right-click menu in the text area allows switching between displaying
1304   the main text or the contents of fields associated to the document (ie:
1305   author, abstract, etc.). This is especially useful in cases where the term
1306   match did not occur in the main text but in one of the fields. In the case
1307   of images, you can switch between three displays: the image itself, the
1308   image metadata as extracted by exiftool and the fields, which is the
1309   metadata stored in the index.
1310
1311   You can print the current preview window contents by typing Ctrl-P (Ctrl +
1312   P) in the window text.
1313
1314    3.1.6.1. Searching inside the preview
1315
1316   The preview window has an internal search capability, mostly controlled by
1317   the panel at the bottom of the window, which works in two modes: as a
1318   classical editor incremental search, where we look for the text entered in
1319   the entry zone, or as a way to walk the matches between the document and
1320   the Recoll query that found it.
1321
1322   Incremental text search
1323
1324           The preview tabs have an internal incremental search function. You
1325           initiate the search either by typing a / (slash) or Ctrl-F inside
1326           the text area or by clicking into the Search for: text field and
1327           entering the search string. You can then use the Next and Previous
1328           buttons to find the next/previous occurrence. You can also type F3
1329           inside the text area to get to the next occurrence.
1330
1331           If you have a search string entered and you use Shift+Up/Shift+Down
1332           to browse the results, the search is initiated for each successive
1333           document. If the string is found, the cursor will be positioned at
1334           the first occurrence of the search string.
1335
1336   Walking the match lists
1337
1338           If the entry area is empty when you click the Next or Previous
1339           buttons, the editor will be scrolled to show the next match to any
1340           search term (the next highlighted zone). If you select a search
1341           group from the dropdown list and click Next or Previous, the match
1342           list for this group will be walked. This is not the same as a text
1343           search, because the occurrences will include non-exact matches (as
1344           caused by stemming or wildcards). The search will revert to the
1345           text mode as soon as you edit the entry area.
1346
1347  3.1.7. The Query Fragments window
1348
1349   Selecting the Tools -> Query Fragments menu entry will open a window with
1350   radio- and check-buttons which can be used to activate query language
1351   fragments for filtering the current query. This can be useful if you have
1352   frequent reusable selectors, for example, filtering on alternate
1353   directories, or searching just one category of files, not covered by the
1354   standard category selectors.
1355
1356   The contents of the window are entirely customizable, and defined by the
1357   contents of the fragbuts.xml file inside the configuration directory. The
1358   sample file distributed with Recoll (which you should be able to find
1359   under /usr/share/recoll/examples/fragbuts.xml), contains an example which
1360   filters the results from the WEB history.
1361
1362   Here follows an example:
1363
1364 <?xml version="1.0" encoding="UTF-8"?>
1365
1366 <fragbuts version="1.0">
1367
1368   <radiobuttons>
1369
1370     <fragbut>
1371       <label>Include Web Results</label>
1372       <frag></frag>
1373     </fragbut>
1374
1375     <fragbut>
1376       <label>Exclude Web Results</label>
1377       <frag>-rclbes:BGL</frag>
1378     </fragbut>
1379
1380     <fragbut>
1381       <label>Only Web Results</label>
1382       <frag>rclbes:BGL</frag>
1383     </fragbut>
1384
1385   </radiobuttons>
1386
1387   <buttons>
1388
1389     <fragbut>
1390       <label>Year 2010</label>
1391       <frag>date:2010-01-01/2010-12-31</frag>
1392     </fragbut>
1393
1394     <fragbut>
1395       <label>My Great Directory Only</label>
1396       <frag>dir:/my/great/directory</frag>
1397     </fragbut>
1398
1399   </buttons>
1400 </fragbuts>
1401
1402   Each radiobuttons or buttons section defines a line of radiobuttons or
1403   checkbuttons, respectively, inside the window. Any number of checkbuttons
1404   can be selected, but the radiobuttons in a line are exclusive.
1405
1406   Each fragbut section defines the label for a button, and the Query
1407   Language fragment which will be added (as an AND filter) before performing
1408   the query if the button is active.
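
   As a rough illustration of how active fragments combine with a query, the
   following Python sketch parses a fragbuts document and appends the
   fragments of the selected buttons to a query string. This is not Recoll's
   implementation: the names load_fragments and apply_fragments are
   hypothetical, and the sketch assumes that juxtaposing clauses in the query
   language acts as the AND filter described above.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the sample file shown above.
SAMPLE = """<fragbuts version="1.0">
  <radiobuttons>
    <fragbut><label>Include Web Results</label><frag></frag></fragbut>
    <fragbut><label>Exclude Web Results</label><frag>-rclbes:BGL</frag></fragbut>
  </radiobuttons>
  <buttons>
    <fragbut><label>Year 2010</label><frag>date:2010-01-01/2010-12-31</frag></fragbut>
  </buttons>
</fragbuts>"""


def load_fragments(xml_text):
    """Return a dict mapping each button label to its query fragment."""
    root = ET.fromstring(xml_text)
    fragments = {}
    for fragbut in root.iter("fragbut"):
        label = fragbut.findtext("label", default="")
        frag = fragbut.findtext("frag", default="") or ""
        fragments[label] = frag.strip()
    return fragments


def apply_fragments(query, fragments, active_labels):
    """Append the non-empty fragments of the active buttons to the query."""
    parts = [query]
    for label in active_labels:
        if fragments[label]:
            parts.append(fragments[label])
    return " ".join(parts)


if __name__ == "__main__":
    frags = load_fragments(SAMPLE)
    print(apply_fragments("recoll", frags, ["Exclude Web Results", "Year 2010"]))
```

   An empty frag element (as in the Include Web Results button) simply adds
   nothing to the query.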
1409
1410   This feature is new in Recoll 1.20, and will probably be refined depending
1411   on user feedback.
1412
1413  3.1.8. Complex/advanced search
1414
1415   The advanced search dialog helps you build more complex queries without
1416   memorizing the search language constructs. It can be opened through the
1417   Tools menu or through the main toolbar.
1418
1419   Recoll keeps a history of searches. See Advanced search history.
1420
1421   The dialog has two tabs:
1422
1423    1. The first tab lets you specify terms to search for, and permits
1424       specifying multiple clauses which are combined to build the search.
1425
1426    2. The second tab lets you filter the results according to file size,
1427       date of modification, MIME type, or location.
1428
1429   Click on the Start Search button in the advanced search dialog, or type
1430   Enter in any text field to start the search. The button in the main window
1431   always performs a simple search.
1432
1433   Click on the Show query details link at the top of the result page to see
1434   the query expansion.
1435
1436    3.1.8.1. Advanced search: the "find" tab
1437
1438   This part of the dialog lets you construct a query by combining multiple
1439   clauses of different types. Each entry field is configurable for the
1440   following modes:
1441
1442     o All terms.
1443
1444     o Any term.
1445
1446     o None of the terms.
1447
1448     o Phrase (exact terms in order within an adjustable window).
1449
1450     o Proximity (terms in any order within an adjustable window).
1451
1452     o Filename search.
1453
1454   Additional entry fields can be created by clicking the Add clause button.
1455
1456   When searching, the non-empty clauses will be combined either with an AND
1457   or an OR conjunction, depending on the choice made on the left (All
1458   clauses or Any clause).
1459
1460   Entries of all types except "Phrase" and "Proximity" accept a mix of single
1461   words and phrases enclosed in double quotes. Stemming and wildcard
1462   expansion will be performed as for simple search.
1463
1464   Phrases and Proximity searches. These two clauses work in similar ways,
1465   with the difference that proximity searches do not impose an order on the
1466   words. In both cases, an adjustable number (slack) of non-matched words
1467   may be accepted between the searched ones (use the counter on the left to
1468   adjust this count). For phrases, the default count is zero (exact match).
1469   For proximity it is ten (meaning that two search terms would be matched
1470   if found within a window of twelve words). Examples: a phrase search for
1471   quick fox with a slack of 0 will match quick fox but not quick brown fox.
1472   With a slack of 1 it will match the latter, but not fox quick. A proximity
1473   search for quick fox with the default slack will match the latter, and
1474   also a fox is a cunning and quick animal.
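
   The slack rule can be made concrete with a small Python sketch. It is only
   a model of the behaviour described above, not the code used by Recoll or
   Xapian: the function name matches is hypothetical, the text is naively
   split on white space, and the window is taken to be the number of search
   terms plus the slack (two terms with a slack of ten give twelve words).

```python
from itertools import product


def matches(text, terms, slack, ordered):
    """Return True if all terms occur within a window of len(terms) + slack words.

    ordered=True models a phrase search (terms must appear in the given order),
    ordered=False models a proximity search (any order).
    """
    words = text.lower().split()
    window = len(terms) + slack
    # One list of word positions per search term.
    position_lists = [
        [i for i, w in enumerate(words) if w == term.lower()] for term in terms
    ]
    for combo in product(*position_lists):
        if len(set(combo)) != len(combo):
            continue  # the same word occurrence cannot serve two terms
        if ordered and list(combo) != sorted(combo):
            continue  # phrase search: order is imposed
        if max(combo) - min(combo) + 1 <= window:
            return True
    return False


if __name__ == "__main__":
    print(matches("quick brown fox", ["quick", "fox"], slack=0, ordered=True))
    print(matches("quick brown fox", ["quick", "fox"], slack=1, ordered=True))
    print(matches("fox quick", ["quick", "fox"], slack=10, ordered=False))
```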
1475
1476    3.1.8.2. Advanced search: the "filter" tab
1477
1478   This part of the dialog has several sections which allow filtering the
1479   results of a search according to a number of criteria:
1480
1481     o The first section allows filtering by dates of last modification. You
1482       can specify both a minimum and a maximum date. The initial values are
1483       set according to the oldest and newest documents found in the index.
1484
1485     o The next section allows filtering the results by file size. There are
1486       two entries for minimum and maximum size. Enter decimal numbers. You
1487       can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
1488       respectively.
1489
1490     o The next section allows filtering the results by their MIME types, or
1491       MIME categories (ie: media/text/message/etc.).
1492
1493       You can transfer the types between two boxes, to define which will be
1494       included or excluded by the search.
1495
1496       The state of the file type selection can be saved as the default (the
1497       file type filter will not be activated at program start-up, but the
1498       lists will be in the restored state).
1499
1500     o The bottom section allows restricting the search results to a sub-tree
1501       of the indexed area. You can use the Invert checkbox to search for
1502       files not in the sub-tree instead. If you use directory filtering
1503       often and on big subsets of the file system, you may think of setting
1504       up multiple indexes instead, as the performance may be better.
1505
1506       You can use relative/partial paths for filtering. Ie, entering
1507       dirA/dirB would match either /dir1/dirA/dirB/myfile1 or
1508       /dir2/dirA/dirB/someother/myfile2.
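
   To illustrate the size suffix multipliers listed above, here is a minimal
   Python sketch that converts an entry such as 10k or 2M into a byte count.
   The function name parse_size is hypothetical, and the sketch assumes that
   the entries are plain base-10 integers followed by at most one suffix
   letter. It is not taken from the Recoll sources.

```python
# Decimal multipliers, as listed in the text: k/K, m/M, g/G, t/T.
MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}


def parse_size(text):
    """Convert a size entry like '512', '10k' or '2M' to a number of bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].lower()
    if suffix in MULTIPLIERS:
        return int(text[:-1]) * MULTIPLIERS[suffix]
    return int(text)


if __name__ == "__main__":
    for entry in ("512", "10k", "2M", "1T"):
        print(entry, "->", parse_size(entry))
```

   Note that the multipliers are powers of ten (1E3, 1E6, ...), not powers of
   two.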
1509
1510    3.1.8.3. Advanced search history
1511
1512   The advanced search tool memorizes the last 100 searches performed. You
1513   can walk the saved searches by using the up and down arrow keys while the
1514   keyboard focus belongs to the advanced search dialog.
1515
1516   The complex search history can be erased, along with the one for simple
1517   search, by selecting the File -> Erase Search History menu entry.
1518
1519  3.1.9. The term explorer tool
1520
1521   Recoll automatically manages the expansion of search terms to their
1522   derivatives (ie: plural/singular, verb inflections). But there are other
1523   cases where the exact search term is not known. For example, you may not
1524   remember the exact spelling, or only know the beginning of the name.
1525
1526   The search will only propose replacement terms with spelling variations
1527   when no matching documents were found. In some cases, both proper spellings
1528   and misspellings are present in the index, and it may be interesting to
1529   look for them explicitly.
1530
1531   The term explorer tool (started from the toolbar icon or from the Term
1532   explorer entry of the Tools menu) can be used to search the full index
1533   terms list. It has four modes of operation:
1534
1535   Wildcard
1536
1537           In this mode of operation, you can enter a search string with
1538           shell-like wildcards (*, ?, []). ie: xapi* would display all index
1539           terms beginning with xapi. (More about wildcards here).
1540
1541   Regular expression
1542
1543           This mode will accept a regular expression as input. Example:
1544           word[0-9]+. The expression is implicitly anchored at the
1545           beginning. Ie: press will match pression but not expression. You
1546           can use .*press to match the latter, but be aware that this will
1547           cause a full index term list scan, which can be quite long.
1548
1549   Stem expansion
1550
1551           This mode will perform the usual stem expansion normally done as
1552           part of user input processing. As such it is probably mostly useful
1553           to demonstrate the process.
1554
1555   Spelling/Phonetic
1556
1557           In this mode, you enter the term as you think it is spelled, and
1558           Recoll will do its best to find index terms that sound like your
1559           entry. This mode uses the Aspell spelling application, which must
1560           be installed on your system for things to work (if your documents
1561           contain non-ascii characters, Recoll needs an aspell version newer
1562           than 0.60 for UTF-8 support). The language which is used to build
1563           the dictionary out of the index terms (which is done at the end of
1564           an indexing pass) is the one defined by your NLS environment.
1565           Weird things will probably happen if languages are mixed up.
1566
1567   Note that in cases where Recoll does not know the beginning of the string
1568   to search for (ie a wildcard expression like *coll), the expansion can
1569   take quite a long time because the full index term list will have to be
1570   processed. The expansion is currently limited to 10000 results for
1571   wildcards and regular expressions. It is possible to change the limit in
1572   the configuration file.
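
   The wildcard and regular expression modes can be approximated with the
   Python standard library, as in the sketch below. It works on an in-memory
   list of terms rather than on a real index, the function names are
   hypothetical, and the limit parameter merely mirrors the 10000 results cap
   mentioned above. re.match is used because it anchors the expression at the
   beginning of the term, which is the behaviour described for the Regular
   expression mode.

```python
import fnmatch
import re


def expand_wildcard(terms, pattern, limit=10000):
    """Return the terms matching a shell-like wildcard pattern (*, ?, [])."""
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern)][:limit]


def expand_regex(terms, expression, limit=10000):
    """Return the terms matching a regular expression anchored at the start."""
    compiled = re.compile(expression)
    return [t for t in terms if compiled.match(t)][:limit]


if __name__ == "__main__":
    index_terms = ["expression", "pression", "recoll", "xapian", "xapiandb"]
    print(expand_wildcard(index_terms, "xapi*"))
    print(expand_regex(index_terms, "press"))
    print(expand_regex(index_terms, ".*press"))
```

   Both functions scan the whole list, which corresponds to the slow case
   described above. A real index can avoid the full scan only when the
   beginning of the string is known.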
1573
1574   Double-clicking on a term in the result list will insert it into the
1575   simple search entry field. You can also cut/paste between the result list
1576   and any entry field (the end of lines will be taken care of).
1577
1578  3.1.10. Multiple indexes
1579
1580   See the section describing the use of multiple indexes for generalities.
1581   Only the aspects concerning the recoll GUI are described here.
1582
1583   A recoll program instance is always associated with a specific index,
1584   which is the one to be updated when requested from the File menu, but it
1585   can use any number of Recoll indexes for searching. The external indexes
1586   can be selected through the external indexes tab in the preferences
1587   dialog.
1588
1589   Index selection is performed in two phases. A set of all usable indexes
1590   must first be defined, and then the subset of indexes to be used for
1591   searching. These parameters are retained across program executions (they
1592   are kept separately for each Recoll configuration). The set of all indexes
1593   is usually quite stable, while the active ones might typically be adjusted
1594   quite frequently.
1595
1596   The main index (defined by RECOLL_CONFDIR) is always active. If this is
1597   undesirable, you can set up your base configuration to index an empty
1598   directory.
1599
1600   When adding a new index to the set, you can select either a Recoll
1601   configuration directory, or directly a Xapian index directory. In the
1602   first case, the Xapian index directory will be obtained from the selected
1603   configuration.
1604
1605   As building the set of all indexes can be a little tedious when done
1606   through the user interface, you can use the RECOLL_EXTRA_DBS environment
1607   variable to provide an initial set. This might typically be set up by a
1608   system administrator so that every user does not have to do it. The
1609   variable should define a colon-separated list of index directories, ie:
1610
1611 export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
1612
1613   Another environment variable, RECOLL_ACTIVE_EXTRA_DBS allows adding to the
1614   active list of indexes. This variable was suggested and implemented by a
1615   Recoll user. It is mostly useful if you use scripts to mount external
1616   volumes with Recoll indexes. By using RECOLL_EXTRA_DBS and
1617   RECOLL_ACTIVE_EXTRA_DBS, you can add and activate the index for the
1618   mounted volume when starting recoll.
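
   A mount script could prepare both variables along the following lines.
   This Python sketch is only an illustration: the function name
   extra_dbs_env is hypothetical, and it assumes that
   RECOLL_ACTIVE_EXTRA_DBS uses the same colon-separated format as
   RECOLL_EXTRA_DBS, which is not stated explicitly above. Directories that
   do not exist are dropped from both lists.

```python
import os


def extra_dbs_env(all_dbs, active_dbs, base_env=None):
    """Build an environment dict with the two Recoll index variables set.

    all_dbs:    candidate index directories (set of all usable indexes)
    active_dbs: the subset that should be active for searching
    """
    env = dict(os.environ if base_env is None else base_env)
    reachable = [d for d in all_dbs if os.path.isdir(d)]
    env["RECOLL_EXTRA_DBS"] = ":".join(reachable)
    env["RECOLL_ACTIVE_EXTRA_DBS"] = ":".join(
        d for d in active_dbs if d in reachable
    )
    return env


if __name__ == "__main__":
    env = extra_dbs_env(
        ["/some/place/xapiandb", "/some/other/db"],
        ["/some/other/db"],
    )
    print(env["RECOLL_EXTRA_DBS"])
    print(env["RECOLL_ACTIVE_EXTRA_DBS"])
    # The resulting dict could then be passed to subprocess.Popen(["recoll"], env=env).
```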
1619
1620   RECOLL_ACTIVE_EXTRA_DBS is available for Recoll versions 1.17.2 and later.
1621   A change was made in the same update so that recoll will automatically
1622   deactivate unreachable indexes when starting up.
1623
1624  3.1.11. Document history
1625
1626   Documents that you actually view (with the internal preview or an external
1627   tool) are entered into the document history, which is remembered.
1628
1629   You can display the history list by using the Tools/Doc History menu
1630   entry.
1631
1632   You can erase the document history by using the Erase document history
1633   entry in the File menu.
1634
1635  3.1.12. Sorting search results and collapsing duplicates
1636
1637   The documents in a result list are normally sorted in order of relevance.
1638   It is possible to specify a different sort order, either by using the
1639   vertical arrows in the GUI toolbox to sort by date, or switching to the
1640   result table display and clicking on any header. The sort order chosen
1641   inside the result table remains active if you switch back to the result
1642   list, until you click one of the vertical arrows or until both are
1643   unchecked (which returns to sorting by relevance).
1644
1645   Sort parameters are remembered between program invocations, but result
1646   sorting is normally always inactive when the program starts. It is
1647   possible to keep the sorting activation state between program invocations
1648   by checking the Remember sort activation state option in the preferences.
1649
1650   It is also possible to hide duplicate entries inside the result list
1651   (documents with the exact same contents as the displayed one). The test of
1652   identity is based on an MD5 hash of the document container, not only of
1653   the text contents (so that ie, a text document with an image added will
1654   not be a duplicate of the text only). Duplicates hiding is controlled by
1655   an entry in the GUI configuration dialog, and is off by default.
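
   The identity test can be pictured with the following Python sketch, which
   groups documents by the MD5 hash of their whole container. It is an
   illustration only: the function name group_duplicates is hypothetical and
   the containers are given as in-memory byte strings rather than read from
   files.

```python
import hashlib
from collections import defaultdict


def group_duplicates(containers):
    """Group paths whose container bytes have the same MD5 hash.

    containers: dict mapping a path to the full bytes of the file.
    Returns a list of groups (sorted lists of paths) with more than one member.
    """
    by_hash = defaultdict(list)
    for path, data in containers.items():
        by_hash[hashlib.md5(data).hexdigest()].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    docs = {
        "/a/report.txt": b"same text",
        "/b/copy.txt": b"same text",
        "/c/report-with-image.odt": b"same text" + b"\x89PNG...",
    }
    print(group_duplicates(docs))
```

   Because the hash covers the container, the third document is not reported
   as a duplicate even though its text is identical.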
1656
1657   As of release 1.19, when a result document does have undisplayed
1658   duplicates, a Dups link will be shown with the result list entry. Clicking
1659   the link will display the paths (URLs + ipaths) for the duplicate entries.
1660
1661  3.1.13. Search tips, shortcuts
1662
1663    3.1.13.1. Terms and search expansion
1664
1665   Term completion. Typing Esc Space in the simple search entry field while
1666   entering a word will either complete the current word if its beginning
1667   matches a unique term in the index, or open a window to propose a list of
1668   completions.
1669
1670   Picking up new terms from result or preview text. Double-clicking on a
1671   word in the result list or in a preview window will copy it to the simple
1672   search entry field.
1673
1674   Wildcards. Wildcards can be used inside search terms in all forms of
1675   searches. More about wildcards.
1676
1677   Automatic suffixes. Words like odt or ods can be automatically turned into
1678   query language ext:xxx clauses. This can be enabled in the Search
1679   preferences panel in the GUI.
1680
1681   Disabling stem expansion. Entering a capitalized word in any search field
1682   will prevent stem expansion (no search for gardening if you enter Garden
1683   instead of garden). This is the only case where character case should make
1684   a difference for a Recoll search. You can also disable stem expansion or
1685   change the stemming language in the preferences.
1686
1687   Finding related documents. Selecting the Find similar documents entry in
1688   the result list paragraph right-click menu will select a set of
1689   "interesting" terms from the current result, and insert them into the
1690   simple search entry field. You can then possibly edit the list and start a
1691   search to find documents which may be related to the current result.
1692
1693   File names. File names are added as terms during indexing, and you can
1694   specify them as ordinary terms in normal search fields (Recoll used to
1695   index all directories in the file path as terms. This has been abandoned
1696   as it did not seem really useful). Alternatively, you can use the specific
1697   file name search which will only look for file names, and may be faster
1698   than the generic search especially when using wildcards.
1699
1700    3.1.13.2. Working with phrases and proximity
1701
1702   Phrases and Proximity searches. A phrase can be looked for by enclosing it
1703   in double quotes. Example: "user manual" will look only for occurrences of
1704   user immediately followed by manual. You can use the This phrase field of
1705   the advanced search dialog to the same effect. Phrases can be entered
1706   along with simple terms in all simple or advanced search entry fields (except
1707   This exact phrase).
1708
1709   AutoPhrases. This option can be set in the preferences dialog. If it is
1710   set, a phrase will be automatically built and added to simple searches
1711   when looking for Any terms. This will not radically change the results,
1712   but will give a relevance boost to the results where the search terms
1713   appear as a phrase. Ie: searching for virtual reality will still find all
1714   documents where either virtual or reality or both appear, but those which
1715   contain virtual reality should appear sooner in the list.
1716
1717   Phrase searches can strongly slow down a query if most of the terms in the
1718   phrase are common. This is why the autophrase option is off by default for
1719   Recoll versions before 1.17. As of version 1.17, autophrase is on by
1720   default, but very common terms will be removed from the constructed
1721   phrase. The removal threshold can be adjusted from the search preferences.
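
   The removal of very common terms can be sketched as follows in Python.
   This is a simplified model and not Recoll's code: the function name
   build_autophrase and its parameters are hypothetical, the document counts
   are supplied by the caller, and the sketch assumes that a phrase is only
   worth adding when at least two terms remain.

```python
def build_autophrase(terms, doc_counts, total_docs, max_percent):
    """Build a quoted phrase from the terms, skipping very common ones.

    doc_counts:  dict mapping a term to the number of documents containing it
    total_docs:  number of documents in the index (must be > 0)
    max_percent: terms present in more than this percentage of the documents
                 are left out of the phrase
    """
    kept = [
        term
        for term in terms
        if 100.0 * doc_counts.get(term, 0) / total_docs <= max_percent
    ]
    if len(kept) < 2:
        return None  # nothing useful to add to the query
    return '"' + " ".join(kept) + '"'


if __name__ == "__main__":
    counts = {"the": 950, "virtual": 20, "reality": 30}
    print(build_autophrase(["the", "virtual", "reality"], counts, 1000, 50.0))
```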
1722
1723   Phrases and abbreviations. As of Recoll version 1.17, dotted abbreviations
1724   like I.B.M. are also automatically indexed as a word without the dots:
1725   IBM. Searching for the word inside a phrase (ie: "the IBM company") will
1726   only match the dotted abbreviation if you increase the phrase slack (using
1727   the advanced search panel control, or the o query language modifier).
1728   Literal occurrences of the word will be matched normally.
1729
1730    3.1.13.3. Others
1731
1732   Using fields. You can use the query language and field specifications to
1733   only search certain parts of documents. This can be especially helpful
1734   with email, for example only searching emails from a specific originator:
1735   search tips from:helpfulgui
1736
1737   Adjusting the result table columns. When displaying results in table mode,
1738   you can use a right click on the table headers to activate a pop-up menu
1739   which will let you adjust what columns are displayed. You can drag the
1740   column headers to adjust their order. You can click them to sort by the
1741   field displayed in the column. You can also save the result list in CSV
1742   format.
1743
1744   Changing the GUI geometry. It is possible to configure the GUI in wide
1745   form factor by dragging the toolbars to one of the sides (their location
1746   is remembered between sessions), and moving the category filters to a menu
1747   (can be set in the Preferences -> GUI configuration -> User interface
1748   panel).
1749
1750   Query explanation. You can get an exact description of what the query
1751   looked for, including stem expansion, and Boolean operators used, by
1752   clicking on the result list header.
1753
1754   Advanced search history. As of Recoll 1.18, you can display any of the
1755   last 100 complex searches performed by using the up and down arrow keys
1756   while the advanced search panel is active.
1757
1758   Browsing the result list inside a preview window. Entering Shift-Down or
1759   Shift-Up (Shift + an arrow key) in a preview window will display the next
1760   or the previous document from the result list. Any secondary search
1761   currently active will be executed on the new document.
1762
1763   Scrolling the result list from the keyboard. You can use PageUp and
1764   PageDown to scroll the result list, Shift+Home to go back to the first
1765   page. These work even while the focus is in the search entry.
1766
1767   Result table: moving the focus to the table. You can use Ctrl-r to move
1768   the focus from the search entry to the table, and then use the arrow keys
1769   to change the current row. Ctrl-Shift-s returns to the search.
1770
1771   Result table: open / preview. With the focus in the result table, you can
1772   use Ctrl-o to open the document from the current row, Ctrl-Shift-o to open
1773   the document and close recoll, Ctrl-d to preview the document.
1774
1775   Editing a new search while the focus is not in the search entry. You can
1776   use the Ctrl-Shift-S shortcut to return the cursor to the search entry
1777   (and select the current search text), while the focus is anywhere in the
1778   main window.
1779
1780   Forced opening of a preview window. You can use Shift+Click on a result
1781   list Preview link to force the creation of a preview window instead of a
1782   new tab in the existing one.
1783
1784   Closing previews. Entering Ctrl-W in a tab will close it (and, for the
1785   last tab, close the preview window). Entering Esc will close the preview
1786   window and all its tabs.
1787
1788   Printing previews. Entering Ctrl-P in a preview window will print the
1789   currently displayed text.
1790
1791   Quitting. Entering Ctrl-Q almost anywhere will close the application.
1792
1793  3.1.14. Saving and restoring queries (1.21 and later)
1794
1795   Both simple and advanced query dialogs save recent history, but the amount
1796   is limited: old queries will eventually be forgotten. Also, important
1797   queries may be difficult to find among others. This is why both types of
1798   queries can also be explicitly saved to files, from the GUI menus: File
1799   -> Save last query / Load last query.
1800
1801   The default location for saved queries is a subdirectory of the current
1802   configuration directory, but saved queries are ordinary files and can be
1803   written or moved anywhere.
1804
1805   Some of the saved query parameters are part of the preferences (e.g.
1806   autophrase or the active external indexes), and may differ when the query
1807   is loaded from the time it was saved. In this case, Recoll will warn of
1808   the differences, but will not change the user preferences.
1809
1810  3.1.15. Customizing the search interface
1811
1812   You can customize some aspects of the search interface by using the GUI
1813   configuration entry in the Preferences menu.
1814
1815   There are several tabs in the dialog, dealing with the interface itself,
1816   the parameters used for searching and returning results, and what indexes
1817   are searched.
1818
1819   User interface parameters:
1820
1821     o Highlight color for query terms: Terms from the user query are
1822       highlighted in the result list samples and the preview window. The
1823       color can be chosen here. Any Qt color string should work (ie red,
1824       #ff0000). The default is blue.
1825
1826     o Style sheet: The name of a Qt style sheet text file which is applied
1827       to the whole Recoll application on startup. The default value is
1828       empty, but there is a skeleton style sheet (recoll.qss) inside the
1829       /usr/share/recoll/examples directory. Using a style sheet, you can
1830       change most recoll graphical parameters: colors, fonts, etc. See the
1831       sample file for a few simple examples.
1832
1833       You should be aware that parameters (e.g.: the background color) set
1834       inside the Recoll GUI style sheet will override global system
1835       preferences, with possible strange side effects: for example if you
1836       set the foreground to a light color and the background to a dark one
1837       in the desktop preferences, but only the background is set inside the
1838       Recoll style sheet, and it is light too, then text will appear
1839       light-on-light inside the Recoll GUI.
1840
1841     o Maximum text size highlighted for preview: inserting highlights on
1842       search terms inside the text before inserting it in the preview window
1843       involves quite a lot of processing, and can be disabled over the given
1844       text size to speed up loading.
1845
1846     o Prefer HTML to plain text for preview: if set, Recoll will display HTML
1847       as such inside the preview window. If this causes problems with the Qt
1848       HTML display, you can uncheck it to display the plain text version
1849       instead.
1850
1851     o Plain text to HTML line style: when displaying plain text inside the
1852       preview window, Recoll tries to preserve some of the original text
1853       line breaks and indentation. It can either use PRE HTML tags, which
1854       will preserve the indentation well but will force horizontal scrolling
1855       for long lines, or use BR tags to break at the original line breaks,
1856       which will let the editor introduce other line breaks according to the
1857       window width, but will lose some of the original indentation. The
1858       third option has been available in recent releases and is probably now
1859       the best one: use PRE tags with line wrapping.
1860
1861     o Choose editor applications: this opens a dialog which allows you to
1862       select the application to be used to open each MIME type. The default
1863       is normally to use the xdg-open utility, but you can override it.
1864
1865     o Exceptions: even when xdg-open is used by default for opening
1866       documents, you can set exceptions for MIME types that will still be
1867       opened according to Recoll preferences. This is useful for passing
1868       parameters like page numbers or search strings to applications that
1869       support them (e.g. evince). This cannot be done with xdg-open which
1870       only supports passing one parameter.
1871
1872     o Document filter choice style: this will let you choose if the document
1873       categories are displayed as a list or a set of buttons, or a menu.
1874
1875     o Start with simple search mode: this lets you choose the value of the
1876       simple search type on program startup. Either a fixed value (e.g.
1877       Query Language), or the value in use when the program last exited.
1878
1879     o Auto-start simple search on white space entry: if this is checked, a
1880       search will be executed each time you enter a space in the simple
1881       search input field. This lets you look at the result list as you enter
1882       new terms. This is off by default, you may like it or not...
1883
1884     o Start with advanced search dialog open: if you use this dialog
1885       frequently, checking this entry will get it to open when recoll
1886       starts.
1887
1888     o Remember sort activation state: if set, Recoll will remember the sort
1889       tool state between invocations. It normally starts with sorting
1890       disabled.
1891
1892   Result list parameters:
1893
1894     o Number of results in a result page
1895
1896     o Result list font: There is quite a lot of information shown in the
1897       result list, and you may want to customize the font and/or font size.
1898       The rest of the fonts used by Recoll are determined by your generic Qt
1899       config (try the qtconfig command).
1900
1901     o Edit result list paragraph format string: allows you to change the
1902       presentation of each result list entry. See the result list
1903       customisation section.
1904
1905     o Edit result page HTML header insert: allows you to define text
1906       inserted at the end of the result page HTML header. More detail in the
1907       result list customisation section.
1908
1909     o Date format: allows specifying the format used for displaying dates
1910       inside the result list. This should be specified as an strftime()
1911       string (man strftime).
1912
1913     o Abstract snippet separator: for synthetic abstracts built from index
1914       data, which are usually made of several snippets from different parts
1915       of the document, this defines the snippet separator, an ellipsis by
1916       default.
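
   For readers unfamiliar with strftime() strings, the following Python
   sketch shows how a few common formats render the same date. Python's
   strftime accepts the standard C conversion specifiers used here. The
   function name format_result_date and its default format are hypothetical
   and do not reflect Recoll's actual default.

```python
from datetime import datetime


def format_result_date(timestamp, date_format="%Y-%m-%d %H:%M:%S"):
    """Render a datetime with an strftime() format string."""
    return timestamp.strftime(date_format)


if __name__ == "__main__":
    when = datetime(2021, 10, 11, 9, 5, 0)
    for fmt in ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y", "%Y-%m-%d"):
        print(fmt, "->", format_result_date(when, fmt))
```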
1917
1918   Search parameters:
1919
1920     o Hide duplicate results: decides if result list entries are shown for
1921       identical documents found in different places.
1922
1923     o Stemming language: stemming obviously depends on the document's
1924       language. This listbox will let you choose among the stemming databases
1925       which were built during indexing (this is set in the main
1926       configuration file), or later added with recollindex -s (See the
1927       recollindex manual). Stemming languages which are dynamically added
1928       will be deleted at the next indexing pass unless they are also added
1929       in the configuration file.
1930
1931     o Automatically add phrase to simple searches: a phrase will be
1932       automatically built and added to simple searches when looking for Any
1933       terms. This will give a relevance boost to the results where the
1934       search terms appear as a phrase (consecutive and in order).
1935
1936     o Autophrase term frequency threshold percentage: very frequent terms
1937       should not be included in automatic phrase searches for performance
1938       reasons. The parameter defines the cutoff percentage (percentage of
1939       the documents where the term appears).
1940
1941     o Replace abstracts from documents: this decides if we should synthesize
1942       and display an abstract in place of an explicit abstract found within
1943       the document itself.
1944
1945     o Dynamically build abstracts: this decides if Recoll tries to build
1946       document abstracts (lists of snippets) when displaying the result
1947       list. Abstracts are constructed by taking context from the document
1948       information, around the search terms.
1949
1950     o Synthetic abstract size: adjust to taste...
1951
1952     o Synthetic abstract context words: how many words should be displayed
1953       around each term occurrence.
1954
1955     o Query language magic file name suffixes: a list of words which
1956       automatically get turned into ext:xxx file name suffix clauses when
1957       starting a query language query (e.g. doc xls xlsx...). This will save
1958       some typing for people who use file types a lot when querying.
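
   The autophrase frequency threshold can be pictured with the following
   Python sketch. It is only a model of the idea: the function name, the
   document counts and the exact comparison are assumptions, and what
   Recoll does when a term is rejected (dropping the term or giving up on
   the phrase) is not described here.

```python
def autophrase_terms(terms, doc_counts, total_docs, threshold_pct):
    """Return the terms that stay eligible for the automatic phrase.

    doc_counts maps a term to the number of documents containing it.
    A term is rejected when it appears in more than threshold_pct percent
    of the documents (assumed comparison).
    """
    kept = []
    for term in terms:
        pct = 100.0 * doc_counts.get(term, 0) / total_docs
        if pct <= threshold_pct:
            kept.append(term)
    return kept


# Made-up counts for a 1000-document index.
counts = {"the": 950, "recoll": 12, "manual": 40}
print(autophrase_terms(["the", "recoll", "manual"], counts, 1000, 5.0))
# ['recoll', 'manual']
```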
1959
1960   External indexes: This panel will let you browse for additional indexes
1961   that you may want to search. External indexes are designated by their
1962   database directory (e.g. /home/someothergui/.recoll/xapiandb,
1963   /usr/local/recollglobal/xapiandb).
1964
1965   Once entered, the indexes will appear in the External indexes list, and
1966   you can choose which ones you want to use at any moment by checking or
1967   unchecking their entries.
1968
1969   Your main database (the one the current configuration indexes to), is
1970   always implicitly active. If this is not desirable, you can set up your
1971   configuration so that it indexes, for example, an empty directory. An
1972   alternative indexer may also need to implement a way of purging the index
1973   from stale data.
1974
1975    3.1.15.1. The result list format
1976
1977   Newer versions of Recoll (from 1.17) normally use WebKit HTML widgets for
1978   the result list and the snippets window (this may be disabled at build
1979   time). Total customisation is possible with full support for CSS and
1980   Javascript. Conversely, there are limits to what you can do with the older
1981   Qt QTextBrowser, but still, it is possible to decide what data each result
1982   will contain, and how it will be displayed.
1983
1984   The result list presentation can be exhaustively customized by adjusting
1985   two elements:
1986
1987     o The paragraph format
1988
1989     o HTML code inside the header section. For versions 1.21 and later, this
1990       is also used for the snippets window
1991
1992   The paragraph format and the header fragment can be edited from the Result
1993   list tab of the GUI configuration.
1994
1995   The header fragment is used both for the result list and the snippets
1996   window. The snippets list is a table and has a snippets class attribute.
1997   Each paragraph in the result list is a table, with class respar, but this
1998   can be changed by editing the paragraph format.
1999
2000   There are a few examples on the page about customising the result list on
2001   the Recoll web site.
2002
2003      The paragraph format
2004
2005   This is an arbitrary HTML string where the following printf-like %
2006   substitutions will be performed:
2007
2008     o %A. Abstract
2009
2010     o %D. Date
2011
2012     o %I. Icon image name. This is normally determined from the MIME type.
2013       The associations are defined inside the mimeconf configuration file.
2014       If a thumbnail for the file is found at the standard Freedesktop
2015       location, this will be displayed instead.
2016
2017     o %K. Keywords (if any)
2018
2019     o %L. Precooked Preview, Edit, and possibly Snippets links
2020
2021     o %M. MIME type
2022
2023     o %N. result Number inside the result page
2024
2025     o %P. Parent folder Url. In the case of an embedded document, this is
2026       the parent folder for the top level container file.
2027
2028     o %R. Relevance percentage
2029
2030     o %S. Size information
2031
2032     o %T. Title or Filename if not set.
2033
2034     o %t. Title or empty.
2035
2036     o %U. Url
2037
2038   The format of the Preview, Edit, and Snippets links is <a href="P%N">, <a
2039   href="E%N"> and <a href="A%N">, where %N expands to the document number
2040   inside the result page.
2041
2042   A link target defined as "F%N" will open the document corresponding to the
2043   %P parent folder expansion, usually creating a file manager window on the
2044   folder where the container file resides. E.g.:
2045
2046 <a href="F%N">%P</a>
2047
2048   A link target defined as R%N|scriptname will run the corresponding script
2049   on the result file (if the document is embedded, the script will be
2050   started on the top-level parent). See the section about defining scripts.
2051
2052   In addition to the predefined values above, all strings like %(fieldname)
2053   will be replaced by the value of the field named fieldname for this
2054   document. Only stored fields can be accessed in this way, the value of
2055   indexed but not stored fields is not known at this point in the search
2056   process (see field configuration). There are currently very few fields
2057   stored by default, apart from the values above (only author and filename),
2058   so this feature will need some custom local configuration to be useful. An
2059   example candidate would be the recipient field which is generated by the
2060   message input handlers.
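
   The substitution mechanism can be sketched in a few lines of Python.
   This is an illustration only, not the Recoll implementation: the
   expand_format name is hypothetical, and expanding unknown codes to an
   empty string and %% to a literal percent sign are assumptions.

```python
import re

# Matches %(fieldname) references and single-character %X codes.
_SUBST_RE = re.compile(r"%(?:\((\w+)\)|(.))", re.DOTALL)


def expand_format(fmt, values, fields):
    """Expand %X and %(fieldname) references in fmt.

    values maps single letters (e.g. "T", "U") to strings, and fields maps
    stored field names to strings. Unknown codes expand to an empty string
    in this sketch; the real behaviour may differ.
    """
    def repl(match):
        field, letter = match.group(1), match.group(2)
        if field is not None:
            return fields.get(field, "")
        if letter == "%":
            return "%"
        return values.get(letter, "")

    return _SUBST_RE.sub(repl, fmt)


html = expand_format(
    '<a href="P%N">%T</a> %(author)',
    {"N": "3", "T": "Eugenie Grandet"},
    {"author": "balzac"},
)
print(html)  # <a href="P3">Eugenie Grandet</a> balzac
```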
2061
2062   The default value for the paragraph format string is:
2063
2064     "<table class=\"respar\">\n"
2065     "<tr>\n"
2066     "<td><a href='%U'><img src='%I' width='64'></a></td>\n"
2067     "<td>%L &nbsp;<i>%S</i> &nbsp;&nbsp;<b>%T</b><br>\n"
2068     "<span style='white-space:nowrap'><i>%M</i>&nbsp;%D</span>&nbsp;&nbsp;&nbsp; <i>%U</i>&nbsp;%i<br>\n"
2069     "%A %K</td>\n"
2070     "</tr></table>\n"
2071
2072   You may, for example, try the following for a more web-like experience:
2073
2074 <u><b><a href="P%N">%T</a></b></u><br>
2075 %A<font color=#008000>%U - %S</font> - %L
2076
2077   Note that the P%N link in the above paragraph makes the title a preview
2078   link. Or the clean looking:
2079
2080 <img src="%I" align="left">%L <font color="#900000">%R</font>
2081 &nbsp;&nbsp;<b>%T</b><br>%S&nbsp;
2082 <font color="#808080"><i>%U</i></font>
2083 <table bgcolor="#e0e0e0">
2084 <tr><td><div>%A</div></td></tr>
2085 </table>%K
2086
2087   These samples, and some others are on the web site, with pictures to show
2088   how they look.
2089
2090   It is also possible to define the value of the snippet separator inside
2091   the abstract section.
2092
20933.2. Searching with the KDE KIO slave
2094
2095  3.2.1. What's this
2096
2097   The Recoll KIO slave allows performing a Recoll search by entering an
2098   appropriate URL in a KDE open dialog, or with an HTML-based interface
2099   displayed in Konqueror.
2100
2101   The HTML-based interface is similar to the Qt-based interface, but
2102   slightly less powerful for now. Its advantage is that you can perform your
2103   search while staying fully within the KDE framework: drag and drop from
2104   the result list works normally and you have your normal choice of
2105   applications for opening files.
2106
2107   The alternative interface uses a directory view of search results. Due to
2108   limitations in the current KIO slave interface, it is currently not
2109   obviously useful (to me).
2110
2111   The interface is described in more detail inside a help file which you can
2112   access by entering recoll:/ inside the konqueror URL line (this works only
2113   if the recoll KIO slave has been previously installed).
2114
2115   The instructions for building this module are located in the source tree.
2116   See: kde/kio/recoll/00README.txt. Some Linux distributions do package the
2117   kio-recoll module, so check before diving into the build process, maybe
2118   it's already out there ready for one-click installation.
2119
2120  3.2.2. Searchable documents
2121
2122   As a sample application, the Recoll KIO slave could allow preparing a set
2123   of HTML documents (for example a manual) so that they become their own
2124   search interface inside konqueror.
2125
2126   This can be done by either explicitly inserting <a href="recoll://...">
2127   links around some document areas, or automatically by adding a very small
2128   JavaScript program to the documents, like the following example, which
2129   would initiate a search by double-clicking any term:
2130
2131 <script language="JavaScript">
2132     function recollsearch() {
2133         var t = document.getSelection();
2134         window.location.href = 'recoll://search/query?qtp=a&p=0&q=' +
2135             encodeURIComponent(t);
2136     }
2137 </script>
2138  ....
2139 <body ondblclick="recollsearch()">
2140
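   For scripts that generate such pages, the same search URL can be built
   outside the browser. The Python sketch below mirrors the JavaScript
   example; the recoll_search_url name is hypothetical, and the query
   parameters are simply copied from that example.

```python
from urllib.parse import quote


def recoll_search_url(term):
    """Build the same kind of URL as the JavaScript example."""
    # quote(..., safe="") percent-encodes reserved characters, much like
    # encodeURIComponent() (the two differ on a few punctuation characters).
    return "recoll://search/query?qtp=a&p=0&q=" + quote(term, safe="")


print(recoll_search_url("eugenie grandet"))
# recoll://search/query?qtp=a&p=0&q=eugenie%20grandet
```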
2141
21423.3. Searching on the command line
2143
2144   There are several ways to obtain search results as a text stream, without
2145   a graphical interface:
2146
2147     o By passing option -t to the recoll program.
2148
2149     o By using the recollq program.
2150
2151     o By writing a custom Python program, using the Recoll Python API.
2152
2153   The first two methods work in the same way and accept/need the same
2154   arguments (except for the additional -t to recoll). The query to be
2155   executed is specified as command line arguments.
2156
2157   recollq is not built by default. You can use the Makefile in the query
2158   directory to build it. This is a very simple program, and if you can
2159   program a little C++, you may find it useful to tailor its output format
2160   to your needs. Note that recollq is only really useful on systems where the
2161   Qt libraries (or even the X11 ones) are not available. Otherwise, just use
2162   recoll -t, which takes the exact same parameters and options which are
2163   described for recollq.
2164
2165   recollq has a man page (not installed by default, look in the doc/man
2166   directory). The Usage string is as follows:
2167
2168 recollq: usage:
2169  -P: Show the date span for all the documents present in the index
2170  [-o|-a|-f] [-q] <query string>
2171  Runs a recoll query and displays result lines.
2172   Default: will interpret the argument(s) as a xesam query string
2173     query may be like:
2174     implicit AND, Exclusion, field spec:    t1 -t2 title:t3
2175     OR has priority: t1 OR t2 t3 OR t4 means (t1 OR t2) AND (t3 OR t4)
2176     Phrase: "t1 t2" (needs additional quoting on cmd line)
2177   -o Emulate the GUI simple search in ANY TERM mode
2178   -a Emulate the GUI simple search in ALL TERMS mode
2179   -f Emulate the GUI simple search in filename mode
2180   -q is just ignored (compatibility with the recoll GUI command line)
2181 Common options:
2182     -c <configdir> : specify config directory, overriding $RECOLL_CONFDIR
2183     -d also dump file contents
2184     -n [first-]<cnt> define the result slice. The default value for [first]
2185        is 0. Without the option, the default max count is 2000.
2186        Use n=0 for no limit
2187     -b : basic. Just output urls, no mime types or titles
2188     -Q : no result lines, just the processed query and result count
2189     -m : dump the whole document meta[] array for each result
2190     -A : output the document abstracts
2191     -S fld : sort by field <fld>
2192     -s stemlang : set stemming language to use (must exist in index...)
2193        Use -s "" to turn off stem expansion
2194     -D : sort descending
2195     -i <dbdir> : additional index, several can be given
2196     -e use url encoding (%xx) for urls
2197     -F <field name list> : output exactly these fields for each result.
2198        The field values are encoded in base64, output in one line and
2199        separated by one space character. This is the recommended format
2200        for use by other programs. Use a normal query with option -m to
2201        see the field names.
2202
2203   Sample execution:
2204
2205 recollq 'ilur -nautique mime:text/html'
2206 Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
2207   OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
2208 4 results
2209 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html]      [comptes.html]  18593   bytes
2210 text/html       [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
2211 text/html       [file:///Users/uncrypted-dockes/projets/pagepers/index.html]    [psxtcl/writemime/recoll]...
2212 text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
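
   For programs consuming the -F output, decoding a result line could look
   like the following Python sketch. It relies only on what the usage
   string states (base64 values, one line per result, separated by one
   space). The field names used here and the assumption that values come
   in the requested order are illustrative, and the sample line is built
   locally instead of being real recollq output.

```python
import base64


def decode_fields_line(line, field_names):
    """Decode one result line as described for the -F option.

    Assumes the values appear in the same order as the requested field
    names, each base64-encoded and separated by a single space.
    """
    values = line.rstrip("\n").split(" ")
    decoded = [base64.b64decode(v).decode("utf-8", "replace") for v in values]
    return dict(zip(field_names, decoded))


# Synthetic line built here for illustration (not actual recollq output).
sample = " ".join(
    base64.b64encode(s.encode("utf-8")).decode("ascii")
    for s in ["file:///tmp/a.html", "text/html"]
)
print(decode_fields_line(sample, ["url", "mtype"]))
```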
2213
22143.4. Path translations
2215
2216   In some cases, the document paths stored inside the index do not match the
2217   actual ones, so that document previews and accesses will fail. This can
2218   occur in a number of circumstances:
2219
2220     o When using multiple indexes it is a relatively common occurrence that
2221       some will actually reside on a remote volume, for example mounted via
2222       NFS. In this case, the paths used to access the documents on the local
2223       machine are not necessarily the same as the ones used while indexing
2224       on the remote machine. For example, /home/me may have been used as a
2225       topdirs element while indexing, but the directory might be mounted as
2226       /net/server/home/me on the local machine.
2227
2228     o The case may also occur with removable disks. It is perfectly possible
2229       to configure an index to live with the documents on the removable
2230       disk, but it may happen that the disk is not mounted at the same place
2231       so that the document paths from the index are invalid.
2232
2233     o As a last example, one could imagine that a big directory has been
2234       moved, but that it is currently inconvenient to run the indexer.
2235
2236   More generally, the path translation facility may be useful whenever the
2237   document paths seen by the indexer are not the same as the ones which
2238   should be used at query time.
2239
2240   Recoll has a facility for rewriting access paths when extracting the data
2241   from the index. The translations can be defined for the main index and for
2242   any additional query index.
2243
2244   In the above NFS example, Recoll could be instructed to rewrite any
2245   file:///home/me URL from the index to file:///net/server/home/me, allowing
2246   accesses from the client.
2247
2248   The translations are defined in the ptrans configuration file, which can
2249   be edited by hand or from the GUI external indexes configuration dialog.
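
   The effect of a translation can be modelled as a simple prefix rewrite,
   as in the Python sketch below. This only illustrates the idea with the
   NFS example: the translate_url name is hypothetical, and the sketch
   does not read or reproduce the ptrans file format.

```python
def translate_url(url, translations):
    """Rewrite the prefix of url using the first matching translation.

    translations is a list of (index_prefix, local_prefix) pairs. A prefix
    only matches on a path element boundary.
    """
    for old, new in translations:
        old = old.rstrip("/")
        if url == old or url.startswith(old + "/"):
            return new.rstrip("/") + url[len(old):]
    return url


rules = [("file:///home/me", "file:///net/server/home/me")]
print(translate_url("file:///home/me/doc/notes.txt", rules))
# file:///net/server/home/me/doc/notes.txt
```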
2250
22513.5. The query language
2252
2253   The query language processor is activated in the GUI simple search entry
2254   when the search mode selector is set to Query Language. It can also be
2255   used with the KIO slave or the command line search. It broadly has the
2256   same capabilities as the complex search interface in the GUI.
2257
2258   The language is based on the (seemingly defunct) Xesam user search
2259   language specification.
2260
2261   If the results of a query language search puzzle you and you doubt what
2262   has been actually searched for, you can use the GUI Show Query link at the
2263   top of the result list to check the exact query which was finally executed
2264   by Xapian.
2265
2266   Here follows a sample request that we are going to explain:
2267
2268           author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
2269
2270
2271   This would search for all documents with John Doe appearing as a phrase in
2272   the author field (exactly what this is would depend on the document type,
2273   e.g. the From: header, for an email message), and containing either beatles
2274   or lennon and either live or unplugged but not potatoes (in any part of
2275   the document).
2276
2277   An element is composed of an optional field specification, and a value,
2278   separated by a colon (the field separator is the last colon in the
2279   element). Examples: Eugenie, author:balzac, dc:title:grandet,
2280   dc:title:"eugenie grandet"
2281
2282   The colon, if present, means "contains". Xesam defines other relations,
2283   which are mostly unsupported for now (except in special cases, described
2284   further down).
2285
2286   All elements in the search entry are normally combined with an implicit
2287   AND. It is possible to specify that elements be OR'ed instead, as in
2288   Beatles OR Lennon. The OR must be entered literally (capitals), and it has
2289   priority over the AND associations: word1 word2 OR word3 means word1 AND
2290   (word2 OR word3), not (word1 AND word2) OR word3. Explicit parentheses are
2291   not supported before Recoll 1.21.
2292
2293   As of Recoll 1.21, you can use parentheses to group elements, which will
2294   sometimes make things clearer, and may allow expressing combinations which
2295   would have been difficult otherwise.
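
   The priority of OR over the implicit AND can be illustrated by the
   following Python sketch, which groups the words of a simple query. It
   is a toy model (the group_query name is hypothetical) which only
   handles bare words and the literal OR keyword: no phrases, fields,
   exclusions or parentheses.

```python
def group_query(query):
    """Split a simple query into AND-ed groups of OR-ed words."""
    groups = []
    pending_or = False
    for token in query.split():
        if token == "OR":
            # The next word joins the previous group.
            pending_or = True
        elif pending_or and groups:
            groups[-1].append(token)
            pending_or = False
        else:
            groups.append([token])
            pending_or = False
    return groups


print(group_query("word1 word2 OR word3"))
# [['word1'], ['word2', 'word3']]
print(group_query("t1 OR t2 t3 OR t4"))
# [['t1', 't2'], ['t3', 't4']]
```

   Each inner list is an OR group, and the groups are combined with AND.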
2296
2297   An element preceded by a - specifies a term that should not appear.
2298
2299   As usual, words inside quotes define a phrase (the order of words is
2300   significant), so that title:"prejudice pride" is not the same as
2301   title:prejudice title:pride, and is unlikely to find a result.
2302
2303   Words inside phrases and capitalized words are not stem-expanded.
2304   Wildcards may be used anywhere inside a term. Specifying a wildcard on
2305   the left of a term can produce a very slow search (or even an incorrect
2306   one if the expansion is truncated because of excessive size). Also see
2307   More about wildcards.
2308
2309   To save you some typing, recent Recoll versions (1.20 and later) interpret
2310   a comma-separated list of terms as an AND list inside the field. Use slash
2311   characters ('/') for an OR list. No white space is allowed. So
2312
2313 author:john,lennon
2314
2315   will search for documents with john and lennon inside the author field (in
2316   any order), and
2317
2318 author:john/ringo
2319
2320   would search for john or ringo.
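
   The comma and slash shorthands amount to expanding one element into
   several clauses on the same field, as in the Python sketch below. The
   expand_field_list name is hypothetical. The sketch ignores fields whose
   values legitimately contain slashes (mime or dir, for example), and how
   a value mixing commas and slashes is handled is an assumption.

```python
def expand_field_list(element):
    """Expand field:a,b or field:a/b into (operator, [field:term, ...])."""
    # The field separator is the last colon in the element.
    field, _, value = element.rpartition(":")
    if "," in value:
        op, terms = "AND", value.split(",")
    elif "/" in value:
        op, terms = "OR", value.split("/")
    else:
        op, terms = "AND", [value]
    prefix = field + ":" if field else ""
    return op, [prefix + term for term in terms]


print(expand_field_list("author:john,lennon"))
# ('AND', ['author:john', 'author:lennon'])
print(expand_field_list("author:john/ringo"))
# ('OR', ['author:john', 'author:ringo'])
```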
2321
2322   Modifiers can be set on a double-quote value, for example to specify a
2323   proximity search (unordered). See the modifier section. No space must
2324   separate the final double-quote and the modifiers value, e.g. "two
2325   one"po10
2326
2327   Recoll currently manages the following default fields:
2328
2329     o title, subject or caption are synonyms which specify data to be
2330       searched for in the document title or subject.
2331
2332     o author or from for searching the document's originators.
2333
2334     o recipient or to for searching the document's recipients.
2335
2336     o keyword for searching the document-specified keywords (few documents
2337       actually have any).
2338
2339     o filename for the document's file name. This is not necessarily set for
2340       all documents: internal documents contained inside a compound one (for
2341       example an EPUB section) do not inherit the container file name any
2342       more; this was replaced by an explicit field (see next). Sub-documents
2343       can still have a specific filename, if it is implied by the document
2344       format, for example the attachment file name for an email attachment.
2345
2346     o containerfilename. This is set for all documents, both top-level and
2347       contained sub-documents, and is always the name of the filesystem
2348       directory entry which contains the data. The terms from this field can
2349       only be matched by an explicit field specification (as opposed to
2350       terms from filename which are also indexed as general document
2351       content). This avoids getting matches for all the sub-documents when
2352       searching for the container file name.
2353
2354     o ext specifies the file name extension (Ex: ext:html)
2355
2356   Recoll 1.20 and later have a way to specify aliases for the field names,
2357   which will save typing, for example by aliasing filename to fn or
2358   containerfilename to cfn. See the section about the fields file.
2359
2360   The field syntax also supports a few field-like, but special, criteria:
2361
2362     o dir for filtering the results on file location (Ex:
2363       dir:/home/me/somedir). -dir also works to find results not in the
2364       specified directory (release >= 1.15.8). Tilde expansion will be
2365       performed as usual (except for a bug in versions 1.19 to 1.19.11p1).
2366       Wildcards will be expanded, but please have a look at an important
2367       limitation of wildcards in path filters.
2368
2369       Relative paths also make sense, for example, dir:share/doc would match
2370       either /usr/share/doc or /usr/local/share/doc.
2371
2372       Several dir clauses can be specified, both positive and negative. For
2373       example the following makes sense:
2374
2375 dir:recoll dir:src -dir:utils -dir:common
2376
2377
2378       This would select results which have both recoll and src in the path
2379       (in any order), and which have neither utils nor common.
2380
2381       You can also use OR conjunctions with dir: clauses.
2382
2383       A special aspect of dir clauses is that the values in the index are
2384       not transcoded to UTF-8, and never lower-cased or unaccented, but
2385       stored as binary. This means that you need to enter the values in the
2386       exact lower or upper case, and that searches for names with diacritics
2387       may sometimes be impossible because of character set conversion
2388       issues. Non-ASCII UNIX file paths are an unending source of trouble
2389       and are best avoided.
2390
2391       You need to use double-quotes around the path value if it contains
2392       space characters.
2393
2394     o size for filtering the results on file size. Example: size<10000. You
2395       can use <, > or = as operators. You can specify a range like the
2396       following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
2397       used as (decimal) multipliers. Ex: size>1k to search for files bigger
2398       than 1000 bytes.
2399
2400     o date for searching or filtering on dates. The syntax for the argument
2401       is based on the ISO8601 standard for dates and time intervals. Only
2402       dates are supported, no times. The general syntax is 2 elements
2403       separated by a / character. Each element can be a date or a period of
2404       time. Periods are specified as PnYnMnD. The n numbers are the
2405       respective numbers of years, months or days, any of which may be
2406       missing. Dates are specified as YYYY-MM-DD. The days and months parts
2407       may be missing. If the / is present but an element is missing, the
2408       missing element is interpreted as the lowest or highest date in the
2409       index. Examples:
2410
2411          o 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
2412
2413          o 2001-03-01/P1Y2M the same specified with a period.
2414
2415          o 2001/ from the beginning of 2001 to the latest date in the index.
2416
2417          o 2001 the whole year of 2001
2418
2419          o P2D/ means 2 days ago up to now if there are no documents with
2420            dates in the future.
2421
2422          o /2003 all documents from 2003 or older.
2423
2424       Periods can also be specified with small letters (ie: p2y).
2425
2426     o mime or format for specifying the MIME type. This one is quite special
2427       because you can specify several values which will be OR'ed (the normal
2428       default for the language is AND). Ex: mime:text/plain mime:text/html.
2429       Specifying an explicit boolean operator before a mime specification is
2430       not supported and will produce strange results. You can filter out
2431       certain types by using negation (-mime:some/type), and you can use
2432       wildcards in the value (mime:text/*). Note that mime is the ONLY field
2433       with an OR default. You do need to use OR with ext terms for example.
2434
2435     o type or rclcat for specifying the category (as in
2436       text/media/presentation/etc.). The classification of MIME types in
2437       categories is defined in the Recoll configuration (mimeconf), and can
2438       be modified or extended. The default category names are those which
2439       permit filtering results in the main GUI screen. Categories are OR'ed
2440       like MIME types above, but unlike them, cannot be negated with -.
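
   The period notation used by the date criterion can be checked with a
   small Python sketch. It parses a PnYnMnD period and adds it to a start
   date, reproducing the 2001-03-01/P1Y2M example. The function names are
   hypothetical, and clamping the day of month when the target month is
   shorter is an assumption, not documented Recoll behaviour.

```python
import calendar
import re
from datetime import date, timedelta

# PnYnMnD, where any of the three parts may be missing; case-insensitive.
_PERIOD_RE = re.compile(r"[Pp](?:(\d+)[Yy])?(?:(\d+)[Mm])?(?:(\d+)[Dd])?")


def parse_period(text):
    """Parse a PnYnMnD period into a (years, months, days) tuple."""
    match = _PERIOD_RE.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError("not a period: %r" % text)
    return tuple(int(g) if g else 0 for g in match.groups())


def add_period(start, period):
    """Return start shifted forward by period, clamping the day of month."""
    years, months, days = period
    month_index = start.month - 1 + months
    year = start.year + years + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day) + timedelta(days=days)


print(add_period(date(2001, 3, 1), parse_period("P1Y2M")))  # 2002-05-01
print(parse_period("p2y"))                                   # (2, 0, 0)
```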
2441
2442   The document input handlers used while indexing have the possibility to
2443   create other fields with arbitrary names, and aliases may be defined in
2444   the configuration, so that the exact field search possibilities may be
2445   different for you if someone took care of the customisation.
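
   As an illustration of the size criterion described earlier in this
   section, the following Python sketch converts values with the decimal
   k/m/g/t multipliers and checks a file size against a list of clauses.
   The function names and the clause representation are hypothetical, and
   only integer values are handled.

```python
import operator

# Decimal multipliers, as stated for the size criterion.
_MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
_OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def parse_size(value):
    """Convert a size value such as "1k" or "10000" to a byte count."""
    suffix = value[-1:].lower()
    if suffix in _MULTIPLIERS:
        return int(value[:-1]) * _MULTIPLIERS[suffix]
    return int(value)


def size_matches(size_bytes, clauses):
    """Check size_bytes against clauses such as [(">", "100"), ("<", "1k")]."""
    return all(
        _OPERATORS[op](size_bytes, parse_size(value)) for op, value in clauses
    )


print(parse_size("1k"))                                  # 1000
print(size_matches(500, [(">", "100"), ("<", "1000")]))  # True
```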
2446
2447  3.5.1. Modifiers
2448
2449   Some characters are recognized as search modifiers when found immediately
2450   after the closing double quote of a phrase, as in "some
2451   term"modifierchars. The actual "phrase" can be a single term of course.
2452   Supported modifiers:
2453
2454     o l can be used to turn off stemming (mostly makes sense with p because
2455       stemming is off by default for phrases).
2456
2457     o o can be used to specify a "slack" for phrase and proximity searches:
2458       the number of additional terms that may be found between the specified
2459       ones. If o is followed by an integer number, this is the slack, else
2460       the default is 10.
2461
2462     o p can be used to turn the default phrase search into a proximity one
2463       (unordered). Example: "order any in"p
2464
2465     o C will turn on case sensitivity (if the index supports it).
2466
2467     o D will turn on diacritics sensitivity (if the index supports it).
2468
2469     o A weight can be specified for a query element by specifying a decimal
2470       value at the start of the modifiers. Example: "Important"2.5.
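
   The modifier characters listed above can be decoded as in the following
   Python sketch. It is an illustration, not the Recoll parser: the
   parse_modifiers name and the result keys are hypothetical, and
   rejecting unknown characters is an assumption.

```python
import re


def parse_modifiers(mods):
    """Decode a modifier string such as "po10" or "2.5C" into a dict."""
    result = {"weight": None, "no_stem": False, "slack": None,
              "proximity": False, "case": False, "diacritics": False}
    # An optional decimal weight comes first.
    match = re.match(r"\d+(?:\.\d+)?", mods)
    if match:
        result["weight"] = float(match.group())
        mods = mods[match.end():]
    i = 0
    while i < len(mods):
        ch = mods[i]
        i += 1
        if ch == "l":
            result["no_stem"] = True
        elif ch == "p":
            result["proximity"] = True
        elif ch == "C":
            result["case"] = True
        elif ch == "D":
            result["diacritics"] = True
        elif ch == "o":
            # Optional integer slack after o; 10 when absent.
            j = i
            while j < len(mods) and mods[j] in "0123456789":
                j += 1
            result["slack"] = int(mods[i:j]) if j > i else 10
            i = j
        else:
            raise ValueError("unknown modifier: %r" % ch)
    return result


print(parse_modifiers("po10")["slack"])  # 10
print(parse_modifiers("2.5")["weight"])  # 2.5
```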
2471
24723.6. Search case and diacritics sensitivity
2473
2474   For Recoll versions 1.18 and later, and when working with a raw index (not
2475   the default), searches can be made sensitive to character case and
2476   diacritics. How this happens is controlled by configuration variables and
2477   what search data is entered.
2478
2479   The general default is that searches are insensitive to case and
2480   diacritics. An entry of resume will match any of Resume, RESUME, résumé,
2481   Résumé etc.
2482
2483   Two configuration variables can automate switching on sensitivity:
2484
2485   autodiacsens
2486
2487           If this is set, search sensitivity to diacritics will be turned on
2488           as soon as an accented character exists in a search term. When the
2489           variable is set to true, resume will start a
2490           diacritics-insensitive search, but résumé will be matched exactly.
2491           The default value is false.
2492
2493   autocasesens
2494
2495           If this is set, search sensitivity to character case will be
2496           turned on as soon as an upper-case character exists in a search
2497           term except for the first one. When the variable is set to true,
2498           us or Us will start a case-insensitive search, but US will
2499           be matched exactly. The default value is true (contrary to
2500           autodiacsens).
2501
2502   As in the past, capitalizing the first letter of a word will turn off its
2503   stem expansion and have no effect on case-sensitivity.
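
   The two automatic rules can be summarized by the Python sketch below.
   It only models the decision described above; the function names are
   hypothetical, and detecting accented characters through Unicode
   decomposition is an assumption about what counts as a diacritic.

```python
import unicodedata


def has_diacritics(term):
    """True if any character decomposes into a base plus a combining mark."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)


def has_inner_uppercase(term):
    """True if an upper-case character appears after the first one."""
    return any(ch.isupper() for ch in term[1:])


def auto_sensitivity(term, autodiacsens=False, autocasesens=True):
    """Return which sensitivities a term would switch on automatically."""
    return {
        "diacritics": autodiacsens and has_diacritics(term),
        "case": autocasesens and has_inner_uppercase(term),
    }


print(auto_sensitivity("Us"))  # {'diacritics': False, 'case': False}
print(auto_sensitivity("US"))  # {'diacritics': False, 'case': True}
print(auto_sensitivity("r\u00e9sum\u00e9", autodiacsens=True))
# {'diacritics': True, 'case': False}
```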
2504
2505   You can also explicitly activate case and diacritics sensitivity by using
2506   modifiers with the query language. C will make the term case-sensitive,
2507   and D will make it diacritics-sensitive. Examples:
2508
2509         "us"C
2510
2511
2512   will search for the term us exactly (Us will not be a match).
2513
2514         "résumé"D
2515
2516
2517   will search for the term résumé exactly (resume will not be a match).
2518
2519   When either case or diacritics sensitivity is activated, stem expansion is
2520   turned off. Having both does not make much sense.
2521
25223.7. Anchored searches and wildcards
2523
2524   Some special characters are interpreted by Recoll in search strings to
2525   expand or specialize the search. Wildcards expand a root term in
2526   controlled ways. Anchor characters can restrict a search to succeed only
2527   if the match is found at or near the beginning of the document or one of
2528   its fields.
2529
2530  3.7.1. More about wildcards
2531
2532   All words entered in Recoll search fields will be processed for wildcard
2533   expansion before the request is finally executed.
2534
2535   The wildcard characters are:
2536
2537     o * which matches 0 or more characters.
2538
2539     o ? which matches a single character.
2540
2541     o [] which allow defining sets of characters to be matched (ex: [abc]
2542       matches a single character which may be 'a' or 'b' or 'c', [0-9]
2543       matches any digit).
2544
2545   You should be aware of a few things when using wildcards.
2546
2547     o Using a wildcard character at the beginning of a word can make for a
2548       slow search because Recoll will have to scan the whole index term list
2549       to find the matches. However, this is much less a problem for field
2550       searches, and queries like author:*@domain.com can sometimes be very
2551       useful.
2552
2553     o For Recoll version 1.18 only, when working with a raw index (preserving
2554       character case and diacritics), the literal part of a wildcard
2555       expression will be matched exactly for case and diacritics. This is
2556       not true any more for versions 1.19 and later.
2557
2558     o Using a * at the end of a word can produce more matches than you would
2559       think, and strange search results. You can use the term explorer tool
2560       to check what completions exist for a given term. You can also see
2561       exactly what search was performed by clicking on the link at the top
2562       of the result list. In general, for natural language terms, stem
2563       expansion will produce better results than an ending * (stem expansion
2564       is turned off when any wildcard character appears in the term).
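
   Wildcard expansion against a term list can be tried out with Python's
   fnmatch module, which understands the same *, ? and [] characters. The
   term list below is made up, and fnmatch has extra syntax (such as
   [!abc]) which may not behave identically in Recoll.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, index_terms):
    """Return the index terms matching a pattern using *, ? and []."""
    # Every term is tested, which is why a leading wildcard is costly.
    return [term for term in index_terms if fnmatchcase(term, pattern)]


terms = ["recoll", "recollindex", "recollq", "record", "xapian"]
print(expand_wildcard("recoll*", terms))  # ['recoll', 'recollindex', 'recollq']
print(expand_wildcard("reco??", terms))   # ['recoll', 'record']
print(expand_wildcard("*ian", terms))     # ['xapian']
```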
2565
2566    3.7.1.1. Wildcards and path filtering
2567
2568   Due to the way that Recoll processes wildcards inside dir path filtering
2569   clauses, they will have a multiplicative effect on the query size. A
2570   clause containing wildcards in several path elements, like, for example,
2571   dir:/home/me/*/*/docdir, will almost certainly fail if your indexed tree
2572   is of any realistic size.
2573
2574   Depending on the case, you may be able to work around the issue by
2575   specifying the path elements more narrowly, with a constant prefix, or by
2576   using 2 separate dir: clauses instead of multiple wildcards, as in
2577   dir:/home/me dir:docdir. The latter query is not equivalent to the initial
2578   one because it does not specify a number of directory levels, but that's
2579   the best we can do (and it may be actually more useful in some cases).
2580
2581  3.7.2. Anchored searches
2582
   Two characters are used to specify that a search hit should occur at the
   beginning or at the end of the text. ^ at the beginning of a term or
   phrase constrains the search to happen at the start; $ at the end forces
   it to happen at the end.
2587
2588   As this function is implemented as a phrase search it is possible to
2589   specify a maximum distance at which the hit should occur, either through
2590   the controls of the advanced search panel, or using the query language,
2591   for example, as in:
2592
2593 "^someterm"o10
2594
2595   which would force someterm to be found within 10 terms of the start of the
2596   text. This can be combined with a field search as in
2597   somefield:"^someterm"o10 or somefield:someterm$.
2598
2599   This feature can also be used with an actual phrase search, but in this
2600   case, the distance applies to the whole phrase and anchor, so that, for
2601   example, bla bla my unexpected term at the beginning of the text would be
2602   a match for "^my term"o5.
2603
2604   Anchored searches can be very useful for searches inside somewhat
2605   structured documents like scientific articles, in case explicit metadata
2606   has not been supplied (a most frequent case), for example for looking for
2607   matches inside the abstract or the list of authors (which occur at the top
2608   of the document).
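
   As an illustration of the syntax above, here is a minimal Python sketch
   that builds anchored clauses as query-language strings. The function name
   anchored_query and its parameters are hypothetical; it only produces the
   forms shown in this section and is not part of Recoll.

```python
def anchored_query(term, at_start=True, distance=None, field=None):
    """Build an anchored clause for the Recoll query language.

    Only the forms shown in the manual are produced:
    "^term", "^term"oN, term$ and their field-prefixed variants.
    """
    if at_start:
        clause = '"^%s"' % term
        if distance is not None:
            # oN: maximum distance, in terms, from the start of the text
            clause += "o%d" % distance
    else:
        if distance is not None:
            raise ValueError("distance is only sketched for start anchors")
        clause = "%s$" % term
    if field:
        clause = "%s:%s" % (field, clause)
    return clause
```

   For example, anchored_query("someterm", distance=10, field="somefield")
   returns somefield:"^someterm"o10, which can then be passed to the GUI or
   to the Python API as a query string.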
2609
26103.8. Desktop integration
2611
2612   Being independent of the desktop type has its drawbacks: Recoll desktop
2613   integration is minimal. However there are a few tools available:
2614
2615     o The KDE KIO Slave was described in a previous section.
2616
2617     o If you use a recent version of Ubuntu Linux, you may find the Ubuntu
2618       Unity Lens module useful.
2619
2620     o There is also an independently developed Krunner plugin.
2621
2622   Here follow a few other things that may help.
2623
2624  3.8.1. Hotkeying recoll
2625
2626   It is surprisingly convenient to be able to show or hide the Recoll GUI
2627   with a single keystroke. Recoll comes with a small Python script, based on
2628   the libwnck window manager interface library, which will allow you to do
2629   just this. The detailed instructions are on this wiki page.
2630
2631  3.8.2. The KDE Kicker Recoll applet
2632
2633   This is probably obsolete now. Anyway:
2634
2635   The Recoll source tree contains the source code to the recoll_applet, a
2636   small application derived from the find_applet. This can be used to add a
2637   small Recoll launcher to the KDE panel.
2638
2639   The applet is not automatically built with the main Recoll programs, nor
2640   is it included with the main source distribution (because the KDE build
2641   boilerplate makes it relatively big). You can download its source from the
2642   recoll.org download page. Use the omnipotent configure;make;make install
2643   incantation to build and install.
2644
2645   You can then add the applet to the panel by right-clicking the panel and
2646   choosing the Add applet entry.
2647
2648   The recoll_applet has a small text window where you can type a Recoll
2649   query (in query language form), and an icon which can be used to restrict
2650   the search to certain types of files. It is quite primitive, and launches
2651   a new recoll GUI instance every time (even if it is already running). You
2652   may find it useful anyway.
2653
2654Chapter 4. Programming interface
2655
2656   Recoll has an Application Programming Interface, usable both for indexing
2657   and searching, currently accessible from the Python language.
2658
2659   Another less radical way to extend the application is to write input
2660   handlers for new types of documents.
2661
2662   The processing of metadata attributes for documents (fields) is highly
2663   configurable.
2664
26654.1. Writing a document input handler
2666
2667  Terminology
2668
2669   The small programs or pieces of code which handle the processing of the
2670   different document types for Recoll used to be called filters, which is
2671   still reflected in the name of the directory which holds them and many
2672   configuration variables. They were named this way because one of their
2673   primary functions is to filter out the formatting directives and keep the
2674   text content. However these modules may have other behaviours, and the
2675   term input handler is now progressively substituted in the documentation.
2676   filter is still used in many places though.
2677
   Recoll input handlers cooperate to translate from the multitude of input
   document formats, simple ones (such as OpenDocument or Acrobat) or
   compound ones (such as Zip or Email), into the final Recoll indexing
   input format, which is plain text. Most input handlers are executable
   programs or scripts. A few handlers are coded in C++ and live inside
   recollindex. This latter kind will not be described here.
2684
2685   There are currently (1.18 and since 1.13) two kinds of external executable
2686   input handlers:
2687
2688     o Simple exec handlers run once and exit. They can be bare programs like
2689       antiword, or scripts using other programs. They are very simple to
2690       write, because they just need to print the converted document to the
2691       standard output. Their output can be plain text or HTML. HTML is
2692       usually preferred because it can store metadata fields and it allows
2693       preserving some of the formatting for the GUI preview.
2694
2695     o Multiple execm handlers can process multiple files (sparing the
2696       process startup time which can be very significant), or multiple
2697       documents per file (e.g.: for zip or chm files). They communicate with
2698       the indexer through a simple protocol, but are nevertheless a bit more
       complicated than the older kind. Most new handlers are written in
       Python, using a common module to handle the protocol. There is an
       exception, rclimg, which is written in Perl. The subdocuments output by
2702       these handlers can be directly indexable (text or HTML), or they can
2703       be other simple or compound documents that will need to be processed
2704       by another handler.
2705
   In both cases, handlers deal with regular file system files, and can
   process either a single document, or a linear list of documents in each
   file. Recoll is responsible for performing up-to-date checks, dealing
   with more complex embedding, and other upper-level issues.
2710
   A simple handler returning a document in text/plain format cannot
   transfer any metadata to the indexer. Generic metadata, like document
   size or modification date, will be gathered and stored by the indexer.
2714
2715   Handlers that produce text/html format can return an arbitrary amount of
2716   metadata inside HTML meta tags. These will be processed according to the
2717   directives found in the fields configuration file.
2718
   The handlers that can handle multiple documents per file return a single
   piece of data to identify each document inside the file. This piece of
   data, called an ipath element, will be sent back by Recoll to extract the
   document at query time, for previewing, or for creating a temporary file
   to be opened by a viewer.
2724
2725   The following section describes the simple handlers, and the next one
2726   gives a few explanations about the execm ones. You could conceivably write
2727   a simple handler with only the elements in the manual. This will not be
2728   the case for the other ones, for which you will have to look at the code.
2729
2730  4.1.1. Simple input handlers
2731
2732   Recoll simple handlers are usually shell-scripts, but this is in no way
2733   necessary. Extracting the text from the native format is the difficult
2734   part. Outputting the format expected by Recoll is trivial. Happily enough,
2735   most document formats have translators or text extractors which can be
2736   called from the handler. In some cases the output of the translating
2737   program is completely appropriate, and no intermediate shell-script is
2738   needed.
2739
2740   Input handlers are called with a single argument which is the source file
2741   name. They should output the result to stdout.
2742
2743   When writing a handler, you should decide if it will output plain text or
2744   HTML. Plain text is simpler, but you will not be able to add metadata or
2745   vary the output character encoding (this will be defined in a
2746   configuration file). Additionally, some formatting may be easier to
   preserve when previewing HTML. Actually the deciding factor is metadata:
   Recoll has a way to extract metadata from the HTML header and use it for
   field searches.
2750
2751   The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
2752   the handler if the operation is for indexing or previewing. Some handlers
   use this to output a slightly different format, for example stripping
   uninteresting repeated keywords (e.g. Subject: for email) when indexing.
2755   This is not essential.
2756
2757   You should look at one of the simple handlers, for example rclps for a
2758   starting point.
2759
   Don't forget to make your handler executable before testing!
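
   Although the bundled simple handlers are mostly shell scripts, the same
   contract can be sketched in Python. The example below is only an
   illustration under stated assumptions: the input format (text lines, some
   starting with Subject:) and the function names are hypothetical. What
   comes from the text above is the calling convention (one file name
   argument, result on stdout) and the RECOLL_FILTER_FORPREVIEW variable.

```python
#!/usr/bin/env python3
import os
import sys


def for_preview(environ=None):
    """True when Recoll asks for preview output rather than indexing output."""
    environ = os.environ if environ is None else environ
    return environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"


def convert(lines, preview):
    """Return plain text for a hypothetical line-oriented input format.

    When indexing, the repeated 'Subject:' keyword is stripped and only its
    value is kept; when previewing, lines are passed through unchanged.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not preview and line.startswith("Subject:"):
            line = line[len("Subject:"):].strip()
        out.append(line + "\n")
    return "".join(out)


# Recoll calls the handler with a single argument: the source file name.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        sys.stdout.write(convert(f, for_preview()))
```

   Because this sketch outputs plain text, the character encoding it writes
   has to agree with what is declared for the handler in the configuration.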
2761
2762  4.1.2. "Multiple" handlers
2763
2764   If you can program and want to write an execm handler, it should not be
2765   too difficult to make sense of one of the existing modules. For example,
2766   look at rclzip which uses Zip file paths as identifiers (ipath), and
2767   rclics, which uses an integer index. Also have a look at the comments
2768   inside the internfile/mh_execm.h file and possibly at the corresponding
2769   module.
2770
2771   execm handlers sometimes need to make a choice for the nature of the ipath
2772   elements that they use in communication with the indexer. Here are a few
2773   guidelines:
2774
2775     o Use ASCII or UTF-8 (if the identifier is an integer print it, for
2776       example, like printf %d would do).
2777
2778     o If at all possible, the data should make some kind of sense when
2779       printed to a log file to help with debugging.
2780
2781     o Recoll uses a colon (:) as a separator to store a complex path
2782       internally (for deeper embedding). Colons inside the ipath elements
2783       output by a handler will be escaped, but would be a bad choice as a
2784       handler-specific separator (mostly, again, for debugging issues).
2785
2786   In any case, the main goal is that it should be easy for the handler to
2787   extract the target document, given the file name and the ipath element.
2788
2789   execm handlers will also produce a document with a null ipath element.
2790   Depending on the type of document, this may have some associated data
2791   (e.g. the body of an email message), or none (typical for an archive
2792   file). If it is empty, this document will be useful anyway for some
2793   operations, as the parent of the actual data documents.
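
   The ipath guidelines above can be illustrated with a small Python sketch.
   It does not implement the execm protocol (see the existing handlers for
   that); it only shows integer ipath elements printed as decimal strings,
   and an empty ipath standing for the container itself. The NoteContainer
   class and its blank-line-separated format are hypothetical.

```python
class NoteContainer:
    """Hypothetical multi-document file: one document per
    blank-line-separated block of text."""

    def __init__(self, text):
        self.docs = [b.strip() for b in text.split("\n\n") if b.strip()]

    def ipaths(self):
        # Integer identifiers printed in decimal, as printf %d would do.
        # They are plain ASCII and readable in a log file.
        return ["%d" % i for i in range(len(self.docs))]

    def getdoc(self, ipath):
        """Return the document designated by an ipath element.

        The empty ipath designates the container itself, which carries no
        data of its own in this sketch (as is typical for an archive).
        """
        if ipath == "":
            return ""
        return self.docs[int(ipath)]
```

   Given the file contents and an ipath element, extracting the target
   document is then a single list lookup, which is the property the
   guidelines aim for.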
2794
2795  4.1.3. Telling Recoll about the handler
2796
2797   There are two elements that link a file to the handler which should
2798   process it: the association of file to MIME type and the association of a
2799   MIME type with a handler.
2800
2801   The association of files to MIME types is mostly based on name suffixes.
2802   The types are defined inside the mimemap file. Example:
2803
2804
2805 .doc = application/msword
2806
2807   If no suffix association is found for the file name, Recoll will try to
2808   execute the file -i command to determine a MIME type.
2809
2810   The association of file types to handlers is performed in the mimeconf
2811   file. A sample will probably be of better help than a long explanation:
2812
2813
2814 [index]
2815 application/msword = exec antiword -t -i 1 -m UTF-8;\
2816      mimetype = text/plain ; charset=utf-8
2817
2818 application/ogg = exec rclogg
2819
2820 text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html
2821
2822 application/x-chm = execm rclchm
2823
2824   The fragment specifies that:
2825
2826     o application/msword files are processed by executing the antiword
2827       program, which outputs text/plain encoded in utf-8.
2828
2829     o application/ogg files are processed by the rclogg script, with default
2830       output type (text/html, with encoding specified in the header, or
2831       utf-8 by default).
2832
2833     o text/rtf is processed by unrtf, which outputs text/html. The
2834       iso-8859-1 encoding is specified because it is not the utf-8 default,
2835       and not output by unrtf in the HTML header section.
2836
2837     o application/x-chm is processed by a persistent handler. This is
2838       determined by the execm keyword.
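
   To make the two-step association concrete, here is a hedged Python sketch
   that reads values in the style of the fragment above. It is not Recoll's
   configuration parser: parse_handler and handler_for are hypothetical
   names, and the handling of quoting and continuation lines is a
   simplifying assumption.

```python
import os
import shlex


def parse_handler(value):
    """Split a mimeconf-style value into (kind, argv, attributes)."""
    # A trailing backslash continues the value on the next line.
    value = value.replace("\\\n", " ")
    parts = [p.strip() for p in value.split(";")]
    words = shlex.split(parts[0])
    kind, argv = words[0], words[1:]  # kind is 'exec' or 'execm'
    attrs = {}
    for part in parts[1:]:
        if not part:
            continue
        name, _, val = part.partition("=")
        attrs[name.strip()] = val.strip()
    return kind, argv, attrs


def handler_for(filename, mimemap, mimeconf):
    """Suffix -> MIME type -> handler. Returns None when either step fails.

    The 'file -i' fallback used by Recoll when no suffix matches is not
    modelled here.
    """
    suffix = os.path.splitext(filename)[1]
    mime = mimemap.get(suffix)
    if mime is None or mime not in mimeconf:
        return None
    return parse_handler(mimeconf[mime])
```

   With mimemap = {".doc": "application/msword"} and the antiword entry
   above as mimeconf value, handler_for("report.doc", ...) yields the exec
   kind, the antiword command line, and the mimetype and charset attributes.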
2839
2840  4.1.4. Input handler HTML output
2841
2842   The output HTML could be very minimal like the following example:
2843
2844 <html>
2845   <head>
2846     <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
2847   </head>
2848   <body>
2849    Some text content
2850   </body>
2851 </html>
2852
2853
2854   You should take care to escape some characters inside the text by
2855   transforming them into appropriate entities. At the very minimum, "&"
2856   should be transformed into "&amp;", "<" should be transformed into "&lt;".
2857   This is not always properly done by translating programs which output
2858   HTML, and of course never by those which output plain text.
2859
2860   When encapsulating plain text in an HTML body, the display of a preview
2861   may be improved by enclosing the text inside <pre> tags.
2862
2863   The character set needs to be specified in the header. It does not need to
2864   be UTF-8 (Recoll will take care of translating it), but it must be
2865   accurate for good results.
2866
   Recoll will process meta tags inside the header as possible document
   field candidates. Document fields can be processed by the indexer in
   different ways, for searching or displaying inside query results. This is
   described in a following section.
2871
2872   By default, the indexer will process the standard header fields if they
2873   are present: title, meta/description, and meta/keywords are both indexed
2874   and stored for query-time display.
2875
2876   A predefined non-standard meta tag will also be processed by Recoll
2877   without further configuration: if a date tag is present and has the right
2878   format, it will be used as the document date (for display and sorting), in
2879   preference to the file modification date. The date format should be as
2880   follows:
2881
2882 <meta name="date" content="YYYY-mm-dd HH:MM:SS">
2883 or
2884 <meta name="date" content="YYYY-mm-ddTHH:MM:SS">
2885
2886
2887   Example:
2888
2889 <meta name="date" content="2013-02-24 17:50:00">
2890
2891
2892   Input handlers also have the possibility to "invent" field names. This
2893   should also be output as meta tags:
2894
2895 <meta name="somefield" content="Some textual data" />
2896
2897   You can embed HTML markup inside the content of custom fields, for
2898   improving the display inside result lists. In this case, add a (wildly
2899   non-standard) markup attribute to tell Recoll that the value is HTML and
2900   should not be escaped for display.
2901
2902 <meta name="somefield" markup="html" content="Some <i>textual</i> data" />
2903
2904   As written above, the processing of fields is described in a further
2905   section.
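
   The points of this section (entity escaping, <pre> wrapping, charset
   declaration, date and custom meta tags) can be combined in one small
   Python function. This is a minimal sketch rather than code taken from a
   Recoll handler; the name handler_html and its parameters are
   hypothetical.

```python
import html
from datetime import datetime


def handler_html(text, fields=None, date=None):
    """Wrap extracted plain text in minimal HTML for the indexer."""
    meta = ['<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">']
    if date is not None:
        # Date format described above: YYYY-mm-dd HH:MM:SS
        meta.append('<meta name="date" content="%s">'
                    % date.strftime("%Y-%m-%d %H:%M:%S"))
    for name, value in (fields or {}).items():
        # Attribute values need quotes escaped as well as & and <
        meta.append('<meta name="%s" content="%s">'
                    % (html.escape(name), html.escape(value)))
    head = "\n".join("    " + m for m in meta)
    # In the body, escape &, < and > but leave quotes alone
    body = html.escape(text, quote=False)
    return ("<html>\n  <head>\n%s\n  </head>\n  <body>\n<pre>%s</pre>\n"
            "  </body>\n</html>\n" % (head, body))
```

   Since the header declares UTF-8, the returned string should be encoded as
   UTF-8 when it is written to stdout.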
2906
2907  4.1.5. Page numbers
2908
2909   The indexer will interpret ^L characters in the handler output as
2910   indicating page breaks, and will record them. At query time, this allows
2911   starting a viewer on the right page for a hit or a snippet. Currently,
2912   only the PDF, Postscript and DVI handlers generate page breaks.
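
   A handler that knows its page boundaries only has to separate the pages
   with the ^L (form feed) character. The Python sketch below shows this,
   plus a hypothetical page_of helper illustrating how recorded page breaks
   let a text position be mapped back to a page; it does not reflect how
   the indexer actually stores them.

```python
FORM_FEED = "\f"  # the ^L character


def join_pages(pages):
    """Concatenate per-page text, marking each page break with ^L."""
    return FORM_FEED.join(pages)


def page_of(text, offset):
    """Return the 1-based page number containing character offset."""
    return text.count(FORM_FEED, 0, offset) + 1
```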
2913
29144.2. Field data processing
2915
2916   Fields are named pieces of information in or about documents, like title,
2917   author, abstract.
2918
2919   The field values for documents can appear in several ways during indexing:
2920   either output by input handlers as meta fields in the HTML header section,
2921   or extracted from file extended attributes, or added as attributes of the
   Doc object when using the API, or synthesized internally by Recoll.
2923
2924   The Recoll query language allows searching for text in a specific field.
2925
2926   Recoll defines a number of default fields. Additional ones can be output
2927   by handlers, and described in the fields configuration file.
2928
2929   Fields can be:
2930
2931     o indexed, meaning that their terms are separately stored in inverted
2932       lists (with a specific prefix), and that a field-specific search is
2933       possible.
2934
2935     o stored, meaning that their value is recorded in the index data record
2936       for the document, and can be returned and displayed with search
2937       results.
2938
   A field can be indexed, stored, or both. This and other aspects of field
   handling are defined inside the fields configuration file.
2941
2942   The sequence of events for field processing is as follows:
2943
2944     o During indexing, recollindex scans all meta fields in HTML documents
2945       (most document types are transformed into HTML at some point). It
2946       compares the name for each element to the configuration defining what
       should be done with fields (the fields file).
2948
2949     o If the name for the meta element matches one for a field that should
2950       be indexed, the contents are processed and the terms are entered into
2951       the index with the prefix defined in the fields file.
2952
2953     o If the name for the meta element matches one for a field that should
2954       be stored, the content of the element is stored with the document data
2955       record, from which it can be extracted and displayed at query time.
2956
2957     o At query time, if a field search is performed, the index prefix is
2958       computed and the match is only performed against appropriately
2959       prefixed terms in the index.
2960
2961     o At query time, the field can be displayed inside the result list by
2962       using the appropriate directive in the definition of the result list
2963       paragraph format. All fields are displayed on the fields screen of the
2964       preview window (which you can reach through the right-click menu).
2965       This is independent of the fact that the search which produced the
2966       results used the field or not.
2967
2968   You can find more information in the section about the fields file, or in
2969   comments inside the file.
2970
2971   You can also have a look at the example on the Wiki, detailing how one
2972   could add a page count field to pdf documents for displaying inside result
2973   lists.
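
   The sequence of events described in this section can be modelled with a
   toy in-memory index. The Python sketch below is purely illustrative: the
   FIELDS table, the prefixes and the ToyIndex class are hypothetical
   stand-ins for the fields file and the real index, and term splitting is
   reduced to lowercasing and splitting on white space.

```python
from collections import defaultdict

# Hypothetical field configuration: name -> (index prefix or None, stored?)
FIELDS = {
    "author": ("A", True),       # indexed and stored
    "pagecount": (None, True),   # stored only
    "keywords": ("K", False),    # indexed only
}


class ToyIndex:
    def __init__(self, fields):
        self.fields = fields
        self.postings = defaultdict(set)  # prefixed term -> document ids
        self.records = defaultdict(dict)  # document id -> stored values

    def add(self, docid, meta):
        """Process the meta fields of one document at indexing time."""
        for name, value in meta.items():
            if name not in self.fields:
                continue  # not configured: ignored in this sketch
            prefix, stored = self.fields[name]
            if prefix is not None:
                for term in value.lower().split():
                    self.postings[prefix + term].add(docid)
            if stored:
                self.records[docid][name] = value

    def search(self, field, term):
        """Field search: match only terms carrying the field's prefix."""
        prefix, _ = self.fields[field]
        if prefix is None:
            return set()
        return set(self.postings.get(prefix + term.lower(), ()))
```

   In this model a stored-only field such as pagecount can be displayed with
   a result but never matched by a field search, while an indexed-only field
   such as keywords can be searched but not displayed.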
2974
29754.3. API
2976
2977  4.3.1. Interface elements
2978
   A few elements in the interface are specific and need an explanation.
2980
2981   udi
2982
           A udi (unique document identifier) identifies a document. Because
2984           of limitations inside the index engine, it is restricted in length
2985           (to 200 bytes), which is why a regular URI cannot be used. The
2986           structure and contents of the udi is defined by the application
2987           and opaque to the index engine. For example, the internal file
2988           system indexer uses the complete document path (file path +
2989           internal path), truncated to length, the suppressed part being
2990           replaced by a hash value.
2991
2992   ipath
2993
2994           This data value (set as a field in the Doc object) is stored,
2995           along with the URL, but not indexed by Recoll. Its contents are
2996           not interpreted, and its use is up to the application. For
2997           example, the Recoll internal file system indexer stores the part
2998           of the document access path internal to the container file (ipath
2999           in this case is a list of subdocument sequential numbers). url and
3000           ipath are returned in every search result and permit access to the
3001           original document.
3002
3003   Stored and indexed fields
3004
3005           The fields file inside the Recoll configuration defines which
3006           document fields are either "indexed" (searchable), "stored"
3007           (retrievable with search results), or both.
3008
   Data for an external indexer should be stored in a separate index, not
   the one for the Recoll internal file system indexer (except if the latter
   is not used at all). The reason is that the main document indexer purge
   pass would remove all the other indexer's documents, as they were not seen
   during indexing. The main indexer documents would also probably be a
   problem for the external indexer purge operation.
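
   An external indexer has to choose its own udi scheme within the length
   limit given in the udi description above. The Python sketch below shows
   one possible approach, truncating and replacing the suppressed part with
   a hash. It is an assumption made for illustration, not the algorithm
   used by the Recoll file system indexer: the "|" separator, the choice of
   SHA-256 and the make_udi name are all hypothetical.

```python
import hashlib

MAX_UDI_BYTES = 200  # length limit stated in the manual


def make_udi(path, ipath=""):
    """Build a udi (as bytes) from a file path and an internal path."""
    raw = (path + "|" + ipath).encode("utf-8")
    if len(raw) <= MAX_UDI_BYTES:
        return raw
    digest_len = 64  # length of a SHA-256 hex digest
    keep = raw[:MAX_UDI_BYTES - digest_len]
    # Replace the suppressed tail with a hash of that tail, so that two
    # long paths differing only near the end still get distinct udis.
    digest = hashlib.sha256(raw[len(keep):]).hexdigest().encode("ascii")
    return keep + digest
```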
3015
3016  4.3.2. Python interface
3017
3018    4.3.2.1. Introduction
3019
3020   Recoll versions after 1.11 define a Python programming interface, both for
3021   searching and indexing. The indexing portion has seen little use, but the
3022   searching one is used in the Recoll Ubuntu Unity Lens and Recoll Web UI.
3023
3024   The API is inspired by the Python database API specification. There were
3025   two major changes in recent Recoll versions:
3026
3027     o The basis for the Recoll API changed from Python database API version
3028       1.0 (Recoll versions up to 1.18.1), to version 2.0 (Recoll 1.18.2 and
3029       later).
3030     o The recoll module became a package (with an internal recoll module) as
3031       of Recoll version 1.19, in order to add more functions. For existing
3032       code, this only changes the way the interface must be imported.
3033
3034   We will mostly describe the new API and package structure here. A
3035   paragraph at the end of this section will explain a few differences and
3036   ways to write code compatible with both versions.
3037
3038   The Python interface can be found in the source package, under
3039   python/recoll.
3040
3041   The python/recoll/ directory contains the usual setup.py. After
3042   configuring the main Recoll code, you can use the script to build and
3043   install the Python module:
3044
3045             cd recoll-xxx/python/recoll
3046             python setup.py build
3047             python setup.py install
3048
3049
3050   The normal Recoll installer installs the Python API along with the main
3051   code.
3052
3053   When installing from a repository, and depending on the distribution, the
3054   Python API can sometimes be found in a separate package.
3055
3056    4.3.2.2. Recoll package
3057
3058   The recoll package contains two modules:
3059
3060     o The recoll module contains functions and classes used to query (or
3061       update) the index.
3062
3063     o The rclextract module contains functions and classes used to access
3064       document data.
3065
3066    4.3.2.3. The recoll module
3067
3068      Functions
3069
3070   connect(confdir=None, extra_dbs=None, writable = False)
3071           The connect() function connects to one or several Recoll index(es)
3072           and returns a Db object.
3073              o confdir may specify a configuration directory. The usual
3074                defaults apply.
3075              o extra_dbs is a list of additional indexes (Xapian
3076                directories).
3077              o writable decides if we can index new data through this
3078                connection.
3079           This call initializes the recoll module, and it should always be
3080           performed before any other call or object creation.
3081
3082      Classes
3083
3084        The Db class
3085
3086   A Db object is created by a connect() call and holds a connection to a
3087   Recoll index.
3088
3089   Methods
3090
3091   Db.close()
3092           Closes the connection. You can't do anything with the Db object
3093           after this.
3094
3095   Db.query(), Db.cursor()
3096           These aliases return a blank Query object for this index.
3097
3098   Db.setAbstractParams(maxchars, contextwords)
3099           Set the parameters used to build snippets (sets of keywords in
3100           context text fragments). maxchars defines the maximum total size
3101           of the abstract. contextwords defines how many terms are shown
3102           around the keyword.
3103
3104   Db.termMatch(match_type, expr, field='', maxlen=-1, casesens=False,
3105   diacsens=False, lang='english')
3106           Expand an expression against the index term list. Performs the
3107           basic function from the GUI term explorer tool. match_type can be
3108           either of wildcard, regexp or stem. Returns a list of terms
3109           expanded from the input expression.
3110
3111        The Query class
3112
3113   A Query object (equivalent to a cursor in the Python DB API) is created by
3114   a Db.query() call. It is used to execute index searches.
3115
3116   Methods
3117
3118   Query.sortby(fieldname, ascending=True)
3119           Sort results by fieldname, in ascending or descending order. Must
3120           be called before executing the search.
3121
3122   Query.execute(query_string, stemming=1, stemlang="english")
3123           Starts a search for query_string, a Recoll search language string.
3124
3125   Query.executesd(SearchData)
3126           Starts a search for the query defined by the SearchData object.
3127
3128   Query.fetchmany(size=query.arraysize)
3129           Fetches the next Doc objects in the current search results, and
3130           returns them as an array of the required size, which is by default
3131           the value of the arraysize data member.
3132
3133   Query.fetchone()
3134           Fetches the next Doc object from the current search results.
3135
3136   Query.close()
3137           Closes the query. The object is unusable after the call.
3138
3139   Query.scroll(value, mode='relative')
3140           Adjusts the position in the current result set. mode can be
3141           relative or absolute.
3142
3143   Query.getgroups()
           Retrieves the expanded query terms as a list of pairs. Meaningful
           only after execute() or executesd(). In each pair, the first entry
           is a list of user terms (of size one for simple terms, or more for
           group and phrase clauses), the second a list of query terms as
           derived from the user terms and used in the Xapian Query.
3149
3150   Query.getxquery()
           Return the Xapian query description as a Unicode string.
           Meaningful only after execute() or executesd().
3153
   Query.highlight(text, ishtml = 0, methods = object)
           Will insert <span class="rclmatch">, </span> tags around the match
           areas in the input text and return the modified text. ishtml can
           be set to indicate that the input text is HTML and that HTML
           special characters should not be escaped. methods, if set, should
           be an object with methods startMatch(i) and endMatch(), which will
           be called for each match and should return a begin and end tag.
3161
   Query.makedocabstract(doc, methods = object)
3163           Create a snippets abstract for doc (a Doc object) by selecting
3164           text around the match terms. If methods is set, will also perform
3165           highlighting. See the highlight method.
3166
3167   Query.__iter__() and Query.next()
3168           So that things like for doc in query: will work.
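
   As an illustration of the methods parameter of highlight() and
   makedocabstract(), here is a sketch of an object providing startMatch(i)
   and endMatch(). The class and helper names are hypothetical, and passing
   the arguments by keyword is an assumption based on the signature shown
   above.

```python
class BoldHighlighter:
    """Candidate 'methods' object: wraps each match in <b> tags."""

    def startMatch(self, idx):
        # idx is the value passed for each match; it is unused here.
        return "<b>"

    def endMatch(self):
        return "</b>"


def highlight_with(query, text, ishtml=0):
    """Call Query.highlight() as documented, with the object above."""
    return query.highlight(text, ishtml=ishtml, methods=BoldHighlighter())
```

   The query argument is expected to be a Query object on which a search has
   already been executed.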
3169
3170   Data descriptors
3171
3172   Query.arraysize
3173           Default number of records processed by fetchmany (r/w).
3174
3175   Query.rowcount
3176           Number of records returned by the last execute.
3177
3178   Query.rownumber
3179           Next index to be fetched from results. Normally increments after
3180           each fetchone() call, but can be set/reset before the call to
3181           effect seeking (equivalent to using scroll()). Starts at 0.
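
   The methods and data descriptors above are enough to write a small
   paging helper. The sketch below relies only on execute() returning the
   result count (as in the example code of this chapter) and on
   fetchmany(); iter_results and its parameters are hypothetical, and the
   behaviour of fetchmany() past the end of the results is deliberately not
   relied upon.

```python
def iter_results(query, query_string, batch=20, limit=None):
    """Run a query-language search and yield Doc objects batch by batch."""
    total = query.execute(query_string)
    if limit is not None:
        total = min(total, limit)
    fetched = 0
    while fetched < total:
        # Never ask for more than what is known to remain.
        docs = query.fetchmany(min(batch, total - fetched))
        if not docs:
            break
        for doc in docs:
            yield doc
        fetched += len(docs)
```

   Typical use would be for doc in iter_results(db.query(), "some user
   question", limit=50): with db obtained from recoll.connect().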
3182
3183        The Doc class
3184
3185   A Doc object contains index data for a given document. The data is
3186   extracted from the index when searching, or set by the indexer program
3187   when updating. The Doc object has many attributes to be read or set by its
3188   user. It matches exactly the Rcl::Doc C++ object. Some of the attributes
3189   are predefined, but, especially when indexing, others can be set, the name
3190   of which will be processed as field names by the indexing configuration.
3191   Inputs can be specified as Unicode or strings. Outputs are Unicode
3192   objects. All dates are specified as Unix timestamps, printed as strings.
3193   Please refer to the rcldb/rcldoc.h C++ file for a description of the
3194   predefined attributes.
3195
3196   At query time, only the fields that are defined as stored either by
3197   default or in the fields configuration file will be meaningful in the Doc
3198   object. Especially this will not be the case for the document text. See
3199   the rclextract module for accessing document contents.
3200
3201   Methods
3202
3203   get(key), [] operator
3204           Retrieve the named doc attribute
3205
3206   getbinurl()
3207           Retrieve the URL in byte array format (no transcoding), for use as
3208           parameter to a system call.
3209
3210   items()
3211           Return a dictionary of doc object keys/values
3212
3213   keys()
           Return a list of doc object keys (attribute names).
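
   Since Doc dates are Unix timestamps printed as strings, a little
   conversion is usually needed before display. The following Python sketch
   shows this together with the get() method; the helper names are
   hypothetical, and the attribute names used in doc_summary are simply
   those of the example code later in this chapter.

```python
from datetime import datetime, timezone


def doc_date(value):
    """Convert a Doc date value (Unix timestamp as a string) to a
    timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def doc_summary(doc, names=("title", "size")):
    """Collect a few attributes of a Doc object into a plain dictionary."""
    return {name: doc.get(name) for name in names}
```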
3215
3216        The SearchData class
3217
3218   A SearchData object allows building a query by combining clauses, for
3219   execution by Query.executesd(). It can be used in replacement of the query
3220   language approach. The interface is going to change a little, so no
3221   detailed doc for now...
3222
3223   Methods
3224
3225   addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub', qstring=string,
3226   slack=0, field='', stemming=1, subSearch=SearchData)
3227
3228    4.3.2.4. The rclextract module
3229
   Index queries do not provide document content (only a partial and
   imprecise reconstruction is performed to show the snippets text). In order
3232   to access the actual document data, the data extraction part of the
3233   indexing process must be performed (subdocument access and format
3234   translation). This is not trivial in general. The rclextract module
3235   currently provides a single class which can be used to access the data
3236   content for result documents.
3237
3238      Classes
3239
3240        The Extractor class
3241
3242   Methods
3243
3244   Extractor(doc)
3245           An Extractor object is built from a Doc object, output from a
3246           query.
3247
3248   Extractor.textextract(ipath)
3249           Extract document defined by ipath and return a Doc object. The
3250           doc.text field has the document text converted to either
3251           text/plain or text/html according to doc.mimetype. The typical use
3252           would be as follows:
3253
 qdoc = query.fetchone()
 # Extractor lives in the rclextract module (from recoll import rclextract)
 extractor = rclextract.Extractor(qdoc)
 doc = extractor.textextract(qdoc.ipath)
 # use doc.text, e.g. for previewing
3258
3259   Extractor.idoctofile(ipath, targetmtype, outfile='')
3260           Extracts document into an output file, which can be given
3261           explicitly or will be created as a temporary file to be deleted by
3262           the caller. Typical use:
3263
 qdoc = query.fetchone()
 # Extractor lives in the rclextract module (from recoll import rclextract)
 extractor = rclextract.Extractor(qdoc)
 filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)
3267
3268    4.3.2.5. Example code
3269
3270   The following sample would query the index with a user language string.
3271   See the python/samples directory inside the Recoll source for other
3272   examples. The recollgui subdirectory has a very embryonic GUI which
3273   demonstrates the highlighting and data extraction functions.
3274
 #!/usr/bin/env python3

 from recoll import recoll

 db = recoll.connect()
 db.setAbstractParams(maxchars=80, contextwords=4)

 query = db.query()
 nres = query.execute("some user question")
 print("Result count:", nres)
 # Show at most the first five results.
 for i in range(min(nres, 5)):
     doc = query.fetchone()
     print("Result #%d" % (query.rownumber,))
     for k in ("title", "size"):
         print(k, ":", getattr(doc, k))
     abstract = db.makeDocAbstract(doc, query)
     print(abstract)
     print()
3295
3296
3297
3298    4.3.2.6. Compatibility with the previous version
3299
   The following code fragments can be used to ensure that code can run with
   both the old and the new API, as long as it does not use features which
   only exist in the new one.
3303
3304   Adapting to the new package structure:
3305
3306
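 try:
     # New package structure: recoll is a package with two modules.
     from recoll import recoll
     from recoll import rclextract
     hasextract = True
 except ImportError:
     # Old structure: recoll is a plain module, without rclextract.
     import recoll
     hasextract = False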
3314
3315
3316   Adapting to the change of nature of the next Query member. The same test
3317   can be used to choose to use the scroll() method (new) or set the next
3318   value (old).
3319
3320
 rownum = query.next if isinstance(query.next, int) else query.rownumber
3323
3324
Chapter 5. Installation and configuration

5.1. Installing a binary copy
3328
   Recoll binary copies are always distributed as regular packages for your
   system. They can be obtained either through the system's normal software
   distribution framework (e.g. Debian/Ubuntu apt, FreeBSD ports, etc.), or
   from some type of "backports" repository providing versions newer than the
   standard ones, or found on the Recoll web site in some cases.
3334
3335   There used to exist another form of binary install, as pre-compiled source
3336   trees, but these are just less convenient than the packages and don't
3337   exist any more.
3338
3339   The package management tools will usually automatically deal with hard
3340   dependencies for packages obtained from a proper package repository. You
3341   will have to deal with them by hand for downloaded packages (for example,
3342   when dpkg complains about missing dependencies).
3343
3344   In all cases, you will have to check or install supporting applications
3345   for the file types that you want to index beyond those that are natively
3346   processed by Recoll (text, HTML, email files, and a few others).
3347
   You may also want to have a look at the configuration section (but this
   may not be necessary for a quick test with default parameters). Most
   parameters can be more conveniently set from the GUI.
3351
5.2. Supporting packages
3353
   Recoll uses external applications to index some file types. You need to
   install them for the file types that you wish to have indexed. These are
   optional run-time dependencies: none is needed for building or running
   Recoll, except for indexing its specific file type.
3358
3359   After an indexing pass, the commands that were found missing can be
3360   displayed from the recoll File menu. The list is stored in the missing
3361   text file inside the configuration directory.
3362
3363   A list of common file types which need external commands follows. Many of
3364   the handlers need the iconv command, which is not always listed as a
3365   dependency.
3366
3367   Please note that, due to the relatively dynamic nature of this
3368   information, the most up to date version is now kept on
3369   http://www.recoll.org/features.html along with links to the home pages or
3370   best source/patches pages, and misc tips. The list below is not updated
3371   often and may be quite stale.
3372
3373   For many Linux distributions, most of the commands listed can be installed
3374   from the package repositories. However, the packages are sometimes
3375   outdated, or not the best version for Recoll, so you should take a look at
3376   http://www.recoll.org/features.html if a file type is important to you.
3377
3378   As of Recoll release 1.14, a number of XML-based formats that were handled
3379   by ad hoc handler code now use the xsltproc command, which usually comes
3380   with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
3381
3382   Now for the list:
3383
3384     o Openoffice files need unzip and xsltproc.
3385
3386     o PDF files need pdftotext which is part of Poppler (usually comes with
3387       the poppler-utils package). Avoid the original one from Xpdf.
3388
     o Postscript files need pstotext. The original version has an issue with
       shell special characters in file names, which is corrected in recent
       packages. See http://www.recoll.org/features.html for more detail.

     o MS Word needs antiword. It is also useful to have wvWare installed as
       it may be used as a fallback for some files which antiword does not
       handle.
3396
3397     o MS Excel and PowerPoint are processed by internal Python handlers.
3398
3399     o MS Open XML (docx) needs xsltproc.
3400
3401     o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
3402       Ubuntu) package.
3403
3404     o RTF files need unrtf, which, in its older versions, has much trouble
3405       with non-western character sets. Many Linux distributions carry
3406       outdated unrtf versions. Check http://www.recoll.org/features.html for
3407       details.
3408
3409     o TeX files need untex or detex. Check
3410       http://www.recoll.org/features.html for sources if it's not packaged
3411       for your distribution.
3412
3413     o dvi files need dvips.
3414
3415     o djvu files need djvutxt and djvused from the DjVuLibre package.
3416
3417     o Audio files: Recoll releases 1.14 and later use a single Python
3418       handler based on mutagen for all audio file types.
3419
3420     o Pictures: Recoll uses the Exiftool Perl package to extract tag
3421       information. Most image file formats are supported. Note that there
3422       may not be much interest in indexing the technical tags (image size,
3423       aperture, etc.). This is only of interest if you store personal tags
3424       or textual descriptions inside the image files.
3425
3426     o chm: files in Microsoft help format need Python and the pychm module
3427       (which needs chmlib).
3428
3429     o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
3430       module. icalendar is not needed for newer versions, which use internal
3431       code.
3432
3433     o Zip archives need Python (and the standard zipfile module).
3434
3435     o Rar archives need Python, the rarfile Python module and the unrar
3436       utility.
3437
     o Midi karaoke files need Python and the Midi module.

     o Konqueror webarchive files need Python (the handler uses the tarfile
       module).

     o Mimehtml web archive format (support is based on the email handler,
       which introduces some mild weirdness, but is still usable).
3444
3445   Text, HTML, email folders, and Scribus files are processed internally. Lyx
3446   is used to index Lyx files. Many handlers need iconv and the standard sed
3447   and awk.
3448
5.3. Building from source
3450
3451  5.3.1. Prerequisites
3452
   If you can install any or all of the following through the package manager
   for your system, all the better. Qt in particular is a very big piece of
   software, but you will most probably be able to find a binary package.
3456
3457   You may have to compile Xapian but this is easy.
3458
3459   The shopping list:
3460
3461     o C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
3462       itself by strange messages about a missing iconv_open.
3463
3464     o Development files for Xapian core.
3465
3466  Important
3467
       If you are building Xapian for an older CPU (before Pentium 4 or
       Athlon 64), you need to add the --disable-sse flag to the configure
       command. Otherwise, all Xapian applications will crash with an illegal
       instruction error.
3472
     o Development files for Qt 4. Recoll has not been tested with Qt 5 yet.
       Recoll 1.15.9 was the last version to support Qt 3. If you do not want
       to install or build the Qt Webkit module, Recoll has a configuration
       option to disable its use (see further).
3477
3478     o Development files for X11 and zlib.
3479
3480     o You may also need libiconv. On Linux systems, the iconv interface is
3481       part of libc and you should not need to do anything special.
3482
3483   Check the Recoll download page for up to date version information.
3484
3485  5.3.2. Building
3486
   Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
   versions released after 2005 should be ok, and maybe some older ones too
   (Solaris 8 is ok). If you build on another system, and need to modify
   things, I would very much welcome patches.
3491
3492   Configure options:
3493
3494     o --without-aspell will disable the code for phonetic matching of search
3495       terms.
3496
3497     o --with-fam or --with-inotify will enable the code for real time
3498       indexing. Inotify support is enabled by default on recent Linux
3499       systems.
3500
3501     o --with-qzeitgeist will enable sending Zeitgeist events about the
3502       visited search results, and needs the qzeitgeist package.
3503
3504     o --disable-webkit is available from version 1.17 to implement the
3505       result list with a Qt QTextBrowser instead of a WebKit widget if you
3506       do not or can't depend on the latter.
3507
3508     o --disable-idxthreads is available from version 1.19 to suppress
3509       multithreading inside the indexing process. You can also use the
3510       run-time configuration to restrict recollindex to using a single
3511       thread, but the compile-time option may disable a few more unused
3512       locks. This only applies to the use of multithreading for the core
3513       index processing (data input). The Recoll monitor mode always uses at
3514       least two threads of execution.
3515
3516     o --disable-python-module will avoid building the Python module.
3517
     o --disable-xattr will prevent fetching data from file extended
       attributes. Beyond a few standard attributes, fetching extended
       attributes data can only be useful if some application stores data in
       there, and also needs some simple configuration (see comments in the
       fields configuration file).
3523
     o --enable-camelcase will enable splitting camelCase words. This is not
       enabled by default as it has the unfortunate side-effect of making
       some phrase searches quite confusing: for example, "MySQL manual"
       would be matched by "MySQL manual" and "my sql manual" but not
       "mysql manual" (only inside phrase searches).
3529
     o --with-file-command Specify the version of the 'file' command to use
       (e.g. --with-file-command=/usr/local/bin/file). Can be useful to
       enable the GNU version on systems where the native one is bad.
3533
3534     o --disable-qtgui Disable the Qt interface. Will allow building the
3535       indexer and the command line search program in absence of a Qt
3536       environment.
3537
3538     o --disable-x11mon Disable X11 connection monitoring inside recollindex.
3539       Together with --disable-qtgui, this allows building recoll without Qt
3540       and X11.
3541
     o --disable-pic will compile Recoll with position-dependent code. This
       is incompatible with building the KIO or the Python or PHP extensions,
       but might yield very marginally faster code.
3545
3546     o Of course the usual autoconf configure options, like --prefix apply.
3547
3548   Normal procedure:
3549
3550         cd recoll-xxx
3551         ./configure
3552         make
3553         (practices usual hardship-repelling invocations)
3554
3555
3556   There is little auto-configuration. The configure script will mainly link
3557   one of the system-specific files in the mk directory to mk/sysconf. If
3558   your system is not known yet, it will tell you as much, and you may want
3559   to manually copy and modify one of the existing files (the new file name
3560   should be the output of uname -s).
3561
3562    5.3.2.1. Building on Solaris
3563
   We did not test building the GUI on Solaris for recent versions. You will
   need at least Qt 4.4. There are some hints on an old web site page, which
   may still be valid.

   Someone did test the 1.19 indexer and Python module build: they do work,
   with a few minor glitches. Be sure to use GNU make and install.
3570
3571  5.3.3. Installation
3572
3573   Either type make install or execute recollinstall prefix, in the root of
3574   the source tree. This will copy the commands to prefix/bin and the sample
3575   configuration files, scripts and other shared data to prefix/share/recoll.
3576
   If the installation prefix given to recollinstall is different from either
   the system default or the value which was specified when executing
   configure (as in configure --prefix /some/path), you will have to set the
   RECOLL_DATADIR environment variable to indicate where the shared data is
   to be found (e.g. for (ba)sh: export
   RECOLL_DATADIR=/some/path/share/recoll).
3583
3584   You can then proceed to configuration.
3585
5.4. Configuration overview
3587
3588   Most of the parameters specific to the recoll GUI are set through the
3589   Preferences menu and stored in the standard Qt place
3590   ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
3591   this by hand.
3592
3593   Recoll indexing options are set inside text configuration files located in
3594   a configuration directory. There can be several such directories, each of
3595   which defines the parameters for one index.
3596
3597   The configuration files can be edited by hand or through the Index
3598   configuration dialog (Preferences menu). The GUI tool will try to respect
3599   your formatting and comments as much as possible, so it is quite possible
3600   to use both ways.
3601
3602   The most accurate documentation for the configuration parameters is given
3603   by comments inside the default files, and we will just give a general
3604   overview here.
3605
3606   By default, for each index, there are two sets of configuration files.
3607   System-wide configuration files are kept in a directory named like
3608   /usr/[local/]share/recoll/examples, and define default values, shared by
3609   all indexes. For each index, a parallel set of files defines the
3610   customized parameters.
3611
3612   In addition (as of Recoll version 1.19.7), it is possible to specify two
3613   additional configuration directories which will be stacked before and
3614   after the user configuration directory. These are defined by the
3615   RECOLL_CONFTOP and RECOLL_CONFMID environment variables. Values from
3616   configuration files inside the top directory will override user ones,
3617   values from configuration files inside the middle directory will override
3618   system ones and be overridden by user ones. These two variables may be of
3619   use to applications which augment Recoll functionality, and need to add
3620   configuration data without disturbing the user's files. Please note that
3621   the two, currently single, values will probably be interpreted as
3622   colon-separated lists in the future: do not use colon characters inside
3623   the directory paths.
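
   The resulting lookup order can be pictured with the following Python
   sketch, where each configuration level is modelled as a plain dictionary.
   It only illustrates the precedence rules stated above: effective_config
   and its arguments are hypothetical names, not part of any Recoll API.

```python
from collections import ChainMap


def effective_config(system, user, top=None, mid=None):
    """Stack configuration levels; the first map holding a name wins.

    Precedence, highest first: top, user, mid, system.
    """
    layers = [top or {}, user, mid or {}, system]
    return ChainMap(*layers)


if __name__ == "__main__":
    conf = effective_config(
        system={"defaultcharset": "iso8859-1", "nocjk": "0"},
        user={"defaultcharset": "koi8-r"},
        mid={"defaultcharset": "utf-8", "nocjk": "1"},
    )
    # The user value hides the middle one, which hides the system one.
    print(conf["defaultcharset"], conf["nocjk"])
```

   A value set in the middle directory is only visible when the user
   configuration does not define the same parameter.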
3624
3625   The default location of the configuration is the .recoll directory in your
3626   home. Most people will only use this directory.
3627
3628   This location can be changed, or others can be added with the
3629   RECOLL_CONFDIR environment variable or the -c option parameter to recoll
3630   and recollindex.
3631
3632   If the .recoll directory does not exist when recoll or recollindex are
3633   started, it will be created with a set of empty configuration files.
3634   recoll will give you a chance to edit the configuration file before
3635   starting indexing. recollindex will proceed immediately. To avoid
3636   mistakes, the automatic directory creation will only occur for the default
3637   location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you
3638   will have to create the directory).
3639
3640   All configuration files share the same format. For example, a short
3641   extract of the main configuration file might look as follows:
3642
3643         # Space-separated list of directories to index.
3644         topdirs =  ~/docs /usr/share/doc
3645
3646         [~/somedirectory-with-utf8-txt-files]
3647         defaultcharset = utf-8
3648
3649
3650   There are three kinds of lines:
3651
3652     o Comment (starts with #) or empty.
3653
     o Parameter assignment (name = value).
3655
3656     o Section definition ([somedirname]).
3657
   Depending on the type of configuration file, section definitions either
   separate groups of parameters or allow redefining some parameters for a
   directory sub-tree. They stay in effect until another section definition,
   or the end of file, is encountered. Some of the parameters used for
   indexing are looked up hierarchically from the current directory location
   upwards. Not all parameters can be meaningfully redefined: whether this
   is possible is specified for each parameter in the next section.
3665
3666   When found at the beginning of a file path, the tilde character (~) is
3667   expanded to the name of the user's home directory, as a shell would do.
3668
3669   White space is used for separation inside lists. List elements with
3670   embedded spaces can be quoted using double-quotes.
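
   As an illustration of the format described above, here is a minimal
   Python sketch which reads such a file into a dictionary of sections. It
   is not the parser used by Recoll: parse_conf, split_list and expand_tilde
   are names made up for this example, and treating a trailing backslash as
   a line continuation is an assumption based on the multi-line values found
   in the default files.

```python
import os
import shlex


def parse_conf(text):
    """Parse Recoll-style configuration text.

    Returns {section: {name: value}}. Parameters found before any section
    definition are stored under the empty section name ''.
    """
    conf = {"": {}}
    section = ""
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith("\\"):
            # Assumed continuation syntax: join with the next line.
            pending = line[:-1].rstrip() + " "
            continue
        if not line or line.startswith("#"):
            continue  # comment or empty line
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()  # section definition
            conf.setdefault(section, {})
            continue
        name, sep, value = line.partition("=")
        if sep:
            conf[section][name.strip()] = value.strip()  # name = value
    return conf


def split_list(value):
    """Split a list value on white space, keeping quoted elements whole.

    shlex also accepts single quotes and backslash escapes, which is more
    than the format description above promises.
    """
    return shlex.split(value)


def expand_tilde(path):
    """Expand a leading ~ to the user's home directory, as a shell would."""
    return os.path.expanduser(path)
```

   With the extract shown earlier, parse_conf(text)['']['topdirs'] would hold
   the string "~/docs /usr/share/doc", which split_list then turns into two
   paths.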
3671
3672   Encoding issues. Most of the configuration parameters are plain ASCII. Two
3673   particular sets of values may cause encoding issues:
3674
3675     o File path parameters may contain non-ascii characters and should use
3676       the exact same byte values as found in the file system directory.
3677       Usually, this means that the configuration file should use the system
3678       default locale encoding.
3679
3680     o The unac_except_trans parameter should be encoded in UTF-8. If your
3681       system locale is not UTF-8, and you need to also specify non-ascii
3682       file paths, this poses a difficulty because common text editors cannot
3683       handle multiple encodings in a single file. In this relatively
3684       unlikely case, you can edit the configuration file as two separate
3685       text files with appropriate encodings, and concatenate them to create
3686       the complete configuration.
3687
3688  5.4.1. Environment variables
3689
3690   RECOLL_CONFDIR
3691
3692           Defines the main configuration directory.
3693
3694   RECOLL_TMPDIR, TMPDIR
3695
3696           Locations for temporary files, in this order of priority. The
3697           default if none of these is set is to use /tmp. Big temporary
3698           files may be created during indexing, mostly for decompressing,
3699           and also for processing, e.g. email attachments.
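
           The lookup order can be summarized by the short Python sketch
           below. The function name recoll_tmpdir is made up for this
           example, and treating an empty value like an unset one is an
           assumption.

```python
import os


def recoll_tmpdir(environ=None):
    """Return the directory for temporary files.

    Checks RECOLL_TMPDIR, then TMPDIR, and falls back to /tmp.
    """
    env = os.environ if environ is None else environ
    for name in ("RECOLL_TMPDIR", "TMPDIR"):
        value = env.get(name)
        if value:  # assumption: an empty value counts as unset
            return value
    return "/tmp"
```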
3700
3701   RECOLL_CONFTOP, RECOLL_CONFMID
3702
3703           Allow adding configuration directories with priorities below and
3704           above the user directory (see above the Configuration overview
3705           section for details).
3706
3707   RECOLL_EXTRA_DBS, RECOLL_ACTIVE_EXTRA_DBS
3708
           Help for setting up external indexes. See the documentation about
           external indexes for explanations.
3711
3712   RECOLL_DATADIR
3713
           Defines a replacement for the default location of Recoll data
           files, normally found in, e.g., /usr/share/recoll.

   RECOLL_FILTERSDIR

           Defines a replacement for the default location of Recoll filters,
           normally found in, e.g., /usr/share/recoll/filters.
3721
3722   ASPELL_PROG
3723
3724           aspell program to use for creating the spelling dictionary. The
3725           result has to be compatible with the libaspell which Recoll is
3726           using.
3727
3732  5.4.2. The main configuration file, recoll.conf
3733
3734   recoll.conf is the main configuration file. It defines things like what to
3735   index (top directories and things to ignore), and the default character
3736   set to use for document types which do not specify it internally.
3737
3738   The default configuration will index your home directory. If this is not
3739   appropriate, start recoll to create a blank configuration, click Cancel,
3740   and edit the configuration file before restarting the command. This will
3741   start the initial indexing, which may take some time.
3742
3743   Most of the following parameters can be changed from the Index
3744   Configuration menu in the recoll interface. Some can only be set by
3745   editing the configuration file.
3746
3747    5.4.2.1. Parameters affecting what documents we index:
3748
3749   topdirs
3750
3751           Specifies the list of directories or files to index (recursively
3752           for directories). You can use symbolic links as elements of this
3753           list. See the followLinks option about following symbolic links
3754           found under the top elements (not followed by default).
3755
3756   skippedNames
3757
3758           A space-separated list of wildcard patterns for names of files or
3759           directories that should be completely ignored. The list defined in
3760           the default file is:
3761
3762 skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
3763                *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
3764                .recoll* xapiandb recollrc recoll.conf
3765
3766           The list can be redefined at any sub-directory in the indexed
3767           area.
3768
3769           The top-level directories are not affected by this list (that is,
3770           a directory in topdirs might match and would still be indexed).
3771
3772           The list in the default configuration does not exclude hidden
3773           directories (names beginning with a dot), which means that it may
3774           index quite a few things that you do not want. On the other hand,
3775           email user agents like thunderbird usually store messages in
3776           hidden directories, and you probably want this indexed. One
3777           possible solution is to have .* in skippedNames, and add things
3778           like ~/.thunderbird or ~/.evolution in topdirs.
3779
3780           Not even the file names are indexed for patterns in this list. See
3781           the noContentSuffixes variable for an alternative approach which
3782           indexes the file names.
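
           The matching rules can be tried out with the Python sketch
           below, which uses the standard fnmatch module as a stand-in for
           the wildcard matching performed by the indexer. The is_skipped
           function is hypothetical, and case-sensitive matching is an
           assumption (suggested by the default list having both Cache and
           cache*).

```python
from fnmatch import fnmatchcase

DEFAULT_SKIPPED_NAMES = (
    "#* bin CVS Cache cache* caughtspam tmp .thumbnails .svn "
    "*~ .beagle .git .hg .bzr loop.ps .xsession-errors "
    ".recoll* xapiandb recollrc recoll.conf"
).split()


def is_skipped(name, patterns=DEFAULT_SKIPPED_NAMES, is_topdir=False):
    """Tell whether a file or directory name matches a skippedNames pattern.

    Entries of topdirs are never skipped, even when their name matches.
    """
    if is_topdir:
        return False
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    hidden_too = DEFAULT_SKIPPED_NAMES + [".*"]
    # Hidden directories are skipped, unless listed in topdirs.
    print(is_skipped(".thunderbird", hidden_too))
    print(is_skipped(".thunderbird", hidden_too, is_topdir=True))
```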
3783
3784   noContentSuffixes
3785
3786           This is a list of file name endings (not wildcard expressions, nor
3787           dot-delimited suffixes). Only the names of matching files will be
3788           indexed (no attempt at MIME type identification, no decompression,
3789           no content indexing). This can be redefined for subdirectories,
3790           and edited from the GUI. The default value is:
3791
 noContentSuffixes = .md5 .map \
        .o .lib .dll .a .sys .exe .com \
        .mpp .mpt .vsd \
        .img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz \
        .dat .bak .rdf .log.gz .log .db .msf .pid \
        ,v ~ #
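
           Because the values are plain name endings, a simple string
           comparison is enough to model the test, as in the Python sketch
           below. The name_only function is hypothetical, and the
           comparison is assumed to be case-sensitive.

```python
DEFAULT_NO_CONTENT_SUFFIXES = (
    ".md5 .map .o .lib .dll .a .sys .exe .com .mpp .mpt .vsd "
    ".img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz "
    ".dat .bak .rdf .log.gz .log .db .msf .pid ,v ~ #"
).split()


def name_only(filename, suffixes=DEFAULT_NO_CONTENT_SUFFIXES):
    """True if only the file name should be indexed, not the content."""
    # str.endswith accepts a tuple and tests every ending in turn, so
    # endings such as ',v' or '~' need no leading dot.
    return filename.endswith(tuple(suffixes))
```

           This shows why entries like ,v or ~ work here although they
           are neither wildcard patterns nor dot-delimited suffixes.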
3798
3799   skippedPaths and daemSkippedPaths
3800
3801           A space-separated list of patterns for paths of files or
3802           directories that should be skipped. There is no default in the
3803           sample configuration file, but the code always adds the
3804           configuration and database directories in there.
3805
3806           skippedPaths is used both by batch and real time indexing.
3807           daemSkippedPaths can be used to specify things that should be
3808           indexed at startup, but not monitored.
3809
3810           Example of use for skipping text files only in a specific
3811           directory:
3812
3813 skippedPaths = ~/somedir/*.txt
3814
3815
3816   skippedPathsFnmPathname
3817
3818           The values in the *skippedPaths variables are matched by default
3819           with fnmatch(3), with the FNM_PATHNAME flag. This means that '/'
3820           characters must be matched explicitly. You can set
3821           skippedPathsFnmPathname to 0 to disable the use of FNM_PATHNAME
3822           (meaning that /*/dir3 will match /dir1/dir2/dir3).
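
           The difference can be reproduced with the Python sketch below.
           Python's fnmatch module has no FNM_PATHNAME flag and always lets
           wildcards match '/', so the flag is emulated here by matching
           the path one component at a time. The match_path function is
           made up for this example and ignores corner cases such as
           bracket expressions containing a '/'.

```python
from fnmatch import fnmatchcase


def match_path(pattern, path, fnm_pathname=True):
    """Match path against pattern, optionally emulating FNM_PATHNAME."""
    if not fnm_pathname:
        # Python's fnmatch lets '*' match '/' as well, which is the
        # behaviour of fnmatch(3) without the FNM_PATHNAME flag.
        return fnmatchcase(path, pattern)
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    # With FNM_PATHNAME every '/' must be matched explicitly, so both
    # sides need the same number of components.
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat)
               for part, pat in zip(path_parts, pattern_parts))


if __name__ == "__main__":
    print(match_path("/*/dir3", "/dir1/dir2/dir3"))
    print(match_path("/*/dir3", "/dir1/dir2/dir3", fnm_pathname=False))
```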
3823
3824   zipSkippedNames
3825
3826           A space-separated list of patterns for names of files or
3827           directories that should be ignored inside zip archives. This is
3828           used directly by the zip handler, and has a function similar to
3829           skippedNames, but works independently. Can be redefined for
3830           filesystem subdirectories. For versions up to 1.19, you will need
3831           to update the Zip handler and install a supplementary Python
3832           module. The details are described on the Recoll wiki.
3833
3834   followLinks
3835
3836           Specifies if the indexer should follow symbolic links while
3837           walking the file tree. The default is to ignore symbolic links to
3838           avoid multiple indexing of linked files. No effort is made to
3839           avoid duplication when this option is set to true. This option can
3840           be set individually for each of the topdirs members by using
3841           sections. It can not be changed below the topdirs level.
3842
3843   indexedmimetypes
3844
3845           Recoll normally indexes any file which it knows how to read. This
3846           list lets you restrict the indexed MIME types to what you specify.
3847           If the variable is unspecified or the list empty (the default),
3848           all supported types are processed. Can be redefined for
3849           subdirectories.
3850
3851   excludedmimetypes
3852
3853           This list lets you exclude some MIME types from indexing. Can be
3854           redefined for subdirectories.
3855
3856   compressedfilemaxkbs
3857
3858           Size limit for compressed (.gz or .bz2) files. These need to be
3859           decompressed in a temporary directory for identification, which
3860           can be very wasteful if 'uninteresting' big compressed files are
3861           present. Negative means no limit, 0 means no processing of any
3862           compressed file. Defaults to -1.
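
           The three cases for the value can be written down as the
           following Python sketch. The may_decompress function is
           hypothetical, and whether a file of exactly the limit size is
           still processed is an assumption.

```python
def may_decompress(size_kbs, compressedfilemaxkbs=-1):
    """Decide whether a compressed file of the given size gets processed."""
    if compressedfilemaxkbs < 0:
        return True   # negative: no limit
    if compressedfilemaxkbs == 0:
        return False  # zero: never process compressed files
    return size_kbs <= compressedfilemaxkbs  # assumed inclusive limit
```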
3863
3864   textfilemaxmbs
3865
3866           Maximum size for text files. Very big text files are often
3867           uninteresting logs. Set to -1 to disable (default 20MB).
3868
3869   textfilepagekbs
3870
3871           If set to other than -1, text files will be indexed as multiple
3872           documents of the given page size. This may be useful if you do
3873           want to index very big text files as it will both reduce memory
3874           usage at index time and help with loading data to the preview
3875           window. A size of a few megabytes would seem reasonable (default:
3876           1MB).
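
           The idea of paging can be sketched in Python as follows. This is
           only an illustration: split_pages is a made-up name, the page
           size is taken as a multiple of 1024 bytes, and the sketch cuts
           at fixed byte offsets, whereas a real implementation would
           presumably avoid cutting in the middle of a line or character.

```python
def split_pages(data, textfilepagekbs):
    """Yield successive pages of at most textfilepagekbs kilobytes."""
    if textfilepagekbs == -1:
        yield data  # paging disabled: one document for the whole file
        return
    page_size = textfilepagekbs * 1024
    for start in range(0, len(data), page_size):
        yield data[start:start + page_size]
```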
3877
3878   membermaxkbs
3879
3880           This defines the maximum size in kilobytes for an archive member
3881           (zip, tar or rar at the moment). Bigger entries will be skipped.
3882
3883   indexallfilenames
3884
           Recoll indexes file names in a special section of the database to
           allow specific file name searches using wild cards. This
           parameter decides if file name indexing is performed only for
           files with MIME types that would qualify them for full text
           indexing, or for all files inside the selected subtrees,
           independently of MIME type.
3891
3892   usesystemfilecommand
3893
3894           Decide if we execute a system command (file -i by default) as a
3895           final step for determining the MIME type for a file (the main
3896           procedure uses suffix associations as defined in the mimemap
3897           file). This can be useful for files with suffix-less names, but it
3898           will also cause the indexing of many bogus "text" files.
3899
3900   systemfilecommand
3901
           Command to use for MIME type determination if
           usesystemfilecommand is set. Recent versions of xdg-mime sometimes
           work better than file.
3905
3906   processwebqueue
3907
3908           If this is set, process the directory where Web browser plugins
3909           copy visited pages for indexing.
3910
3911   webqueuedir
3912
3913           The path to the web indexing queue. This is hard-coded in the
3914           Firefox plugin as ~/.recollweb/ToIndex so there should be no need
3915           to change it.
3916
3917    5.4.2.2. Parameters affecting how we generate terms:
3918
3919   Changing some of these parameters will imply a full reindex. Also, when
3920   using multiple indexes, it may not make sense to search indexes that don't
3921   share the values for these parameters, because they usually affect both
3922   search and index operations.
3923
3924   indexStripChars
3925
3926           Decide if we strip characters of diacritics and convert them to
3927           lower-case before terms are indexed. If we don't, searches
3928           sensitive to case and diacritics can be performed, but the index
3929           will be bigger, and some marginal weirdness may sometimes occur.
3930           The default is a stripped index (indexStripChars = 1) for now.
3931           When using multiple indexes for a search, this parameter must be
3932           defined identically for all. Changing the value implies an index
3933           reset.
3934
3935   maxTermExpand
3936
3937           Maximum expansion count for a single term (e.g.: when using
3938           wildcards). The default of 10000 is reasonable and will avoid
3939           queries that appear frozen while the engine is walking the term
3940           list.
3941
3942   maxXapianClauses
3943
3944           Maximum number of elementary clauses we can add to a single Xapian
3945           query. In some cases, the result of term expansion can be
3946           multiplicative, and we want to avoid using excessive memory. The
3947           default of 100 000 should be both high enough in most cases and
3948           compatible with current typical hardware configurations.
3949
3950   nonumbers
3951
           If this is set to true, no terms will be generated for numbers.
           For example "123", "1.5e6" or "192.168.1.4" would not be indexed
           ("value123" would still be). Numbers are often quite interesting
           to search for, and this should probably not be set except for
           special situations, e.g. scientific documents with huge amounts of
           numbers in them. This can only be set for a whole index, not for a
           subtree.
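
           The examples above are consistent with a test like the one in
           the Python sketch below. The regular expression is an assumed
           definition of what counts as a number, chosen to fit the
           examples, and not the rule actually used by the indexer.
           keep_term is a hypothetical name.

```python
import re

# Assumed definition of a "number": optional sign, digits with '.' or ','
# separators, and an optional exponent.
NUMBER_RE = re.compile(r"[+-]?\d[\d.,]*(?:e[+-]?\d+)?", re.IGNORECASE)


def keep_term(term, nonumbers=False):
    """Return False for terms that nonumbers would drop from the index."""
    if nonumbers and NUMBER_RE.fullmatch(term):
        return False
    return True
```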
3959
3960   nocjk
3961
           If this is set to true, the specific East Asian (Chinese, Japanese,
           Korean) character/word splitting is turned off. This will save a
           small amount of CPU time if you have no CJK documents. If your
           document base does include such text but you are not interested in
           searching it, setting nocjk may be a significant time and space
           saver.
3967
3968   cjkngramlen
3969
3970           This lets you adjust the size of n-grams used for indexing CJK
3971           text. The default value of 2 is probably appropriate in most
3972           cases. A value of 3 would allow more precision and efficiency on
3973           longer words, but the index will be approximately twice as large.
3974
3975   indexstemminglanguages
3976
3977           A list of languages for which the stem expansion databases will be
3978           built. See recollindex(1) or use the recollindex -l command for
3979           possible values. You can add a stem expansion database for a
3980           different language by using recollindex -s, but it will be deleted
3981           during the next indexing. Only languages listed in the
3982           configuration file are permanent.
3983
3984   defaultcharset
3985
3986           The name of the character set used for files that do not contain a
3987           character set definition (e.g. plain text files). This can be
3988           redefined for any sub-directory. If it is not set at all, the
3989           character set used is the one defined by the NLS environment
3990           (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
3991
3992   unac_except_trans
3993
3994           This is a list of characters, encoded in UTF-8, which should be
3995           handled specially when converting text to unaccented lowercase.
3996           For example, in Swedish, the letter a with diaeresis has full
3997           alphabet citizenship and should not be turned into an a. Each
3998           element in the space-separated list has the special character as
3999           first element and the translation following. The handling of both
4000           the lowercase and upper-case versions of a character should be
4001           specified, as membership in the list turns off both standard
4002           accent and case processing. Example for Swedish:
4003
4004 unac_except_trans = åå Åå ää Ää öö Öö
4005
4006
4007           Note that the translation is not limited to a single character,
4008           you could very well have something like üue in the list.
4009
4010           The default value set for unac_except_trans can't be listed here
4011           because I have trouble with SGML and UTF-8, but it only contains
4012           ligature decompositions: german ss, oe, ae, fi, fl.
4013
4014           This parameter can't be defined for subdirectories, it is global,
4015           because there is no way to do otherwise when querying. If you have
4016           document sets which would need different values, you will have to
4017           index and query them separately.
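
  As an illustration only, the effect of such an exception list can be
  sketched in Python. This is not the unac code used by Recoll: the function
  names are made up for this example, and Unicode decomposition from the
  standard unicodedata module stands in for the real unaccenting tables.

```python
import unicodedata


def parse_exceptions(value):
    """Map each special character to its translation (hypothetical helper)."""
    # Each space-separated element is: special character, then translation.
    return {element[0]: element[1:] for element in value.split()}


def strip_text(text, exceptions):
    """Lowercase and unaccent text, except for characters in the list."""
    out = []
    for ch in text:
        if ch in exceptions:
            # Listed characters bypass both case and accent processing.
            out.append(exceptions[ch])
            continue
        decomposed = unicodedata.normalize("NFD", ch.lower())
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)
```

  With the Swedish list above, an upper-case A with diaeresis is mapped to
  its lowercase form with the diaeresis kept, whereas an empty list would
  reduce it to a plain a.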
4018
4019   maildefcharset
4020
4021           This can be used to define the default character set specifically
4022           for email messages which don't specify it. This is mainly useful
4023           for readpst (libpst) dumps, which are utf-8 but do not say so.
4024
4025   localfields
4026
4027           This allows setting fields for all documents under a given
4028           directory. Typical usage would be to set an "rclaptg" field, to be
4029           used in mimeview to select a specific viewer. If several fields
4030           are to be set, they should be separated with a semi-colon (';')
4031           character, which there is currently no way to escape. Also note
4032           the initial semi-colon. Example: localfields= ;rclaptg=gnus;other
4033           = val, then select the specific viewer with mimetype|tag=... in
4034           mimeview.
4035
4036   testmodifusemtime
4037
4038           If true, use mtime instead of default ctime to determine if a file
4039           has been modified (in addition to size, which is always used).
4040           Setting this can reduce re-indexing on systems where extended
4041           attributes are modified (by some other application), but not
4042           indexed (changing extended attributes only affects ctime). Notes:
4043
4044              o This may prevent detection of change in some marginal file
4045                rename cases (the target would need to have the same size and
4046                mtime).
4047
4048              o You should probably also set noxattrfields to 1 in this case,
4049                except if you still prefer to perform xattr indexing, for
4050                example if the local file update pattern makes it of value
4051                (as in general, there is a risk for pure extended attributes
4052                updates without file modification to go undetected).
4053
4054           Perform a full index reset after changing the value of this
4055           parameter.
4056
4057   noxattrfields
4058
4059           Recoll versions 1.19 and later automatically translate file
4060           extended attributes into document fields (to be processed
4061           according to the parameters from the fields file). Setting this
4062           variable to 1 will disable the behaviour.
4063
4064   metadatacmds
4065
4066           This allows executing external commands for each file and storing
4067           the output in Recoll document fields. This could be used for
4068           example to index external tag data. The value is a list of field
4069           names and commands. Do not forget the initial semi-colon. Example:
4070
4071 [/some/area/of/the/fs]
4072 metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
4073
4074
4075           As a specially disgusting hack brought by Recoll 1.19.7, if a
4076           "field name" begins with rclmulti, the data returned by the
4077           command is expected to contain multiple field values, in
4078           configuration file format. This allows setting several fields by
4079           executing a single command. Example:
4080
4081 metadatacmds = ; rclmulti1 = somecmd %f
4082
4083
4084           If somecmd returns data in the form of:
4085
4086 field1 = value1
4087 field2 = value for field2
4088
4089
4090           field1 and field2 will be set inside the document metadata.
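
  A command suitable for an rclmulti entry could look like the following
  Python sketch. Everything in it is an assumption made for illustration:
  the field names field1 and field2 are taken from the example above, and
  deriving their values from the file name is arbitrary. The only point is
  the output format, one "name = value" line per field.

```python
import os
import sys


def multi_field_output(path):
    """Return 'name = value' lines for a file, one field per line."""
    stem, ext = os.path.splitext(os.path.basename(path))
    fields = {
        "field1": stem,                      # arbitrary example value
        "field2": ext.lstrip(".") or "none"  # arbitrary example value
    }
    return "".join(f"{name} = {value}\n" for name, value in fields.items())


# When run as a command, the file path (%f) is the single argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.write(multi_field_output(sys.argv[1]))
```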
4091
4092    5.4.2.3. Parameters affecting where and how we store things:
4093
4094   dbdir
4095
4096           The name of the Xapian data directory. It will be created if
4097           needed when the index is initialized. If this is not an absolute
4098           path, it will be interpreted relative to the configuration
4099           directory. The value can have embedded spaces but starting or
4100           trailing spaces will be trimmed. You cannot use quotes here.
4101
4102   idxstatusfile
4103
4104           The name of the scratch file where the indexer process updates its
4105           status. Default: idxstatus.txt inside the configuration directory.
4106
4107   maxfsoccuppc
4108
4109           Maximum file system occupation before we stop indexing. The value
4110           is a percentage, corresponding to what the "Capacity" df output
4111           column shows. The default value is 0, meaning no checking.
4112
4113   mboxcachedir
4114
4115           The directory where mbox message offsets cache files are held.
4116           This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
4117           to share a directory between different configurations.
4118
4119   mboxcacheminmbs
4120
4121           The minimum mbox file size over which we cache the offsets. There
4122           is really no sense in caching offsets for small files. The default
4123           is 5 MB.
4124
4125   webcachedir
4126
4127           This is only used by the web browser plugin indexing code, and
4128           defines where the cache for visited pages will live. Default:
4129           $RECOLL_CONFDIR/webcache
4130
4131   webcachemaxmbs
4132
4133           This is only used by the web browser plugin indexing code, and
4134           defines the maximum size for the web page cache. Default: 40 MB.
4135           Quite unfortunately, this is only taken into account when creating
4136           the cache file. You need to delete the file for a change to be
4137           taken into account.
4138
4139   idxflushmb
4140
4141           Threshold (megabytes of new text data) where we flush from memory
4142           to disk index. Setting this can help control memory usage. A value
4143           of 0 means no explicit flushing, letting Xapian use its own
4144           default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD)
4145           documents, which gives little memory usage control, as memory
4146           usage also depends on average document size. The default value is
4147           10, and it is probably a bit low. If your system usually has free
4148           memory, you can try higher values between 20 and 80. In my
4149           experience, values beyond 100 are always counterproductive.
4150
4151    5.4.2.4. Parameters affecting multithread processing
4152
4153   The Recoll indexing process recollindex can use multiple threads to speed
4154   up indexing on multiprocessor systems. The work done to index files is
4155   divided in several stages and some of the stages can be executed by
4156   multiple threads. The stages are:
4157
4158    1. File system walking: this is always performed by the main thread.
4159    2. File conversion and data extraction.
4160    3. Text processing (splitting, stemming, etc.)
4161    4. Xapian index update.
4162
4163   You can also read a longer document about the transformation of Recoll
4164   indexing to multithreading.
4165
4166   The threads configuration is controlled by two configuration file
4167   parameters.
4168
4169   thrQSizes
4170
4171           This variable defines the job input queues configuration. There
4172           are three possible queues for stages 2, 3 and 4, and this
4173           parameter should give the queue depth for each stage (three
4174           integer values). If a value of -1 is used for a given stage, no
4175           queue is used, and the thread will go on performing the next
4176           stage. In practice, deep queues have not been shown to increase
4177           performance. A value of 0 for the first queue tells Recoll to
4178           perform autoconfiguration (no need for the two other values in
4179           this case) - this is the default configuration.
4180
4181   thrTCounts
4182
4183           This defines the number of threads used for each stage. If a value
4184           of -1 is used for one of the queue depths, the corresponding
4185           thread count is ignored. It makes no sense to use a value other
4186           than 1 for the last stage because updating the Xapian index is
4187           necessarily single-threaded (and protected by a mutex).
4188
4189   The following example would use three queues (of depth 2), and 4 threads
4190   for converting source documents, 2 for processing their text, and one to
4191   update the index. This was tested to be the best configuration on the test
4192   system (quad-processor with multiple disks).
4193
4194 thrQSizes = 2 2 2
4195 thrTCounts =  4 2 1
4196
4197   The following example would use a single queue, and the complete
4198   processing for each document would be performed by a single thread
4199   (several documents will still be processed in parallel in most cases). The
4200   threads will use mutual exclusion when entering the index update stage. In
4201   practice the performance would be close to the previous case in general,
4202   but worse in certain cases (e.g. a Zip archive would be processed purely
4203   sequentially), so the previous approach is preferred. YMMV... The last 2
4204   values for thrTCounts are ignored.
4205
4206 thrQSizes = 2 -1 -1
4207 thrTCounts =  6 1 1
4208
4209   The following example would disable multithreading. Indexing will be
4210   performed by a single thread.
4211
4212 thrQSizes = -1 -1 -1
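
  The rules stated above for the two parameters can be summarized by a small
  Python sketch. It only restates the documented semantics (0 in first
  position means autoconfiguration, -1 means no queue and an ignored thread
  count). The function and stage names are invented for this example and
  this is not how recollindex parses its configuration.

```python
STAGES = ("conversion", "text processing", "index update")


def describe_threads(thr_q_sizes, thr_t_counts=""):
    """Return, per stage, (queue depth, thread count) or None if no queue."""
    depths = [int(v) for v in thr_q_sizes.split()]
    if depths and depths[0] == 0:
        # A first value of 0 requests autoconfiguration.
        return "autoconfiguration"
    counts = [int(v) for v in thr_t_counts.split()]
    plan = {}
    for i, (stage, depth) in enumerate(zip(STAGES, depths)):
        if depth == -1:
            # No queue: the previous stage's thread carries on with this
            # stage, and the matching thread count is ignored.
            plan[stage] = None
        else:
            plan[stage] = (depth, counts[i])
    return plan
```

  For instance, describe_threads("2 -1 -1", "6 1 1") reports a single queue
  of depth 2 served by 6 threads, with no queue for the two later stages.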
4213
4214    5.4.2.5. Miscellaneous parameters:
4215
4216   autodiacsens
4217
4218           If the index is not stripped, decide if we automatically trigger
4219           diacritics sensitivity if the search term has accented characters
4220           (not in unac_except_trans). Else you need to use the query
4221           language and the D modifier to specify diacritics sensitivity.
4222           Default is no.
4223
4224   autocasesens
4225
4226           If the index is not stripped, decide if we automatically trigger
4227           character case sensitivity if the search term has upper-case
4228           characters in any but the first position. Else you need to use the
4229           query language and the C modifier to specify character-case
4230           sensitivity. Default is yes.
4231
4232   loglevel,daemloglevel
4233
4234           Verbosity level for recoll and recollindex. A value of 4 lists
4235           quite a lot of debug/information messages. 2 only lists errors.
4236           The daem variant is specific to the indexing monitor daemon.
4237
4238   logfilename, daemlogfilename
4239
4240           Where the messages should go. 'stderr' can be used as a special
4241           value, and is the default. The daem variant is specific to the
4242           indexing monitor daemon.
4243
4244   checkneedretryindexscript
4245
4246           This defines the name for a command executed by recollindex when
4247           starting indexing. If the exit status of the command is 0,
4248           recollindex retries to index all files which previously could not
4249           be indexed because of data extraction errors. The default value is
4250           a script which checks if any of the common bin directories have
4251           changed (indicating that a helper program may have been
4252           installed).
4253
4254   mondelaypatterns
4255
4256           This allows specifying wildcard path patterns (processed with
4257           fnmatch(3) with 0 flag), to match files which change too often and
4258           for which a delay should be observed before re-indexing. This is a
4259           space-separated list, each entry being a pattern and a time in
4260           seconds, separated by a colon. You can use double quotes if a path
4261           entry contains white space. Example:
4262
4263 mondelaypatterns = *.log:20 "this one has spaces*:10"
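
  To make the value syntax concrete, here is a Python sketch that splits
  such a value into (pattern, seconds) pairs and looks up the delay for a
  path. It is an approximation: shlex and fnmatch from the Python standard
  library stand in for Recoll's own parsing and for fnmatch(3), the function
  names are invented, and which form of the path the monitor actually
  matches against is not specified here.

```python
import fnmatch
import shlex


def parse_delay_patterns(value):
    """Split a mondelaypatterns value into (pattern, seconds) pairs."""
    pairs = []
    # shlex honours the double quotes around entries containing spaces.
    for entry in shlex.split(value):
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


def delay_for(path, pairs):
    """Return the delay of the first matching pattern, or None."""
    for pattern, seconds in pairs:
        # Without special flags, '*' also matches '/' characters.
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return None
```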
4264
4265
4266   monixinterval
4267
4268           Minimum interval (seconds) for processing the indexing queue. The
4269           real time monitor does not process each event when it comes in,
4270           but will wait this time for the queue to accumulate to diminish
4271           overhead and in order to aggregate multiple events to the same
4272           file. Default: 30 seconds.
4273
4274   monauxinterval
4275
4276           Period (in seconds) at which the real time monitor will regenerate
4277           the auxiliary databases (spelling, stemming) if needed. The
4278           default is one hour.
4279
4280   monioniceclass, monioniceclassdata
4281
4282           These allow defining the ionice class and data used by the indexer
4283           (default class 3, no data).
4284
4285   filtermaxseconds
4286
4287           Maximum handler execution time, after which it is aborted. Some
4288           postscript programs just loop...
4289
4290   filtermaxmbytes
4291
4292           Recoll 1.20.7 and later. Maximum handler memory utilisation. This
4293           uses setrlimit(RLIMIT_AS) on most systems (total virtual memory
4294           space size limit). Some programs may start with 500 MBytes of
4295           mapped shared libraries, so take this into account when choosing a
4296           value. The default is a liberal 2000MB.
4297
4298   filtersdir
4299
4300           A directory to search for the external input handler scripts used
4301           to index some types of files. The value should not be changed,
4302           except if you want to modify one of the default scripts. The value
4303           can be redefined for any sub-directory.
4304
4305   iconsdir
4306
4307           The name of the directory where recoll result list icons are
4308           stored. You can change this if you want different images.
4309
4310   idxabsmlen
4311
4312           Recoll stores an abstract for each indexed file inside the
4313           database. The text can come from an actual 'abstract' section in
4314           the document or will just be the beginning of the document. It is
4315           stored in the index so that it can be displayed inside the result
4316           lists without decoding the original file. The idxabsmlen parameter
4317           defines the size of the stored abstract. The default value is 250
4318           bytes. The search interface gives you the choice to display this
4319           stored text or a synthetic abstract built by extracting text
4320           around the search terms. If you always prefer the synthetic
4321           abstract, you can reduce this value and save a little space.
4322
4323   idxmetastoredlen
4324
4325           Maximum stored length for metadata fields. This does not affect
4326           indexing (the whole field is processed anyway), just the amount of
4327           data stored in the index for the purpose of displaying fields
4328           inside result lists or previews. The default value is 150 bytes
4329           which may be too low if you have custom fields.
4330
4331   aspellLanguage
4332
4333           Language definitions to use when creating the aspell dictionary.
4334           The value must match a set of aspell language definition files.
4335           You can type "aspell config" to see where these are installed
4336           (look for data-dir). The default if the variable is not set is to
4337           use your desktop national language environment to guess the value.
4338
4339   noaspell
4340
4341           If this is set, the aspell dictionary generation is turned off.
4342           Useful for cases where you don't need the functionality or when it
4343           is unusable because aspell crashes during dictionary generation.
4344
4345   mhmboxquirks
4346
4347           This allows defining location-related quirks for the mailbox
4348           handler. Currently only the tbird flag is defined, and it should
4349           be set for directories which hold Thunderbird data, as their
4350           folder format is weird.
4351
4352  5.4.3. The fields file
4353
4354   This file contains information about dynamic fields handling in Recoll.
4355   Some very basic fields have hard-wired behaviour, and, mostly, you should
4356   not change the original data inside the fields file. But you can create
4357   custom fields fitting your data and handle them just like they were native
4358   ones.
4359
4360   The fields file has several sections, which each define an aspect of
4361   fields processing. Quite often, you'll have to modify several sections to
4362   obtain the desired behaviour.
4363
4364   We will only give a short description here; you should refer to the
4365   comments inside the default file for more detailed information.
4366
4367   Field names should be lowercase alphabetic ASCII.
4368
4369   [prefixes]
4370
4371           A field becomes indexed (searchable) by having a prefix defined in
4372           this section.
4373
4374   [stored]
4375
4376           A field becomes stored (displayable inside results) by having its
4377           name listed in this section (typically with an empty value).
4378
4379   [aliases]
4380
4381           This section defines lists of synonyms for the canonical names
4382           used inside the [prefixes] and [stored] sections.
4383
4384   [queryaliases]
4385
4386           This section also defines aliases for the canonical field names,
4387           with the difference that the substitution will only be used at
4388           query time, avoiding any possibility that the value would pick up
4389           random metadata from documents.
4390
4391   handler-specific sections
4392
4393           Some input handlers may need specific configuration for handling
4394           fields. Only the email message handler currently has such a
4395           section (named [mail]). It allows indexing arbitrary email headers
4396           in addition to the ones indexed by default. Other such sections
4397           may appear in the future.
4398
4399   Here follows a small example of a personal fields file. This would extract
4400   a specific email header and use it as a searchable field, with data
4401   displayable inside result lists. (Side note: as the email handler does no
4402   decoding on the values, only plain ASCII headers can be indexed, and only
4403   the first occurrence will be used for headers that occur several times.)
4404
4405 [prefixes]
4406 # Index mailmytag contents (with the given prefix)
4407 mailmytag = XMTAG
4408
4409 [stored]
4410 # Store mailmytag inside the document data record (so that it can be
4411 # displayed - as %(mailmytag) - in result lists).
4412 mailmytag =
4413
4414 [queryaliases]
4415 filename = fn
4416 containerfilename = cfn
4417
4418 [mail]
4419 # Extract the X-My-Tag mail header, and use it internally with the
4420 # mailmytag field name
4421 x-my-tag = mailmytag
4422
4423    5.4.3.1. Extended attributes in the fields file
4424
4425   Recoll versions 1.19 and later process user extended file attributes as
4426   document fields by default.
4427
4428   Attributes are processed as fields of the same name, after removing the
4429   user prefix on Linux.
4430
4431   The [xattrtofields] section of the fields file allows specifying
4432   translations from extended attribute names to Recoll field names. An
4433   empty translation disables use of the corresponding attribute data.
4434
4435  5.4.4. The mimemap file
4436
4437   mimemap specifies the file name extension to MIME type mappings.
4438
4439   For file names without an extension, or with an unknown one, the system's
4440   file -i command will be executed to determine the MIME type (this can be
4441   switched off inside the main configuration file).
4442
4443   The mappings can be specified on a per-subtree basis, which may be useful
4444   in some cases. Example: gaim logs have a .txt extension but should be
4445   handled specially, which is possible because they are usually all located
4446   in one place.
4447
4448   The recoll_noindex mimemap variable has been moved to recoll.conf and
4449   renamed to noContentSuffixes, while keeping the same function, as of
4450   Recoll version 1.21. For older Recoll versions, see the documentation for
4451   noContentSuffixes but use recoll_noindex in mimemap.
4452
4453  5.4.5. The mimeconf file
4454
4455   mimeconf specifies how the different MIME types are handled for indexing,
4456   and which icons are displayed in the recoll result lists.
4457
4458   Changing the parameters in the [index] section is probably not a good idea
4459   except if you are a Recoll developer.
4460
4461   The [icons] section allows you to change the icons which are displayed by
4462   recoll in the result lists (the values are the basenames of the PNG images
4463   inside the iconsdir directory, which is specified in recoll.conf).
4464
4465  5.4.6. The mimeview file
4466
4467   mimeview specifies which programs are started when you click on an Open
4468   link in a result list. For example, HTML is normally displayed using
4469   firefox, but you may prefer Konqueror, or your openoffice.org program
4470   might be named ooffice instead of openoffice, etc.
4471
4472   Changes to this file can be done by direct editing, or through the recoll
4473   GUI preferences dialog.
4474
4475   If Use desktop preferences to choose document editor is checked in the
4476   Recoll GUI preferences, all mimeview entries will be ignored except the
4477   one labelled application/x-all (which is set to use xdg-open by default).
4478
4479   In this case, the xallexcepts top level variable defines a list of MIME
4480   type exceptions which will be processed according to the local entries
4481   instead of being passed to the desktop. This is so that specific Recoll
4482   options such as a page number or a search string can be passed to
4483   applications that support them, such as the evince viewer.
4484
4485   As for the other configuration files, the normal usage is to have a
4486   mimeview inside your own configuration directory, with just the
4487   non-default entries, which will override those from the central
4488   configuration file.
4489
4490   All viewer definition entries must be placed under a [view] section.
4491
4492   The keys in the file are normally MIME types. You can add an application
4493   tag to specialize the choice for an area of the filesystem (using a
4494   localfields specification in recoll.conf). The syntax for the key is
4495   mimetype|tag
4496
4497   The nouncompforviewmts entry (placed at the top level, outside of the
4498   [view] section) holds a list of MIME types that should not be
4499   uncompressed before starting the viewer (if they are found compressed, e.g.
4500   mydoc.doc.gz).
4501
4502   The right side of each assignment holds a command to be executed for
4503   opening the file. The following substitutions are performed:
4504
4505     o %D. Document date
4506
4507     o %f. File name. This may be the name of a temporary file if it was
4508       necessary to create one (e.g. to extract a subdocument from a
4509       container).
4510
4511     o %i. Internal path, for subdocuments of containers. The format depends
4512       on the container type. If this appears in the command line, Recoll
4513       will not create a temporary file to extract the subdocument, expecting
4514       the called application (possibly a script) to be able to handle it.
4515
4516     o %M. MIME type
4517
4518     o %p. Page index. Only significant for a subset of document types,
4519       currently only PDF, Postscript and DVI files. Can be used to start the
4520       editor at the right page for a match or snippet.
4521
4522     o %s. Search term. The value will only be set for documents with indexed
4523       page numbers (ie: PDF). The value will be one of the matched search
4524       terms. It would allow pre-setting the value in the "Find" entry inside
4525       Evince for example, for easy highlighting of the term.
4526
4527     o %u. URL.
4528
4529   In addition to the predefined values above, all strings like %(fieldname)
4530   will be replaced by the value of the field named fieldname for the
4531   document. This could be used in combination with field customisation to
4532   help with opening the document.
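
  The substitution rules can be pictured with the following Python sketch.
  It is a simplified model, not Recoll's implementation: quoting and
  temporary file handling are left out, and both the function name and the
  viewer command used in the usage note (someviewer and its options) are
  hypothetical.

```python
import re

# Matches either %(fieldname) or one of the predefined one-letter codes.
_SUBST = re.compile(r"%\(([^)]+)\)|%([DfiMpsu])")


def expand_command(template, values, fields):
    """Expand %X codes from values and %(name) references from fields."""
    def repl(match):
        if match.group(1) is not None:
            # %(fieldname): value of the named document field, if any.
            return fields.get(match.group(1), "")
        # Predefined code: left untouched when no value is available.
        return values.get(match.group(2), match.group(0))
    return _SUBST.sub(repl, template)
```

  For example, expanding "someviewer --page=%p %f" with a page index of 3
  and a file name of /tmp/doc.pdf would give
  "someviewer --page=3 /tmp/doc.pdf".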
4533
4534  5.4.7. The ptrans file
4535
4536   ptrans specifies query-time path translations. These can be useful in
4537   multiple cases.
4538
4539   The file has a section for any index which needs translations, either the
4540   main one or additional query indexes. The sections are named with the
4541   Xapian index directory names. No slash character should exist at the end
4542   of the paths (all comparisons are textual). An example should make things
4543   sufficiently clear:
4544
4545           [/home/me/.recoll/xapiandb]
4546           /this/directory/moved = /to/this/place
4547
4548           [/path/to/additional/xapiandb]
4549           /server/volume1/docdir = /net/server/volume1/docdir
4550           /server/volume2/docdir = /net/server/volume2/docdir
4551
4552
4553  5.4.8. Examples of configuration adjustments
4554
4555    5.4.8.1. Adding an external viewer for a non-indexed type
4556
4557   Imagine that you have some kind of file which does not have indexable
4558   content, but for which you would like to have a functional Open link in
4559   the result list (when found by file name). The file names end in .blob and
4560   can be displayed by application blobviewer.
4561
4562   You need two entries in the configuration files for this to work:
4563
4564     o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
4565       following line:
4566
4567 .blob = application/x-blobapp
4568
4569       Note that the MIME type is made up here, and you could call it
4570       diesel/oil just the same.
4571
4572     o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
4573
4574 application/x-blobapp = blobviewer %f
4575
4576       We are supposing that blobviewer wants a file name parameter here. You
4577       would use %u if it liked URLs better.
4578
4579   If you just wanted to change the application used by Recoll to display a
4580   MIME type which it already knows, you would just need to edit mimeview.
4581   The entries you add in your personal file override those in the central
4582   configuration, which you do not need to alter. mimeview can also be
4583   modified from the GUI.
4584
4585    5.4.8.2. Adding indexing support for a new file type
4586
4587   Let us now imagine that the above .blob files actually contain indexable
4588   text and that you know how to extract it with a command line program.
4589   Getting Recoll to index the files is easy. You need to perform the above
4590   alteration, and also to add data to the mimeconf file (typically in
4591   ~/.recoll/mimeconf):
4592
4593     o Under the [index] section, add the following line (more about the
4594       rclblob indexing script later):
4595
4596 application/x-blobapp = exec rclblob
4597
4598     o Under the [icons] section, you should choose an icon to be displayed
4599       for the files inside the result lists. Icons are normally 64x64 pixel
4600       PNG files which live in /usr/[local/]share/recoll/images.
4601
4602     o Under the [categories] section, you should add the MIME type where it
4603       makes sense (you can also create a category). Categories may be used
4604       for filtering in advanced search.
4605
4606   The rclblob handler should be an executable program or script which exists
4607   inside /usr/[local/]share/recoll/filters. It will be given a file name as
4608   argument and should output the text or HTML contents on the standard
4609   output.
4610
4611   The filter programming section describes in more detail how to write an
4612   input handler.
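
  As a rough idea of what such a handler might look like, here is a minimal
  Python sketch of rclblob. The .blob format is imaginary, so the extraction
  step is an assumption: the script simply keeps runs of printable ASCII
  characters, much like the strings utility. A real handler would call
  whatever program understands the format.

```python
import re
import sys


def extract_text(data):
    """Return printable ASCII runs of 4 or more characters, one per line."""
    runs = re.findall(rb"[\x20-\x7e]{4,}", data)
    return "\n".join(run.decode("ascii") for run in runs)


def main(argv):
    """Read the file named by the single argument, print its text."""
    if len(argv) != 2:
        sys.stderr.write("usage: rclblob <file>\n")
        return 1
    with open(argv[1], "rb") as handle:
        sys.stdout.write(extract_text(handle.read()) + "\n")
    return 0


# The indexer is expected to pass the file name as the only argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```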
4613