.. cyrusman:: squatter(8)

.. author: Nic Bernstein (Onlight)

.. _imap-reference-manpages-systemcommands-squatter:

============
**squatter**
============

Create SQUAT and Xapian indexes for mailboxes

Synopsis
========

.. parsed-literal::

    general:
    **squatter** [ **-C** *config-file* ] [**mode**] [**options**] [**source**]

    i.e.:
    **squatter** [ **-C** *config-file* ] [ **-v** ] [ **-a** ] [ **-S** *seconds* ] [ **-Z** ]
    **squatter** [ **-C** *config-file* ] [ **-v** ] [ **-a** ] [ **-i** ] [ **-N** *name* ] [ **-S** *seconds* ] [ **-r** ] [ **-Z** ] *mailbox*...
    **squatter** [ **-C** *config-file* ] [ **-v** ] [ **-a** ] [ **-i** ] [ **-N** *name* ] [ **-S** *seconds* ] [ **-r** ] [ **-Z** ] **-u** *user*...
    **squatter** [ **-C** *config-file* ] [ **-v** ] [ **-a** ] **-R** [ **-n** *channel* ] [ **-d** ] [ **-S** *seconds* ] [ **-Z** ]
    **squatter** [ **-C** *config-file* ] [ **-v** ] [ **-a** ] **-f** *synclogfile* [ **-S** *seconds* ] [ **-Z** ]
    **squatter** [ **-C** *config-file* ] [ **-v** ] **-t** *srctier(s)*... **-z** *desttier* [ **-B** ] [ **-F** ] [ **-U** ] [ **-T** *reindextiers* ] [ **-X** ] [ **-o** ] [ **-S** *seconds* ] [ **-u** *user*... ]

Description
===========

.. Note::
    The name "**squatter**" once referred both to the SQUAT indexing
    engine and to the command used to create indexes.  Now that Cyrus
    supports more than one index type -- SQUAT and Xapian, as of this
    writing -- the name "**squatter**" refers to the command used to
    control index creation.  The terms "SQUAT" and "SQUAT index(es)"
    refer to the indexes used by the older SQUAT indexing engine.
    From version 3.0 onward, the *search_engine* setting in
    *imapd.conf* determines which search engine is used.

**squatter** creates a new text index for one or more IMAP mailboxes.
The index is a unified index of all of the header and body text
of each message in a given mailbox.  This index is used to significantly
reduce IMAP SEARCH times on a mailbox.

**mode** is one of indexer, search, rolling, synclog, compact or audit.

By default, **squatter** creates an index of ALL messages in the
mailbox, not just those added since the last time it was run.  The
**-i** option selects incremental updates.  Messages appended to the
mailbox after **squatter** has run will NOT be included in the index.
To include new messages, **squatter** must be run again: manually, on
a regular basis via crontab or an entry in the EVENTS section of
:cyrusman:`cyrus.conf(5)`, or continuously in *rolling* mode (**-R**).

In the first synopsis, **squatter** indexes all mailboxes.

In the second synopsis, **squatter** indexes the specified mailbox(es).
The mailboxes are space-separated.

In the third synopsis, **squatter** indexes the mailbox(es) of the
specified user(s).

For the latter two index modes (mailbox, user) one
may optionally specify **-r** to recurse from the specified start, or
**-a** to limit action only to mailboxes which have the shared
*/vendor/cmu/cyrus-imapd/squat* annotation set to "true".
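
As an illustration of the mailbox and user modes, the invocations below
are a minimal sketch built only from the options described on this page.
The mailbox name ``user/jane`` and the usernames are hypothetical, and
the mailbox naming assumes the UNIX hierarchy separator is in use;
adjust them to your own deployment:

    ::

        # incrementally index one mailbox and everything beneath it
        squatter -v -i -r user/jane

        # incrementally index the mailboxes of two users
        squatter -v -i -u jane@example.com joe@example.com

        # index only mailboxes whose squat annotation is "true"
        squatter -v -a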

In the fourth synopsis, **squatter** runs in rolling mode.  In this
mode, **squatter** backgrounds itself and runs as a daemon (unless
**-d** is set), listening to a sync log channel chosen using the **-n**
option, and set up using the *sync_log_channels* setting in
:cyrusman:`imapd.conf(5)`.  Very soon after messages are delivered or
uploaded to mailboxes, **squatter** will incrementally index the
affected mailbox (see notes, below).

In the fifth synopsis, **squatter** reads a single sync log file and
performs incremental indexing on the mailbox(es) listed therein.  This
is sometimes useful for cleaning up after problems with rolling mode.
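
A one-off synclog run might look like the following sketch.  The log
file path is an assumption for illustration only (a leftover log file
under a hypothetical configuration directory); substitute the file you
actually need to replay:

    ::

        # read one sync log file, index each mailbox listed in it, then exit
        squatter -v -f /var/lib/cyrus/sync/squatter/log-1234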

In the sixth synopsis, **squatter** will compact indices from
*srctier(s)* to *desttier*, optionally reindexing (**-X**) or filtering
expunged records (**-F**) in the process.  The optional **-T** flag may be
used to specify members of srctiers which must be reindexed.  The index
files are eventually copied with `rsync -a` and then removed with `rm`.
`rsync` can increase the load average of the system, especially when the
temporary directory is on `tmpfs`.  To throttle `rsync`, it is possible to
modify the call in `imap/search_xapian.c` and pass `-\\-bwlimit=<number>`
as an additional parameter.  The **-o** flag may be used to direct that a
single index be copied, rather than compacted, from *srctier* to
*desttier*.  The **-u** flag may be used to restrict operation to the
specified user(s).
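
A compaction run could be scheduled from the EVENTS section of
:cyrusman:`cyrus.conf(5)` along these lines.  This is only a sketch: the
tier names ``t1`` and ``t2``, the entry name and the schedule are
assumptions, and the tiers must match those defined in your own
:cyrusman:`imapd.conf(5)`:

    ::

        EVENTS {
            # compact tier t1 into t2 once a day, filtering out expunged records
            squattercompact cmd="squatter -t t1 -z t2 -F" at=0300
        }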

For all modes, the **-S** option may be specified, causing **squatter** to
pause *seconds* seconds after each mailbox, to smooth loads.

When using the Xapian engine, the **-Z** option may be specified for
the indexing modes.  This tells **squatter** to consult the Xapian
internally indexed GUIDs, rather than relying on what's stored in
`cyrus.indexed.db`, allowing for recovery from a broken
`cyrus.indexed.db` at the cost of efficiency.
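
For example, if `cyrus.indexed.db` is suspected to be broken for one
user, an incremental run with **-Z** lets **squatter** decide what still
needs indexing from Xapian's own records.  The username below is
hypothetical:

    ::

        # slower than a plain -i run: consults the Xapian-internal GUIDs
        squatter -v -i -Z -u jane@example.com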

.. Note::
    Incremental updates are very inefficient with the SQUAT search
    engine.  If using SQUAT for large and active mailboxes, you should
    run **squatter** periodically as an EVENT in :cyrusman:`cyrus.conf(5)`.

.. Note::
    Messages and mailboxes that have not been indexed CAN still be
    SEARCHed, just not as quickly as those with an index.

**squatter** |default-conf-text|

Options
=======

.. program:: squatter

.. option:: -C config-file

    |cli-dash-c-text|

.. option:: -a, --squat-annot

    Only create indexes for mailboxes which have the shared
    */vendor/cmu/cyrus-imapd/squat* annotation set to "true".

    The value of the */vendor/cmu/cyrus-imapd/squat* annotation is
    inherited by all children of the given mailbox, so an entire
    mailbox tree can be indexed (or not indexed) by setting a single
    annotation on the root of that tree with a value of "true" (or
    "false").  If a mailbox does not have a
    */vendor/cmu/cyrus-imapd/squat* annotation set on it (or does not
    inherit one), then the mailbox is not indexed. In other words, the
    implicit value of */vendor/cmu/cyrus-imapd/squat* is "false".
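
    As a sketch, the annotation can be set from a
    :cyrusman:`cyradm(8)` session with the ``mboxconfig`` command,
    which is assumed here to expose the annotation under the attribute
    name ``squat``.  The server prompt and mailbox name are
    hypothetical; check the cyradm documentation for your version:

        ::

            localhost> mboxconfig user/jane squat true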

.. option:: -A, --audit

    Audits the specified mailboxes (or all) and reports any unindexed messages.
    |master-new-feature|

.. option:: -d, --nodaemon

    In rolling mode, don't background and do emit log messages on
    standard error.  Useful for debugging.
    |v3-new-feature|

.. option:: -B, --skip-locked

    In compact mode, use a non-blocking lock to start, and skip any
    users who have their xapianactive file locked at the time (i.e.
    by another reindex task).
    |master-new-feature|

.. option:: -F, --filter

    In compact mode, filter the resulting database to only include
    messages which are not expunged in mailboxes with existing
    name/uidvalidity.
    |v3-new-feature|

.. option:: -f synclogfile, --synclog=synclogfile

    Read the *synclogfile* and incrementally index all the mailboxes
    listed therein, then exit.
    |v3-new-feature|

.. option:: -h, --help

    Display this usage information.

.. option:: -i, --incremental

    Perform incremental updates where indexes already exist.

.. option:: -N name, --name=name

    Only index mailboxes beginning with *name* while iterating through
    the mailbox list derived from other options.

.. option:: -n channel, --channel=channel

    In rolling mode, specify the name of the sync log *channel* that
    **squatter** will listen to.  The default is "squatter".  This
    channel **must** be defined in :cyrusman:`imapd.conf(5)` before
    being used.
    |v3-new-feature|

.. option:: -o, --copydb

    In compact mode, if only one source database is selected, just copy
    it to the destination rather than compacting.
    |v3-new-feature|

.. option:: -p, --allow-partials

    When indexing, allow messages to be partially indexed. This may
    occur if attachment indexing is enabled but indexing failed for
    one or more attachment body parts. If this flag is set, the
    message is partially indexed and **squatter** continues. Otherwise
    **squatter** aborts with an error. Also see **-P**.
    Xapian only.
    |master-new-feature|

.. option:: -P, --reindex-partials

    When reindexing, attempt to reindex any partially indexed
    messages (see **-p**). Setting this flag implies **-Z**.
    Xapian only.
    |master-new-feature|

.. option:: -L, --reindex-minlevel=level

    When reindexing, index all messages that have an index level
    less than *level*. Currently, Cyrus only supports two index levels:
    A message for which attachment indexing was never attempted has
    index level 1. A message that has indexed attachments, or does not
    contain attachments, has index level 3. Consequently, running
    **squatter** with minlevel set to 3 will cause it to attempt
    reindexing all messages for which attachment indexing was never
    attempted.  Future Cyrus versions may introduce additional levels.
    Setting this flag implies **-Z**.
    Xapian only.
    |master-new-feature|

.. option:: -R, --rolling

    Run in rolling mode; **squatter** runs as a daemon listening to a
    sync log channel and continuously incrementally indexing mailboxes.
    See also **-d** and **-n**.
    |v3-new-feature|

.. option:: -r, --recursive

    Recursively create indexes for all sub-mailboxes of the user,
    mailboxes or mailbox prefixes given as arguments.

.. option:: -s, --squat-skip[=delta]

    Skip mailboxes that have not been modified since the last index.
    This is achieved by comparing the last modification time of a
    mailbox to the last time the squat index of this mailbox got
    updated. If the mailbox modification time (plus *delta*) is less
    than the squat index modification time, then the mailbox is
    skipped. The optional argument value *delta* is defined in seconds
    and must be equal to or higher than zero; the default value is 60.
    Squat only.
    |master-new-feature|

.. option:: -S seconds, --sleep=seconds

    After processing each mailbox, sleep for *seconds* seconds before
    continuing. Can be used to provide some load balancing.  Accepts
    fractional amounts.
    |v3-new-feature|

.. option:: -T reindextiers, --reindex-tier=reindextiers

    In compact mode, a comma-separated subset of the source tiers
    (see **-t**) to be reindexed.  Similar to **-X** but allows
    limiting the tiers that will be reindexed.
    |v3-new-feature|

.. option:: -t srctiers, --srctier=srctiers

    In compact mode, the comma-separated source tier(s) for the compacted
    indices.  At least one source tier must be specified in compact mode.
    Xapian only.
    |v3-new-feature|

.. option:: -u name, --user=name

    Treat the remaining arguments as usernames (e.g. foo@bar.com)
    rather than mailbox names.  Usernames are space-separated.
    |v3-new-feature|

.. option:: -U, --only-upgrade

    In compact mode, only compact if re-indexing.
    Xapian only.
    |master-new-feature|

.. option:: -v, --verbose

    Increase the verbosity of progress/status messages.  With this
    option, some additional messages are printed on the terminal; those
    messages are sent to syslog regardless.  Other messages are sent to
    syslog only if **-v** is provided.  In rolling and synclog modes,
    **-vv** sends even more messages to syslog.

.. option:: -X, --reindex

    Reindex all the messages before compacting.  This mode reads all
    the lists of messages indexed by the listed tiers, and re-indexes
    them into a temporary database before compacting that into place.
    Xapian only.
    |v3-new-feature|

.. option:: -z desttier, --compact=desttier

    In compact mode, the destination tier for the compacted indices.
    This must be specified in compact mode.
    Xapian only.
    |v3-new-feature|

.. option:: -Z, --internalindex

    When indexing messages, use the Xapian internal cyrusid rather than
    referencing the ranges of already indexed messages to know if a
    particular message is indexed.  Useful if the ranges get out of
    sync with the actual messages (e.g. if files on a tier are lost).
    Xapian only.
    |master-new-feature|

Examples
========

**squatter** is typically deployed via entries in
:cyrusman:`cyrus.conf(5)`, in either the DAEMON or EVENTS sections.

For the older SQUAT search engine, which offers poor performance in
rolling mode (**-R**), we recommend triggering periodic runs via entries
in the EVENTS section.

Sample entries from the EVENTS section of :cyrusman:`cyrus.conf(5)` for
periodic **squatter** runs:

    ::

        EVENTS {
            # reindex changed mailboxes (fulltext) approximately every three hours
            squatter1   cmd="/usr/bin/ionice -c idle /usr/lib/cyrus/bin/squatter -i" period=180

            # reindex all mailboxes (fulltext) daily
            squattera   cmd="/usr/lib/cyrus/bin/squatter" at=0117
        }

For the newer Xapian search engine, and with sufficiently fast storage,
the rolling mode (**-R**) offers advantages.  Use of rolling mode requires
that **squatter** be invoked in the DAEMON section.

Sample entries for the DAEMON section of :cyrusman:`cyrus.conf(5)` for
rolling **squatter** operation:

    ::

        DAEMON {
          # run a rolling squatter using the default sync_log channel "squatter"
          squatter cmd="squatter -R"

          # run a rolling squatter using a specific sync_log channel
          squatter cmd="squatter -R -n indexer"
        }

..  Note::

    When using the **-R** rolling mode, you MUST enable sync_log
    operation in :cyrusman:`imapd.conf(5)` via the `sync_log: on`
    setting, and MUST define a sync_log channel via the
    `sync_log_channels:` setting.  If also using replication, you must
    either explicitly specify your replication sync_log channel via the
    `sync_log_channels` directive with a name, or specify the default
    empty name with "" (the two-character string U+22 U+22).  Please
    see :cyrusman:`imapd.conf(5)` for details.
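
A minimal :cyrusman:`imapd.conf(5)` fragment to go with a rolling
**squatter** on the default "squatter" channel might look like this.  It
is a sketch that assumes no replication is in use, so only the one
channel is listed:

    ::

        sync_log: on
        sync_log_channels: squatter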

..  Note::

    When configuring rolling search indexing on a **replica**, one must
    consider whether sync_logs will be written at all.  In this case,
    please consider the setting `sync_log_unsuppressable_channels` to
    ensure that the sync_log channel upon which one's squatter instance
    depends will continue to be written.  See :cyrusman:`imapd.conf(5)`
    for details.

..  Note::

    When using the Xapian search engine, you must define various
    settings in :cyrusman:`imapd.conf(5)`.  Please read all relevant
    Xapian documentation in this release before using Xapian.
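
As a rough sketch only, a single-tier Xapian setup could start from an
:cyrusman:`imapd.conf(5)` fragment like the one below.  The tier name
``t1``, the partition name ``default`` and the path are assumptions for
illustration, and further settings are typically needed; treat the
Xapian documentation for your release as authoritative:

    ::

        search_engine: xapian
        defaultsearchtier: t1
        t1searchpartition-default: /var/cyrus/search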

[NB: More examples needed]

History
=======

Support for additional search engines was added in version 3.0.

The following command-line switches were added in version 3.0:

    .. parsed-literal::

        **-F -R -X -d -f -o -u**

The following command-line settings were added in version 3.0:

    .. parsed-literal::

        **-S** *<seconds>*, **-T** *<directory>*, **-f** *<synclogfile>*, **-n** *<channel>*, **-t** *srctier*..., **-z** *desttier*

Files
=====

/etc/imapd.conf,
/etc/cyrus.conf

See Also
========

:cyrusman:`imapd.conf(5)`, :cyrusman:`cyrus.conf(5)`

.. only:: html

    :ref:`configuring-xapian`