<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>bogoutil</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="refentry"><a name="bogoutil.1"></a><div class="titlepage"></div><div class="refnamediv"><a name="name"></a><h2>Name</h2><p>bogoutil - Dumps, loads, and maintains <span class="application">bogofilter</span> database files</p></div><div class="refsynopsisdiv"><a name="synopsis"></a><h2>Synopsis</h2><div class="cmdsynopsis"><p><code class="command">bogoutil</code> { -h | -V }</p></div><div class="cmdsynopsis"><p><code class="command">bogoutil</code> [options] { -d <em class="replaceable"><code>file</code></em> | -H <em class="replaceable"><code>file</code></em> | -l <em class="replaceable"><code>file</code></em> | -m <em class="replaceable"><code>file</code></em> | -w <em class="replaceable"><code>file</code></em> | -p <em class="replaceable"><code>file</code></em> }</p></div><div class="cmdsynopsis"><p><code class="command">bogoutil</code> { -r <em class="replaceable"><code>file</code></em> | -R <em class="replaceable"><code>file</code></em> }</p></div><div class="cmdsynopsis"><p><code class="command">bogoutil</code> { --db-print-leafpage-count <em class="replaceable"><code>file</code></em> | --db-print-pagesize <em class="replaceable"><code>file</code></em> | --db-verify <em class="replaceable"><code>file</code></em> | --db-checkpoint <em class="replaceable"><code>directory</code></em> [flag...]
| --db-list-logfiles <em class="replaceable"><code>directory</code></em> | --db-prune <em class="replaceable"><code>directory</code></em> | --db-recover <em class="replaceable"><code>directory</code></em> | --db-recover-harder <em class="replaceable"><code>directory</code></em> | --db-remove-environment <em class="replaceable"><code>directory</code></em> }</p></div><p>where <code class="option">options</code> is</p><div class="cmdsynopsis"><p><code class="command">bogoutil</code> [-v] [-n] [-C] [-D] [-a <em class="replaceable"><code>age</code></em>] [-c <em class="replaceable"><code>count</code></em>] [-s <em class="replaceable"><code>min,max</code></em>] [-y <em class="replaceable"><code>date</code></em>] [-I <em class="replaceable"><code>file</code></em>] [-O <em class="replaceable"><code>file</code></em>] [-x <em class="replaceable"><code>flags</code></em>] [--config-file <em class="replaceable"><code>file</code></em>]</p></div></div><div class="refsect1"><a name="description"></a><h2>DESCRIPTION</h2><p><span class="application">Bogoutil</span> is part of the <span class="application">bogofilter</span> Bayesian spam filter package.</p><p>It is used to dump and load <span class="application">bogofilter</span>'s Berkeley DB databases to and from text files, to perform database maintenance functions, and to display the values for specific words.</p></div><div class="refsect1"><a name="options"></a><h2>OPTIONS</h2><p>The <code class="option">-d <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to print the contents of the database file to <code class="option">stdout</code>.</p><p>The <code class="option">-H <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to print a histogram of the database file to <code class="option">stdout</code>. The output is similar to that of <span class="application">bogofilter -vv</span>.
In addition, hapaxes (tokens that were seen only once) and pure tokens (tokens that were encountered only in ham or only in spam) are counted.</p><p>The <code class="option">-l <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to load the data from <code class="option">stdin</code> into the database file. If the database file exists, <code class="option">stdin</code> data is merged into the database file, with counts added up.</p><p>The <code class="option">-m</code> option tells <span class="application">bogoutil</span> to perform maintenance functions on the specified database, i.e. to discard tokens that are older than desired, whose counts are too small, or whose sizes (lengths) are too long or too short.</p><p>The <code class="option">-w <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to display token information from the database file. The option takes an argument, which is either the name of the wordlist (usually wordlist.db) or the name of the directory containing it. Tokens can be listed on the command line or piped to <span class="application">bogoutil</span>. When there are extra arguments on the command line, <span class="application">bogoutil</span> will use them as the tokens to look up. If there are no extra arguments, <span class="application">bogoutil</span> will read tokens from <code class="option">stdin</code>.</p><p>The <code class="option">-p <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to display the database information for one or more tokens. The display includes a probability column with the token's spam score (computed using <span class="application">bogofilter</span>'s default values).
Option <code class="option">-p</code> takes the same arguments as option <code class="option">-w</code>.</p><p>The <code class="option">-r <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to recalculate the ROBX value and print it as a six-digit fraction.</p><p>The <code class="option">-R <em class="replaceable"><code>file</code></em></code> option does the same as <code class="option">-r</code>, but saves the result in the training database without printing it.</p><p>The <code class="option">-I <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to read its input from <em class="replaceable"><code>file</code></em> rather than stdin.</p><p>The <code class="option">-O <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to write its output to <em class="replaceable"><code>file</code></em> rather than stdout.</p><p>The <code class="option">-v</code> option produces verbose output on <code class="option">stderr</code>. This option is primarily useful for debugging.</p><p>The <code class="option">-C</code> option inhibits reading configuration files, so that <span class="application">bogoutil</span> runs with its defaults.</p><p>The <code class="option">--config-file <em class="replaceable"><code>file</code></em></code> option tells <span class="application">bogoutil</span> to read <em class="replaceable"><code>file</code></em> instead of the standard configuration file.</p><p>The <code class="option">-D</code> option redirects debug output to stdout (it usually goes to stderr).</p><p>The <code class="option">-x <em class="replaceable"><code>flags</code></em></code> option sets debugging flags.</p><p>Option <code class="option">-n</code> stands for "replace non-ASCII characters".
It replaces characters that have the high bit (0x80) set with question marks. This can be useful if a word list has lots of unreadable tokens, for example from Asian spam. The "bad" characters will be converted to question marks and matching tokens will be combined when used with <code class="option">-m</code> or <code class="option">-l</code>, but not with <code class="option">-d</code>.</p><p>Option <code class="option">-a age</code> indicates an acceptable token age, with older ones being discarded. The age can be a date (in the form YYYYMMDD) or a day count, i.e. discard tokens older than <code class="option">age</code> days.</p><p>Option <code class="option">-c value</code> indicates that tokens with counts less than or equal to <code class="option">value</code> are to be discarded.</p><p>Option <code class="option">-s min,max</code> is used to discard tokens based on their size, i.e. length. All tokens shorter than <code class="option">min</code> or longer than <code class="option">max</code> will be discarded.</p><p>Option <code class="option">-y date</code> specifies the date to give to tokens that don't have dates. The format is YYYYMMDD.</p><p>The <code class="option">-h</code> option prints the help message and exits.</p><p>The <code class="option">-V</code> option prints the version number and exits.</p></div><div class="refsect1"><a name="environment_maintenance"></a><h2>ENVIRONMENT MAINTENANCE</h2><p>The <code class="option">--db-checkpoint <em class="replaceable"><code>dir</code></em></code> option causes <span class="application">bogoutil</span> to flush the buffer caches and checkpoint the database environment.</p><p>The <code class="option">--db-list-logfiles <em class="replaceable"><code>dir</code></em></code> option causes <span class="application">bogoutil</span> to list the log files in the environment.
Zero or more keywords can be added or combined (separated by whitespace) to modify the behavior of this mode. The default behavior is to list only inactive log files with relative paths. You can add <code class="option">all</code> to list all log files (inactive and active). You can add <code class="option">absolute</code> to switch the listing to absolute paths.</p><p>The <code class="option">--db-prune <em class="replaceable"><code>dir</code></em></code> option causes <span class="application">bogoutil</span> to checkpoint the database environment and remove inactive log files.</p><p>The <code class="option">--db-recover <em class="replaceable"><code>dir</code></em></code> option runs a regular database recovery in the specified database directory. If that fails, it will retry with a (usually slower) catastrophic database recovery. If that fails too, your database cannot be repaired and must be rebuilt from scratch. This is only supported when compiled with Berkeley DB support with transactions enabled. Trying recovery with QDBM or SQLite3 support will result in an error.</p><p>The <code class="option">--db-recover-harder <em class="replaceable"><code>dir</code></em></code> option runs a catastrophic database recovery in the specified database directory. If that fails, your database cannot be repaired and must be rebuilt from scratch. This is only supported when compiled with Berkeley DB support with transactions enabled. Trying recovery with QDBM or SQLite3 support will result in an error.</p><p>The <code class="option">--db-remove-environment <em class="replaceable"><code>directory</code></em></code> option has no short option equivalent. It runs recovery in the given directory and then removes the database environment.
Use this <span class="emphasis"><em>before</em></span> upgrading to a new Berkeley DB version if the new version to be installed requires a log file format update.</p><p>The <code class="option">--db-print-leafpage-count <em class="replaceable"><code>file</code></em></code> option prints the number of leaf pages in the database file <em class="replaceable"><code>file</code></em> as a decimal number, or UNKNOWN if the database does not support querying this figure.</p><p>The <code class="option">--db-print-pagesize <em class="replaceable"><code>file</code></em></code> option prints the size of a database page in <em class="replaceable"><code>file</code></em> as a decimal number, or UNKNOWN for databases with variable page size or databases that do not allow a query of the database page size.</p><p>The <code class="option">--db-verify <em class="replaceable"><code>file</code></em></code> option requests that <span class="application">bogoutil</span> verify the database file. It prints only errors, unless in verbose mode.</p></div><div class="refsect1"><a name="dataformat"></a><h2>DATA FORMAT</h2><p><span class="application">Bogoutil</span> reads and writes text files where each nonblank line consists of a word, any amount of horizontal whitespace, a numeric word count, more whitespace, and (optionally) a date in the form YYYYMMDD. Blank lines are skipped.</p></div><div class="refsect1"><a name="returns"></a><h2>RETURN VALUES</h2><p>0 for successful operation. 1 for most errors. 3 for I/O or other errors. Error 3 usually means that something is seriously wrong with the database files.
</p></div><div class="refsect1"><a name="author"></a><h2>AUTHOR</h2><p>Gyepi Sam <code class="email">&lt;<a class="email" href="mailto:gyepi@praxis-sw.com">gyepi@praxis-sw.com</a>&gt;</code>.</p><p>Matthias Andree <code class="email">&lt;<a class="email" href="mailto:matthias.andree@gmx.de">matthias.andree@gmx.de</a>&gt;</code>.</p><p>David Relson <code class="email">&lt;<a class="email" href="mailto:relson@osagesoftware.com">relson@osagesoftware.com</a>&gt;</code>.</p><p>For updates, see <a class="ulink" href="http://bogofilter.sourceforge.net/" target="_top">the bogofilter project page</a>.</p></div><div class="refsect1"><a name="also"></a><h2>SEE ALSO</h2><p>bogofilter(1), bogolexer(1), bogotune(1), bogoupgrade(1)</p></div></div></body></html>