=head1 NAME

ovdb - Overview storage method for INN

=head1 DESCRIPTION

ovdb is an overview storage method that uses the S<Berkeley DB>
library to store overview data.  It requires version 4.4 or later of
the S<Berkeley DB> library (4.7+ is recommended because older versions
suffer from various issues).

The ovdb overview method makes use of the full
transaction/logging/locking functionality of the
S<Berkeley DB> environment.  S<Berkeley DB> may be downloaded from
L<http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html>
and is needed to build the ovdb backend.

=head1 UPGRADING

There are several versions of the ovdb storage method:

=over 2

=item *

Version 1, the initial version shipped with S<INN 2.3.0> up to S<INN 2.3.5>.

=item *

Version 2, with improved performance, since S<INN 2.4.0>.

=item *

Version 3, corresponding to version 2 with compression enabled, starting
with S<INN 2.5.0>.

=back

If you have a database created with a previous version of ovdb,
your database will need to be upgraded using B<ovdb_init>.  See the
ovdb_init(8) man page for upgrade instructions, as well as the
L<COMPRESSION> section below.

Note that when the S<Berkeley DB> library is updated to a newer version,
the ovdb database also needs to be upgraded.

=head1 INSTALLATION

If the S<Berkeley DB> library is found at configure time, INN will be
built with S<Berkeley DB> support unless the B<--without-bdb> flag is
explicitly passed to configure.  By default, configure will search for
S<Berkeley DB> in standard locations; there will be a message in the
configure output indicating the pathname that will be used.

You can override this pathname by adding a path to the option, for
instance B<--with-bdb=/usr/BerkeleyDB.4.4>.  This directory
is expected to have subdirectories F<include> and F<lib> (F<lib32>
and F<lib64> are also checked), containing F<db.h> and the library
itself, respectively.  If the S<Berkeley DB> header and library are in
non-standard locations, one or both of the options B<--with-bdb-include>
and B<--with-bdb-lib> can be given to configure with a path.
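
As an illustration, the two styles of invocation might look like the
following.  The flags are the ones described above; the directory names
are hypothetical and must be adapted to where S<Berkeley DB> is actually
installed on your system:

    # Single prefix containing the include and lib subdirectories
    ./configure --with-bdb=/usr/BerkeleyDB.4.4

    # Header and library in separate, non-standard locations
    # (both paths are made up for this example)
    ./configure --with-bdb-include=/opt/bdb/include \
        --with-bdb-lib=/opt/bdb/lib

In either case, check the configure output for the message reporting
the S<Berkeley DB> pathname that was selected.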

The ovdb database may take up more disk space for a given spool than the
other overview methods.  Plan on needing at least S<1.1 KB> for every
article in your spool (not counting crossposts).  So, if you have 5
million articles, you'll need at least S<5.5 GB> of disk space for ovdb.
With compression enabled, this estimate changes to S<0.9 KB> per article,
so you'll need at least S<4.5 GB> of disk space for 5 million articles.
See the L<COMPRESSION> section below.  In addition, you'll need space
for transaction logs: at least S<100 MB>.  By default, the transaction
logs go in the same directory as the database.  To improve performance,
they can be placed on a different disk (see the L<DB_CONFIG> section).

=head1 CONFIGURATION

To enable the ovdb overview method, set the I<ovmethod> parameter in
F<inn.conf> to C<ovdb>.  The ovdb database is stored in the directory
specified by the I<pathoverview> parameter in F<inn.conf>.  This is
the C<DB_HOME> directory.  To start out, this directory should be empty
(other than an optional F<DB_CONFIG> file; see L<DB_CONFIG> for details),
and B<innd> (or B<makehistory>) will create the files as necessary in
that directory.  Also, make sure the directory is owned by the news user.
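
As a sketch, the corresponding lines in F<inn.conf> could look like
this.  The directory is only an example value (the same one used in the
L<DB_CONFIG> section); substitute the location that suits your
installation:

    ovmethod:        ovdb
    pathoverview:    /mnt/overview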

Other parameters for configuring ovdb are in the F<ovdb.conf>
configuration file.  The following parameters can be set in that file:

=over 4

=item I<compress>

If INN was compiled with zlib, and this I<compress> parameter is true,
ovdb will compress overview records that are longer than 600 bytes.
See the L<COMPRESSION> section below.

=item I<cachesize>

Size of the memory pool cache, in kilobytes.  The cache will have a
backing store file in the DB directory which will be at least as big.
In general, the bigger the cache, the better.  Use C<ovdb_stat -m>
to see cache hit percentages.  To make a change of this parameter take
effect, shut down and restart INN (be sure to kill all of the B<nnrpd>
processes when shutting down).  Default is C<8000> (KB), which is adequate
for small to medium-sized servers.  Large servers will probably need
at least C<20000> (KB).

=item I<ncache>

Number of regions across which to split the cache.  The region size
is equal to I<cachesize> divided by I<ncache>.  Default is C<1> for
I<ncache>, that is to say the cache will be allocated contiguously
in memory.

=item I<numdbfiles>

Overview data is split between this many files.  Currently, B<innd> will
keep all of the files open, so don't set this too high or B<innd> may run
out of file descriptors.  B<nnrpd> only opens one at a time, regardless.
May be set to one, or just a few, but only do that if your OS supports
large (S<< > 2 GB >>) files.  Changing this parameter has no effect on an
already-established database.  Default is C<32>.

=item I<txn_nosync>

If I<txn_nosync> is set to false, S<Berkeley DB> flushes the log after every
transaction.  This minimizes the number of transactions that may be lost
in the event of a crash, but results in significantly degraded
performance.  Default is true.

=item I<useshm>

If I<useshm> is set to true, S<Berkeley DB> will use shared memory instead of
mmap for its environment regions (cache, lock, etc.).  On some platforms,
this may improve performance.  Default is false.

=item I<shmkey>

Sets the shared memory key used by S<Berkeley DB> when I<useshm> is true.
S<Berkeley DB> will create several (usually 5) shared memory segments, using
sequentially numbered keys starting with I<shmkey>.  Choose a key that does
not conflict with any existing shared memory segments on your system.
Default is C<6400>.

=item I<pagesize>

Sets the page size for the DB files (in bytes).  Must be a power
of 2.  Best choices are C<4096> or C<8192>.  The default is C<8192>.
Changing this parameter has no effect on an already-established database.

=item I<minkey>

Sets the minimum number of keys per page.  See the S<Berkeley DB>
documentation for more information.  Default is based on page size
and whether compression is enabled:

   default_minkey = MAX(2, pagesize / 2600) if compress is false
   default_minkey = MAX(2, pagesize / 1500) if compress is true

The lowest allowed I<minkey> is C<2>.  Setting I<minkey> higher than
the default is not recommended, as it will cause the databases to have
a lot of overflow pages.  Changing this parameter has no effect on an
already-established database.

=item I<maxlocks>

Sets the S<Berkeley DB> I<lk_max> parameter, which is the maximum number of
locks that can exist in the database at the same time.  Default is C<4000>.

=item I<nocompact>

The I<nocompact> parameter affects the behaviour of B<expireover>.
The B<expireover> function in ovdb can do its job in one of two
ways:  by simply deleting expired records from the database; or by
re-writing the overview records into a different location leaving out
the expired records.  The first method is faster, but it leaves 'holes'
that result in space that cannot immediately be reused.  The second
method 'compacts' the records by rewriting them.

If this parameter is set to C<0>, B<expireover> will compact all
newsgroups; if set to C<1>, B<expireover> will not compact any
newsgroups; and if set to a value greater than one, B<expireover>
will only compact groups that have fewer than that number of articles.

Experience has shown that compacting has minimal effect (other than
making B<expireover> take longer) so the default is C<1>.  This parameter
will probably be removed in the future.

=item I<readserver>

When the I<readserver> parameter is set to false, each B<nnrpd>
process directly accesses the S<Berkeley DB> environment.  The process
of attaching to the database (and detaching when finished) is fairly
expensive, and can result in high loads in situations when there are
lots of reader connections of relatively short duration.

When the I<readserver> parameter is set to true, the B<nnrpd> processes
will access overview via a helper server (B<ovdb_server>, which
is started by B<ovdb_init>).  All ovdb reads will then be funnelled
through a single process with a cleaner interface to the underlying
S<Berkeley DB> database.  This will result in cleaner shutdowns for the
database, improving stability and avoiding deadlocks, timing issues and
corrupted databases.  That's why you should try to set this parameter to
true if you are experiencing any instability in the ovdb overview method.

Default value is true.

=item I<numrsprocs>

This parameter is only used when I<readserver> is true.  It sets the
number of B<ovdb_server> processes.  As each B<ovdb_server> can process
only one transaction at a time, running more servers can improve reader
response times.  Default is C<5>.

=item I<maxrsconn>

This parameter is only used when I<readserver> is true.  It sets a
maximum number of readers that a given B<ovdb_server> process will
serve at one time.  This means the maximum number of readers for all
of the B<ovdb_server> processes is (I<numrsprocs> * I<maxrsconn>).
This does I<not> limit the actual number of readers, since B<nnrpd>
will fall back to opening the database directly if it can't connect to
an B<ovdb_server>.  Default is C<0>, which means an unlimited number
of connections is allowed.

=back
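
To tie the parameters above together, here is a hypothetical
F<ovdb.conf> for a large server.  The layout (one parameter per line,
the name followed by whitespace and the value) is an assumption made
for this sketch, so compare it with the sample F<ovdb.conf> shipped
with your INN release before using it.  The values are taken from the
descriptions above and are not tuning advice:

    # Hypothetical ovdb.conf; every value here is illustrative
    compress      true
    cachesize     20000
    numdbfiles    32
    pagesize      8192
    readserver    true
    numrsprocs    5

With these settings and I<minkey> left unset, the formulas given under
I<minkey> yield MAX(2, 8192 / 1500) = 5, assuming the division
truncates to an integer; with compression disabled the result would be
MAX(2, 8192 / 2600) = 3.  Remember that I<numdbfiles> and I<pagesize>
only matter when the database is first created.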

=head1 COMPRESSION

The ovdb storage method has the ability to compress overview data
before it is stored into the database.  In addition to consuming less disk
space, compression keeps the average size of the database keys smaller.
This in turn increases the average number of keys per page, which can
significantly improve performance and also helps keep the database more
compact.  This feature requires that INN be built with zlib.  Only records
larger than 600 bytes get compressed, because that is the point at which
compression starts to become significant.

If compression is not enabled (either because the I<compress> option is
not set to true in F<ovdb.conf> or because INN was not built with zlib
support), the database will be backward compatible with older versions
of ovdb.  However, if compression is enabled, the database is marked
with a newer version that will prevent older versions of ovdb from
opening the database.

You can upgrade an existing database to use compression simply by setting
I<compress> to true in F<ovdb.conf>.  Note that existing records in the
database will remain uncompressed; only new records added after enabling
compression will be compressed.

If you disable compression on a database that previously had it enabled,
new records will be stored uncompressed, but the database will still be
incompatible with older versions of ovdb (and will also be incompatible
with this version of ovdb if INN was not built with zlib support).
So to downgrade to a completely uncompressed database, you will have
to rebuild the database using B<makehistory>.

=head1 DB_CONFIG

A file called F<DB_CONFIG> may be placed in the database directory
(I<pathoverview> in F<inn.conf>) to customize where the various database
files and transaction logs are written.  By default, all of the files
are written in the C<DB_HOME> directory.  One way to improve performance
is to put the transaction logs on a different disk.  To do this, put:

    set_lg_dir /path/to/logs

in the F<DB_CONFIG> file.  If the pathname you give starts with a C</>, it is
treated as an absolute path; otherwise, it is relative to the C<DB_HOME>
directory.  Make sure that any directories you specify exist and have
proper ownership/mode before starting INN, because they won't be created
automatically.  Also, don't change the F<DB_CONFIG> file while anything that
uses ovdb is running.

Another thing that you can do with this file is to split the overview
database across multiple disks.  In the F<DB_CONFIG> file, you can list
directories that S<Berkeley DB> will search when it goes to open a database.

For example, let's say that you have I<pathoverview> set to
F</mnt/overview> and you have four additional file systems mounted
on F</mnt/ov1> through F</mnt/ov4>.  You would create a file
F</mnt/overview/DB_CONFIG> containing the following lines:

    set_data_dir /mnt/overview
    set_data_dir /mnt/ov1
    set_data_dir /mnt/ov2
    set_data_dir /mnt/ov3
    set_data_dir /mnt/ov4

Distribute your F<ovNNNNN> files into the four file systems (say, 8 each).
When called upon to open a database file, the db library will look for it
in each of the specified directories (in order).  If the file is not
found, one will be created in the first of those directories.

Whenever you change F<DB_CONFIG> or move database files around, make
sure all news processes that use the database are shut down first
(including B<nnrpd> processes).

The F<DB_CONFIG> functionality is part of S<Berkeley DB> itself,
rather than something provided by ovdb.  See the S<Berkeley DB>
documentation for complete details for the version of S<Berkeley DB>
that you're running.

=head1 RUNNING

When starting the news system, B<rc.news> will invoke the B<ovdb_init>
program.  See the ovdb_init(8) man page for information about the tasks
it performs.  B<ovdb_init> must be run before using the database.

When stopping INN, B<rc.news> kills the B<ovdb_monitor> processes after
the other INN processes have been shut down.

=head1 DIAGNOSTICS

Problems relating to ovdb are logged to F<news.err> with C<OVDB> in
the error message.

INN programs that use overview will fail to start up if the
B<ovdb_monitor> processes aren't running.  Be sure to run B<ovdb_init>
before running anything that accesses overview.

Also, INN programs that use overview will fail to start up if the user
running them is not the news user.

If a program accessing the database crashes, or otherwise exits uncleanly,
it might leave a stale lock in the database.  This lock could cause other
processes to deadlock on that stale lock.  To fix this, shut down all news
processes (using C<kill -9> if necessary) and then restart.  B<ovdb_init>
should perform a recovery operation which will remove the locks and repair
damage caused by killing the deadlocked processes.

=head1 FILES

=over 4

=item I<pathetc>/inn.conf

The I<ovmethod> and I<pathoverview> parameters are relevant to ovdb.

=item I<pathetc>/ovdb.conf

Optional configuration file for tuning.  See L<CONFIGURATION> above.

=item I<pathoverview>

Directory where the database goes.  S<Berkeley DB> calls it the
C<DB_HOME> directory.

=item I<pathoverview>/DB_CONFIG

Optional file to configure the layout of the database files.

=item I<pathrun>/ovdb.sem

A file that gets locked by every process that is accessing the database.
This is used by B<ovdb_init> to determine whether the database is active
or quiescent.

=item I<pathrun>/ovdb_monitor.pid

Contains the process ID of B<ovdb_monitor>.

=back

=head1 TO DO

Implement a way to limit how many databases can be open at once (to reduce
file descriptor usage), perhaps using something similar to the cache code in
the legacy F<ov3.c> file.

=head1 HISTORY

Written by Heath Kehoe <hakehoe@avalon.net> for InterNetNews.

$Id: ovdb.pod 10525 2021-01-20 11:51:15Z iulius $

=head1 SEE ALSO

inn.conf(5), innd(8), makehistory(8), nnrpd(8), ovdb_init(8),
ovdb_monitor(8), ovdb_stat(8).

S<Berkeley DB> documentation:  in the F<docs> directory of the S<Berkeley
DB> source distribution, or on the Oracle S<Berkeley DB> web page
(L<http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html>).

=cut
