DSPAM v3.10.2
COPYRIGHT (C) 2002-2012 DSPAM Project
http://dspam.sourceforge.net/

LICENSE

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

CREDITS

Original Work By
  Lead development till 3.8.0: Jonathan A. Zdziarski <jonathan@nuclearelephant.com>
  Lead development after 3.8.0: Stevan Bajic <stevan@bajic.ch>
  PostgreSQL driver: Rustam Aliyev <rustam@azernews.com>
  External Lookup module: Hugo Monteiro <hugo.monteiro@fct.unl.pt>
  Various:
    Feb/2006 Cove Schneider <cove@wildpackets.com>
    Jan/2006 Norman Maurer <nm@byteaction.de>

Your name is missing? Let us know with a reference to your commit, and we'll
add you to the list.

COPYRIGHT

As of 12 January 2009 the copyright is owned by the DSPAM Project, represented
by a team of people, including:
  Alexander Prinsier
  Dov Zamir
  Hugo Monteiro
  Ion-Mihai Tetcu
  Paul Cockings
  Stevan Bajic

TABLE OF CONTENTS

General DSPAM Information

  1.0 About DSPAM
  1.1 Installation and Configuration
  1.2 Testing
  1.3 Troubleshooting
  1.4 DSPAM Tools
  1.5 Agent Commandline Arguments

Advanced DSPAM functionality

  2.0 Linking with libdspam
  2.1 Configuring groups
  2.2 External Inoculation Theory
  2.3 Client/Server Mode
  2.4 LMTP
  2.5 DSPAM User Preferences
  2.6 Fallback Domains
  2.7 External User Lookup

Miscellaneous

  3.0 Bugs, Feature Requests
  3.1 Ports / Packages
  3.2 GIT Access

1.0 ABOUT DSPAM

DSPAM is an open-source, freely available anti-spam solution designed to combat
unsolicited commercial email using advanced statistical analysis. In short,
DSPAM filters spam by learning what spam is and isn't. It does this by learning
each user's individual mail behavior. This allows DSPAM to provide highly
accurate, personalized filtering for each user on even a large system, and
makes it an administratively maintenance-free solution capable of learning
each user's email behaviors with very few false positives.

While DSPAM is focused around spam filtering, many have found alternative
uses for all types of two-concept document classification.

DSPAM is rapidly gaining a large support forum and is being used in many
large-scale implementations. Contributions to the project are welcome via the
dspam-dev mailing list or in the form of financial contributions.

Many of the foundational principles incorporated into this software were
drawn from Paul Graham's white paper on combatting spam, which can be
found at http://paulgraham.com/spam.html.  Much research and development has
resulted in many new approaches being added to the DSPAM project as well,
some of which are explained in white papers on the DSPAM home page.

DSPAM can be implemented as a total solution, or as a library: developers
may link their projects to the DSPAM core engine (libdspam) in accordance with
the AGPL license agreement.  This enables developers to incorporate libdspam as
a "drop-in" for instant spam filtering within their applications - such as mail
clients, other anti-spam tools, and so on.

PLEASE NOTE: DSPAM and libdspam are distributed under the AGPL license, not the
LGPL. Commercial licensing is available for those who seek to redistribute
DSPAM or some of DSPAM's components/libraries in their non-GPL products.
Please contact us for more information about commercial licensing.

The DSPAM package is split up into the following pieces:

DSPAM AGENT

The DSPAM agent is the command center for all shell and daemon operations.
If you're using DSPAM as a filtering solution, this is the 'dspam' (or dspamc)
binary you're likely going to be talking to via commandline.

LIBDSPAM: CORE ENGINE

The DSPAM core processing engine, also known as libdspam, provides all critical
spam filtering functions.  The engine is embedded into other dspam components
(such as the agent) and is responsible for the actual filtering logic.
If you're not a developer, you don't need to be concerned with this component
as it is automatically compiled in with the build.

WEB UI

The Web UI (User Interface) is designed to allow end-users to review their
spam quarantine, history, and graphs, and to delete their spam permanently.
They can also optionally use the quarantine to perform all of their training.
The UI also includes some basic administrative tools to change settings and
manage user quarantines.

TOOLS

Some basic tools have been provided to manage dictionaries, automate
corpus feeding, and perform other diagnostic operations related to DSPAM.
Some of these include dspam_train, dspam_stats, and dspam_dump.

HISTORY OF COPYRIGHT

Original work was done by Jonathan A. Zdziarski.

In 2006 the copyright was handed over to Sensory Networks.

In 2009 Sensory Networks handed over the full copyright to the DSPAM Project,
represented by a team of people, including:
  Alexander Prinsier
  Dov Zamir
  Hugo Monteiro
  Ion-Mihai Tetcu
  Paul Cockings
  Stevan Bajic

1.1 INSTALLATION

IMPLEMENTATION OPTIONS

There are many different ways to deploy DSPAM onto an existing network. The
most popular approaches are:

1. As a delivery agent proxy

When your mail server gets ready to deliver mail to a user's mailbox it calls
a delivery agent of some sort. On most UNIX systems, this is procmail, maildrop,
mail.local, or a similar tool. When used as a delivery proxy, the DSPAM agent
is called in place of your existing agent - or better put, it can masquerade
as the local delivery agent. DSPAM then processes the message and will call
the /real/ delivery agent to pass the good mail into the user's mailbox,
quarantining the bad mail. DSPAM can optionally tag and deliver both spam
and legitimate mail.

In the diagram below, MTA refers to Mail Transfer Agent, or your mail server
software: Postfix, Sendmail, Exim, etc. LDA refers to the Local Delivery
Agent: Procmail, Maildrop, etc.

BEFORE:

    [MTA] ---> [LDA] ---> (User's Mailbox)

AFTER:

    [MTA] ---> [DSPAM] ---> [LDA] ---> (User's Mailbox)
                        \
                         \--> [Quarantine]
           [End User] ------> [Web UI]

2. As a POP3 Proxy

If you don't want to tinker with your existing mail server setup, DSPAM can
be combined with one of a few open source programs designed to act as a POP3
proxy. This means spam is filtered whenever the user checks their mail,
rather than when it is delivered. The benefit to this is that you can set up
a small machine on your network that will connect to your existing mail server,
so no integration is needed. It also allows your users to arbitrarily point
their mail client at it if they desire filtering. The drawback to this approach
is that the POP3 protocol has no way to tell the mail client that a message is
spam, and so the user will have to download the spam (tagged, of course).

BEFORE:

    [End User] ---> [POP3 Server]

AFTER:

    [End User] ---> [POP3 Proxy] <--> [DSPAM]
                     \
                      \--> [POP3 Server]

3. As an SMTP Relay

Newer versions of DSPAM have seen features that allow it to function more
easily as an SMTP relay. An SMTP relay sits in front of your existing mail
server (requiring no integration). To use an SMTP relay, the MX records for
your domains are repointed to the relay machine running DSPAM. DSPAM then
relays the good (and optionally bad) mail to the existing SMTP server. This
allows you to use DSPAM with even a Windows-based destination mail server
as no integration is necessary. See doc/relay.txt for one example of how to
do this with Postfix.

BEFORE:

  { Internet } ---> [Company Mail Server]

AFTER:

  { Internet } --->  [ Inbound SMTP Relay  ]  --->  [Company Mail Server]
                         ( MTA <> DSPAM )     SMTP
                          \                    or
                           \--> [Quarantine]  LMTP
             [End User] ------> [Web UI]

UPGRADING DSPAM

   Please see the file UPGRADING

FRESH INSTALLATION

0. PREREQUISITES

   DSPAM can use one of many different backends to store its information, and
   you will need to decide on one and install the appropriate software before
   you can build DSPAM. The following storage backends are presently available:

   Driver       Requirements
   -------------------------------------------------------------------------
 T mysql_drv:   MySQL client libraries      (and a server to connect to)
 T pgsql_drv:   PostgreSQL client libraries (and a server to connect to)
   sqlite_drv:  SQLite v2.7.7 or above      (scheduled for removal)
   sqlite3_drv: SQLite v3.x
*T hash_drv:    None                        (Self-Contained Hash-Based Driver)

   Legend:
    * Default storage driver
    T Thread-safe (Required for running DSPAM in server daemon mode)

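   The thread-safety column of the table above can be captured in a small
   shell check, which may be handy in a packaging or build script before
   enabling daemon mode. This is only a sketch: is_thread_safe is a
   hypothetical helper name and is not part of DSPAM; the driver names and
   their "T" markers are copied from the table.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DSPAM): succeeds if the given storage
# driver is marked "T" (thread-safe) in the README's driver table.
is_thread_safe() {
    case "$1" in
        mysql_drv|pgsql_drv|hash_drv) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on every driver listed in the table.
for drv in mysql_drv pgsql_drv sqlite_drv sqlite3_drv hash_drv; do
    if is_thread_safe "$drv"; then
        echo "$drv: thread-safe, usable in server daemon mode"
    else
        echo "$drv: not thread-safe"
    fi
done
```
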
   In general, MySQL is one of the faster solutions with a smaller storage
   footprint and is well suited for both small and large-scale implementations.

   The hash driver (inspired by Bill Yerazunis' CRM Sparse Spectra algorithm)
   is the fastest solution by far and requires no dependencies. It supports
   an auto-extend feature to grow the file size as needed and is very
   fast and compact. It does however lack some features (such as merged
   groups support) and uses a lot of memory to mmap() users.

   Also note that a database created with the hash driver is currently not safe
   to move between 32/64 bit systems or big/little endian systems.

   Documentation for any additional setup of your selected storage driver can
   be found in the doc/ directory. You'll need to follow any steps outlined in
   the storage driver documentation before continuing.

   You can download MySQL from http://www.mysql.com.
   You can download PostgreSQL from http://www.postgresql.com.
   You can download SQLite from http://www.sqlite.org.

1. CONFIGURATION

   DSPAM uses autoconf, so configuration is fairly standardised with other
   UNIX-based software:

   ./configure [options]

   DSPAM supports the configuration options below. Generally, the default
   configuration is more than acceptable, so it's a good idea not to tweak too
   many settings unless you know what you are doing.

   PATH SWITCHES

     --prefix=DIR
     Specify an alternative root prefix for installation.  The default is
     /usr/local. This does not affect the location of dspam.conf (which
     defaults to /etc). Use --sysconfdir= for this.

     --sysconfdir=DIR
     Specify an alternative home for the dspam.conf file. The default is /etc.

     --with-dspam-home=DIR
     Specify an alternative DSPAM home for installation. This can alternatively
     be changed in dspam.conf, but is convenient to do on the configure line.
     The default is $prefix/var/dspam, or /usr/local/var/dspam.

     --with-logdir=DIR
     Specify an alternative log directory. The default is $dspam_home/log. Do
     not set this to /var/log unless DSPAM will have permissions to write to
     the directory.

   FILESYSTEM SCALE

     The default filesystem scale is "small-scale", and writes each user to
     its own directory in the top-level DSPAM home data directory.
     The following two switches allow the scale to be changed to be more
     suitable for larger installations.

     --enable-large-scale
     Switch for large-scale implementation.  User data will be stored as
     $HOME/data/u/s/user instead of $HOME/data/user

     --enable-domain-scale
     Switch for domain-scale implementation.  When used, DSPAM expects
     username@domain to be passed in as the user id and user data will be
     stored as $HOME/data/example.org/user and $HOME/opt-in/example.org/user.dspam
     instead of $HOME/data/user

   INTEGRATION SWITCHES

     --with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
     Specify your storage driver selection(s).  A storage driver is a driver
     written specifically for DSPAM to store tokens, signature data, and
     perform other proprietary operations.  The default driver is hash_drv.
     The following drivers have been provided:

     mysql_drv:   MySQL Drivers
     pgsql_drv:   PostgreSQL Drivers
     sqlite_drv:  SQLite v2.x Drivers (scheduled for removal)
     sqlite3_drv: SQLite v3.x Drivers
     hash_drv:    Self-Contained Hash Database

     If you are a packager, or wish to have multiple drivers built for any
     reason, you may specify multiple drivers by separating them with commas.
     This will cause the storage driver specified in dspam.conf to be
     dynamically loaded at runtime rather than statically linked. If you wish
     to build only one driver, but dynamically, then specify it twice as in
     --with-storage-driver=mysql_drv,mysql_drv.

     If you will be compiling DSPAM to operate as a server daemon or to deliver
     via SMTP/LMTP, you will need to use a thread-safe driver (outlined in the
     chart earlier in this document).

     You may also need to use some of the driver-specific configure flags
     (discussed in the DRIVER SPECIFIC CONFIGURE SWITCHES section below).

     --disable-trusted-user-security
     Administrators who wish to disable trusted user security may do so by
     using this configure flag.  This will cause DSPAM to treat each user as
     if they were "trusted" which could allow them to potentially execute
     arbitrary commands on the server via DSPAM. Because of this, administrators
     should only use this option on either a closed server, or configure their
     DSPAM binary to be executable only by users who can be trusted.  This
     option SHOULD NOT be used as a solution to your MTA dropping privileges
     prior to calling DSPAM. Instead, see the TRUSTED SECURITY section of this
     document.

     --enable-homedir
     When enabled, instead of checking for $HOME/$USER/opt-in/
     $USER[.dspam|.nodspam], DSPAM will check for a .dspam|.nodspam file in the
     user's home directory. DSPAM will also store each user's data in ~/.dspam
     when this option is enabled. Because of this, DSPAM will automatically
     install and run setuid root so that it can read each user's home directory.

     Note:

       This function is incompatible with most implementations of the Web UI,
       since it requires access to read each user's home directory. Therefore,
       only use this option if you will not be using the Web UI or plan on
       doing something asinine like running it as root.

     --enable-daemon
     Builds DSPAM with support for daemon mode, and builds associated dspamc
     thin client. Pthreads is required to build for daemon mode and the
     storage driver used must be thread-safe.

   DRIVER SPECIFIC CONFIGURE SWITCHES

     Some storage drivers have their own custom configuration switches:

     mysql_drv:
       --with-mysql-includes=DIR
       Specify a path to the MySQL includes

       --with-mysql-libraries=DIR
       Specify a path to the MySQL libraries
       (Currently links to -lmysqlclient, also -lcrypto on some systems)

       --enable-virtual-users
       Tells DSPAM to create virtual user ids.  Use this if your users don't
       actually exist on the system (e.g. in /etc/passwd if using a password
       file)

       --enable-preferences-extension
       MySQL supports the preferences extension, which stores user preferences
       in mysql instead of flat files (the built-in method)

       --disable-mysql4-initialization
       If you are compiling libdspam for use with a third party application,
       and the third party application makes its own calls to libmysqlclient,
       you should use this option to disable libdspam's initialization and
       cleanup of libmysqlclient, and allow the application to manage this.
       This option suppresses libdspam's calls to mysql_server_init and
       mysql_server_end.

       Note:

       Please see the file doc/mysql_drv.txt for more information
       about configuring the mysql_drv storage driver.

     pgsql_drv:
       --with-pgsql-includes=DIR
       Specify a path to the PgSQL includes

       --with-pgsql-libraries=DIR
       Specify a path to the PgSQL libraries
       (Currently links to -lpq, and netlibs on some systems)

       --enable-virtual-users
       Tells DSPAM to create virtual user ids.  Use this if your users don't
       actually exist on the system (e.g. in /etc/passwd if using a password
       file)

       --enable-preferences-extension
       Postgres supports the preferences extension, which stores user
       preferences in pgsql instead of flat files (the built-in method)

       Note:

       Please see the file doc/pgsql_drv.txt for more information about
       configuring the pgsql_drv storage driver.

     sqlite_drv:
     sqlite3_drv:
       --with-sqlite-includes=DIR
       Specify a path to the SQLite includes

       --with-sqlite-libraries=DIR
       Specify a path to the SQLite libraries

   DEBUGGING SWITCHES

     --enable-debug
     Turns on support for debugging output. This option allows you to turn on
     debugging messages for all or some users by editing dspam.conf or setting
     --debug on the commandline. Enabling debug in configure only adds support
     for debug to be compiled in; it must still be activated using one of the
     options described above. Debugging support itself doesn't use up very
     many additional resources, so it should be safe to leave enabled on
     non-enterprise class systems.

     --enable-verbose-debug
     Turns on extremely verbose debugging output. --enable-debug is implied.
     Never use this on production builds!

     Note:

     When verbose debug is compiled in, DSPAM performs many additional
     mathematical calculations regardless of whether or not it's been
     activated. You shouldn't use --enable-verbose-debug for production
     builds unless you have serious issues you can't resolve.

   FEATURE ACTIVATION

     --enable-clamav
     Enables support for Clam Antivirus. DSPAM can interface directly with
     clamd to perform virus scanning and can be configured to react in
     different ways to viruses. See dspam.conf for more information.

474
475     The remainder of configuration options are located in dspam.conf, which
476     is installed in sysconfdir (default: /usr/local/etc) upon a make install.
477     It is generally a good idea to review dspam.conf and make any changes
478     necessary prior to using DSPAM.
479
4802. BUILDING AND INSTALLING
481
482   After you have run configure with the correct options, build and install
483   DSPAM by performing:
484
485   make && make install
486
487   Note:
488
489     If you are a developer wanting to link to the core engine of dspam,
490     libdspam will be built during this process.  Please see the
491     example.c file for examples of how to link to and use libdspam. Static
492     and dynamic libraries are built in the .libs directory. Needed headers
493     will be installed in $prefix$/include/dspam.
494
3. PERMISSIONS

   In the typical UNIX environment, you'll need to worry about the following
   permissions:

   The CGI User: This is the user your web server (most likely Apache) is
     running as. This is commonly 'nobody' or 'web'. You can find this in
     Apache's httpd.conf by searching for 'User'. The CGI user will need
     the ability to access the following components of DSPAM:
       - Ability to execute the dspam binary
       - Ability to read and write to dspam_home/data/
       - Trusted user permissions in dspam.conf ("Trust [username]")
       - The execution 'Group' used must match the group dspam is running as
         (this is typically 'mail', 'dspam', or similar)

   The MTA User: This is the user your mail server software is running as when
     it executes DSPAM. This is usually daemon, mail, exim, etc. This is
     typically different from the user the MTA runs and polices itself as, to
     avoid security problems. Consult your MTA's documentation for more info.
     The MTA user will require:
       - The ability to execute the dspam binary
       - Trusted user permissions in dspam.conf ("Trust [username]")

   Systems Administrators: In order to perform administrative functions,
     systems administrators will require:
       - The ability to execute dspam-related binaries
       - Trusted user permissions in dspam.conf ("Trust [username]")

   Note:

     If the MTA is communicating with DSPAM via LMTP (explained later), then
     execution permissions are not necessary.

   Note about FreeBSD:

     FreeBSD's default MTA user is 'mailnull'.
     FreeBSD's default delivery agent also changes its uid, and so in order
     to call it, dspam must be installed as setuid root to work on the
     commandline properly. This is done automatically on install.


   Understanding Trusted User Security

   DSPAM has tighter security for untrusted users on the system to prevent
   them from touching other users' data or passing arbitrary commands to the
   delivery agent DSPAM calls. "Trusted User Security" is a simple system
   whereby any unsafe functions are not available to a user calling dspam
   unless they are within dspam.conf's trusted user list.

   Local non-privileged users should be able to use DSPAM without any problems
   while remaining untrusted, as long as they behave. For example, an untrusted
   user cannot set their DSPAM username to any name other than their username.
   Untrusted users are also limited to the delivery options set by the
   system administrator, and cannot redirect how DSPAM delivers mail.

   A list of trusted users is maintained in dspam.conf. This file should
   include a list of trusted users who should be allowed to set the dspam user,
   passthru parameters, and other information that would be potentially
   dangerous for a malicious user to be able to set.  You'll need to ensure
   that your CGI user, MTA user, and system administrators are on the list.

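   One way to double-check the trusted user list is a small script that looks
   for "Trust [username]" lines. The sketch below is not a DSPAM tool:
   is_trusted is a hypothetical helper, the sample file stands in for your
   real dspam.conf, and the usernames are examples. It assumes one
   "Trust <user>" entry per line and treats lines starting with '#' as
   comments.

```shell
#!/bin/sh
# Hypothetical helper (not a DSPAM tool).
# Usage: is_trusted CONF USER
# Succeeds if CONF contains an uncommented "Trust USER" line.
is_trusted() {
    conf=$1 user=$2
    awk -v u="$user" \
        '$1 == "Trust" && $2 == u { found = 1 } END { exit !found }' "$conf"
}

# Demonstration against a throwaway file; point this at your real dspam.conf.
sample_conf=$(mktemp)
printf 'Trust root\nTrust mail\n#Trust nobody\n' > "$sample_conf"

for u in root mail nobody; do
    if is_trusted "$sample_conf" "$u"; then
        echo "$u: trusted"
    else
        echo "$u: NOT trusted"
    fi
done

rm -f "$sample_conf"
```
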
4. MAIL SERVER INTEGRATION

   As previously mentioned, there are three popular ways to implement DSPAM:

   As a delivery proxy:
     The default approach integrates DSPAM directly with the mail server and
     filters spam as mail comes in. Please see the appropriate instructions
     in doc/ pertaining to your MTA.

   As a POP3 proxy:
     This alternative approach implements a POP3 proxy where users
     connect to the proxy to check their email, and email is filtered when
     being downloaded.  The POP3 proxy is a much easier approach, as it
     requires much less integration work with the mail server (and is ideal
     for implementing DSPAM on Exchange, etc.). Please see the file
     doc/pop3filter.txt.

   As an SMTP Relay:
     DSPAM can be configured as an SMTP relay, a.k.a. an appliance. You
     can set it up to sit in front of your real mail server and then point
     your MX records at it. DSPAM will then pass along the good mail to
     your real SMTP server. See doc/relay.txt for more information. The
     example provided uses Postfix and MySQL.

   Trusted users and the MTA

   If you are using an MTA that changes its userid to match the destination
   user before calling DSPAM, you won't be able to provide pass-thru
   arguments to DSPAM (these are the commandline arguments that DSPAM in turn
   passes to the local delivery agent, in such a configuration).
   You will need to pre-configure the "default" pass-thru arguments in DSPAM.
   This can be done by declaring an untrusted delivery agent in dspam.conf.
   When DSPAM is called by an untrusted user, it will automatically force their
   DSPAM user id and use the passthru delivery agent arguments specified in
   dspam.conf.

   This information will override any passthru commandline parameters
   specified by the user. For example:

   UntrustedDeliveryAgent       "/bin/mail -d $u"

   The variable $u informs DSPAM that you would like the destination username
   to be used in the position $u is specified, so when DSPAM calls your LDA
   for user 'bob', it will call it with:

   /bin/mail -d bob

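   The $u substitution can be mimicked in a few lines of shell, which is
   useful for previewing the command an UntrustedDeliveryAgent setting would
   produce for a given user. This is a sketch of the substitution only:
   expand_agent is a hypothetical function, and DSPAM's own expansion may
   handle quoting or other variables differently.

```shell
#!/bin/sh
# Hypothetical illustration (not DSPAM code).
# Usage: expand_agent TEMPLATE USER
# Prints TEMPLATE with every literal "$u" replaced by USER.
expand_agent() {
    template=$1 user=$2 out=
    while :; do
        case "$template" in
            *'$u'*)
                # Keep the text before the first $u, append the username,
                # then continue with whatever follows that $u.
                out="$out${template%%\$u*}$user"
                template=${template#*\$u}
                ;;
            *)
                printf '%s\n' "$out$template"
                return 0
                ;;
        esac
    done
}

expand_agent '/bin/mail -d $u' bob
```
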
5. ALIASES

   There are essentially two different ways a user might train DSPAM. The first
   is by using the Web UI, which allows them to retrain via the "History"
   tab. This works quite well, as users must visit the Web UI occasionally
   to review their quarantine anyway (and reverse any false positives). We'll
   discuss this shortly in section 1.1.8.

   The more common approach to training, discussed here, is to allow users to
   simply forward their spam to an email address where DSPAM can analyze and
   learn it. DSPAM uses a signature-based system, where a serial number of
   sorts is appended to each email processed by DSPAM. DSPAM reads this serial
   number when the user forwards (or bounces) a message to what is called their
   "spam email address". The serial number points to temporary information
   stored on the server (for 14 days by default) containing all of the
   information necessary for DSPAM to relearn the message. This is necessary
   in order to relearn the *exact* message DSPAM originally processed.

   Note:

     If you are using an IMAP based system, Web-based email, or other form of
     email management where the original messages are stored on the server in
     pristine format, you can turn this signature feature off by setting
     "TrainPristine on" in dspam.conf. DSPAM will then use the message itself
     that you provide it to train, which MUST be identical to the original
     message in order to retrain properly.

   Because DSPAM learns each user's specific email behavior, it's necessary
   to identify the user in order to program their specific filtering database.
   This can be done in one of three ways:

   The Simple Way:

     If you are using the MySQL or PgSQL storage drivers, the original
     numeric user id can be embedded in the signature, requiring only one
     central spam alias to be necessary for the entire system. To configure
     this, uncomment the appropriate UIDInSignature option in dspam.conf:

     # MySQLUIDInSignature    on
     # PgSQLUIDInSignature    on

     Now all you'll need is a single system-wide alias, and DSPAM will train
     the appropriate user when it sees the signature. An example of an alias
     might look like:

     spam:"|/usr/local/bin/dspam --user root --class=spam --source=error"

     Similarly, you may also wish to have a false-positive alias for users who
     prefer to tag spam rather than quarantine it:

     notspam:"|/usr/local/bin/dspam --user root --class=innocent --source=error"

     Note:

     The 'root' user represents any active dspam user. It is necessary to
     supply a username on the commandline or DSPAM will bail on
     an error; however, the user will be changed internally once the signature
     is read.

   The Kind-of-Simple Way:

     If you're not using one of the above storage drivers, the next easiest
     way to configure aliases is to have DSPAM parse the 'To:' header of the
     message and use a catch-all subdomain to direct all mail into DSPAM for
     retraining. You can then instruct your users to email addresses like
     'spam-bob@relearn.example.org'. The ParseToHeaders option (available
     in dspam.conf) will parse the To: header of forwarded messages and
     set the username to either 'bob' or 'bob@relearn.example.org', depending
     on how it is configured. DSPAM can also set the training mode to either
     "learn spam" or "learn notspam" depending on whether the user specified
     a spam- or notspam- address in the To: header.

     This is ideal if you don't want to set up a separate alias for each user
     on your system (The Hard Way). If you're fortunate enough to have a
     mail server that can perform regular expression matching, you can set up
     your system without a subdomain, and just use addresses like
     spam-bob@example.org. For the rest of us, it will be necessary to set up
     a subdomain catch-all directly into DSPAM. For example:

     @relearn.example.org	"|/usr/local/bin/dspam"

     Don't forget to set the appropriate ParseToHeaders and related options in
     dspam.conf as well. More specific instructions can be found in dspam.conf
     itself. In most cases, the following will suffice:

     ParseToHeaders on
     ChangeUserOnParse user
     ChangeModeOnParse on

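     To illustrate the effect described above, the sketch below splits a
     retraining address into a class and a username, roughly what the text
     says happens with "ChangeUserOnParse user" and "ChangeModeOnParse on".
     It is not DSPAM's parser: parse_retrain_address is a hypothetical
     function, and mapping the notspam- prefix to the class name "innocent"
     is borrowed from the alias examples earlier in this section.

```shell
#!/bin/sh
# Hypothetical illustration (not DSPAM's parser).
# Usage: parse_retrain_address ADDRESS
# Prints "CLASS USER" for spam-USER@... and notspam-USER@... addresses,
# and fails for anything else.
parse_retrain_address() {
    local_part=${1%%@*}
    case "$local_part" in
        spam-*)    echo "spam ${local_part#spam-}" ;;
        notspam-*) echo "innocent ${local_part#notspam-}" ;;
        *)         return 1 ;;
    esac
}

parse_retrain_address spam-bob@relearn.example.org
parse_retrain_address notspam-bob@relearn.example.org
```
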
   The Old Way (A.K.A. The Hard Way):

     If neither of the easy ways is possible, you're stuck with doing it
     the hard way. This means you'll need a separate spam alias (and notspam
     alias, if users are tagging mail) for each user, so that DSPAM can
     analyze and learn for that specific user.  For example:
698
699     spam-bob: "|/usr/local/bin/dspam --user bob --class=spam --source=error"
700
701     You will end up having one alias per mail user on the system, two if you
702     do not use DSPAM's CGI quarantine (an additional one using notspam-). Be
703     sure the aliases are unique and each username matches the name after the
704     --user flag.  A tool has been provided called dspam_genaliases.  This tool
705     will read the /etc/passwd file and write out a dspam aliases file that can
706     be included in your master aliases table.
707
708     To report spam, the user should be instructed to forward each spam to
709     spam-user@yourhost
710
711     It doesn't really matter what you name these aliases, so long as the flags
712     being passed to dspam are correct for each user.  It might be a good idea
713     to create an alias custom to your network, so that spammers don't forward
714     spam into it.  For example, notspam-yourcompany-bob or something.
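
     If you prefer to see what such an aliases file looks like before
     running dspam_genaliases, the following sketch prints a spam- and a
     notspam- alias for each username read from standard input. The helper
     name gen_dspam_aliases is hypothetical, and the notspam line assumes
     that false positives are retrained with --class=innocent
     --source=error; adjust the flags to your own setup.

```shell
#!/bin/sh
# Illustrative sketch only: gen_dspam_aliases is a hypothetical helper and
# not the dspam_genaliases tool. Reads one username per line on stdin.
gen_dspam_aliases() {
    dspam_bin=${1:-/usr/local/bin/dspam}    # path used in the examples above
    while IFS= read -r user; do
        [ -n "$user" ] || continue          # skip empty lines
        printf 'spam-%s: "|%s --user %s --class=spam --source=error"\n' \
            "$user" "$dspam_bin" "$user"
        # Assumption: the notspam alias retrains as innocent.
        printf 'notspam-%s: "|%s --user %s --class=innocent --source=error"\n' \
            "$user" "$dspam_bin" "$user"
    done
}

printf 'bob\nalice\n' | gen_dspam_aliases
```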
715
716   Note About Security:
717
718     You might be wondering if a user can forward a spam to another user's
719     address, or whether a spammer can forward a spam to another user's
720     notspam address. The answer is "no". The key to all mail-based retraining
721     is the signature embedded in each email. The signature is stored with
722     each user's own user id, and so not only does the incoming message have
723     to bear a valid signature, but it also has to be stored on the system with
724     the correct user id. This prevents any kind of alias abuse.
725
6. NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS
727
728   Non-SQL Based Nightly Purge
729
730     If you are NOT running a SQL-based solution, then you should configure
731     dspam_clean to run under cron nightly. This clean tool will read all
732     signature databases and purge signatures that are older than 14 days
733     (configurable), purge abandoned tokens, and remove unimportant tokens.
734     Without this tool, old signatures will continue to pile up.
735     Be sure the user running cleanup has full read/write permissions on the
736     DSPAM data files.
737
738     0 0 * * * /usr/local/bin/dspam_clean [options]
739
     See the dspam_clean description for more information.
741
742   SQL-Based Nightly Purge
743
744     SQL-Based solutions include a nightly SQL script to perform the same basic
745     tasks as dspam_clean, and it does it much faster and with more finesse.
746     You can find instructions about each driver's purge functions in
747     the driver's README (doc/[driver].txt) for performing nightly
748     maintenance. Most SQL drivers will include a purge script in the
749     src/tools.[driver] directory. For example:
750
     0 0 * * * mysql --user=[user] --password=[pass] [db] < /path/to/purge-4.1.sql
752
753   Log Rotation
754
755     The system log and user logs can fill up fairly quickly, when all that's
756     really needed to generate graphs are the last two to three weeks of data.
757     You can configure a nightly log cleanup using dspam_logrotate:
758
759     0 0 * * * dspam_logrotate -a 30 -d /usr/local/var/dspam/data
760
7. NOTIFICATIONS
762
763   DSPAM is capable of sending three different notifications to users:
764
765     - A "First Run" message sent to each user when they receive their first
766       message through DSPAM.
767
     - A "First Spam" message sent to each user when they receive their first
       spam.
770
771     - A "Quarantine Full" message sent to each user when their quarantine box
772       is > 2MB in size (note: the 2MB limit is hardcoded in DSPAM).
773
774   These notifications can be activated by copying the txt/ directory from the
775   distribution into DSPAM's home (by default /usr/local/var/dspam). You can
776   alter the location of this directory by setting "TxtDirectory" in dspam.conf.
777
778   Example:
779   /usr/local/var/dspam/txt/firstrun.txt
780   /usr/local/var/dspam/txt/firstspam.txt
781   /usr/local/var/dspam/txt/quarantinefull.txt
782
783   You will want to modify these templates prior to installing them to reflect the
784   correct email addresses and URLs (look for 'example.org').
785
786   NOTE: The quarantine warning is reset when the user clicks 'Delete All', but
787   is not reset if they use "Delete Selected".  If the user doesn't wish to
788   receive reminders, they should use the "Delete Selected" function instead
789   of "Delete All".
790
   You'll also need to set "Notifications" to "on" in dspam.conf.
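
   A quick way to confirm that the three templates are in place is a small
   check like the one below. It is only a sketch: the function name
   check_notification_templates is hypothetical, and the directory passed
   in the example is the default location mentioned above.

```shell
#!/bin/sh
# Illustrative sketch only: check_notification_templates is a hypothetical
# helper. It reports which of the three template files are not readable.
check_notification_templates() {
    txt_dir=$1
    missing=0
    for name in firstrun.txt firstspam.txt quarantinefull.txt; do
        if [ ! -r "$txt_dir/$name" ]; then
            printf 'missing: %s/%s\n' "$txt_dir" "$name"
            missing=1
        fi
    done
    return "$missing"       # 0 when all templates are present
}

check_notification_templates /usr/local/var/dspam/txt ||
    echo 'copy the txt/ directory from the distribution first'
```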
792
8. THE WEB UI
794
   The Web UI (CGI client) can be run from any executable location on
   a web server, and detects its user's identity from the REMOTE_USER
   environment variable. This means you'll need to use HTTP password
   authentication to access the CGI. Any type of authentication will work,
   so long as Apache supports the module, so you can set up authentication
   using almost any existing system you have. The only catch is that the
   usernames must match the actual DSPAM usernames used on the system. A
   copy of the shadow password file will suffice for most common installs.
804
805   The accompanying files in the webui/ folder should be copied into your
806   document root and cgi-bin, as specified.
807
808     Note:
809
810     Some authentication mechanisms are case insensitive and will
811     authenticate the user regardless of the case they type it in.  DSPAM,
812     on the other hand, is case sensitive and the case of the username used
813     will need to match the case on the system.  If you suffer from this
814     authentication problem, and are certain all of your users' usernames are
815     in lowercase, you can add the following line of code to the CGI right
816     after the call to &ReadParse...
817
818     $ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'});
819
820   The CGI will need to function in the same group as the dspam agent in order
821   to work with the files in dspam_home.  The best way to do this is to create
822   a separate virtualhost specifically for the CGI and assign it to run in the
823   MTA group using Apache's suexec. If you are using procmail, additional
824   configuration may also be necessary (see below).
825
826   Note:
827
828     Apache users do NOT take on the identity of the groups specified in
829     /etc/group so you will need to specifically assign the group in
830     httpd.conf.
831
832   Note about Procmail:
833
834      Because the DSPAM Web UI is a CGI script, DSPAM will not retain its
835      setuid privileges when called. If you are running procmail, this will
836      become a problem as procmail requires root privileges to deliver. The
837      easiest hack around this is to create a procmail.dspam binary and make it
838      setuid root, then make it executable only by the mail group (or
839      whatever group DSPAM and the CGI run in).
840
   The DSPAM Web UI has a minimal configuration inside the configure.pl script.
   You'll want to check and make sure all of the settings are correct. In
   most cases, the only settings you will need to change are the large-scale
   or domain-scale flags.
845
846   BEFORE PROCEEDING:
847     Check and make sure (Again) that the CGI user from Apache's httpd.conf is
848     added as a trusted user in dspam.conf.
849
850   Default Preferences
851
852   Now would be a good time to set the system's default preferences. This can
853   be done using the dspam_admin tool.  For example:
854
855     dspam_admin ch pref default trainingMode TEFT
856     dspam_admin ch pref default spamAction quarantine
857     dspam_admin ch pref default spamSubject "[SPAM]"
858     dspam_admin ch pref default enableWhitelist on
     dspam_admin ch pref default showFactors off
860
861   The default preferences are used for any users who have not yet set their
862   own preferences. You can also control which preferences the user may
863   override by changing the "AllowOverride" settings in dspam.conf.
864
865   By default, the parameters specified on the commandline will be used (if
866   any). If, however, a preference is found for the particular user those
867   preferences will override the commandline.
868
869   GD Graphing Library
870
871   If you plan on leaving DSPAM's logging function enabled, and would like to
872   produce pretty graphs for your users, the graph.cgi script requires the
873   following be installed on your machine:
874
875   - GD Graphics Library (http://www.boutell.com/gd/)
876     Compile with png support
877
   - The following Perl modules:
879     (http://www.perl.com/CPAN/modules/by-module/GD/)
880
881     . GD
882     . GD-Graph3d
883     . GDGraph
884     . GDTextUtil
885     . CGI
886
887     Typically this can be accomplished on the commandline:
888
889     perl -MCPAN -e 'install GD::Graph3d'
890
891  Configuring Administrators
892
893  Once you've configured the Web UI, you'll want to edit the 'admins' file to
894  contain a list of users who are permitted to use the administration suite.
895
896  Configuring Sub-Administrators / Domain Level Administrators
897
898  It is possible to delegate the management of users to a list of sub-admins/
899  domain level admins. To accomplish that you should edit the 'subadmins'
900  file to contain a list of sub-admins/domain level admins which are permitted
901  to switch their username while using the DSPAM control center.
902
903  Opt-In/Out
904
905  If you would like your users to be able to opt in/out of DSPAM filtering,
906  add the correct option to the nav_preferences.html template, depending on
907  your configuration (for example, if you have an opt-in system, you'll want to
908  add the opt-in option). Note: This currently only works with the preferences
909  extension, and not drop files.
910
<INPUT TYPE=CHECKBOX NAME=optIn $C_OPTIN$>
Opt into DSPAM filtering

<INPUT TYPE=CHECKBOX NAME=optOut $C_OPTOUT$>
Opt out of DSPAM filtering
916
1.2 TESTING
918
919  If you've installed from an RPM, there's a good chance that the packager
920  went to the trouble of testing already. If you're building from sources,
921  however, you'll need to find a way to ensure your configuration isn't broken.
922
923  Most software packages are supplied with a test suite to determine if the
924  software is functioning properly.  Since DSPAM's correct function relies
925  primarily on having the correct permissions and mail server configuration,
926  a test script fails to provide the level of testing required for such a
927  package.  The following exercise has been provided to test dspam's correct
928  functioning on your system. This exercise does not test the Web UI, but only
929  the core dspam agent.
930
931  Before running the test, you should have completed section 1.1's instructions
932  for compiling and installing dspam as well as configured your mail server
933  to support dspam.
934
935  1. Create a new user account on your system.  It is important that this be a
936  new account to prevent any unrelated email from being delivered during
937  testing.  Be sure to configure a spam alias for the test account.
938
939  2. Send a short (10 words or less) email to the account, and pick it up
940  using your favorite mail client.
941
  3. Run dspam_stats [username] on the server.  You should see a value of 1
  for "TN" (true negatives) as shown below:
944
945  dspam-test            0 TP       1 TN       0 FN       0 FP
946
947  If you receive an error such as "unable to open /usr/local/var/dspam... for
948  reading", then the dspam agent is not configured correctly.  The problem
949  could exist in either your mail server configuration or one or more of the
950  permissions on the directory or agent.  Check your configuration and
951  permissions, and repeat this step until the correct results are experienced.
952
953  4. Run dspam_dump [username] to get a complete list of tokens and their
954  statistics.  Each token should have an I: (innocent) hit count of 1. The
955  tokens will be represented as 64-bit values, for example:
956
3126549390380922317              S:    0  I:    1  LH: Mon Aug  4 11:40:12 2003
13884833415944681423             S:    0  I:    1  LH: Mon Aug  4 11:40:12 2003
14519792632472852948             S:    0  I:    1  LH: Mon Aug  4 11:40:12 2003
8851970219880318167              S:    0  I:    1  LH: Mon Aug  4 11:40:12 2003
961
962  To view statistics for a particular token, run dspam_dump [username] [token]
963  where token is the plain-text token value.  For example:
964
965  % dspam_dump bill FREE
966  7717766825815048192  S: 00265  I: 00068  P: 0.7358
967
968  5. Forward the test message to the spam alias you've created for the test
969  account.  Provide enough time for the message to have processed.
970
971  6. Run dspam_stats [username] on the server again.  Now, the value for TN
972  should be zero and the value for FN (false negatives) should be 1 as shown
973  below:
974
  dspam-test            0 TP       0 TN       1 FN       0 FP
976
977  If this is not the case, check the group permissions of the dspam agent as
978  well as the permissions your MTA uses when piping to aliases.
979
  7. Run dspam_dump [username] again.  Make sure that _EVERY_ token now has an
  I: of zero and an S: of 1:
982
3126549390380922317              S:    1  I:    0  LH: Mon Aug  4 11:44:29 2003
13884833415944681423             S:    1  I:    0  LH: Mon Aug  4 11:44:29 2003
14519792632472852948             S:    1  I:    0  LH: Mon Aug  4 11:44:29 2003
8851970219880318167              S:    1  I:    0  LH: Mon Aug  4 11:44:29 2003
987
988  If you have some tokens that do not have an S: of 1 or an I: of 0, the dspam
989  signature was not found on the email, and this could be due to a lot of
990  things.
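
  The checks in steps 3 and 6 can be scripted by pulling a single counter
  out of a dspam_stats line. The sketch below assumes the one-line output
  format shown above; stats_field is a hypothetical helper name, and the
  two sample lines are copied from this section rather than produced by a
  live dspam_stats run.

```shell
#!/bin/sh
# Illustrative sketch only: stats_field is a hypothetical helper that reads
# one line in the format shown above, e.g.
#   dspam-test            0 TP       1 TN       0 FN       0 FP
# $1: the line, $2: the counter name (TP, TN, FN or FP)
stats_field() {
    printf '%s\n' "$1" | awk -v want="$2" '
        {
            # Field 1 is the username; each counter label follows its value.
            for (i = 3; i <= NF; i++)
                if ($i == want) { print $(i - 1); found = 1; exit }
        }
        END { if (!found) exit 1 }'
}

before='dspam-test            0 TP       1 TN       0 FN       0 FP'
after='dspam-test            0 TP       0 TN       1 FN       0 FP'

if [ "$(stats_field "$before" TN)" = 1 ] && [ "$(stats_field "$after" FN)" = 1 ]; then
    echo 'retraining moved the test message from TN to FN'
fi
```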
991
1.3 TROUBLESHOOTING
993
    Problem: No files are being created in the user directory
   Solution: Check the permissions of the user directory.  It must be
             writable by the user the dspam agent is running as, as well
             as by the CGI user.
998
999    Problem: False positives are never being delivered
1000   Solution: Your CGI most likely doesn't have the privileges required by
1001             the LDA to deliver the messages.  Make sure the CGI user is in
1002             the correct group.  Also consider setting the dspam agent to
1003             setuid or setgid with the correct permissions.
1004
1005    Problem: My database is getting huge!
1006   Solution: DSPAM's default training mode is TEFT. On top of this, the
1007             purging defaults are very lax. You might consider switching to
1008             TOE (Train-on-Error) mode training if you require a minimal
1009             database. If you are willing to sacrifice accuracy for disk space,
1010             disabling the 'chain' tokenizer from dspam.conf will prevent
1011             the use of multi-word (chained) tokens, which will also cut your
1012             database size considerably. You may also consider more frequent
1013             calls to dspam_clean -p to purge neutral data, which comprises a
1014             majority of most databases.
1015
1016  For more help, please see the DSPAM FAQ at http://dspam.sourceforge.net.
1017
1.4 DSPAM TOOLS
1019
1020  A few useful tools have been provided to make DSPAM management a bit easier.
1021  These tools include:
1022
1023  dspam_admin - A tool used to perform specific administrative functions. These
1024    functions are usually included as part of an extensions package (such as
1025    the preferences extension). Available functions are listed in the tool's
1026    usage output.
1027
  dspam_train - Used to train and test a corpus of ham and spam (in maildir
    format).
    Syntax: dspam_train [username] [spam_dir] [nonspam_dir]
    where username is the username of the user to apply the training to, and
    the two dirs represent directories containing messages in individual
    files (e.g. maildir/corpus format). dspam_train can be used on an existing
    user's database, to further improve accuracy, or to train from scratch.
    It also provides a solid test jig for testing the efficiency and accuracy
    of the filter against a test corpus.
    NOTE: dspam_train will automatically balance training of the corpus to
          ensure both spam and nonspam are trained based on the ratio of
          spam/nonspam. This means if you have twice as much spam as nonspam,
          two spam will be trained for every nonspam.
1041
1042  dspam_dump - Dumps a DSPAM dictionary. This can be used to view the
1043    entire contents of a user's dictionary, or used in combination
1044    with grep to view a subset of data.  Syntax: dspam_dump [username] [token]
1045    where username is the DSPAM user's username.  If a token is specified,
1046    statistics only for that token will be printed.
1047
1048  dspam_clean - Performs nightly housecleaning by deleting old or useless
1049    data from user data.  If using the hash driver (hash_drv) please use
1050    cssclean instead (see doc/README.cssclean)
1051
1052    dspam_clean performs the following operations:
1053
1054    1. Using the -s flag, dspam_clean will continue to perform stale signature
1055     purging.  If an age is specified, for example -s14, the age defined as the
1056     default will be overridden.  Specifying an age of 0 will delete all
1057     signatures for the users processed.
1058
1059    2. Using the -p flag, dspam_clean will delete all tokens from a user's
1060     database whose probability is between 0.35 and 0.65 (fairly neutral,
1061     useless tokens) that fall beyond the default age.  If an age is specified,
1062     for example -p30, the age defined as the default will be overridden.  It
1063     is a good idea to use this type of clean with an age of 0 on users after
1064     a lot of corpus training.
1065
1066    3. Using the -u flag, dspam_clean will delete all unused tokens from a
1067     user's database.  There are four different types of unused tokens:
1068
1069     - Tokens which have not been used for a long time
1070     - Tokens which have a total hit count below 5
1071     - Tokens which have only one spam hit
1072     - Tokens which have only one innocent hit
1073
1074   Ages may be overridden by specifying a format such as -u30,15,10,10
1075   where each number represents the respective age.  Specifying an age of
1076   zero will delete all unused tokens in the category. Defaults are set in
1077   dspam.conf.
1078
1079   Optionally, usernames may be specified to override the default behavior of
1080   processing all users.
1081
1082   Examples:
1083
1084   Process all users on the system using all clean operations:
1085     dspam_clean -s -p15 -u90,30,15,15
1086
1087   Delete all of user 'dick' and 'jane's signatures:
1088     dspam_clean -s0 dick jane
1089
1090   Perform a post-corpus training clean on user 'spot':
1091     dspam_clean -p0 -u0,0,0,0 spot
1092
1093   Run dspam_clean with all default options, all clean modes enabled, on all
1094   users on the system:
1095     dspam_clean -s -p -u
1096
1097  NOTE: You may wish to only run certain cleaning modes depending on the type
1098  of storage driver you are using.  For example, the MySQL storage driver
1099  includes a script which performs signature and unused token operations,
1100  leaving only probability operations as useful.  If you are using a SQL-based
1101  storage driver, it is strongly recommended that you use the maintenance
1102  scripts wherever possible for optimum efficiency.
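
  To make the -p criterion concrete, the sketch below tests whether a token
  probability falls in the neutral band described above. It is not taken
  from dspam_clean: is_neutral_probability is a hypothetical helper, and
  treating the 0.35 and 0.65 bounds as inclusive is an assumption.

```shell
#!/bin/sh
# Illustrative sketch only: is_neutral_probability is a hypothetical helper.
# Succeeds when the probability lies between 0.35 and 0.65.
# Assumption: both bounds are inclusive.
is_neutral_probability() {
    awk -v p="$1" 'BEGIN { p += 0; exit !(p >= 0.35 && p <= 0.65) }'
}

for p in 0.7358 0.5000 0.0100; do
    if is_neutral_probability "$p"; then
        echo "$p is neutral (a candidate for -p purging)"
    else
        echo "$p is not neutral"
    fi
done
```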
1103
1104  dspam_stats - Displays the spam statistics for one or all users on the system.
1105    Syntax: dspam_stats [username].  If no username is provided, all users
1106    will be displayed.  Displays TP (true positives), TN (true negatives),
1107    FN (false negatives), and FP (false positives).
1108
1109  dspam_genaliases - Reads the /etc/passwd file and outputs a dspam aliases
1110    table which can be included in the master aliases table.  You may try
1111    Art Sackett's generate_dspam_aliases tool at
1112    http://www.artsackett.com/freebies/generate_dspam_aliases/ if you need
1113    some better functionality.  This will eventually be merged in as a
1114    replacement for the existing tool.
1115
1116  dspam_merge - Merges multiple users' dictionaries together into one user's
1117    dictionary (does not affect the merge users).  This can be used to create
1118    a seeded dictionary for a new user, or to copy a single user's dictionary
1119    to a new file.  This is great for building global dictionaries, but
1120    crunches a lot of time and disk.
1121
1.5 AGENT COMMANDLINE ARGUMENTS
1123
1124  The DSPAM agent (dspam) recognizes the following commandline arguments:
1125
1126  --user [user1 user2 ... userN]
1127  Specifies the destination user(s) of the incoming message.  DSPAM then
1128  processes the message once for each user individually.  If the message is to
1129  be delivered, the $u (or %u) parameters of the arguments string will be
1130  interpolated for the current user being processed.
1131
1132  --class=[spam|innocent]
1133  Tells DSPAM that the message being presented has already been classified by
1134  the user.  This flag should be used when a misclassification has occurred,
1135  when the user is corpus-feeding a message, or an inoculation is being
1136  presented.  This flag must be used in conjunction with the --source flag.
1137  Providing no classification invokes the SOP of DSPAM, which is to determine
1138  the message's nature on its own.
1139
1140  --source=[error|corpus|inoculation]
1141  Wherever --class is used, the source of the user-provided
1142  classification must also be provided.  The source is very important and
1143  dramatically affects DSPAM's training behavior:
1144
1145    error: The message being presented was a message previously misclassified
1146           by DSPAM.  When 'error' is provided as a source, DSPAM requires that
1147           the DSPAM signature be present in the message, and will use the
1148           signature to recall the original training metadata.  If the signature
1149           is not present, the message will be rejected.  In this source mode,
1150           DSPAM will also decrement each token's previous classification's
1151           count as well as the user totals.
1152
1153           You should use error only when DSPAM has made an error in
1154           classifying the message, and should present the modified version of
1155           the message with the DSPAM signature when doing so.
1156
1157   corpus: The message being presented is from a mail corpus, and should be
1158           trained as a new message, rather than re-trained based on a
1159           signature.  The message's full headers and body will be analyzed and
1160           the correct classification will be incremented, without its
1161           opposite being decremented.
1162
1163           You should use corpus only when feeding messages in from corpus, not
1164           for correcting errors.
1165
   inoculation: The message being presented is in pristine form, and should
                be trained as an inoculation.  Inoculations are a more
                intense mode of training designed to cause DSPAM to
                train the user's metadata repeatedly on previously unknown
                tokens, in an attempt to vaccinate the user from future
                messages similar to the one being presented.
1172
1173                You should use inoculation only on honeypots and the like.
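
  Because --class must always be paired with --source, a wrapper script can
  validate both before calling the agent. The following is a sketch under
  that assumption; dspam_retrain_args is a hypothetical helper that only
  prints the argument string and does not run dspam itself.

```shell
#!/bin/sh
# Illustrative sketch only: dspam_retrain_args is a hypothetical helper.
# It checks the --class/--source values listed above and prints the
# resulting argument string without invoking the agent.
dspam_retrain_args() {
    user=$1 class=$2 src=$3
    case $class in
        spam|innocent) ;;
        *) echo "invalid class: $class" >&2; return 1 ;;
    esac
    case $src in
        error|corpus|inoculation) ;;
        *) echo "invalid or missing source: $src" >&2; return 1 ;;
    esac
    printf '%s\n' "--user $user --class=$class --source=$src"
}

dspam_retrain_args bob spam error
dspam_retrain_args bob innocent corpus
```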
1174
1175  --deliver=[spam,[innocent|nonspam],summary,stdout]
  Tells DSPAM to deliver the message if its result falls within the criteria
  specified. For example, --deliver=innocent or --deliver=nonspam will cause
  DSPAM to only deliver the message if its classification has been determined
  as innocent. Providing --deliver=innocent,spam or --deliver=nonspam,spam will
  cause DSPAM to deliver the message regardless of its classification. This flag
  provides a significant amount of flexibility for nonstandard implementations,
  where false positives may not be delivered but spam is, and so on.
1183
    summary : Deliver (to stdout) a summary identical to the output of message
              classification:
                X-DSPAM-Result: User; result="Innocent"; class="Innocent";
                probability=0.0000; confidence=1.00;
                signature=4b11c532158749980119923

    stdout : Is a shortcut for --deliver=innocent,spam --stdout
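
  When front-ending the agent with other tools, the summary can be picked
  apart with standard utilities. The sketch below assumes the summary has
  exactly the "key=value;" layout shown above; dspam_summary_field is a
  hypothetical helper, and the sample text is the example from this section.

```shell
#!/bin/sh
# Illustrative sketch only: dspam_summary_field is a hypothetical helper.
# $1: summary text in the layout shown above, $2: field name
dspam_summary_field() {
    printf '%s\n' "$1" | awk -v key="$2" '
        { text = text " " $0 }                      # join wrapped lines
        END {
            n = split(text, items, ";")
            for (i = 1; i <= n; i++) {
                item = items[i]
                gsub(/^[ \t]+|[ \t]+$/, "", item)   # trim whitespace
                eq = index(item, "=")
                if (eq > 0 && substr(item, 1, eq - 1) == key) {
                    val = substr(item, eq + 1)
                    gsub(/"/, "", val)              # drop the quotes
                    print val
                    exit 0
                }
            }
            exit 1                                  # field not present
        }'
}

summary='X-DSPAM-Result: User; result="Innocent"; class="Innocent";
probability=0.0000; confidence=1.00;
signature=4b11c532158749980119923'

dspam_summary_field "$summary" result
dspam_summary_field "$summary" signature
```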
1191
1192  --stdout
1193  If the message is indeed deemed "deliverable" by the --deliver flag, this
1194  flag will cause DSPAM to deliver the message to stdout, rather than
1195  the configured delivery agent.
1196
1197  --process
  Tells DSPAM to process the message.  This is the default behavior, and the
  flag is implied unless --classify is used, but it is a good idea to use it
  to avoid ambiguity.
1201
1202  --classify
1203  Tells DSPAM only to classify the message, and not make any writes to the
1204  user's metadata or attempt to deliver/quarantine the message.
1205
1206  NOTE: The output of the classification is specific to the user, not including
1207        the output of any groups they might be affiliated with, so it is
1208        entirely possible that the message would be caught as spam by the group,
1209        even if it didn't appear in the classification.  If you want to get
1210        the classification for the GROUP, use the group name as the user
1211        instead of an individual.
1212
1213  --signature=[signature]
1214  For some implementations, the admin may wish to pass the signature in
1215  via commandline instead of allowing DSPAM to find it on its own. This is
1216  especially useful when front-ending the agent with other tools. Using this
1217  option will set the active signature and will also forego reading of stdin.
1218
1219  --mode=[toe|tum|teft|notrain|unlearn]
1220  Configures the training mode to be used for this process:
1221
1222    teft: Train-Everything.  Trains on all messages processed.  This is
1223          a very thorough training approach and should be considered the
1224          standard training approach for most users.  TEFT may, however,
1225          prove too volatile on installations with extremely high per-user
1226          traffic, or prove not very scalable on systems with extremely large
1227          user-bases.  In the event that TEFT is proving ineffective, one of
1228          the other modes is recommended.
1229
1230          NOTE: Until a user reaches 100 innocent messages in their
1231                metadata, train-on-error will also be teft-based, even if
1232                otherwise specified on the commandline.
1233
1234     toe: Train-on-Error.  Trains only on a classification error, once the
1235          user's metadata has matured to 2500 innocent messages.  This
1236          training mode is much less resource intensive, as only occasional
1237          metadata writes are necessary.  It is also far less volatile than
1238          the TEFT mode of training.  One drawback, however, is that TOE only
1239          learns when DSPAM has made a mistake - which means the data is
1240          sometimes too static, and unable to "ease into" a different type of
1241          behavior.
1242
1243     tum: Train-until-Mature.  This training mode is a hybrid between the other
1244          two training modes and provides a great balance between volatility
1245          and static metadata.  TuM will train on a per-token basis only
1246          tokens which have had fewer than 50 "hits" on them, unless an error
1247          is being retrained in which case all tokens are trained.  This
1248          training mode provides a solid core of stable tokens to keep
1249          accuracy consistent, but also allows for dynamic adaptation to any
1250          new types of email behavior a user might be experiencing. It is a
1251          balance of resources as well, as only less-than-mature tokens are
1252          written to the database. NOTE: You should corpus train before
1253          using tum.
1254
1255 notrain: No training.  Do not train the user's data, and do not keep totals.
1256          This should only be used in cases where you want to process mail for
1257          a particular user (based on a group, for example), but don't want
1258          the user to accumulate any learning data.
1259
1260 unlearn: Unlearn original training. Use this if you wish to unlearn a
1261          previously learned message. Be sure to specify --source=error and
1262          --class to whatever the original classification the message was
1263          learned under. If not using TrainPristine, this will require the
1264          original signature from training.
1265
1266    RECOMMENDATIONS:
      In general, it is recommended that users begin with TEFT.  If a user
      is experiencing a spam ratio of 75-85%, they may benefit from
      Train-until-Mature mode.  If a user is experiencing over 90% spam, then
      Train-on-Error mode should make a noticeable improvement in accuracy.
      It eventually boils down to what works best for your users.  There is
      no reason a system could not be configured (with a script) to
      analyze a user's *.stats file and determine the best training mode
      for that user.
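
      A minimal sketch of such a script is shown below. It does not read a
      *.stats file (that format is not described here); instead the
      hypothetical helper recommend_training_mode takes a spam count and a
      total message count. The recommendations above leave the ranges below
      75% and between 85% and 90% open, so staying with TEFT there is an
      assumption.

```shell
#!/bin/sh
# Illustrative sketch only: recommend_training_mode is a hypothetical helper.
# $1: number of spam messages, $2: total number of messages
recommend_training_mode() {
    spam=$1 total=$2
    [ "$total" -gt 0 ] || { echo teft; return 0; }   # no data yet
    pct=$(( spam * 100 / total ))                    # integer spam percentage
    if [ "$pct" -gt 90 ]; then
        echo toe
    elif [ "$pct" -ge 75 ] && [ "$pct" -le 85 ]; then
        echo tum
    else
        echo teft        # assumption: default for ranges not covered above
    fi
}

recommend_training_mode 950 1000
recommend_training_mode 800 1000
recommend_training_mode 300 1000
```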
1275
1276  --feature=[no,wh,tb=N]
1277  Specifies the features that should be activated for this filter instance.
1278  The following features may be used individually or combined using a comma
1279  as a delimiter:
1280
1281    no:  Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in
1282         at 2500 innocent messages and provides an advanced progressive
1283         noise logic to reduce Bayesian Noise (wordlist attacks) in
1284         spams. BNR is not for everyone, and so users should try it out
1285         after they've trained to see if it helps improve accuracy.
1286
  tb=N:  Sets the training loop buffering level.
         Training loop buffering is the amount of statistical sedation
         performed to water down statistics and avoid false positives
         during the user's training loop. The training buffer sets the
         buffer sensitivity, and should be a number between 0 (no buffering
         whatsoever) and 10 (heavy buffering). The default is 5, half of
         what previous versions of DSPAM used.
         To avoid dulling down statistics at all during the training loop,
         set this to 0. This feature should be disabled if you're not
         paranoid about false positives, as it does increase the number of
         spam misses significantly during training.
1298
    wh:  Automatic whitelisting.  DSPAM will keep track of the entire
         "From:" line for each message received per user, and automatically
         whitelist messages from senders with more than 10 innocent
         messages and zero spams.  Once the user reports a spam from the
         sender, automatic whitelisting will automatically be deactivated
         for that sender.  Since DSPAM uses the entire "From:" line, and
         not just the sender's email address, automatic whitelisting is
         a very safe approach to improving accuracy during initial training.
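
  The whitelisting rule described above can be written down as a small
  predicate. This is only a sketch of the stated rule, not DSPAM's
  implementation; sender_is_whitelisted is a hypothetical name, and the
  per-sender counts are assumed to be available to the caller.

```shell
#!/bin/sh
# Illustrative sketch only: sender_is_whitelisted is a hypothetical helper.
# $1: innocent messages seen from this "From:" line, $2: spam messages
sender_is_whitelisted() {
    innocent=$1 spam=$2
    # More than 10 innocent messages and zero spams, as described above.
    [ "$innocent" -gt 10 ] && [ "$spam" -eq 0 ]
}

if sender_is_whitelisted 12 0; then
    echo 'sender would be whitelisted'
fi
```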
1307
1308   NOTE: None of the present features are necessary when the source is "error",
1309         because the original training data is used from the signature to
1310         retrain, instantiating whatever features (such as whitelisting) were
1311         active at the time of the initial classification.  Since BNR is only
1312         necessary when a message is being classified, the
1313         --feature flag can be safely omitted from error source calls.
1314
  --daemon
  Puts DSPAM in daemon mode; i.e., DSPAM acts as a server when started with
  this parameter. See section 2.3 for more information about daemon mode.
1318
2.0 LINKING WITH LIBDSPAM
1320
  Developers are able to link to the DSPAM core engine (libdspam) to provide
  "drop-in" spam-filtering for their applications.  Examples of the libdspam
  API can be found in the example.c file included with this distribution.
1324
1325  <COMMERCIAL LICENSING>
1326
1327  IF YOUR PROJECT USES THE LIBDSPAM API, A GPL-COMPATIBLE OPEN SOURCE LICENSE
1328  IS REQUIRED IN ORDER TO REDISTRIBUTE. IF YOU ARE DEVELOPING A CLOSED-SOURCE
1329  APPLICATION OR APPLICATION THAT DOES NOT CONFORM TO GPL STANDARD, YOU MAY
1330  NOT REDISTRIBUTE ANY APPLICATIONS USING LIBDSPAM WITHOUT A COMMERCIAL
1331  LICENSE.
1332
1333  Please contact project administrators paulcockings@users.sourceforge.net
1334  or sbajic@users.sourceforge.net for information about commercial licensing.
1335
1336  </COMMERCIAL LICENSING>
1337
1338  To link to libdspam, follow the instructions for compiling and installing
1339  DSPAM. When compiled, the libdspam static and shared libraries are also
1340  built. This library contains all the functions necessary to use dspam's
1341  filtering in your application.
1342
1343  Your application will also need to link to the correct storage driver
1344  libraries. If you are using libdspam in a multithreaded application, you
1345  will need to either use a thread-safe storage driver or control access to
1346  libdspam using a mutex lock.
1347
1348  If you are using libdspam in a multithreaded environment, each thread will
1349  require its own DSPAM context. Fortunately, you can attach the same
1350  database handle to each context using dspam_attach(). See the man page for
1351  more information.
1352
  To build with the dspam API, you will also need the header files from
  the distribution.  You can copy these to /usr/include/dspam for ease of
  use, and then compile with -I/usr/include/dspam.
1356
1357  Please see example.c for API examples.
1358
1359  If you are interested in linking libdspam with your project and have
1360  questions or concerns, please contact the dspam-devel@lists.sourceforge.net
1361  mailing list.
1362
2.1 CONFIGURING GROUPS
1364
1365  Groups enable a group of users to share information.
1366
1367  To create groups, you'll want to create a group configuration file. The location
1368  of this file is defined as GroupConfig in dspam.conf, and defaults to
1369  /usr/local/var/dspam/group. The format of the file is:
1370
1371    group1:type:user1,user2,user3
1372    group2:type:*globaluser
1373
1374  DSPAM will read this file upon startup and determine if the user fits into
1375  any particular group.
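
  As an illustration of the file format above, the following is a minimal
  sketch of how one line of the group file could be split into its three
  fields. It is not DSPAM's own parser; the Group type and parse_group_line
  function are hypothetical names introduced for this example.

```python
from typing import NamedTuple, Tuple


class Group(NamedTuple):
    name: str                 # group name (first field)
    types: Tuple[str, ...]    # e.g. ("shared", "managed")
    members: Tuple[str, ...]  # raw member entries, e.g. ("user1", "*@example.org")


def parse_group_line(line: str) -> Group:
    """Split 'name:type:member1,member2' into its three fields."""
    # Split on the first two colons only; member entries are kept verbatim.
    name, group_type, member_field = line.strip().split(":", 2)
    types = tuple(t.strip().lower() for t in group_type.split(","))
    members = tuple(m.strip() for m in member_field.split(",") if m.strip())
    return Group(name, types, members)
```

  For example, parse_group_line("group2:shared,managed:user1,user2") yields
  the name "group2", the types ("shared", "managed") and the members
  ("user1", "user2").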
1376
1377  DSPAM supports the following group types:
1378
1379  SHARED
1380  Enables users with similar email behavior to share the same dictionary
1381  while still maintaining a private quarantine box.  The benefits of this
1382  type of group are faster learning, and sharing a single spam alias.  Shared
1383  groups can have both positive and negative effects on accuracy.  If a shared
1384  group consists of users with similar, predictable email behavior, the users
1385  in the group can benefit from a larger dictionary of spam and faster
1386  learning (especially for newcomers in the group).  If a group consists of
1387  users with different email behavior, however, the users in the group will
1388  experience poor spam filtering and a higher number of false positives.
1389
  NOTE: The SQL-based storage drivers support shared groups, with one caveat:
        If you are NOT enabling "virtual users" support, you will need to create
        an actual user on your system named after each group you create.
1393
  On top of shared group support, a shared group can also be made
  'managed'.  Using the group type 'SHARED,MANAGED' causes the group to
  share a single quarantine mailbox, which can be managed by the group's
  administrator (i.e., the group name).  This enables one individual to
  monitor quarantine for the entire group; however, personal emails marked as
  false positives could potentially be viewed as well.  For this reason,
  managed groups should only be used when this is not an issue.
1401
1402  NOTE: Use the dspam_stats tool to keep an eye on the effectiveness of
1403        shared groups. If a shared group experiences poor performance, find
1404        the users whose email behavior is inconsistent with that of the group
1405        and remove them from the group.
1406
1407  The format for a shared or shared,managed group is:
1408
1409    group1:shared:user1,user2,userN
1410    group2:shared,managed:user1,user2,userN
1411    group3:shared:*@example.org
1412    group4:shared:*
1413
1414  The group name (in the example above 'group1', 'group2', 'group3', 'group4')
1415  can be anything you like. If you set the shared group to be managed then the
1416  groupname (in the example above 'group2') will be used by DSPAM as the shared
1417  group administrator.
1418
  The user/member list for a shared group allows the following syntax:
    user1         : Exact match of user with the name "user1"
    *             : Match any user
    *@example.org : Match any user having '@example.org' at the end of their
                    username. The matching only works for the '@' character.
                    You cannot use something like '*user' to include users
                    'infouser', 'testuser', 'dummyuser', etc.
1426
1427  INOCULATION
1428  An inoculation group allows users to maintain their own private dictionaries
1429  with their own spam alias, but all members of the group will inoculate other
1430  members with spams they manually forward into their alias. This allows users
1431  to report spams to one another and maintain their own private dictionary.
1432  Another advantage to this is that users do not necessarily have to share the
1433  same email behavior.
1434
1435    VERSATILE LANGUAGE INOCULATION MESSAGES
1436
    An Internet-Draft has been published that defines a message format
    standard for sending inoculation data via email:

      http://tools.ietf.org/html/draft-spamfilt-inoculation-01
      http://tools.ietf.org/html/draft-yerazunis-spamfilt-inoculation-03

    This will allow users on different servers, and even users of different
    anti-spam tools, to share inoculation information with one another.
1445
1446    DSPAM presently implements support for this message standard with the
1447    following limitations:
1448
1449    - Only inbound inoculation messages are supported.  DSPAM does not yet send
1450      out inoculations using this message format.  This should not be confused
1451      with local inoculation, which *is* supported.
1452
    - The message/inoculation format is the only inoculation type presently
      supported.  text/inoculation and multipart/inoculation are not yet
      supported.
1455
1456    - The only supported authentication mechanism is presently md5 verification
1457      codes/checksums.
1458
1459    Any unsupported inoculations will simply be dropped.
1460
    A list of identities and authentication information can be set up in the file
    [username].inoc or in the user's home directory in a .inoc file if
    homedir-dotfiles is enabled.  The format of this file is:
1464
1465    sender1:shared secret
1466    sender2:shared secret
1467
1468    Each sender should specify the correct sender id when sending an
1469    inoculation, and should generate their checksum based on the shared secret
1470    established between both parties.
1471
1472  NOTE: Users should only be added to an inoculation group after their initial
1473        learning period, to avoid potential false positives due to lack of data.
1474
  The format for an inoculation group is:
1476
1477    group1:inoculation:user1,user2,userN
1478    group2:inoculation:user3,user4,userN
1479
  The group name (in the example above 'group1', 'group2') can be anything you
  like. It is not used by DSPAM and does not even have to be unique.
1482
1483  The user/member list for inoculation group allows the following syntax:
1484    user1         : Exact match of user with the name "user1"
1485
1486  CLASSIFICATION
  Classification groups allow a group of users to network their results
  together. If DSPAM is uncertain whether a message is spam or nonspam for
  a group member, all other members of the group are queried. If another member
  believes the message to be spam, it will be marked as spam. DSPAM queries
  the members one by one and stops as soon as a member reports that
  the message is spam.
1493
1494  The format for a classification group is:
1495
1496    group1:classification:user1,user2,userN
1497    group2:classification:user3,user4,userN
1498
  The group name (in the example above 'group1', 'group2') can be anything you
  like. It is not used by DSPAM and does not even have to be unique.

  The user/member list for a classification group allows the following syntax:
    user1         : Exact match of user with the name "user1"
1504
1505  GLOBAL
  Global groups allow DSPAM to provide "SpamAssassin-type out-of-the-box
  filtering" for all new users until they have built their own useful
  dictionaries. A global group can be created by adding a CLASSIFICATION
  group definition (see above) and prefixing the group member/user with a '*'.
1510
1511  The format for a global classification group is:
1512
1513    groupname:classification:*globaluser
1514
  This will automatically add user globaluser as a classification peer to all
  users. Any user who has fewer than 1000 innocent messages or 250 spam messages
  in their corpus, or whose filter is uncertain (confidence less than 0.65)
  about a particular message, will consult the globaluser dictionary for an
  answer.
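
  The consultation rule above can be written as a small predicate. This is a
  sketch of the rule as worded in this README, not DSPAM's implementation;
  the function and parameter names are hypothetical, and it assumes the
  thresholds mean "fewer than 1000 innocent messages OR fewer than 250 spam
  messages".

```python
def should_consult_global(innocent_count: int, spam_count: int,
                          confidence: float) -> bool:
    """Hypothetical helper: should this user fall back on the global user?"""
    # The user's own corpus is still too small to be trusted...
    still_training = innocent_count < 1000 or spam_count < 250
    # ...or the user's filter is unsure about this particular message.
    uncertain = confidence < 0.65
    return still_training or uncertain
```

  A well-trained user with a confident result (for example 5000 innocent
  messages, 800 spams, confidence 0.90) would not consult the global user
  under this reading.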
1520
  The global group user (in this case 'globaluser') will need to be trained
  using a corpus, by using the dspam_merge tool, or by other means. The global
  group user is treated just as any other user on the system.

  The group name (in the example above 'groupname') can be anything you like. It
  is not used by DSPAM and does not even have to be unique.

    NOTE: Be sure to set your global user's preferences so that trainingMode
          is set to TOE. This will prevent the purge tools you use from
          purging its data after 90 days.
1532
1533  MERGED
  Merged groups are similar to global groups in that the entire system uses a
  single global user as a parent. The difference is that the merged group's
  data is merged with the individual user's training data at run-time, instead
  of switching between the two. This allows the merged group to be treated like
  a base dataset for all users, and provides for quicker learning and correction
  than the global group approach. It is recommended that merged groups only be
  used with TOE-mode training so that only corrective data is stored, but
  systems with ample disk space may wish to run in TUM mode to learn the user's
  behavior dynamically.
1543
1544  The group's data is merged with the user's data in real-time, so if you have:
1545
1546    Group : Viagra = 10 Spam Hits,  0 Innocent Hits
1547    User1 : Viagra =  5 Spam Hits, 15 Innocent Hits
1548    User2 : Viagra = 20 Spam Hits,  1 Innocent Hits
1549
1550  Then the token is loaded as:
1551    User1 : Viagra = 15 Spam Hits, 15 Innocent Hits     = 0.50 (50%) = neutral
1552    User2 : Viagra = 30 Spam Hits,  1 Innocent Hits
1553
1554  No data is written to the group by DSPAM; only the user's data. This then
1555  offsets the group's data without affecting other users. Because of the way
1556  this data is merged, it's not recommended that you update the merged group
1557  with more than a handful of messages periodically, as it affects how all
1558  stats are defined for each user.
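
  The merge shown above is a per-token addition of the group's counters to the
  user's counters. The sketch below reproduces the arithmetic of the example;
  the function names are hypothetical, and naive_spam_ratio is only the simple
  ratio implied by the "0.50 (50%)" figure above, not the probability
  calculation DSPAM actually uses.

```python
from typing import Tuple

Counts = Tuple[int, int]  # (spam_hits, innocent_hits)


def merge_token(group: Counts, user: Counts) -> Counts:
    """Add the group's counters to the user's counters for one token."""
    return (group[0] + user[0], group[1] + user[1])


def naive_spam_ratio(counts: Counts) -> float:
    """Plain spam/(spam+innocent) ratio, used here for illustration only."""
    spam_hits, innocent_hits = counts
    total = spam_hits + innocent_hits
    return spam_hits / total if total else 0.5


group = (10, 0)                        # Group : Viagra
user1 = merge_token(group, (5, 15))    # -> (15, 15)
user2 = merge_token(group, (20, 1))    # -> (30, 1)
```

  Because only the user's own counters are ever written back, each user's
  corrections shift their merged view of a token without changing what other
  users see.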
1559
1560  The format for a merged group is:
1561
1562    group1:merged:user1,user2,userN
1563    group2:merged:user3,user4,userN
1564
  The group name (in the example above 'group1', 'group2') can be anything you
  like and represents the name of the group user to merge with all members of
  the group. At run-time, DSPAM merges the tokens stored under that group name
  with the tokens of the user (if the user is a member of the merged group).
1570
  The user/member list for a merged group allows the following syntax:
    user1          : exact match of user with the name "user1"
    -user1         : exclude user with the name "user1"
    *              : match any user
    *@example.org  : match users having "@example.org" at the end of their
                     username. The matching only works for the '@' character.
                     You cannot use something like '*user' to include users
                     'infouser', 'testuser', 'dummyuser', etc.
    -*@example.org : exclude users having "@example.org" at the end of their
                     username. The matching only works for the '@' character.
                     You cannot use something like '-*user' to exclude users
                     'infouser', 'testuser', 'dummyuser', etc.
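
  The member syntax above can be sketched as a matcher. This is an
  illustration only: the function names are hypothetical, and two behaviors
  are assumptions not stated in this README: that an exclusion always wins
  over an inclusion regardless of order, and that an unsupported pattern such
  as '*user' simply matches nobody.

```python
from typing import Iterable


def pattern_matches(pattern: str, username: str) -> bool:
    """Match one member entry (without any leading '-') against a username."""
    if pattern == "*":
        return True
    if pattern.startswith("*@"):
        # '*@example.org' matches any username ending in '@example.org'.
        return username.endswith(pattern[1:])
    if pattern.startswith("*"):
        # '*user'-style patterns are not supported (assumed: never match).
        return False
    return pattern == username


def is_merged_member(members: Iterable[str], username: str) -> bool:
    """True if username is included by some entry and excluded by none."""
    included = False
    for entry in members:
        if entry.startswith("-"):
            if pattern_matches(entry[1:], username):
                return False  # assumption: exclusions take precedence
        elif pattern_matches(entry, username):
            included = True
    return included
```

  With the member list "*@example.org,-bob@example.org", the user
  alice@example.org is a member and bob@example.org is not.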
1583
  NOTE: Merged groups are great for providing out-of-the-box adaptive filtering,
        but allowing users to build their own data from scratch will still
        result in the best possible accuracy in the long run.

  NOTE: Be sure to set your group user's preferences so that trainingMode is
        set to TOE. This will prevent the purge tools you use from purging its
        data after 90 days.
1591
1592  RESTRICTIONS!
1593
  A user can simultaneously be a member of multiple classification / global
  groups and multiple inoculation groups, but a user cannot be a member of a
  classification / global group or an inoculation group and, at the same
  time, of a shared or shared,managed group.

  A user cannot be a member of:
    * both a classification group and a global group
    * multiple merged groups
    * multiple shared or shared,managed groups
    * both a shared group or shared,managed group and a merged group
1604
2.2 EXTERNAL INOCULATION THEORY
1606
1607  Bill Yerazunis recently expressed his theory of inoculation on an anti-spam
1608  development list, using the term "vaccination":
1609
1610  "Part of the problem is that spam isn't stationary, it evolves. That
1611   pesky .1% error rate is in some part due to the base mutation rate of spam
1612   itself.  Maybe the answer is "vaccination".  Vaccination is using _one_
1613   person's misery be used to generate some protective agent that protects the
1614   rest of the population; only the first person to get the spam actually has
1615   to read it.
1616
1617   My expectation is this: say you have ten friends, and you all agree to share
1618   your training errors.  Each of you will (statistically) expect to be the
1619   first to see a new mutation of spam about 9% of the time; the other ten
1620   friends in this group will have their bayesian filter trained preemptively
1621   to prevent this.  Net result: you get a tenfold decrease in error rate -
1622   down to 99.99% accuracy.  With a hundred such (trusted) friends, you may be
1623   down to 99.999% accuracy."
1624
1625  DSPAM has taken this concept and rolled it into support for what we call
1626  "inoculation groups" providing the exact functionality Bill describes.  This
1627  could be considered an "internal inoculation" practice.
1628
  On top of this, DSPAM has been designed to support external inoculation as
  a complement to internal inoculation.  Here, instead of your internal
  circle of friends inoculating you, you rely on external elements - namely
  spammers themselves - to inoculate you.
1633
  The theory behind external inoculation is this: why put _anyone_ through
  the misery of being the first to receive a new spam when you can have
  the spammers themselves send it directly to you?  On top of this,
  external inoculation can be combined with internal inoculation by taking
  the spam you received externally and inoculating your friends with it
  internally.
1640
1641  Inoculation is a little different from learning, as inoculation causes
1642  tokens to be given additional hit counts in an attempt to learn from a
1643  single email.  As a result, any form of inoculation should _only_ be
1644  attempted after an initial learning phase (perhaps when your filtering
1645  accuracy exceeds 99.0%).  DSPAM inoculates like this:
1646
  1. Every token that doesn't already exist in the database, or has fewer
     than two hits, will be hit five times.

  2. All other tokens are hit twice.
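
  The two rules above can be sketched as follows. This is an illustration of
  the rule as described, not DSPAM's source; the function name is
  hypothetical, "hits" is assumed to mean the token's spam-hit counter, and
  each distinct token in the message is assumed to be counted once.

```python
from typing import Dict, Iterable


def inoculate(spam_hits: Dict[str, int], tokens: Iterable[str]) -> Dict[str, int]:
    """Apply the inoculation rule to a token -> spam-hit-count mapping."""
    for token in set(tokens):          # assumption: one update per distinct token
        current = spam_hits.get(token, 0)
        # Rule 1: unknown or rarely seen tokens (fewer than two hits) get five hits.
        # Rule 2: all other tokens get two hits.
        spam_hits[token] = current + (5 if current < 2 else 2)
    return spam_hits


db = {"viagra": 10, "cheap": 1}
inoculate(db, ["viagra", "cheap", "mortgage"])
# db is now {"viagra": 12, "cheap": 6, "mortgage": 5}
```

  The heavier weighting of unfamiliar tokens is what lets a single message
  shift the filter, and is also why inoculating an untrained dictionary is
  discouraged above.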
1651
  External inoculation is accomplished by creating a covert, external alias
  that is configured to automatically inoculate your dictionary from any
  messages it receives.  The covert alias can then be published onto a series
  of public newsgroups and websites where it is sure to be harvested by
  a spammer's tools.  One could even proactively subscribe oneself to
  several different opt-in spam lists, and so on.
1658
1659  The first step is to configure an alias.  To do this you would use something
1660  like:
1661
1662    bob_c:	"|/path/to/dspam --process --class=spam --source=inoculation --user bob"
1663
  The 'c' in bob_c stands for 'covert'.  We must use a covert alias because if we
  use something obvious like 'bob-spam', harvester tools will automatically
  strip the -spam off and spam your real account.
1667
1668  Once the alias is set up, make sure this alias gets out only on lists where
1669  harvesters will grab it, and nobody will send legitimate email to it.
1670  It may even be a good idea to put it at the bottom of your tagline in all
1671  your publicly archived emails, something like...
1672
1673    Spammers, send me mail here: bob_c@example.org
1674
  Finally, you can multiply the effects of this by sharing an inoculation
  group with your friends.  If all of your friends have a public covert
  alias, then you will all be able to inoculate each other should one of you
  receive a spam to the account.  What a great way to train your filter!
1679
  On top of this, should external inoculation become commonplace to the
  point where harvesters are picking up as many covert aliases as legitimate
  email addresses, spammers will start to realize that harvesters are just
  plain too dumb to tell the difference (the spammers themselves couldn't tell
  whether mine was one or not).  This could, in the best case, put an end to
  harvester bots by making them counter-productive and therefore obsolete.
1686
2.3 CLIENT/SERVER MODE
1688
  DSPAM supports two different modes of operation.  In standard operating
  mode, the DSPAM agent is called by the MTA (or proxy) and each agent process
  operates independently, establishing its own connection to a database and
  performing delivery on its own. The second operating mode, client/server mode,
  allows the DSPAM agent to act more like a thin client, connecting to the
  DSPAM server process which then does all the work of analyzing and delivering
  or quarantining the message. The advantages to using DSPAM in client/server
  mode are:
1697
1698  - Maintaining a set of stateful database connections (within the server),
1699    which should enhance performance on some systems by eliminating the need
1700    to establish a new database connection for every message processed.
1701
1702  - Providing a central point of processing. Having one server perform all
1703    processing and delivery, while having multiple thin clients on your mail
1704    servers may be more desirable than having multiple agents performing
1705    processing and delivery on all your servers.
1706
  - The DSPAM server speaks LMTP, which some implementations may be able to
    take advantage of, eliminating the need for the DSPAM client altogether.
1709
1710  - Having a single multithreaded daemon should use less memory and other
1711    resources than having independently operating clients.
1712
1713  If you've already got DSPAM set up, client/server mode won't require any
1714  changes to your mail server's configuration - it's completely transparent.
1715
1716  The DSPAM agent can be compiled with client/server support by configuring
1717  with --enable-daemon. You will need to use a multithread-safe storage driver
1718  (presently mysql_drv, pgsql_drv and hash_drv are supported). Once you have
1719  compiled with daemon support, you'll need to modify your dspam.conf to
1720  provide the settings necessary for client/server mode:
1721
1722	ServerHost		127.0.0.1
1723
  The host to listen on. By default this setting is commented out, which
  causes DSPAM to listen on all available interfaces.
1726
1727	ServerPort		24
1728
1729  The port to listen on. The default is 24, the LMTP port.
1730
1731	ServerQueueSize		32
1732
1733  The maximum number of connections which may remain backlogged before they
1734  are accepted.
1735
1736	ServerPass.Relay1	"secret"
1737	ServerPass.Relay2	"password"
1738
1739  Each client server allowed to connect should have its own password. They
1740  can be defined here.
1741
1742  The DSPAM server can listen on either a network socket or a local unix
1743  domain socket. If you're running the client and server on the same machine,
1744  a domain socket should be used as it eliminates additional overhead. To use
1745  a domain socket, you'll also need to add the following option:
1746
1747	ServerDomainSocketPath	"/tmp/dspam.sock"
1748
1749  Once you've configured the server config, you'll want to set the client
1750  configuration on all client machines. If you are using network sockets,
1751  set the following to appropriate values:
1752
1753	ClientHost		127.0.0.1
1754	ClientPort		24
1755
1756  Or if using a domain socket:
1757
1758        ClientHost		/tmp/dspam.sock
1759
1760  In both cases, you'll need to set the client's authentication ident:
1761
1762	ClientIdent		"secret@Relay1"
1763
1764  Now you're ready to go. To start the DSPAM server, run:
1765
1766	dspam --daemon &
1767
1768  Or alternatively, if you have debugging enabled:
1769
1770	dspam --debug --daemon &
1771
  The DSPAM agent can then be called just as in standard (non-client/server)
  mode, with --client added to the set of parameters. The client settings
  will be loaded from dspam.conf, and the agent will act as a thin client.
  Running dspam without --client will cause DSPAM to revert to its normal
  non-daemon behavior and establish database connections on its own.
  For example:
1778
1779	dspam --client --user dick jane --deliver=innocent -d %u
1780
1781  Alternatively, if you'd like to use a thinner client, dspamc is identical
1782  to the dspam binary in behavior, but has been stripped down to only include
1783  the lightweight client.
1784
1785	dspamc --user dick jane --deliver=innocent -d %u
1786
1787  The conversation that takes place between the client/server is LMTP-based,
1788  and will look like this:
1789
1790    SERVER> 220 DSPAM DLMTP 3.10.0 Authentication Required
1791    CLIENT> LHLO Relay1
1792    SERVER> 250-PIPELINING
1793    SERVER> 250-ENHANCEDSTATUSCODES
1794    SERVER> 250-DSPAMPROCESSMODE
1795    SERVER> 250 SIZE
1796    CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"
1797    SERVER> 250 2.1.0 OK
1798    CLIENT> RCPT TO: dick
1799    SERVER> 250 2.1.5 OK
1800    CLIENT> RCPT TO: jane
1801    SERVER> 250 2.1.5 OK
1802    CLIENT> DATA
1803    SERVER> 354 Enter mail, end with "." on a line by itself
1804    CLIENT> Subject: Cheap Viagra!
1805    CLIENT>
1806    CLIENT> Click Here: http://www.cheapviagra.example.org
1807    CLIENT> .
1808    SERVER> 250 2.0.0 <dick> Message accepted for delivery: INNOCENT
1809    SERVER> 250 2.0.0 <jane> Message accepted for delivery: SPAM
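
  To make the client side of the conversation above concrete, the following
  sketch builds the sequence of client commands as plain strings, without
  opening a socket. It is not the dspamc implementation: the function name is
  hypothetical, the LHLO name is assumed to be the part of ClientIdent after
  the '@' (as in the example above), and the dot-doubling step is the
  standard SMTP/LMTP transparency rule rather than something stated in this
  README.

```python
from typing import List, Sequence


def build_dlmtp_commands(client_ident: str, process_mode: str,
                         recipients: Sequence[str], message: str) -> List[str]:
    """Return the client-side lines of a DLMTP session, in order."""
    relay = client_ident.split("@", 1)[1]       # "secret@Relay1" -> "Relay1"
    lines = [
        f"LHLO {relay}",
        f'MAIL FROM: <{client_ident}> DSPAMPROCESSMODE="{process_mode}"',
    ]
    lines += [f"RCPT TO: {rcpt}" for rcpt in recipients]
    lines.append("DATA")
    for body_line in message.splitlines():
        # Double a leading dot so it is not mistaken for the end-of-data marker.
        lines.append("." + body_line if body_line.startswith(".") else body_line)
    lines.append(".")                           # end of message
    return lines


commands = build_dlmtp_commands(
    "secret@Relay1",
    "--deliver=innocent -d %u",
    ["dick", "jane"],
    "Subject: Cheap Viagra!\n\nClick Here: http://www.cheapviagra.example.org",
)
```

  The resulting list matches the CLIENT> lines of the transcript above; the
  server is expected to answer the final "." with one status line per
  recipient.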
1810
1811  Optionally, if you'd like the clients to perform delivery, you can use
1812  DSPAM's --stdout or --classify functionality to obtain a dump of the message
1813  or results, respectively. From there, it's up to you and your MTA to
1814  deliver the message. The DSPAM client will output the results to stdout in
1815  this case, just as it would in standard operating mode.
1816
1817  Once the server is running, its configuration can be reloaded with a SIGHUP.
1818  When the daemon is reloaded, the following occurs:
1819
1820  - The daemon stops listening for new requests
1821  - All threads are allowed to finish processing and exit
1822  - All connections to the database are closed
1823  - The dspam.conf configuration is reloaded
1824  - All connections to the database are re-opened
1825  - The daemon starts listening for new requests
1826
1827  This allows database and listener configurations to also be reloaded from
1828  dspam.conf without the need to interrupt the process.
1829
1830  NOTE: During the period of time the daemon is reloading, client connections
1831        will fail. Depending on how the MTA reacts, this may cause messages to
1832        fall back to queue or to bounce.
1833
2.4 LMTP
1835
1836  DSPAM supports LMTP both on the front-end and back-end (delivery). This
1837  section will briefly provide instructions for configuring either or both of
1838  these advanced options.
1839
1840  LMTP (AND SMTP) DELIVERY
1841
1842  DSPAM supports LMTP delivery for admins who would prefer to use this instead
1843  of local delivery. While LMTP delivery doesn't _require_ operating in
1844  daemon mode, it is necessary to compile DSPAM with --enable-daemon to take
1845  advantage of LMTP delivery. To configure LMTP delivery, perform the following
1846  steps:
1847
1848  1. Compile DSPAM with --enable-daemon to enable LMTP delivery code
1849
  2. Configure your DeliveryHost and DeliveryIdent in dspam.conf. Set
     DeliveryProto based on whether you would like to deliver via LMTP or SMTP.

     NOTE: If you would like to deliver to different hosts based on domain,
           specify DeliveryHost.example.org as the configuration directive. Use
           DeliveryPort.example.org to specify a port for the delivery.
1856
1857  3. Add the --lmtp-recipient flag to the arguments passed into DSPAM. This is
1858     used to specify the destination address for the message. For example, in
1859     postfix:
1860
1861     --lmtp-recipient=${recipient}
1862
  DSPAM will then connect to the specified host and deliver using a standard
  LMTP conversation that looks like this:
1865
1866    LHLO [ident]
1867    MAIL FROM:<> SIZE=[message_length]
1868    RCPT TO: <recipient>
1869    DATA
1870    [Message]
1871    .
1872
1873  LMTP SERVER
1874
1875  DSPAM supports a "daemon" mode where it will sit and listen for inbound
1876  connections. Depending on how the server is configured, DSPAM can speak
1877  either standard LMTP (for interaction with a mail server, such as postfix)
1878  or DLMTP (DSPAM LMTP) which is a proprietary implementation of LMTP between
1879  the DSPAM client and server. If you plan on calling DSPAM from the commandline
1880  via dspamc, but wish to have a stateful daemon perform processing, then
1881  you'll want to use the "dspam" server mode. If you want to call DSPAM by
1882  having your mail server connect to it via LMTP, then you'll need to specify
1883  the "standard" server mode.
1884
1885  The ServerMode can be set in dspam.conf. Each mode has its own custom
1886  tweaks and configurations that will need to be set in dspam.conf.
1887
1888  "dspam" mode settings.
1889  In "dspam" mode, you'll need to set up authentication for each dspam client
1890  relay. This involves configuring the relay ident and password. Examples are
1891  provided.
1892
1893  "dspam" mode notes.
  In dspam mode, only the dspam client will be connecting to your LMTP server.
  This can be dspamc (a thin client) or the dspam binary. In either case,
  you'll need to specify --client to tell DSPAM to act as a client. DLMTP
  allows the client to pass in any commandline arguments provided, so it
  should function identically to running DSPAM as a dedicated (non-stateful)
  process.
1900
1901  "standard" mode settings.
1902  In "standard" mode, you will need to configure the ServerParameters flag to
1903  reflect the commandline parameters you would normally want to pass to DSPAM.
1904
1905  "standard" mode notes.
1906  One thing to watch out for is that the recipient you're sending via LMTP is
1907  unique to a specific user. This means that all of your aliases should be
1908  resolved before the MTA relays to DSPAM. Because DSPAM uses the addresses in
1909  the RCPT TO as usernames, _not_ resolving any aliases will result in
1910  multiple databases being created for one user. Since the signature will be
1911  different for each user, and since the message must be processed
1912  differently for each user, DSPAM demultiplexes a multi-recipient email. This
1913  means that while it can receive an email with multiple RCPT TO's specified, it
1914  will perform delivery individually.
1915
1916  "auto" mode setting.
1917  If you would like to support both connecting MTAs and remote dspam client
1918  processes (such as for inoculations), you can set the server mode to auto,
1919  which will base its dialect on the ident supplied in the LHLO. If the LHLO
1920  ident matches an ident in dspam.conf's ServerPass section, the server will
1921  default to DLMTP. Otherwise, DSPAM will assume the client is a standard
1922  LMTP client and speak standard LMTP.
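
  The "auto" selection described above can be sketched as a lookup of the
  LHLO ident in the configured ServerPass entries. This is an illustration
  only; the function names and the dictionary representation of the
  ServerPass settings are hypothetical, and the password check assumes the
  "password@ident" form of ClientIdent shown in section 2.3.

```python
from typing import Dict


def choose_dialect(lhlo_ident: str, server_pass: Dict[str, str]) -> str:
    """Pick the protocol dialect for a connection in "auto" server mode."""
    # A known relay ident means a DSPAM client; anything else is treated as an MTA.
    return "DLMTP" if lhlo_ident in server_pass else "LMTP"


def ident_is_authorized(client_ident: str, server_pass: Dict[str, str]) -> bool:
    """Check a 'password@ident' string against the ServerPass entries."""
    password, sep, ident = client_ident.partition("@")
    return bool(sep) and server_pass.get(ident) == password


# Mirrors "ServerPass.Relay1 secret" and "ServerPass.Relay2 password".
server_pass = {"Relay1": "secret", "Relay2": "password"}
```

  With these settings, a client announcing "LHLO Relay1" is spoken to in
  DLMTP, while a mail server announcing its own hostname gets standard LMTP.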
1923
1924  LOCAL DELIVERY WITH LMTP FRONT-END
1925
1926  In some circumstances, you may want to relay to DSPAM via LMTP, but have
1927  DSPAM deliver via LDA. In these cases, you may use the following
1928  conventions in your ServerParameters configuration:
1929
1930  %r - The RCPT TO passed in via LMTP
1931  %s - The MAIL FROM passed in via LMTP
1932
1933  In both cases, the content provided between < > is what is actually used.
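
  The substitution described above can be sketched as follows. This is not
  DSPAM's own code: the function names and the sample ServerParameters value
  are hypothetical, and an address given without angle brackets is assumed
  to be used as-is.

```python
import re


def extract_address(argument: str) -> str:
    """Return the text between '<' and '>', or the stripped argument if absent."""
    match = re.search(r"<([^>]*)>", argument)
    return match.group(1) if match else argument.strip()


def expand_server_parameters(template: str, rcpt_to: str, mail_from: str) -> str:
    """Replace %r and %s in a ServerParameters-style string."""
    values = {"%r": extract_address(rcpt_to), "%s": extract_address(mail_from)}
    # A single pass, so an address containing '%s' or '%r' is never re-expanded.
    return re.sub(r"%[rs]", lambda m: values[m.group(0)], template)


expanded = expand_server_parameters(
    "--deliver=innocent --user %r -d %r",    # hypothetical ServerParameters value
    "<jane@example.org>",
    "<bob@example.org>",
)
# expanded == "--deliver=innocent --user jane@example.org -d jane@example.org"
```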
1934
2.5 DSPAM USER PREFERENCES
1936
1937  Preferences are settings that can be configured globally in dspam.conf or
1938  for individual users via the dspam_admin command.
1939
1940  trainingMode { TOE | TUM | TEFT | NOTRAIN }
1941    How DSPAM should train messages it analyzes. See section 1.5 --mode
1942    (default:teft, see dspam.conf)
1943
1944  spamAction { quarantine | tag | deliver }
1945    What to do with spam. The tag and deliver options both deliver, but tag
1946    adds a special prefix to the subject, whereas deliver merely sets
1947    X-DSPAM-Result. (default:quarantine)
1948
1949  spamSubject
1950    A customized subject to prefix when spamAction=tag. (default:[SPAM])
1951
1952  statisticalSedation { 0 - 10 }
1953    The level of dampening during training (0-10, 0 = no dampening, default:0)
1954
1955  enableBNR { on | off }
1956    Enables or disables bayesian noise reduction (default:off)
1957
1958  enableWhitelist { on | off }
1959    Enables or disables automatic whitelisting (default:on)
1960
1961  signatureLocation { message | headers }
1962    Where to place the DSPAM signature. Placement affects forwarding approach.
1963    (default:message)
1964
1965  tagSpam / tagNonspam { on | off }
1966    Adds a tagline to the end of a message based on its classification; useful
1967    for things such as "Scanned by your ISP example.org". If set to on, the file
1968    msgtag.spam and/or msgtag.nonspam will be looked for in "TxtDirectory"
1969    (see dspam.conf) and appended to appropriate messages.
1970
1971    NOTE: Signed messages will not be tagged in this fashion
1972
1973  showFactors { on | off }
1974    Whether to include an X-DSPAM-Factors header including decision-making
1975    factors (clues). NOTE: This can break RFC in some cases, and should only
1976    be used for debugging. (default:off)

  optIn / optOut { on | off }
    Depending on whether the system is opt-in or opt-out, sets the user's
    membership. If the user is opted out (or not opted in), mail will be
    delivered by DSPAM without being processed.

  whitelistThreshold { Integer }
    Overrides the default number of times a From: header must be seen before
    it is automatically whitelisted. (default:10)

  makeCorpus { on | off }
    When activated, a maildir-style corpus is maintained in the user's data
    directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or
    other analysis. (default:off)

  storeFragments { on | off }
    When activated, the first 1k of each message is temporarily stored on
    the server for reference via the webui's history function. (default:off)

  localStore { String }
    Overrides the directory name used for the user's dspam data directory. This
    is useful when using recipient addresses as usernames, as it will allow
    all addresses belonging to a specific user to be written to a single
    webui directory. (default:username)

  processorBias { on | off }
    Overrides the "bias" setting in dspam.conf, which biases mail as
    innocent. (default:on, see dspam.conf)

  fallbackDomain { on | off }
    Allows a dspam user ("@example.org") to be marked as a fallback user for
    the entire domain, so if the destination dspam user does not exist in
    the database, the fallback user's database will be used. (default:off)
    NOTE: You will need to set "FallbackDomains on" in dspam.conf to use this.

  trainPristine { on | off }
    Overrides the default signature mode and treats messages as if they were
    in pristine format when retraining. This requires all retraining to use
    the original message that was processed, as no dspam signature is stored
    for pristine training. (default:off)

  optOutClamAV { on | off }
    Opts out of ClamAV virus scanning (if ClamAV is directly integrated with
    dspam via dspam.conf). (default:off)

  ignoreRBLLookups { on | off }
    Overrides the "Lookup" setting in dspam.conf, which looks up the sender's
    IP address in a Realtime Blackhole List (RBL). (default:off)

  RBLInoculate { on | off }
    Overrides the "RBLInoculate" setting in dspam.conf, which inoculates mail
    as spam if the lookup result is positive. (default: depending on
    dspam.conf)

    NOTE: This user preference takes precedence over the one set in
    dspam.conf. If you don't set this user preference to on/off, then whatever
    is set in dspam.conf will be used for every user.
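  The precedence rule in the note above (a user preference wins, otherwise
  the dspam.conf value applies) can be sketched in Python as follows. The
  dictionaries and the function name are assumptions made for illustration;
  DSPAM stores and resolves preferences in its own way.

```python
def effective_preference(name, user_prefs, global_prefs, default=None):
    """Resolve one preference: user value first, then the global value."""
    if name in user_prefs:
        return user_prefs[name]      # explicitly set for this user
    if name in global_prefs:
        return global_prefs[name]    # value configured in dspam.conf
    return default                   # nothing configured anywhere


if __name__ == "__main__":
    global_prefs = {"RBLInoculate": "on"}
    print(effective_preference("RBLInoculate", {"RBLInoculate": "off"},
                               global_prefs))
    print(effective_preference("RBLInoculate", {}, global_prefs))
```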

2.6 FALLBACK DOMAINS

  Fallback domains allow you to default some or all users for a particular
  domain to a single domain user; this allows you to set preferences (including
  opting out of filtering entirely) for users based on domain name. Any user
  who does not exist as a known user to DSPAM will be defaulted to the
  domain it belongs to if it is designated as a fallback domain. This
  means that you can create bob@example.org and alice@example.org with their own
  databases and preferences, but also default all other users to @example.org.
  Alternatively, you could create just the domain without any other users and
  default all users to @example.org.

  To use fallback domains, you'll first need to activate this feature in
  dspam.conf:

  FallbackDomains on

  Next, you'll need to create a dspam user for each domain you wish to use
  as a fallback domain. For example, @example.org. Depending on your
  implementation, this may be a simple insert into dspam_virtual_uids or may
  be created automatically when setting a user's preferences.

  Finally, designate that special user as a fallback domain by setting a
  preference:

  dspam_admin ch pref @example.org fallbackDomain on

  Any mail coming in for that domain that does _not_ match a known user in
  dspam will now fall back to this user; you can then set specific preferences
  or even opt out the entire user. Alternatively, you can create a domain-based
  database for filtering mail specific to that domain, just as you would a
  normal user.
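  The fallback behaviour described in this section can be modelled with a
  short Python sketch. It is only a model of the rule as stated here: the
  function name and the two sets (known users, and domain users whose
  fallbackDomain preference is on) are hypothetical stand-ins for DSPAM's
  own storage.

```python
def resolve_user(recipient, known_users, fallback_domains,
                 fallback_enabled=True):
    """Return the dspam user whose database and preferences apply.

    known_users      -- users that already exist in DSPAM
    fallback_domains -- domain users such as "@example.org" that have
                        the fallbackDomain preference set to on
    fallback_enabled -- models "FallbackDomains on" in dspam.conf
    """
    if recipient in known_users:
        return recipient                 # a known user keeps its own data
    if fallback_enabled and "@" in recipient:
        domain_user = "@" + recipient.rpartition("@")[2]
        if domain_user in fallback_domains:
            return domain_user           # unknown user falls back to domain
    return recipient                     # no fallback applies


if __name__ == "__main__":
    known = {"bob@example.org", "alice@example.org"}
    fallbacks = {"@example.org"}
    for rcpt in ("bob@example.org", "carol@example.org", "dave@example.net"):
        print(rcpt, "->", resolve_user(rcpt, known, fallbacks))
```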

2.7 EXTERNAL USER LOOKUP

  External User Lookup has two major applications. It allows DSPAM to validate
  the supplied username in setups where users are opted in by default and there
  is no prior recipient checking by the MTA. In those cases, it can be
  configured not to create the user entries in the DSPAM system automatically,
  which spares you from polluting the DSPAM database with nonexistent users.
  The other application is username rewriting/mapping. This is needed when
  you have to map several email addresses (aliases) to a single user account,
  or when you wish to integrate DSPAM into systems where the users' email
  addresses or usernames can change. It allows you to define alternate static
  identifiers while still keeping the users' DSPAM dictionaries across
  username/email address changes, without dictionary maintenance.

  Currently, there are three different modes of operation and two backend lookup
  drivers. The mode can be set using the ExtLookupMode directive and the available
  possibilities are:

    verify - It will verify that the supplied username exists in the lookup
      backend. In the event that it cannot be verified, DSPAM will not create
      the user entry in its backend facilities.

    map - It will NOT verify that the supplied username exists in the lookup
      backend. It will, though, try to use the lookup backend to map (rewrite)
      the username. If there is a map/rewrite available, it will use the
      retrieved username instead of the supplied one. On the other hand, if
      there is no map/rewrite available, DSPAM will use the supplied username
      and create the respective entries in its backend.

    strict - It will enforce both verify AND map modes, meaning that it will
      rewrite the username if a rewrite is available, and will only create
      that user entry in its backend system if there was a successful
      map/rewrite.
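  The three modes can be compared side by side with a small Python sketch.
  This is a reading of the descriptions above, not DSPAM's code: the function
  name, the tuple it returns, and the convention that the lookup callable
  returns None when the backend has no entry are all assumptions.

```python
def resolve_external(username, mode, lookup):
    """Return (effective_username, may_create_entry) for a lookup mode.

    lookup -- hypothetical callable returning the mapped username, or
              None when the backend has no entry for the supplied name
    """
    mapped = lookup(username)
    found = mapped is not None
    if mode == "verify":
        # Keep the supplied name; only create the entry if it was verified.
        return username, found
    if mode == "map":
        # Rewrite when possible; create the entry either way.
        return (mapped if found else username), True
    if mode == "strict":
        # Rewrite when possible; create the entry only after a rewrite.
        return (mapped if found else username), found
    raise ValueError("unknown ExtLookupMode: %r" % (mode,))


if __name__ == "__main__":
    backend = {"alias@example.org": "uid1001"}  # made-up mapping
    for mode in ("verify", "map", "strict"):
        print(mode, resolve_external("alias@example.org", mode, backend.get))
        print(mode, resolve_external("ghost@example.org", mode, backend.get))
```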

  Only two backend lookup drivers are available at the moment: LDAP and Program.
  The LDAP driver allows DSPAM to query an LDAP server for a custom attribute,
  defined by the ExtLookupLDAPAttribute directive. The query can be refined
  using the ExtLookupQuery directive to provide a standard LDAP filter, where
  %u will be replaced by the username provided to DSPAM. A literal percent sign
  can be used if escaped with another % sign, i.e., %% will match % in the
  query filter.
  The Program driver exists because it seemed a neat feature and not everyone
  uses LDAP. In this case, the ExtLookupServer directive is used to define
  the custom program/script call, with the respective arguments. Here, too, %u
  stands for the provided username, and a literal % can be produced by escaping
  the percent sign with another '%'. With the Program driver, DSPAM uses the
  first line of output from the program/script execution.
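  The %u substitution and the %% escape, together with the "first line of
  output" rule, can be illustrated with the Python sketch below. The function
  names and sample strings are hypothetical. The sketch does not escape
  characters that are special in LDAP filters or shell commands, which a real
  deployment would need to consider.

```python
import re


def expand_template(template, username):
    """Replace %u with the username and %% with a literal percent sign."""
    def replace(match):
        return username if match.group(0) == "%u" else "%"
    # Both tokens are matched in a single left-to-right pass, so "%%u"
    # becomes the literal text "%u" rather than the username.
    return re.sub(r"%u|%%", replace, template)


def first_line(output):
    """Return the first line of a program's output, or "" if it is empty."""
    lines = output.splitlines()
    return lines[0].strip() if lines else ""


if __name__ == "__main__":
    print(expand_template("(mail=%u)", "bob@example.org"))
    print(expand_template("100%% of %u", "bob"))
    print(first_line("uid1001\nignored second line\n"))
```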


3.0 BUGS, FEATURE REQUESTS

  Please use our Bug Tracker on the SourceForge project page at
  http://sourceforge.net/projects/dspam for the current known bugs list and
  proper reporting procedure.

  In the same place, you can request new features via the Feature Request
  Tracker.

  Please note that everything under contrib/ is not officially supported by the
  DSPAM Project but by the respective authors; however, in order to help the
  authors and to facilitate integration with DSPAM and release procedures, we
  provide a bug tracker for each script/plugin at the same URL.

3.1 PORTS / PACKAGES

  The DSPAM Project does not provide binary packages of DSPAM. Each
  OS/distribution has its own contributors, who know their distribution's
  policies, special guidelines, testing procedures, etc.

  Take a look at the DSPAM Wiki for packages/ports for various distributions,
  located at http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Main_Page
  or read http://dspam.sourceforge.net

  If you wish to port DSPAM to another OS/distro/platform and need help, or
  have patches you would like to be merged into the repo, please email the
  dspam-devel@lists.sourceforge.net mailing list.

  Note:

  In order to keep DSPAM unencumbered by intellectual property abuses, all
  external contributors to the project are asked to release any rights to the
  submission. This keeps the DSPAM project a healthy, unencumbered GPL project.
  Please accompany your patch, code, or other submission with the following
  statement. By submitting a patch to the project, you agree to be bound by
  the terms of this statement whether it is specifically included in the
  submission or not; however, we still require that it be attached to the
  submission:

    The author or authors of this submission hereby release any and all
    copyright interest in this code, documentation, or other materials
    included to the DSPAM project and its primary governors. We intend this
    relinquishment of copyright interest in perpetuity of all present and
    future rights to said submission under copyright law.

3.2 GIT ACCESS

  The DSPAM source tree can be downloaded via read-only git access using the
  following command:

  git clone git://dspam.git.sourceforge.net/gitroot/dspam/dspam