1DSPAM v3.10.2
2COPYRIGHT (C) 2002-2012 DSPAM Project
3http://dspam.sourceforge.net/
4
5LICENSE
6
7This program is free software: you can redistribute it and/or modify
8it under the terms of the GNU Affero General Public License as
9published by the Free Software Foundation, either version 3 of the
10License, or (at your option) any later version.
11
12This program is distributed in the hope that it will be useful,
13but WITHOUT ANY WARRANTY; without even the implied warranty of
14MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15GNU Affero General Public License for more details.
16
17You should have received a copy of the GNU Affero General Public License
18along with this program. If not, see <http://www.gnu.org/licenses/>.
19
20CREDITS
21
22Original Work By
23 Lead development till 3.8.0: Jonathan A. Zdziarski <jonathan@nuclearelephant.com>
24 Lead development after 3.8.0: Stevan Bajic <stevan@bajic.ch>
25 PostgreSQL driver: Rustam Aliyev <rustam@azernews.com>
26 External Lookup module: Hugo Monteiro <hugo.monteiro@fct.unl.pt>
27 Various:
28 Feb/2006 Cove Schneider <cove@wildpackets.com>
29 Jan/2006 Norman Maurer <nm@byteaction.de>
30
31Your name is missing? Let us know with a reference to your commit, and we'll
32add you to the list.
33
34COPYRIGHT
35
36As of 12 January 2009 the copyright is owned by the DSPAM Project, represented
37by a team of people, including:
38 Alexander Prinsier
39 Dov Zamir
40 Hugo Monteiro
41 Ion-Mihai Tetcu
42 Paul Cockings
43 Stevan Bajic
44
45TABLE OF CONTENTS
46
47General DSPAM Information
48
49 1.0 About DSPAM
50 1.1 Installation and Configuration
51 1.2 Testing
52 1.3 Troubleshooting
53 1.4 DSPAM Tools
54 1.5 Agent Commandline Arguments
55
56Advanced DSPAM functionality
57
58 2.0 Linking with libdspam
59 2.1 Configuring groups
60 2.2 External Inoculation Theory
61 2.3 Client/Server Mode
62 2.4 LMTP
63 2.5 DSPAM User Preferences
64 2.6 Fallback Domains
65 2.7 External User Lookup
66
67Miscellaneous
68
69 3.0 Bugs, Feature Requests
70 3.1 Ports / Packages
71 3.2 GIT Access
72
731.0 ABOUT DSPAM
74
75DSPAM is an open-source, freely available anti-spam solution designed to combat
76unsolicited commercial email using advanced statistical analysis. In short,
77DSPAM filters spam by learning what spam is and isn't. It does this by learning
each user's individual mail behavior. This allows DSPAM to provide
highly accurate, personalized filtering for each user, even on a large system.
The result is a solution that needs almost no administrative maintenance and
produces very few false positives.
82
83While DSPAM is focused around spam filtering, many have found alternative
84uses for all types of two-concept document classification.
85
DSPAM is rapidly gaining a large support forum and is being used in many
large-scale implementations. Contributions to the project are welcome via the
dspam-dev mailing list or in the form of financial contributions.
89
90Many of the foundational principles incorporated into this software were
91contributed by Paul Graham's white paper on combatting spam, which can be
92found at http://paulgraham.com/spam.html. Much research and development has
resulted in many new approaches being added onto the DSPAM project as well,
94some of which are explained in white papers on the DSPAM home page.
95
DSPAM can be implemented as a total solution or as a library: developers may
link their projects against the DSPAM core engine (libdspam) in accordance with
the AGPL license agreement. This enables developers to incorporate libdspam as
a "drop-in" for instant spam filtering within their applications, such as mail
clients, other anti-spam tools, and so on.
101
102PLEASE NOTE: DSPAM and libdspam are distributed under the AGPL license, not the
103LGPL. Commercial licensing is available for those who seek to redistribute
104DSPAM or some of DSPAM's components/libraries in their non-GPL products.
105Please contact us for more information about commercial licensing.
106
107The DSPAM package is split up into the following pieces:
108
109DSPAM AGENT
110
111The DSPAM agent is the command center for all shell and daemon operations.
112If you're using DSPAM as a filtering solution, this is the 'dspam' (or dspamc)
113binary you're likely going to be talking to via commandline.
114
115LIBDSPAM: CORE ENGINE
116
117The DSPAM core processing engine, also known as libdspam, provides all critical
118spam filtering functions. The engine is embedded into other dspam components
(such as the agent) and is responsible for the actual filtering logic.
120If you're not a developer, you don't need to be concerned with this component
121as it is automatically compiled in with the build.
122
123WEB UI
124
125The Web UI (User Interface) is designed to allow end-users to review their
spam quarantine, history, and graphs, and to delete their spam permanently.
127They can also optionally use the quarantine to perform all of their training.
128The UI also includes some basic administrative tools to change settings and
129manage user quarantines.
130
131TOOLS
132
Some basic tools have been provided to manage dictionaries, automate
134corpus feeding, and perform other diagnostic operations related to DSPAM.
135Some of these include dspam_train, dspam_stats, and dspam_dump.
136
137HISTORY OF COPYRIGHT
138
139Original work was done by Jonathan A. Zdziarski.
140
141In 2006 the copyright was handed over to Sensory Networks.
142
143In 2009 Sensory Networks handed over the full copyright to the DSPAM Project,
144represented by a team of people, including:
145 Alexander Prinsier
146 Dov Zamir
147 Hugo Monteiro
148 Ion-Mihai Tetcu
149 Paul Cockings
150 Stevan Bajic
151
1521.1 INSTALLATION
153
154IMPLEMENTATION OPTIONS
155
156There are many different ways to deploy DSPAM onto an existing network. The
157most popular approaches are:
158
1591. As a delivery agent proxy
160
161When your mail server gets ready to deliver mail to a user's mailbox it calls
162a delivery agent of some sort. On most UNIX systems, this is procmail, maildrop,
163mail.local, or a similar tool. When used as a delivery proxy, the DSPAM agent
164is called in place of your existing agent - or better put, it can masquerade
165as the local delivery agent. DSPAM then processes the message and will call
166the /real/ delivery agent to pass the good mail into the user's mailbox,
167quarantining the bad mail. DSPAM can optionally tag and deliver both spam
168and legitimate mail.
169
170In the diagram below, MTA refers to Mail Transfer Agent, or your mail server
171software: Postfix, Sendmail, Exim, etc. LDA refers to the Local Delivery
Agent: Procmail, Maildrop, etc.
173
BEFORE:

  [MTA] ---> [LDA] ---> (User's Mailbox)

AFTER:

  [MTA] ---> [DSPAM] ---> [LDA] ---> (User's Mailbox)
                \
                 \--> [Quarantine]
  [End User] ------> [Web UI]
184
1852. As a POP3 Proxy
186
187If you don't want to tinker with your existing mail server setup, DSPAM can
188be combined with one of a few open source programs designed to act as a POP3
189proxy. This means spam is filtered whenever the user checks their mail,
190rather than when it is delivered. The benefit to this is that you can set up
191a small machine on your network that will connect to your existing mail server,
so no integration is needed. It also allows your users to arbitrarily point their
193mail client at it if they desire filtering. The drawback to this approach is
194that the POP3 protocol has no way to tell the mail client that a message is
195spam, and so the user will have to download the spam (tagged, of course).
196
BEFORE:

  [End User] ---> [POP3 Server]

AFTER:

  [End User] ---> [POP3 Proxy] <--> [DSPAM]
                        \
                         \--> [POP3 Server]
206
2073. As an SMTP Relay
208
209Newer versions of DSPAM have seen features that allow it to function more
210easily as an SMTP relay. An SMTP relay sits in front of your existing mail
211server (requiring no integration). To use an SMTP relay, the MX records for
212your domains are repointed to the relay machine running DSPAM. DSPAM then
213relays the good (and optionally bad) mail to the existing SMTP server. This
214allows you to use DSPAM with even a Windows-based destination mail server
215as no integration is necessary. See doc/relay.txt for one example of how to
216do this with Postfix.
217
BEFORE:

  { Internet } ---> [Company Mail Server]

AFTER:

  { Internet } ---> [ Inbound SMTP Relay ] ---> [Company Mail Server]
                       ( MTA <> DSPAM )    SMTP
                        \                   or
                         \--> [Quarantine] LMTP
  [End User] ------> [Web UI]
229
230UPGRADING DSPAM
231
232 Please see the file UPGRADING
233
234FRESH INSTALLATION
235
2360. PREREQUISITES
237
238 DSPAM can use one of many different backends to store its information, and
239 you will need to decide on one and install the appropriate software before
240 you can build DSPAM. The following storage backends are presently available:
241
      Driver        Requirements
  -------------------------------------------------------------------------
   T  mysql_drv:    MySQL client libraries (and a server to connect to)
   T  pgsql_drv:    PostgreSQL client libraries (and a server to connect to)
      sqlite_drv:   SQLite v2.7.7 or above (scheduled for removal)
      sqlite3_drv:  SQLite v3.x
  *T  hash_drv:     None (Self-Contained Hash-Based Driver)

  Legend:
    *  Default storage driver
    T  Thread-safe (Required for running DSPAM in server daemon mode)
253
254 In general, MySQL is one of the faster solutions with a smaller storage
255 footprint and is well suited for both small and large-scale implementations.
256
257 The hash driver (inspired by Bill Yerazunis' CRM Sparse Spectra algorithm)
258 is the fastest solution by far and requires no dependencies. It supports
259 an auto-extend feature to grow the file size as needed and is very
260 fast and compact. It does however lack some features (such as merged
261 groups support) and uses a lot of memory to mmap() users.
262
263 Also note that a database created with the hash driver is currently not safe
264 to move between 32/64 bit systems or big/little endian systems.
265
266 Documentation for any additional setup of your selected storage driver can
267 be found in the doc/ directory. You'll need to follow any steps outlined in
268 the storage driver documentation before continuing.
269
270 You can download MySQL from http://www.mysql.com.
  You can download PostgreSQL from http://www.postgresql.org.
272 You can download SQLite from http://www.sqlite.org.
273
2741. CONFIGURATION
275
276 DSPAM uses autoconf, so configuration is fairly standardised with other
277 UNIX-based software:
278
279 ./configure [options]
280
281 DSPAM supports the configuration options below. Generally, the default
282 configuration is more than acceptable, so it's a good idea not to tweak too
283 many settings unless you know what you are doing.
284
285 PATH SWITCHES
286
287 --prefix=DIR
288 Specify an alternative root prefix for installation. The default is
289 /usr/local. This does not affect the location of dspam.conf (which
290 defaults to /etc). Use --sysconfdir= for this.
291
292 --sysconfdir=DIR
293 Specify an alternative home for the dspam.conf file. The default is /etc.
294
295 --with-dspam-home=DIR
296 Specify an alternative DSPAM home for installation. This can alternatively
297 be changed in dspam.conf, but is convenient to do on the configure line.
298 The default is $prefix/var/dspam, or /usr/local/var/dspam.
299
300 --with-logdir=DIR
301 Specify an alternative log directory. The default is $dspam_home/log. Do
302 not set this to /var/log unless DSPAM will have permissions to write to
303 the directory.
304
305 FILESYSTEM SCALE
306
307 The default filesystem scale is "small-scale", and writes each user to
308 its own directory in the top-level DSPAM home data directory.
309 The following two switches allow the scale to be changed to be more
310 suitable for larger installations.
311
312 --enable-large-scale
313 Switch for large-scale implementation. User data will be stored as
314 $HOME/data/u/s/user instead of $HOME/data/user
315
316 --enable-domain-scale
317 Switch for domain-scale implementation. When used, DSPAM expects
318 username@domain to be passed in as the user id and user data will be
319 stored as $HOME/data/example.org/user and $HOME/opt-in/example.org/user.dspam
320 instead of $HOME/data/user
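
  As a rough illustration of the three layouts, the sketch below prints the
  data directory that each scale implies for a given user id. The function
  name dspam_data_dir is hypothetical, and reading "u/s" as the first two
  characters of the username is an assumption drawn from the example paths
  above, not taken from DSPAM's source.

```shell
# Hypothetical helper: print the data directory implied by each scale.
# Usage: dspam_data_dir SCALE DSPAM_HOME USER
dspam_data_dir() {
  scale=$1 home=$2 user=$3
  case $scale in
    large)
      # Assumption: the two directory levels are the first and second
      # characters of the username ("user" -> u/s/user).
      first=$(printf '%s' "$user" | cut -c1)
      second=$(printf '%s' "$user" | cut -c2)
      printf '%s/data/%s/%s/%s\n' "$home" "$first" "$second" "$user"
      ;;
    domain)
      # Domain scale expects username@domain as the user id.
      printf '%s/data/%s/%s\n' "$home" "${user#*@}" "${user%%@*}"
      ;;
    *)
      # Default "small-scale" layout.
      printf '%s/data/%s\n' "$home" "$user"
      ;;
  esac
}

dspam_data_dir small  /usr/local/var/dspam user
dspam_data_dir large  /usr/local/var/dspam user
dspam_data_dir domain /usr/local/var/dspam user@example.org
```

  The scale is fixed at configure time, so a helper like this is only useful
  for locating data on disk, not for switching layouts.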
321
322 INTEGRATION SWITCHES
323
324 --with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
325 Specify your storage driver selection(s). A storage driver is a driver
326 written specifically for DSPAM to store tokens, signature data, and
327 perform other proprietary operations. The default driver is hash_drv.
328 The following drivers have been provided:
329
330 mysql_drv: MySQL Drivers
331 pgsql_drv: PostgreSQL Drivers
332 sqlite_drv: SQLite v2.x Drivers (scheduled for removal)
333 sqlite3_drv: SQLite v3.x Drivers
334 hash_drv: Self-Contained Hash Database
335
336 If you are a packager, or wish to have multiple drivers built for any
337 reason you may specify multiple drivers by separating them with commas.
338 This will cause the storage driver specified in dspam.conf to be
339 dynamically loaded at runtime rather than statically linked. If you wish
340 to build only one driver, but dynamically, then specify it twice as in
341 --with-storage-driver=mysql_drv,mysql_drv.
342
343 If you will be compiling DSPAM to operate as a server daemon or to deliver
344 via SMTP/LMTP, you will need to use a thread-safe driver (outlined in the
345 chart earlier in this document).
346
347 You may also need to use some of the driver-specific configure flags
348 (discussed in the DRIVER SPECIFIC CONFIGURATION OPTIONS section below).
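
  The two forms of the driver list described above might look like the
  following. This is a dry-run sketch: it only prints the configure commands
  so they can be reviewed first. The particular combination of drivers and
  the use of --enable-daemon are illustrative choices, not recommendations.

```shell
# Dry run: the commands are printed, not executed.

# Two thread-safe drivers; the one named in dspam.conf is loaded at runtime.
multi="./configure --with-storage-driver=mysql_drv,hash_drv --enable-daemon"

# A single driver built dynamically: name it twice.
single_dynamic="./configure --with-storage-driver=mysql_drv,mysql_drv"

printf '%s\n' "$multi" "$single_dynamic"
```

  Run the printed command from the top of the DSPAM source tree once the
  options look right.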
349
350 --disable-trusted-user-security
351 Administrators who wish to disable trusted user security may do so by
352 using this configure flag. This will cause DSPAM to treat each user as
353 if they were "trusted" which could allow them to potentially execute
354 arbitrary commands on the server via DSPAM. Because of this, administrators
355 should only use this option on either a closed server, or configure their
356 DSPAM binary to be executable only by users who can be trusted. This
357 option SHOULD NOT be used as a solution to your MTA dropping privileges
358 prior to calling DSPAM. Instead, see the TRUSTED SECURITY section of this
359 document.
360
361 --enable-homedir
362 When enabled, instead of checking for $HOME/$USER/opt-in/
363 $USER[.dspam|.nodspam], DSPAM will check for a .dspam|.nodspam file in the
364 user's home directory. DSPAM will also store each user's data in ~/.dspam
365 when this option is enabled. Because of this, DSPAM will automatically
366 install and run setuid root so that it can read each user's home directory.
367
368 Note:
369
370 This function is incompatible with most implementations of the Web UI,
371 since it requires access to read each user's home directory. Therefore,
372 only use this option if you will not be using the Web UI or plan on
373 doing something asinine like running it as root.
374
375 --enable-daemon
376 Builds DSPAM with support for daemon mode, and builds associated dspamc
377 thin client. Pthreads is required to build for daemon mode and the
378 storage driver used must be thread-safe.
379
380 DRIVER SPECIFIC CONFIGURE SWITCHES
381
382 Some storage drivers have their own custom configuration switches:
383
384 mysql_drv:
385 --with-mysql-includes=DIR
386 Specify a path to the MySQL includes
387
388 --with-mysql-libraries=DIR
389 Specify a path to the MySQL libraries
390 (Currently links to -lmysqlclient, also -lcrypto on some systems)
391
392 --enable-virtual-users
393 Tells DSPAM to create virtual user ids. Use this if your users don't
394 actually exist on the system (e.g. in /etc/passwd if using a password
395 file)
396
397 --enable-preferences-extension
398 MySQL supports the preferences extension, which stores user preferences
399 in mysql instead of flat files (the built-in method)
400
401 --disable-mysql4-initialization
402 If you are compiling libdspam for use with a third party application,
403 and the third party application makes its own calls to libmysqlclient,
404 you should use this option to disable libdspam's initialization and
405 cleanup of libmysqlclient, and allow the application to manage this.
406 This option suppresses libdspam's calls to mysql_server_init and
407 mysql_server_end.
408
409 Note:
410
411 Please see the file doc/mysql_drv.txt for more information
412 about configuring the mysql_drv storage driver.
413
414 pgsql_drv:
415 --with-pgsql-includes=DIR
416 Specify a path to the PgSQL includes
417
418 --with-pgsql-libraries=DIR
419 Specify a path to the PgSQL libraries
420 (Currently links to -lpq, and netlibs on some systems)
421
422 --enable-virtual-users
423 Tells DSPAM to create virtual user ids. Use this if your users don't
424 actually exist on the system (e.g. in /etc/passwd if using a password
425 file)
426
427 --enable-preferences-extension
428 Postgres supports the preferences extension, which stores user
429 preferences in pgsql instead of flat files (the built-in method)
430
431 Note:
432
433 Please see the file doc/pgsql_drv.txt for more information about
434 configuring the pgsql_drv storage driver.
435
436 sqlite_drv:
437 sqlite3_drv:
438 --with-sqlite-includes=DIR
439 Specify a path to the SQLite includes
440
441 --with-sqlite-libraries=DIR
442 Specify a path to the SQLite libraries
443
444 DEBUGGING SWITCHES
445
446 --enable-debug
447 Turns on support for debugging output. This option allows you to turn on
448 debugging messages for all or some users by editing dspam.conf or setting
    --debug on the commandline. Enabling debug in configure only compiles in
    support for debugging; it must still be activated using one of the
    options described above. Debugging support itself doesn't use up very
452 many additional resources, so it should be safe to leave enabled on
453 non-enterprise class systems.
454
455 --enable-verbose-debug
456 Turns on extremely verbose debugging output. --enable-debug is implied.
457 Never use this on production builds!
458
459 Note:
460
461 When verbose debug is compiled in, DSPAM performs many additional
462 mathematical calculations regardless of whether or not it's been
463 activated. You shouldn't use --enable-verbose-debug for production
464 builds unless you have serious issues you can't resolve.
465
466 FEATURE ACTIVATION
467
468 --enable-clamav
469 Enables support for Clam Antivirus. DSPAM can interface directly with
470 clamd to perform virus scanning and can be configured to react in
471 different ways to viruses. See dspam.conf for more information.
472
473 ADDITIONAL CONFIGURATION OPTIONS
474
475 The remainder of configuration options are located in dspam.conf, which
476 is installed in sysconfdir (default: /usr/local/etc) upon a make install.
477 It is generally a good idea to review dspam.conf and make any changes
478 necessary prior to using DSPAM.
479
4802. BUILDING AND INSTALLING
481
482 After you have run configure with the correct options, build and install
483 DSPAM by performing:
484
485 make && make install
486
487 Note:
488
489 If you are a developer wanting to link to the core engine of dspam,
490 libdspam will be built during this process. Please see the
491 example.c file for examples of how to link to and use libdspam. Static
492 and dynamic libraries are built in the .libs directory. Needed headers
  will be installed in $prefix/include/dspam.
494
4953. PERMISSIONS
496
497 In the typical UNIX environment, you'll need to worry about the following
498 permissions:
499
500 The CGI User: This is the user your web server (most likely Apache) is
501 running as. This is commonly 'nobody' or 'web'. You can find this in
502 Apache's httpd.conf by searching for 'User'. The CGI user will need
503 the ability to access the following components of DSPAM:
504 - Ability to execute the dspam binary
505 - Ability to read and write to dspam_home/data/
506 - Trusted user permissions in dspam.conf ("Trust [username]")
507 - The execution 'Group' used must match the group dspam is running as
508 (this is typically 'mail', 'dspam', or similar)
509
510 The MTA User: This is the user your mail server software is running as when
511 it executes DSPAM. This is usually daemon, mail, exim, etc. This is
512 typically different from the user the MTA runs and polices itself as, to
513 avoid security problems. Consult your MTA's documentation for more info.
514 The MTA user will require:
515 - The ability to execute the dspam binary
516 - Trusted user permissions in dspam.conf ("Trust [username]")
517
518 Systems Administrators: In order to perform administrative functions,
  systems administrators will require:
520 - The ability to execute dspam-related binaries
521 - Trusted user permissions in dspam.conf ("Trust [username]")
522
523 Note:
524
525 If the MTA is communicating with DSPAM via LMTP (explained later), then
  execution permissions are not necessary.
527
528 Note about FreeBSD:
529
530 FreeBSD's default MTA user is 'mailnull'
531 FreeBSD's default delivery agent also changes its uid, and so in order
532 to call it, dspam must be installed as setuid root to work on the
533 commandline properly. This is done automatically on install.
534
535
536 Understanding Trusted User Security
537
538 DSPAM has tighter security for untrusted users on the system to prevent
  them from touching other users' data or passing arbitrary commands to the
540 delivery agent DSPAM calls. "Trusted User Security" is a simple system
541 whereby any unsafe functions are not available to a user calling dspam
542 unless they are within dspam.conf's trusted user list.
543
544 Local non-privileged users should be able to use DSPAM without any problems
545 while remaining untrusted, as long as they behave. For example, an untrusted
546 user cannot set their DSPAM username to any name other than their username.
547 Untrusted users are also limited to the delivery options set by the
548 system administrator, and cannot redirect how DSPAM delivers mail.
549
550 A list of trusted users is maintained in dspam.conf. This file should
551 include a list of trusted users who should be allowed to set the dspam user,
552 passthru parameters, and other information that would be potentially
553 dangerous for a malicious user to be able to set. You'll need to ensure
554 that your CGI user, MTA user, and system administrators are on the list.
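
  Before going live it can help to confirm that the CGI user, the MTA user,
  and each administrator really are on the list. The sketch below is one way
  to check, assuming one "Trust [username]" entry per line as shown above.
  The helper name is_trusted is hypothetical, the match is case-sensitive,
  and the demonstration runs against a throwaway file rather than a real
  dspam.conf.

```shell
# Hypothetical helper: succeed if USER has a "Trust USER" line in CONF.
# Usage: is_trusted CONF USER
is_trusted() {
  conf=$1 user=$2
  # Commented-out entries ("# Trust x") do not match because their first
  # field is "#", not "Trust".
  awk -v u="$user" '$1 == "Trust" && $2 == u { found = 1 }
                    END { exit !found }' "$conf"
}

# Demonstration against a temporary file, not a real dspam.conf.
conf=$(mktemp)
printf 'Trust root\nTrust mail\n# Trust nobody\n' > "$conf"

for u in root mail nobody; do
  if is_trusted "$conf" "$u"; then
    echo "$u: trusted"
  else
    echo "$u: NOT trusted"
  fi
done
```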
555
5564. MAIL SERVER INTEGRATION
557
558 As previously mentioned, there are three popular ways to implement DSPAM:
559
560 As a delivery proxy:
561 The default approach integrates DSPAM directly with the mail server and
562 filters spam as mail comes in. Please see the appropriate instructions
563 in doc/ pertaining to your MTA.
564
565 As a POP3 proxy:
566 This alternative approach implements a POP3 proxy where users
567 connect to the proxy to check their email, and email is filtered when
568 being downloaded. The POP3 proxy is a much easier approach, as it
569 requires much less integration work with the mail server (and is ideal
570 for implementing DSPAM on Exchange, etcetera). Please see the file
571 doc/pop3filter.txt.
572
573 As an SMTP Relay:
  DSPAM can be configured as an SMTP relay, a.k.a. an appliance. You
575 can set it up to sit in front of your real mail server and then point
576 your MX records at it. DSPAM will then pass along the good mail to
577 your real SMTP server. See doc/relay.txt for more information. The
578 example provided uses Postfix and MySQL.
579
580 Trusted users and the MTA
581
  If you are using an MTA that changes its userid to match the destination
  user before calling DSPAM, you won't be able to provide pass-thru
  arguments to DSPAM (these are the commandline arguments that DSPAM in turn
  passes to the local delivery agent in such a configuration).
  You will need to pre-configure the "default" pass-thru arguments in DSPAM.
  This can be done by declaring an untrusted delivery agent in dspam.conf.
  When DSPAM is called by an untrusted user, it will automatically force the
  DSPAM user id to the caller's own username and use the passthru delivery
  agent arguments specified in dspam.conf.
590
591 This information will override any passthru commandline parameters
592 specified by the user. For example:
593
594 UntrustedDeliveryAgent "/bin/mail -d $u"
595
596 The variable $u informs DSPAM that you would like the destination username
597 to be used in the position $u is specified, so when DSPAM calls your LDA
598 for user 'bob', it will call it with:
599
600 /bin/mail -d bob
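
  To preview what a given UntrustedDeliveryAgent line would expand to, the
  substitution can be imitated in the shell. This sketch only mimics the $u
  replacement described above; the function name expand_agent is
  hypothetical and the code is not DSPAM's own expansion logic.

```shell
# Hypothetical helper mimicking the $u substitution described above.
# Usage: expand_agent TEMPLATE USER
# Note: usernames containing "/" or "&" would need escaping for sed.
expand_agent() {
  template=$1 user=$2
  # Replace every literal "$u" in the template with the username.
  printf '%s\n' "$template" | sed "s/\\\$u/$user/g"
}

# Single quotes keep the shell from expanding $u itself.
expand_agent '/bin/mail -d $u' bob
```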
601
6025. ALIASES
603
604 There are essentially two different ways a user might train DSPAM. The first
605 is by using the Web UI, which allows them to retrain via the "History"
606 tab. This works quite well, as users must visit the Web UI occasionally
607 to review their quarantine anyway (and reverse any false positives). We'll
608 discuss this shortly in section 1.1.8.
609
610 The more common approach to training, discussed here, is to allow users to
611 simply forward their spam to an email address where DSPAM can analyze and
612 learn it. DSPAM uses a signature-based system, where a serial number of
613 sorts is appended to each email processed by DSPAM. DSPAM reads this serial
  number when the user forwards (or bounces) a message to what is called their
615 "spam email address". The serial number points to temporary information
616 stored on the server (for 14 days by default) containing all of the
617 information necessary for DSPAM to relearn the message. This is necessary
618 in order to relearn the *exact* message DSPAM originally processed.
619
620 Note:
621
622 If you are using an IMAP based system, Web-based email, or other form of
623 email management where the original messages are stored on the server in
624 pristine format, you can turn this signature feature off by setting
625 "TrainPristine on" in dspam.conf. DSPAM will then use the message itself
626 that you provide it to train, which MUST be identical to the original
627 message in order to retrain properly.
628
629 Because DSPAM learns each user's specific email behavior, it's necessary
630 to identify the user in order to program their specific filtering database.
631 This can be done in one of three ways:
632
633 The Simple Way:
634
635 If you are using the MySQL or PgSQL storage drivers, the original
636 numeric user id can be embedded in the signature, requiring only one
637 central spam alias to be necessary for the entire system. To configure
638 this, uncomment the appropriate UIDInSignature option in dspam.conf:
639
640 # MySQLUIDInSignature on
641 # PgSQLUIDInSignature on
642
643 Now all you'll need is a single system-wide alias, and DSPAM will train
644 the appropriate user when it sees the signature. An example of an alias
645 might look like:
646
647 spam:"|/usr/local/bin/dspam --user root --class=spam --source=error"
648
649 Similarly, you may also wish to have a false-positive alias for users who
650 prefer to tag spam rather than quarantine it:
651
652 notspam:"|/usr/local/bin/dspam --user root --class=innocent --source=error"
653
654 Note:
655
656 The 'root' user represents any active dspam user. It is necessary to
657 supply a username on the commandline or DSPAM will bail on
658 an error, however the user will be changed internally once the signature
659 is read.
660
661 The Kind-of-Simple Way:
662
663 If you're not using one of the above storage drivers, the next easiest
664 way to configure aliases is to have DSPAM parse the 'To:' header of the
665 message and use a catch-all subdomain to direct all mail into DSPAM for
666 retraining. You can then instruct your users to email addresses like
667 'spam-bob@relearn.example.org'. The ParseToHeaders option (available
668 in dspam.conf) will parse the To: header of forwarded messages and
669 set the username to either 'bob' or 'bob@relearn.example.org', depending
670 on how it is configured. DSPAM can also set the training mode to either
671 "learn spam" or "learn notspam" depending on whether the user specified
672 a spam- or notspam- address in the To: header.
673
674 This is ideal if you don't want to set up a separate alias for each user
675 on your system (The Hard Way). If you're fortunate enough to have a
676 mail server that can perform regular expression matching, you can set up
677 your system without a subdomain, and just use addresses like
678 spam-bob@example.org. For the rest of us, it will be necessary to set up
679 a subdomain catch-all directly into DSPAM. For example:
680
681 @relearn.example.org "|/usr/local/bin/dspam"
682
683 Don't forget to set the appropriate ParseToHeaders and related options in
684 dspam.conf as well. More specific instructions can be found in dspam.conf
685 itself. In most cases, the following will suffice:
686
687 ParseToHeaders on
688 ChangeUserOnParse user
689 ChangeModeOnParse on
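
  The address convention above can be pictured with a small sketch that
  splits a retraining address into a class and a username. It is
  illustrative only: the function name is hypothetical, it is not DSPAM's
  parser, and it assumes that "ChangeUserOnParse user" yields the bare
  username and that a notspam- address maps to the "innocent" class used in
  the alias examples.

```shell
# Hypothetical helper: split a retraining address into class and user.
# Usage: parse_retrain_address ADDRESS
parse_retrain_address() {
  local_part=${1%%@*}   # drop the @domain part
  case $local_part in
    spam-*)    echo "class=spam user=${local_part#spam-}" ;;
    notspam-*) echo "class=innocent user=${local_part#notspam-}" ;;
    *)         echo "class=unknown user=$local_part" ;;
  esac
}

parse_retrain_address spam-bob@relearn.example.org
parse_retrain_address notspam-bob@relearn.example.org
```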
690
691 The Old Way (A.K.A. The Hard Way)
692
  If neither of the easy ways is possible, you're stuck with doing it
694 the hard way. This means you'll need a separate spam alias (and notspam
695 alias, if users are tagging mail) for each user. To do this, you will
696 need to create an email address for each user, so that DSPAM can
697 analyze and learn for that specific user. For example:
698
699 spam-bob: "|/usr/local/bin/dspam --user bob --class=spam --source=error"
700
701 You will end up having one alias per mail user on the system, two if you
702 do not use DSPAM's CGI quarantine (an additional one using notspam-). Be
703 sure the aliases are unique and each username matches the name after the
704 --user flag. A tool has been provided called dspam_genaliases. This tool
705 will read the /etc/passwd file and write out a dspam aliases file that can
706 be included in your master aliases table.
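
  If the user list lives somewhere other than /etc/passwd, per-user aliases
  in the format shown above can be produced with a few lines of shell. This
  is a sketch and not a replacement for dspam_genaliases: the function name
  gen_aliases is hypothetical, and the output simply repeats the alias
  format from this section with usernames read from standard input.

```shell
# Hypothetical helper: read usernames on stdin, one per line, and print a
# spam- and a notspam- alias for each.
gen_aliases() {
  while IFS= read -r user; do
    [ -n "$user" ] || continue   # skip blank lines
    printf 'spam-%s: "|/usr/local/bin/dspam --user %s --class=spam --source=error"\n' \
      "$user" "$user"
    printf 'notspam-%s: "|/usr/local/bin/dspam --user %s --class=innocent --source=error"\n' \
      "$user" "$user"
  done
}

printf 'bob\nalice\n' | gen_aliases
```

  Review the output before adding it to your aliases table; the path to the
  dspam binary may differ on your system.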
707
708 To report spam, the user should be instructed to forward each spam to
709 spam-user@yourhost
710
711 It doesn't really matter what you name these aliases, so long as the flags
712 being passed to dspam are correct for each user. It might be a good idea
713 to create an alias custom to your network, so that spammers don't forward
714 spam into it. For example, notspam-yourcompany-bob or something.
715
716 Note About Security:
717
718 You might be wondering if a user can forward a spam to another user's
719 address, or whether a spammer can forward a spam to another user's
720 notspam address. The answer is "no". The key to all mail-based retraining
721 is the signature embedded in each email. The signature is stored with
722 each user's own user id, and so not only does the incoming message have
723 to bear a valid signature, but it also has to be stored on the system with
724 the correct user id. This prevents any kind of alias abuse.
725
7266. NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS
727
728 Non-SQL Based Nightly Purge
729
730 If you are NOT running a SQL-based solution, then you should configure
731 dspam_clean to run under cron nightly. This clean tool will read all
732 signature databases and purge signatures that are older than 14 days
733 (configurable), purge abandoned tokens, and remove unimportant tokens.
734 Without this tool, old signatures will continue to pile up.
735 Be sure the user running cleanup has full read/write permissions on the
736 DSPAM data files.
737
738 0 0 * * * /usr/local/bin/dspam_clean [options]
739
  See the dspam_clean description for more information.
741
742 SQL-Based Nightly Purge
743
744 SQL-Based solutions include a nightly SQL script to perform the same basic
745 tasks as dspam_clean, and it does it much faster and with more finesse.
746 You can find instructions about each driver's purge functions in
747 the driver's README (doc/[driver].txt) for performing nightly
748 maintenance. Most SQL drivers will include a purge script in the
749 src/tools.[driver] directory. For example:
750
 0 0 * * * mysql --user=[user] --password=[pass] [db] < /path/to/purge-4.1.sql
752
753 Log Rotation
754
755 The system log and user logs can fill up fairly quickly, when all that's
756 really needed to generate graphs are the last two to three weeks of data.
757 You can configure a nightly log cleanup using dspam_logrotate:
758
759 0 0 * * * dspam_logrotate -a 30 -d /usr/local/var/dspam/data
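
 The two non-SQL jobs above can also be generated from a small script. The
 following is a minimal sketch and not part of the DSPAM distribution: the
 function name print_dspam_crontab and the staggered start times (00:00 and
 00:30, so that the jobs do not start at the same moment) are assumptions,
 and the "-s -p -u" flags are borrowed from the dspam_clean examples in
 section 1.4. Adjust the paths to match your installation.

```shell
# Print a crontab fragment for the non-SQL housekeeping jobs.
# Assumption: the start times below are illustrative, not DSPAM defaults.
print_dspam_crontab() {
    clean_bin=${1:-/usr/local/bin/dspam_clean}
    data_dir=${2:-/usr/local/var/dspam/data}
    # Nightly purge of signatures, neutral tokens and unused tokens.
    printf '0 0 * * * %s -s -p -u\n' "$clean_bin"
    # Log rotation, started half an hour later.
    printf '30 0 * * * dspam_logrotate -a 30 -d %s\n' "$data_dir"
}

# Usage (review the file before installing it):
#   print_dspam_crontab > /tmp/dspam.cron
```

 Review the output before adding it to the crontab of the user that runs
 the cleanup.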
760
7617. NOTIFICATIONS
762
763 DSPAM is capable of sending three different notifications to users:
764
765 - A "First Run" message sent to each user when they receive their first
766 message through DSPAM.
767
 - A "First Spam" message sent to each user when they receive their first
 spam.
770
771 - A "Quarantine Full" message sent to each user when their quarantine box
772 is > 2MB in size (note: the 2MB limit is hardcoded in DSPAM).
773
774 These notifications can be activated by copying the txt/ directory from the
775 distribution into DSPAM's home (by default /usr/local/var/dspam). You can
776 alter the location of this directory by setting "TxtDirectory" in dspam.conf.
777
778 Example:
779 /usr/local/var/dspam/txt/firstrun.txt
780 /usr/local/var/dspam/txt/firstspam.txt
781 /usr/local/var/dspam/txt/quarantinefull.txt
782
783 You will want to modify these templates prior to installing them to reflect the
784 correct email addresses and URLs (look for 'example.org').
785
786 NOTE: The quarantine warning is reset when the user clicks 'Delete All', but
787 is not reset if they use "Delete Selected". If the user doesn't wish to
788 receive reminders, they should use the "Delete Selected" function instead
789 of "Delete All".
790
791 You'll need to also set "Notifications" to "on" in dspam.conf.
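
 Before switching notifications on, it can help to confirm that all three
 templates are installed and have been edited. The sketch below is only an
 illustration: the function name check_templates is hypothetical, and it
 assumes the default txt/ location and the three file names listed above.

```shell
# Report notification templates that are missing from a txt/ directory or
# that still contain the 'example.org' placeholder.
check_templates() {
    txt_dir=${1:-/usr/local/var/dspam/txt}
    for name in firstrun.txt firstspam.txt quarantinefull.txt; do
        file=$txt_dir/$name
        if [ ! -f "$file" ]; then
            echo "missing: $name"
        elif grep -q 'example\.org' "$file"; then
            echo "unedited: $name"
        fi
    done
}

# Usage:
#   check_templates /usr/local/var/dspam/txt
```

 No output means that every template exists and no longer mentions
 example.org.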
792
7938. THE WEB UI
794
795 The Web UI (CGI client) can be run from any executable location on
796 a web server, and detects its user's identity from the REMOTE_USER
797 environment variable. This means you'll need to use HTTP password
798 authentication to access the CGI (Any type of authentication will work,
799 so long as Apache supports the module). This is also convenient in that you
800 can set up authentication using almost any existing system you have.
801 The only catch is that you'll need the usernames to match the actual
 DSPAM usernames used on the system. A copy of the shadow password file
803 will suffice for most common installs.
804
805 The accompanying files in the webui/ folder should be copied into your
806 document root and cgi-bin, as specified.
807
808 Note:
809
810 Some authentication mechanisms are case insensitive and will
811 authenticate the user regardless of the case they type it in. DSPAM,
812 on the other hand, is case sensitive and the case of the username used
813 will need to match the case on the system. If you suffer from this
814 authentication problem, and are certain all of your users' usernames are
815 in lowercase, you can add the following line of code to the CGI right
816 after the call to &ReadParse...
817
818 $ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'});
819
820 The CGI will need to function in the same group as the dspam agent in order
821 to work with the files in dspam_home. The best way to do this is to create
822 a separate virtualhost specifically for the CGI and assign it to run in the
823 MTA group using Apache's suexec. If you are using procmail, additional
824 configuration may also be necessary (see below).
825
826 Note:
827
828 Apache users do NOT take on the identity of the groups specified in
829 /etc/group so you will need to specifically assign the group in
830 httpd.conf.
831
832 Note about Procmail:
833
834 Because the DSPAM Web UI is a CGI script, DSPAM will not retain its
835 setuid privileges when called. If you are running procmail, this will
836 become a problem as procmail requires root privileges to deliver. The
837 easiest hack around this is to create a procmail.dspam binary and make it
838 setuid root, then make it executable only by the mail group (or
839 whatever group DSPAM and the CGI run in).
840
841 The DSPAM Web UI has a minimal configuration inside the configure.pl script.
842 You'll want to check and make sure all of the settings are correct. In
 most cases, the only settings you will need to change are the large-scale
 or domain-scale flags.
845
846 BEFORE PROCEEDING:
847 Check and make sure (Again) that the CGI user from Apache's httpd.conf is
848 added as a trusted user in dspam.conf.
849
850 Default Preferences
851
852 Now would be a good time to set the system's default preferences. This can
853 be done using the dspam_admin tool. For example:
854
855 dspam_admin ch pref default trainingMode TEFT
856 dspam_admin ch pref default spamAction quarantine
857 dspam_admin ch pref default spamSubject "[SPAM]"
858 dspam_admin ch pref default enableWhitelist on
 dspam_admin ch pref default showFactors off
860
861 The default preferences are used for any users who have not yet set their
862 own preferences. You can also control which preferences the user may
863 override by changing the "AllowOverride" settings in dspam.conf.
864
 By default, the parameters specified on the commandline will be used (if
 any). If, however, a preference is found for the particular user, that
 preference will override the commandline.
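
 If you set the same defaults on several machines, the dspam_admin calls
 can be driven from a list. This is a minimal sketch under stated
 assumptions: the function name apply_default_prefs is hypothetical, the
 preference values are simply the examples from this section, and passing
 "echo" as the first argument gives a dry run that prints the arguments
 instead of calling dspam_admin.

```shell
# Apply a fixed list of default preferences.
# $1: command to run (defaults to dspam_admin; use "echo" for a dry run).
apply_default_prefs() {
    runner=${1:-dspam_admin}
    while read -r pref value; do
        [ -n "$pref" ] || continue
        # stdin is redirected so the runner cannot consume the list.
        "$runner" ch pref default "$pref" "$value" </dev/null
    done <<'EOF'
trainingMode TEFT
spamAction quarantine
spamSubject [SPAM]
enableWhitelist on
showFactors off
EOF
}

# Dry run:
#   apply_default_prefs echo
```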
868
869 GD Graphing Library
870
871 If you plan on leaving DSPAM's logging function enabled, and would like to
872 produce pretty graphs for your users, the graph.cgi script requires the
873 following be installed on your machine:
874
875 - GD Graphics Library (http://www.boutell.com/gd/)
876 Compile with png support
877
 - The following Perl modules:
879 (http://www.perl.com/CPAN/modules/by-module/GD/)
880
881 . GD
882 . GD-Graph3d
883 . GDGraph
884 . GDTextUtil
885 . CGI
886
887 Typically this can be accomplished on the commandline:
888
889 perl -MCPAN -e 'install GD::Graph3d'
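
 To see which modules still need installing, you can ask perl to load each
 one. The sketch below is an illustration only: the function name
 missing_perl_modules is hypothetical, and the module names in the usage
 line (GD::Graph for GDGraph, GD::Text for GDTextUtil) are the usual CPAN
 names for those distributions, which you should confirm for your system.

```shell
# Print every named Perl module that the perl on PATH cannot load.
# A missing perl binary is reported the same way as a missing module.
missing_perl_modules() {
    for module in "$@"; do
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

# Usage:
#   missing_perl_modules GD GD::Graph GD::Graph3d GD::Text CGI
```

 No output means all of the listed modules loaded successfully.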
890
891 Configuring Administrators
892
893 Once you've configured the Web UI, you'll want to edit the 'admins' file to
894 contain a list of users who are permitted to use the administration suite.
895
896 Configuring Sub-Administrators / Domain Level Administrators
897
898 It is possible to delegate the management of users to a list of sub-admins/
899 domain level admins. To accomplish that you should edit the 'subadmins'
900 file to contain a list of sub-admins/domain level admins which are permitted
901 to switch their username while using the DSPAM control center.
902
903 Opt-In/Out
904
905 If you would like your users to be able to opt in/out of DSPAM filtering,
906 add the correct option to the nav_preferences.html template, depending on
907 your configuration (for example, if you have an opt-in system, you'll want to
908 add the opt-in option). Note: This currently only works with the preferences
909 extension, and not drop files.
910
911<INPUT TYPE=CHECKBOX NAME=optIn $C_OPTIN$>
912Opt into DSPAM filtering
913
914<INPUT TYPE=CHECKBOX NAME=optOut $C_OPTOUT$>
915Opt out of DSPAM filtering
916
9171.2 TESTING
918
919 If you've installed from an RPM, there's a good chance that the packager
920 went to the trouble of testing already. If you're building from sources,
921 however, you'll need to find a way to ensure your configuration isn't broken.
922
923 Most software packages are supplied with a test suite to determine if the
924 software is functioning properly. Since DSPAM's correct function relies
925 primarily on having the correct permissions and mail server configuration,
926 a test script fails to provide the level of testing required for such a
927 package. The following exercise has been provided to test dspam's correct
928 functioning on your system. This exercise does not test the Web UI, but only
929 the core dspam agent.
930
931 Before running the test, you should have completed section 1.1's instructions
932 for compiling and installing dspam as well as configured your mail server
933 to support dspam.
934
935 1. Create a new user account on your system. It is important that this be a
936 new account to prevent any unrelated email from being delivered during
937 testing. Be sure to configure a spam alias for the test account.
938
939 2. Send a short (10 words or less) email to the account, and pick it up
940 using your favorite mail client.
941
 3. Run dspam_stats [username] on the server. You should see a value of 1
 for "TN" (true negatives, i.e. innocent messages) as shown below:
944
945 dspam-test 0 TP 1 TN 0 FN 0 FP
946
947 If you receive an error such as "unable to open /usr/local/var/dspam... for
948 reading", then the dspam agent is not configured correctly. The problem
949 could exist in either your mail server configuration or one or more of the
950 permissions on the directory or agent. Check your configuration and
951 permissions, and repeat this step until the correct results are experienced.
952
953 4. Run dspam_dump [username] to get a complete list of tokens and their
954 statistics. Each token should have an I: (innocent) hit count of 1. The
955 tokens will be represented as 64-bit values, for example:
956
9573126549390380922317 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
95813884833415944681423 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
95914519792632472852948 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
9608851970219880318167 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
961
962 To view statistics for a particular token, run dspam_dump [username] [token]
963 where token is the plain-text token value. For example:
964
965 % dspam_dump bill FREE
966 7717766825815048192 S: 00265 I: 00068 P: 0.7358
967
968 5. Forward the test message to the spam alias you've created for the test
969 account. Provide enough time for the message to have processed.
970
971 6. Run dspam_stats [username] on the server again. Now, the value for TN
972 should be zero and the value for FN (false negatives) should be 1 as shown
973 below:
974
975dspam-test 0 TP 0 TN 1 FN 0 FP
976
977 If this is not the case, check the group permissions of the dspam agent as
978 well as the permissions your MTA uses when piping to aliases.
979
 7. Run dspam_dump [username] again. Make sure that _EVERY_ token now has an
 I: of zero and an S: of 1:
982
9833126549390380922317 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
98413884833415944681423 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
98514519792632472852948 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
9868851970219880318167 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
987
988 If you have some tokens that do not have an S: of 1 or an I: of 0, the dspam
989 signature was not found on the email, and this could be due to a lot of
990 things.
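
 Steps 3, 6 and 7 can be checked from a script instead of by eye. The
 helpers below are a minimal sketch: the function names stat_value and
 all_tokens_retrained are hypothetical, and both assume the output formats
 shown in this section (a counter followed by its label for dspam_stats,
 and "S: n I: n" pairs for dspam_dump). If your version prints a different
 layout, the parsing will need adjusting.

```shell
# Print the counter that precedes LABEL in a dspam_stats line such as
#   dspam-test 0 TP 1 TN 0 FN 0 FP
stat_value() {
    printf '%s\n' "$1" | awk -v label="$2" '
        { for (i = 2; i <= NF; i++) if ($i == label) { print $(i - 1); exit } }'
}

# Read dspam_dump output on stdin. Succeed only if at least one token line
# was seen and every token line has S: 1 and I: 0.
all_tokens_retrained() {
    awk '
        NF == 0 { next }
        {
            seen = 1; s = ""; n = ""
            for (i = 1; i < NF; i++) {
                if ($i == "S:") s = $(i + 1)
                if ($i == "I:") n = $(i + 1)
            }
            if (s == "" || n == "" || s + 0 != 1 || n + 0 != 0) bad = 1
        }
        END { exit ((seen && !bad) ? 0 : 1) }'
}

# Usage (dspam-test is the test account from step 1):
#   stat_value "$(dspam_stats dspam-test)" FN
#   dspam_dump dspam-test | all_tokens_retrained && echo "retraining OK"
```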
991
9921.3 TROUBLESHOOTING
993
994 Problem: No files are being created in the user directory
 Solution: Check the permissions of the user directory. The user
 directory must be writable by the user the dspam agent is running
 as, as well as by the CGI user.
998
999 Problem: False positives are never being delivered
1000 Solution: Your CGI most likely doesn't have the privileges required by
1001 the LDA to deliver the messages. Make sure the CGI user is in
1002 the correct group. Also consider setting the dspam agent to
1003 setuid or setgid with the correct permissions.
1004
1005 Problem: My database is getting huge!
1006 Solution: DSPAM's default training mode is TEFT. On top of this, the
1007 purging defaults are very lax. You might consider switching to
1008 TOE (Train-on-Error) mode training if you require a minimal
1009 database. If you are willing to sacrifice accuracy for disk space,
1010 disabling the 'chain' tokenizer from dspam.conf will prevent
1011 the use of multi-word (chained) tokens, which will also cut your
1012 database size considerably. You may also consider more frequent
1013 calls to dspam_clean -p to purge neutral data, which comprises a
1014 majority of most databases.
1015
1016 For more help, please see the DSPAM FAQ at http://dspam.sourceforge.net.
1017
10181.4 DSPAM TOOLS
1019
1020 A few useful tools have been provided to make DSPAM management a bit easier.
1021 These tools include:
1022
1023 dspam_admin - A tool used to perform specific administrative functions. These
1024 functions are usually included as part of an extensions package (such as
1025 the preferences extension). Available functions are listed in the tool's
1026 usage output.
1027
1028 dspam_train - Used to train and test a corpus of ham and spam (in maildir
1029 format).
1030 Syntax: dspam_train [username] [spam_dir] [nonspam_dir]
1031 where username is the username of the user to apply the training to, and
1032 the two dirs represent directories containing messages in individual
1033 files (e.g. maildir/corpus format). dspam_train can be used on an existing
1034 user's database, to further improve accuracy, or to train from scratch.
 It also provides a solid test jig for testing the efficiency and accuracy
 of a test corpus against the filter.
 NOTE: dspam_train will automatically balance training of the corpus to
 ensure both spam and nonspam are trained based on the ratio of
 spam/nonspam. This means if you have twice as much spam as nonspam,
 two spam will be trained for every nonspam.
1041
1042 dspam_dump - Dumps a DSPAM dictionary. This can be used to view the
1043 entire contents of a user's dictionary, or used in combination
1044 with grep to view a subset of data. Syntax: dspam_dump [username] [token]
1045 where username is the DSPAM user's username. If a token is specified,
1046 statistics only for that token will be printed.
1047
1048 dspam_clean - Performs nightly housecleaning by deleting old or useless
1049 data from user data. If using the hash driver (hash_drv) please use
 cssclean instead (see doc/README.cssclean).
1051
1052 dspam_clean performs the following operations:
1053
 1. Using the -s flag, dspam_clean will purge stale signatures. If an age
 is specified, for example -s14, the age defined as the default will be
 overridden. Specifying an age of 0 will delete all signatures for the
 users processed.
1058
1059 2. Using the -p flag, dspam_clean will delete all tokens from a user's
1060 database whose probability is between 0.35 and 0.65 (fairly neutral,
1061 useless tokens) that fall beyond the default age. If an age is specified,
1062 for example -p30, the age defined as the default will be overridden. It
1063 is a good idea to use this type of clean with an age of 0 on users after
1064 a lot of corpus training.
1065
1066 3. Using the -u flag, dspam_clean will delete all unused tokens from a
1067 user's database. There are four different types of unused tokens:
1068
1069 - Tokens which have not been used for a long time
1070 - Tokens which have a total hit count below 5
1071 - Tokens which have only one spam hit
1072 - Tokens which have only one innocent hit
1073
1074 Ages may be overridden by specifying a format such as -u30,15,10,10
1075 where each number represents the respective age. Specifying an age of
1076 zero will delete all unused tokens in the category. Defaults are set in
1077 dspam.conf.
1078
1079 Optionally, usernames may be specified to override the default behavior of
1080 processing all users.
1081
1082 Examples:
1083
1084 Process all users on the system using all clean operations:
1085 dspam_clean -s -p15 -u90,30,15,15
1086
 Delete all signatures belonging to users 'dick' and 'jane':
1088 dspam_clean -s0 dick jane
1089
1090 Perform a post-corpus training clean on user 'spot':
1091 dspam_clean -p0 -u0,0,0,0 spot
1092
1093 Run dspam_clean with all default options, all clean modes enabled, on all
1094 users on the system:
1095 dspam_clean -s -p -u
1096
1097 NOTE: You may wish to only run certain cleaning modes depending on the type
1098 of storage driver you are using. For example, the MySQL storage driver
1099 includes a script which performs signature and unused token operations,
1100 leaving only probability operations as useful. If you are using a SQL-based
1101 storage driver, it is strongly recommended that you use the maintenance
1102 scripts wherever possible for optimum efficiency.
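
 The choice of cleaning modes described for dspam_clean can be captured in
 a small helper for use in cron wrappers. This is a sketch only: the
 function names and the driver classes "sql" and "hash" are hypothetical
 labels, not DSPAM settings. It assumes that a SQL purge script already
 handles signatures and unused tokens, and that hash driver users run
 cssclean instead.

```shell
# Print the dspam_clean flags to use for a class of storage driver.
dspam_clean_flags() {
    case $1 in
        sql)  echo "-p" ;;          # purge script covers -s and -u
        hash) echo "hash_drv: use cssclean instead" >&2; return 1 ;;
        *)    echo "-s -p -u" ;;    # all clean modes, default ages
    esac
}

# Build flags with explicit ages: signature age, probability age, and the
# four comma-separated unused-token ages.
clean_flags_with_ages() {
    printf '%s\n' "-s$1 -p$2 -u$3"
}

# Usage:
#   dspam_clean $(dspam_clean_flags sql)
#   dspam_clean $(clean_flags_with_ages 14 30 90,30,15,15)
```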
1103
1104 dspam_stats - Displays the spam statistics for one or all users on the system.
1105 Syntax: dspam_stats [username]. If no username is provided, all users
1106 will be displayed. Displays TP (true positives), TN (true negatives),
1107 FN (false negatives), and FP (false positives).
1108
1109 dspam_genaliases - Reads the /etc/passwd file and outputs a dspam aliases
1110 table which can be included in the master aliases table. You may try
1111 Art Sackett's generate_dspam_aliases tool at
1112 http://www.artsackett.com/freebies/generate_dspam_aliases/ if you need
1113 some better functionality. This will eventually be merged in as a
1114 replacement for the existing tool.
1115
 dspam_merge - Merges multiple users' dictionaries together into one user's
 dictionary (does not affect the merged users). This can be used to create
1118 a seeded dictionary for a new user, or to copy a single user's dictionary
1119 to a new file. This is great for building global dictionaries, but
1120 crunches a lot of time and disk.
1121
11221.5 AGENT COMMANDLINE ARGUMENTS
1123
1124 The DSPAM agent (dspam) recognizes the following commandline arguments:
1125
1126 --user [user1 user2 ... userN]
1127 Specifies the destination user(s) of the incoming message. DSPAM then
1128 processes the message once for each user individually. If the message is to
1129 be delivered, the $u (or %u) parameters of the arguments string will be
1130 interpolated for the current user being processed.
1131
1132 --class=[spam|innocent]
1133 Tells DSPAM that the message being presented has already been classified by
1134 the user. This flag should be used when a misclassification has occurred,
1135 when the user is corpus-feeding a message, or an inoculation is being
1136 presented. This flag must be used in conjunction with the --source flag.
 Providing no classification invokes DSPAM's standard behavior, which is to
 determine the message's nature on its own.
1139
1140 --source=[error|corpus|inoculation]
1141 Wherever --class is used, the source of the user-provided
1142 classification must also be provided. The source is very important and
1143 dramatically affects DSPAM's training behavior:
1144
1145 error: The message being presented was a message previously misclassified
1146 by DSPAM. When 'error' is provided as a source, DSPAM requires that
1147 the DSPAM signature be present in the message, and will use the
1148 signature to recall the original training metadata. If the signature
1149 is not present, the message will be rejected. In this source mode,
1150 DSPAM will also decrement each token's previous classification's
1151 count as well as the user totals.
1152
1153 You should use error only when DSPAM has made an error in
1154 classifying the message, and should present the modified version of
1155 the message with the DSPAM signature when doing so.
1156
1157 corpus: The message being presented is from a mail corpus, and should be
1158 trained as a new message, rather than re-trained based on a
1159 signature. The message's full headers and body will be analyzed and
1160 the correct classification will be incremented, without its
1161 opposite being decremented.
1162
1163 You should use corpus only when feeding messages in from corpus, not
1164 for correcting errors.
1165
1166 inoculation: The message being presented is in pristine form, and should
1167 be trained as an inoculation. Inoculations are a more
1168 intense mode of training designed to cause DSPAM to
1169 train the user's metadata repeatedly on previously unknown
 tokens, in an attempt to vaccinate the user against future
1171 messages similar to the one being presented.
1172
1173 You should use inoculation only on honeypots and the like.
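
 Because --class is only valid together with --source, a wrapper script
 can refuse incomplete or misspelled combinations before calling the agent.
 The following is a minimal sketch: the function name retrain_args is
 hypothetical, and it accepts only the values listed for the two flags
 above.

```shell
# Print validated --class/--source arguments, or fail if either value is
# missing or not one of the documented choices.
retrain_args() {
    case $1 in spam|innocent) ;; *) return 1 ;; esac
    case $2 in error|corpus|inoculation) ;; *) return 1 ;; esac
    printf '%s\n' "--class=$1 --source=$2"
}

# Usage (the user name and file name are placeholders):
#   dspam --user bob $(retrain_args spam error) < reported-message.eml
```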
1174
1175 --deliver=[spam,[innocent|nonspam],summary,stdout]
1176 Tells DSPAM to deliver the message if its result falls within the criteria
1177 specified. For example, --deliver=innocent or --deliver=nonspam will cause
1178 DSPAM to only deliver the message if its classification has been determined
1179 as innocent. Providing --deliver=innocent,spam or --deliver=nonspam,spam will
1180 cause DSPAM to deliver the message regardless of its classification. This flag
1181 provides a significant amount of flexibility for nonstandard implementations,
 where false positives may not be delivered but spam is, and so on.

 summary : Deliver (to stdout) a summary identical to the output of message
1185 classification:
1186 X-DSPAM-Result: User; result="Innocent"; class="Innocent";
1187 probability=0.0000; confidence=1.00;
1188 signature=4b11c532158749980119923
1189
 stdout : Is a shortcut for --deliver=innocent,spam --stdout
1191
1192 --stdout
1193 If the message is indeed deemed "deliverable" by the --deliver flag, this
1194 flag will cause DSPAM to deliver the message to stdout, rather than
1195 the configured delivery agent.
1196
1197 --process
 Tells DSPAM to process the message. This is the default behavior, and the
 flag is implied unless --classify is used, but it is a good idea to specify
 it anyway to avoid ambiguity.
1201
1202 --classify
1203 Tells DSPAM only to classify the message, and not make any writes to the
1204 user's metadata or attempt to deliver/quarantine the message.
1205
1206 NOTE: The output of the classification is specific to the user, not including
1207 the output of any groups they might be affiliated with, so it is
1208 entirely possible that the message would be caught as spam by the group,
1209 even if it didn't appear in the classification. If you want to get
1210 the classification for the GROUP, use the group name as the user
1211 instead of an individual.
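
 When the agent is front-ended by another tool, the summary line shown
 under --deliver=summary usually has to be taken apart. The helper below
 is a sketch under stated assumptions: the function name summary_field is
 hypothetical, and it assumes the "key=value;" layout of the X-DSPAM-Result
 example above. Combining --classify with --deliver=summary in the usage
 line is likewise an assumption to verify against your version.

```shell
# Print the value of KEY from an X-DSPAM-Result summary. Surrounding double
# quotes are dropped; a folded (multi-line) summary is joined first.
summary_field() {
    flat=$(printf '%s' "$1" | tr '\n' ' ')
    printf '%s\n' "$flat" |
        sed -n "s/.*[; ]$2=\"\{0,1\}\([^\"; ]*\).*/\1/p"
}

# Usage (user name and file name are placeholders):
#   summary=$(dspam --user bob --classify --deliver=summary < message.eml)
#   summary_field "$summary" result
```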
1212
1213 --signature=[signature]
1214 For some implementations, the admin may wish to pass the signature in
1215 via commandline instead of allowing DSPAM to find it on its own. This is
1216 especially useful when front-ending the agent with other tools. Using this
1217 option will set the active signature and will also forego reading of stdin.
1218
1219 --mode=[toe|tum|teft|notrain|unlearn]
1220 Configures the training mode to be used for this process:
1221
1222 teft: Train-Everything. Trains on all messages processed. This is
1223 a very thorough training approach and should be considered the
1224 standard training approach for most users. TEFT may, however,
1225 prove too volatile on installations with extremely high per-user
1226 traffic, or prove not very scalable on systems with extremely large
1227 user-bases. In the event that TEFT is proving ineffective, one of
1228 the other modes is recommended.
1229
1230 NOTE: Until a user reaches 100 innocent messages in their
1231 metadata, train-on-error will also be teft-based, even if
1232 otherwise specified on the commandline.
1233
1234 toe: Train-on-Error. Trains only on a classification error, once the
1235 user's metadata has matured to 2500 innocent messages. This
1236 training mode is much less resource intensive, as only occasional
1237 metadata writes are necessary. It is also far less volatile than
1238 the TEFT mode of training. One drawback, however, is that TOE only
1239 learns when DSPAM has made a mistake - which means the data is
1240 sometimes too static, and unable to "ease into" a different type of
1241 behavior.
1242
1243 tum: Train-until-Mature. This training mode is a hybrid between the other
1244 two training modes and provides a great balance between volatility
1245 and static metadata. TuM will train on a per-token basis only
1246 tokens which have had fewer than 50 "hits" on them, unless an error
1247 is being retrained in which case all tokens are trained. This
1248 training mode provides a solid core of stable tokens to keep
1249 accuracy consistent, but also allows for dynamic adaptation to any
1250 new types of email behavior a user might be experiencing. It is a
1251 balance of resources as well, as only less-than-mature tokens are
1252 written to the database. NOTE: You should corpus train before
1253 using tum.
1254
1255 notrain: No training. Do not train the user's data, and do not keep totals.
1256 This should only be used in cases where you want to process mail for
1257 a particular user (based on a group, for example), but don't want
1258 the user to accumulate any learning data.
1259
1260 unlearn: Unlearn original training. Use this if you wish to unlearn a
1261 previously learned message. Be sure to specify --source=error and
1262 --class to whatever the original classification the message was
1263 learned under. If not using TrainPristine, this will require the
1264 original signature from training.
1265
1266 RECOMMENDATIONS:
 In general, it is recommended that users begin with TEFT. If a user
 is experiencing a spam ratio between 75% and 85%, they may benefit from
 Train-until-Mature mode. If a user is experiencing over 90% spam, then
 Train-on-Error mode should make a noticeable improvement in accuracy.
1271 It eventually boils down to what works best for your users. There is
1272 no reason a system could not be configured (with a script) to
1273 analyze a user's *.stats file and determine the best training mode
1274 for that user.
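
 The recommendations above can be turned into a small decision helper.
 This is a sketch, not a DSPAM tool: the function name recommend_mode is
 hypothetical, it takes the spam and innocent message counts as arguments
 rather than reading a *.stats file (whose format is not described here),
 and it falls back to teft for every ratio the recommendations do not
 cover, including the range between 85% and 90%.

```shell
# Suggest a --mode value from a user's spam and innocent message counts.
recommend_mode() {
    spam=$1
    innocent=$2
    total=$((spam + innocent))
    scaled=$((spam * 100))
    if [ "$total" -eq 0 ]; then
        echo teft                                   # no data yet
    elif [ "$scaled" -gt $((total * 90)) ]; then
        echo toe                                    # over 90% spam
    elif [ "$scaled" -ge $((total * 75)) ] &&
         [ "$scaled" -le $((total * 85)) ]; then
        echo tum                                    # 75% to 85% spam
    else
        echo teft                                   # everything else
    fi
}

# Usage (user name and file name are placeholders):
#   dspam --user bob --mode=$(recommend_mode 950 50) < message.eml
```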
1275
1276 --feature=[no,wh,tb=N]
1277 Specifies the features that should be activated for this filter instance.
1278 The following features may be used individually or combined using a comma
1279 as a delimiter:
1280
1281 no: Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in
1282 at 2500 innocent messages and provides an advanced progressive
1283 noise logic to reduce Bayesian Noise (wordlist attacks) in
1284 spams. BNR is not for everyone, and so users should try it out
1285 after they've trained to see if it helps improve accuracy.
1286
1287 tb=N: Sets the training loop buffering level.
1288 Training loop buffering is the amount of statistical sedation
1289 performed to water down statistics and avoid false positives
1290 during the user's training loop. The training buffer sets the
 buffer sensitivity, and should be a number between 0 (no buffering
 whatsoever) and 10 (heavy buffering). The default is 5, half of
1293 what previous versions of DSPAM used.
1294 To avoid dulling down statistics at all during the training loop,
1295 set this to 0. This feature should be disabled if you're not
1296 paranoid about false positives, as it does increase the number of
1297 spam misses significantly during training.
1298
1299 wh: Automatic whitelisting. DSPAM will keep track of the entire
1300 "From:" line for each message received per user, and automatically
1301 whitelist messages from senders with more than 10 innocent
1302 messages and zero spams. Once the user reports a spam from the
1303 sender, automatic whitelisting will automatically be deactivated
1304 for that sender. Since DSPAM uses the entire "From:" line, and
1305 not just the sender's email address, automatic whitelisting is
1306 a very safe approach to improving accuracy during initial training.
1307
1308 NOTE: None of the present features are necessary when the source is "error",
1309 because the original training data is used from the signature to
1310 retrain, instantiating whatever features (such as whitelisting) were
1311 active at the time of the initial classification. Since BNR is only
1312 necessary when a message is being classified, the
1313 --feature flag can be safely omitted from error source calls.
1314
1315 --daemon
 Puts DSPAM in daemon mode; i.e. DSPAM acts like a server when started with
1317 this parameter. See section 2.3 for more information about daemon mode.
1318
13192.0 LINKING WITH LIBDSPAM
1320
1321 Developers are able to link to the DSPAM core engine (libdspam) to provide
1322 "drop-in" spam-filtering for their applications. Examples of the libdspam
1323 API can be found in the example.c file included with this distribution.
1324
1325 <COMMERCIAL LICENSING>
1326
1327 IF YOUR PROJECT USES THE LIBDSPAM API, A GPL-COMPATIBLE OPEN SOURCE LICENSE
1328 IS REQUIRED IN ORDER TO REDISTRIBUTE. IF YOU ARE DEVELOPING A CLOSED-SOURCE
1329 APPLICATION OR APPLICATION THAT DOES NOT CONFORM TO GPL STANDARD, YOU MAY
1330 NOT REDISTRIBUTE ANY APPLICATIONS USING LIBDSPAM WITHOUT A COMMERCIAL
1331 LICENSE.
1332
1333 Please contact project administrators paulcockings@users.sourceforge.net
1334 or sbajic@users.sourceforge.net for information about commercial licensing.
1335
1336 </COMMERCIAL LICENSING>
1337
1338 To link to libdspam, follow the instructions for compiling and installing
1339 DSPAM. When compiled, the libdspam static and shared libraries are also
1340 built. This library contains all the functions necessary to use dspam's
1341 filtering in your application.
1342
1343 Your application will also need to link to the correct storage driver
1344 libraries. If you are using libdspam in a multithreaded application, you
1345 will need to either use a thread-safe storage driver or control access to
1346 libdspam using a mutex lock.
1347
1348 If you are using libdspam in a multithreaded environment, each thread will
1349 require its own DSPAM context. Fortunately, you can attach the same
1350 database handle to each context using dspam_attach(). See the man page for
1351 more information.
1352
1353 To build with the dspam API, you will also need the header files from
1354 the distribution. You can copy these to /usr/include/dspam for ease of
1355 use, and then use -I/usr/include/dspam
1356
1357 Please see example.c for API examples.
1358
1359 If you are interested in linking libdspam with your project and have
1360 questions or concerns, please contact the dspam-devel@lists.sourceforge.net
1361 mailing list.
1362
13632.1 CONFIGURING GROUPS
1364
1365 Groups enable a group of users to share information.
1366
1367 To create groups, you'll want to create a group configuration file. The location
1368 of this file is defined as GroupConfig in dspam.conf, and defaults to
1369 /usr/local/var/dspam/group. The format of the file is:
1370
1371 group1:type:user1,user2,user3
1372 group2:type:*globaluser
1373
1374 DSPAM will read this file upon startup and determine if the user fits into
1375 any particular group.
1376
1377 DSPAM supports the following group types:
1378
1379 SHARED
1380 Enables users with similar email behavior to share the same dictionary
1381 while still maintaining a private quarantine box. The benefits of this
1382 type of group are faster learning, and sharing a single spam alias. Shared
1383 groups can have both positive and negative effects on accuracy. If a shared
1384 group consists of users with similar, predictable email behavior, the users
1385 in the group can benefit from a larger dictionary of spam and faster
1386 learning (especially for newcomers in the group). If a group consists of
1387 users with different email behavior, however, the users in the group will
1388 experience poor spam filtering and a higher number of false positives.
1389
 NOTE: The SQL-based storage drivers support shared groups, but have one caveat:
1391 If you are NOT enabling "virtual users" support, you will need to create
1392 an actual user on your system named after each group you create.
1393
1394 On top of shared group support, a shared group can also be made to be
1395 'managed'. Using the group type 'SHARED,MANAGED' will cause the group to
1396 share a single quarantine mailbox which could be managed by the group's
1397 administrator (aka: the group name). This would enable one individual to
1398 monitor quarantine for the entire group, however personal emails marked as
1399 false positives could potentially be viewed as well. For this reason,
1400 managed groups should only be used when this is not an issue.
1401
1402 NOTE: Use the dspam_stats tool to keep an eye on the effectiveness of
1403 shared groups. If a shared group experiences poor performance, find
1404 the users whose email behavior is inconsistent with that of the group
1405 and remove them from the group.
1406
1407 The format for a shared or shared,managed group is:
1408
1409 group1:shared:user1,user2,userN
1410 group2:shared,managed:user1,user2,userN
1411 group3:shared:*@example.org
1412 group4:shared:*
1413
1414 The group name (in the example above 'group1', 'group2', 'group3', 'group4')
1415 can be anything you like. If you set the shared group to be managed then the
1416 groupname (in the example above 'group2') will be used by DSPAM as the shared
1417 group administrator.
1418
The user/member list for a shared group allows the following syntax:
user1         : Exact match of user with the name "user1"
*             : Match any user
*@example.org : Match any user having '@example.org' at the end of their
                username. The matching only works for the '@' character.
                You cannot use something like '*user' to include users
                'infouser', 'testuser', 'dummyuser', etc.
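
The matching rules above can be sketched in a few lines of Python. This is
only an illustration of the documented syntax, not DSPAM's implementation;
the function name matches_member is hypothetical.

```python
def matches_member(pattern, username):
    """Return True if username matches one entry of a shared group's member list."""
    if pattern == "*":
        return True                              # '*' matches any user
    if pattern.startswith("*@"):
        # '*@example.org' matches usernames ending in '@example.org'
        return username.endswith(pattern[1:])
    # Anything else, including '*user', is only compared literally.
    return pattern == username
```

With these rules, '*user' never matches 'infouser', because the wildcard is
only honored directly in front of the '@' character.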
1426
1427 INOCULATION
1428 An inoculation group allows users to maintain their own private dictionaries
1429 with their own spam alias, but all members of the group will inoculate other
1430 members with spams they manually forward into their alias. This allows users
1431 to report spams to one another and maintain their own private dictionary.
1432 Another advantage to this is that users do not necessarily have to share the
1433 same email behavior.
1434
1435 VERSATILE LANGUAGE INOCULATION MESSAGES
1436
An Internet-Draft has been released to the public to create a message format
standard for sending inoculation data via email:

http://tools.ietf.org/html/draft-spamfilt-inoculation-01
http://tools.ietf.org/html/draft-yerazunis-spamfilt-inoculation-03

This will allow users on different servers, and even users of different
anti-spam tools, to share inoculation information with one another.
1445
1446 DSPAM presently implements support for this message standard with the
1447 following limitations:
1448
1449 - Only inbound inoculation messages are supported. DSPAM does not yet send
1450 out inoculations using this message format. This should not be confused
1451 with local inoculation, which *is* supported.
1452
1453 - The message/inoculation format is the only inoculation type presently
1454 supported. text/inoculation and multipart/inoculation coming soon.
1455
1456 - The only supported authentication mechanism is presently md5 verification
1457 codes/checksums.
1458
1459 Any unsupported inoculations will simply be dropped.
1460
A list of identities and authentication information can be set up in the file
[username].inoc or, if homedir-dotfiles is enabled, in a .inoc file in the
user's home directory. The format of this file is:
1464
1465 sender1:shared secret
1466 sender2:shared secret
1467
1468 Each sender should specify the correct sender id when sending an
1469 inoculation, and should generate their checksum based on the shared secret
1470 established between both parties.
1471
1472 NOTE: Users should only be added to an inoculation group after their initial
1473 learning period, to avoid potential false positives due to lack of data.
1474
The format for an inoculation group is:
1476
1477 group1:inoculation:user1,user2,userN
1478 group2:inoculation:user3,user4,userN
1479
The group name (in the example above 'group1', 'group2') can be anything you
like. It is not used by DSPAM and does not even have to be unique.

The user/member list for an inoculation group allows the following syntax:
user1 : Exact match of user with the name "user1"
1485
1486 CLASSIFICATION
Classification groups allow a group of users to network their results
together. If DSPAM is uncertain whether a message is spam or nonspam for
a group member, the other members of the group are queried. If another member
believes the message to be spam, it will be marked as spam. DSPAM queries
the members one by one and stops as soon as a member reports that the
message is spam.
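
The query order described above can be sketched as follows. The function
name, the way peers are modeled (as plain callables) and the fallback to the
user's own verdict when no peer reports spam are all assumptions of this
example; the text above only specifies that querying stops at the first peer
that reports spam.

```python
def classify_with_peers(own_verdict, uncertain, peers, message):
    """own_verdict is 'spam' or 'innocent'; peers are callables returning the same."""
    if not uncertain:
        return own_verdict            # peers are only consulted when unsure
    for peer_classify in peers:
        if peer_classify(message) == "spam":
            return "spam"             # stop at the first peer that reports spam
    # Assumption: if no peer reports spam, keep the user's own verdict.
    return own_verdict
```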
1493
1494 The format for a classification group is:
1495
1496 group1:classification:user1,user2,userN
1497 group2:classification:user3,user4,userN
1498
The group name (in the example above 'group1', 'group2') can be anything you
like. It is not used by DSPAM and does not even have to be unique.

The user/member list for a classification group allows the following syntax:
user1 : Exact match of user with the name "user1"
1504
1505 GLOBAL
Global groups allow DSPAM to provide "SpamAssassin type out-of-the-box
filtering" for all new users until they have built their own useful
dictionaries. A global group can be created by adding a CLASSIFICATION
group definition (see above) and prefixing the group member/user with a '*'.
1510
1511 The format for a global classification group is:
1512
1513 groupname:classification:*globaluser
1514
This will automatically add user globaluser as a classification peer to all
users. Any user who has fewer than 1000 innocent messages or 250 spam messages
in their corpus, or whose filter is uncertain (confidence less than 0.65)
about a particular message, will consult the globaluser dictionary for an
answer.
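
The condition above can be written down as a small predicate. The thresholds
are the ones stated in the text; the function and constant names are
hypothetical and only serve this sketch.

```python
MIN_INNOCENT = 1000     # innocent messages needed in the user's corpus
MIN_SPAM = 250          # spam messages needed in the user's corpus
MIN_CONFIDENCE = 0.65   # below this the filter counts as uncertain


def should_consult_global(innocent_count, spam_count, confidence):
    """Return True if the global user's dictionary should be consulted."""
    return (innocent_count < MIN_INNOCENT
            or spam_count < MIN_SPAM
            or confidence < MIN_CONFIDENCE)
```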
1520
The Global group user (in this case 'globaluser') will need to be trained
using a corpus, the dspam_merge tool, or other means. It is treated just
like any other user on the system.
1525
The group name (in the example above 'groupname') can be anything you like. It
is not used by DSPAM and does not even have to be unique.
1528
NOTE: Be sure to set your global user's preferences so that trainingMode
is set to TOE. This will prevent the purge tools you use from
purging its data empty in 90 days.
1532
1533 MERGED
1534 Merged groups are similar to global groups in that the entire system uses a
1535 single global user as a parent. What's different is that the merged group is
1536 merged with the individual user's training data at run-time, instead of
1537 switching between the two. This allows the merged group to be treated like a
1538 base dataset for all users, and provides for quicker learning and correction
than the previous approach. It is recommended that merged groups only be used
with TOE-mode training so that only corrective data is stored, but systems with
ample disk space may wish to run in TUM mode to learn the user's behavior
dynamically.
1543
1544 The group's data is merged with the user's data in real-time, so if you have:
1545
1546 Group : Viagra = 10 Spam Hits, 0 Innocent Hits
1547 User1 : Viagra = 5 Spam Hits, 15 Innocent Hits
1548 User2 : Viagra = 20 Spam Hits, 1 Innocent Hits
1549
1550 Then the token is loaded as:
1551 User1 : Viagra = 15 Spam Hits, 15 Innocent Hits = 0.50 (50%) = neutral
1552 User2 : Viagra = 30 Spam Hits, 1 Innocent Hits
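
The arithmetic of this example can be reproduced with a short Python sketch.
The function names are hypothetical, and spam_ratio is only the simplified
spam/(spam + innocent) ratio used in the example above, not DSPAM's actual
probability calculation.

```python
def merge_token(group_hits, user_hits):
    """Each argument is a (spam_hits, innocent_hits) pair; returns the merged pair."""
    return (group_hits[0] + user_hits[0], group_hits[1] + user_hits[1])


def spam_ratio(hits):
    """Simplified ratio of spam hits to all hits, or None for an unseen token."""
    spam, innocent = hits
    total = spam + innocent
    return spam / total if total else None


group = (10, 0)
for name, user in (("User1", (5, 15)), ("User2", (20, 1))):
    merged = merge_token(group, user)
    print(name, merged, round(spam_ratio(merged), 2))
```

For User1 this gives the neutral 0.50 shown above; for User2 the same
simplified ratio is 30/31, or about 0.97.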
1553
1554 No data is written to the group by DSPAM; only the user's data. This then
1555 offsets the group's data without affecting other users. Because of the way
1556 this data is merged, it's not recommended that you update the merged group
1557 with more than a handful of messages periodically, as it affects how all
1558 stats are defined for each user.
1559
1560 The format for a merged group is:
1561
1562 group1:merged:user1,user2,userN
1563 group2:merged:user3,user4,userN
1564
The group name (in the example above 'group1', 'group2') can be anything you
like and represents the name of the group user to merge with all members of
the group. At run-time, DSPAM merges the tokens of that group user with the
tokens of each user who is a member of the merged group.
1570
The user/member list for a merged group allows the following syntax:
user1          : exact match of user with the name "user1"
-user1         : exclude user with the name "user1"
*              : match any user
*@example.org  : match users having "@example.org" at the end of their
                 username. The matching only works for the '@' character.
                 You cannot use something like '*user' to include users
                 'infouser', 'testuser', 'dummyuser', etc.
-*@example.org : exclude users having "@example.org" at the end of their
                 username. The matching only works for the '@' character.
                 You cannot use something like '-*user' to exclude users
                 'infouser', 'testuser', 'dummyuser', etc.
1583
NOTE: Merged Groups are great for providing out-of-the-box adaptive filtering,
but allowing users to build their own data from scratch will still
result in the best possible accuracy in the long run.
1587
NOTE: Be sure to set your group user's preferences so that trainingMode is
set to TOE. This will prevent the purge tools you use from purging its
data empty in 90 days.
1591
1592 RESTRICTIONS!
1593
A user can simultaneously be a member of multiple classification / global
groups and multiple inoculation groups, but a user cannot be a member of
a classification / global group or an inoculation group and, at the same
time, of a shared or shared,managed group.
1598
A user cannot be a member of:
* both a classification group and a global group
* multiple merged groups
* multiple shared or shared,managed groups
* both a shared group or shared,managed group and a merged group
1604
2.2 EXTERNAL INOCULATION THEORY
1606
1607 Bill Yerazunis recently expressed his theory of inoculation on an anti-spam
1608 development list, using the term "vaccination":
1609
1610 "Part of the problem is that spam isn't stationary, it evolves. That
1611 pesky .1% error rate is in some part due to the base mutation rate of spam
1612 itself. Maybe the answer is "vaccination". Vaccination is using _one_
1613 person's misery be used to generate some protective agent that protects the
1614 rest of the population; only the first person to get the spam actually has
1615 to read it.
1616
1617 My expectation is this: say you have ten friends, and you all agree to share
1618 your training errors. Each of you will (statistically) expect to be the
1619 first to see a new mutation of spam about 9% of the time; the other ten
1620 friends in this group will have their bayesian filter trained preemptively
1621 to prevent this. Net result: you get a tenfold decrease in error rate -
1622 down to 99.99% accuracy. With a hundred such (trusted) friends, you may be
1623 down to 99.999% accuracy."
1624
1625 DSPAM has taken this concept and rolled it into support for what we call
1626 "inoculation groups" providing the exact functionality Bill describes. This
1627 could be considered an "internal inoculation" practice.
1628
On top of this, DSPAM has been designed to support external inoculation as
a complement to internal inoculation. Here, instead of your internal
circle of friends inoculating you, you rely on external elements - namely
spammers themselves - to inoculate you.
1633
The theory behind external inoculation is this: why put _anyone_ through
the misery of being the first to receive a new spam when you can have
the spammers themselves send it directly to you? On top of this,
external inoculation can be combined with internal inoculation by taking
the spam you received externally and inoculating your friends with it
internally.
1640
1641 Inoculation is a little different from learning, as inoculation causes
1642 tokens to be given additional hit counts in an attempt to learn from a
1643 single email. As a result, any form of inoculation should _only_ be
1644 attempted after an initial learning phase (perhaps when your filtering
1645 accuracy exceeds 99.0%). DSPAM inoculates like this:
1646
1. Every token that doesn't already exist in the database, or has fewer
than two hits, will be hit five times.
1649
1650 2. All other tokens are hit twice.
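
The two rules above can be sketched in Python as follows. This is a
simplified model, not DSPAM's code: the dictionary db holding a single hit
count per token, the function name inoculate, and counting each token once
per message are all assumptions made for this illustration.

```python
def inoculate(db, tokens):
    """db maps token -> hit count; tokens are the tokens found in one message."""
    for token in set(tokens):          # assumption: each token counts once
        hits = db.get(token, 0)
        if hits < 2:
            db[token] = hits + 5       # rule 1: unknown or rarely seen token
        else:
            db[token] = hits + 2       # rule 2: all other tokens
    return db
```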
1651
1652 External inoculation is accomplished by creating a covert, external alias
1653 that is configured to automatically inoculate your dictionary from any
1654 messages it receives. The covert alias can then be published onto a series
1655 of public newsgroups and websites where it is sure to be harvested by
1656 a spammer's tools. One could even pro-actively subscribe one's self to
1657 several different opt-in spam lists, etcetera.
1658
1659 The first step is to configure an alias. To do this you would use something
1660 like:
1661
1662 bob_c: "|/path/to/dspam --process --class=spam --source=inoculation --user bob"
1663
The 'c' in bob_c is for 'covert'. We must use a covert alias because if we
use something obvious like 'bob-spam', harvester tools will automatically
strip the -spam off and spam your real account.
1667
1668 Once the alias is set up, make sure this alias gets out only on lists where
1669 harvesters will grab it, and nobody will send legitimate email to it.
1670 It may even be a good idea to put it at the bottom of your tagline in all
1671 your publicly archived emails, something like...
1672
1673 Spammers, send me mail here: bob_c@example.org
1674
Finally, you can multiply the effects of this by sharing an inoculation
group with your friends. If all of your friends have a public covert
alias, then you will all be able to inoculate each other should one of you
receive a spam to the account. What a great way to train your filter!
1679
On top of this, should external inoculation become commonplace to the
point where harvesters are picking up as many covert aliases as legitimate
email addresses, spammers will start to realize that harvesters are just
plain too dumb to tell the difference (the spammers themselves couldn't tell
if mine was or not). This could, best case scenario, put an end to
harvester bots, making them obsolete as counter-productive tools.
1686
2.3 CLIENT/SERVER MODE
1688
DSPAM supports two different modes of operation. In standard operating
mode, the DSPAM agent is called by the MTA (or proxy) and each agent process
works independently, establishing its own connection to a database and
performing delivery on its own. The second operating mode, client/server mode,
allows the DSPAM agent to act more like a thin client, connecting to the
DSPAM server process which then does all the work of analyzing and delivering
or quarantining the message. The advantages to using DSPAM in client/server
mode are:
1697
1698 - Maintaining a set of stateful database connections (within the server),
1699 which should enhance performance on some systems by eliminating the need
1700 to establish a new database connection for every message processed.
1701
1702 - Providing a central point of processing. Having one server perform all
1703 processing and delivery, while having multiple thin clients on your mail
1704 servers may be more desirable than having multiple agents performing
1705 processing and delivery on all your servers.
1706
- The DSPAM server speaks LMTP, which some implementations may be able to
take advantage of, eliminating the need for the DSPAM client altogether.
1709
1710 - Having a single multithreaded daemon should use less memory and other
1711 resources than having independently operating clients.
1712
1713 If you've already got DSPAM set up, client/server mode won't require any
1714 changes to your mail server's configuration - it's completely transparent.
1715
1716 The DSPAM agent can be compiled with client/server support by configuring
1717 with --enable-daemon. You will need to use a multithread-safe storage driver
1718 (presently mysql_drv, pgsql_drv and hash_drv are supported). Once you have
1719 compiled with daemon support, you'll need to modify your dspam.conf to
1720 provide the settings necessary for client/server mode:
1721
1722 ServerHost 127.0.0.1
1723
The host to listen on. By default this setting is commented out, which
causes DSPAM to listen on all available interfaces.
1726
1727 ServerPort 24
1728
1729 The port to listen on. The default is 24, the LMTP port.
1730
1731 ServerQueueSize 32
1732
1733 The maximum number of connections which may remain backlogged before they
1734 are accepted.
1735
1736 ServerPass.Relay1 "secret"
1737 ServerPass.Relay2 "password"
1738
1739 Each client server allowed to connect should have its own password. They
1740 can be defined here.
1741
1742 The DSPAM server can listen on either a network socket or a local unix
1743 domain socket. If you're running the client and server on the same machine,
1744 a domain socket should be used as it eliminates additional overhead. To use
1745 a domain socket, you'll also need to add the following option:
1746
1747 ServerDomainSocketPath "/tmp/dspam.sock"
1748
1749 Once you've configured the server config, you'll want to set the client
1750 configuration on all client machines. If you are using network sockets,
1751 set the following to appropriate values:
1752
1753 ClientHost 127.0.0.1
1754 ClientPort 24
1755
1756 Or if using a domain socket:
1757
1758 ClientHost /tmp/dspam.sock
1759
1760 In both cases, you'll need to set the client's authentication ident:
1761
1762 ClientIdent "secret@Relay1"
1763
1764 Now you're ready to go. To start the DSPAM server, run:
1765
1766 dspam --daemon &
1767
1768 Or alternatively, if you have debugging enabled:
1769
1770 dspam --debug --daemon &
1771
The DSPAM agent can then be called the same way as in standard
(non-client/server) mode, with --client added to the set of parameters.
The client settings will be loaded from dspam.conf, and the agent will act
as a thin client. Running dspam without --client will cause DSPAM to revert
to its normal non-daemon behavior and establish database connections on its
own. For example:
1778
1779 dspam --client --user dick jane --deliver=innocent -d %u
1780
1781 Alternatively, if you'd like to use a thinner client, dspamc is identical
1782 to the dspam binary in behavior, but has been stripped down to only include
1783 the lightweight client.
1784
1785 dspamc --user dick jane --deliver=innocent -d %u
1786
1787 The conversation that takes place between the client/server is LMTP-based,
1788 and will look like this:
1789
1790 SERVER> 220 DSPAM DLMTP 3.10.0 Authentication Required
1791 CLIENT> LHLO Relay1
1792 SERVER> 250-PIPELINING
1793 SERVER> 250-ENHANCEDSTATUSCODES
1794 SERVER> 250-DSPAMPROCESSMODE
1795 SERVER> 250 SIZE
1796 CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"
1797 SERVER> 250 2.1.0 OK
1798 CLIENT> RCPT TO: dick
1799 SERVER> 250 2.1.5 OK
1800 CLIENT> RCPT TO: jane
1801 SERVER> 250 2.1.5 OK
1802 CLIENT> DATA
1803 SERVER> 354 Enter mail, end with "." on a line by itself
1804 CLIENT> Subject: Cheap Viagra!
1805 CLIENT>
1806 CLIENT> Click Here: http://www.cheapviagra.example.org
1807 CLIENT> .
1808 SERVER> 250 2.0.0 <dick> Message accepted for delivery: INNOCENT
1809 SERVER> 250 2.0.0 <jane> Message accepted for delivery: SPAM
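
To make the client side of this conversation easier to follow, the Python
sketch below assembles the command lines the client sends up to DATA. It
simply mirrors the transcript above and is not taken from the DSPAM source;
the function name build_dlmtp_commands and its parameters are hypothetical.

```python
def build_dlmtp_commands(ident, secret, recipients, process_mode):
    """Return the client commands sent before the message body, in order."""
    commands = ["LHLO " + ident]
    # The password and relay ident travel in MAIL FROM as <password@ident>,
    # followed by the agent's commandline arguments.
    commands.append(
        f'MAIL FROM: <{secret}@{ident}> DSPAMPROCESSMODE="{process_mode}"'
    )
    commands.extend("RCPT TO: " + user for user in recipients)
    commands.append("DATA")
    return commands


for line in build_dlmtp_commands(
    "Relay1", "secret", ["dick", "jane"], "--deliver=innocent -d %u"
):
    print("CLIENT>", line)
```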
1810
1811 Optionally, if you'd like the clients to perform delivery, you can use
1812 DSPAM's --stdout or --classify functionality to obtain a dump of the message
1813 or results, respectively. From there, it's up to you and your MTA to
1814 deliver the message. The DSPAM client will output the results to stdout in
1815 this case, just as it would in standard operating mode.
1816
1817 Once the server is running, its configuration can be reloaded with a SIGHUP.
1818 When the daemon is reloaded, the following occurs:
1819
1820 - The daemon stops listening for new requests
1821 - All threads are allowed to finish processing and exit
1822 - All connections to the database are closed
1823 - The dspam.conf configuration is reloaded
1824 - All connections to the database are re-opened
1825 - The daemon starts listening for new requests
1826
1827 This allows database and listener configurations to also be reloaded from
1828 dspam.conf without the need to interrupt the process.
1829
1830 NOTE: During the period of time the daemon is reloading, client connections
1831 will fail. Depending on how the MTA reacts, this may cause messages to
1832 fall back to queue or to bounce.
1833
2.4 LMTP
1835
1836 DSPAM supports LMTP both on the front-end and back-end (delivery). This
1837 section will briefly provide instructions for configuring either or both of
1838 these advanced options.
1839
1840 LMTP (AND SMTP) DELIVERY
1841
1842 DSPAM supports LMTP delivery for admins who would prefer to use this instead
1843 of local delivery. While LMTP delivery doesn't _require_ operating in
1844 daemon mode, it is necessary to compile DSPAM with --enable-daemon to take
1845 advantage of LMTP delivery. To configure LMTP delivery, perform the following
1846 steps:
1847
1848 1. Compile DSPAM with --enable-daemon to enable LMTP delivery code
1849
2. Configure your DeliveryHost and DeliveryIdent in dspam.conf. Set
DeliveryProto based on whether you would like to deliver via LMTP or SMTP.

NOTE: If you would like to deliver to different hosts based on domain,
specify DeliveryHost.example.org as the configuration directive. Use
DeliveryPort.example.org to specify a port for the delivery.
1856
1857 3. Add the --lmtp-recipient flag to the arguments passed into DSPAM. This is
1858 used to specify the destination address for the message. For example, in
1859 postfix:
1860
1861 --lmtp-recipient=${recipient}
1862
DSPAM will then connect to the specified host and deliver using a standard
LMTP conversation that looks like:
1865
1866 LHLO [ident]
1867 MAIL FROM:<> SIZE=[message_length]
1868 RCPT TO: <recipient>
1869 DATA
1870 [Message]
1871 .
1872
1873 LMTP SERVER
1874
1875 DSPAM supports a "daemon" mode where it will sit and listen for inbound
1876 connections. Depending on how the server is configured, DSPAM can speak
1877 either standard LMTP (for interaction with a mail server, such as postfix)
1878 or DLMTP (DSPAM LMTP) which is a proprietary implementation of LMTP between
1879 the DSPAM client and server. If you plan on calling DSPAM from the commandline
1880 via dspamc, but wish to have a stateful daemon perform processing, then
1881 you'll want to use the "dspam" server mode. If you want to call DSPAM by
1882 having your mail server connect to it via LMTP, then you'll need to specify
1883 the "standard" server mode.
1884
1885 The ServerMode can be set in dspam.conf. Each mode has its own custom
1886 tweaks and configurations that will need to be set in dspam.conf.
1887
1888 "dspam" mode settings.
1889 In "dspam" mode, you'll need to set up authentication for each dspam client
1890 relay. This involves configuring the relay ident and password. Examples are
1891 provided.
1892
1893 "dspam" mode notes.
In dspam mode, only the dspam client will be connecting to your LMTP server.
This can be dspamc (a thin client) or the dspam binary. In either case,
you'll need to specify --client to tell DSPAM to act as a client. DLMTP
allows the client to pass in any commandline arguments provided, so it should
function identically to running DSPAM as a dedicated (non-stateful) process.
1900
1901 "standard" mode settings.
1902 In "standard" mode, you will need to configure the ServerParameters flag to
1903 reflect the commandline parameters you would normally want to pass to DSPAM.
1904
1905 "standard" mode notes.
1906 One thing to watch out for is that the recipient you're sending via LMTP is
1907 unique to a specific user. This means that all of your aliases should be
1908 resolved before the MTA relays to DSPAM. Because DSPAM uses the addresses in
1909 the RCPT TO as usernames, _not_ resolving any aliases will result in
1910 multiple databases being created for one user. Since the signature will be
1911 different for each user, and since the message must be processed
1912 differently for each user, DSPAM demultiplexes a multi-recipient email. This
1913 means that while it can receive an email with multiple RCPT TO's specified, it
1914 will perform delivery individually.
1915
1916 "auto" mode setting.
1917 If you would like to support both connecting MTAs and remote dspam client
1918 processes (such as for inoculations), you can set the server mode to auto,
1919 which will base its dialect on the ident supplied in the LHLO. If the LHLO
1920 ident matches an ident in dspam.conf's ServerPass section, the server will
1921 default to DLMTP. Otherwise, DSPAM will assume the client is a standard
1922 LMTP client and speak standard LMTP.
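
The decision made in auto mode can be summarized in a one-line check. In this
sketch, server_pass stands for the ServerPass.<ident> entries from dspam.conf
held in a dictionary; that representation and the function name are
assumptions of the example.

```python
def choose_dialect(lhlo_ident, server_pass):
    """server_pass maps relay idents (from ServerPass.<ident>) to passwords."""
    # A known relay ident is treated as a dspam client speaking DLMTP;
    # anything else is assumed to be a standard LMTP client such as an MTA.
    return "DLMTP" if lhlo_ident in server_pass else "LMTP"
```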
1923
1924 LOCAL DELIVERY WITH LMTP FRONT-END
1925
1926 In some circumstances, you may want to relay to DSPAM via LMTP, but have
1927 DSPAM deliver via LDA. In these cases, you may use the following
1928 conventions in your ServerParameters configuration:
1929
1930 %r - The RCPT TO passed in via LMTP
1931 %s - The MAIL FROM passed in via LMTP
1932
1933 In both cases, the content provided between < > is what is actually used.
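
The substitution can be pictured with the following Python sketch. It is an
illustration of the rule above rather than DSPAM's implementation; the
function names are hypothetical, and the template used in the usage line is a
placeholder, not a recommended ServerParameters value.

```python
import re


def angle_content(value):
    """Return the text between '<' and '>', or the stripped value if absent."""
    match = re.search(r"<([^>]*)>", value)
    return match.group(1) if match else value.strip()


def expand_server_parameters(template, rcpt_to, mail_from):
    """Replace %r and %s in template with the addresses passed in via LMTP."""
    values = {"%r": angle_content(rcpt_to), "%s": angle_content(mail_from)}
    # A single pass avoids re-expanding text that came from an address.
    return re.sub(r"%[rs]", lambda m: values[m.group(0)], template)


print(expand_server_parameters(
    "rcpt=%r sender=%s", "<bob@example.org>", "<alice@example.net>"
))
```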
1934
2.5 DSPAM USER PREFERENCES
1936
1937 Preferences are settings that can be configured globally in dspam.conf or
1938 for individual users via the dspam_admin command.
1939
1940 trainingMode { TOE | TUM | TEFT | NOTRAIN }
1941 How DSPAM should train messages it analyzes. See section 1.5 --mode
1942 (default:teft, see dspam.conf)
1943
1944 spamAction { quarantine | tag | deliver }
1945 What to do with spam. The tag and deliver options both deliver, but tag
1946 adds a special prefix to the subject, whereas deliver merely sets
1947 X-DSPAM-Result. (default:quarantine)
1948
1949 spamSubject
1950 A customized subject to prefix when spamAction=tag. (default:[SPAM])
1951
1952 statisticalSedation { 0 - 10 }
1953 The level of dampening during training (0-10, 0 = no dampening, default:0)
1954
1955 enableBNR { on | off }
1956 Enables or disables bayesian noise reduction (default:off)
1957
1958 enableWhitelist { on | off }
1959 Enables or disables automatic whitelisting (default:on)
1960
1961 signatureLocation { message | headers }
1962 Where to place the DSPAM signature. Placement affects forwarding approach.
1963 (default:message)
1964
1965 tagSpam / tagNonspam { on | off }
1966 Adds a tagline to the end of a message based on its classification; useful
1967 for things such as "Scanned by your ISP example.org". If set to on, the file
1968 msgtag.spam and/or msgtag.nonspam will be looked for in "TxtDirectory"
1969 (see dspam.conf) and appended to appropriate messages.
1970
1971 NOTE: Signed messages will not be tagged in this fashion
1972
1973 showFactors { on | off }
1974 Whether to include an X-DSPAM-Factors header including decision-making
1975 factors (clues). NOTE: This can break RFC in some cases, and should only
1976 be used for debugging. (default:off)
1977
1978 optIn / optOut { on | off }
1979 Depending on whether the system is opt-in or opt-out, sets the user's
1980 membership. If user is opted out (or not opted in), mail will be delivered
1981 by DSPAM without being processed.
1982
1983 whitelistThreshold { Integer }
1984 Overrides the default number of times a From: header has been seen before
1985 it is automatically whitelisted. (default:10)
1986
1987 makeCorpus { on | off }
1988 When activated, a maildir-style corpus is maintained in the user's data
1989 directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or
1990 other analysis. (default:off)
1991
1992 storeFragments { on | off }
1993 When activated, the first 1k of each message are temporarily stored on
1994 the server for reference via the webui's history function. (default:off)
1995
1996 localStore { on | off }
1997 Overrides the directory name used for the user's dspam data directory. This
1998 is useful when using recipient addresses as usernames, as it will allow
1999 all addresses belonging to a specific user to be written to a single
2000 webui directory. (default:username)
2001
2002 processorBias { on | off }
2003 Overrides the "bias" setting in dspam.conf, which biases mail as
2004 innocent. (default:on, see dspam.conf)
2005
fallbackDomain { on | off }
Allows a dspam user ("@example.org") to be marked as a fallback user for
the entire domain, so if the destination dspam user does not exist in
the database, the fallback user's database will be used. (default:off)
NOTE: You will need to set "FallbackDomains on" in dspam.conf to use this.
2012
trainPristine { on | off }
Overrides the default signature mode and treats messages as if they were
in pristine format when retraining. This requires all retraining to use
the original message that was processed, as no dspam signature is stored
for pristine training. (default:off)
2018
2019 optOutClamAV { on | off }
2020 Opts out of ClamAV virus scanning (if ClamAV is directly integrated with
2021 dspam via dspam.conf). (default:off)
2022
ignoreRBLLookups { on | off }
Overrides the "Lookup" setting in dspam.conf, which looks up senders' IP
addresses in a Realtime Blackhole List (RBL). (default:off)
2026
2027 RBLInoculate { on | off }
2028 Overrides the "RBLInoculate" setting in dspam.conf, which inoculates mail
2029 as spam if lookup result is positive. (default: depending on dspam.conf)
2030
NOTE: This user preference has higher weight than the one set in dspam.conf.
If you don't set this user preference to on/off, then whatever is set in
dspam.conf will be used for every user.
2034
2.6 FALLBACK DOMAINS
2036
2037 Fallback domains allow you to default some or all users for a particular
2038 domain to a single domain user; this allows you to set preferences (including
2039 opting out of filtering entirely) for users based on domain name. Any user
2040 who does not exist as a known user to DSPAM will be defaulted to the
2041 domain it belongs to if it is designated as a fallback domain. This
2042 means that you can create bob@example.org and alice@example.org with their own
2043 databases and preferences, but also default all other users to @example.org.
Alternatively, you could create just the domain without any other users and
default all users to @example.org.
2046
2047 To use fallback domains, you'll first need to activate this feature in
2048 dspam.conf:
2049
2050 FallbackDomains on
2051
2052 Next, you'll need to create a dspam user for each domain you wish to use
2053 as a fallback domain. For example, @example.org. Depending on your
2054 implementation, this may be a simple insert into dspam_virtual_uids or may
2055 be created automatically when setting a user's preferences.
2056
2057 Finally, designate that special user as a fallback domain by setting a
2058 preference:
2059
2060 dspam_admin ch pref @example.org fallbackDomain on
2061
2062 Any mail coming in for that domain that does _not_ match a known user in
2063 dspam will now fall back to this user; you can then set specific preferences
2064 or even opt out the entire user. Alternatively, you can create a domain-based
2065 database for filtering mail specific to that domain, just as you would a
2066 normal user.
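
The lookup order described in this section can be sketched as follows. The
sets known_users and fallback_domains are stand-ins for DSPAM's user database
and for the domain users whose fallbackDomain preference is on; they and the
function name are assumptions made for this example.

```python
def resolve_dspam_user(address, known_users, fallback_domains):
    """Return the DSPAM user whose database and preferences apply to address."""
    if address in known_users:
        return address                  # a known user always wins
    if "@" in address:
        domain_user = "@" + address.rsplit("@", 1)[1]
        if domain_user in fallback_domains:
            return domain_user          # unknown user falls back to the domain
    return address                      # no fallback domain applies


known = {"bob@example.org", "alice@example.org", "@example.org"}
fallback = {"@example.org"}
print(resolve_dspam_user("carol@example.org", known, fallback))
```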
2067
2.7 EXTERNAL USER LOOKUP

External User Lookup has two major applications. First, it allows DSPAM to
validate the supplied username in setups where users are opted in by default
and there is no prior recipient checking by the MTA. In those cases, it can be
configured not to create user entries in the DSPAM system automatically, which
spares you from polluting the DSPAM database with nonexistent users.
The other application is username rewriting/mapping. This is needed when you
map several email addresses (aliases) onto a single user account, or when you
integrate DSPAM into systems where the users' email addresses or usernames can
change. It allows you to define alternate static identifiers and keep the
users' DSPAM dictionaries across username/email address changes, without
dictionary maintenance.
2080
  Currently, there are three different modes of operation and two backend
  lookup drivers. The mode is set with the ExtLookupMode directive; the
  available values are:
2084
  verify - Verifies that the supplied username exists in the lookup backend.
  If the username cannot be verified, DSPAM will not create the user entry
  in its backend facilities.

  map - Does NOT verify that the supplied username exists in the lookup
  backend. Instead, it tries to use the lookup backend to map (rewrite) the
  username. If a map/rewrite is available, DSPAM uses the retrieved username
  instead of the supplied one. If no map/rewrite is available, DSPAM uses
  the supplied username and creates the respective entries in its backend.

  strict - Enforces both verify AND map modes: DSPAM rewrites the username
  if a rewrite is available, and only creates the user entry in its backend
  system if there was a successful map/rewrite.
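
  The three modes can be summarised as a small decision function. The
  following Python sketch only restates the rules above; it is not DSPAM
  code. The name apply_ext_lookup is hypothetical, and it assumes that the
  lookup backend can be modelled as a callable returning the retrieved
  username, or None when nothing was found.

```python
def apply_ext_lookup(mode, username, lookup):
    """Return (effective_username, may_create_user) for one lookup mode.

    lookup is a hypothetical callable standing in for the LDAP or Program
    backend: it returns the retrieved username, or None if nothing is found.
    """
    result = lookup(username)

    if mode == "verify":
        # Keep the supplied name; create the entry only if it was verified.
        return username, result is not None

    if mode == "map":
        # Rewrite when a mapping exists; the entry may be created either way.
        return (result if result is not None else username), True

    if mode == "strict":
        # Rewrite when a mapping exists; create only after a successful map.
        if result is not None:
            return result, True
        return username, False

    raise ValueError("unknown ExtLookupMode value: %r" % (mode,))


aliases = {"bob.smith@example.org": "bob"}
for mode in ("verify", "map", "strict"):
    print(mode, apply_ext_lookup(mode, "bob.smith@example.org", aliases.get))
```

  The difference between map and strict only shows for usernames the
  backend does not know: map still allows the entry to be created under the
  supplied name, while strict does not.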
2098
  Only two backend lookup drivers are available at the moment: LDAP and
  Program.

  The LDAP driver allows DSPAM to query an LDAP server for a custom
  attribute, defined by the ExtLookupLDAPAttribute directive. The query can
  be fine-tuned using the ExtLookupQuery directive to provide a standard LDAP
  filter, where %u is replaced by the username provided to DSPAM. A literal
  percent sign can be used if escaped with another % sign, i.e., %% will
  match % in the query filter.

  The Program driver exists because not everyone uses LDAP. In this case,
  the ExtLookupServer directive defines the custom program/script call, with
  its arguments. Here too, %u stands for the provided username, and a literal
  % is obtained by escaping the percent sign with another '%'. With the
  Program driver, DSPAM uses the first line of output from the
  program/script execution.
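
  The %u and %% substitution described above can be illustrated as follows.
  This Python sketch is an assumption about the behaviour as documented
  here, not a copy of DSPAM's parser; the function name expand_placeholders
  and the sample filter are hypothetical. It does not escape characters that
  are special in LDAP filters or in shell commands.

```python
import re


def expand_placeholders(template, username):
    """Replace %u with the username and %% with a literal percent sign."""

    def substitute(match):
        # Each match is either "%u" or "%%"; anything else is left untouched.
        return username if match.group(0) == "%u" else "%"

    # A single left-to-right pass, so "%%u" becomes the literal text "%u".
    return re.sub(r"%u|%%", substitute, template)


print(expand_placeholders("(&(objectClass=person)(mail=%u))", "bob@example.org"))
```

  Using a replacement function rather than a replacement string keeps any
  backslashes in the username from being interpreted by re.sub.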
2111
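  For the Program driver, a lookup script could look like the minimal
  Python sketch below. The alias table, the addresses, and the choice to
  print nothing when no mapping exists are all assumptions made for
  illustration; the only behaviour taken from this document is that the
  username arrives as an argument (via %u) and that DSPAM reads the first
  line of output.

```python
import sys

# Hypothetical alias table; a real script would query its own data source.
ALIASES = {
    "bob.smith@example.org": "bob@example.org",
    "postmaster@example.org": "bob@example.org",
}


def lookup(username):
    """Return the mapped username, or None if no mapping is known."""
    return ALIASES.get(username.strip().lower())


if __name__ == "__main__" and len(sys.argv) > 1:
    # The username is expected as the first argument, filled in through %u.
    mapped = lookup(sys.argv[1])
    if mapped is not None:
        # Only the first line of output is meaningful to the caller.
        print(mapped)
```

  Such a script would be referenced from the ExtLookupServer directive with
  %u as its argument, for example a call of the form
  /path/to/lookup.py %u (the path is hypothetical).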
2112
3.0 BUGS, FEATURE REQUESTS

  Please use our Bug Tracker on the SourceForge project page at
  http://sourceforge.net/projects/dspam for the current list of known bugs
  and the proper reporting procedure.

  You can request new features in the same place, via the Feature Request
  Tracker.
2120
2121 Please note that everything under contrib/ is not officially supported by the
2122 DSPAM Project but by the respective authors; however, in order to help the
2123 authors, facilitate integration with DSPAM and release procedures, we provide
2124 a bug tracker for each script/plugin at the same URL.
2125
3.1 PORTS / PACKAGES

  The DSPAM Project does not provide binary packages of DSPAM. Each
  OS/distribution has its own contributors, who know their distribution's
  policy, special guidelines, testing procedures, etc. best.

  For packages/ports for various distributions, see the DSPAM Wiki at
  http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Main_Page or
  read http://dspam.sourceforge.net

  If you wish to port DSPAM to another OS/distro/platform and need help, or
  have patches you would like merged into the repository, please email the
  dspam-devel@lists.sourceforge.net mailing list.
2139
2140
2141 Note:
2142
2143 In order to keep DSPAM unencumbered by intellectual property abuses, all
2144 external contributors to the project are asked to release any rights to the
2145 submission. This keeps the DSPAM project a healthy, unencumbered GPL project.
  Please accompany your patch, code, or other submission with the following
  statement. By submitting a patch to the project, you agree to be bound by
  the terms of this statement whether or not it is specifically included in
  the submission; however, we still require that it be attached to the
  submission:
2151
2152 The author or authors of this submission hereby release any and all
2153 copyright interest in this code, documentation, or other materials
2154 included to the DSPAM project and its primary governors. We intend this
2155 relinquishment of copyright interest in perpetuity of all present and
2156 future rights to said submission under copyright law.
2157
3.2 GIT ACCESS

  The DSPAM source tree can be downloaded via read-only git access using the
  following command:
2162
2163 git clone git://dspam.git.sourceforge.net/gitroot/dspam/dspam
2164