README
bmf -- Bayesian Mail Filter

About bmf
=========

This is a mail filter which uses the Bayes algorithm as explained in Paul
Graham's article "A Plan for Spam". It aims to be faster, smaller, and more
versatile than similar applications. It is implemented in ANSI C and uses
POSIX functions, so in theory it supports all POSIX systems. Support for
win32 is undecided.
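
For readers who have not seen the article, the step that turns per-token
spam probabilities into a single verdict for the message can be sketched in
C as below. This is Graham's combining formula only, not bmf's source; the
function name combine_probs() is hypothetical, and choosing which tokens to
feed it (Graham uses the 15 whose probabilities lie farthest from 0.5) is
left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Combine per-token spam probabilities p[0..n) into one message
 * spamicity, following Graham's "A Plan for Spam":
 *
 *   P = prod(p_i) / (prod(p_i) + prod(1 - p_i))
 *
 * Hypothetical helper for illustration; not taken from bmf.
 * With n == 0 the result is the neutral value 0.5.
 */
double combine_probs(const double *p, size_t n)
{
    double spam = 1.0;   /* product of p_i: evidence for spam         */
    double ham = 1.0;    /* product of 1 - p_i: evidence for non-spam */
    size_t i;

    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++) {
        spam *= p[i];
        ham *= 1.0 - p[i];
    }
    return spam / (spam + ham);
}
```

Graham's article treats a message as spam when this value exceeds 0.9;
bmf's own threshold and token selection may differ.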

This project provides features which are not available in other filters:

(1) Independence from external programs and libraries. Tokens are stored in
memory using simple vectors which require no heavyweight external data
structure libraries. Multiple token database formats are supported,
including flat files, libdb, and MySQL. Conversion between formats will
always be possible with the included import/export utility, and flat files
will always remain an option.
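
Keeping tokens in a plain vector sorted by token text needs nothing beyond
the standard C library: qsort() orders it once after loading and bsearch()
finds a token in O(log n) comparisons. The sketch below assumes that layout
purely for illustration; the token_t record and the tokvec_* functions are
hypothetical, and bmf's actual structures may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token record: the word and its counts in the
 * non-spam ("good") and spam ("bad") corpora. */
typedef struct {
    const char *word;
    unsigned good;
    unsigned bad;
} token_t;

/* Order records by word so the vector can be binary-searched. */
static int token_cmp(const void *a, const void *b)
{
    const token_t *x = a;
    const token_t *y = b;
    return strcmp(x->word, y->word);
}

/* Sort the vector once, e.g. after loading it from a token database. */
void tokvec_sort(token_t *v, size_t n)
{
    qsort(v, n, sizeof *v, token_cmp);
}

/* Binary-search a sorted vector; returns NULL if the word is absent. */
token_t *tokvec_find(token_t *v, size_t n, const char *word)
{
    token_t key;

    assert(word != NULL);
    key.word = word;
    key.good = key.bad = 0;
    return bsearch(&key, v, n, sizeof *v, token_cmp);
}
```

One benefit of this layout is that two sorted token lists can be merged in
a single linear pass; a hash table would make individual lookups faster but
gives up that ordering.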

(2) Efficient processing. Input data is parsed by a handcrafted parser
whose code is less than 3% of the size of the equivalent code generated by
flex. No portion of the input is ever copied, and all I/O and memory
allocation is done in large chunks. Updated token lists are merged and
written in one step. Hashing is being considered for the next version to
improve lookup speed.
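
The "never copied" point can be illustrated with a tokenizer that returns
each token as a pointer into the input buffer plus a length, instead of
copying it into a new string. This is a hypothetical sketch, not bmf's
handcrafted parser: the token character set (letters, digits, dashes,
apostrophes, and dollar signs) is the one Graham's article describes, and
span_t and next_token() are invented names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* A token is a view into the caller's buffer: start pointer plus length.
 * Nothing is copied and nothing is NUL-terminated. */
typedef struct {
    const char *p;
    size_t len;
} span_t;

/* Token characters as described in Graham's article (an assumption here). */
static int is_tokchar(int c)
{
    return isalnum(c) || c == '-' || c == '\'' || c == '$';
}

/*
 * Return the next token in buf[0..n) starting at *pos and advance *pos
 * past it. At end of input, returns a span with len == 0.
 */
span_t next_token(const char *buf, size_t n, size_t *pos)
{
    span_t t = { NULL, 0 };
    size_t i, start;

    assert(pos != NULL && (buf != NULL || n == 0));
    i = *pos;
    while (i < n && !is_tokchar((unsigned char)buf[i]))  /* skip separators */
        i++;
    start = i;
    while (i < n && is_tokchar((unsigned char)buf[i]))   /* consume token */
        i++;
    if (i > start) {
        t.p = buf + start;
        t.len = i - start;
    }
    *pos = i;
    return t;
}
```

A caller loops until len is 0. Because the spans are not NUL-terminated,
comparisons must use the length (for example memcmp() plus a length check)
rather than strcmp().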

(3) Simple and elegant implementation. No heavyweight, copy-intensive MIME
decoding routines are used. Decoding of quoted-printable text for selected
MIME types is being considered for the next version.
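
Quoted-printable decoding need not be copy-intensive either: the decoded
text is never longer than the encoded text, so it can be written back into
the same buffer. The sketch below is a hypothetical illustration of that
idea using a simplified form of RFC 2045 quoted-printable (soft line breaks
preceded by trailing whitespace are not handled). It is not code from bmf,
which does not yet decode quoted-printable; qp_decode_inplace() is an
invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Decode quoted-printable text in s[0..n) in place; returns the new length.
 * The write index w never passes the read index r, so no second buffer is
 * needed. "=XX" becomes one byte, "=\n" and "=\r\n" (soft line breaks) are
 * dropped, and any other '=' is kept literally.
 */
size_t qp_decode_inplace(char *s, size_t n)
{
    size_t r = 0, w = 0;

    assert(s != NULL || n == 0);
    while (r < n) {
        if (s[r] == '=') {
            if (r + 1 < n && s[r + 1] == '\n') {           /* "=\n" */
                r += 2;
                continue;
            }
            if (r + 2 < n && s[r + 1] == '\r' && s[r + 2] == '\n') {
                r += 3;                                     /* "=\r\n" */
                continue;
            }
            if (r + 2 < n) {
                int hi = hexval((unsigned char)s[r + 1]);
                int lo = hexval((unsigned char)s[r + 2]);
                if (hi >= 0 && lo >= 0) {                   /* "=XX" */
                    s[w++] = (char)(hi * 16 + lo);
                    r += 3;
                    continue;
                }
            }
        }
        s[w++] = s[r++];                                    /* literal byte */
    }
    return w;
}
```

Decoding in place like this would let a non-copying tokenizer run over the
decoded bytes without allocating a second buffer.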

Note: the core filter function is from esr's bogofilter v0.6 (available at
http://sourceforge.net/projects/bogofilter/) with bugfix updates.

For the most recent version of this software, see:

    http://sourceforge.net/projects/bmf/
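
How to integrate bmf
====================

The following procmail recipes will invoke bmf for each incoming email and
place spam into $MAILDIR/spam. The first sample invokes bmf in its normal
mode of operation, in which its exit code reports the verdict (0 means
spam). The second invokes bmf as a filter ("bmf -p"), which passes the
message through with an X-Spam-Status header that a second recipe tests.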

    ### begin sample one ###
    # Invoke bmf and use return code to filter spam in one step
    :0HB
    * ? bmf
    | formail -A"X-Spam-Status: Yes, tests=bmf" >>$MAILDIR/spam

    ### begin sample two ###
    # Invoke bmf as a filter
    :0 fw
    | bmf -p

    # Filter spam
    :0:
    * ^X-Spam-Status: Yes
    $MAILDIR/spam

The following maildrop equivalents are suggested by Christian Kurz.

    ### begin sample one ###
    # Invoke bmf and use return code to filter spam in one step
    exception {
        `bmf`
        if ( $RETURNCODE == 0 )
            to $MAILDIR/spam
    }

    ### begin sample two ###
    # Invoke bmf as a filter
    exception {
        xfilter "bmf -p"
        if (/^X-Spam-Status: Yes/)
            to $MAILDIR/spam
    }
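
Both "sample one" recipes rely on the same convention: bmf's exit code
carries the verdict, and 0 means spam. A program that wants to call bmf
directly could do the same with POSIX popen() and pclose(), as in the
hypothetical sketch below. The command string is a parameter so the sketch
can be tried without bmf installed; like the recipes, it treats any nonzero
exit code as non-spam. The name is_spam() is invented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/*
 * Pipe one message into a classifier command and interpret its exit status
 * as the recipes do: exit code 0 means spam.
 * Returns 1 for spam, 0 for non-spam, -1 if the command could not be run
 * or did not exit normally.
 */
int is_spam(const char *cmd, const char *msg)
{
    FILE *fp;
    int status;

    assert(cmd != NULL && msg != NULL);
    fp = popen(cmd, "w");        /* runs cmd via /bin/sh -c */
    if (fp == NULL)
        return -1;
    fputs(msg, fp);              /* the message becomes cmd's stdin */
    status = pclose(fp);         /* waits for cmd, returns its wait status */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

For example, is_spam("bmf", message) would return 1 when bmf exits with
status 0. If the command might exit before reading all of its input, a real
caller should also guard against SIGPIPE.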

If you put bmf in your procmail or maildrop scripts as suggested above, it
will always register an email as either spam or non-spam. To reverse this
registration and train bmf, the following mutt macros may be useful:

    macro index \ed "<enter-command>unset wait_key\n<pipe-entry>bmf -S\n<enter-command>set wait_key\n<save-message>=spam\n"
    macro index \et "<enter-command>unset wait_key\n<pipe-entry>bmf -t\n<enter-command>set wait_key\n"
    macro index \eu "<enter-command>unset wait_key\n<pipe-entry>bmf -N\n<enter-command>set wait_key\n<save-message>=inbox\n"

These macros override any existing bindings for the following keys:

    <Esc>d = de-register as non-spam, register as spam, and move to spam folder.
    <Esc>t = test for spamicity.
    <Esc>u = de-register as spam, register as non-spam, and move to inbox folder.
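
In terms of stored data, "de-register as non-spam, register as spam"
amounts to taking one message's contribution out of the non-spam counts and
adding it to the spam counts. The sketch below assumes, as in Graham's
article, per-token good/bad occurrence counts plus a message total for each
class; counts_t, totals_t, and reregister() are hypothetical names, not
bmf's internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-token counts. */
typedef struct {
    unsigned good;   /* occurrences in registered non-spam */
    unsigned bad;    /* occurrences in registered spam     */
} counts_t;

/* Hypothetical corpus totals: messages registered in each class. */
typedef struct {
    unsigned ngood;
    unsigned nbad;
} totals_t;

/* Move one unit from *from to *to, never letting a count go negative. */
static void move_one(unsigned *from, unsigned *to)
{
    if (*from > 0)
        (*from)--;
    (*to)++;
}

/*
 * Re-register one message whose tokens are tok[0..ntok). With to_spam != 0
 * this models the "bmf -S" effect (non-spam -> spam); with to_spam == 0 it
 * models "bmf -N" (spam -> non-spam).
 */
void reregister(counts_t *tok[], size_t ntok, totals_t *tot, int to_spam)
{
    size_t i;

    assert(tot != NULL && (tok != NULL || ntok == 0));
    for (i = 0; i < ntok; i++) {
        if (to_spam)
            move_one(&tok[i]->good, &tok[i]->bad);
        else
            move_one(&tok[i]->bad, &tok[i]->good);
    }
    if (to_spam)
        move_one(&tot->ngood, &tot->nbad);
    else
        move_one(&tot->nbad, &tot->ngood);
}
```

Plain "bmf -s" and "bmf -n" correspond to only the increment half; the
capital-letter options also undo the earlier registration, which is why the
mutt macros use them.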

How to train bmf
================

First, please keep in mind that bmf "learns" how to recognize spam from the
input that you give it. It works best if you give it exactly the email that
you receive, or have received in the recent past.

Here are some good techniques for training bmf:

  - If you keep a history of email that you have received, use your current
    and/or saved emails. It is fairly easy to create a small shell script
    that will pass all of your normal email to "bmf -n" and all of your spam
    to "bmf -s". Note that if you do not use the mbox storage format, you
    MUST invoke bmf exactly once per email. Using "cat * | bmf -n" will NOT
    work properly because bmf sees the entire input as one big email.

  - If you already use spamassassin, you can use it to train bmf for a
    couple of days or weeks. If spamassassin tags a message as spam, run it
    through "bmf -s"; if not, run it through "bmf -n". This can be
    automated with procmail or maildrop recipes.
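
The storage format matters because an mbox file marks the start of every
message with a line beginning "From " (with a trailing space), so a reader
of the stream can tell where one message ends and the next begins; the note
above implies bmf relies on these markers. Files from a maildir or MH
folder carry no such marker, so "cat *" produces what looks like one long
message. The sketch below counts mbox separators to show the difference;
mbox_count() is hypothetical and is not how bmf itself splits its input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count messages in a NUL-terminated mbox buffer by counting lines that
 * begin with "From ". Simplified: in real mbox files, "From " at the start
 * of a body line is escaped as ">From ", which this check correctly skips.
 */
size_t mbox_count(const char *buf)
{
    size_t count = 0;
    const char *line = buf;

    assert(buf != NULL);
    while (*line != '\0') {
        if (strncmp(line, "From ", 5) == 0)
            count++;
        line = strchr(line, '\n');   /* find end of the current line */
        if (line == NULL)
            break;
        line++;                      /* start of the next line */
    }
    return count;
}
```

A concatenation of individual message files yields a count of 0 here: no
boundaries at all, which is exactly why each such file needs its own bmf
invocation.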

Here are some things that you should NOT do:

  - Get impatient with the training process and repeatedly pass one email
    through "bmf -s".

  - Manually move words around between lists and/or adjust the word counts.
121 - Manually move words around between lists and/or adjust the word counts.
122
123Final words
124===========
125
126Thanks for trying bmf. If you have any problems, comments, or suggestions,
127please direct them to the bmf mailing list, bmf-user@lists.sourceforge.net.
128
129 Tom Marshall
130 20 Oct 2002
131