Name         Date         Size      Lines  LOC
AUTHORS      30-Sep-2002  113 B         5    3
ChangeLog    20-Oct-2002  3 KiB        96   76
LICENSE      30-Sep-2002  17.6 KiB    341  281
Makefile.in  20-Oct-2002  2.1 KiB      82   61
README       20-Oct-2002  4.7 KiB     131   97
TODO         20-Oct-2002  221 B         5    4
bmf.1        20-Oct-2002  5.1 KiB     149  104
bmf.c        20-Oct-2002  9.4 KiB     340  301
bmf.spec.in  14-Oct-2002  1.5 KiB      65   48
bmfconv.1    12-Oct-2002  1.4 KiB      82   59
bmfconv.c    20-Oct-2002  4 KiB       170  147
config.h     20-Oct-2002  2.3 KiB      81   38
configure    03-May-2022  7.7 KiB     355  322
dbdb.c       19-Oct-2002  15.2 KiB    685  578
dbdb.h       14-Oct-2002  1.9 KiB      62   42
dbg.c        19-Oct-2002  7.2 KiB     303  236
dbg.h        14-Oct-2002  1.2 KiB      36   20
dbh.c        14-Oct-2002  1.8 KiB      75   48
dbh.h        02-Oct-2002  1.3 KiB      57   35
dbmysql.c    14-Oct-2002  12.5 KiB    546  458
dbmysql.h    06-Oct-2002  1.9 KiB      61   39
dbtext.c     19-Oct-2002  13.8 KiB    592  495
dbtext.h     02-Oct-2002  1.9 KiB      54   37
filt.c       20-Oct-2002  4.5 KiB     176  131
filt.h       20-Oct-2002  690 B        32   17
lex.c        20-Oct-2002  15.3 KiB    788  684
lex.h        20-Oct-2002  1.2 KiB      45   27
str.c        14-Oct-2002  1.5 KiB      79   57
str.h        30-Sep-2002  762 B        31   15
vec.c        20-Oct-2002  7.4 KiB     346  267
vec.h        20-Oct-2002  1.8 KiB      59   35

README

		bmf -- Bayesian Mail Filter

About bmf
=========

This is a mail filter which uses the Bayes algorithm as explained in Paul
Graham's article "A Plan for Spam".  It aims to be faster, smaller, and more
versatile than similar applications.  Implementation is ANSI C and uses POSIX
functions.  Supported platforms are (in theory) all POSIX systems.  Support
for win32 is undecided.
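
The two formulas at the heart of Graham's article can be sketched in C.  Each token gets a probability computed from how often it appears in spam and non-spam mail, clamped to [0.01, 0.99].  The most "interesting" probabilities (the article uses the 15 farthest from 0.5) are then combined into one score.  This follows the formulas as published in "A Plan for Spam" and is not bmf's own code, which may differ in detail.  token_prob and combine are hypothetical names, and the guard for empty corpora is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Per-token spam probability following Paul Graham's "A Plan for Spam".
 * good, bad:   occurrences of the token in non-spam and spam mail
 * ngood, nbad: number of non-spam and spam messages seen
 * (token_prob is a hypothetical name, not bmf's API.) */
static double token_prob(double good, double bad, double ngood, double nbad)
{
    double g = 2.0 * good;  /* non-spam counts are doubled to avoid false positives */
    double b = bad;
    double rg, rb, p;

    if (ngood <= 0.0 || nbad <= 0.0)
        return 0.4;         /* guard added for this sketch: no corpus yet */
    if (g + b < 5.0)
        return 0.4;         /* too rare to judge: treated like an unknown token */

    rg = g / ngood; if (rg > 1.0) rg = 1.0;
    rb = b / nbad;  if (rb > 1.0) rb = 1.0;
    p = rb / (rg + rb);
    if (p < 0.01) p = 0.01; /* never trust a single token completely */
    if (p > 0.99) p = 0.99;
    return p;
}

/* Combine n token probabilities into one message probability:
 *   P = prod(p_i) / (prod(p_i) + prod(1 - p_i))
 * In the article, p[] holds the 15 tokens farthest from 0.5. */
static double combine(const double *p, size_t n)
{
    double prod = 1.0, inv = 1.0;
    size_t i;

    assert(p != NULL && n > 0);
    for (i = 0; i < n; i++) {
        prod *= p[i];
        inv *= 1.0 - p[i];
    }
    return prod / (prod + inv);
}
```

In the article, a message whose combined probability exceeds 0.9 is treated as spam.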

This project provides features which are not available in other filters:

(1) Independence from external programs and libraries.  Tokens are stored in
memory using simple vectors which require no heavyweight external data
structure libraries.  Multiple token database formats are supported,
including flat files, libdb, and mysql.  Conversion between formats will
always be possible with the included import/export utility, and flat files
will always remain an option.
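
The README does not say how the token vectors are organized or searched.  The minimal C sketch below shows how far a plain array can go without external data structure libraries.  It assumes a growable array kept sorted by token text and searched by binary search.  All names (tok_t, tokvec_t, tokvec_find, tokvec_add) are hypothetical, and tokens point into the caller's input buffer rather than being copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token record.  text points into the caller's input buffer
 * (it is not copied), so that buffer must outlive the vector. */
typedef struct {
    const char *text;
    size_t len;
    unsigned good, spam;
} tok_t;

/* Growable array kept sorted by token text. */
typedef struct {
    tok_t *items;
    size_t n, cap;
} tokvec_t;

/* Length-aware byte comparison, like strcmp for non-terminated strings. */
static int tokcmp(const char *a, size_t alen, const char *b, size_t blen)
{
    int c = memcmp(a, b, alen < blen ? alen : blen);
    if (c != 0)
        return c;
    return (alen > blen) - (alen < blen);
}

/* Binary search.  Returns the token's index with *found = 1, or the index
 * where it would be inserted with *found = 0. */
static size_t tokvec_find(const tokvec_t *v, const char *s, size_t len,
                          int *found)
{
    size_t lo = 0, hi = v->n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = tokcmp(v->items[mid].text, v->items[mid].len, s, len);
        if (c == 0) {
            *found = 1;
            return mid;
        }
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = 0;
    return lo;
}

/* Count one occurrence of a token, inserting it in sorted position if it is
 * new.  Returns 0 on success, -1 if the array could not grow. */
static int tokvec_add(tokvec_t *v, const char *s, size_t len, int is_spam)
{
    int found;
    size_t i;

    assert(v != NULL && s != NULL);
    i = tokvec_find(v, s, len, &found);
    if (!found) {
        if (v->n == v->cap) {   /* grow by doubling: few, large allocations */
            size_t ncap = v->cap ? v->cap * 2 : 64;
            tok_t *p = realloc(v->items, ncap * sizeof *p);
            if (p == NULL)
                return -1;
            v->items = p;
            v->cap = ncap;
        }
        /* shift the tail up one slot to keep the array sorted */
        memmove(&v->items[i + 1], &v->items[i], (v->n - i) * sizeof *v->items);
        v->items[i].text = s;
        v->items[i].len = len;
        v->items[i].good = v->items[i].spam = 0;
        v->n++;
    }
    if (is_spam)
        v->items[i].spam++;
    else
        v->items[i].good++;
    return 0;
}
```

Lookups cost O(log n) comparisons, and each new token costs a memmove to keep the order.  Hashing would trade the sorted order for faster lookups.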

(2) Efficient processing.  Input data is parsed by a handcrafted parser
which weighs in at under 3% of the size of the equivalent code generated by
flex.  No portion of the input is ever copied, and all i/o and memory
allocation are done in large chunks.  Updated token lists are merged and
written in one step.  Hashing is being considered for the next version to
improve lookup speed.
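
Merging and writing updated token lists in one step fits naturally with lists kept in sorted order.  The existing token list and the tokens from newly registered mail can be combined in a single linear pass, and the result is already sorted for writing out.  The C sketch below assumes both inputs are sorted by token text with no duplicates inside either list.  entry_t and merge_lists are hypothetical names, not bmf's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: a NUL-terminated token and its counts. */
typedef struct {
    const char *text;
    unsigned good, spam;
} entry_t;

/* Merge two lists, each sorted by text with no duplicates inside a list,
 * into out in one linear pass, summing the counts of tokens found in both.
 * out must have room for na + nb entries.  Returns the merged length. */
static size_t merge_lists(const entry_t *a, size_t na,
                          const entry_t *b, size_t nb, entry_t *out)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL || na + nb == 0);
    while (i < na && j < nb) {
        int c = strcmp(a[i].text, b[j].text);
        if (c < 0) {
            out[k++] = a[i++];
        } else if (c > 0) {
            out[k++] = b[j++];
        } else {                    /* same token: add the counts */
            out[k] = a[i++];
            out[k].good += b[j].good;
            out[k].spam += b[j].spam;
            j++;
            k++;
        }
    }
    while (i < na)                  /* copy whichever list has entries left */
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

Because the merged list comes out sorted, it can be written back to a flat file sequentially in a single pass.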

(3) Simple and elegant implementation.  No heavyweight, copy-intensive mime
decoding routines are used.  Decoding of quoted-printable text for selected
mime types is being considered for the next version.

Note: the core filter function is from esr's bogofilter v0.6 (available at
http://sourceforge.net/projects/bogofilter/) with bugfix updates.

For the most recent version of this software, see:

	http://sourceforge.net/projects/bmf/

How to integrate bmf
====================

The following procmail recipes will invoke bmf for each incoming email and
place spam into $MAILDIR/spam.  The first sample invokes bmf in its normal
mode of operation and the second invokes bmf as a filter.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	:0HB
	* ? bmf
	| formail -A"X-Spam-Status: Yes, tests=bmf" >>$MAILDIR/spam

	### begin sample two ###
	# Invoke bmf as a filter
	:0 fw
	| bmf -p

	# Filter spam
	:0:
	* ^X-Spam-Status: Yes
	$MAILDIR/spam

The following maildrop equivalents are suggested by Christian Kurz.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	exception {
		`bmf`
		if ( $RETURNCODE == 0 )
			to $MAILDIR/spam
	}

	### begin sample two ###
	# Invoke bmf as a filter
	exception {
		xfilter "bmf -p"
		if (/^X-Spam-Status: Yes/)
			to $MAILDIR/spam
	}
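
Both integration styles above rest on conventions that a program can check as easily as a recipe.  In sample one, an exit status of 0 from bmf means spam.  In sample two, "bmf -p" passes the message through and the recipes look for an "X-Spam-Status: Yes" header.  The C sketch below performs each check using the POSIX popen and pclose functions.  classify_with and header_says_spam are hypothetical names.  Treating every nonzero exit status as non-spam is an assumption, since the samples only test for 0.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen, pclose, strncasecmp */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/wait.h>

/* Sample-one style: pipe msg into cmd (e.g. "bmf") and read its exit
 * status.  Returns 1 for spam (status 0), 0 for any other exit status,
 * -1 if the command could not be run.  A missing command makes the shell
 * exit with 127, which this sketch reports as non-spam.  A real caller
 * should also ignore SIGPIPE in case cmd exits without reading. */
static int classify_with(const char *cmd, const char *msg)
{
    FILE *p;
    int status;

    assert(cmd != NULL && msg != NULL);
    p = popen(cmd, "w");
    if (p == NULL)
        return -1;
    fputs(msg, p);
    status = pclose(p);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}

/* Sample-two style: return 1 if the header block of msg (everything before
 * the first empty line) has a line starting with "X-Spam-Status: Yes".
 * Matching ignores case, as procmail does by default.  Assumes LF line
 * endings. */
static int header_says_spam(const char *msg)
{
    static const char tag[] = "X-Spam-Status: Yes";
    const char *line = msg;

    assert(msg != NULL);
    while (*line != '\0' && *line != '\n') {  /* an empty line ends the header */
        const char *eol;
        if (strncasecmp(line, tag, sizeof tag - 1) == 0)
            return 1;
        eol = strchr(line, '\n');
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return 0;
}
```

For example, classify_with("bmf", message) returns 1 when bmf exits with status 0, the same condition that "* ? bmf" tests in procmail sample one.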

If you put bmf in your procmail or maildrop scripts as suggested above, it
will always register an email as either spam or non-spam.  To reverse this
registration and train bmf, the following mutt macros may be useful:

  macro index \ed "<enter-command>unset wait_key\n<pipe-entry>bmf -S\n<enter-command>set wait_key\n<save-message>=spam\n"
  macro index \et "<enter-command>unset wait_key\n<pipe-entry>bmf -t\n<enter-command>set wait_key\n"
  macro index \eu "<enter-command>unset wait_key\n<pipe-entry>bmf -N\n<enter-command>set wait_key\n<save-message>=inbox\n"

These macros override any existing bindings for the following keys:

  <Esc>d = de-register as non-spam, register as spam, and move to spam folder.
  <Esc>t = test for spamicity.
  <Esc>u = de-register as spam, register as non-spam, and move to inbox folder.
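
In terms of the token database, re-registering a message means moving its counts from one class to the other.  The README does not describe bmf's internal bookkeeping, so the C sketch below rests on three assumptions.  Each token carries a non-spam and a spam count.  The database also tracks how many messages of each class it has seen.  Counts are clamped at zero in case the message was never registered in the old class.  counts_t, totals_t, shift, and reregister are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-token counts and per-class message totals. */
typedef struct { unsigned good, spam; } counts_t;
typedef struct { unsigned ngood, nspam; } totals_t;

/* Move one unit from *from to *to, clamping at zero in case the message
 * was never registered in the old class. */
static void shift(unsigned *from, unsigned *to)
{
    if (*from > 0)
        (*from)--;
    (*to)++;
}

/* Re-register one message: tok holds the entries for the n tokens found in
 * it.  to_spam != 0 corresponds to what the README describes for "bmf -S",
 * to_spam == 0 to "bmf -N". */
static void reregister(counts_t *tok, size_t n, totals_t *t, int to_spam)
{
    size_t i;

    assert(t != NULL && (tok != NULL || n == 0));
    for (i = 0; i < n; i++) {
        if (to_spam)
            shift(&tok[i].good, &tok[i].spam);
        else
            shift(&tok[i].spam, &tok[i].good);
    }
    if (to_spam)
        shift(&t->ngood, &t->nspam);
    else
        shift(&t->nspam, &t->ngood);
}
```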

How to train bmf
================

First, please keep in mind that bmf "learns" how to recognize spam from the
input that you give it.  It works best if you give it exactly the email that
you receive, or have received in the recent past.

Here are some good techniques for training bmf:

  - If you keep a history of email that you have received, use your current
    and/or saved emails.  It is fairly easy to create a small shell script
    that will pass all of your normal email to "bmf -n" and all of your spam
    to "bmf -s".  Note that if you do not use the mbox storage format, you
    MUST invoke bmf exactly once per email.  Using "cat * | bmf -n" will NOT
    work properly because bmf sees the entire input as one big email.

  - If you already use spamassassin, you can use it to train bmf for a
    couple of days or weeks.  If spamassassin tags an email as spam, run it
    through "bmf -s"; if not, run it through "bmf -n".  This can be
    automated with procmail or maildrop recipes.
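
The "once per email" rule is about message boundaries.  In mbox format every message starts with a line beginning "From ", and body lines that would start that way are conventionally escaped as ">From ".  One stream can therefore carry many messages.  By contrast, "cat *" over one-message-per-file storage typically produces a stream with no such markers.  A minimal C sketch of finding mbox boundaries follows.  mbox_split and msg_fn are hypothetical names, and this is not bmf's own parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one message (start, length). */
typedef void (*msg_fn)(const char *msg, size_t len, void *ctx);

/* Split an mbox buffer into messages.  A message starts at every line that
 * begins with "From "; text before the first such line is ignored.  fn may
 * be NULL to only count.  Returns the number of messages found. */
static size_t mbox_split(const char *buf, size_t len, msg_fn fn, void *ctx)
{
    size_t count = 0, start = 0, pos = 0;
    int in_msg = 0;

    assert(buf != NULL || len == 0);
    while (pos < len) {
        const char *nl;
        /* pos is always at the beginning of a line here */
        if (len - pos >= 5 && memcmp(buf + pos, "From ", 5) == 0) {
            if (in_msg) {
                if (fn)
                    fn(buf + start, pos - start, ctx);
                count++;
            }
            start = pos;
            in_msg = 1;
        }
        nl = memchr(buf + pos, '\n', len - pos);
        pos = nl ? (size_t)(nl - buf) + 1 : len;
    }
    if (in_msg) {                   /* flush the last message */
        if (fn)
            fn(buf + start, len - start, ctx);
        count++;
    }
    return count;
}
```

A training program could hand each message passed to the callback to its own "bmf -n" or "bmf -s" process.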

Here are some things that you should NOT do:

  - Get impatient with the training process and repeatedly pass one email
    through "bmf -s".

  - Manually move words around between lists and/or adjust the word counts.

Final words
===========

Thanks for trying bmf.  If you have any problems, comments, or suggestions,
please direct them to the bmf mailing list, bmf-user@lists.sourceforge.net.

							Tom Marshall
							20 Oct 2002