[14/Jan/2007] Version 2.0.4
o Changes to osbf module
  - Removed unnecessary linking of liblua.a, which caused segfaults on
    IRIX 6.5.30. This fix also reduced the size of the module by a
    factor of 5 or more. Problem detected and fixed by Holger Weiss.
  - Fixed the number of args returned by osbf.classify in case of error.

o Changes to spamfilter.lua - version 2.0.3
  - Added --help option;
  - Extended syntax to read from a file passed as an argument on the
    command line. If no file is given, standard input is used, as usual;
  - Better error handling;
  - Fixed optind in getopt.lua.

o Fixed a date parsing error in cache_report.lua, caused mainly by
  ill-formed date fields in spam messages;

o The scripts classify.lua and train.lua were renamed to classify.sample
  and train.sample, because they are meant as samples and starting
  points for customized scripts rather than for real use. spamfilter.lua
  should be used for real classification and training.

o Added the file COPYRIGHT_AGREEMENT, which states the dual-license
  agreement between Fidelis Assis and William Yerazunis.

[17/Nov/2006] Version 2.0.3
o When a SFID is not found in the cache it's now added to the report
  message, for reference purposes;
o New config option, osbf.cfg_mail_cmd, to specify the mail command used by
  the spamfilter;
o Fixes and improvements to cache_report.lua;
o Minor fixes and improvements to spamfilter.lua and spamfilter_commands.lua;
o More flexible config.

[15/Oct/2006] Version 2.0.2
o Added a new script, cache_report.lua. It sends an email with an HTML form
  that makes training really easy. The form is an HTML table with Date, From,
  Subject and a drop-down menu with the possible actions: Train as spam,
  Train as non-spam, Add 'From:' to whitelist, etc. This training mechanism
  requires the new OSBF module v2.0.2 and an email client that supports HTML
  messages with a "mailto" form action. It works fine with Mozilla
  Thunderbird and Microsoft Outlook but was not tested with other clients.
  This script is typically launched from a cron job. Read the text at the top
  of the script to learn how to use it.

o Changes to osbf module
  - Added the function osbf.dir, a directory iterator presented in the PIL
    (Programming in Lua) book, to support the new training mechanism
    mentioned above;
  - Replaced the call to luaL_openlib with the new luaL_register;
  - osbf.create_db and osbf.remove_db now check if the first arg is a table
    and osbf.create_db returns an error if the file already exists;
  - osbf.classify now returns an additional value which is a Lua table with
    the number of trainings for each class. See the manual for details;
  - Added an optional second argument to osbf.stats to specify full
    (default) or fast statistics;
  - Fixes to white and blacklist handling;
  - Added PREFIX to makefile config, for easier local installation - patch
    sent by Christian Siefkes.

o Changes to spamfilter.lua - version 2.0
  - New subject-line command: batch_train <pwd>. This command allows training
    in batch, that is, many sfids can be sent in the body of the message,
    along with the right class. Ex:

      sfid-+20060924-215225-+005.65-1@spamfilter.osbf.lua=spam
      sfid-+20060924-215238-+001.53-1@spamfilter.osbf.lua=nonspam
      ...

    It can be used manually but its main purpose is to allow the new
    semi-automated batch training mechanism used by cache_report.lua.
  - New subject-line command: train_form <password>. This command executes
    the script cache_report.lua, which sends a mail with a training form to
    the user.

  - Improved handling of white and blacklists.

o Minor fixes to the docs.

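Example: a minimal sketch of how a batch_train message body, in the
"<sfid>=<class>" format shown above, could be split into entries. The helper
name parse_batch_body is hypothetical and is not taken from spamfilter.lua;
the accepted class names are assumed to be just "spam" and "nonspam", as in
the example lines.

```lua
-- Hypothetical helper: parse a batch_train body into {sfid=..., class=...} entries.
local function parse_batch_body(body)
  local entries = {}
  for line in body:gmatch("[^\r\n]+") do
    -- Accept "<sfid>=<class>", tolerating surrounding whitespace.
    local sfid, class = line:match("^%s*(sfid%-%S-)%s*=%s*(%a+)%s*$")
    -- Assumption: only these two class names are valid.
    if sfid and (class == "spam" or class == "nonspam") then
      entries[#entries + 1] = { sfid = sfid, class = class }
    end
  end
  return entries
end

local body = table.concat({
  "sfid-+20060924-215225-+005.65-1@spamfilter.osbf.lua=spam",
  "sfid-+20060924-215238-+001.53-1@spamfilter.osbf.lua=nonspam",
  "this line is ignored",
}, "\n")

for _, e in ipairs(parse_batch_body(body)) do
  print(e.class, e.sfid)
end
```

Lines that do not match the expected shape are skipped silently in this
sketch; a real filter would probably want to report them back to the user.
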
[02/Sep/2006] Version 2.0.1
o This version incorporates all changes in version 2.0;
o Changes to the osbf module:
  - Changed the function osbf.import to read from a .cfc file instead of from
    a .csv one;
o Improvements and fixes to spamfilter.lua:
  - Changed the tags [s] and [h] in the X-OSBF-Lua-Score header to [-] and
    [+], respectively, because some email client filters are case insensitive
    and can't distinguish between [s] and [S]. This is useful for those who
    prefer not to tag subject lines and filter using the information in the
    X-OSBF-Lua-Score header;
  - Fixed a bug that caused messages with score below
    osbf.cfg_remove_body_threshold to have their bodies removed even when
    whitelisted;
  - Added a new command-line option, --source = <message_source>, to specify
    the source of the message to be used for training. The possible values
    for <message_source> are:
    + stdin - the message is read directly from stdin. This is the default.
    + sfid  - the message is recovered from the cache, using the sfid present
              in the header of the message read from stdin. The message read
              from stdin must have been classified previously, in order to
              have a sfid in the header.
    + body  - the message to be trained with is the body of the message read
              from stdin.
    These options are valid only in conjunction with one of the commands
    --learn or --unlearn.
  - Added a new command-line option, --output, to determine what is written
    to stdout after training a message (suggested by Steve Pellegrin):
    --output=report  => a report message is sent to stdout. This is the
                        default action.
    --output=message => the original message, classified as spam or ham,
                        according to the training command, is written
                        to stdout.
  - Trained messages now have their cached name changed to reflect the new
    state: learned as spam or learned as ham. The changed names can be used
    for automatic retraining or for rebuilding the databases. The change in
    name also prevents training a message more than once or unlearning a
    message that was not learned before;
  - Fixed a bug in the error handling of invalid command-line options;
  - New config option, osbf.cfg_insert_sfid_in, to determine where the SFID
    will be inserted when an incoming message is classified: "references",
    "message-id" or "both". The default is now to insert it in both the
    References and Message-ID headers, because some email clients don't
    follow RFC 2822 strictly and reinsert only one of them in a reply;
  - Old SFIDs are now removed when an incoming message is classified, right
    before the new one is inserted.
o Updated the training method in toer.lua to the same one introduced in
  spamfilter.lua version 2.0. As of this version, toer.lua uses the
  TREC format for both corpora and result files.

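Example: a small illustration of why the reinforcement tags were changed. It
simulates a case-insensitive client filter; the function name ci_contains and
the score values in the sample headers are made up for the illustration.

```lua
-- Simulates a client filter that searches a header for a tag, ignoring case.
local function ci_contains(header, tag)
  -- The fourth argument (true) makes find() treat the tag as plain text,
  -- so "[" and "]" are not interpreted as pattern characters.
  return header:lower():find(tag:lower(), 1, true) ~= nil
end

local spam_tag = "[S]"

-- Old spam-reinforcement tag [s]: a case-insensitive filter for [S] also fires.
print(ci_contains("X-OSBF-Lua-Score: 1.25/0.00 [s]", spam_tag))  --> true

-- New spam-reinforcement tag [-]: no longer confused with [S].
print(ci_contains("X-OSBF-Lua-Score: 1.25/0.00 [-]", spam_tag))  --> false
```
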
[11/Feb/2006] Version 2.0
o This version was used for TREC 2006 tests only and was not released;
o Improvements and fixes to the osbf module
  - Adjustments to the EDDC formula and better tuning of the intrinsic
    OSB-bigram weights for improved AUC;
  - Added specific counters for classification, mistake and extra
    learning, besides the existing learning counter;
  - Bug fixes;
o Improvements and fixes to spamfilter
  - New training method, a variant of TOER (see toer.lua), where extra
    trainings using only the header are done if the first one, with the
    full message, was not enough to bring the score to an acceptable
    value. In many tests, with different corpora, this new method resulted
    in improved Area Under the ROC Curve (AUC);
  - The messages cached for later training are now saved under the directory
    "cache", alongside the existing "log" directory. You must create the
    directory "cache" before using the filter;
  - New option for caching the messages in a subdir structure formed by
    "DD/HH", under the cache dir, to avoid excessive messages per directory;
  - Added accuracy statistics to the stats command, based on the new counters;
  - The DSTTT method is not used any more;
  - Added many command line options - check the file spamfilter.help;
  - Bug fixes.

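Example: a rough sketch of the control flow of the header-reinforcement
training described above, not the code used in spamfilter.lua. The learn and
score callbacks, the ok_score threshold and the max_extra limit are all
hypothetical stand-ins, and the sketch assumes that a higher score means
"closer to the intended class".

```lua
-- learn(text) and score(text) are supplied by the caller; they stand in for
-- the real training and classification calls (assumption).
local function train_with_header_reinforcement(message, header, learn, score,
                                               ok_score, max_extra)
  learn(message)                 -- first training uses the full message
  local extra = 0
  while score(message) < ok_score and extra < max_extra do
    learn(header)                -- extra trainings use only the header
    extra = extra + 1
  end
  return extra                   -- number of extra (header-only) trainings
end

-- Toy stand-ins for demonstration: each learn call raises the score by 2.
local current = 0
local function fake_learn(_) current = current + 2 end
local function fake_score(_) return current end

print(train_with_header_reinforcement("full message", "header only",
                                      fake_learn, fake_score, 7, 5))  --> 3
```

The max_extra limit is there so the loop always terminates, even if the
score never reaches the acceptable value.
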
NOTE: Versions after 1.5.6b and before 2.0.1 were experimental and not
      released.

[20/Feb/2006] Version 1.5.6b
o Added a new option to osbf.config: limit_token_size, which turns token
  size limitation on when set to a value other than 0. The default value
  is 0, which restores the traditional behavior broken in 1.5.5b;
o Fixed a bad side effect in get_next_hash introduced in v1.5.5b
  - long sequences of long tokens were no longer being collapsed.

[19/Feb/2006] Version 1.5.5b
o Fixed a memory leak in osbf.classify;
o Two new options to osbf.config: max_token_size and max_long_tokens.
  For testing and special tuning purposes;
o Added train.lua, a script for training from stdin;
o Added getopt.lua, a Lua function useful for handling command line
  arguments, similar to the C getopt_long;
o Minor change to toer.lua: it now stops without an error message if
  there are fewer index files than expected in the for loop.
  It still prints an error message if none is found, though.

[21/Jan/2006] Version 1.5.4b
o Now we have that nice logo at the top, sent by Alessandro Martins
  <alessandro@martins.eng.br>;
o Added a new function to the osbf module: osbf.import("file.cfc",
  "file.csv"). This function is similar to osbf.restore but, unlike it,
  file.cfc must already exist before importing. Instead of restoring the
  original .cfc, the buckets in file.csv are imported into the existing
  file.cfc, which can have more or fewer buckets than the original .cfc.
  Its main use is to create a larger database from an older, full one,
  preserving the contents.
o Better separation of the library and binding code, which will make it
  easier to adapt the module to other languages;
o Doc files moved to the new docs dir.

[08/Jan/2006] Version 1.5.3b
o Fixes to the osbf module:
  - Fixed the database restore function - osbf.restore;
o Changed the osbf.so link from absolute to relative to make it simpler
  to generate the Slackware package - suggested by Alessandro Martins
  <alessandro@martins.eng.br>.
o Improvements and fixes to spamfilter (v1.1.3):
  - Better detection of the "Subject:" header line;
  - Improved scan for a command in the subject line. Now it'll detect a
    command even if another filter in the middle has mistakenly added a
    tag to the beginning of the subject line. Problem pointed out by
    Pavel Kolar.

[01/Jan/2006] Version 1.5.2b
o Improvements and fixes to spamfilter:
  - The recover command now sends the recovered message as an attachment;
  - Added a new config option, osbf.cfg_remove_body_threshold, to remove
    the body of spam messages. Setting osbf.cfg_remove_body_threshold = 20
    in spamfilter_config.lua removes the body of all spam messages with
    score greater than 20. The original message is still available with
    the recover command, if needed;
o Fixed a problem that occurred when a command-message was sent in HTML
  format. Because of the Content-Type header in the original message,
  the answer, in plain text format, was not visible;
o Fixed a bug in the password parsing. An invalid password was accepted
  as OK if it started with the valid password as a substring and was the
  last string in the command.
o Improvements to the lib
  - New function added, osbf.config, to allow internal parameter
    adjustments. This function is intended mainly for experiments and
    debugging.

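Example: one way a password bug like the one fixed above can arise, shown as a
self-contained sketch. This is not the original spamfilter code; all three
function names and the "<command> <password>" subject layout are assumptions
made for the illustration.

```lua
-- Illustrative buggy check: compares only the leading characters, so any
-- string that starts with the valid password is accepted.
local function password_ok_prefix(candidate, password)
  return candidate:sub(1, #password) == password
end

-- Stricter check: the whole token must equal the password.
local function password_ok_exact(candidate, password)
  return candidate == password
end

-- Hypothetical subject-line parser for "<command> <password>".
local function parse_command(subject)
  return subject:match("^%s*(%S+)%s+(%S+)%s*$")
end

local cmd, pwd = parse_command("stats secret123")
-- The prefix check accepts "secret123" for the password "secret";
-- the exact check rejects it.
print(cmd, password_ok_prefix(pwd, "secret"), password_ok_exact(pwd, "secret"))
```
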
[15/Nov/2005] Version 1.5.1b
o Improvements and fixes to spamfilter, toer.lua and docs:
  - All X-OSBF headers were merged into a single one, as suggested by Pavel
    Kolar <kolar@fzu.cz>. Ex:

      X-OSBF-Lua-Score: 33.63/0.00 [H] (v1.5.1b, Spamfilter v1.1)

  - White and blacklisted messages are now classified too, so that the
    score in the header X-OSBF-Lua-Score is the real one, as if they
    hadn't been listed - suggested by Pavel Kolar. The subject tags for
    blacklisted and whitelisted messages are the same as configured for
    spam and ham in the config file, respectively;
  - The tags in the X-OSBF-Lua-Score header don't follow the subject tags
    defined in the config file any more. They're now fixed: [B], [S], [s],
    [h], [H], [W] for blacklisted, spam, spam reinforcement, ham
    reinforcement, ham and whitelisted, according to the classification;
  - White and black lists don't use Lua regex by default any more. There's
    a new option in the config file to turn regex on or off:
    osbf.cfg_lists_use_regex;
  - Removed the trailing spaces from the subject tags in the config file.
    They're now added internally;
  - Removed duplicate database info shown by the stats <pwd> command;
  - The var unlearn_threshold in spamfilter_commands.lua is now an option
    in the config file, as it should be: osbf.cfg_unlearn_threshold;
  - More consistent threshold checking in toer.lua;
  - DSTTT is now the default training method in toer.lua;
  - Added the script roc.lua, which calculates 1-ROCAC%, a measure of the
    quality of the classifier.

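Example: a minimal sketch of how a client-side script might read the merged
header introduced in this version. The function name parse_score_header is
hypothetical; the tag meanings are the ones listed above. Only the first
number is extracted, since the changelog does not say what the second one
represents.

```lua
-- Tag meanings as listed in the 1.5.1b notes.
local TAG_MEANING = {
  B = "blacklisted", S = "spam", s = "spam reinforcement",
  h = "ham reinforcement", H = "ham", W = "whitelisted",
}

-- Hypothetical helper: returns score, tag, meaning (or nil if no match).
local function parse_score_header(line)
  local score, tag = line:match(
    "^X%-OSBF%-Lua%-Score:%s*(%-?%d+%.?%d*)/%S+%s+%[(%a)%]")
  if not score then return nil end
  return tonumber(score), tag, TAG_MEANING[tag]
end

print(parse_score_header(
  "X-OSBF-Lua-Score: 33.63/0.00 [H] (v1.5.1b, Spamfilter v1.1)"))
```

The pattern only accepts single-letter tags, so it would need adjusting for
the [-] and [+] tags used from version 2.0.1 on.
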
[06/Nov/2005] Version 1.5b - first public release
o Re-tuning of internal parameters, after the chain rule fix, resulting
  in improved accuracy.
o Docs and example scripts updated.

[30/Sep/2005] Version 1.4b - internal use only
o Changed seen_features and other flags data structure to a separate array
  of unsigned chars, in the learn function.

[25/Sep/2005] Version 1.3b - internal use only
o C and Lua code updated for lua-5.1-alpha
o No more captures in string.find
o Code changed to use the new Lua function string.match

[08/Sep/2005] Version 1.2b - internal use only
o Fixed an old bug in the chain rule that caused bad accuracy with some
  corpora. It would sometimes also cause unexpectedly worse scores after
  training, as if one had done an "unlearn";
o Fixed a bug in the "unlearn" code that caused broken chains in the
  databases;
o Implemented a new training method acting on both the spam and ham
  databases simultaneously, doing a "learn" on the right database and an
  "unlearn" on the opposite one if the score improvement was not enough.
  Now, both toer.lua and spamfilter.lua use this new method.

[25/Aug/2005] Version 1.1b - internal use only
o Changed the training method used by the spamfilter. Now the original
  message is saved under a unique SpamFilter ID (SFID) on the server and
  the original message is sent to the user with the SFID added as a
  comment to its "Message-ID" header. When the user does a "Reply" for
  training, the original message is recovered using the SFID sent back by
  the user's mail client in the "In-Reply-To" or "References" header.

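Example: a simplified sketch of recovering an SFID from the headers of a
reply, as described above. It assumes the SFID appears as a parenthesized
comment starting with "sfid-" (the sample value borrows the SFID style shown
in the 2.0.2 notes), that header names are written with the usual
capitalization, and that headers are not folded across lines. The function
name find_sfid is hypothetical.

```lua
-- Hypothetical helper: look for "(sfid-...)" in In-Reply-To, then References.
local function find_sfid(headers)
  for _, name in ipairs({ "In%-Reply%-To", "References" }) do
    for line in headers:gmatch("[^\r\n]+") do
      if line:match("^" .. name .. ":") then
        -- Capture the comment content up to the closing parenthesis.
        local sfid = line:match("%((sfid%-[^%s%)]+)%)")
        if sfid then return sfid end
      end
    end
  end
  return nil
end

local reply_headers = table.concat({
  "Subject: Re: some message",
  "In-Reply-To: <1234@example.org> " ..
    "(sfid-+20060924-215225-+005.65-1@spamfilter.osbf.lua)",
}, "\n")

print(find_sfid(reply_headers))
```
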
[13/May/2005] Version 1.0b18 - internal use only
[16/Mar/2005] Version 1.0b12 - internal use only
[28/Jan/2005] Version 1.0b1  - internal use only