README

This document describes how Spamstats works internally and the
reasoning behind its design.  If you are looking for usage help, run
spamstats with the -help switch.

Reports of unclear areas or suggested improvements for this document
should be sent to Vincent Deffontaines
<vincent@gryzor.REMOVETHIS.ANDTHIS.com>

Spamstats uses 3 distinct tables (in Perl-speak: hashes) to crosslink
information from the mailer and from SpamAssassin.  The first table is
called mailer_table.

####################

Whenever an email comes into the system through SMTP (this does not
apply to locally generated messages), an entry is added to that table.
In the case of postfix, if you have a line like:

Mar 11 00:06:45 mybox postfix/cleanup[16313]: 4DC5213BA1: \
message-id=<000101c2eplentyofnumbers@remotemailer.com>

this entry is added to mailer_table:
        mailer_table{'4DC5213BA1'} = '000101c2eplentyofnumbers@remotemailer.com'

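As an illustration only, the step above can be sketched in Python
(Spamstats itself is written in Perl).  The regular expression and the
function name record_cleanup are assumptions made for this sketch; they
are not taken from the Spamstats source.

```python
import re

# Hypothetical pattern for a postfix/cleanup line (an assumption, not
# the expression Spamstats actually uses).
CLEANUP_RE = re.compile(
    r"postfix/cleanup\[\d+\]: (?P<queue_id>[0-9A-F]+): "
    r"message-id=<(?P<message_id>[^>]+)>"
)


def record_cleanup(line, mailer_table):
    """Store queue id -> message-id when the line is a cleanup line."""
    match = CLEANUP_RE.search(line)
    if match:
        mailer_table[match.group("queue_id")] = match.group("message_id")
    return mailer_table


mailer_table = {}
line = (
    "Mar 11 00:06:45 mybox postfix/cleanup[16313]: 4DC5213BA1: "
    "message-id=<000101c2eplentyofnumbers@remotemailer.com>"
)
record_cleanup(line, mailer_table)
```

Lines that do not match the pattern leave mailer_table unchanged.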

Spamd-oriented tables:

####################
Whenever a spamd input line is encountered, the spamd_pid table is filled in.

If the spamd line is:
Mar 11 00:06:39 mybox spamd[2364]: processing message \
<000101c2eplentyofnumbers@remotemailer.com> for spamd:1003, expecting 3022 bytes.

we will have:
        spamd_pid{'2364'} = '000101c2eplentyofnumbers@remotemailer.com'

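A minimal Python sketch of this step, for illustration only.  The
pattern and the name record_processing are assumptions of the sketch,
not the code Spamstats uses.

```python
import re

# Hypothetical pattern for a spamd "processing message" line.
PROCESSING_RE = re.compile(
    r"spamd\[(?P<pid>\d+)\]: processing message <(?P<message_id>[^>]+)>"
)


def record_processing(line, spamd_pid):
    """Remember which message-id a given spamd process is working on."""
    match = PROCESSING_RE.search(line)
    if match:
        spamd_pid[match.group("pid")] = match.group("message_id")
    return spamd_pid


spamd_pid = {}
line = (
    "Mar 11 00:06:39 mybox spamd[2364]: processing message "
    "<000101c2eplentyofnumbers@remotemailer.com> for spamd:1003, "
    "expecting 3022 bytes."
)
record_processing(line, spamd_pid)
```

The pid is kept as a string key, mirroring the quoted '2364' in the
notation used in this document.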

####################
Whenever spamd reports a message as clean or as spam, the spamd_table is filled in.

If the line is:
Mar 11 00:06:44 mybox spamd[2364]: clean message (1.1/6.0) for spamd:1003 \
in 5.0 seconds, 3022 bytes.
we have:
        spamd_table{spamd_pid{'2364'}} = 'clean'

If the message was identified as spam, we would symmetrically have:
Mar 11 00:06:54 mybox spamd[2364]: identified spam (16.1/6.0) for spamd:1003 \
in 5.3 seconds, 3022 bytes.
and therefore:
        spamd_table{spamd_pid{'2364'}} = 'spam'

At this point, in either case, we delete the spamd_pid entry for '2364',
since that spamd process has finished its job on this message and the
entry is of no further use.

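The verdict step, including the removal of the spamd_pid entry, could
look roughly like this in Python.  This is a sketch under assumptions:
the pattern and the name record_verdict are hypothetical, and ignoring
a verdict whose pid is unknown is a choice made here, not documented
Spamstats behaviour.

```python
import re

# Hypothetical pattern covering both verdict lines shown above.
VERDICT_RE = re.compile(
    r"spamd\[(?P<pid>\d+)\]: (?P<verdict>clean message|identified spam) "
)


def record_verdict(line, spamd_pid, spamd_table):
    """Fill spamd_table from a verdict line and drop the pid entry."""
    match = VERDICT_RE.search(line)
    if not match:
        return
    # pop() returns the message-id and deletes the pid entry in one go.
    message_id = spamd_pid.pop(match.group("pid"), None)
    if message_id is None:
        return  # assumption: verdicts for unknown pids are ignored
    is_clean = match.group("verdict") == "clean message"
    spamd_table[message_id] = "clean" if is_clean else "spam"


spamd_pid = {"2364": "000101c2eplentyofnumbers@remotemailer.com"}
spamd_table = {}
record_verdict(
    "Mar 11 00:06:54 mybox spamd[2364]: identified spam (16.1/6.0) "
    "for spamd:1003 in 5.3 seconds, 3022 bytes.",
    spamd_pid,
    spamd_table,
)
```

After the call, spamd_table holds the verdict keyed by message-id and
spamd_pid no longer contains '2364'.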

#####################

Whenever an email is delivered, the information in mailer_table and
spamd_table gets crosslinked:

Mar 11 00:06:55 mybox postfix/pipe[1119]: 4DC5213BA1: to=<myuser@mydomain.com>, \
relay=filter, delay=7, status=sent (mybox.mydomain.com)
We look up spamd_table{mailer_table{'4DC5213BA1'}} and extract the recipient
('myuser@mydomain.com').

We use this information, for instance, to report the most-spammed recipients.

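The crosslinking lookup can be sketched as follows in Python, for
illustration only.  The pattern and the name lookup_delivery are
assumptions of this sketch; returning None for unknown messages is
also a choice made here.

```python
import re

# Hypothetical pattern for a postfix delivery line with status=sent.
DELIVERY_RE = re.compile(
    r"postfix/\w+\[\d+\]: (?P<queue_id>[0-9A-F]+): "
    r"to=<(?P<recipient>[^>]+)>,.*status=sent"
)


def lookup_delivery(line, mailer_table, spamd_table):
    """Return (recipient, verdict) for a delivery line, or None."""
    match = DELIVERY_RE.search(line)
    if not match:
        return None
    # Queue id -> message-id -> verdict, as in
    # spamd_table{mailer_table{'4DC5213BA1'}}.
    message_id = mailer_table.get(match.group("queue_id"))
    verdict = spamd_table.get(message_id)
    if verdict is None:
        return None
    return match.group("recipient"), verdict


mailer_table = {"4DC5213BA1": "000101c2eplentyofnumbers@remotemailer.com"}
spamd_table = {"000101c2eplentyofnumbers@remotemailer.com": "spam"}
result = lookup_delivery(
    "Mar 11 00:06:55 mybox postfix/pipe[1119]: 4DC5213BA1: "
    "to=<myuser@mydomain.com>, relay=filter, delay=7, "
    "status=sent (mybox.mydomain.com)",
    mailer_table,
    spamd_table,
)
```

Here result pairs the recipient with the verdict stored for the
message, which is the information needed for per-recipient statistics.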

From spamstats 0.4b on, two different behaviours are possible at this point:

The default behaviour in 0.4b is to process every delivered email, even
if a single email has several recipients.

If you pass the -agglo-recipients option (and always in spamstats 0.4 and
earlier), spamd_table and mailer_table are cleared after the first
recipient has been processed.

Which one you want depends on how you look at spam:
 - You can count spam as a "bandwidth annoyance", in which case counting one
spam per effective mailer id is what you want.
 - You can count spam as an individual user annoyance, so that a spam with 2
recipients must be counted twice; this is the default from version 0.4b on.
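
The difference between the two counting modes can be illustrated with
a small Python sketch.  Everything here is hypothetical: the function
count_spam, its agglo_recipients parameter (standing in for the
-agglo-recipients option), the example addresses, and the reading of
"cleared" as "the entries for that message are removed".

```python
from collections import Counter


def count_spam(deliveries, mailer_table, spamd_table, agglo_recipients=False):
    """Count spam per recipient.

    deliveries is an iterable of (queue_id, recipient) pairs in log order.
    """
    # Work on copies so the caller's tables are left untouched.
    mailer_table = dict(mailer_table)
    spamd_table = dict(spamd_table)
    per_recipient = Counter()
    for queue_id, recipient in deliveries:
        message_id = mailer_table.get(queue_id)
        if spamd_table.get(message_id) == "spam":
            per_recipient[recipient] += 1
        if agglo_recipients and message_id is not None:
            # Assumption: drop this message's entries after its first
            # recipient, so later recipients are no longer counted.
            mailer_table.pop(queue_id, None)
            spamd_table.pop(message_id, None)
    return per_recipient


mailer_table = {"4DC5213BA1": "msg@remotemailer.com"}
spamd_table = {"msg@remotemailer.com": "spam"}
deliveries = [
    ("4DC5213BA1", "alice@mydomain.com"),
    ("4DC5213BA1", "bob@mydomain.com"),
]

per_user = count_spam(deliveries, mailer_table, spamd_table)
per_message = count_spam(
    deliveries, mailer_table, spamd_table, agglo_recipients=True
)
```

With one spam sent to two recipients, per_user counts it once for each
recipient, while per_message counts it only for the first recipient
processed.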