Vipul's Razor v2 README

Vipul's Razor is a distributed, collaborative spam detection and
filtering network. Through user contribution, Razor establishes a
distributed and constantly updating catalogue of spam in propagation that
is consulted by email clients to filter out known spam. Detection is done
with statistical and randomized signatures that efficiently spot mutating
spam content. User input is validated through reputation assignments based
on consensus on report and revoke assertions, which in turn are used for
computing confidence values associated with individual signatures.

Vipul's Razor v2 agent software is available from the project's homepage
at http://razor.sf.net. Razor Agents are written in Perl and will work on
most Unix operating systems and other OSes for which Perl is available.
Installation and usage instructions can be found in the INSTALL document
in the distribution.

Vipul's Razor v2 is almost a complete rewrite of Razor v1. The following
is a list of the most significant new features:

 1 New Protocol

 The Razor v2 protocol has been completely redesigned. The new
 protocol is based on the exchange of _Structured Information Strings_,
 which are similar to URIs and can be parsed with URI decoding
 libraries. The v2 protocol supports _Pipelining_, which means Razor
 Agents can keep a connection open with the server to eliminate the
 latency introduced by the TCP three-way handshake and four-way
 teardown for every connection. The new protocol semantics allow
 seamless introduction of new signature schemes.
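
 To illustrate why a URI-like encoding is convenient, here is a minimal
 Python sketch that round-trips a set of fields through a stock URI
 library. This README does not specify the wire format, so the field
 names ("a", "e", "s") and the query-string layout are assumptions made
 for the example only.

```python
from urllib.parse import parse_qs, urlencode

def encode_sis(fields):
    """Encode a dict of fields as a URI-style query string."""
    return urlencode(fields)

def decode_sis(line):
    """Decode a URI-style string back into a flat dict of fields."""
    parsed = parse_qs(line, keep_blank_values=True, strict_parsing=True)
    # parse_qs returns a list per key; keep the last value of each key.
    return {key: values[-1] for key, values in parsed.items()}

# Hypothetical field names, not taken from the Razor protocol.
original = {"a": "check", "e": "2", "s": "abc+/=="}
message = encode_sis(original)
fields = decode_sis(message)
```

 Because reserved characters are percent-escaped, a value such as a
 base64 signature survives the trip unchanged.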

 2 Ephemeral Signatures

 Ephemeral Signatures are short-lived signatures based on
 collaboratively computed random numbers. Ephemeral Signatures select a
 section of text from the spam message based on a random number that
 changes periodically. This makes the hashing scheme a moving target,
 and spammers can't exploit it because they don't know which part of
 the message will be hashed after the random number rollover.
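
 The idea can be sketched in Python as follows. This README does not
 say how the section is chosen or which hash is applied, so the window
 size, the offset derivation, and the use of SHA1 here are assumptions
 for illustration, not the actual VR4 algorithm.

```python
import hashlib

def ephemeral_signature(body: bytes, seed: int, window: int = 64) -> str:
    """Hash a seed-selected window of the message body (illustrative only).

    `seed` stands in for the shared random number and must fit in 64 bits.
    """
    if len(body) <= window:
        section = body
    else:
        # Derive the window offset deterministically from the seed, so
        # every agent holding the same seed hashes the same section.
        digest = hashlib.sha1(seed.to_bytes(8, "big")).digest()
        offset = int.from_bytes(digest[:8], "big") % (len(body) - window + 1)
        section = body[offset:offset + window]
    return hashlib.sha1(section).hexdigest()
```

 Two agents that share the seed compute the same signature; once the
 seed rolls over, a different section of the message is usually
 selected.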

 3 Preprocessors

 Razor v2 supports several preprocessors. Preprocessors alter the
 text of a spam message before a hash is computed. This version
 includes preprocessors to decode Base64-encoded messages, decode
 QP-encoded (quoted-printable) messages, and convert HTML to plaintext.
 Spammers employ several techniques that hide mutations in various
 encodings. Preprocessors defeat such techniques by hashing the content
 that a recipient actually sees in his/her mail user agent.
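
 A rough Python sketch of such a pipeline is shown below. The function
 names and the whitespace normalization are assumptions made for the
 example; Razor's own preprocessors are written in Perl and may differ
 in detail.

```python
import base64
import quopri
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so layout changes do not alter the text.
    return " ".join("".join(parser.chunks).split())

def preprocess(payload: bytes, encoding: str, is_html: bool) -> str:
    """Undo the transfer encoding, then strip HTML markup if requested."""
    if encoding == "base64":
        payload = base64.b64decode(payload)
    elif encoding == "quoted-printable":
        payload = quopri.decodestring(payload)
    text = payload.decode("utf-8", errors="replace")
    return html_to_text(text) if is_html else text
```

 With this in place, the same visible text yields the same input to
 the hash regardless of how the spammer encoded it.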

 4 Multiple Filtration Engines

 Razor v2 supports multiple engines. An engine is a logical unit that
 encapsulates a particular type of filtration service. Razor v2
 currently supports four engines: VR1, which is equivalent to Razor v1;
 VR2, which is based on SHA1 signatures of body text; VR3, which is
 based on Nilsimsa signatures; and VR4, which is based on Ephemeral
 hashes. New engines can be seamlessly plugged into the service as and
 when required.

 5 Complete Backward Compatibility with Razor v1

 The VR1 engine is functionally equivalent to the Razor v1 service and
 uses the same database. This means users who transition from v1 to v2
 will still get the benefit of several million signatures known to the
 v1 service.

 6 Base64 signature encoding

 Signatures are now encoded as base 64 numbers instead of base 16
 (hex), reducing the size of signatures sent over the wire by about
 33%.
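
 The saving follows from the encodings themselves: hex spends 2
 characters per byte, while base64 spends 4 characters per 3 bytes, so
 the ratio approaches one third for long inputs. The Python sketch
 below checks this for a single 20-byte SHA1 digest. It assumes the
 standard base64 alphabet with padding, which this README does not
 confirm for Razor.

```python
import base64
import hashlib

digest = hashlib.sha1(b"example spam body").digest()  # 20 raw bytes
hex_sig = digest.hex()                                # 2 chars per byte
b64_sig = base64.b64encode(digest).decode("ascii")    # 4 chars per 3 bytes

# Fraction of characters saved by switching from hex to base64.
saving = 1 - len(b64_sig) / len(hex_sig)
print(len(hex_sig), len(b64_sig), round(saving, 3))
```

 For one padded digest this is 40 versus 28 characters, a saving of
 30%; dropping the padding character or batching digests brings the
 figure closer to one third.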

 7 Truth Evaluation System (TeS)

 Razor v2 has a transparent, back-end component known as TeS. TeS is a
 combination of a reputation system and pattern recognition heuristics
 that assigns trust to reporters and confidence values (between 0 and
 100) to every signature. Users can set an acceptable confidence level
 in their Razor configuration. The server also publishes a recommended
 confidence level. TeS has been designed to eliminate false positives
 on legitimate bulk email that were occasionally generated by bad
 reports in Razor v1.
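
 On the client side, the confidence level acts as a simple threshold.
 The Python sketch below is only an illustration: the function and
 parameter names are hypothetical, and the rule of falling back to the
 server's recommendation when the user has set nothing is an
 assumption rather than documented agent behaviour.

```python
def effective_threshold(user_min_cf, server_recommended_cf):
    """Use the user's setting if present, else the server's recommendation."""
    return server_recommended_cf if user_min_cf is None else user_min_cf

def is_spam(signature_cf, user_min_cf, server_recommended_cf):
    """Treat a matched signature as spam only at or above the threshold."""
    if not 0 <= signature_cf <= 100:
        raise ValueError("confidence must be between 0 and 100")
    threshold = effective_threshold(user_min_cf, server_recommended_cf)
    return signature_cf >= threshold
```

 A higher threshold trades missed spam for fewer false positives.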

 8 Submission of entire spam messages

 Razor v2 accepts the entire body text of spam messages not previously
 known to the system. This lets Razor v2 compute new Ephemeral
 Signatures every n hours as well as seed the database whenever a new
 signature scheme and/or preprocessor is introduced. It should be noted
 that Razor v2 _does not_ accept the contents of legitimate email
 during a check dialogue. Only signatures are sent when checking email.

 9 Revocation

 Razor v2 allows users to revoke messages that they don't consider to
 be spam. Revocation input is fed into TeS, which adjusts the
 confidence value of a signature or removes it from the database as
 necessary. Revocation is done through a tool called razor-revoke,
 which is a part of the new Razor distribution.

10 Reporter Registration

 Razor v2 requires reporters to be registered. This lets reporters
 build a reputation over time, so their reports and revocations are
 weighed according to their reputation value. Reporting requires users
 to authenticate, which is done using a CRAM-SHA1 authentication
 scheme.
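
 A challenge-response exchange of this kind can be sketched in Python
 as below. The sketch assumes CRAM-SHA1 works like the better known
 CRAM-MD5, with HMAC-SHA1 over a server challenge keyed by the shared
 password. The exact Razor message layout is not described in this
 README, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> bytes:
    """Server side: issue a fresh, unpredictable challenge."""
    return secrets.token_hex(16).encode("ascii")

def client_response(password: bytes, challenge: bytes) -> str:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password, challenge, hashlib.sha1).hexdigest()

def server_verify(password: bytes, challenge: bytes, response: str) -> bool:
    """Server side: recompute the HMAC and compare in constant time."""
    expected = client_response(password, challenge)
    return hmac.compare_digest(expected, response)
```

 The password never crosses the wire, and a captured response cannot
 be replayed because each challenge is used only once.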

11 Content classes

 Razor v2 introduces the concept of content classes. A content class is
 a set of messages that represents variations on the same content. As
 new reports come in, Nomination servers associate them with an
 existing content class if a (close) match is found. Additionally,
 Razor v2 treats each MIME attachment as a separate content class, so
 spammers' MIME attachments can be individually tracked (which is very
 useful in the case of viruses).
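
 Per-attachment tracking can be illustrated with a short Python sketch
 that computes one signature per MIME part. Plain SHA1 over the decoded
 payload is an assumption made for the example; it stands in for
 whichever engine signature is actually used, and the sample message
 is invented.

```python
import hashlib
from email import message_from_bytes

def part_signatures(raw_message: bytes):
    """Return (content type, SHA1 hex digest) for each leaf MIME part."""
    msg = message_from_bytes(raw_message)
    signatures = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.sha1(payload).hexdigest()
        signatures.append((part.get_content_type(), digest))
    return signatures

# Invented two-part message: a text body and a base64 attachment.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"Buy now\r\n"
    b"--XX\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"QUJD\r\n"
    b"--XX--\r\n"
)
sigs = part_signatures(RAW)
```

 An attachment that is reused across otherwise different messages then
 maps to the same signature each time.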


 $Id: README,v 1.4 2005/06/28 22:19:07 jpr5 Exp $