Name           Date         Size      #Lines  LOC
..             03-May-2022  -
test/          03-May-2022  -         8,473   7,404
Makefile.in    30-May-2011  2.9 KiB   130     71
README         21-Feb-2001  5.5 KiB   153     122
charmangle.c   30-May-2011  14.8 KiB  386     311
charmangle.h   30-May-2011  769       26      14
enriched.c     21-Feb-2001  9.5 KiB   337     250
enriched.h     21-Feb-2001  1.4 KiB   41      22
etags.c        04-Apr-2001  6.4 KiB   156     140
etags.gperf    25-Aug-2000  2.2 KiB   48      45
lineend.c      21-Feb-2001  646       34      8
lineend.h      21-Feb-2001  1.2 KiB   49      23
mangle.c       30-May-2011  32.1 KiB  779     583
mangle.h       30-May-2011  1.7 KiB   70      48
mime.c         21-Feb-2001  38.8 KiB  1,223   797
mime.h         21-Feb-2001  14.1 KiB  404     140
mkfile         21-Feb-2001  1,001     45      33
striphtml.c    30-May-2011  30 KiB    1,341   894
striphtml.h    21-Feb-2001  4.5 KiB   169     61
testhtml.c     21-Feb-2001  1.7 KiB   68      24
testj.c        21-Feb-2001  2.1 KiB   81      38
testjig.c      21-Feb-2001  2.3 KiB   92      45

README

Mime Parser
Laurence Lundblade <lgl@qualcomm.com>
Brian Kelly
Copyright QUALCOMM Inc, 1997

The major input structures are:

     Single part message:
          RFC-822 Headers
          MIME Headers
          blank line
          Message body

     Multipart entity:
          MIME Headers
          blank line
          inter-boundary junk
          boundary
            MIME Headers
            blank line
            MIME body
          boundary
            MIME Headers
            blank line
            MIME body
          ....
          ending boundary

     Encapsulated message:
          MIME Headers
          blank line
          RFC-822 Headers
          MIME Headers
          blank line
          MIME body

The above structures can of course be nested. MIME bodies may be
binary (if they are not transfer encoded). The MIME standard prevents
other elements from being binary. MIME headers and RFC 822 headers are
often interleaved. MIME headers always start with "Content-" (however,
the Content-Length header is never a MIME header). The full RFC-822
syntax for MIME headers is handled, including comments, quoting, etc.
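
The header classification rule above can be sketched in C as follows.
This is only an illustration of the rule as stated in this README; the
function names is_mime_header and prefix_equal_nocase are hypothetical
and are not taken from mime.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive check that the first n bytes of a match b.
 * Returns 0 if a ends before n bytes have been compared. */
static int prefix_equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] == '\0')
            return 0;
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if the field name (without the colon)
 * is a MIME header under the rule above, 0 otherwise. */
int is_mime_header(const char *name)
{
    static const char prefix[] = "content-";
    static const char excluded[] = "content-length";
    size_t plen = sizeof prefix - 1;
    size_t elen = sizeof excluded - 1;

    if (!prefix_equal_nocase(name, prefix, plen))
        return 0;
    /* Content-Length is the one "Content-" header that is not MIME. */
    if (prefix_equal_nocase(name, excluded, elen) && name[elen] == '\0')
        return 0;
    return 1;
}
```

Header field names are case-insensitive, which is why the comparison
folds case before matching.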

The MIME parse starts with the user calling the MIMEInit function, then
the MimeInput function with buffers of the input message. Input may be
fed all at once or a byte at a time. The user will be called back and
asked to supply functions to handle/output the RFC-822 headers and
the MIME bodies. In this callback the MIME type and MIME nesting
details are provided so the user can decide how to handle the content.
This callback happens whenever the parser encounters the blank line
separating headers from the body. In addition, it is called once at the
very start of the parse with a NULL MIME type to request a handler for
the initial RFC 822 headers. No parsing of the contents of RFC-822
headers is done at all.
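
The push-style flow described above (the caller feeds buffers, the
parser calls back at the blank line) can be illustrated with a toy
parser. This is not the MIMEInit/MimeInput API, whose signatures are
not shown in this README; every name below (toy_parser, toy_init,
toy_input, toy_count_calls) is hypothetical, and the toy only detects
the first header/body separator.

```c
#include <stddef.h>

/* Hypothetical callback type: invoked once when the blank line that
 * separates headers from the body has been seen. */
typedef void (*toy_body_start_fn)(void *user);

typedef struct {
    int in_body;                 /* set after the blank line */
    size_t line_len;             /* characters seen on the current line */
    size_t body_bytes;           /* bytes received after the blank line */
    toy_body_start_fn on_body_start;
    void *user;
} toy_parser;

void toy_init(toy_parser *p, toy_body_start_fn cb, void *user)
{
    p->in_body = 0;
    p->line_len = 0;
    p->body_bytes = 0;
    p->on_body_start = cb;
    p->user = user;
}

/* Feed any amount of input. All state lives in *p, so the result is
 * the same whether the message arrives whole or a byte at a time. */
void toy_input(toy_parser *p, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];

        if (p->in_body) {
            p->body_bytes++;
            continue;
        }
        if (c == '\r')
            continue;            /* accept both CRLF and bare LF */
        if (c == '\n') {
            if (p->line_len == 0) {      /* empty line: headers are done */
                p->in_body = 1;
                if (p->on_body_start)
                    p->on_body_start(p->user);
            }
            p->line_len = 0;
        } else {
            p->line_len++;
        }
    }
}

/* Example callback: counts how many times it was called. */
void toy_count_calls(void *user)
{
    (*(int *)user)++;
}
```

Because no state is kept outside the toy_parser structure, calling
toy_input once with the whole message or once per byte triggers the
callback at the same point.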

The MIME bodies are passed to the caller's callback a buffer at a
time. The transfer encoding is removed before the callback, so the
data is likely to be binary.
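
As an illustration of what removing a transfer encoding involves, here
is a minimal quoted-printable decoder. It is a sketch, not the code in
mime.c: it works on a whole buffer rather than incrementally, the names
qp_decode and hex_val are hypothetical, and it makes no attempt to
strip trailing white space.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode len bytes of quoted-printable text from in into out.
 * out must have room for len bytes (decoding never grows the data).
 * Returns the number of decoded bytes. */
size_t qp_decode(const char *in, size_t len, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        if (in[i] != '=') {
            out[o++] = (unsigned char)in[i];
            continue;
        }
        /* "=" at end of line is a soft line break: drop it. */
        if (i + 1 < len && in[i + 1] == '\n') {
            i += 1;
            continue;
        }
        if (i + 2 < len && in[i + 1] == '\r' && in[i + 2] == '\n') {
            i += 2;
            continue;
        }
        /* "=XX" is one byte given as two hex digits. */
        if (i + 2 < len) {
            int hi = hex_val(in[i + 1]);
            int lo = hex_val(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out[o++] = (unsigned char)(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out[o++] = '=';          /* malformed escape: pass it through */
    }
    return o;
}
```

The decoded output can contain arbitrary byte values, which is why the
body data handed to the callback should be treated as binary.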

Note that the caller has no access to raw MIME headers, boundary
delimiters, interboundary junk, or transfer encoded content. This does
result in a few limitations, such as the inability to adapt this code
to new transfer encodings without changing it.

A more serious limitation is the lack of access to the unparsed MIME
headers, because this parser (at present) does not parse all parts of
the MIME headers in the interest of keeping it very small. At present,
it only handles the Content-Type, Content-Disposition, and
Content-Transfer-Encoding headers, and a very limited number of MIME
parameters. Both of these limitations could be remedied without
changing the structure of the code, though.

Another limitation is on the size of parsed MIME tokens. It is set at
about 100 bytes. MIME parameter names or values are rarely larger,
although larger ones are allowed. Longer tokens are truncated. The MIME
nesting depth also has a hard limit. These limitations make the MIME
parser run in fixed memory no matter the complexity of the input.
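
A fixed-size token buffer with truncation, as described above, might
look like the following sketch. The limit of 100 is taken from the
"about 100 bytes" figure in this README; the exact value and the names
TOKEN_MAX, token, token_reset and token_append are assumptions, not
the definitions used in mime.h.

```c
#include <stddef.h>

#define TOKEN_MAX 100            /* assumed limit, "about 100 bytes" */

typedef struct {
    char   buf[TOKEN_MAX + 1];   /* always NUL-terminated */
    size_t len;
    int    truncated;            /* set once input has been dropped */
} token;

void token_reset(token *t)
{
    t->len = 0;
    t->truncated = 0;
    t->buf[0] = '\0';
}

/* Append one character; anything past TOKEN_MAX is silently dropped,
 * so memory use does not depend on the input. */
void token_append(token *t, char c)
{
    if (t->len < TOKEN_MAX) {
        t->buf[t->len++] = c;
        t->buf[t->len] = '\0';
    } else {
        t->truncated = 1;
    }
}
```

Since the buffer lives inside the structure, no allocation is needed
while parsing, which is what keeps memory use fixed.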

The core MIME parser consists of the files:
   mime.c       mime.h,
   utils.c      utils.h,
   lineend.c    lineend.h

In addition HTML and text/enriched strippers are included:
   enriched.c   enriched.h
   striphtml.c  striphtml.h

----------------------------------------------------------------------

The UNIX code here compiles into the MIME mangler, which reduces MIME
structure to plain text. These are the files:
   testjig.c
   mangle.c  mangle.h

This is some text processing code that takes standard MIME email as
input and produces a text-only version of it. It was particularly
designed with small text-only devices in mind. The code is also
intended to run on almost any platform; at this point it runs well on
UNIX and has also been tested on the Palm Pilot. There is no
recursion, and there is a hard limit on the MIME nesting depth, both
to limit stack and memory usage. Stack usage is very limited. In a
number of cases features were sacrificed for small size. The main
entry points are in mmangle.c and/or mmangle.h.
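
One way to handle nested multiparts without recursion is an explicit,
fixed-depth stack of boundary strings. The sketch below shows that idea
only; it is not the data structure used in mangle.c. MAX_DEPTH is an
assumed value, and nest_stack, nest_init, nest_push, nest_pop and
nest_match are hypothetical names. The 70-character boundary limit
comes from RFC 2046.

```c
#include <string.h>

#define MAX_DEPTH    5           /* assumed nesting limit */
#define BOUNDARY_MAX 70          /* RFC 2046 maximum boundary length */

typedef struct {
    char boundary[MAX_DEPTH][BOUNDARY_MAX + 1];
    int  depth;                  /* number of open multiparts */
} nest_stack;

void nest_init(nest_stack *s)
{
    s->depth = 0;
}

/* Enter a multipart. Returns 0 on success, -1 if the limit is hit. */
int nest_push(nest_stack *s, const char *boundary)
{
    if (s->depth >= MAX_DEPTH)
        return -1;
    strncpy(s->boundary[s->depth], boundary, BOUNDARY_MAX);
    s->boundary[s->depth][BOUNDARY_MAX] = '\0';
    s->depth++;
    return 0;
}

/* Leave the innermost multipart, if any. */
void nest_pop(nest_stack *s)
{
    if (s->depth > 0)
        s->depth--;
}

/* Given a line with its line ending removed, return the level whose
 * boundary it delimits ("--boundary" or "--boundary--"), searching
 * from the innermost level outwards, or -1 if it matches none. */
int nest_match(const nest_stack *s, const char *line)
{
    if (line[0] != '-' || line[1] != '-')
        return -1;
    for (int i = s->depth - 1; i >= 0; i--) {
        size_t n = strlen(s->boundary[i]);
        const char *rest;

        if (strncmp(line + 2, s->boundary[i], n) != 0)
            continue;
        rest = line + 2 + n;
        if (rest[0] == '\0' || strcmp(rest, "--") == 0)
            return i;
    }
    return -1;
}
```

The stack occupies a fixed amount of memory, and a message nested more
deeply than MAX_DEPTH is rejected by nest_push instead of growing the
call stack.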

This implements a full and proper MIME, text/enriched and HTML
parse. So, for example, a legal MIME header like:

   content-type: (((()())))application (( ")))"
      )) "/"()()()
     (xxx"\"") "xyz"

will parse down to a MIME type of application/xyz. Some of the
constructs and types explicitly handled are:
  - Filters RFC 822 headers
  - Content-type header
  - Content-transfer-encoding - only quoted printable
  - Content-disposition - the filename parameter and disposition itself
  - multipart/alternative - shows only first part
  - multipart/report - omits body of enclosed message
  - charset parameter - warns if character set is weird
  - message/rfc822 - filters headers
  - message/news - filters headers
  - multipart/mixed - traverses 5 levels as is, can do up to 31
  - multipart/* - defaults to multipart/mixed
  - text/enriched - reduces to plain text
  - application,image,model,audio,video - shows type and filename
  - ms-tnef - ignores
  - HTML - all common entities like &AMP;
  - HTML - shows URL in <A HREF=xxx>
  - HTML - shows ALT tag in images
  - HTML - fakes lists
  - HTML - <HR>
  - HTML - <PRE>, <P>, <BR> and similar text formatting
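
To show how the folded content-type example given before this list can
reduce to application/xyz, here is a small sketch that drops nested
comments, unwraps quoted strings and skips white space. It is not the
parser in mime.c. It assumes that quoted strings are honored inside
comments, which the example appears to require, and it stops at the
first ";" instead of handling parameters. The name mime_type_text is
hypothetical.

```c
#include <stddef.h>

/* Copy the visible type/subtype text of a Content-Type value into out
 * (at most outsz - 1 characters plus a terminating NUL). */
void mime_type_text(const char *value, char *out, size_t outsz)
{
    size_t o = 0;
    int depth = 0;               /* current comment nesting depth */
    int in_quote = 0;            /* inside a "quoted string" */

    if (outsz == 0)
        return;
    for (const char *p = value; *p != '\0'; p++) {
        char c = *p;
        int keep = 0;            /* does c belong to the visible text? */

        if (c == '\\' && p[1] != '\0') {
            /* Backslash escape: the next character is taken literally
             * and is kept only inside a quoted string outside comments. */
            c = *++p;
            keep = in_quote && depth == 0;
        } else if (in_quote) {
            if (c == '"')
                in_quote = 0;
            else
                keep = (depth == 0);
        } else if (c == '"') {
            in_quote = 1;
        } else if (c == '(') {
            depth++;
        } else if (c == ')') {
            if (depth > 0)
                depth--;
        } else if (depth == 0) {
            if (c == ';')
                break;           /* parameters follow; not handled here */
            keep = !(c == ' ' || c == '\t' || c == '\r' || c == '\n');
        }
        if (keep && o + 1 < outsz)
            out[o++] = c;
    }
    out[o] = '\0';
}
```

Fed the three-line header value from the example (everything after
"content-type:"), this leaves only the characters outside comments,
with the quotes around "/" and "xyz" removed.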

Some bugs/omissions:
  - doesn't lop off trailing white space in QP decoding
  - doesn't actually check version of MIME in MIME-version header
  - some parts are not reentrant
  - could probably reduce code size by another 10%


The test directory contains a collection of unusual inputs, together
with the results the mangler should produce for them; these can be
used as a regression test.