1\NeedsTeXFormat{LaTeX2e}
2\documentclass{article}
3\usepackage{psfig}
4\usepackage{times}
5\usepackage{a4wide}
6\author{Frank Pilhofer}
7\title{The UUDeview Decoding Library}
8
9\providecommand{\ush}{\discretionary{-}{}{\_}}
10\providecommand{\uuversion}{0.5}
11\providecommand{\uupatch}{20}
12
13%
14% $Id: library.ltx,v 1.28 2004/03/01 23:06:20 fp Exp $
15%
16
17\begin{document}
18\maketitle
19\begin{abstract}
20The UUDeview library is a highly portable set of functions that
21provide facilities for decoding \emph{uuencoded}, \emph{xxencoded},
22\emph{Base64} and \emph{BinHex}-Encoded files as well as for
23encoding binary files into all of these representations except
24BinHex. This document describes how the features of encoding
25and decoding can be integrated into your own applications.
26
27The information is intended for developers only, and is not required
28reading material for end users. It is assumed that the reader is
29familiar with the general issue of encoding and decoding and has some
30experience with the ``C'' programming language.
31
32This document describes version \uuversion{}, patchlevel \uupatch{}
33of the library.
34\end{abstract}
35
36\section{Introduction}
37
38\subsection{Background}
39
40The Internet provides us with a fast and reliable means of user-to-user
41message delivery, using private email or newsgroups. Both systems have
42originally been designed to transport plain-text messages. Over the
43years, some methods appeared allowing transport of arbitrary binary
44data by ``encoding'' the data into plain-text messages. But after
45these years, there are still certain problems handling the encoded
46data, and many recipients have difficulties decoding the messages back
47into their original form.
48
49It should be the job of the mail delivery agent to handle sending and
50rend receiving binary data transparently. However, the support of most
51applications is limited, and several incompatibilities among different
52software exists.
53
54There are three common formats for encoding binary data, called
55\emph{uuencoding}, \emph{Base64} and \emph{BinHex}. Issues are further
56complicated by slight variations of the formats, the packaging, and
57some broken implementations.
58
59Further problems arise with multi-part postings, where the encoding
60of a huge file has been split up into several individual messages to
61ensure proper transfer over gateways with limited message sizes. Very
62few software is able to properly sort and decode the parts. Even
63nowadays, many users are at a loss to decode these kinds of messages.
64
65This is where the UUDeview Decoding Library steps in.
66
67\subsection{The Library}
68
69The UUDeview library makes an attempt at decoding nearly all
70kinds of encoded files. It is supposed to decode multi-part files as
71well as many files simultaneously. Part numbers are evaluated, thus
72making it possible to re-arrange parts that aren't in their correct
73order.
74
75No assumptions are made on the format of the input file. Usually the
76input will be an email folder or newsgroup messages. If this is the
77case, the information found in header lines is evaluated; but plain
78encoded files with no surrounding information are also accepted. The
79input may also consist of concatenated parts and files.
80
81Decoding files is done in two passes. During the first pass, all input
82files are scanned. Information is gathered about each chunk of encoded
83data. Besides the obvious data about type, position and size of the
84chunk, some environmental information from the envelope of a mail
85message is also gathered if available.
86
87If the scanner finds a properly MIME-formatted message, a proper MIME
88parser steps into action. Because MIME messages include precise
89information about the message's contents, there is seldom doubt about
90its parts.
91
92For other, non-MIME messages, the ``Subject'' header line is closely
93examined. Two informations are extracted: the part number (usually
94given in parentheses) and a unique identifier, which is used to group
95series of postings. If the subject is, for example, ``uudeview.tgz
96(01/04)'', the scanner concludes that this message is the first in a
97series of four, and the indicated filename is an ideal key to identify
98each of the four parts.
99
100If the subject is incomplete (no part number) or missing, the scanner
101tries to make the best of the available information, but some of the
102advanced features won't work. For example, without any information
103about the part number, it must be assumed that the available parts are
104in correct order and can't be automatically rearranged.
105
106All the information is gathered in a linked list. An application can
107then examine the nodes of the list and pick individual items for
108decoding. The decoding functions will then visit the parts of a file
109in correct order and extract the binary data.
110
111Because of heavy testing of the routines against real-life data
112and many problem reports from users, the functions have become very
113robust, even against input files with few, missing or broken
114information.
115
116\begin{figure}
117\centering
118\makebox{\input{structure.tex}}
119\caption{Integration of the Library}
120\label{structure}
121\end{figure}
122
123Figure \ref{structure} displays how the library can be integrated into
124an application. The library does not assume any capabilities of the
125operating system or application language, and can thus be used in
126almost any environment. The few necessary interfaces must be provided
127by the application, which does usually know a great deal more about
128the target system.
129
130The idea of the ``language interface'' is to allow integration of the
131library services into other programming languages; if the application
132is itself written in C, there's no need for a separate interface, of
133course. Such an interface currently exists for the Tcl scripting
134language; other examples might be Visual Basic, Perl or Delphi.
135
136\subsection{Terminology}
137
138These are some buzzwords that will be used in the following text.
139\begin{itemize}
140\item
141``Encoded data'' is binary data encoded by one of the methods
142``uuencoding'', ``xxencoding'', ``Base64'' or ``BinHex''.
143\item
144``Message'' refers to both complete email messages and Usenet news
145postings, including the complete headers. The format of a message is
146described in \cite{rfc0822}. A ``message body'' is an email message
147or news posting without headers.
148\item
149A ``mail folder'' is a number of concatenated messages.
150\item
151``MIME'' refers to the standards set in \cite{rfc1521}.
152\item
153A ``multipart message'' is an entity described by the MIME
154standard. It is a single message divided into one or more individual
155parts by a unique boundary.
156\item
157A ``partial message'' is also described by the MIME standard. It is a
158message with an associated identifier and a part number. Large
159messages can be split into multiple partial messages on the sender's
160side. The recipient's software groups the partial messages by their
161identifier and composes them back into the original large message.
162\item
163The term ``partial message'' only refers to \emph{one part} of the
164large message. The original, partialized message is referred to as
165``multi-part message'' (note the hyphen). To clarify, one part of a
166multi-part message is a partial message.
167\end{itemize}
168
169\section{Compiling the Library}
170
171On Unix systems, configuration and compilation is trivial. The
172script \texttt{configure} automatically checks your
173system and configures the library appropriately. A subsequent
174``make'' compiles the modules and builds the final library.
175
176On other systems, you must manually create the configuration file and
177the Makefile. The configuration file \texttt{config.h} contains a set
178of preprocessor definitions and macros that describe the available
179features on your systems.
180
181\subsection{Creating \texttt{config.h} by hand}
182
183You can find all available definitions in \texttt{config.h.in}. This
184file undefines all possible definitions; you can create your own
185configuration file starting from \texttt{config.h.in} and editing the
186necessary differences.
187
188Most definitions are either present or absent, only a few need to have
189a value. If not explicitly mentioned, you can activate a definition
190by changing the default \texttt{undef} into \texttt{define}.
191The following definitions are available:
192
193\subsubsection{System Specific}
194
195\begin{description}
196\item[\texttt{SYSTEM\_DOS}]
197Define for compilation on a \emph{DOS} system. Currently unused.
198\item[\texttt{SYSTEM\_QUICKWIN}]
199Define for compilation within a \emph{QuickWin}\footnote{The
200Microsoft compilers offer the \emph{QuickWin} target to allow
201terminal-oriented programs to run in the Windows environment}
202program. Currently unused.
203\item[\texttt{SYSTEM\_WINDLL}]
204Causes all modules to include \texttt{<windows.h>} before any other
205include file. Makes \texttt{uulib.c} export a
206\texttt{Dll\-Entry\-Point} function.
207\item[\texttt{SYSTEM\_OS2}]
208Causes all modules to include \texttt{<os2.h>} before any other
209include file.
210\end{description}
211
212\subsubsection{Compiler Specific}
213
214\begin{description}
215\item[\texttt{PROTOTYPES}]
216Define if your compiler supports function prototypes.
217\item[\texttt{UUEXPORT}]
218This can be a declaration to all functions exported from the decoding
219library. Frequently needed when compiling into a shared library.
220\item[\texttt{TOOLEXPORT}]
221Similar to \texttt{TOOL\-EXPORT}, but for the helper functions from
222the replacement functions in \texttt{fptools.c}.
223\end{description}
224
225\subsubsection{Header Files}
226
227There are a number of options that define whether header files are
228available on your system. Don't worry if some of them are not. If a
229header file is present, define ``\texttt{HAVE\_}\emph{name-of-header}'':
230\texttt{HAVE\ush{}ERRNO\_H},
231\texttt{HAVE\ush{}FCNTL\_H},
232\texttt{HAVE\ush{}IO\_H},
233\texttt{HAVE\ush{}MALLOC\_H},
234\texttt{HAVE\ush{}MEMORY\_H},
235\texttt{HAVE\ush{}UNISTD\_H} and
236\texttt{HAVE\ush{}SYS\_TIME\_H}
237(for \texttt{<sys/time.h>}). Some other include files are needed
238as well, but there are no macros for mandatory include files.
239
240There's also a number of header-specific definitions that do not fit
241into the general present-or-not-present scheme.
242
243\begin{description}
244\item[\texttt{STDC\_HEADERS}]
245Define if your header files conform to \emph{ANSI C}. This requires
246that \texttt{stdarg.h} is present, that \texttt{stdlib.h} is
247available, defining both \texttt{malloc()} and \texttt{free()}, and
248that \texttt{string.h} defines the memory functions family
249(\texttt{memcpy()} etc).
250\item[\texttt{HAVE\_STDARG\_H}]
251Implicitly set by \texttt{STDC\ush{}HEADERS}. You only need to define
252this one if \texttt{STDC\ush{}HEADERS} is not defined but
253\texttt{<stdarg.h>} is available.
254\item[\texttt{HAVE\_VARARGS\_H}]
255\emph{varargs} can be used as an alternative to \emph{stdarg}. Define
256if the above two values are undefined and \texttt{<varargs.h>} is
257available.
258\item[\texttt{TIME\_WITH\_SYS\_TIME}]
259Define if \texttt{HAVE\ush{}SYS\ush{}TIME\_H} and if both \texttt{<sys/time.h>}
260and \texttt{<time.h>} can be included without conflicting definitions.
261\end{description}
262
263\subsubsection{Functions}
264
265\begin{description}
266\item[\texttt{HAVE\_STDIO}]
267Define if standard I/O (\texttt{stdin}, \texttt{stdout} and
268\texttt{stderr}) is available.
269\item[\texttt{HAVE\_GETTIMEOFDAY}]
270Define if your system provides the \texttt{gettimeofday()} system
271call, which is needed to provide microsecond resolution to the
272busy callback. If this function is not available, \texttt{time()} is
273used.
274\end{description}
275
276\subsubsection{Replacement Functions}
277
278The tools library \texttt{fptools} defines many functions that aren't
279standard on all systems. Most of them do not differ in behavior from
280their originals, but might be slightly slower. But since they are
281usually only needed in non-speed-critical sections, the replacements
282are used throughout the library. For a full listing of the available
283replacement functions, see section \ref{chap-rf}.
284
285However, there are two functions, \texttt{strerror} and
286\texttt{tempnam}, that aren't fully implemented. The replacement
287\texttt{strerror} does not have a table of error messages and only
288produces the error number as string, and the ``fake''
289\texttt{tempnam} does not necessarily use a proper temp directory.
290
291Because some functionality is missing, the replacement functions should
292\emph{only} be used if the original is not available.
293\begin{description}
294\item[\texttt{strerror}]
295If your system does not provide a \texttt{strerror} function of its
296own, define to \texttt{\_FP\_strerror}. This causes the replacement
297function to be used throughout the library.
298\item[\texttt{tempnam}]
299If your system does not provide a \texttt{tempnam} function of its
300own, define to \texttt{\_FP\_tempnam}. This causes the replacement
301function to be used throughout the library. Must not be defined if the
302function is in fact available.
303\end{description}
304
305\subsection{Creating the \texttt{Makefile} by hand}
306
307The \texttt{Makefile} is automatically generated by the configuration
308script from the template in \texttt{Makefile.in}. This section
309explains how the template must be edited into a proper Makefile.
310
311Just copy \texttt{Makefile.in} to \texttt{Makefile} and edit the
312place-holders for the following values.
313\begin{description}
314\item[\texttt{CC}]
315Your system's ``C'' compiler.
316\item[\texttt{CFLAGS}]
317The compilation flags to be passed to the compiler. This must include
318``-I.'' so that the include files from the local directory are found,
319and ``\mbox{-DHAVE\_CONFIG\_H}'' to declare that a configuration file
320is present.
321\item[\texttt{RANLIB}]
322Set to ``ranlib'' if such a program is available on your system, or to
323``:'' (colon) otherwise.
324\item[\texttt{VERSION}]
325A string holding the release number of the library, currently
326``\uuversion{}''
327\item[\texttt{PATCH}]
328A string holding the patchlevel, currently ``\uupatch{}''.
329\end{description}
330
331Some systems do not know Makefiles but offer the concept of a
332``project''.\footnote{Actually, most project-oriented systems compile
333the project definitions into a Makefile for use by the back-ends.} In
334this case, create a new project targeting a library and add all
335source codes to the project. Then, make sure that the include path
336includes the current directory. Add options to the compiler command
337so that the symbol ``HAVE\_CONFIG\_H'' gets defined.
338Additionally, the symbol ``VERSION'' must be defined as a
339string holding the release number, currently ``\uuversion{}'' and
340``PATCH'' must be defined as a string holding the patch level,
341currently ``\uupatch{}''.
342
343On 16-bit systems, the package should be compiled using the ``Large''
344memory model, so that more than just 64k data space is available.
345
346\subsection{Compiling your Projects}
347
348Compiling the parts of your project that use the functions from the
349decoding library is pretty straightforward:
350\begin{itemize}
351\item All modules that call library functions must include the
352\texttt{<uudeview.h>} header file.
353\item Optionally, if you want to use the replacement functions to make
354your own application more portable, they may also include
355\texttt{<fptools.h>}.
356\item If your compiler understands about function prototypes, define
357the symbol \texttt{PROTOTYPES}. This causes the library functions to
358be declared with a full parameter list.
359\item Modify the include file search path so that the compiler finds
360the include files (usually with the ``-I'' option).
361\item Link with the \texttt{libuu.a} library, usually using the
362``-luu'' option.
363\item Make sure the library is found (usually with the ``-L'' option).
364\end{itemize}
365
366\section{Callback Functions}
367
368\subsection{Intro}
369
370At some points, the decoding library offers to call your custom
371procedures to do jobs you want to take care of yourself. Some examples
372are the ``Message Callback'' to print a message or the ``Busy
373Callback'', which is frequently called during lengthy processing
374of data to indicate the progress. You can hook up your functions by
375calling some library function with a pointer to your function as a
376parameter.
377
378In some cases, you will want that one of your functions receives
379certain data as a parameter. One reason to achieve this would be
380through global data; another possibility is provided through the
381passing of an opaque data pointer.
382
383All callback functions are declared to take an additional parameter of
384type \texttt{void*}. When hooking up one of your callbacks, you can
385specify a value that will passed whenever your function is
386called. Since this pointer is never touched by the library, it can be
387any kind of data, usually some composed structure. Some application
388for the Message Callback might be a \texttt{FILE*} pointer to log the
389messages to.
390
391For portability reasons, you should declare your callbacks with the
392first parameter actually being a \texttt{void*} pointer and only cast
393this pointer to its real type within the function body. This prevents
394compiler warnings about the callback setup.
395
396\subsection{Message Callback}
397\label{Section-Msg-Callback}
398
399For portability reasons, the library does not assume the availability
400of a terminal, so it does not initially know where to print messages
401to. The library generates some messages about its progress as well
402as more serious warnings and errors. An application should provide a
403message callback that displays them. The function might also choose to
404ignore informative messages and only display the fatal ones.
405
406A Message Callback takes three parameters. The first one is the opaque
407data pointer of type \texttt{void*}. The second one is a text message
408of more or less arbitrary length without line breaks. The last
409parameter is an indicator of the seriousness of this message. A string
410representation of the warning level is also prefixed to the message.
411\begin{description}
412\item[\texttt{UUMSG\_MESSAGE}]
413This is just a plain informative message, nothing important. The
414application can choose to simply ignore the message. If a log file
415is available, it should be logged, but the message should never result
416in a modal dialogue.
417\item[\texttt{UUMSG\_NOTE}] ``Note:''
418Still an informative message, meaning that the library made a decision
419on its own that might interest the user. One example for a note is
420that the setuid bit has been stripped from a file mode for security
421reasons. Notes are nothing serious and may still be ignored.
422\item[\texttt{UUMSG\_WARNING}] ``Warning:''
423A warning indicates that a non-serious problem occurred which did not
424stop the library from proceeding with the current action. One example
425is a temporary file that could not be removed. Warnings should be
426displayed, but an application may decide to continue even without user
427intervention.
428\item[\texttt{UUMSG\_ERROR}] ``ERROR:''
429A problem occurred that caused termination of the current request, for
430example if the library tried to access a non-existing file. After an
431error has occurred, the application should closely examine the
432resulting return code of the operation. Error messages are usually
433printed in modal dialogues; another option is to save the error
434message string somewhere and later print the error message after the
435application has examined the operation's return value.
436\item[\texttt{UUMSG\_FATAL}] ``Fatal Error:''
437This would indicate that a serious problem has occurred that prevents
438the library from processing any more requests. Currently unused.
439\item[\texttt{UUMSG\_PANIC}] ``Panic:''
440Such a message would indicate a panic condition, meaning the
441application should terminate without further clean-up handling.
442Unused so far.\footnote{It is not intended that this and the previous
443error levels will ever be used. Currently, there's no need to include
444handling for them.}
445\end{description}
446
447\subsection{Busy Callback}
448\label{Section-Busy-Callback}
449
450Some library functions, like scanning of an input file or decoding an
451output file, can take quite some time. An application will usually
452want to inform the user of the progress. A custom ``Busy Callback''
453can be provided to take care of this job. This function will then be
454called frequently while a large action is being executed within the
455library. It is not called when the application itself has control.
456
457Apart from the usual opaque data pointer, the Busy Callback receives a
458structure of type \texttt{uuprogress} with the following members:
459\begin{description}
460\item[\texttt{action}]
461What the library is currently doing. One of the following integer
462constants:
463\begin{description}
464\item[\texttt{UUACT\_IDLE}]
465The library is idle. This value shouldn't be seen in the Busy
466Callback, because the Busy Callback is never called in an idle state.
467\item[\texttt{UUACT\_SCANNING}] Scanning an input file.
468\item[\texttt{UUACT\_DECODING}] Decoding a file.
469\item[\texttt{UUACT\_COPYING}]  Copying a file.
470\item[\texttt{UUACT\_ENCODING}] Encoding a file.
471\end{description}
472\item[\texttt{curfile}]
473The name of the file we're working on. May include the full
474path. Guaranteed to be 256 characters or shorter.
475\item[\texttt{partno}]
476When decoding a file, this is the current part number we're working
477on. May be zero.
478\item[\texttt{numparts}]
479The maximum part number of this file. Guaranteed to be positive
480(non-zero).
481\item[\texttt{percent}]
482The percentage of the current \emph{part} already processed. The total
483percentage can be calculated as $(100*partno-percent)/numparts$.
484\item[\texttt{fsize}]
485The size of the current file. The percent information is only valid if
486this field is \emph{positive}. Whenever the size of a file cannot be
487properly determined, this field is set to -1; in this case, the
488percent field may hold garbage.
489\end{description}
490
491In some cases, it is possible that the percent counter jumps
492backwards. This happens seldom enough not to worry about it, but the
493callback should take care not to crash in this case.\footnote{This
494happens if, in a MIME multipart posting, the final boundary cannot be
495found. After searching the boundary until the end-of-file, the scanner
496resets itself to the location of the previous boundary.}
497
498The Busy Callback is declared to return an integer value. If a
499\emph{non-zero} value is returned, the current operation from
500which the callback was called is canceled, which then aborts with
501a return value of \texttt{UURET\ush{}CANCEL} (see later).
502
503\subsection{File Callback}
504\label{Section-File-Callback}
505
506Input files are usually needed twice, first for scanning and then for
507decoding. If the input files are downloaded from a remote server,
508perhaps by \emph{NNTP}, they would have to be stored on the local disk
509and await further handling. However, the user may choose not to decode
510some files after all.
511
512If disk space is important, it is possible to install a ``File
513Callback''. When scanning a file, it is assigned an ``Id''. After
514scanning has completed, the application can delete the input file. If
515it should be required later on for decoding, the File Callback is
516called to map the Id back to a filename, possibly retrieving
517another copy and disposing of it afterwards.
518
519The File Callback receives four parameters. The first is the opaque
520data pointer, the second is the Id that was assigned to the file while
521scanning. The fourth parameter is an integer. If it is non-zero, then
522the function is supposed to retrieve the file in question, store it on
523local disk, and write the resulting filename into the area to which
524the third parameter (a \texttt{char*} pointer) points. A fourth
525parameter of zero indicates that the decoder is done handling the
526file, so that the function can decide whether or not to remove the
527file.
528
529The function must return \texttt{UURET\_OK} upon success, or any other
530appropriate error code upon failure.
531
532Since it can usually be assumed that disk space is plentily available,
533and storing a file is ``cheaper'' than retrieving it twice, this
534mechanism has not been used so far.
535
536\subsection{Filename Filter}
537\label{Section-FName-Filter}
538
539For portability reasons, the library does not make any assumptions of
540the legality of certain filenames. It will pick up a ``garbage'' file
541name from the encoded file and happily use it if not told
542otherwise. For example, on DOS systems many filenames must be
543truncated in order to be valid.
544
545If a ``Filename Filter'' is installed, the library will pass each
546potential filename to the filter and then use the filename that the
547filter function returns. The filter also has to remove all directory
548information from the filename -- the library itself does not know
549about directories at all.
550
551The filter function receives the potential filename as string and must
552return a pointer to a string with the corrected filename. It may
553either return a pointer to some position in the original string or a
554pointer to some static area, but it should not modify the source
555string.
556
557Two examples of filename filters can be found among the UUDeview
558distribution as \texttt{uufnflt.c}. The DOS filter function disposes
559directory information, uses only the first 8 characters of the base
560filename and the first three characters after the last '.'~(since a
561filename might have two extensions). Also, space characters are
562replaced by underscores. The Unix filter just returns a pointer to the
563filename part of the name (without directory information).
564
565The ``garbage'' filename mentioned above was just for the sake of
566argument. It is generally safe to assume that the input filename is
567not too weird; after all, it is a filename valid on \emph{some}
568system. Still, the user should always be granted the possibility of
569renaming a file before decoding it, to allow decoding of files with
570insane filenames.
571
572\section{The File List}
573\label{file-list}
574
575While scanning the input files, a linked list is built. Each node is
576of type \texttt{uulist} and describes one file, possibly composed of
577several parts. This section describes the members of the structure
578that may be of interest to an application.
579
580\begin{description}
581\item[\texttt{state}]
582Describes the state of this file. Either the value
583\texttt{UUFILE\ush{}READ}\footnote{This value should
584only appear internally, never to be seen by an application.} or a
585bitfield of the following values:
586\begin{description}
587\item[\texttt{UUFILE\_MISPART}]
588The file is missing at least one part. This bit is set if the part
589numbers are non-sequential. Usually results in incorrect decoding.
590\item[\texttt{UUFILE\_NOBEGIN}]
591No ``begin'' line was detected. Since \emph{Base64}
592files do not have begin lines, this bit is never set on them.
593For \emph{BinHex} files, the initial colon is used.
594\item[\texttt{UUFILE\_NOEND}]
595No ``end'' line was detected. Since \emph{Base64}
596files do not have end lines, this bit is never set on them. A missing
597end on \emph{uuencoded} or \emph{xxencoded} files usually means that
598the file is incomplete. For \emph{BinHex}, the trailing colon is
599used as end marker.
600\item[\texttt{UUFILE\_NODATA}]
601No encoded data was found within these parts.
602\item[\texttt{UUFILE\_OK}]
603This file appears to be okay, and decoding is likely to be successful.
604\item[\texttt{UUFILE\_ERROR}]
605A decode operation was attempted, but failed, usually because of an
606I/O error.
607\item[\texttt{UUFILE\_DECODED}]
608This file has already been successfully decoded.
609\item[\texttt{UUFILE\_TMPFILE}]
610The file has been decoded into a temporary file, which can be found
611using the \texttt{binfile} member (see below). This flag gets removed
612if the temporary file is deleted.
613\end{description}
614\item[\texttt{mode}]
615For \emph{uuencoded} and \emph{xxencoded} files, this is the file mode
616found on the ``begin'' line, \emph{Base64} and \emph{BinHex} files
617receive a default of 0644. A decode operation will try to restore this
618mode.
619\item[\texttt{uudet}]
620The type of encoding this file uses. May be 0 if
621\texttt{UUFILE\ush{}NODATA} or one of the following
622values:
623\begin{description}
624\item[\texttt{UU\_ENCODED}] for \emph{uuencoded} data,
625\item[\texttt{B64ENCODED}]  for \emph{Base64} encoded data,
626\item[\texttt{XX\_ENCODED}] for \emph{xxencoded} data,
627\item[\texttt{BH\_ENCODED}] for \emph{BinHex} data,
628\item[\texttt{PT\_ENCODED}] for plain-text ``data'', or
629\item[\texttt{QT\_ENCODED}] for MIME \emph{quoted-printable} encoded
630text.
631\end{description}
632\item[\texttt{size}]
633The approximate size of the resulting file. It is an estimated value
634and can be a few percent off the final value, hence the suggestion to
635display the size in kilobytes only.
636\item[\texttt{filename}]
637The filename. For \emph{uuencoded} and \emph{xxencoded} files, it is
638extracted from the ``begin'' line. The name of \emph{BinHex} files
639is encoded in the first data bytes. \emph{Base64} files have the
640filename given in the ``Content-Type'' header. This field may be
641\texttt{NULL} if \texttt{state!=UUFILE\ush{}OK}.
642\item[\texttt{subfname}]
643A unique identifier for this group of parts, usually derived from the
644``Subject'' header of each part. It is possible that two
645nodes with the same identifier exist in the file list: If a group of
646files is considered ``complete'', a new node is opened up for more
647parts with the same Id.
648\item[\texttt{mimeid}]
649Stores the ``id'' field from the ``Content-Type'' information if
650available. Actually, this Id is the first choice for grouping of
651files, but not surprisingly, non-MIME mails or articles do not have
652this information.
653\item[\texttt{mimetype}]
654Stores this part's ``Content-Type'' if available.
655\item[\texttt{binfile}]
656After decoding, this is the name of the temporary file the data was
657decoded to and stored in. This value is non-NULL if the flag
658\texttt{UUFILE\ush{}TMPFILE} is set in the state member above.
659\item[\texttt{haveparts}]
660The part numbers found for this group of files as a zero-terminated
661ordered integer array. Some extra care must be taken, because a file
662may have a zeroth part as its first part. Thus if
663\texttt{haveparts[0]} is zero, it indicates a zeroth part, and the
664list of parts continues. A file may have at most one zeroth part, so
665if both \texttt{haveparts[0]} and \texttt{haveparts[1]} are zero, the
666zeroth part is the only part of this file.
667
668No more than 256 parts are listed here.
669\item[\texttt{misparts}]
670Similar to \texttt{haveparts}; a zero-terminated ordered integer array
671of missing parts, or simply \texttt{NULL} if no parts are
672missing. Since we don't mind if a file doesn't have a zeroth part,
673this array does not have the above problems.
674\end{description}
675
676\section{Return Values}
677
678Most of the library functions return a value indicating success or the
679type of error occurred. The following values can be returned:
680
681\begin{description}
682\item[\texttt{UURET\_OK}]
683The action completed successfully.
684\item[\texttt{UURET\_IOERR}]
685An I/O error occurred. There may be many reasons from ``File not
686found'' to ``Disk full''. This return code indicates that the
687application should consult \texttt{errno} for more information.
688\item[\texttt{UURET\_NOMEM}]
689A \texttt{malloc()} operation returned \texttt{NULL}, indicating that
690memory resources are exhausted. Never seen this one in a VM system.
691\item[\texttt{UURET\_ILLVAL}]
692You tried to call some operation with invalid parameters.
693\item[\texttt{UURET\_NODATA}]
694An attempt was made to decode a file, but no encoded data was found
695within its parts. Also returned if decoding a \emph{uuencoded} or
696\emph{xxencoded} file with missing ``begin'' line.
697\item[\texttt{UURET\_NOEND}]
698A decoding operation was attempted, but the decoded data didn't have a
699proper ``end'' line. A similar condition can also be detected for
700\emph{BinHex} files (where the colon is used as end marker).
701\item[\texttt{UURET\_UNSUP}]
702You tried to encode using an unsupported communications channel, for
703example piping to a command on a system without pipes.
704\item[\texttt{UURET\_EXISTS}]
705The target file already exists (upon decoding), and you didn't allow
706to overwrite existing files.
707\item[\texttt{UURET\_CONT}]
708This is a special return code, indicating that the current operation
709must be continued. This return value is used only by two encoding
710functions, so see the documentation there.
711\item[\texttt{UURET\_CANCEL}]
712The current operation was canceled, meaning that the Busy Callback
713returned a non-zero value usually because of user request. The library
714does not produce this return value on its own, so if your Busy
715Callback always returns zero, there's no need to handle this
716``Error''.
717\end{description}
718
719\section{Options}
720\label{Section-Options}
721An application program can set and query a number of options. Some of
722them are read-only, but others can modify the behavior quite
723drastically. Some of them are intended to be set by the end user via
724an options menu.
725
726\begin{description}
727\item[\texttt{UUOPT\_VERSION}] {\small (string, read-only)} \\
728Retrieves the full version number of the library, composed as
729\emph{MA\-JOR}.\emph{MI\-NOR}\discretionary{}{}{}pl\emph{PATCH}
730(the major and minor version
731numbers and the patchlevel are integers).
732
733\item[\texttt{UUOPT\_FAST}] {\small (integer, default=0)} \\
734If set to 1, the library will assume that each input file consists of
735exactly one email message or newsgroup posting. After finding encoded
736data within a file, the scanner will not continue to look for more
737data below. This strategy can save a lot of time, but has the drawback
738that files also cannot be checked for completeness -- since the
739scanner does not look for ``end'' lines, we don't notice them missing.
740
741This flag does not have any effect on MIME multipart messages, which
742are always scanned to the end (alas, the Epilogue will be skipped).
743Actually, with this flag set, the scanner becomes more MIME-compliant.
744
745\item[\texttt{UUOPT\_DUMBNESS}] {\small (integer, default=0)} \\
746As already mentioned, the library evaluates
747information found in the part's ``Subject'' header line if
748available. The heuristics here are versatile, but cannot be guaranteed
749to be completely failure-proof. If false information is derived, the
750parts will be ordered and grouped wrong, resulting in wrong decoding.
751
752If the ``dumbness'' is set to 1, the code to derive a part number is
753disabled; it will then be assumed that all parts within a group appear
754in correct order: the first one is assigned number 1 etc. However,
755part numbers found in MIME-headers are still used (I haven't yet found
756a file where these were wrong).
757
758A dumbness of 2 also switches off the code to select a unique
759identifier from the subject line. This does still work with
760single-part files\footnote{Of course, this option wouldn't make sense
761with single-part files, since there's no ``grouping'' involved that
762might fail.} and \emph{might} work with multi-part files, as long as
763they're in correct order and not mixed. The filename is found on
764the first part and then passed on to the following parts.
765
766This option only takes effect for files scanned afterwards.
767
768\item[\texttt{UUOPT\_BRACKPOL}] {\small (integer, default=0)} \\
769Series of multi-part postings on the Usenet usually have subject lines
770like ``You must see this! [1/3] (2/4)''. How to parse this
771information? Is this the second part of four in a series of three
772postings, or is it the first of three parts and the second in a series
773of four postings? The library cannot know, and simply gives numbers in
774() parentheses precedence over number in [] brackets. If this
775assumption fails, the parts will be grouped and ordered completely
776wrong.
777
778Setting the ``bracket policy'' to 1 changes this precedence.
779If now both parentheses and brackets are present, the
780numbers within brackets will be evaluated first.
781
782This option only takes effect for files scanned afterwards.
783
784\item[\texttt{UUOPT\_VERBOSE}] {\small (integer, default=1)} \\
785If set to 0, the Message Callback will not be bothered with messages
786of level
787\texttt{UUMSG\ush{}MESSAGE} or
788\texttt{UUMSG\ush{}NOTE}.
789The default is to generate these messages.
790
791\item[\texttt{UUOPT\_DESPERATE}] {\small (integer, default=0)} \\
792By default, the library refuses to decode incomplete files and
793generates errors. But if switched into ``desperate mode'' these kinds
794of errors are ignored, and all \emph{available} data is decoded.
795The usefulness of the resulting corrupt file depends on the type of
796the file.
797
798\item[\texttt{UUOPT\_IGNREPLY}] {\small (integer, default=0)} \\
799If set to 1, the library will ignore email messages and news postings
800which were sent as ``Reply'', since they are less likely to feature
801useful data. There's no real reason to turn on this option any more
802(earlier versions got easily confused by replies).
803
804\item[\texttt{UUOPT\_OVERWRITE}] {\small (integer, default=1)} \\
805When the decoder finds that the target file already exists, it is
806simply overwritten silently by default. If this option is set to 0,
807the decoder fails instead, generating a
808\texttt{UURET\ush{}EXIST} error.
809
810\item[\texttt{UUOPT\_SAVEPATH}] {\small (string, default=(empty))} \\
811Without setting this option, files are decoded to the current
812directory. This ``save path'' is handled as prefix to each
813filename. Because the library does not know about directory layouts,
814the resulting filename is simply the concatenation of the save path
815and the target file, meaning that the path must include a final
816directory separator (slash, backslash, or whatever).
817
818\item[\texttt{UUOPT\_IGNMODE}] {\small (integer, default=0)} \\
819Usually, the decoder tries to restore the file mode found on the
820``begin'' line of \emph{uuencoded} and \emph{xxencoded} files. This is
821turned off if this option is set to 1.
822
823\item[\texttt{UUOPT\_DEBUG}] {\small (integer, default=0)} \\
824If set to 1, all messages will be prefixed with the exact sourcecode
825location (filename and line number) where they were created. Might be
826useful if this is not clear from context.
827
828\item[\texttt{UUOPT\_ERRNO}] {\small (integer, read-only)} \\
829This ``option'' can be queried after an operation failed with
830\texttt{UURET\ush{}IOERR} and returns the
831\texttt{errno} value that originally caused the problem. The ``real''
832value of this variable might already be obscured by secondary
833problems.
834
835\item[\texttt{UUOPT\_PROGRESS}] {\small (uuprogress, read-only)} \\
836Returns the progress structure. This would only make sense in
837multi-threaded environments where the decoder runs in one thread and
838is controlled from another. Although some care is taken while updating
839the structure's values, there might still be synchronization problems.
840
841\item[\texttt{UUOPT\_USETEXT}] {\small (integer, default=0)} \\
842If this flag is true, plain text files will be presented for
843``decoding''. This includes non-decodeable messages as well as
844plain-text parts from MIME multipart messages. Since they usually
845don't have associated filenames, a unique name will be created from a
846sequential four-digit number.
847
848\item[\texttt{UUOPT\_PREAMB}] {\small (integer, default=0)} \\
849Whether to use the plain-text preamble and epilogue from MIME
850multipart messages. The standard defines they're supposed to
851be ignored, so there's no real reason to set this option.
852
853\item[\texttt{UUOPT\_TINYB64}] {\small (integer, default=0)} \\
854Support for tiny Base64 data.
855If set to off, the scanner does not recognize stand-alone Base64
856encoded data with less than 3 lines. The problem is that in some
857cases plain text might be misinterpreted as Base64 data, since,
858for example, any four-character alphanumerical string like ``Argh''
859appearing on a line of its own is valid Base64 data. Since encoded
860files are usually longer, and there is considerable confusion about
861erroneous Base64 detection, this option is off by default. There's
862probably no need to present this option separately to the user. It's
863reasonable to associate it with the ``desperate mode'' described
864above.
865
866Note that this option only affects \emph{stand-alone} data. Input
867from Mime messages with the encoding type correctly specified in
868the ``Content-Transfer-Encoding'' header is always evaluated.
869
870There is also no problem with encoding types different than Base64,
871since they have an explicit notion of the beginning and end of a
872file, and no danger of misinterpretation exists.
873
874\item[\texttt{UUOPT\_ENCEXT}] {\small (string, default=(empty))} \\
875When encoding into a file on the local disk, the target files
876usually receive an extension composed of the three-digit part
877number. This may be considered inappropriate for single-part files.
878If this option is set, its value is attached to the base file name as
879extension for the target file. A dot `.' is inserted automatically.
880When using uuencoding, a sensible value might be ``uue''.
881
882This option does not alter the behaviour on multi-part files, where
883the individual parts always receive the three-digit part number as
884extension.
885
886\item[\texttt{UUOPT\_REMOVE}] {\small (integer, default=0)} \\
887If true, input files are deleted if data was successfully decoded from
888them. Be careful with this option, as the library does not care if the
889file contains any other useful information besides the decoded
890data. And it also does not and can not check the integrity of the
891decoded file. Therefore, if in doubt of the incoming data, you should
892do a confidence check first and then delete the relevant input files
893afterwards. But then, this option was requested by many users.
894
895\item[\texttt{UUOPT\_MOREMIME}] {\small (integer, default=0)} \\
896Makes the library behave more MIME-compliant. Normally, some liberties
897are taken with seemingly MIME files in order to find encoded data
898within it, therefore also finding files within broken MIME
899messages. If this option is set to 1, the library is more strict in
900its handling of MIME files, and will for example not allow Base 64
901data outside of properly tagged subparts, and will not accept
902``random'' encoded data.
903
904You can also set the value of this option to 2 to enforce strict MIME
905adherance. If the option is 1, the library will still look into plain
906text attachments and try to find encoded data within it. This causes
907for example uuencoded files that were then sent in a MIME envelope to
908be recognized. With an option value of 2, the library won't even do
909that, trusting all MIME header information.
910\end{description}
911
912\section{General Functions}
913
914After describing all the framework in the previous chapters, it is
915time to mention some function calls. Still, the functions presented
916here don't actually \emph{do} anything, they just query and modify the
917behavior of the core functions.
918
919\begin{description}
920\item[\texttt{int UUInitialize (void)}] \hfill{} \\
921This function initializes the library and must be called before any
922other decoding or encoding function. During initialization, several
923arrays are allocated. If memory is exhausted,
924\texttt{UURET\ush{}NOMEM} is returned, otherwise the initialization
925will return successfully with \texttt{UURET\ush{}OK}.
926\item[\texttt{int UUCleanUp (void)}] \hfill{} \\
927Cleans up all resources that have been allocated during a program run:
928memory structures, temporary files and everything. No library function
929may be called afterwards, with the exception of \texttt{UUInitialize}
930to start another run.
931\item[\texttt{int UUGetOption (int opt, int *ival, char *cval, int len)}] \hfill{} \\
932Retrieves the configuration option (see section \ref{Section-Options})
933opt. If the option is integer, it is stored in \texttt{ival} (only if
934\texttt{ival!=NULL}) and also returned as return value. String options
935are copied to \texttt{cval}. Including the final nullbyte, at most
936\texttt{len} characters are written to \texttt{cval}. If the progress
937information is queried with
938\texttt{UUOPT\ush{}PROGRESS}, \texttt{cval} must
939point to a \texttt{uuprogress} structure and \texttt{len} must equal
940\texttt{sizeof(uuprogress)}.
941
942For integer options, \texttt{cval} may be NULL and \texttt{len} 0 and
943vice versa: for string options, \texttt{ival} is not evaluated.
944\item[\texttt{int UUSetOption (int opt, int ival, char *cval)}] \hfill{} \\
945Sets one of the configuration options. Integer options are set via
946\texttt{ival} (\texttt{cval} may be \texttt{NULL}), and string options
947are copied from the null-ter\-mina\-ted string \texttt{cval}
948(\texttt{ival} may be 0). Returns
949\texttt{UURET\ush{}ILLVAL} if you try to set a
950read-only value, or \texttt{UURET\_OK} otherwise.
951\item[\texttt{char *UUstrerror (int errcode)}] \hfill{} \\
952Maps the return values \texttt{UURET\_*} into error messages:
953\begin{description}
954\item[\texttt{UURET\_OK}] ``OK''
955\item[\texttt{UURET\_IOERR}] ``File I/O Error''
956\item[\texttt{UURET\_NOMEM}] ``Not Enough Memory''
957\item[\texttt{UURET\_ILLVAL}] ``Illegal Value''
958\item[\texttt{UURET\_NODATA}] ``No Data found''
959\item[\texttt{UURET\_NOEND}] ``Unexpected End of File''
960\item[\texttt{UURET\_UNSUP}] ``Unsupported function''
961\item[\texttt{UURET\_EXISTS}] ``File exists''
962\end{description}
963\item[\texttt{int UUSetMsgCallback (void *opaque, void (*func) ())}] \hfill{} \\
964Sets the Message Callback function to \texttt{func} (see section
965\ref{Section-Msg-Callback}). \texttt{opaque} is the opaque data
966pointer that is passed untouched to the callback whenever it is
967called. To prevent compiler warnings, a prototype of the callback
968should appear before this line. Always returns
969\texttt{UURET\ush{}OK}. If \texttt{func==NULL}, the callback is
970disabled.
971\item[\texttt{int UUSetBusyCallback (void *, void (*func) (), long msecs)}] \hfill{} \\
972Sets the Busy Callback function to \texttt{func} (see section
973\ref{Section-Busy-Callback}). \texttt{msecs} gives a timespan in
974milliseconds; the library will try to call the callback after this
975timespan has passed. On some systems, the time can only be queried
976with second resolution -- in that case, timing will be quite
977inaccurate. The semantics for the other two parameters are the same as
978in the previous function. If \texttt{func==NULL}, the busy callback is
979disabled.
980\item[\texttt{int UUSetFileCallback (void *opaque, int (*func) ())}] \hfill{} \\
981Sets the File Callback function to \texttt{func} (see section
982\ref{Section-File-Callback}). Semantics identical to the previous
983two functions. There is no need to install a file callback if this
984feature isn't used.
985\item[\texttt{int UUSetFNameFilter (void *opaque, char * (*func) ())}] \hfill{} \\
986Sets the Filename Filter function to \texttt{func} (see section
987\ref{Section-FName-Filter}). Semantics identical to the previous
988three functions. If no filename filter is installed, any filename is
989accepted. This may result in failures to write a file because of an
990invalid filename.
991\item[\texttt{char * UUFNameFilter (char *fname)}] \hfill{} \\
992Calls the current filename filter on \texttt{fname}. This function is
993provided so that certain parts of applications do not need to know
994which filter is currently installed. This is handy for applications
995that are supposed to run on more than one system. If no filename
996filter is installed, the string itself is returned. Since a filename
997filter may return a pointer to static memory or a pointer into the
998parameter, the result from this function must not be written to.
999\end{description}
1000
1001\section{Decoding Functions}
1002
1003Now for the more useful functions. The functions within this section
1004are everything you need to scan and decode files.
1005
1006\begin{description}
1007\item[\texttt{int UULoadFile (char *fname, char *id, int delflag)}] \hfill{} \\
1008Scans a file for encoded data and inserts the result into the file
1009list. Each input file must only be scanned once; it may contain many
1010parts as well as multiple encoded files, thus it is possible that many
1011decodeable files are found after scanning one input file. On the other
1012hand it is also possible that \emph{no} decodeable data is
1013found. There is no limit to the number of files.\footnote{Strictly
1014speaking, the memory is of course limited. But try to fill a sensible
1015amount with structures in the 100-byte region.}
1016
1017If \texttt{id} is non-NULL, its value is used instead of the filename,
1018and the file callback is used to map the id back into a filename
1019whenever this input file is needed again. If \texttt{id} \emph{is}
1020\texttt{NULL}, then the input file must not be deleted or modified
1021until \texttt{UUCleanUp} has been called.
1022
1023If \texttt{delflag} is non-zero, the input file will automatically be
1024removed within \texttt{UUCleanUp}. This is useful when the decoder's
1025input are also temporary files -- this way, the application can forget
1026about them right after they're ``loaded''. The value of
1027\texttt{delflag} is ignored, however, if \texttt{id} is non-NULL;
1028combining both options does not make sense.
1029
1030The behavior of this function is influenced by some of the options,
1031most notably \texttt{UUOPT\ush{}FAST}. The two most
1032probable return values are \texttt{UURET\ush{}OK}, indicating
1033successful completion, or \texttt{UURET\ush{}IOERR} in case of some
1034error while reading the file. The other return values are less likely
1035to appear.
1036
1037Note that files are even scheduled for destruction if an error
1038\emph{did} happen during scanning (with the exception of a file that
1039could not be opened). But error handling is slightly problematic here
1040anyway, since it might be possible that useful data was found before
1041the error occurred.
1042
1043\item[\texttt{int UULoadFileWithPartNo (char *fname, char *id, int delflag, int partno)}] \hfill{} \\
1044Same as above, but assigns a part number to the data in the file. This
1045function can be used if the callee is certain of the part number and
1046there is thus no need to depend on UUDeview's heuristics. However, it
1047must not be used if the referenced file may contain more than one
1048piece of encoded data.
1049
1050\item[\texttt{uulist * UUGetFileListItem (int num)}] \hfill{} \\
1051Returns a pointer to the \texttt{num}th item of the file list. The
1052elements of this structure are described in section \ref{file-list}.
1053The list is zero-based. If \texttt{num} is out-of-range, \texttt{NULL}
1054is returned. Usage of this function is pretty straightforward: loop
1055with an increasing value until \texttt{NULL} is returned. The
1056structure must not be modified by the application itself. Also, none
1057of the structure's value should be ``cached'' elsewhere, as they are
1058not constant: they may change after each loaded file.
1059
1060\item[\texttt{int UURenameFile (uulist *item, char *newname)}] \hfill{} \\
1061Renames one item of the file list. The old name is discarded and
1062replaced by \texttt{newname}. The new name is copied and may thus
1063point to volatile memory. The name should be a local filename without
1064any directory information, which would be stripped by the filename
1065filter anyway.
1066
1067\item[\texttt{int UUDecodeToTemp (uulist *item)}] \hfill{} \\
1068Decodes the given item of the file list and places the decoded output
1069into a temporary file. This is intended to allow ``previews'' of an
1070encoded file without copying it to its final location (which would
1071probably overwrite other files). The name of the temporary file can be
1072retrieved afterwards by re-retrieving the node of the file list and
1073looking at its \texttt{binfile} member.
1074
1075\texttt{UURET\ush{}OK} is returned upon successful completion. Most
1076other error codes can occur, too. \texttt{UURET\ush{}NODATA} is
1077returned if you try to decode parts without encoded data or with a
1078missing beginning (\emph{uuencoded} and \emph{xxencoded} files only)
1079-- of course, this condition would also have been obvious from the
1080\texttt{state} value of the file list structure.
1081
1082The setting of \texttt{UUOPT\ush{}DESPERATE} changes the behavior if
1083an unexpected end of file was found (usually meaning that one or more
1084parts are missing). Normally, the partially-written target file is
1085removed and the value \texttt{UURET\ush{}NOEND} is returned. In
1086desperate mode, the same error code is returned, but the target file
1087is not removed.
1088
1089The target file is removed in all other error conditions.
1090
1091\item[\texttt{int UURemoveTemp (uulist *item)}] \hfill{} \\
1092After a file has been decoded into a temporary file and is needed no
1093longer, this function can be called to free the disk space immediately
1094instead of having to wait until \texttt{UUCleanUp}. If a decode
1095operation is called for later on, the file will simply be recreated.
1096
1097\item[\texttt{int UUDecodeFile (uulist *item, char *target)}] \hfill{} \\
1098This is the function you have been waiting for. The file is decoded
1099and copied to its final location. Calling \texttt{UUDecodeToTemp}
1100beforehand is not required. If \texttt{target} is non-NULL, then it is
1101immediately used as filename for the target file (without prepending
1102the save path and without passing it through the filename
1103filter). Otherwise, if \texttt{target==NULL}, the final filename is
1104composed by concatenating the save path and the result of the filename
1105filter used upon the filename found in the encoded file.
1106
1107If the target file already exists, the value of the
1108\texttt{UUOPT\ush{}OVERWRITE} option is checked. If it is false
1109(zero), then the error \texttt{UURET\ush{}EXISTS} is generated and
1110decoding fails. If the option is true, the target file is silently
1111overwritten.\footnote{If we don't have permission to overwrite the
1112target file, an I/O error is generated.}
1113
1114The file is first decoded into a temporary file, then the temporary
1115file is copied to the final location. This is done to prevent
1116overwriting target files with data that turns out too late to be
1117invalid.
1118
1119\item[\texttt{int UUInfoFile (uulist *item, void *opaque, int (*func) ())}] \hfill{} \\
1120This function can be used to query information about the encoded
1121file. This is either the zeroth part of a file if available, or the
1122beginning of the first part up to the encoded data otherwise. Once
1123again, a callback function is used to do the job. \texttt{func} must
1124be a function with two parameters. The first one is an opaque data
1125pointer (the value of \texttt{opaque}), the other is one line of info
1126about the file (at maximum, 512 bytes). The callback is called for
1127each line of info.
1128
1129The callback can return either zero, meaning that it can accept more
1130data, or non-zero, which immediately stops retrieval of more
1131information.
1132
1133Usually, the opaque pointer holds some information about a text
1134window, so that the callback knows where to print the next line. In
1135a terminal-oriented application, the user can be queried each 25th
1136line and the callback can return non-zero if the user doesn't wish to
1137continue.
1138
1139\item[\texttt{int UUSmerge (int pass)}] \hfill{} \\
1140Attempts a ``Smart Merge'' of parts that seem to belong to different
1141files but which \emph{could} belong to the same. Occasionally, you
1142will find a posting with parts 1 to 3 and 5 to 8 of ``picture.gif''
1143and part 4 of ``picure.gif'' (note the spelling). To the human, it is
1144obvious that these parts belong together, to a machine, it is
1145not. This function attempts to detect these conditions and merge the
1146appropriate parts together. This function must be called repeatedly
1147with increasing values for ``pass'': With \texttt{pass==0}, only
1148immediate fits are merged, increasing values allow greater
1149``distances'' between part numbers,
1150
1151This function is a bunch of heuristics, and I don't really trust
1152them. In some cases, the ``smart'' merge may do more harm than
1153good. This function should only be called as last resort on explicit
1154user request. The first call should be made with \texttt{pass==0},
1155then with \texttt{pass==1} and at last with \texttt{pass=99}.
1156\end{description}
1157
1158\section{Encoding Functions}
1159
1160There are a couple of functions to encode data into a file. You will
1161usually need no more than one of them, depending on the job you want
1162to do. The functions also differ in the headers they generate. Some
1163functions do generate full MIME-compliant headers. This may sound like
1164the best choice, but it's not always the wisest choice. Please follow
1165the following guidelines.
1166
1167\begin{itemize}
1168\item
1169Do not produce MIME-compliant messages if you cannot guarantee their
1170proper handling. For example, if you create a MIME-compliant message
1171on disk, and the user \emph{includes} this file in a text message, the
1172headers produced for the encoded data become not part of the final
1173message's header but are just included in the message body. The
1174resulting message will \emph{not} be MIME-compliant!
1175\item
1176Take it from the author that slightly-different-than-MIME messages
1177give the recipient much worse headaches than messages that do not try
1178to be MIME in the first place.
1179\item
1180Because of that, headers should \emph{only} be generated if the
1181application itself handles the final mailing or posting of the
1182message. Do not rely on user actions.
1183\item
1184Do not encode to \emph{Base64} outside of MIME messages. Because some
1185information like the filename is only available in the MIME-message
1186framework, \emph{Base64} doesn't make much sense without it.
1187\item
1188However, if you can guarantee proper MIME handling, \emph{Base64}
1189should be favored above the other types of encoding. Most
1190MIME-compliant applications do not know the other encoding types.
1191\end{itemize}
1192
1193All of the functions have a bunch of parameters for greater
1194flexibility. Don't be confused by their number, usually you'll need to
1195fill only a few of them. There's a number of common parameters which
1196can be explained separately:
1197
1198\begin{description}
1199\item[\texttt{FILE *outfile}] \hfill{} \\
1200The output stream, where the encoded data is written to.
1201\item[\texttt{FILE *infile, char *infname}] \hfill{} \\
1202Where the input data shall be read from. Only one of both values must
1203be specified, the other can be NULL.
1204\item[\texttt{char *outfname}] \hfill{} \\
1205The name by which the recipient will receive the file. It is used on
1206the ``begin'' line for \emph{uuencoded} and \emph{xxencoded} data, and
1207in the headers of MIME-formatted messages. If this parameter is NULL,
1208it defaults to \texttt{infname}. It must be specified if data is read
1209from a stream and \texttt{infname==NULL}.
1210\item[\texttt{int filemode}] \hfill{} \\
1211For \emph{uuencoded} and \emph{xxencoded} data, the file permissions
1212are encoded into the ``begin'' line. This mode can be specified
1213here. If the value is 0, it will be determined by performing a
1214\texttt{stat()} call on the input file. If this call should fail, a
1215value of 0644 is used as default.
1216\item[\texttt{int encoding}] \hfill{} \\
1217The encoding to use. One of the three constants \texttt{UU\ush{}ENCODED},
1218\texttt{XX\ush{}ENCODED} or \texttt{B64\-ENCODED}.
1219\end{description}
1220
1221Now for the functions \dots{}
1222
1223\begin{description}
1224\item[\texttt{\begin{tabular}{ll}%
1225int UUEncodeMulti & (FILE *outfile, FILE *infile, \\
1226		  & ~char *infname, int encoding, \\
1227		  & ~char *outfname, char *mimetype, \\
1228		  & ~int filemode) \\
1229\end{tabular}}] \hfill{} \\
1230Encodes data into a subpart of a MIME ``multipart'' message.
1231Appropriate ``Content-Type'' headers are produced, followed by
1232the encoded data. The application must provide the envelope and
1233boundary lines. If \texttt{mimetype!=NULL}, it is used as value
1234for the ``Content-Type'' field, otherwise, the extension from
1235\texttt{outfname} or \texttt{infname} (if \texttt{outfname==NULL})
1236is used to look up the relevant type name.
1237
1238\item[\texttt{\begin{tabular}{ll}%
1239int UUEncodePartial & (FILE *outfile, FILE *infile, \\
1240		    & ~char *infname, int encoding, \\
1241		    & ~char *outfname, char *mimetype, \\
1242		    & ~int filemode, int partno, \\
1243		    & ~long linperfile) \\
1244\end{tabular}}] \hfill{} \\
1245Encodes data as the body of a MIME ``message/partial'' message. This
1246type allows message fragmentation. This function must be called
1247repetitively until it runs out of input data. The application must
1248provide a valid envelope with a ``message/partial'' content type and
1249proper information about the part numbers.
1250
1251Each call produces \texttt{linperfile} lines of encoded output. For
1252\emph{uuencoded} and \emph{xxencoded} files, each output line encodes
125345 bytes of input data, each \emph{Base64} line encodes 57 bytes.
1254If \texttt{linperfile==0}, this function is equivalent to
1255\texttt{UUEncodeMulti}.
1256
1257Different handling is necessary when reading from an input stream
1258(if \texttt{infile!=NULL}) compared to reading from a file
1259(if \texttt{infname!=NULL}). In the first case, the function must be
1260called until \texttt{feof()} becomes true on the input file, or an
1261error occurs. In the second case, the file will be opened
1262internally. Instead of \texttt{UURET\ush{}OK}, a value of
1263\texttt{UURET\ush{}CONT} is returned for all but the last part.
1264
1265\item[\texttt{\begin{tabular}{ll}%
1266int UUEncodeToStream & (FILE *outfile, FILE *infile, \\
1267		     & ~char *infname, int encoding, \\
1268		     & ~char *outfname, int filemode) \\
1269\end{tabular}}] \hfill{} \\
1270Encodes the input data and sends the plain output without any
1271headers to the output stream. Be aware that for \emph{Base64}, the
1272output does not include any information about the filename.
1273
1274\item[\texttt{\begin{tabular}{ll}%
1275int UUEncodeToFile & (FILE *infile, char *infname, \\
1276		   & ~int encoding, char *outfname, \\
1277		   & ~char *diskname, long linperfile) \\
1278\end{tabular}}] \hfill{} \\
1279Encodes the input data and writes the output into one or more output
1280files on the local disk. No headers are generated. If
1281\texttt{diskname==NULL}, the names of the encoded files are generated
1282by concatenating the save path (see the \texttt{UUOPT\ush{}SAVEPATH}
1283option) and the base name of \texttt{outfname} or \texttt{infname}
1284(if \texttt{outfname==NULL}).
1285
1286If \texttt{diskname!=NULL} and does not contain directory information,
1287the target filename is the concatenation of the save path and
1288\texttt{diskname}. If \texttt{diskname} is an absolute path name, it
1289is used itself.
1290
1291From the so-generated target filename, the extension is stripped. For
1292single-part output files, the extension set with the
1293\texttt{UUOPT\ush{}ENCEXT} option is used. Otherwise, the three-digit
1294part number is used as extension. If the destination file does already
1295exist, the value of the \texttt{UUOPT\ush{}OVERWRITE} is checked; if
1296overwriting is not allowed, encoding fails with
1297\texttt{UURET\ush{}EXISTS}.
1298
1299\item[\texttt{\begin{tabular}{ll}%
1300int UUE\_PrepSingle  & (FILE *outfile, FILE *infile, \\
1301		    & ~char *infname, int encoding, \\
1302		    & ~char *outfname, int filemode, \\
1303		    & ~char *destination, char *from, \\
1304		    & ~char *subject, int isemail) \\
1305\end{tabular}}] \hfill{} \\
1306Produces a complete MIME-formatted message including all necessary
1307headers. The output from this function is usually fed directly into a
1308mail delivery agent which honors headers (like ``sendmail'' or
1309``inews'').
1310
1311If \texttt{from!=NULL}, it is sent as the sender's email address
1312in the ``From'' header field. Some MDA programs are able to provide
1313the sender's address themselves, so this value may be NULL in certain
1314cases.
1315
1316If \texttt{subject!=NULL}, the text is included in the ``Subject''
1317header field. The subject is extended with information about the file
1318name and part number (in this case, always ``(001/001)'').
1319
1320``Destination'' must not be NULL. Depending on the ``isemail'' flag,
1321its contents are sent either in the ``To'' or ``Newsgroups'' header
1322field.
1323
1324\item[\texttt{\begin{tabular}{ll}%
1325int UUE\_PrepPartial & (FILE *outfile, FILE *infile, \\
1326		     & ~char *infname, int encoding, \\
1327		     & ~char *outfname, int filemode, \\
1328		     & ~int partno, long linperfile, \\
1329		     & ~long filesize, \\
1330		     & ~char *destination, char *from, \\
1331		     & ~char *subject, int isemail) \\
1332\end{tabular}}] \hfill{} \\
1333Similar to \texttt{UUE\_PrepSingle}, but produces a complete
1334MIME-formatted ``message/partial'' message including all necessary
1335headers. The function must be called repetitively until it runs
1336out of input data. For more explanations, see the description of the
1337function \texttt{UUEncodePartial} above.
1338
1339The only additional parameter is \texttt{filesize}. Usually, this
1340value can be 0, as the size of the input file can usually be
1341determined by performing a \texttt{stat()} call. However, this might
1342not be possible if \texttt{infile} refers to a pipe. In that case, the
1343value of \texttt{filesize} is used.
1344
1345If the size of the input data cannot be determined, and
1346\texttt{filesize} is 0, the function refuses encoding into multiple
1347files and produces only a single stream of output.
1348
1349If data is read from a file instead from a stream
1350(\texttt{infile==NULL}), the function opens the file internally and
1351returns \texttt{UURET\ush{}CONT} instead of \texttt{UURET\ush{}OK} on
1352successful completion for all but the last part.
1353\end{description}
1354
1355\section{The Trivial Decoder}
1356
1357In this section, we implement and discuss the ``Trivial Decoder'',
1358which illustrates the use of the decoding functions. We start with the
1359absolute minimum and then add more features and actually end up with a
1360limited, but useful tool. For a full-scale frontend, look at the
1361implementation of the ``UUDeview'' program. The sample code can be
1362found among the documentation files as \texttt{\mbox{td-v1.c}},
1363\texttt{\mbox{td-v2.c}} and \texttt{\mbox{td-v3.c}}.
1364
1365\subsection{Version 1}
1366
1367\begin{figure}
1368\centering
1369\begin{small}
1370\rule{\textwidth}{1pt}
1371\begin{verbatim}
1372#include <stdio.h>
1373#include <stdlib.h>
1374#include <config.h>
1375#include <uudeview.h>
1376
1377int main (int argc, char *argv[])
1378{
1379  UUInitialize ();
1380  UULoadFile   (argv[1], NULL, 0);
1381  UUDecodeFile (UUGetFileListItem (0), NULL);
1382  UUCleanUp    ();
1383  return 0;
1384}
1385\end{verbatim}
1386\rule{\textwidth}{1pt}
1387\end{small}
1388\caption{The ``Trivial Decoder'', Version 1}
1389\label{td-v1}
1390\end{figure}
1391
1392The minimal decoding program is displayed in Figure \ref{td-v1}. Only
1393four code lines are needed for the implementation. \texttt{<stdlib.h>}
1394defines \texttt{NULL}, \texttt{<uudeview.h>} declares the decoding
1395library functions, and \texttt{<config.h>}, the library's
1396configuration file, is needed for some configuration
1397details\footnote{Actually, only the definition of \texttt{UUEXPORT}
1398is needed. You could omit \texttt{<config.h>} and define this value
1399elsewhere, for example in the project definitions.}.
1400
1401After initialization, the file given as first command line parameter
1402is scanned. No symbolic name is assigned to the file, so that we don't
1403need a file callback. After the scanning, the encoded file is decoded
1404and stored in the current directory by its native name.
1405
1406Of course, there is much to complain about:
1407\begin{itemize}
1408\item No error checking is done. For example, does the input file exist?
1409\item Only a single file can be scanned for encoded data.
1410\item If more than one encoded file is found, only the first one is
1411decoded, the others are ignored.
1412\item No checking is done if there actually \emph{is} encoded data in
1413the file and whether this data is valid.
1414\end{itemize}
1415
1416\subsection{Version 2}
1417
1418\begin{figure}
1419\centering
1420\begin{small}
1421\rule{\textwidth}{1pt}
1422\begin{verbatim}
1423#include <stdio.h>
1424#include <string.h>
1425#include <errno.h>
1426#include <stdlib.h>
1427#include <config.h>
1428#include <uudeview.h>
1429
1430int main (int argc, char *argv[])
1431{
1432  uulist *item;
1433  int i, res;
1434
1435  UUInitialize ();
1436  for (i=1; i<argc; i++)
1437    if ((res = UULoadFile (argv[i], NULL, 0)) != UURET_OK)
1438      fprintf (stderr, "could not load %s: %s\n",
1439               argv[i], (res==UURET_IOERR) ?
1440               strerror (UUGetOption (UUOPT_ERRNO, NULL, NULL, 0)) :
1441               UUstrerror(res));
1442
1443  for (i=0; (item=UUGetFileListItem(i)) != NULL; i++) {
1444    if ((item->state & UUFILE_OK) == 0)
1445      continue;
1446    if ((res = UUDecodeFile (item, NULL)) != UURET_OK) {
1447      fprintf (stderr, "error decoding %s: %s\n",
1448               (item->filename==NULL)?"oops":item->filename,
1449               (res==UURET_IOERR) ?
1450               strerror (UUGetOption (UUOPT_ERRNO, NULL, NULL, 0)) :
1451               UUstrerror(res));
1452    }
1453    else {
1454      printf ("successfully decoded '%s'\n", item->filename);
1455    }
1456  }
1457  UUCleanUp ();
1458  return 0;
1459}
1460\end{verbatim}
1461\rule{\textwidth}{1pt}
1462\end{small}
1463\caption{The ``Trivial Decoder'', Version 2}
1464\label{td-v2}
1465\end{figure}
1466
1467The second version, printed in figure \ref{td-v2}, addresses all of
1468the above problems. The code size more than tripled, but that's
1469largely because of the error messages.
1470
1471All files given on the command
1472line are scanned\footnote{With Microsoft compilers on MS-DOS systems,
1473don't forget to link with \texttt{setargv.obj} to properly handle
1474wildcards}, and all encoded files are decoded. Of course, it is now
1475also possible for an encoded file to span its parts over more than one
1476input file. Appropriate error messages are printed upon failure of any
1477step, and a success message is printed for successfully decoded files.
1478
1479Apart from the program's unfriendliness that there is no
1480user-interaction like selective decoding of files, choice of a target
1481directory etc., there are only three more items to complain about:
1482\begin{itemize}
1483\item Errors and other messages produced within the library aren't
1484displayed because there's no message callback.
1485\item No filename filter is installed, so decoding of files with
1486invalid filenames will fail; this especially includes filenames
1487with directory information.
1488\item No information is printed for invalid encoded files, or files
1489with missing parts (they're simply skipped).
1490\end{itemize}
1491
1492\subsection{Version 3}
1493
1494\begin{figure}
1495\centering
1496\begin{small}
1497\rule{\textwidth}{1pt}
1498{\small\emph{\dots{} right after the \#includes}} \\
1499\begin{verbatim}
1500#include <fptools.h>
1501
1502void MsgCallBack (void *opaque, char *msg, int level)
1503{
1504  fprintf (stderr, "%s\n", msg);
1505}
1506
1507char * FNameFilter (void *opaque, char *fname)
1508{
1509  static char dname[13];
1510  char *p1, *p2;
1511  int i;
1512
1513  if ((p1 = _FP_strrchr (fname, '/')) == NULL)
1514    p1 = fname;
1515  if ((p2 = _FP_strrchr (p1, '\\')) == NULL)
1516    p2 = p1;
1517  for (i=0, p1=dname; *p2 && *p2!='.' && i<8; i++)
1518    *p1++ = (*p2==' ')?(p2++,'_'):*p2++;
1519  while (*p2 && *p2 != '.') p2++;
1520  if ((*p1++ = *p2++) == '.')
1521    for (i=0; *p2 && *p2!='.' && i<3; i++)
1522      *p1++ = (*p2==' ')?(p2++,'_'):*p2++;
1523  *p1 = '\0';
1524  return dname;
1525}
1526\end{verbatim}
1527{\small\emph{\dots{} within \texttt{main()} after \texttt{UUInitialize}}} \\
1528\begin{verbatim}
1529  UUSetMsgCallback (NULL, MsgCallBack);
1530  UUSetFNameFilter (NULL, FNameFilter);
1531\end{verbatim}
1532{\small\emph{\dots{} replacing the main loop's \emph{else}}} \\
1533\begin{verbatim}
1534    else {
1535      printf ("successfully decoded '%s' as '%s'\n",
1536              item->filename,
1537              UUFNameFilter (item->filename));
1538    }
1539\end{verbatim}
1540\rule{\textwidth}{1pt}
1541\end{small}
1542\caption{Changes for Version 3}
1543\label{td-v3-diff}
1544\end{figure}
1545
1546This last section adds a simple filename filter (targeting at a DOS
1547system with 8.3 filenames) and a simple
1548message callback, which just dumps messages to the console. Figure
1549\ref{td-v3-diff} lists the changes with respect to version 2 (for the
1550full listing, refer to the source file on disk).
1551
1552The message callback, a one-liner, couldn't be simpler. The filename
1553filter will probably not win an award for good programming style, but
1554it does its job of stripping Unix-style or DOS-style directory names
1555and using only the first 8 characters of the base filename and the
1556first three characters of the extension. If the filename contains
1557space characters, they're replaced by underscores. Note that
1558\texttt{dname}, the storage for the resulting filename, is declared
1559static, as it must be accessible after the filter function has
1560returned.
1561
1562For portability, the filename filter uses a replacement function from
1563the \texttt{fptools} library instead of relying of a native implementation
1564of the \texttt{strrchr} function.
1565
1566Both callbacks are installed right after initializing the
1567library. Since now the filename of the decoded file may be
1568different from the filename of the file list structure, we recreate
1569the resulting filename by calling the filename filter ourselves for
1570display, so that the user knows where to look for the file.
1571
1572\section{Replacement functions}
1573\label{chap-rf}
1574
1575This section is a short reference for the replacement functions from
1576the \texttt{fptools} library. Some of them may be useful in the
1577application code as well. Most of these functions are pretty standard
1578in modern systems, but there's also a few from the author's
1579imagination. Each of the functions is tagged with information why this
1580replacement exists:
1581\begin{itemize}
1582\item ``nonstandard'' (ns): this function is available on some systems, but
1583not on others. Functions with this tag could be safely replaced with a
1584native implementation.
1585\item ``feature'' (f): the replacement adds some functionality with
1586respect to the ``original''.
1587\item ``author''(a): just a function the author considered useful.
1588\end{itemize}
1589
1590\begin{description}
1591\item[\texttt{void \_FP\_free (void *)}] {\small (f)} \\
1592ANSI C guarantees that \texttt{free()} can be safely called with a
1593\texttt{NULL} argument, but some old systems dump core. This
1594replacement just ignores a \texttt{NULL} pointer and passes anything
1595else to the original \texttt{free()}.
1596
1597\item[\texttt{char *\_FP\_strdup (char *ptr)}] {\small (ns)} \\
1598Allocates new storage for the string \texttt{ptr} and copies the
1599string including the final nullbyte to the new location (thus
1600``duplicating'' the string). Returns \texttt{NULL} if the
1601\texttt{malloc()} call fails.
1602
1603\item[\texttt{char *\_FP\_strncpy (char *dest, char *src, int count)}] {\small (f)} \\
1604Copies text from the \texttt{src} area to the \texttt{dest} area,
1605until either a nullbyte has been copied or \texttt{count} bytes have
1606been copied. Differs from the original in that if \texttt{src} is
1607longer than \texttt{count} bytes, then only \texttt{count}-1 bytes are
1608copied, and the destination area is properly terminated with a
1609nullbyte.
1610
1611\item[\texttt{void *\_FP\_memdup (void *ptr, int count)}] {\small (a)} \\
1612Allocates a new area of \texttt{count} bytes, which are then copied
1613from the \texttt{ptr} area.
1614
1615\item[\texttt{int \_FP\_stricmp (char *str1, char *str2)}] {\small (ns)} \\
1616Case-insensitive equivalent of \texttt{strcmp}.
1617
1618\item[\texttt{int \_FP\_strnicmp (char *str1, char *str2, int count)}] {\small (ns)} \\
1619Case-insensitive equivalent of \texttt{strncmp}.
1620
1621\item[\texttt{char *\_FP\_strrchr (char *string, int chr)}] {\small (ns)} \\
1622Similar to \texttt{strchr}, but returns a pointer to the last
1623occurrence of the character \texttt{chr} in \texttt{string}.
1624
1625\item[\texttt{char *\_FP\_strstr (char *str1, char *str2)}] {\small (ns)} \\
1626Returns a pointer to the first occurrence of \texttt{str2} in
1627\texttt{str1} or \texttt{NULL} if the second string does not appear
1628within the first.
1629
1630\item[\texttt{char *\_FP\_strrstr (char *str1, char *str2)}] {\small (ns)} \\
1631Similar to \texttt{strstr}, but returns a pointer to the last
1632occurrence of \texttt{str2} in \texttt{str1}.
1633
1634\item[\texttt{char *\_FP\_stristr (char *str1, char *str2)}] {\small (a)} \\
1635Case-insensitive equivalent of \texttt{strstr}.
1636
1637\item[\texttt{char *\_FP\_strirstr (char *str1, char *str2)}] {\small (a)} \\
1638Case-insensitive equivalent of \texttt{strrstr}.
1639
1640\item[\texttt{char *\_FP\_stoupper (char *string)}] {\small (a)} \\
1641Converts all alphabetic characters in \texttt{string} to uppercase.
1642
1643\item[\texttt{char *\_FP\_stolower (char *string)}] {\small (a)} \\
1644Converts all alphabetic characters in \texttt{string} to lowercase.
1645
1646\item[\texttt{int \_FP\_strmatch (char *str, char *pat)}] {\small (a)} \\
1647Performs glob-style pattern matching. \texttt{pat} is a string
1648containing regular characters and the two wildcards '?'
1649(question mark) and '*'. The question mark matches any single
1650character, the '*' matches any zero or more characters. If
1651\texttt{str} is matched by \texttt{pat}, the function returns 1,
1652otherwise 0.
1653
1654\item[\texttt{char *\_FP\_fgets (char *buf, int max, FILE *file)}] {\small (f)} \\
1655Extends the standard \texttt{fgets()}; this replacement is able to
1656handle line terminators from various systems. DOS text files have
1657their lines terminated by CRLF, Unix files by LF only and Mac files by
1658CR only. This function reads a line and replaces whatever line
1659terminator present with a single LF.
1660
1661\item[\texttt{char *\_FP\_strpbrk (char *str, char *accept)}] {\small (ns)} \\
1662Locates the first occurrence in the string \texttt{str} of any of
1663the characters in \texttt{accept}.
1664
1665\item[\texttt{char *\_FP\_strtok (char *str, char *del)}] {\small (ns)} \\
1666Considers the string \texttt{str} to be a sequence of tokens separated
1667by one or more of the delimiter characters given in \texttt{del}. Upon
1668first call with \texttt{str!=NULL}, returns the first token. Later
1669calls with \texttt{str==NULL} return the following tokens. Returns
1670\texttt{NULL} if no more tokens are found.
1671
1672\item[\texttt{char *\_FP\_cutdir (char *str)}] {\small (a)} \\
1673Returns the filename part of \texttt{str}, meaning everything after
1674the last slash or backslash in the string. Now replaced with the
1675concept of the filename filter.
1676
1677\item[\texttt{char *\_FP\_strerror (int errcode)}] {\small (ns)} \\
1678A rather dumb replacement of the original one, which transforms error
1679codes from \texttt{errno} into a human-readable error message. This
1680function should \emph{only} be used if no native implementation
1681exists; it just returns a string with the numerical error number.
1682
1683\item[\texttt{char *\_FP\_tempnam (char *dir, char *pfx)}] {\small (ns)} \\
1684The original is supposed to return a unique filename. The temporary
1685file should be stored in \texttt{dir} and have a prefix of
1686\texttt{pfx}. This replacement, too, should only be used if no native
1687implementation exists. It just returns a temporary filename created by
1688the standard \texttt{tmpnam()}, which not necessarily resides in a
1689proper \texttt{TEMP} directory. The value returned by this function is
1690an allocated memory area which must later be freed by calling
1691\texttt{free}.
1692\end{description}
1693
1694\section{Known Problems}
1695
1696This section mentions a few known problems with the library, which the
1697author considers to be ``features'' rather than ``bugs'', meaning that
1698they probably won't be ``fixed'' in the near future.
1699
1700\begin{itemize}
1701\item Encoding to \emph{BinHex} is not yet supported.
1702\item The checksums found in \emph{BinHex} files are ignored.
1703\item If both data and resource forks in a \emph{BinHex} file are
1704non-empty, the larger one is decoded. Non-Mac systems can only use one
1705of them anyway (usually the ``data'' fork, the ``resource'' fork
1706usually contains M68k or PPC machine code).
1707\end{itemize}
1708
1709\begin{thebibliography}{RFC1521}
1710\bibitem[RFC0822]{rfc0822} Crocker, D., ``Standard for the Format of
1711ARPA Internet Text Messages'', RFC 822, Network Working Group, August
17121982.
1713\bibitem[RFC1521]{rfc1521} Borenstein, N., ``MIME (Multipurpose
1714Internet Mail Extensions) Part One'', RFC 1521, Network Working Group,
1715September 1993.
1716\bibitem[RFC1741]{rfc1741} Faltstr\o{}m, P., Crocker, D. and Fair, E.,
1717``MIME Content Type for BinHex Encoded Files'', RFC 1741, Network
1718Working Group, December 1994.
1719\bibitem[RFC1806]{rfc1806} Troost, R., Dorner, S., ``The
1720Content-Disposition Header'', RFC 1806, Network Working Group, June
17211995.
1722\end{thebibliography}
1723
1724RFC documents (``Request for Comments'') can be downloaded from many
1725ftp sites around the world.
1726
1727\newpage
1728\begin{appendix}
1729
1730\section{Encoding Formats}
1731
1732The following sections describe the four most widely used formats
1733for encoding binary data into plain text, \emph{uuencoding},
1734\emph{xxencoding}, \emph{Base64} and \emph{BinHex}. Another section
1735shortly mentions \emph{Quoted-Printable} encoding.
1736
1737Other formats exist, like \emph{btoa} and \emph{ship}, but they are
1738not mentioned here. \emph{btoa} is much less efficient than the
1739others. \emph{ship} is slightly more efficient and will probably be
1740supported in future.
1741
1742Uuencoding, xxencoding and Base 64 basically work the same. They are
1743all ``three in four'' encodings, which means that they take three
1744octets\footnote{The term ``octet'' is used here instead of ``byte'',
1745since it more accurately reflects the 8-bit nature of what we
1746usually call a ``byte''} from the input file and encode them into four
1747characters.
1748
1749\begin{table}
1750\centering
1751\begin{tabular}{|r|c|c|c|c|c|c|c|c|}\hline
1752Input Octet     &1& & & & & & &  \\ \hline
1753Input Bit       &7&6&5&4&3&2&1&0 \\ \hline\hline
1754Output Data \#1 &5&4&3&2&1&0& &  \\ \hline
1755Output Data \#2 & & & & & & &5&4 \\ \hline\\[3mm]\hline
1756Input Octet     &2& & & & & & &  \\ \hline
1757Input Bit       &7&6&5&4&3&2&1&0 \\ \hline\hline
1758Output Data \#2 &3&2&1&0& & & &  \\ \hline
1759Output Data \#3 & & & & &5&4&3&2 \\ \hline\\[3mm]\hline
1760Input Octet     &3& & & & & & &  \\ \hline
1761Input Bit       &7&6&5&4&3&2&1&0 \\ \hline\hline
1762Output Data \#3 &1&0& & & & & &  \\ \hline
1763Output Data \#4 & & &5&4&3&2&1&0 \\ \hline
1764\end{tabular}
1765\caption{Bit mapping for Three-in-Four encoding}
1766\label{tab-3-in-4}
1767\end{table}
1768
1769Three bytes are 24 bits, and they are divided into 4 sections of 6
1770bits each. Table \ref{tab-3-in-4} describes in detail how the input
1771bits are copied into the output data bits. 6 bits can have values from
17720 to 63; each of the ``three in four'' encodings now uses a character
1773table with 64 entries, where each possible value is mapped to a
1774specific character.
1775
1776The advantage of three in four encodings is their simplicity, as
1777encoding and decoding can be done by mere bit shifting and two simple
1778tables (one for encoding, mapping values to characters, and one for
1779decoding, with the reverse mapping). The disadvantage is that the
1780encoded data is 33\% larger than the input (not counting line breaks
1781and other information added to the encoded data).
1782
1783The before-mentioned \emph{ship} data is more effective; it is a
1784so-called \emph{Base 85} encoding. Base 85 encodings take four input
1785bytes (32 bits) and encode them into five characters. Each of this
1786characters encode a value from 0 to 84; five characters can therefore
1787encode a value from 0 to $85^5=4437053125$, covering the complete 32
1788bit range. Base 85 encodings need more ``complicated'' math and a
1789larger character table, but result in only 25\% bigger encoded files.
1790
1791In order to illustrate the encodings and present some actual data, we
1792will present the following text encoded in each of the formats:
1793
1794\begin{quote}
1795\begin{small}
1796\begin{verbatim}
1797This is a test file for illustrating the various
1798encoding methods. Let's make this text longer than
179957 bytes to wrap lines with Base64 data, too.
1800Greetings, Frank Pilhofer
1801\end{verbatim}
1802\end{small}
1803\end{quote}
1804
1805\subsection{Uuencoding}
1806
1807A document actually describing uuencoding as a standard does not seem
1808to exist. This is probably the reason why there are so many broken
1809encoders and decoders around that each take their liberties with the
1810definition.
1811
1812The following text describe the pretty strict rules for uuencoding
1813that are used in the UUEnview encoding engine. The UUDeview decoding
1814engine is much more relaxed, according to the general rule that you
1815should be strict in all that you generate, but liberal in the data
1816that your receive.
1817
1818Uuencoded data always starts with a \texttt{begin} line and continues
1819until the \texttt{end} line. Encoded data starts on the line following
1820the begin. Immediately before the \texttt{end} line, there must be a
1821single \emph{empty} line (see below).
1822
1823\begin{quote}
1824\begin{small}
1825\texttt{begin} \emph{mode} \emph{filename} \\
1826\dots{} \emph{encoded data} \dots{} \\
1827\emph{``empty'' line} \\
1828\texttt{end}
1829\end{small}
1830\end{quote}
1831
1832\subsubsection{The \texttt{begin} Line}
1833
1834The \texttt{begin} line starts with the word \texttt{begin} in the
1835first column. It is followed, all on the same line, by the
1836\emph{mode} and the \emph{filename}.
1837
1838\emph{mode} is a three- or four-digit octal number, describing the
1839access permissions of the target file. This mode value is the same as
1840used with the Unix \texttt{chmod} command and by the \texttt{open}
1841system call. Each of the three digits is a binary or of the values 4
1842(read permission), 2 (write permission) and 1 (execute
1843permission). The first digit gives the user's permissions, the second
1844one the permissions for the group the user is in, and the third digit
1845describes everyone else's permissions. On DOS or other systems with
1846only a limited concept of file permissions, only the first digit
1847should be evaluated. If the ``2'' bit is not set, the resulting file
1848should be read-only, the ``1'' bit should be set for COM and EXE
1849files. Common values are \texttt{644} or \texttt{755}.
1850
1851\emph{filename} is the name of the file. The name \emph{should} be
1852without any directory information.
1853
1854\subsubsection{Encoded Data}
1855
1856The basic version of uencoding simply uses the ASCII characters 32-95
1857for encoding the 64 values of a three in for encoding. An
1858exception\footnote{\dots{} that is not always respected by old
1859encoders} is the value 0, which would normally map into the space
1860character (ASCII 32). To prevent problems with mailers that strip
1861space characters at the beginning or end of the line, character 96
1862``\,`\,'' is used instead. The encoding table is shown in table
1863\ref{tab-uu}.
1864
1865\begin{table}
1866\centering
1867\begin{tabular}{|r||c|c|c|c|c|c|c|c|}\hline
1868Data Value &+0&+1&+2&+3&+4&+5&+6&+7 \\ \hline\hline
1869         0 & `& !& "&\#&\$&\%&\&& ' \\ \hline
1870         8 & (& )& *& +& ,& -& .& / \\ \hline
1871        16 & 0& 1& 2& 3& 4& 5& 6& 7 \\ \hline
1872        24 & 8& 9& :& ;&\texttt{\symbol{60}}&=&\texttt{\symbol{62}}&?\\ \hline
1873        32 & @& A& B& C& D& E& F& G \\ \hline
1874        40 & H& I& J& K& L& M& N& O \\ \hline
1875        48 & P& Q& R& S& T& U& V& W \\ \hline
1876        56 & X& Y& Z& [&\texttt{\symbol{92}}&]&\symbol{94}&\_ \\ \hline
1877\end{tabular}
1878\caption{Encoding Table for Uuencoding}
1879\label{tab-uu}
1880\end{table}
1881
1882Each line of uuencoded data is prefixed, in the first column, with the
1883encoded number of encoded octets on this line. The most common prefix
1884that you'll see is `M'. By looking up `M' in table \ref{tab-uu}, we
1885see that it represents the number 45. Therefore, this prefix means
1886that the line contains 45 octets (which are encoded into 60 $(45/3*4)$
1887plain-text characters).
1888
1889In uuencoding, each line has the same length, normally, the length
1890(excluding the end of line character) is 61. Only the last line of
1891encoded data may be shorter.
1892
1893If the input data is not a multiple of three octets long, the last
1894triple is filled up with (one or two) nulls. The decoder can determine
1895the number of octets that are to go into the output file from the
1896prefix.
1897
1898\subsubsection{The Empty Line}
1899
1900After the last line of data, there must be an \emph{empty} line, which
1901must be a valid encoded line containing no encoded data. This is
1902achieved by having a line with the single character ``\,`\,'' on it
1903(which is the prefix that encodes the value of 0 octets).
1904
1905\subsubsection{The \texttt{end} Line}
1906
1907The encoded file is then ended with a line consisting of the word
1908\texttt{end}.
1909
1910\subsubsection{Splitting Files}
1911
1912Uuencoding does not describe a mechanism for splitting a file into two
1913or more messages for separate mailing or posting. Usually, the encoded
1914file is simply split into parts of more or less equal line
1915count\footnote{Of course, encoded files must be split on line
1916boundaries instead of at a fixed byte count.}. Before the age of smart
1917decoders, the recipient had to manually concatenate the parts and
1918remove the headers in between, because the headers of mail messages
1919\emph{might} just be valid uuencoded data lines, thus potentially
1920corrupting the data.
1921
1922\subsubsection{Variants of Uuencoding}
1923
1924There are many variations of the above rules which must be
1925taken into account in a decoder program. Here are the most
1926frequent:
1927
1928\begin{itemize}
1929\item Many old encoders do not pay attention to the special rule of
1930encoding the 0 value, and encode it into a space character instead of
1931the ``\,`\,'' character. This is not an ``error,'' but rather a
1932potential problem when mailing or posting the file.
1933\item Some encoders add a 62nd character to each encoded line:
1934sometimes a character looping from ``a'' to ``z'' over and over
1935again. This technique could be used to detect missing lines, but
1936confuses some decoders.
1937\item If the length of the input file is not a multiple of three, some
1938encoders omit the ``unnecessary'' characters at the end of the last
1939data line.
1940\item Sometimes, the ``empty'' data line at the end is omitted, and at
1941other times, the line is just completely empty (without the
1942``\,`\,'').
1943\end{itemize}
1944
1945There is also some confusion how to properly terminate a line. Most
1946encoders simply use the convention of the local system (DOS encoders
1947using CRLF, Unix encoders using LF, Mac encoders using CR), but with
1948respect to the MIME standard, the encoding library uses CRLF on all
1949systems. This causes a slight problem with some Unix decoders, which
1950look for ``end'' followed directly by LF (as four characters in
1951total). Such programs report ``end not found'', but nevertheless
1952decode the file correctly.
1953
1954\subsubsection{Example}
1955
1956This is what our sample text looks like as uuencoded data:
1957
1958\begin{small}
1959\begin{verbatim}
1960begin 600 test.txt
1961M5&AI<R!I<R!A('1E<W0@9FEL92!F;W(@:6QL=7-T<F%T:6YG('1H92!V87)I
1962M;W5S"F5N8V]D:6YG(&UE=&AO9',N($QE="=S(&UA:V4@=&AI<R!T97AT(&QO
1963M;F=E<B!T:&%N"C4W(&)Y=&5S('1O('=R87`@;&EN97,@=VET:"!"87-E-C0@
1964E9&%T82P@=&]O+@I'<F5E=&EN9W,L($9R86YK(%!I;&AO9F5R"@``
1965`
1966end
1967\end{verbatim}
1968\end{small}
1969
1970% ''
1971\subsection{Xxencoding}
1972
1973The xxencoding method was conceived shortly after the initial use of
1974uuencoding. The first implementations of uuencoding did not realize
1975the potential problem of using the space character for encoding
1976data. Before this mistake was workarounded with the special case,
1977another author used a different charset for encoding, composed of
1978characters available on any system.
1979
1980\begin{table}
1981\centering
1982\begin{tabular}{|r||c|c|c|c|c|c|c|c|}\hline
1983Data Value &+0&+1&+2&+3&+4&+5&+6&+7 \\ \hline\hline
1984         0 & +& -& 0& 1& 2& 3& 4& 5 \\ \hline
1985         8 & 6& 7& 8& 9& A& B& C& D \\ \hline
1986        16 & E& F& G& H& I& J& K& L \\ \hline
1987        24 & M& N& O& P& Q& R& S& T \\ \hline
1988        32 & U& V& W& X& Y& Z& a& b \\ \hline
1989        40 & c& d& e& f& g& h& i& j \\ \hline
1990        48 & k& l& m& n& o& p& q& r \\ \hline
1991        56 & s& t& u& v& w& x& y& z \\ \hline
1992\end{tabular}
1993\caption{Encoding Table for Xxencoding}
1994\label{tab-xx}
1995\end{table}
1996
1997Xxencoding is absolutely identical to uuencoding with the difference
1998of using a different mapping of data values into printable characters
1999(table \ref{tab-xx}). Instead of `M', a normal-sized xxencoded line is
2000prefixed by `h' (note that `h' encodes 45, just as `M' in uuencoding).
2001The empty data line at the end consists of a single `+' character. Our
2002sample file looks like the following:
2003
2004\begin{small}
2005\begin{verbatim}
2006begin 600 test.txt
2007hJ4VdQm-dQm-V65FZQrEUNaZgNG-aPr6UOKlgRLBoQa3oOKtb65FcNG-qML7d
2008hPrJn0aJiMqxYOKtb64pZR4VjN5Ai62lZR0Rn64pVOqIUR4VdQm-oNLVo64lj
2009hPaRZQW-oO43i0XIr647tR4Jn65Fj65RmML+UP4ZiNLAURqZoO0-0MLBZBXEU
2010ZN43oMGkUR4xj9Ud5QaJZR4ZiNrAg62NmMKtf63-dP4VjNaJm0U++
2011+
2012end
2013\end{verbatim}
2014\end{small}
2015
2016\subsection{Base64 encoding}
2017
2018\emph{Base 64} is part of the \emph{MIME} (Multipurpose Internet Mail
2019Extensions) standard, described in \cite{rfc1521}, section 5.2. Sometimes,
2020it is incorrectly referred to as ``MIME encoding''; however, the MIME
2021documents specify much more than just how to encode binary data. It
2022defines a complete framework for attachments within E-Mails. Being
2023part of a widely accepted standard, \emph{Base64} has the advantage
2024of being the best-specified type of encoding.
2025
2026\begin{table}
2027\centering
2028\begin{tabular}{|r||c|c|c|c|c|c|c|c|}\hline
2029Data Value &+0&+1&+2&+3&+4&+5&+6&+7 \\ \hline\hline
2030         0 & A& B& C& D& E& F& G& H \\ \hline
2031         8 & I& J& K& L& M& N& O& P \\ \hline
2032        16 & Q& R& S& T& U& V& W& X \\ \hline
2033        24 & Y& Z& a& b& c& d& e& f \\ \hline
2034        32 & g& h& i& j& k& l& m& n \\ \hline
2035        40 & o& p& q& r& s& t& u& v \\ \hline
2036        48 & w& x& y& z& 0& 1& 2& 3 \\ \hline
2037        56 & 4& 5& 6& 7& 8& 9& +& / \\ \hline
2038\end{tabular}
2039\caption{Encoding Table for Base64 Encoding}
2040\label{tab-b64}
2041\end{table}
2042
2043The general concept of three-in-four encoding is the same as with the
2044previous two types, just another new character table to represent the
2045values needs to be introduced (table \ref{tab-b64}). Note that this
2046table differs from the \emph{xxencoding} table only in a single
2047character (`/' versus `-'). If a line of encoding does not feature
2048either character, it may be difficult to tell which encoding is used
2049on the line.
2050
2051The \emph{Base64} encoding does not have ``begin'' and ``end'' lines;
2052such a concept is not needed, because the framework of a \emph{MIME}
2053message defines the beginning and end of a part. The encoded data is
2054defined to be a ``stream'' of characters, and the decoder is supposed
2055to ignore any ``illegal'' characters in the stream (such as line
2056breaks or other whitespace). Each line must be shorter than 80
2057characters and terminated with a CRLF sequence. No particular line
2058length is enforced, but most implementations encode 57 octets into 76
2059encoded characters. Theoretically, a line might hold 79 characters,
2060although this would violate the rule of thumb that the line length is
2061a multiple of four (therefore encoding an integral number of
2062octets).\footnote{Yes, there \emph{are} files violating this
2063assumption.}
2064
2065The end-of-file handling if the input data has not a multiple of three
2066octets is slightly different in \emph{Base64} encoding than it is in
2067uuencoding. If one octet is left at the end of the input stream, the
2068data is padded with 4 zero bits (giving a total of 12 bits) and
2069encoded into two characters. After that, two equal signs `=' are
2070written to complete the four character sequence. If two octets are
2071left, the data is padded with 2 zero bits (giving a total of 18 bits),
2072and encoded into three characters, after which a single equal sign `='
2073is written.
2074
2075Here's our sample file in \emph{Base64}. Note that this text is
2076\emph{only} the encoded data. It is not a valid \emph{MIME}
2077message. Without the required framework, no proper \emph{MIME}
2078software will read it.
2079
2080\begin{small}
2081\begin{verbatim}
2082VGhpcyBpcyBhIHRlc3QgZmlsZSBmb3IgaWxsdXN0cmF0aW5nIHRoZSB2YXJpb3VzCmVuY29kaW5n
2083IG1ldGhvZHMuIExldCdzIG1ha2UgdGhpcyB0ZXh0IGxvbmdlciB0aGFuCjU3IGJ5dGVzIHRvIHdy
2084YXAgbGluZXMgd2l0aCBCYXNlNjQgZGF0YSwgdG9vLgpHcmVldGluZ3MsIEZyYW5rIFBpbGhvZmVy
2085Cg==
2086\end{verbatim}
2087\end{small}
2088
2089For a more elaborate documentation of \emph{Base64} encoding and
2090details of the \emph{MIME} framework, I suggest reading \cite{rfc1521}.
2091
2092The \emph{MIME} standard also defines a way to split a message into
2093multiple parts so that re-assembly of the parts on the remote end is
2094easily possible. For details, see section 7.3.2, ``The Message/Partial
2095subtype'' of the standard.
2096
2097\subsection{BinHex encoding}
2098
2099The \emph{BinHex} encoding originates from the Macintosh environment,
2100and it takes the special properties of a Macintosh file into
2101account. There, a file has two parts or ``forks'': the ``resource''
2102fork holds machine code, and the ``data'' fork holds arbitrary
2103data. For files from other systems, the data fork is usually empty.
2104
2105I have not found a ``definitive'' definition of the format. My
2106knowledge is based on two descriptions I found, one from Yves
2107Lempereur and another from Peter Lewis. A similar description can be
2108found in \cite{rfc1741}.
2109
2110\begin{table}
2111\centering
2112\begin{tabular}{|r||c|c|c|c|c|c|c|c|}\hline
2113Data Value &+0&+1&+2&+3&+4&+5&+6&+7 \\ \hline\hline
2114         0 & !& "&\#&\$&\%&\&& '& ( \\ \hline
2115         8 & )& *& +& ,& -& 0& 1& 2 \\ \hline
2116        16 & 3& 4& 5& 6& 8& 9& @& A \\ \hline
2117        24 & B& C& D& E& F& G& H& I \\ \hline
2118        32 & J& K& L& M& N& P& Q& R \\ \hline
2119        40 & S& T& U& V& X& Y& Z& [ \\ \hline
2120        48 & `& a& b& c& d& e& f& h \\ \hline
2121        56 & i& j& k& l& m& p& q& r \\ \hline
2122\end{tabular}
2123\caption{Encoding Table for BinHex Encoding}
2124\label{tab-bh}
2125\end{table}
2126
2127A \emph{BinHex} file is a stream of characters, beginning and ending
2128with a colon `:'; intermediate line breaks are to be ignored by the
2129decoder. Each line but the last should be exactly 64 characters in
2130length. The last line may be shorter, and in a special case can also
2131be 65 characters long. The trailing colon must not stand alone, so if
2132the input data ends on an output line boundary, the colon is appended
2133to this line as 65th character. Thus a \emph{BinHex} begins with a
2134colon in the first column and ends with a colon \emph{not} in the
2135first column.
2136
2137The line before the beginning of encoded data (before the initial
2138`:') should contain the following verbatim text:\footnote{In fact, this
2139text is \emph{required} by certain decoding software.}
2140\begin{quote}
2141\begin{verbatim}(This file must be converted with BinHex 4.0)\end{verbatim}
2142\end{quote}
2143BinHex is another three-in-four encoding, and not surprisingly,
2144another different character table is used (table \ref{tab-bh}).
2145The documentation does not explicitely mention what is supposed to
2146happen if the original input data does not have a multiple of three
2147octets. But from reading between the lines, it looks like
2148``unnecessary'' characters (those that would result in equal
2149signs in Base64 encoding) are not printed.
2150
2151\begin{table}
2152\centering
2153\begin{tabular}{|cccccc|c|cccccc|} \hline
2154\multicolumn{6}{|c|}{Compressed Data} &&
2155\multicolumn{6}{|c|}{Uncompressed Data} \\ \hline\hline
215600 & 11 & 22 & 33 & 44 & 55 &$\mapsto$& 00 & 11 & 22 & 33 & 44 & 55 \\ \hline
215711 & 22 & 90 & 04 & 33 &    &$\mapsto$& 11 & 22 & 22 & 22 & 22 & 33 \\ \hline
215811 & 22 & 90 & 00 & 33 & 44 &$\mapsto$& 11 & 22 & 90 & 33 & 44 &    \\ \hline
21592B & 90 & 00 & 90 & 04 & 55 &$\mapsto$& 2B & 90 & 90 & 90 & 90 & 55 \\ \hline
2160\end{tabular}
2161\caption{BinHex RLE decoding}
2162\label{bh-rle}
2163\end{table}
2164
2165The encoded characters decode into a RLE-compressed bytestream, which
2166must be handled in the next step (of course, decoding and
2167decompressing are usually handled at the same time). A Run Length
2168Encoding simply replaces multiple subsequent occurrences of one octet
2169are replaced by the character, a special marker, and the repetition
2170count. BinHex uses the marker \texttt{0x90} (octal \texttt{0220},
2171decimal \texttt{128}). The octet sequence \texttt{0xff} \texttt{0x90}
2172\texttt{0x04} would decompress into four times \texttt{0xff}. If the
2173marker itself occurs, it must be ``escaped'' by the special sequence
2174\texttt{0x90} \texttt{0x00} (the marker with a repetition count of
21750). Table \ref{bh-rle} shows four more examples. Note the last
2176example, where the marker itself is repeated.
2177
2178\begin{figure}
2179\centering
2180\makebox{\input{binhex.tex}}
2181\caption{BinHex file structure}
2182\label{bh-parts}
2183\end{figure}
2184
2185The decompression results in a data stream which consists of three
2186parts, the header section, the data fork and the resource fork. Figure
2187\ref{bh-parts} shows how the sections are composed. The numbers above
2188each item indicate its size in octets. The header has the following
2189items:
2190\begin{description}
2191\item[n] The length of the filename in octets. This is a single octet,
2192so the maximum length of a filename is 255.
2193\item[Name] The filename, \emph{n} octets in length. The length does
2194not include the final nullbyte (which is actually the next
2195item).\footnote{The Filename may contain certain characters that are
2196invalid on MS-DOS systems, like space characters}
2197\item[0] This single nullbyte terminates the previous filename.
2198\item[Type] The Macintosh file type.
2199\item[Auth] The Macintosh ``creator'', the program which wrote the
2200original file. This and the previous item are used to start the right
2201program to edit or display a file. I have no idea what common values
2202are.
2203\item[Flags] Macintosh file flags. No idea what they are.
2204\item[Dlen] The number of octets in the data fork.
2205\item[Rlen] The number of octets in the resource fork.
2206\item[HC] CRC checksum of the header data.
2207\end{description}
2208
2209After the header, at offset $n+22$, follow the \emph{Dlen} octets of
2210the data fork and a CRC checksum of the data fork (offset
2211$n+Dlen+22$), then \emph{Rlen} octets of the resource
2212fork (offset $n+Dlen+24$) and a CRC checksum of the resource fork
2213(offset $n+Dlen+Rlen+24$). Note that the CRCs are present even if
2214the forks are empty.
2215
2216The three CRC checksums are calculated as described in the following
2217text, taken from Peter Lewis' description:
2218\begin{quote}
2219BinHex 4.0 uses a 16-bit CRC with a 0x1021 seed.  The general algorithm is
2220to take data 1 bit at a time and process it through the following:
2221\begin{enumerate}
2222\item Take the old CRC (use 0x0000 if there is no previous CRC) and shift it
2223to the left by 1.
2224\item Put the new data bit in the least significant position (right bit).
2225\item If the bit shifted out in (1) was a 1 then xor the CRC with 0x1021.
2226\item Loop back to (1) until all the data has been processed.
2227\end{enumerate}
2228\end{quote}
2229
2230This is the sample file in \emph{BinHex}. However, the encoder I used
2231replaced the LF characters from the original file with CR
2232characters. It probably noticed that the input file was plain text and
2233reformatted it to Mac-style text, but I consider this a software
2234bug. The assigned filename is ``test.txt''.
2235
2236\begin{small}
2237\begin{verbatim}
2238(This file must be converted with BinHex 4.0)
2239:#&4&8e3Z9&K8!&4&@&4dG(Kd!!!!!!#X!!!!!+3j9'KTFb"TFb"K)(4PFh3JCQP
2240XC5"QEh)JD@aXGA0dFQ&dD@jR)(4SC5"fBA*TEh9c$@9ZBfpND@jR)'ePG'K[C(-
2241Z)%aPG#Gc)'eKDf8JG'KTFb"dCAKd)'a[EQGPFL"dD'&Z$68h)'*jG'9c)(4[)(G
2242bBA!JE'PZCA-JGfPdD#"#BA0P0M3JC'&dB5`JG'p[,Je(FQ9PG'PZCh-X)%CbB@j
2243V)&"TE'K[CQ9b$B0A!!!!:
2244\end{verbatim}
2245\end{small}
2246
2247
2248\subsection{Quoted-Printable}
2249
2250The \emph{Quoted-Printable} encoding is, like \emph{Base64}, part of the
2251\emph{MIME} standard, described in \cite{rfc1521}. It is not suitable
2252for encoding arbitrary binary data, but is intended for ``data that
2253largely consists of octets that correspond to printable characters''.
2254It is widely in use in countries with an extended character set, where
2255characters like the German umlauts `\"a' or `\ss' are represented by
2256non-ASCII characters with the highest bit set.
2257
2258The essence of the encoding is that arbitrary octets can be
2259represented by an equal sign `=' followed by two hexadecimal
2260digits. The equal sign itself, for example, is encoded as ``=3D''.
2261
2262Quoted-Printable enforces a maximum line length of 76
2263characters. Longer lines can be wrapped using soft line breaks. If the
2264last character of an encoded line is an equal sign, the following line
2265break is to be ignored.
2266
2267It would indeed be possible to transfer arbitrary binary data using
2268this encoding, but care must be taken with line breaks, which are
2269converted from native format on the sender's side and back into native
2270format on the recipient's side. However, the native representations
2271may differ. But this alternative is hardly worth considering, since
2272for arbitrary data, \emph{quoted-printable} is substantially less
2273effective than \emph{Base64}.
2274
2275Please refer to the original document, \cite{rfc1521}, for a complete
2276discussion of the encoding.
2277
2278Here is how the example file could look like in Quoted-Printable
2279encoding.
2280
2281\begin{small}
2282\begin{verbatim}
2283This is a test file for =
2284illustrating the various
2285encoding methods=2e=20=
2286Let=27s make this text=
2287 longer than
2288=357 bytes to wrap lines =
2289with Base64 data=2c too=2e
2290Greetings=2c Frank Pilhofer
2291\end{verbatim}
2292\end{small}
2293
2294
2295\end{appendix}
2296\end{document}
2297