1The easel (esl) module implements a small set of functionality shared
2by all the modules: notably, the error-handling system.
3
4\section{Error handling conventions}
5
6Easel might be used in applications ranging from small command line
7utilities to complex graphical user interfaces and parallel
8systems. Simple and complex applications have different needs for how
9errors should be handled by a library.
10
In a simple application, we don't want to write a lot of code to
check return codes for unexpected problems. We would prefer to have
13Easel crash out with an appropriate message to \ccode{stderr} -- after
14all, that's all a simple application would do anyway.
15
16On the other hand, there are certain problems that even the simplest
17command-line applications should handle gracefully. Errors involving
18user input (including typos in command line arguments, bad file
19formats, nonexistent files, or bad file permissions) are ``normal''
20and should be expected. Users will do anything.
21
22In a complex application, we may want to guarantee that execution
23never terminates within a library routine. In this case, library
24functions always need to return control to the application, even in
25the most unexpected circumstances, so the application can fail
26gracefully. A failure in an Easel routine should not suddenly crash a
27whole graphical user environment, for example. Additionally, because a
28complex application may not even be associated with a terminal, a
29library cannot count on printing error messages directly to
30\ccode{stderr}.
31
32These considerations motivate Easel's error handling conventions.
33Most Easel procedures return an integer status code. An \ccode{eslOK}
34code indicates that the procedure succeeded. A nonzero code indicates
35an error. Easel distinguishes two kinds of errors:
36
37\begin{itemize}
38\item \textbf{Failures} include normal ``errors'' (like a read failing
39  when the end of a file is reached), and errors that are the user's
  fault, such as bad input (which are also normal, because users will
  do anything). We say that failures are \textbf{returned} by Easel
42  functions. All applications should check the return status of any
43  Easel function that might return a failure code. Relatively few
44  Easel functions can return failure codes. The ones that do are
45  generally functions having to do with reading user input.
46
47\item \textbf{Exceptions} are errors that are the fault of Easel (bugs
48in my code) or your application (bugs in your code) or the system
49(resource allocation failures). We say that exceptions are
50\textbf{thrown} by Easel functions. By default, exceptions result in
51immediate termination of your program. Optionally, you may provide
52your own exception handler, in which case Easel functions may return
53nonzero exception codes (in addition to any nonzero failure codes).
54\end{itemize}
55
56The documentation for each Easel function lists what failure codes it
57may return, as well as what exception codes it may throw (if a
58nonfatal exception handler has been registered), in addition to the
59\ccode{eslOK} normal status code. The list of possible status codes is
60shown in Table~\ref{tbl:statuscodes}. There is no intrinsic
61distinction between failure codes and exception codes. Codes that
62indicate failures in one function may indicate exceptions in another
63function.
64
65\begin{table}
66\begin{center}
67\input{cexcerpts/statuscodes}
68\end{center}
69\caption{List of all status codes that might be returned by Easel functions.}
70\label{tbl:statuscodes}
71\end{table}
72
73Not all Easel functions return status codes. \ccode{*\_Create()}
74functions that allocate and create new objects usually follow a
75convention of returning a valid pointer on success, and \ccode{NULL}
76on failure; these are functions that only fail by memory allocation
77failure. Destructor functions (\ccode{*\_Destroy()}) always return
78\ccode{void}, and must have no points of failure of their own, because
79destructors can be called when we're already handling an
80exception. Functions with names containing \ccode{Is}, such as
81\ccode{esl\_abc\_XIsValid()}, are tests that return \ccode{TRUE} or
82\ccode{FALSE}. Finally, there are some ``true'' functions that simply
83return an answer, rather than a status code; these must be functions
84that have no points of failure.
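
As a minimal sketch of these conventions, consider a hypothetical
\ccode{DEMO\_STACK} object. None of the names below are part of Easel;
they only illustrate a constructor that returns \ccode{NULL} on
allocation failure, a destructor with no failure points, an
\ccode{Is} test, and a ``true'' function that just returns an answer.

\begin{cchunk}
#include <stdlib.h>

#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif

/* DEMO_STACK is a hypothetical object, not an Easel type. */
typedef struct {
  int *data;     /* storage for pushed values        */
  int  n;        /* number of values currently held  */
  int  nalloc;   /* number of values allocated for   */
} DEMO_STACK;

/* Destructor: returns void, has no failure points, and tolerates a NULL
 * or partially built object, so it is safe to call during error cleanup.
 */
void
demo_stack_Destroy(DEMO_STACK *s)
{
  if (s == NULL) return;
  free(s->data);                /* free(NULL) is a no-op */
  free(s);
}

/* Constructor: a valid pointer on success, NULL on allocation failure. */
DEMO_STACK *
demo_stack_Create(void)
{
  DEMO_STACK *s = NULL;

  if ((s = malloc(sizeof(DEMO_STACK))) == NULL) return NULL;
  s->data   = NULL;
  s->n      = 0;
  s->nalloc = 16;
  if ((s->data = malloc(sizeof(int) * s->nalloc)) == NULL) {
    demo_stack_Destroy(s);      /* clean up the partially built object */
    return NULL;
  }
  return s;
}

/* "Is" test: answers TRUE or FALSE, never a status code. */
int
demo_stack_IsEmpty(const DEMO_STACK *s)
{
  return (s->n == 0) ? TRUE : FALSE;
}

/* A "true" function: simply returns an answer. */
int
demo_stack_GetSize(const DEMO_STACK *s)
{
  return s->n;
}
\end{cchunk}

A caller tests the constructor's result against \ccode{NULL} rather
than against \ccode{eslOK}, and may call the destructor
unconditionally on its way out.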
85
86\subsection{Failure messages}
87
88When failures occur, often the failure status code is sufficient for
89your application to know what went wrong. For instance, \ccode{eslEOF}
90means end-of-file, so your application might report \ccode{"premature
91end of file"} if it receives such a status code unexpectedly. But for
92failures involving a file format syntax problem (for instance) a terse
93\ccode{eslESYNTAX} return code is not as useful as knowing
94\ccode{"Parse failed at line 42 of file foo.data, where I expected to
95see an integer, but I saw nothing"}. When your application might want
96more information to format an informative failure message for the
97user, the Easel API provides (somewhere) a message buffer called
98\ccode{errbuf[]}.
99
100In many cases, file parsers in Easel are encapsulated in objects. In
101these cases, the object itself allocates an \ccode{errbuf[]} message
102string. (For instance, see the \eslmod{sqio} module and its
103\ccode{ESL\_SQFILE} object for sequence file parsing.)  In a few
104cases, the \ccode{errbuf[]} is part of the procedure's call API, and
105space is provided by the caller. In such cases, the caller either
106passes \ccode{NULL} (no failure message is requested) or a pointer to
107allocated space for at least \ccode{eslERRBUFSIZE} chars. (For
108instance, see the \eslmod{tree} module and the
109\ccode{esl\_tree\_ReadNewick()} parser.)
110
Easel uses \ccode{sprintf()} to format the message in each
\ccode{errbuf[]}. Each individual call guarantees that its message
cannot overflow \ccode{eslERRBUFSIZE} chars, so none of these
\ccode{sprintf()} calls represents a possible security vulnerability
(a buffer overrun attack).
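
A minimal sketch of the caller-provided \ccode{errbuf[]} convention
follows. The function \ccode{demo\_parse\_int()} and the
\ccode{DEMO\_*} constants are hypothetical stand-ins, not Easel code,
and their values are placeholders. Unlike the \ccode{sprintf()} calls
described above, the sketch uses \ccode{snprintf()}, so the bound is
enforced by the call itself.

\begin{cchunk}
#include <stdio.h>
#include <stdlib.h>

#define DEMO_OK          0     /* stand-in for eslOK          */
#define DEMO_ESYNTAX     1     /* stand-in for eslESYNTAX     */
#define DEMO_ERRBUFSIZE  128   /* stand-in for eslERRBUFSIZE  */

/* Parse one integer from <field>, the text found at line <linenum> of
 * <filename>. <errbuf> is either NULL (no message wanted) or
 * caller-provided space for at least DEMO_ERRBUFSIZE chars.
 */
int
demo_parse_int(const char *field, const char *filename, int linenum,
               int *ret_value, char *errbuf)
{
  char *end;
  long  v;

  if (errbuf != NULL) errbuf[0] = '\0';

  v = strtol(field, &end, 10);
  if (end == field || *end != '\0') {   /* nothing parsed, or trailing junk */
    if (errbuf != NULL)
      snprintf(errbuf, DEMO_ERRBUFSIZE,
               "Parse failed at line %d of file %s, where I expected "
               "to see an integer, but I saw %s",
               linenum, filename,
               (*field == '\0') ? "nothing" : "something else");
    *ret_value = 0;
    return DEMO_ESYNTAX;                /* a failure: returned, not thrown */
  }
  *ret_value = (int) v;
  return DEMO_OK;
}
\end{cchunk}

An application that only cares about the status code passes
\ccode{NULL}; one that wants to show the message to the user declares
\ccode{char errbuf[DEMO\_ERRBUFSIZE]} and prints it when the call
returns a failure code.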
116
117
118\subsection{Exception handling}
119
120Easel's default exception handler prints a message to \ccode{stderr}
121and aborts execution of your program, as in:
122
123\begin{cchunk}
124   Easel exception: Memory allocation failed.
125   Aborted at file sqio.c, line 42.
126\end{cchunk}
127
128Therefore, by default, Easel handles its own exceptions internally,
129and exception status codes are not returned to your
130application. Simple applications don't need to worry about checking
131for exceptions.
132
133If your application wants to handle exceptions itself -- for instance,
134if you want a guarantee that execution will never terminate from
135within Easel -- or even if you simply want to change the format of
136these messages, you can register a custom exception handler which will
137catch the information from Easel and react appropriately. If your
138exception handler prints a message and exits, Easel will still just
abort without returning exception codes. If your exception handler is
nonfatal (it returns to Easel instead of exiting), Easel procedures
then percolate the exception code up through the call stack until the
exception code is returned to your application.
143
144To provide your own exception handler, you define your exception
145handler with the following prototype:
146
147\begin{cchunk}
148extern void my_exception_handler(int code, char *file, int line, char *format, va_list arg);
149\end{cchunk}
150
151An example implementation of a nonfatal exception handler:
152
153\begin{cchunk}
#include <stdarg.h>
#include <stdio.h>

void
my_exception_handler(int code, char *file, int line, char *format, va_list arg)
{
  fprintf(stderr, "Easel threw an exception (code %d):\n", code);
  if (format != NULL) vfprintf(stderr, format, arg);
  fprintf(stderr, "at line %d, file %s\n", line, file);
  return;
}
164\end{cchunk}
165
166The \ccode{code}, \ccode{file}, and \ccode{line} are always
167present. The formatted message (the \ccode{format} and \ccode{va\_list
168arg}) is optional; the \ccode{format} might be
169\ccode{NULL}. (\ccode{NULL} messages are used when percolating
170exceptions up a stack trace, for example.)
171
Then, to register your exception handler, you call
\ccode{esl\_exception\_SetHandler(\&my\_exception\_handler)} in your
application. Normally you would do this before calling any other Easel
functions. However, in principle, you can change exception handlers at
any time. You can also restore the default handler at any time with
\ccode{esl\_exception\_RestoreDefaultHandler()}.
178
179The implementation of the exception handler relies on a static
180function pointer that is not threadsafe. If you are writing a threaded
181program, you need to make sure that multiple threads do not try to
182change the handler at the same time.
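
To make the thread safety caveat concrete, here is a sketch of how a
handler stored in a static function pointer might work. It is an
illustration under assumed names (\ccode{demo\_SetHandler()},
\ccode{demo\_exception()}, and so on), not Easel's actual
implementation.

\begin{cchunk}
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*demo_handler_f)(int code, char *file, int line,
                               char *format, va_list arg);

/* One handler for the whole process. Unsynchronized static state like
 * this is what makes changing the handler non-threadsafe.
 */
static demo_handler_f demo_handler = NULL;   /* NULL means "use the default" */

void demo_SetHandler(demo_handler_f handler) { demo_handler = handler; }
void demo_RestoreDefaultHandler(void)        { demo_handler = NULL;    }

/* Throw an exception: hand it to the registered handler if there is
 * one; otherwise print a message to stderr and abort.
 */
void
demo_exception(int code, char *file, int line, char *format, ...)
{
  va_list arg;

  va_start(arg, format);
  if (demo_handler != NULL) {
    (*demo_handler)(code, file, line, format, arg);
    va_end(arg);
    return;       /* nonfatal handler: the thrower now returns <code> */
  }
  fprintf(stderr, "Demo exception: ");
  if (format != NULL) vfprintf(stderr, format, arg);
  fprintf(stderr, "\nAborted at file %s, line %d.\n", file, line);
  va_end(arg);
  abort();
}

/* A hypothetical nonfatal handler that just remembers what it saw. */
int demo_last_code = 0;
int demo_last_line = 0;

void
demo_quiet_handler(int code, char *file, int line, char *format, va_list arg)
{
  (void) file; (void) format; (void) arg;    /* unused in this sketch */
  demo_last_code = code;
  demo_last_line = line;
}
\end{cchunk}

Because \ccode{demo\_handler} is read and written without any locking,
two threads calling \ccode{demo\_SetHandler()} concurrently would
race. Setting the handler once, before any threads are started, avoids
the problem.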
183
184Because Easel functions call other Easel functions, the function that
185first throws an exception may not be the function that your
186application called.  If you implement a nonfatal handler, an exception
187may result in a partial or complete stack trace of exceptions, as the
188original exception percolates back to your application. Your exception
189handler should be able to deal with a stack trace. The first exception
190code and message will be the most relevant. Subsequent codes and
191messages arise from that exception percolating upwards.
192
193For example, a sophisticated replacement exception handler might push
194each code/message pair into a FIFO queue. When your application
receives an exception code from an Easel call, your application can
then access this queue, and see where the exception occurred in
197Easel, and what messages Easel left for you. A less sophisticated
198replacement exception handler might just register the first
199code/message pair, and ignore the subsequent exceptions from
200percolating up the stack trace. Note the difference between the
201exception handler that you register with Easel (which operates inside
202Easel, and must obey Easel's conventions) and any error handling you
203do in your own application after Easel returns a nonzero status code
204to you (which is your own business).
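
The less sophisticated strategy might be sketched as follows. All
\ccode{demo\_*} names are hypothetical. The handler has the prototype
given earlier; \ccode{demo\_throw()} exists only to drive the handler
in this sketch, playing the role of library code that throws.

\begin{cchunk}
#include <stdarg.h>
#include <stdio.h>

#define DEMO_MSGSIZE 256

/* Where the handler leaves information for the application. */
int  demo_nthrown = 0;                  /* exceptions seen since last clear */
int  demo_first_code = 0;               /* code of the first exception      */
int  demo_first_line = 0;               /* source line of the first one     */
char demo_first_msg[DEMO_MSGSIZE] = ""; /* its message, possibly empty      */

/* The application calls this before an Easel call it wants to monitor. */
void
demo_ClearExceptions(void)
{
  demo_nthrown      = 0;
  demo_first_code   = 0;
  demo_first_line   = 0;
  demo_first_msg[0] = '\0';
}

/* Nonfatal handler: record the first code/message pair and ignore the
 * ones that follow as the exception percolates up.
 */
void
demo_first_only_handler(int code, char *file, int line,
                        char *format, va_list arg)
{
  (void) file;                          /* unused in this sketch */
  if (demo_nthrown++ > 0) return;       /* later calls are percolation */
  demo_first_code = code;
  demo_first_line = line;
  if (format != NULL) vsnprintf(demo_first_msg, DEMO_MSGSIZE, format, arg);
  else                demo_first_msg[0] = '\0';
}

/* Test driver standing in for a library function that throws. */
void
demo_throw(int code, char *file, int line, char *format, ...)
{
  va_list arg;

  va_start(arg, format);
  demo_first_only_handler(code, file, line, format, arg);
  va_end(arg);
}
\end{cchunk}

After an Easel call returns a nonzero status, the application would
inspect \ccode{demo\_first\_code} and \ccode{demo\_first\_msg} to report
where the problem originated. A FIFO queue version would append every
pair instead of keeping only the first.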
205
206Although each function's documentation \emph{in principle} lists all
207thrown exceptions, \emph{in practice}, you should not trust this
208list. Because of exceptions percolating up from other Easel calls, it
209is too easy to forget to document all possible exception
210codes.\footnote{Someday we should combine a static code analyzer with
211a script that understands Easel's exception conventions, and automate
212the enumeration of all possible codes.} If you are catching
213exceptions, you should program defensively here, and always have a
214failsafe catch for any nonzero return status. For example, a minimal
try/catch idiom for an application calling an Easel function is
216something like:
217
218\begin{cchunk}
219     int status;
220     if ((status = esl_foo_function()) != eslOK)  my_failure();
221\end{cchunk}
222
223Or, a little more complex one that catches some specific errors, but
224has a failsafe for everything else, is:
225
226\begin{cchunk}
227     int status;
228     status = esl_foo_function();
229     if      (status == eslEMEM) my_failure("Memory allocation failure");
     else if (status != eslOK)   my_failure("Unexpected exception %d\n", status);
231\end{cchunk}
232
233
234\subsection{Violations}
235
236Internally, Easel also distinguishes a third class of error, termed a
237\textbf{fatal violation}. Violations never arise in production code;
238they are used to catch bugs during development and testing. Violations
239always result in immediate program termination. They are generated by
240two mechanisms: from assertions that can be optionally enabled in
241development code, or from test harnesses that call the always-fatal
242\ccode{esl\_fatal()} function when they detect a problem they're
243testing for.
244
245
246\subsection{Internal API for error handling}
247
248You only need to understand this section if you want to understand
249Easel's source code (or other code that uses Easel conventions, like
250HMMER), or if you want to use Easel's error conventions in your own
251source code.
252
The potentially tricky design issue is the following. On the one
hand, you want to be able to return an error or throw an exception
255``quickly'' (in less than a line of code). On the other hand, it might
256require several lines of code to free any resources, set an
257appropriate return state, and set the appropriate nonzero status code
258before leaving the function.
259
260Easel uses the following error-handling macros:
261
262\begin{center}
263{\small
264\begin{tabular}{|ll|}\hline
265\ccode{ESL\_FAIL(code, errbuf, mesg, ...)}   & Format errbuf, return failure code. \\
266\ccode{ESL\_EXCEPTION(code, mesg, ...)}      & Throw an exception, return exception code. \\
267\ccode{ESL\_XFAIL(code, errbuf, mesg, ...)}  & A failure message, with cleanup convention.\\
268\ccode{ESL\_XEXCEPTION(code, mesg, ...)}     & An exception, with cleanup convention.\\
269\hline
270\end{tabular}
271}
272\end{center}
273
They are implemented in \ccode{easel.h} as:
275
276\input{cexcerpts/error_macros}
277
278The \ccode{ESL\_FAIL} and \ccode{ESL\_XFAIL} macros are only used when
279a failure message needs to be formatted. For the simpler case where we
280just return an error code, Easel simply uses \ccode{return code;} or
281\ccode{status = code; goto ERROR;}, respectively.
282
283The \ccode{X} versions, with the cleanup convention, are sure to
284offend some programmers' sensibilities. They require the function to
285provide an \ccode{int status} variable in scope, and they require an
286\ccode{ERROR:} target for a \ccode{goto}. But if you can stomach that,
287they provide for a fairly clean idiom for catching exceptions and
288cleaning up, and cleanly setting different return variable states on
289success versus failure, as illustrated by this pseudoexample:
290
291\begin{cchunk}
#include <stdio.h>
#include <stdlib.h>
#include "easel.h"

int
foo(char **ret_buf, FILE **ret_fp)
{
    int   status;
    char *buf = NULL;
    FILE *fp  = NULL;

    if ((buf = malloc(100))       == NULL) ESL_XEXCEPTION(eslEMEM,      "malloc failed");
    if ((fp  = fopen("foo", "r")) == NULL) ESL_XEXCEPTION(eslENOTFOUND, "file open failed");

    *ret_buf = buf;
    *ret_fp  = fp;
    return eslOK;

  ERROR:
    if (buf != NULL) free(buf);
    if (fp  != NULL) fclose(fp);
    *ret_buf = NULL;
    *ret_fp  = NULL;
    return status;
}
311\end{cchunk}
312
313Additionally, for memory allocation and reallocation, Easel implements
314two macros \ccode{ESL\_ALLOC()} and \ccode{ESL\_RALLOC()}, which
315encapsulate standard \ccode{malloc()} and \ccode{realloc()} calls
316inside Easel's exception-throwing convention.
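
The idea behind such macros can be sketched as follows.
\ccode{DEMO\_ALLOC()} and \ccode{DEMO\_RALLOC()} are hypothetical
analogues written for this illustration. Their argument lists are
assumptions, and to stay self-contained they only set \ccode{status}
and jump to \ccode{ERROR:}, without throwing an exception through a
handler.

\begin{cchunk}
#include <stdlib.h>

#define DEMO_OK    0   /* stand-in for eslOK   */
#define DEMO_EMEM  1   /* stand-in for eslEMEM */

/* Both macros need "int status" in scope and an ERROR: target.
 * DEMO_RALLOC also needs a scratch "void *tmp", so that the original
 * pointer is not lost if realloc() fails.
 */
#define DEMO_ALLOC(p, size) do {                       \
    if (((p) = malloc(size)) == NULL) {                \
      status = DEMO_EMEM;                              \
      goto ERROR;                                      \
    }                                                  \
  } while (0)

#define DEMO_RALLOC(p, tmp, newsize) do {              \
    if (((tmp) = realloc((p), (newsize))) == NULL) {   \
      status = DEMO_EMEM;                              \
      goto ERROR;                                      \
    }                                                  \
    (p) = (tmp);                                       \
  } while (0)

/* Build the array 0..n-1, then grow it to hold 0..2n-1. Assumes n > 0. */
int
demo_make_ramp(int n, int **ret_ramp)
{
  int  *ramp = NULL;
  void *tmp;
  int   i;
  int   status;

  DEMO_ALLOC(ramp, sizeof(int) * n);
  for (i = 0; i < n; i++) ramp[i] = i;

  DEMO_RALLOC(ramp, tmp, sizeof(int) * 2 * n);
  for (i = n; i < 2*n; i++) ramp[i] = i;

  *ret_ramp = ramp;
  return DEMO_OK;

 ERROR:
  free(ramp);        /* NULL if the first allocation failed; else still valid */
  *ret_ramp = NULL;
  return status;
}
\end{cchunk}

Each allocation takes a single line at the call site, while all the
cleanup lives in one place after the \ccode{ERROR:} label.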
317
318
319\vspace*{\fill}
320\begin{quote}
321\emph{Only a complete outsider could ask your question. Are there
322control authorities? There are nothing but control authorities. Of
323course, their purpose is not to uncover errors in the ordinary meaning
324of the word, since errors do not occur and even when an error does in
325fact occur, as in your case, who can say conclusively that it is an
326error?}\\ \hspace*{\fill} -- Franz Kafka, \emph{The Castle}
327\end{quote}
328
329
330\section{Memory management}
331
332
333\section{Replacements for C library functions}
334
335
336\section{Standard banner for Easel miniapplications}
337
338
339\section{File and path name manipulation}
340
341
342\subsection{Secure temporary files}
343
344A program may need to write and read temporary files.  Many of the
345methods for creating temporary files, even using standard library
346calls, are known to create exploitable security holes
347\citep{Wheeler03,ChenDeanWagner04}.
348
349Easel provides a secure and portable POSIX procedure for obtaining an
350open temporary file handle, \ccode{esl\_tmpfile()}. This replaces the
351ANSI C \ccode{tmpfile()} function, which is said to be insecurely
352implemented on some platforms.  Because closing and reopening a
353temporary file can create an exploitable race condition under certain
354circumstances, \ccode{esl\_tmpfile()} does not return the name of the
355invisible file it creates, only an open \ccode{FILE *} handle to
356it. The tmpfile is not persistent, meaning that it automatically
357vanishes when the \ccode{FILE *} handle is closed. The tmpfile is
358created in the usual system world-writable temporary directory, as
359indicated by \ccode{TMPDIR} or \ccode{TMP} environment variables, or
360\ccode{/tmp} if neither environment variable is defined.
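
One common POSIX approach to an anonymous temporary file with these
properties is to create it with \ccode{mkstemp()} and immediately
\ccode{unlink()} it. The sketch below illustrates that approach under
the hypothetical name \ccode{demo\_tmpfile()}. It is not the
implementation of \ccode{esl\_tmpfile()}, whose interface and
additional checks may differ.

\begin{cchunk}
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() and fdopen() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Open an anonymous read/write temporary file. Returns an open stream,
 * or NULL on any failure. The file's name is never given to the caller.
 */
FILE *
demo_tmpfile(void)
{
  const char *dir;
  char        path[4096];
  int         len;
  int         fd;
  FILE       *fp;

  /* TMPDIR, then TMP, then /tmp, in the order described in the text. */
  if ((dir = getenv("TMPDIR")) == NULL &&
      (dir = getenv("TMP"))    == NULL) dir = "/tmp";

  len = snprintf(path, sizeof(path), "%s/demoXXXXXX", dir);
  if (len < 0 || len >= (int) sizeof(path)) return NULL;

  /* mkstemp() picks a unique name and creates and opens the file in
   * one step, so no one else can create that name in between.
   */
  if ((fd = mkstemp(path)) == -1) return NULL;

  /* Remove the name right away. The open descriptor keeps the data
   * alive until it is closed, and then the file vanishes.
   */
  unlink(path);

  if ((fp = fdopen(fd, "w+")) == NULL) { close(fd); return NULL; }
  return fp;
}
\end{cchunk}

Because the name is gone before the function returns, there is nothing
for the caller to close and reopen, which is the point of the design.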
361
362Still, it is sometimes useful, even necessary, to close and reopen a
363temporary file. For example, Easel's own test suites generate a
364variety of input files for testing input parsers.  Easel also provides
365the \ccode{esl\_tmpfile\_named()} procedure for creating a persistent
tmpfile, which returns both an open \ccode{FILE *} handle and the
367name of the file. Because the tmpfile name is known, the file may be
368closed and reopened.  \ccode{esl\_tmpfile\_named()} creates its files
369relative to the current working directory, not in \ccode{TMPDIR}, in
370order to reduce the chances of creating the file in a shared directory
where a race condition might be exploited. Nonetheless, secure use of
\ccode{esl\_tmpfile\_named()} requires that you reopen a tmpfile for
reading only, never for writing, and that you do not trust its
contents.  (It may be possible for an attacker to replace the tmpfile
with a symlink to another file.)
376
377An example that shows both tmpfile mechanisms:
378
379\input{cexcerpts/easel_example_tmpfiles}
380
381\section{Internals}
382
383\subsection{Input maps}
384
385An \esldef{input map} is for converting input ASCII symbols to
386internal encodings. It is a many-to-one mapping of the 128 7-bit ASCII
387symbol codes (0..127) onto new ASCII symbol codes. It is defined as
an \ccode{unsigned char inmap[128]} or an \ccode{unsigned char *}
389allocated for 128 entries.
390
391Input maps are used in two contexts: for filtering ASCII text input
392into internal text strings, and for converting ASCII input or internal
393ASCII strings into internal digitized sequences (an \eslmod{alphabet}
394object contains an input map that it uses for digitization).
395
396The rationale for input maps is the following. The ASCII strings that
397represent biosequence data require frequent massaging. An input file
398might have sequence data mixed up with numerical coordinates and
399punctuation for human readability. We might want to distinguish
400characters that represent residues (that should be input) from
401characters for coordinates and punctuation (that should be ignored)
402from characters that aren't supposed to be present at all (that should
403trigger an error or warning). Also, in representing a sequence string
404internally, we might want to map the symbols in an input string onto a
405smaller internal alphabet. For example, we might want to be
406case-insensitive (allow both T and t to represent thymine), or we
407might want to allow an input T to mean U in a program that deals with
408RNA sequence analysis, so that input files can either contain RNA or
409DNA sequence data.  Easel reuses the input map concept in routines
410involved in reading and representing input character sequences, for
411example in the \eslmod{alphabet}, \eslmod{sqio}, and \eslmod{msa}
412modules.
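
As an illustration, the sketch below builds an input map for filtering
text input into an RNA string. The function names and the two sentinel
values for ignored and illegal characters are assumptions made for
this example, not Easel's definitions.

\begin{cchunk}
#include <stddef.h>

/* Sentinels, chosen above the 0..127 ASCII range so they cannot
 * collide with a mapped symbol. Names and values are assumptions.
 */
#define DEMO_IGNORED  254
#define DEMO_ILLEGAL  255

/* Build an input map for RNA text: case-insensitive, T read as U,
 * digits and whitespace skipped, everything else illegal.
 */
void
demo_inmap_InitRNA(unsigned char inmap[128])
{
  const char *upper = "ACGUT";
  const char *lower = "acgut";
  const char *out   = "ACGUU";     /* many-to-one: T and U both give U */
  const char *skip  = "0123456789 \t\n\r";
  int i;

  for (i = 0; i < 128; i++)          inmap[i] = DEMO_ILLEGAL;
  for (i = 0; skip[i] != '\0'; i++)  inmap[(int) skip[i]] = DEMO_IGNORED;
  for (i = 0; upper[i] != '\0'; i++) {
    inmap[(int) upper[i]] = (unsigned char) out[i];
    inmap[(int) lower[i]] = (unsigned char) out[i];
  }
}

/* Filter <in> through <inmap> into <out>, which must have room for
 * strlen(in)+1 chars. Returns 0 on success, or -1 at the first illegal
 * or non-ASCII character. <out> is NUL-terminated either way.
 */
int
demo_inmap_Filter(const unsigned char inmap[128], const char *in, char *out)
{
  size_t n = 0;

  for (; *in != '\0'; in++) {
    unsigned char c = (unsigned char) *in;
    unsigned char x = (c < 128) ? inmap[c] : DEMO_ILLEGAL;

    if (x == DEMO_ILLEGAL) { out[n] = '\0'; return -1; }
    if (x == DEMO_IGNORED) continue;     /* coordinates, whitespace */
    out[n++] = (char) x;
  }
  out[n] = '\0';
  return 0;
}
\end{cchunk}

With this map, an input line such as \ccode{"  1 acgt TTga 8"} is
filtered to \ccode{"ACGUUUGA"}, while an input containing a character
like \ccode{-} is rejected.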
413
414