The easel (esl) module implements a small set of functionality shared
by all the modules: notably, the error-handling system.

\section{Error handling conventions}

Easel might be used in applications ranging from small command line
utilities to complex graphical user interfaces and parallel
systems. Simple and complex applications have different needs for how
errors should be handled by a library.

In a simple application, we don't want to write a lot of code to
check return codes for unexpected problems. We would prefer to have
Easel crash out with an appropriate message to \ccode{stderr} -- after
all, that's all a simple application would do anyway.

On the other hand, there are certain problems that even the simplest
command-line applications should handle gracefully. Errors involving
user input (including typos in command line arguments, bad file
formats, nonexistent files, or bad file permissions) are ``normal''
and should be expected. Users will do anything.

In a complex application, we may want to guarantee that execution
never terminates within a library routine. In this case, library
functions always need to return control to the application, even in
the most unexpected circumstances, so the application can fail
gracefully. A failure in an Easel routine should not suddenly crash a
whole graphical user environment, for example. Additionally, because a
complex application may not even be associated with a terminal, a
library cannot count on printing error messages directly to
\ccode{stderr}.

These considerations motivate Easel's error handling conventions.
Most Easel procedures return an integer status code. An \ccode{eslOK}
code indicates that the procedure succeeded. A nonzero code indicates
an error.
Easel distinguishes two kinds of errors:

\begin{itemize}
\item \textbf{Failures} include normal ``errors'' (like a read failing
  when the end of a file is reached), and errors that are the user's
  fault, such as bad input (which are also normal, because users will
  do anything). We say that failures are \textbf{returned} by Easel
  functions. All applications should check the return status of any
  Easel function that might return a failure code. Relatively few
  Easel functions can return failure codes. The ones that do are
  generally functions having to do with reading user input.

\item \textbf{Exceptions} are errors that are the fault of Easel (bugs
  in my code) or your application (bugs in your code) or the system
  (resource allocation failures). We say that exceptions are
  \textbf{thrown} by Easel functions. By default, exceptions result in
  immediate termination of your program. Optionally, you may provide
  your own exception handler, in which case Easel functions may return
  nonzero exception codes (in addition to any nonzero failure codes).
\end{itemize}

The documentation for each Easel function lists what failure codes it
may return, as well as what exception codes it may throw (if a
nonfatal exception handler has been registered), in addition to the
\ccode{eslOK} normal status code. The list of possible status codes is
shown in Table~\ref{tbl:statuscodes}. There is no intrinsic
distinction between failure codes and exception codes. Codes that
indicate failures in one function may indicate exceptions in another
function.

\begin{table}
\begin{center}
\input{cexcerpts/statuscodes}
\end{center}
\caption{List of all status codes that might be returned by Easel functions.}
\label{tbl:statuscodes}
\end{table}

Not all Easel functions return status codes.
\ccode{*\_Create()}
functions that allocate and create new objects usually follow a
convention of returning a valid pointer on success, and \ccode{NULL}
on failure; these are functions that only fail by memory allocation
failure. Destructor functions (\ccode{*\_Destroy()}) always return
\ccode{void}, and must have no points of failure of their own, because
destructors can be called when we're already handling an
exception. Functions with names containing \ccode{Is}, such as
\ccode{esl\_abc\_XIsValid()}, are tests that return \ccode{TRUE} or
\ccode{FALSE}. Finally, there are some ``true'' functions that simply
return an answer, rather than a status code; these must be functions
that have no points of failure.

\subsection{Failure messages}

When failures occur, often the failure status code is sufficient for
your application to know what went wrong. For instance, \ccode{eslEOF}
means end-of-file, so your application might report
\ccode{"premature end of file"} if it receives such a status code
unexpectedly. But for failures involving a file format syntax problem
(for instance) a terse \ccode{eslESYNTAX} return code is not as useful
as knowing \ccode{"Parse failed at line 42 of file foo.data, where I
expected to see an integer, but I saw nothing"}. When your application
might want more information to format an informative failure message
for the user, the Easel API provides (somewhere) a message buffer
called \ccode{errbuf[]}.

In many cases, file parsers in Easel are encapsulated in objects. In
these cases, the object itself allocates an \ccode{errbuf[]} message
string. (For instance, see the \eslmod{sqio} module and its
\ccode{ESL\_SQFILE} object for sequence file parsing.) In a few
cases, the \ccode{errbuf[]} is part of the procedure's call API, and
space is provided by the caller.
In such cases, the caller either
passes \ccode{NULL} (no failure message is requested) or a pointer to
allocated space for at least \ccode{eslERRBUFSIZE} chars. (For
instance, see the \eslmod{tree} module and the
\ccode{esl\_tree\_ReadNewick()} parser.)

Easel uses \ccode{sprintf()} to format the messages in each
\ccode{errbuf[]}. Each individual call guarantees that the size of
its message cannot overflow \ccode{eslERRBUFSIZE} chars, so none of
these \ccode{sprintf()} calls represent possible security
vulnerabilities (buffer overrun attacks).


\subsection{Exception handling}

Easel's default exception handler prints a message to \ccode{stderr}
and aborts execution of your program, as in:

\begin{cchunk}
  Easel exception: Memory allocation failed.
  Aborted at file sqio.c, line 42.
\end{cchunk}

Therefore, by default, Easel handles its own exceptions internally,
and exception status codes are not returned to your
application. Simple applications don't need to worry about checking
for exceptions.

If your application wants to handle exceptions itself -- for instance,
if you want a guarantee that execution will never terminate from
within Easel -- or even if you simply want to change the format of
these messages, you can register a custom exception handler which will
catch the information from Easel and react appropriately. If your
exception handler prints a message and exits, Easel will still just
abort without returning exception codes. If your exception handler is
nonfatal (that is, it returns rather than exiting), Easel procedures
then percolate the exception code up through the call stack until the
exception code is returned to your application.

To provide your own exception handler, you define your exception
handler with the following prototype:

\begin{cchunk}
extern void my_exception_handler(int code, char *file, int line, char *format, va_list arg);
\end{cchunk}

An example implementation of a nonfatal exception handler:

\begin{cchunk}
#include <stdarg.h>
#include <stdio.h>

void
my_exception_handler(int code, char *file, int line, char *format, va_list arg)
{
  fprintf(stderr, "Easel threw an exception (code %d):\n", code);
  if (format != NULL) vfprintf(stderr, format, arg);
  fprintf(stderr, "at line %d, file %s\n", line, file);
  return;
}
\end{cchunk}

The \ccode{code}, \ccode{file}, and \ccode{line} are always
present. The formatted message (the \ccode{format} and
\ccode{va\_list arg}) is optional; the \ccode{format} might be
\ccode{NULL}. (\ccode{NULL} messages are used when percolating
exceptions up a stack trace, for example.)

Then, to register your exception handler, you call
\ccode{esl\_exception\_SetHandler(\&my\_exception\_handler)} in your
application. Normally you would do this before calling any other Easel
functions. However, in principle, you can change exception handlers at
any time. You can also restore the default handler at any time with
\ccode{esl\_exception\_RestoreDefaultHandler()}.

The implementation of the exception handler relies on a static
function pointer that is not threadsafe. If you are writing a threaded
program, you need to make sure that multiple threads do not try to
change the handler at the same time.

Because Easel functions call other Easel functions, the function that
first throws an exception may not be the function that your
application called. If you implement a nonfatal handler, an exception
may result in a partial or complete stack trace of exceptions, as the
original exception percolates back to your application.
Your exception
handler should be able to deal with a stack trace. The first exception
code and message will be the most relevant. Subsequent codes and
messages arise from that exception percolating upwards.

For example, a sophisticated replacement exception handler might push
each code/message pair into a FIFO queue. When your application
receives an exception code from an Easel call, your application can
then access this queue, see where the exception occurred in Easel, and
read what messages Easel left for you. A less sophisticated
replacement exception handler might just register the first
code/message pair, and ignore the subsequent exceptions percolating
up the stack trace. Note the difference between the
exception handler that you register with Easel (which operates inside
Easel, and must obey Easel's conventions) and any error handling you
do in your own application after Easel returns a nonzero status code
to you (which is your own business).

Although each function's documentation \emph{in principle} lists all
thrown exceptions, \emph{in practice}, you should not trust this
list. Because of exceptions percolating up from other Easel calls, it
is too easy to forget to document all possible exception
codes.\footnote{Someday we should combine a static code analyzer with
a script that understands Easel's exception conventions, and automate
the enumeration of all possible codes.} If you are catching
exceptions, you should program defensively here, and always have a
failsafe catch for any nonzero return status.
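
The queue-based handler described above might be sketched as follows.
This is only an illustration under stated assumptions: the names
\ccode{exc\_record}, \ccode{exc\_queue}, and
\ccode{queue\_exception\_handler()} and the fixed sizes are
hypothetical, and \ccode{throw\_exception()} is a stand-in for the
library side, present only so that the sketch compiles without
Easel. Only the handler's prototype is taken from the text above.

\begin{cchunk}
#include <stdarg.h>
#include <stdio.h>

#define QUEUE_MAX 16     /* hypothetical queue capacity          */
#define MSG_MAX   128    /* hypothetical maximum message length  */

/* One recorded exception: status code, source location, message. */
struct exc_record {
  int   code;
  char *file;            /* not copied; assumed to outlive the queue */
  int   line;
  char  msg[MSG_MAX];
};

static struct exc_record exc_queue[QUEUE_MAX];
static int               exc_count = 0;

/* Nonfatal handler: append one code/message pair, then return. */
void
queue_exception_handler(int code, char *file, int line, char *format, va_list arg)
{
  struct exc_record *r;

  if (exc_count >= QUEUE_MAX) return;   /* queue full: drop later records */
  r = &exc_queue[exc_count++];
  r->code   = code;
  r->file   = file;
  r->line   = line;
  r->msg[0] = '\0';                     /* format may be NULL */
  if (format != NULL) vsnprintf(r->msg, MSG_MAX, format, arg);
}

/* Stand-in for the library: packages varargs and calls the handler. */
void
throw_exception(int code, char *file, int line, char *format, ...)
{
  va_list ap;

  va_start(ap, format);
  queue_exception_handler(code, file, line, format, ap);
  va_end(ap);
}
\end{cchunk}

In a real application the handler would be registered with
\ccode{esl\_exception\_SetHandler()}, and the application would
inspect and then reset \ccode{exc\_count} after an Easel call returns
a nonzero status. Record 0 is the original exception; later records
come from it percolating upwards. Whatever the handler records, the
application still has to test the return status of each call.
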
For example, a minimal
try/catch idiom for an application calling an Easel function is
something like:

\begin{cchunk}
  int status;
  if ((status = esl_foo_function()) != eslOK) my_failure();
\end{cchunk}

Or, a little more complex one that catches some specific errors, but
has a failsafe for everything else, is:

\begin{cchunk}
  int status;
  status = esl_foo_function();
  if      (status == eslEMEM) my_failure("Memory allocation failure");
  else if (status != eslOK)   my_failure("Unexpected exception %d\n", status);
\end{cchunk}


\subsection{Violations}

Internally, Easel also distinguishes a third class of error, termed a
\textbf{fatal violation}. Violations never arise in production code;
they are used to catch bugs during development and testing. Violations
always result in immediate program termination. They are generated by
two mechanisms: from assertions that can be optionally enabled in
development code, or from test harnesses that call the always-fatal
\ccode{esl\_fatal()} function when they detect a problem they're
testing for.


\subsection{Internal API for error handling}

You only need to understand this section if you want to understand
Easel's source code (or other code that uses Easel conventions, like
HMMER), or if you want to use Easel's error conventions in your own
source code.

The potentially tricky design issue is the following. On the one
hand, you want to be able to return an error or throw an exception
``quickly'' (in less than a line of code). On the other hand, it might
require several lines of code to free any resources, set an
appropriate return state, and set the appropriate nonzero status code
before leaving the function.

Easel uses the following error-handling macros:

\begin{center}
{\small
\begin{tabular}{|ll|}\hline
\ccode{ESL\_FAIL(code, errbuf, mesg, ...)}  & Format errbuf, return failure code. \\
\ccode{ESL\_EXCEPTION(code, mesg, ...)}     & Throw an exception, return exception code. \\
\ccode{ESL\_XFAIL(code, errbuf, mesg, ...)} & A failure message, with cleanup convention.\\
\ccode{ESL\_XEXCEPTION(code, mesg, ...)}    & An exception, with cleanup convention.\\
\hline
\end{tabular}
}
\end{center}

They are implemented in \ccode{easel.h} as:

\input{cexcerpts/error_macros}

The \ccode{ESL\_FAIL} and \ccode{ESL\_XFAIL} macros are only used when
a failure message needs to be formatted. For the simpler case where we
just return an error code, Easel simply uses \ccode{return code;} or
\ccode{status = code; goto ERROR;}, respectively.

The \ccode{X} versions, with the cleanup convention, are sure to
offend some programmers' sensibilities. They require the function to
provide an \ccode{int status} variable in scope, and they require an
\ccode{ERROR:} target for a \ccode{goto}.
But if you can stomach that,
they provide a fairly clean idiom for catching exceptions and
cleaning up, and cleanly setting different return variable states on
success versus failure, as illustrated by this pseudoexample:

\begin{cchunk}
int
foo(char **ret_buf, FILE **ret_fp)
{
  int   status;
  char *buf = NULL;
  FILE *fp  = NULL;

  if ((buf = malloc(100))      == NULL) ESL_XEXCEPTION(eslEMEM,      "malloc failed");
  if ((fp  = fopen("foo", "r")) == NULL) ESL_XEXCEPTION(eslENOTFOUND, "file open failed");

  *ret_buf = buf;
  *ret_fp  = fp;
  return eslOK;

 ERROR:
  if (buf != NULL) free(buf);
  if (fp  != NULL) fclose(fp);
  *ret_buf = NULL;
  *ret_fp  = NULL;
  return status;
}
\end{cchunk}

Additionally, for memory allocation and reallocation, Easel implements
two macros \ccode{ESL\_ALLOC()} and \ccode{ESL\_RALLOC()}, which
encapsulate standard \ccode{malloc()} and \ccode{realloc()} calls
inside Easel's exception-throwing convention.


\vspace*{\fill}
\begin{quote}
\emph{Only a complete outsider could ask your question. Are there
control authorities? There are nothing but control authorities. Of
course, their purpose is not to uncover errors in the ordinary meaning
of the word, since errors do not occur and even when an error does in
fact occur, as in your case, who can say conclusively that it is an
error?}\\ \hspace*{\fill} -- Franz Kafka, \emph{The Castle}
\end{quote}


\section{Memory management}


\section{Replacements for C library functions}


\section{Standard banner for Easel miniapplications}


\section{File and path name manipulation}


\subsection{Secure temporary files}

A program may need to write and read temporary files. Many of the
methods for creating temporary files, even using standard library
calls, are known to create exploitable security holes
\citep{Wheeler03,ChenDeanWagner04}.

Easel provides a secure and portable POSIX procedure for obtaining an
open temporary file handle, \ccode{esl\_tmpfile()}. This replaces the
ANSI C \ccode{tmpfile()} function, which is said to be insecurely
implemented on some platforms. Because closing and reopening a
temporary file can create an exploitable race condition under certain
circumstances, \ccode{esl\_tmpfile()} does not return the name of the
invisible file it creates, only an open \ccode{FILE *} handle to
it. The tmpfile is not persistent, meaning that it automatically
vanishes when the \ccode{FILE *} handle is closed. The tmpfile is
created in the usual system world-writable temporary directory, as
indicated by \ccode{TMPDIR} or \ccode{TMP} environment variables, or
\ccode{/tmp} if neither environment variable is defined.

Still, it is sometimes useful, even necessary, to close and reopen a
temporary file. For example, Easel's own test suites generate a
variety of input files for testing input parsers. Easel also provides
the \ccode{esl\_tmpfile\_named()} procedure for creating a persistent
tmpfile, which returns both an open \ccode{FILE *} handle and the
name of the file. Because the tmpfile name is known, the file may be
closed and reopened. \ccode{esl\_tmpfile\_named()} creates its files
relative to the current working directory, not in \ccode{TMPDIR}, in
order to reduce the chances of creating the file in a shared directory
where a race condition might be exploited. Nonetheless, secure use of
\ccode{esl\_tmpfile\_named()} requires that you reopen a tmpfile for
reading only, never for writing, and moreover, that you do not
trust the contents. (It may be possible for an attacker to replace
the tmpfile with a symlink to another file.)
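
To make the idea of an invisible, non-persistent tmpfile concrete,
here is a minimal sketch of one common POSIX technique: create the
file with \ccode{mkstemp()}, unlink its name immediately, and hand
back only the open handle. This is an assumption about the general
approach, not a description of how \ccode{esl\_tmpfile()} is actually
implemented; the function name \ccode{open\_anonymous\_tmpfile()} and
the \ccode{sketchXXXXXX} template are hypothetical.

\begin{cchunk}
#define _POSIX_C_SOURCE 200809L   /* expose mkstemp() and fdopen() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not part of Easel: open an unnamed tmpfile.
 * Returns an open FILE * on success, or NULL on failure.
 */
FILE *
open_anonymous_tmpfile(void)
{
  char  path[512];
  char *dir;
  FILE *fp;
  int   fd;
  int   n;

  /* Directory search order as in the text: TMPDIR, then TMP, then /tmp. */
  if ((dir = getenv("TMPDIR")) == NULL &&
      (dir = getenv("TMP"))    == NULL) dir = "/tmp";

  n = snprintf(path, sizeof(path), "%s/sketchXXXXXX", dir);
  if (n < 0 || n >= (int) sizeof(path)) return NULL;

  /* mkstemp() picks a unique name, then creates and opens the file atomically. */
  if ((fd = mkstemp(path)) == -1) return NULL;

  /* Removing the name now means nobody else can open the file by name;
   * its storage is released when the last open handle is closed.
   */
  if (unlink(path) == -1) { close(fd); return NULL; }

  /* Wrap the descriptor in a read/write stdio stream. */
  if ((fp = fdopen(fd, "w+b")) == NULL) { close(fd); return NULL; }
  return fp;
}
\end{cchunk}

The caller writes to the handle, calls \ccode{rewind()}, reads the
data back, and calls \ccode{fclose()} when done. Because the name is
never returned, there is nothing to reopen, which is the property the
text attributes to \ccode{esl\_tmpfile()}.
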

An example that shows both tmpfile mechanisms:

\input{cexcerpts/easel_example_tmpfiles}

\section{Internals}

\subsection{Input maps}

An \esldef{input map} is for converting input ASCII symbols to
internal encodings. It is a many-to-one mapping of the 128 7-bit ASCII
symbol codes (0..127) onto new ASCII symbol codes. It is defined as
an \ccode{unsigned char inmap[128]} or an \ccode{unsigned char *}
allocated for 128 entries.

Input maps are used in two contexts: for filtering ASCII text input
into internal text strings, and for converting ASCII input or internal
ASCII strings into internal digitized sequences (an \eslmod{alphabet}
object contains an input map that it uses for digitization).

The rationale for input maps is the following. The ASCII strings that
represent biosequence data require frequent massaging. An input file
might have sequence data mixed up with numerical coordinates and
punctuation for human readability. We might want to distinguish
characters that represent residues (that should be input) from
characters for coordinates and punctuation (that should be ignored)
from characters that aren't supposed to be present at all (that should
trigger an error or warning). Also, in representing a sequence string
internally, we might want to map the symbols in an input string onto a
smaller internal alphabet. For example, we might want to be
case-insensitive (allow both T and t to represent thymine), or we
might want to allow an input T to mean U in a program that deals with
RNA sequence analysis, so that input files can either contain RNA or
DNA sequence data. Easel reuses the input map concept in routines
involved in reading and representing input character sequences, for
example in the \eslmod{alphabet}, \eslmod{sqio}, and \eslmod{msa}
modules.
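
As an illustration of the rationale above, the following sketch
builds a text-mode input map for a case-insensitive RNA alphabet in
which an input T is read as U, digits and whitespace are ignored, and
everything else is rejected. It is a sketch under stated assumptions,
not Easel's own code: the function names \ccode{build\_rna\_inmap()}
and \ccode{apply\_inmap()} are hypothetical, and so are the two
sentinel values \ccode{INMAP\_IGNORE} and \ccode{INMAP\_ILLEGAL},
which are deliberately chosen outside the 7-bit ASCII range.

\begin{cchunk}
#include <string.h>

#define INMAP_IGNORE  254   /* hypothetical sentinel: skip this character     */
#define INMAP_ILLEGAL 255   /* hypothetical sentinel: character not permitted */

/* Fill a 128-entry map: ACGU in either case map to uppercase,
 * T/t map to U, digits and whitespace are ignored, the rest is illegal.
 */
void
build_rna_inmap(unsigned char inmap[128])
{
  const char *residues = "ACGU";
  const char *ignored  = " \t\n0123456789";
  int i;

  memset(inmap, INMAP_ILLEGAL, 128);
  for (i = 0; residues[i] != '\0'; i++) {
    inmap[(int) residues[i]]             = (unsigned char) residues[i];
    inmap[(int) residues[i] - 'A' + 'a'] = (unsigned char) residues[i];
  }
  inmap['T'] = inmap['t'] = 'U';
  for (i = 0; ignored[i] != '\0'; i++)
    inmap[(int) ignored[i]] = INMAP_IGNORE;
}

/* Filter src through the map into dst, which must have room for
 * strlen(src)+1 chars. Returns the number of residues stored,
 * or -1 if src contains an illegal or non-ASCII character.
 */
int
apply_inmap(const unsigned char inmap[128], const char *src, char *dst)
{
  int n = 0;

  for (; *src != '\0'; src++) {
    unsigned char c = (unsigned char) *src;
    unsigned char x;

    if (c > 127) return -1;             /* outside the 7-bit table */
    x = inmap[c];
    if (x == INMAP_ILLEGAL) return -1;
    if (x == INMAP_IGNORE)  continue;
    dst[n++] = (char) x;
  }
  dst[n] = '\0';
  return n;
}
\end{cchunk}

With this map, a line that mixes a coordinate and residues keeps only
the residues, normalized to the internal alphabet, while a line
containing a character such as \ccode{*} is rejected. A digitizing
map would follow the same pattern, storing small integer codes
instead of ASCII symbols.
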