README

1Summary
2
3    A powerful and robust templating system for Python.
4
5
6Overview
7
8    EmPy is a system for embedding Python expressions and statements
9    in template text; it takes an EmPy source file, processes it, and
10    produces output.  This is accomplished via expansions, which are
11    special signals to the EmPy system and are set off by a special
12    prefix (by default the at sign, '@').  EmPy can expand arbitrary
13    Python expressions and statements in this way, as well as a
14    variety of special forms.  Textual data not explicitly delimited
15    in this way is sent unaffected to the output, allowing Python to
16    be used in effect as a markup language.  Also supported are
17    callbacks via hooks, recording and playback via diversions, and
18    dynamic, chainable filters.  The system is highly configurable via
19    command line options and embedded commands.
20
21    Expressions are embedded in text with the '@(...)' notation;
22    variations include conditional expressions with '@(...?...!...)'
23    and the ability to handle thrown exceptions with '@(...$...)'.  As
24    a shortcut, simple variables and expressions can be abbreviated as
25    '@variable', '@object.attribute', '@function(arguments)',
26    '@sequence[index]', and combinations.  Full-fledged statements
27    are embedded with '@{...}'.  Control flow in terms of conditional
28    or repeated expansion is available with '@[...]'.  A '@' followed
29    by a whitespace character (including a newline) expands to
30    nothing, allowing string concatenations and line continuations.
31    Comments are indicated with '@#' and consume the rest of the line,
32    up to and including the trailing newline.  '@%' indicates
33    "significators," which are special forms of variable assignment
34    intended to specify per-file identification information in a
35    format which is easy to parse externally.  Context name and line
36    number changes can be done with '@?' and '@!' respectively.
37    '@<...>' markups are customizable by the user and can be used for
38    any desired purpose.  Escape sequences analogous to those in C can
39    be specified with '@\...', and finally a '@@' sequence expands to
40    a single literal at sign.
41
42
43Getting the software
44
45    The current version of EmPy is 3.3.3.
46
47    The latest version of the software is available in a tarball here:
48    "http://www.alcyone.com/software/empy/empy-latest.tar.gz",
49    http://www.alcyone.com/software/empy/empy-latest.tar.gz.
50
51    The official URL for this Web site is
52    "http://www.alcyone.com/software/empy/",
53    http://www.alcyone.com/software/empy/.
54
55
56Requirements
57
58    EmPy should work with any version of Python from 2.4 onward,
59    including 3.x.
60
61
62License
63
64    This code is released under the "LGPL",
65    http://www.gnu.org/copyleft/lesser.html.
66
67
68Mailing lists
69
70    There are two EmPy related mailing lists available.  The first is
71    a receive-only, very low volume list for important announcements
72    (including releases).  To subscribe, send an email to
73    "empy-announce-list-subscribe@alcyone.com",
74    mailto:empy-announce-list-subscribe@alcyone.com.
75
76    The second is a general discussion list for topics related to
77    EmPy, and is open for everyone to contribute; announcements
78    related to EmPy will also be made on this list.  The author of
79    EmPy (and any future developers) will also be on the list, so it
80    can be used not only to discuss EmPy features with other users,
81    but also to ask questions of the author(s).  To subscribe, send an
82    email to "empy-list-subscribe@alcyone.com",
83    mailto:empy-list-subscribe@alcyone.com.
84
85
86Basics
87
88    EmPy is intended for embedding Python code in otherwise
89    unprocessed text.  Source files are processed, and the results are
90    written to an output file.  Normal text is sent to the output
91    unchanged, but markups are processed, expanded to their results,
92    and then written to the output file as strings (that is, with the
93    'str' function, not 'repr').  The act of processing EmPy source
94    and handling markups is called "expansion."
95
96    Code that is processed is executed exactly as if it were entered
97    into the Python interpreter; that is, it is executed with the
98    equivalent of 'eval' (for expressions) and 'exec' (for
99    statements).  EmPy is intended to be a very thin (though powerful)
100    layer on top of a running Python system; Python and EmPy files can
101    be mixed together (via command line options) without
102    complications.
103
104    By default the embedding prefix is the at sign ('@'), which
105    appears neither in valid Python code nor commonly in arbitrary
106    texts; it can be overridden with the -p option (or with the
107    'empy.setPrefix' function).  The prefix indicates to the EmPy
108    interpreter that a special sequence follows and should be
109    processed rather than sent to the output untouched (to indicate a
110    literal at sign, it can be doubled as in '@@').
111
112    When the interpreter starts processing its target file, no modules
113    are imported by default, save the 'empy' pseudomodule (see below),
114    which is placed in the globals; the 'empy' pseudomodule is
115    associated with a particular interpreter -- in fact, they are the
116    same object -- and it is important that it not be removed from
117    that interpreter's globals, and that it not be shared with other
118    interpreters running concurrently (a name other than 'empy' can be
119    specified with the -m option).  The globals are not cleared or
120    reset in any way.  It is perfectly legal to set variables or
121    explicitly import modules and then use them in later markups,
122    *e.g.*, '@{import time} ... @time.time()'.  Scoping rules are as
123    in normal Python, although all defined variables and objects are
124    taken to be in the global namespace.
125
126    Indentation is significant in Python, and therefore is also
127    significant in EmPy.  EmPy statement markups ('@{...}'), when
128    spanning multiple lines, must be flush with the left margin.  This
129    is because (multiline) statement markups are not treated specially
130    in EmPy and are simply passed to the Python interpreter, where
131    indentation is significant.
132
133    Activities you would like to be done before any processing of the
134    main EmPy file can be specified with the -I, -D, -E, -F, and -P
135    options.  -I imports modules, -D executes a Python variable
136    assignment, -E executes an arbitrary Python (not EmPy) statement,
137    -F executes a Python (not EmPy) file, and -P processes an EmPy
138    (not Python) file.  These operations are done in the order they
139    appear on the command line; any number of each (including, of
140    course, zero) can be used.
141
142
143Expansions
144
145    The following markups are supported.  For concreteness below, '@'
146    is taken for the sake of argument to be the prefix character,
147    although this can be changed.
148
149    **'@# COMMENT NEWLINE'** -- A comment.  Comments, including the
150      trailing newline, are stripped out completely.  Comments should
151      only be present outside of expansions.  The comment itself is
152      not processed in any way: It is completely discarded.  This
153      allows '@#' comments to be used to disable markups.  *Note:* As
154      special support for "bangpaths" in Unix-like operating systems,
155      if the first line of a file (or indeed any context) begins with
156      '#!', and the interpreter has a 'processBangpaths' option set to
157      true (default), it is treated as a '@#' comment.  A '#!'
158      sequence appearing anywhere else will be handled literally and
159      unaltered in the expansion.  Example::
160
161          @# This line is a comment.
162          @# This will NOT be expanded: @x.
163
164    **'@? NAME NEWLINE'** -- Set the name of the current context to be
165      the given string.  Variables are not allowed here; the name is
166      treated as a literal.  (If you wish to use arbitrary
167      expressions, use the 'empy.setContextName' function instead.)
168      Example::
169
170          @?NewName
171          The context name is now @empy.identify()[0] (NewName).
172
173    **'@! INTEGER NEWLINE'** -- Set the line number of the current
174      context to be the given integer value; this is similar to the
175      '#line' C preprocessor directive.  This is done in such a way
176      that the *next* line will have the specified numeric value, not
177      the current one.  Expressions are not allowed here; the number
178      must be a literal integer.  (If you wish to use arbitrary
179      expressions, use the 'empy.setContextLine' function instead.)
180      Example::
181
182          @!100
183          The context line is now @empy.identify()[1] (100).
184
185    **'@ WHITESPACE'** -- A '@' followed by one whitespace character
186      (a space, horizontal tab, vertical tab, carriage return, or
187      newline) is expanded to nothing; it serves as a way to
188      explicitly separate two elements which might otherwise be
189      interpreted as being the same symbol (such as '@name@ s' to mean
190      '@(name)s' -- see below).  Also, since a newline qualifies as
191      whitespace here, the lone '@' at the end of a line represents a
192      line continuation, similar to the backslash in other languages.
193      Coupled with statement expansion below, spurious newlines can be
194      eliminated in statement expansions by use of the '@{...}@'
195      construct.  Example::
196
197          This will appear as one word: salt@ water.
198          This is a line continuation; @
199          this text will appear on the same line.
200
201    **'@\ ESCAPE_CODE'** -- An escape code.  Escape codes in EmPy are
202      similar to C-style escape codes, although they all begin with
203      the prefix character.  Valid escape codes include:
204
205          '@\0' -- NUL, null
206
207          '@\a' -- BEL, bell
208
209          '@\b' -- BS, backspace
210
211          '@\dDDD' -- three-digit decimal code DDD
212
213          '@\e' -- ESC, escape
214
215          '@\f' -- FF, form feed
216
217          '@\h' -- DEL, delete
218
219          '@\n' -- LF, linefeed character, newline
220
221          '@\oOOO' -- three-digit octal code OOO
222
223          '@\qQQQQ' -- four-digit quaternary code QQQQ
224
225          '@\r' -- CR, carriage return
226
227          '@\s' -- SP, space
228
229          '@\t' -- HT, horizontal tab
230
231          '@\v' -- VT, vertical tab
232
233          '@\xHH' -- two-digit hexadecimal code HH
234
235          '@\z' -- EOT, end of transmission
236
237          '@\^X' -- the control character ^X
238
239      Unlike in C-style escape codes, escape codes taking some number
240      of digits afterward always take the same number to prevent
241      ambiguities.  Furthermore, unknown escape codes are treated as
242      parse errors to discourage potential subtle mistakes.  Note
243      that, while '@\0' represents the NUL character, to represent an
244      octal code, one must use '@\o...', in contrast to C.  Example::
245
246          This embeds a newline.@\nThis is on the following line.
247          This beeps!@\a
248          There is a tab here:@\tSee?
249          This is the character with octal code 141: @\o141.
250
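      The fixed-width rule for numeric escape codes (decimal, octal,
      quaternary, hexadecimal) can be illustrated with a short Python
      sketch.  The helper 'decode_numeric_escape' and its table are
      hypothetical names invented for this illustration; they are not
      part of EmPy and do not reflect how EmPy itself parses escapes:

```python
# Hypothetical helper, not part of EmPy: decode one fixed-width numeric
# escape.  Each escape letter maps to (number of digits, numeric base).
NUMERIC_ESCAPES = {
    "d": (3, 10),  # three decimal digits
    "o": (3, 8),   # three octal digits
    "q": (4, 4),   # four quaternary digits
    "x": (2, 16),  # two hexadecimal digits
}


def decode_numeric_escape(text, prefix="@"):
    """Decode the numeric escape at the start of text.

    Returns a tuple of (decoded character, source characters consumed).
    """
    if not text.startswith(prefix + "\\"):
        raise ValueError("not an escape sequence: %r" % text)
    start = len(prefix) + 1
    kind = text[start:start + 1]
    if kind not in NUMERIC_ESCAPES:
        raise ValueError("not a numeric escape code: %r" % kind)
    width, base = NUMERIC_ESCAPES[kind]
    digits = text[start + 1:start + 1 + width]
    if len(digits) != width:
        raise ValueError("expected %d digits after %r" % (width, kind))
    return chr(int(digits, base)), start + 1 + width


# Octal 141 is decimal 97, the letter 'a'; six source characters are used.
print(decode_numeric_escape("@\\o141."))
```

      Because every code takes a fixed number of digits, the helper
      never has to guess where the escape ends; the trailing '.' in
      the usage line is left for normal text.
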
251    **'@@'** -- A literal at sign ('@').  To embed two adjacent at
252      signs, use '@@@@', and so on.  Any literal at sign that you wish
253      to appear in your text must be written this way, so that it will
254      not be processed by the system.  *Note:* If a prefix other than
255      '@' has been chosen via the command line option, one expresses
256      that literal prefix by doubling it, not by appending a '@'.
257      Example::
258
259          The prefix character is @@.
260          To get the expansion of x you would write @@x.
261
262    **'@)', '@]', '@}'** -- These expand to literal close parentheses,
263      close brackets, and close braces, respectively; these are
264      included for completeness and explicitness only.  Example::
265
266          This is a close parenthesis: @).
267
268    **'@"..."', '@"""..."""', etc.** -- These string literals expand
269      to the literals themselves, so '@"test"' expands to 'test'.
270      Since they are inherently no-operations, the only reason for
271      their use is to override their behavior with hooks.
272
273    **'@( EXPRESSION )'** -- Evaluate an expression, and expand with
274      the string representation (via a call to 'str') of the result of
275      that expression.  Whitespace immediately inside the parentheses
276      is ignored; '@( expression )' is equivalent to '@(expression)'.
277      If the expression evaluates to 'None', nothing is expanded in
278      its place; this allows function calls that depend on side
279      effects (such as printing) to be called as expressions.  (If you
280      really *do* want a 'None' to appear in the output, then use the
281      Python string '"None"'.)  *Note:* If an expression prints
282      something to 'sys.stdout' as a side effect, then that printing
283      will be spooled to the output *before* the expression's return
284      value is.  Example::
285
286          2 + 2 is @(2 + 2).
287          4 squared is @(4**2).
288          The value of the variable x is @(x).
289          This will be blank: @(None).
290
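      The rule described above (evaluate, convert with 'str', and
      expand 'None' to nothing) can be sketched in plain Python.  The
      function name 'expand_expression' is an assumption made for this
      illustration and is not an EmPy API:

```python
def expand_expression(source, namespace):
    """Return the text that an '@(...)' markup would expand to.

    A simplified model only: the expression is evaluated in the given
    namespace, None becomes the empty string, and anything else goes
    through str().
    """
    value = eval(source.strip(), namespace)
    return "" if value is None else str(value)


namespace = {"x": "hello"}
print(expand_expression("2 + 2", namespace))      # the sum, as text
print(expand_expression(" 4**2 ", namespace))     # inner whitespace ignored
print(repr(expand_expression("None", namespace))) # an empty string
```
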
291    **'@( TEST ? THEN (! ELSE)_opt ($ EXCEPT)_opt )'** -- A special
292      form of expression evaluation representing conditional and
293      protected evaluation.  Evaluate the "test" expression; if it
294      evaluates to true (in the Pythonic sense), then evaluate the
295      "then" section as an expression and expand with the 'str' of
296      that result.  If false, then the "else" section is evaluated and
297      similarly expanded.  The "else" section is optional and, if
298      omitted, is equivalent to 'None' (that is, no expansion will
299      take place).  *Note*: For backward compatibility, the "else"
300      section delimiter, '!', may be expressed as a ':'.  This
301      behavior is supported but deprecated.
302
303      If the "except" section is present, then if any of the prior
304      expressions raises an exception when evaluated, the expansion
305      will be replaced with the evaluation of the except expression.
306      (If the "except" expression itself raises, then that exception
307      will be propagated normally.)  The except section is optional
308      and, if omitted, is equivalent to 'None' (that is, no expansion
309      will take place).  An exception (cough) to this is if one of
310      these first expressions raises a SyntaxError; in that case the
311      protected evaluation lets the error through without evaluating
312      the "except" expression.  The intent of this construct is to
313      except runtime errors, and if there is actually a syntax error
314      in the "try" code, that is a problem that should probably be
315      diagnosed rather than hidden.  Example::
316
317          What is x? x is @(x ? "true" ! "false").
318          Pluralization: How many words? @x word@(x != 1 ? 's').
319          The value of foo is @(foo $ "undefined").
320          Division by zero is @(x/0 $ "illegal").
321
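      The evaluation order of the conditional and protected forms can
      be modeled with the following Python sketch.  'protected_eval'
      is a hypothetical function, not an EmPy API.  It takes the
      already-separated sections as strings, and it assumes that when
      no "except" section is given at all the exception simply
      propagates, as it would for a plain '@(...)' markup:

```python
def protected_eval(namespace, test, then=None, otherwise=None, handler=None):
    """Model '@(TEST ? THEN ! ELSE $ EXCEPT)' on pre-split sections."""
    try:
        if then is None:
            # No '?' section, as in '@(foo $ "undefined")': the test
            # expression itself supplies the value.
            value = eval(test, namespace)
        elif eval(test, namespace):
            value = eval(then, namespace)
        elif otherwise is not None:
            value = eval(otherwise, namespace)
        else:
            value = None  # omitted "else" expands to nothing
    except SyntaxError:
        # Syntax errors are let through so they can be diagnosed.
        raise
    except Exception:
        if handler is None:
            raise  # assumption: no "except" section, nothing to catch with
        value = eval(handler, namespace)
    return "" if value is None else str(value)


namespace = {"x": 3}
print(protected_eval(namespace, "x", '"true"', '"false"'))
print(protected_eval(namespace, "foo", handler='"undefined"'))
print(protected_eval(namespace, "x/0", handler='"illegal"'))
```
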
322    **'@ SIMPLE_EXPRESSION'** -- As a shortcut for the '@(...)'
323      notation, the parentheses can be omitted if it is followed by a
324      "simple expression."  A simple expression consists of a name
325      followed by a series of function applications, array
326      subscriptions, or attribute resolutions, with no intervening
327      whitespace.  For example:
328
329          - a name, possibly with qualifying attributes (*e.g.*,
330            '@value', '@os.environ').
331
332          - a straightforward function call (*e.g.*, '@min(2, 3)',
333            '@time.ctime()'), with no space between the function name
334            and the open parenthesis.
335
336          - an array subscription (*e.g.*, '@array[index]',
337            '@os.environ[name]'), with no space between the name and
338            the open bracket.
339
340          - any combination of the above (*e.g.*,
341            '@function(args).attr[sub].other[i](foo)').
342
343      In essence, simple expressions are expressions that can be
344      written unambiguously from text, without intervening space.  Note
345      that trailing dots are not considered part of the expansion
346      (*e.g.*, '@x.' is equivalent to '@(x).', not '@(x.)', which
347      would be illegal anyway).  Also, whitespace is allowed within
348      parentheses or brackets since it is unambiguous, but not between
349      identifiers and parentheses, brackets, or dots.  Explicit
350      '@(...)' notation can be used instead of the abbreviation when
351      concatenation is what one really wants (*e.g.*, '@(word)s' for
352      simple pluralization of the contents of the variable 'word').
353      As above, if the expression evaluates to the 'None' object,
354      nothing is expanded.  Note that since a curly brace appearing where
355      EmPy would expect an open parenthesis or bracket is
356      meaningless in Python, it is treated as a parse error (*e.g.*,
357      '@x{1, 2}' results in an error).  Example::
358
359          The value of x is @x.
360          The ith value of a is @a[i].
361          The result of calling f with q is @f(q).
362          The attribute a of x is @x.a.
363          The current time is @time.ctime(time.time()).
364          The current year is @time.localtime(time.time())[0].
365          These are the same: @min(2,3) and @min(2, 3).
366          But these are not the same: @min(2, 3) vs. @min (2, 3).
367          The plural of @name is @(name)s, or @name@ s.
368
369    **'@` EXPRESSION `'** -- Evaluate an expression, and expand with
370      the 'repr' (instead of the 'str' which is the default) of the
371      evaluation of that expression.  This expansion is primarily
372      intended for debugging and is unlikely to be useful in actual
373      practice.  That is, a '@`...`' is identical to '@(repr(...))'.
374      Example::
375
376          The repr of the value of x is @`x`.
377          This prints the Python repr of a module: @`time`.
378          This actually does print None: @`None`.
379
380    **'@: EXPRESSION : DUMMY :'** -- Evaluate an expression and then
381      expand to a '@:', the original expression, a ':', the evaluation
382      of the expression, and then a ':'.  The current contents of the
383      dummy area are ignored in the new expansion.  In this sense it
384      is self-evaluating; the syntax is available for use in
385      situations where the same text will be sent through the EmPy
386      processor multiple times.  Example::
387
388          This construct allows self-evaluation:
389          @:2 + 2:this will get replaced with 4:
390
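      Why this markup can safely be processed more than once is
      easiest to see in a small model.  The sketch below is not EmPy's
      parser: 'expand_self_evaluating' is a hypothetical helper built
      on a naive regular expression that assumes neither the
      expression nor the dummy area contains a colon or a newline:

```python
import re

# Naive pattern: '@:' EXPRESSION ':' DUMMY ':' with no embedded colons.
SELF_EVALUATING = re.compile(r"@:([^:\n]*):([^:\n]*):")


def expand_self_evaluating(text, namespace):
    """Rewrite each '@:EXPR:DUMMY:' as '@:EXPR:RESULT:'."""
    def replace(match):
        expression = match.group(1)
        value = eval(expression, namespace)
        # The old dummy area (group 2) is discarded on purpose.
        return "@:%s:%s:" % (expression, value)
    return SELF_EVALUATING.sub(replace, text)


once = expand_self_evaluating("@:2 + 2:this will get replaced with 4:", {})
twice = expand_self_evaluating(once, {})
print(once)
print(once == twice)  # a second pass leaves the text unchanged
```
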
391    **'@{ STATEMENTS }'** -- Execute a (potentially compound)
392      statement; statements have no return value, so the expansion is
393      not replaced with anything.  Multiple statements can either be
394      separated on different lines, or with semicolons; indentation is
395      significant, just as in normal Python code.  Statements,
396      however, can have side effects, including printing; output to
397      'sys.stdout' (explicitly or via a 'print' statement) is
398      collected by the interpreter and sent to the output (unless this
399      behavior is suppressed with the -n option).  The usual Python
400      indentation rules must be followed, although if the markup
401      consists of only one statement, leading and trailing whitespace
402      is ignored (*e.g.*, '@{ print time.time() }' is equivalent to
403      '@{print time.time()}').  Example::
404
405          @{x = 123}
406          @{a = 1; b = 2}
407          @{print time.time()}
408          @# Note that extra newlines will appear above because of the
409          @# newlines trailing the close braces.  To suppress them
410          @# use a @ before the newline:
411          @{
412          for i in range(10):
413              print "i is %d" % i
414          }@
415          @{print "Welcome to EmPy."}@
416
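      The way printed output from a statement ends up in the expansion
      can be approximated in Python 3.4 or later with
      'contextlib.redirect_stdout'.  This is only a model of the
      behavior described above: 'expand_statements' is a hypothetical
      name, and EmPy's own mechanism for collecting 'sys.stdout' is
      not shown here.  The sketch uses the Python 3 'print' function
      rather than the Python 2 'print' statement of the examples:

```python
import contextlib
import io


def expand_statements(code, namespace):
    """Execute code and return whatever it printed.

    Statements have no value of their own, so the only text they
    contribute is what they write to standard output.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


namespace = {}
silent = expand_statements("x = 123", namespace)            # prints nothing
spoken = expand_statements("print('x is %d' % x)", namespace)
```

      The namespace is shared between calls, which mirrors the way a
      variable set in one markup remains available to later ones.
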
417    **'@% KEY (WHITESPACE VALUE)_opt NEWLINE'** -- Declare a
418      significator.  Significators consume the whole line (including
419      the trailing newline), and consist of a key string containing no
420      whitespace, and an optional value prefixed by whitespace.  The
421      key may not start with or contain internal whitespace, but the
422      value may; preceding or following whitespace in the value is
423      stripped.  Significators are totally optional, and are intended
424      to be used for easy external (that is, outside of EmPy)
425      identification when used in large scale environments with many
426      EmPy files to be processed.  The purpose of significators is to
427      provide identification information about each file in a special,
428      easy-to-parse form so that external programs can process the
429      significators and build databases, independently of EmPy.
430      Inside of EmPy, when a significator is encountered, its key,
431      value pair is translated into a simple assignment of the form
432      '__KEY__ = VALUE', where "__KEY__" is the key string with two
433      underscores on either side and "VALUE" is a Python expression.
434      Example::
435
436          @%title     "Gravitation"
437          @%author    "Misner", "Thorne", "Wheeler"
438          @%publisher "W.H. Freeman and Company"
439          @%pages     1279
440          @%keywords  'physics', 'gravity', 'Einstein', 'relativity'
441          @%copyright 1970, 1971
442
443    **'@< CONTENTS >'** -- Invoke a custom markup.  The custom markup
444      is a special markup reserved for use by the user; it has no
445      prescribed meaning on its own.  If 'contents' is a string
446      representing what appears in between the angle brackets, then
447      expanding this markup is equivalent to
448      'empy.invokeCallback(contents)'.  See the "Custom markup"
449      section for more information.
450
451
452Control
453
454    EmPy version 3 and above includes the ability to direct
455    conditional and repeated expansion of blocks of EmPy code with
456    control markups (the obsolescent "substitution" markups are
457    unavailable as of version 3.0).  Control markups have analogs to
458    control flow structures in Python such as 'if/elif/else', 'for', and
459    'while'.  Control markups are set off with the '@[...]' notation.
460
461    Control markups are designed to be used in precisely the same way
462    that their internal Python analogues are used, except that the
463    control markups are intended to be used where there is much more
464    markup than control structure.
465
466    Some control markups are considered "primary," (*e.g.*, 'if',
467    'for', 'while') as they begin a control markup.  Others are
468    considered "secondary," since they can only appear inside control
469    flow markups delineated by primary markups (*e.g.*, 'elif',
470    'else', 'continue', 'break').
471
472    Since EmPy, unlike Python, cannot use indentation to determine
473    where control structures begin and end, all primary control
474    markups *must* be followed by a corresponding terminating control
475    markup::
476
477        @[PRIMARY ...]...@[end PRIMARY]
478
479    (where 'PRIMARY' represents one of the primary keywords).  The end
480    markup is mandatory, as is the space between the 'end' and the
481    starting keyword.  For instance::
482
483        @# If `person' is alive, show their age.
484        @person.name is @
485        @[if person.isAlive]@person.age@[else]dead@[end if].
486
487    All primary markups must be terminated in this way, and the
488    keyword appearing in the appropriate 'end' markup must match the
489    primary markup it corresponds to; if either of these conditions
490    is not satisfied, the result is a parse error.  Everything
491    between the starting control flow marker ('@[PRIMARY ...]') and
492    the ending marker ('@[end PRIMARY]') -- including other markups,
493    even control markups -- is considered part of the markup.  Control
494    markups can be nested::
495
496        @# Print all non-false elements on separate lines.
497        @[for elem in elements]@[if elem]@elem@\n@[end if]@[end for]
498
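    The matching rule for primary and 'end' markups amounts to a stack
    discipline, which the following Python sketch illustrates.  It is
    not EmPy's parser: 'check_control_nesting' is a hypothetical
    helper, and its regular expression assumes that no control markup
    contains a ']' inside its own text (so '@[if a[0]]' would defeat
    it):

```python
import re

PRIMARY_KEYWORDS = {"if", "for", "while", "try", "def"}
CONTROL_MARKUP = re.compile(r"@\[([^\]]*)\]")


def check_control_nesting(text):
    """Raise ValueError unless every primary markup is properly ended."""
    stack = []
    for match in CONTROL_MARKUP.finditer(text):
        words = match.group(1).split()
        if not words:
            continue
        if words[0] == "end":
            keyword = words[1] if len(words) > 1 else None
            # The keyword after 'end' must match the innermost open markup.
            if not stack or stack[-1] != keyword:
                raise ValueError("unmatched end markup: %r" % match.group(0))
            stack.pop()
        elif words[0] in PRIMARY_KEYWORDS:
            stack.append(words[0])
        # Secondary markups (elif, else, break, ...) do not affect nesting.
    if stack:
        raise ValueError("unterminated markup: %r" % stack[-1])


check_control_nesting(
    "@[for elem in elements]@[if elem]@elem@\\n@[end if]@[end for]")
```
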
499    Three major types of primary control markups are available:
500    conditional (*e.g.*, 'if', 'try'), looping (*e.g.*, 'for',
501    'while'), and definitional (*e.g.*, 'def', discussed below).
502    Conditional control markups conditionally expand their contents,
503    whereas looping control markups repeatedly expand their contents.
504    The third type, definitional markups, will define new objects in
505    the globals relating to their contents.  Conditional and looping
506    markups also differ in one substantial respect: Looping constructs
507    support '@[continue]' and '@[break]' markups which, like their
508    Python equivalents, continue with the next iteration or break out
509    of the innermost looping construct, respectively ('@[continue]'
510    and '@[break]' markups have no meaning inside conditional markups
511    and are an error).  Also like their Python equivalents,
512    '@[continue]' and '@[break]' may appear inside nested markups, so
513    long as they ultimately are contained by at least one looping
514    control markup::
515
516        @# Walk along a linked list, printing each element.
517        @[while 1]@
518        @node
519        @{node = node.next}@
520        @[if not node]@[break]@[end if]@
521        @[end while]
522
523    The provided markups are designed to mimic the internal Python
524    control structures as closely as possible.  The supported control
525    markups are (the phrases in all uppercase are intended to signify
526    user-selectable patterns)::
527
528        @[if CONDITION1]...@[elif CONDITION2]...@[else]...@[end if]
529        @[try]...@[except ...]...@[except ...]...@[end try]
530        @[try]...@[finally]...@[end try]
531        @[for VARIABLE in SEQUENCE]...@[else]...@[end for]
532        @[while CONDITION]...@[else]...@[end while]
533        @[def SIGNATURE]...@[end def]
534
535    All recognizable forms behave like their Python equivalents:
536    'if' can contain multiple 'elif' secondary markups within it;
537    the 'else' markups are optional (but must appear at the end);
538    a 'try' form can contain either one or more 'except' clauses,
539    which are handled in sequence, or a single 'finally' clause
540    (but not both); and the 'for' and 'while' structures can
541    contain 'continue' or 'break' markups internally (even if
542    those markups are themselves contained within other
543    markups).
544
545    The third type of primary control markup is "definitional," in
546    that they create objects in the globals for later use (*e.g.*,
547    'def').  This allows the definition of a callable object which,
548    when called, will expand the contained markup (which can in turn,
549    of course, contain further markups).  The argument to the markup
550    can be any legal Python function signature::
551
552        @[def f(x, y, z=2, *args, **keywords)]...@[end def]
553
554    would define a function in the globals named 'f' that takes the
555    given arguments.  A macro markup of the form '@[def
556    SIGNATURE]CODE@[end def]' is equivalent to the Python code::
557
558        def SIGNATURE:
559            r"""CODE""" # so it is a doc string
560            empy.expand(r"""CODE""", locals())
561
562    That is, it creates a Python function with the same name and
563    function arguments, whose docstring is the contents of the EmPy
564    markup that will be expanded when called.  And, when called, it
565    will expand those contents, with the locals passed in.
566
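    As a concrete instance of that equivalence, here is a runnable
    Python sketch.  Since it does not import EmPy, a tiny stand-in
    named 'expand' replaces 'empy.expand': it only substitutes
    '@name' with the string form of a variable and, unlike the code
    shown above, returns the text instead of writing it to the
    output.  Both 'expand' and 'greet' are assumptions made for this
    illustration:

```python
import re


def expand(template, variables):
    """Stand-in for 'empy.expand': handles bare '@name' markups only."""
    return re.sub(r"@([A-Za-z_]\w*)",
                  lambda match: str(variables[match.group(1)]),
                  template)


# Roughly what '@[def greet(name, punctuation="!")]Hello, @name@punctuation
# @[end def]' would correspond to under the equivalence given above.
def greet(name, punctuation="!"):
    r"""Hello, @name@punctuation"""  # the markup doubles as the docstring
    return expand(r"""Hello, @name@punctuation""", locals())


print(greet("world"))
print(greet("EmPy", punctuation="?"))
```
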
567
568Unicode support
569
570    EmPy version 3.1 and above includes intrinsic Unicode support.
571    EmPy's Unicode support defers to Python's internal Unicode
572    support, available in Python 2.0 and up, in order to allow
573    seamless and transparent translation of different encodings to the
574    native Python Unicode format.
575
576    Knowledge of Python's Unicode support is expected, although not
577    completely required, to gain full benefit of EmPy's Unicode
578    features.  To enable Unicode support, start EmPy with the
579    -u/--unicode option.  EmPy will then transparently decode from the
580    input stream, process markups internally with native Unicode, and
581    then encode transparently to the output stream.
582
583    By default, Python sets 'sys.stdin' and 'sys.stdout' with a
584    default encoding which is accessible via
585    'sys.getdefaultencoding()'; encodings are represented by string
586    names.  These streams have encodings set by the system and
587    *cannot* be changed.
588
589    However, encodings for newly created files (files to be read when
590    specified on the command line, and/or files to be written when
591    used with the -o and -a arguments) can be specified for EmPy via
592    command line options.  The --unicode-encoding option
593    simultaneously indicates the encoding to be used for both input
594    and output, whereas the --unicode-input-encoding and
595    --unicode-output-encoding options can each be used to specify
596    separate encodings for input and output.  (If an encoding is
597    not explicitly indicated, it resorts to the system default in
598    'sys.getdefaultencoding()', which is locale dependent.)
599
600    Python's Unicode implementation has the concept of error handlers,
601    registered with the 'codecs' module, which can be specified to
602    determine what action should take place if input cannot be decoded
603    into Unicode, or Unicode cannot be encoded into output.  EmPy uses
604    these same "errors," as they are called, which can be specified via
605    command line options.  The three most common error handlers are:
606    'ignore', where invalid sequences are simply ignored; 'replace',
607    where invalid sequences are replaced with an encoding-specific
608    indicator, usually a question mark; and 'strict', where invalid
609    sequences raise an error.  The --unicode-errors command line
610    option specifies the same error handler to be used for both input
611    and output, and the --unicode-input-errors and
612    --unicode-output-errors options can specify different error
613    handlers for input and output.  If an error handler is not
614    explicitly specified, the 'strict' handler (which will raise
615    errors) is used.
616
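    The three handlers named above are ordinary Python codec error
    handlers, so their effect can be seen without EmPy at all.  The
    following sketch uses only the built-in 'encode' and 'decode'
    methods; the sample text is an arbitrary choice for illustration:

```python
# 'cafe' with an acute accent, encoded as Latin-1, is not valid UTF-8,
# so decoding these bytes as UTF-8 exercises each error handler.
raw = u"caf\u00e9".encode("latin-1")

ignored = raw.decode("utf-8", "ignore")    # the invalid byte is dropped
replaced = raw.decode("utf-8", "replace")  # it becomes U+FFFD instead

try:
    raw.decode("utf-8", "strict")          # 'strict' raises an error
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# On output the replacement marker depends on the encoding; for ASCII
# it is a question mark.
encoded = u"caf\u00e9".encode("ascii", "replace")
```
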
617    Remember, to specify input encodings or errors that will take
618    effect, one cannot take input from 'sys.stdin' and must explicitly
619    specify an EmPy file to process on the command line.  Similarly,
620    for output encodings or errors, 'sys.stdout' cannot be used and an
621    explicit output file must be specified with the -o or -a options.
622    It is perfectly valid to enable the Unicode subsystem (-u option)
623    while using 'sys.stdin' and 'sys.stdout', but the encodings and
624    errors of these preexisting streams cannot be changed.
625
626    Combined with the --no-prefix option, which disables all markup
627    processing, EmPy can act merely as an encoding translator, relying
628    on Python's Unicode facilities::
629
630        em.py --no-prefix \
631            --unicode-input-encoding=utf-8 \
632            --unicode-output-encoding=latin-1 \
633            -o filename.Latin-1 filename.UTF-8
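
    What the command above does can be approximated in plain Python.
    This is a rough sketch of the same decode-then-encode round trip,
    not EmPy's own code; the 'transcode' helper is a hypothetical name
    introduced only for this example.

```python
def transcode(data, input_encoding, output_encoding, errors="strict"):
    """Decode bytes from one encoding and re-encode them in another."""
    text = data.decode(input_encoding, errors)
    return text.encode(output_encoding, errors)


# "naive" with a diaeresis takes two bytes in UTF-8, one in Latin-1.
utf8_bytes = u"na\u00efve".encode("utf-8")
latin1_bytes = transcode(utf8_bytes, "utf-8", "latin-1")
```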
634
635
636Significators
637
638    Significators, introduced in EmPy version 1.2, are intended to
639    represent special assignment in a form that is easy to externally
640    parse.  For instance, if one has a system that contains many EmPy
641    files, each of which has its own title, one could use a 'title'
642    significator in each file and use a simple regular expression to
643    find this significator in each file and organize a database of the
644    EmPy files to be built.  This is an easier proposition than, for
645    instance, attempting to grep for a normal Python assignment
646    (inside a '@{...}' expansion) of the desired variable.
647
648    Significators look like the following::
649
650        @%KEY VALUE
651
652    including the trailing newline, where "key" is a name and "value"
653    is a Python expression; the two are separated by any whitespace.  This
654    is equivalent to the following Python code::
655
656        __KEY__ = VALUE
657
658    That is to say, a significator key translates to a Python variable
659    consisting of that key surrounded by double underscores on either
660    side.  The value may contain spaces, but the key may not.  So::
661
662        @%title "All Roads Lead to Rome"
663
664    translates to the Python code::
665
666        __title__ = "All Roads Lead to Rome"
667
668    but obviously in a way that is easier to detect externally than if
669    this Python code were to appear somewhere in an expansion.  Since
670    significator keys are surrounded by double underscores,
671    significator keys can be any sequence of alphanumeric and
672    underscore characters; choosing '123' is perfectly valid for a
673    significator (although strange), since it maps to the name
674    '__123__' which is a legal Python identifier.
675
676    Note the value can be any Python expression.  The value can be
677    omitted; if missing, it is treated as 'None'.
678
679    Significators are completely optional; it is completely legal for
680    an EmPy file or files to be processed without containing any
681    significators.  Significators can appear anywhere within a file
682    outside of other markups, but typically they are placed near the
683    top of the file to make them easy to spot and edit by humans.
684
685    A regular expression string designed to match significators (with
686    the default prefix) is available as 'empy.SIGNIFICATOR_RE_STRING',
687    and also is a toplevel definition in the 'em' module itself.
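
    The kind of external scan described in this section might look
    like the sketch below.  The pattern is a simplified stand-in
    written for this example; it is *not* the value of
    'empy.SIGNIFICATOR_RE_STRING', and the 'find_significators' helper
    is likewise hypothetical.

```python
import re

# Simplified, assumed pattern: default prefix, '%', a key containing no
# whitespace, optional whitespace, then the rest of the line as the value.
SIGNIFICATOR = re.compile(r"^@%(\S+)[ \t]*(.*)$", re.MULTILINE)


def find_significators(source):
    """Return a dict mapping each significator key to its raw value text."""
    found = {}
    for key, value in SIGNIFICATOR.findall(source):
        # An omitted value is reported as None, mirroring the rule above.
        found[key] = value.strip() or None
    return found


sample = '@%title "All Roads Lead to Rome"\n@%draft\nBody with @(1 + 1).\n'
result = find_significators(sample)
```

    The values come back as unevaluated source text, which is usually
    all an indexing tool needs.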
688
689
690
691Diversions
692
693    EmPy supports an extended form of diversions, which are a
694    mechanism for deferring and recalling output on demand, similar to
695    the functionality included in m4.  Multiple "streams" of output
696    can be diverted (deferred) and undiverted (recalled) in this
697    manner.  A diversion is identified with a name, which is any
698    immutable object such as an integer or string.  When recalled,
699    diverted code is *not* resent through the EmPy interpreter
700    (although a filter could be set up to do this).
701
702    By default, no diversions take place.  When no diversion is in
703    effect, processing output goes directly to the specified output
704    file.  This state can be explicitly requested at any time by
705    calling the 'empy.stopDiverting' function.  It is always legal to
706    call this function.
707
708    When diverted, however, output goes to a deferred location which
709    can then be recalled later.  Output is diverted with the
710    'empy.startDiversion' function, which takes an argument that is
711    the name of the diversion.  If there is no diversion by that name,
712    a new diversion is created and output will be sent to that
713    diversion; if the diversion already exists, output will be
714    appended to that preexisting diversion.
715
716    Output sent to diversions can be recalled in two ways.  The first
717    is through the 'empy.playDiversion' function, which takes the
718    name of the diversion as an argument.  This recalls the named
719    diversion, sends it to the output, and then erases that
720    diversion.  A variant of this behavior is the
721    'empy.replayDiversion' function, which recalls the named diversion but
722    does not eliminate it afterwards; 'empy.replayDiversion' can be
723    repeatedly called with the same diversion name, and will replay
724    that diversion repeatedly.  'empy.createDiversion' creates a
725    diversion without actually diverting to it, for cases where you
726    want to make sure a diversion exists but do not yet want to send
727    anything to it.
728
729    The diversion object itself can be retrieved with
730    'empy.retrieveDiversion'.  Diversions act as writable
731    file-objects, supporting the usual 'write', 'writelines', 'flush',
732    and 'close' methods.  The data that has been diverted to them can
733    be retrieved in one of two ways: either through the 'asString'
734    method, which returns the entire contents of the diversion as a
735    single string, or through the 'asFile' method, which returns the
736    contents of the diversion as a readable (not writable) file-like
737    object.
738
739    Diversions can also be explicitly deleted without recalling them
740    with the 'empy.purgeDiversion' function, which takes the desired
741    diversion name as an argument.
742
743    Additionally there are three functions which will apply the above
744    operations to all existing diversions: 'empy.playAllDiversions',
745    'empy.replayAllDiversions', and 'empy.purgeAllDiversions'.  All
746    three will do the equivalent of an 'empy.stopDiverting' call before
747    they do their thing.
748
749    The name of the current diversion can be requested with the
750    'empy.getCurrentDiversion' function; also, the names of all
751    existing diversions (in sorted order) can be retrieved with
752    'empy.getAllDiversions'.
753
754    When all processing is finished, the equivalent of a call to
755    'empy.playAllDiversions' is done.
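
    The difference between playing and replaying a diversion can be
    seen in a toy model.  The class below is an illustration written
    for this section, not EmPy's implementation; its names
    ('DiversionModel', 'start', 'stop', 'play', 'replay') are
    hypothetical stand-ins for the 'empy' functions described above.

```python
class DiversionModel:
    """Toy model of diversion bookkeeping; not EmPy's implementation."""

    def __init__(self):
        self.output = []      # chunks that have reached the real output
        self.diversions = {}  # name -> list of diverted chunks
        self.current = None   # name of the active diversion, if any

    def write(self, text):
        if self.current is None:
            self.output.append(text)
        else:
            self.diversions[self.current].append(text)

    def start(self, name):
        # Create on first use; append to an existing diversion otherwise.
        self.diversions.setdefault(name, [])
        self.current = name

    def stop(self):
        self.current = None

    def replay(self, name):
        # Recall without erasing.
        self.output.extend(self.diversions[name])

    def play(self, name):
        # Recall, then erase.
        self.replay(name)
        del self.diversions[name]


model = DiversionModel()
model.write("header\n")
model.start("footnotes")
model.write("[1] a note\n")
model.stop()
model.write("body\n")
model.replay("footnotes")  # still available afterwards
model.play("footnotes")    # recalled again, then erased
```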
756
757
758Filters
759
760    EmPy also supports dynamic filters, introduced in version 1.3.
761    Filters are put in place right "before" the final output file, and
762    so are only invoked after all other processing has taken place
763    (including interpreting and diverting).  Filters take input, remap
764    it, and then send it to the output.
765
766    The current filter can be retrieved with the 'empy.getFilter'
767    function.  The filter can be cleared (reset to no filter) with
768    'empy.resetFilter' and a special "null filter" which does not send
769    any output at all can be installed with 'empy.nullFilter'.  A
770    custom filter can be set with the 'empy.setFilter' function; for
771    convenience, specialized shortcuts for filters preexist and can be
772    used in lieu of actual 'empy.Filter' instances for the
773    'empy.setFilter' or 'empy.attachFilter' argument:
774
775    - 'None' is a special filter meaning "no filter"; when installed,
776      no filtering whatsoever will take place.  'empy.setFilter(None)'
777      is equivalent to 'empy.resetFilter()'.
778
779    - '0' (or any other numeric constant equal to zero) is another
780      special filter that represents the null filter; when installed,
781      no output will ever be sent to the filter's sink.
782
783    - A filter specified as a function (or lambda) is expected to take
784      one string argument and return one string; this filter
785      will execute the function on any input and use the return value
786      as output.
787
788    - A filter that is a string is treated as a 256-character
789      translation table; output is substituted with the result of a
790      call to 'string.translate' using that table.
791
792    - A filter can be an instance of a subclass of 'empy.Filter'.
793      This is the most general form of filter.  (In actuality, it can
794      be any object that exhibits a 'Filter' interface, which would
795      include the normal file-like 'write', 'flush', and 'close'
796      methods, as well as 'next', 'attach', and 'detach' methods for
797      filter-specific behavior.)
798
799    - Finally, the argument to 'empy.setFilter' can be a Python list
800      consisting of one or more of the above objects.  In that case,
801      those filters are chained together in the order they appear in
802      the list.  An empty list is the equivalent of 'None'; all
803      filters will be uninstalled.
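
    To make the function and string shortcuts more concrete, here is
    what such objects look like on the Python side.  The example uses
    the Python 3 builtins 'bytes.maketrans' and 'bytes.translate'
    ('string.translate' is the Python 2 spelling); whether a given
    EmPy release accepts a table built this way is an assumption this
    sketch does not verify.  The 'shout' function is a made-up example.

```python
# A function shortcut: one string in, one string out.
def shout(text):
    return text.upper()


# A 256-entry translation table that replaces lowercase vowels with '*'.
table = bytes.maketrans(b"aeiou", b"*****")

masked = b"filters remap output".translate(table)
```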
804
805    Filters are, at their core, simply file-like objects (minimally
806    supporting 'write', 'flush', and 'close' methods that behave in
807    the usual way) which, after performing whatever processing they
808    need to do, send their work to the next file-like object or filter
809    in line, called that filter's "sink."  That is to say, filters can
810    be "chained" together; the action of each filter takes place in
811    sequence, with the output of one filter being the input of the
812    next.  Additionally, filters support a '_flush' method (note the
813    leading underscore) which will always flush the filter's
814    underlying sink; this method should not be overridden.
815
816    Filters also support three additional methods, not part of the
817    traditional file interface: 'attach', which takes as an argument a
818    file-like object (perhaps another filter) and sets that as the
819    filter's "sink" -- that is, the next filter/file-like object in
820    line.  'detach' (which takes no arguments) is another method which
821    flushes the filter and removes its sink, leaving it isolated.
822    Finally, 'next' is an accessor method which returns the filter's
823    sink -- or 'None', if the filter does not yet have a sink
824    attached.
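
    The interface described above is small enough to sketch in full.
    The classes below are workalikes written for illustration and do
    not derive from 'empy.Filter'; 'SketchFilter', 'UpperFilter', and
    'ExclaimFilter' are hypothetical names, and an 'io.StringIO'
    stands in for the final output file.

```python
import io


class SketchFilter:
    """Workalike sketch of the filter interface described above."""

    def __init__(self):
        self.sink = None

    def next(self):
        # Accessor: the attached sink, or None if there is none yet.
        return self.sink

    def attach(self, sink):
        self.sink = sink

    def detach(self):
        # Flush, then drop the sink, leaving the filter isolated.
        self.flush()
        self.sink = None

    def write(self, data):
        # Default behavior: pass data through unchanged.
        self.sink.write(data)

    def flush(self):
        if self.sink is not None:
            self.sink.flush()

    def close(self):
        self.flush()


class UpperFilter(SketchFilter):
    def write(self, data):
        self.sink.write(data.upper())


class ExclaimFilter(SketchFilter):
    def write(self, data):
        self.sink.write(data.replace(".", "!"))


out = io.StringIO()
first, second = UpperFilter(), ExclaimFilter()
first.attach(second)  # first's sink is the second filter
second.attach(out)    # second's sink is the final output
first.write("hello.")
first.flush()
```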
825
826    To create your own filter, you can create an object which supports
827    the above described interface, or simply derive from the
828    'empy.Filter' class and override its 'write' and possibly 'flush'
829    methods.  You can chain filters together by passing them as
830    elements in a list to the 'empy.setFilter' function, or you can
831    chain them together manually with the 'attach' method::
832
833        firstFilter.attach(secondFilter)
834        empy.setFilter(firstFilter)
835
836    or just let EmPy do the chaining for you::
837
838        empy.setFilter([firstFilter, secondFilter])
839
840    In either case, EmPy will walk the filter chain and find the end
841    and then hook that into the appropriate interpreter stream; you
842    need not do this manually.  The function 'empy.attachFilter' can
843    be used to attach a single filter (or shortcut, as above) to the
844    end of a currently existing chain.  Note that unlike its cousin
845    'empy.setFilter', one cannot pass a sequence of filters (or filter
846    shortcuts) to 'empy.attachFilter'.  (If there is no existing
847    filter chain installed, 'empy.attachFilter' will behave the same
848    as 'empy.setFilter'.)
849
850    Subclasses of 'empy.Filter' are already provided for the null,
851    function, and string functionality described above; they are
852    'NullFilter', 'FunctionFilter', and 'StringFilter', respectively.
853    In addition, a filter which supports buffering, 'BufferedFilter',
854    is provided.  Several variants are included: 'SizeBufferedFilter',
855    a filter which buffers into fixed-sized chunks,
856    'LineBufferedFilter', a filter which buffers by lines, and
857    'MaximallyBufferedFilter', a filter which completely buffers its
858    input.
859
860
861Hooks
862
863    The EmPy system allows for the registry of hooks with a running
864    EmPy interpreter.  Originally introduced in version 2.0 and much
865    improved in 3.2, hooks are objects, registered with an
866    interpreter, whose methods represent specific callbacks.  Any
867    number of hook objects can be registered with an interpreter, and
868    when a callback is invoked, the associated method on each one of
869    those hook objects will be called by the interpreter in sequence.
870
871    Hooks are simply instances, nominally derived from the 'empy.Hook'
872    class.  The 'empy.Hook' class itself defines a series of methods,
873    with the expected arguments, which would be called by a running
874    EmPy interpreter.  This scenario, much improved from the prior
875    implementation in 2.0, allows hooks to keep state and have more
876    direct access to the interpreter they are running in (the
877    'empy.Hook' instance contains an 'interpreter' attribute).
878
879    To use a hook, derive a class from 'empy.Hook' and override the
880    desired methods (with the same signatures as they appear in the
881    base class).  Create an instance of that subclass, and then
882    register it with a running interpreter with the 'empy.addHook'
883    function.  (This same hook instance can be removed with the
884    'empy.removeHook' function.)
885
886    More than one hook instance can be registered with an interpreter;
887    in such a case, the appropriate methods are invoked on each
888    instance in the order in which they were registered.  To adjust
889    this behavior, an optional 'prepend' argument to the
890    'empy.addHook' function can be used to dictate that the new hook
891    should be placed at the *beginning* of the sequence of hooks, rather
892    than at the end (which is the default).
893
894    All hooks can be enabled and disabled entirely for a given
895    interpreter; this is done with the 'empy.enableHooks' and
896    'empy.disableHooks' functions.  By default hooks are enabled, but
897    obviously if no hooks have been registered no hook callbacks will
898    be made.  Whether hooks are enabled or disabled can be determined
899    by calling 'empy.areHooksEnabled'.  To get a (copy of) the list of
900    registered hooks, call 'empy.getHooks'.  Finally, to invoke a hook
901    manually, use 'empy.invokeHook'.
902
903    For a list of supported hook callbacks, see the 'empy.Hook' class
904    definition.
905
906    As a practical example, this sample Python code would print a
907    pound sign followed by the name of every file that is included
908    with 'empy.include'::
909
910        class IncludeHook(empy.Hook):
911            def beforeInclude(self, name, file, locals):
912                print("# %s" % name)
913
914        empy.addHook(IncludeHook())
915
916
917Custom markup
918
919    Since version 3.2.1, the markup '@<...>' is reserved for
920    user-defined use.  Unlike the other markups, this markup has no
921    specified meaning on its own, and can be provided a meaning by the
922    user.  This meaning is provided with the use of a "custom
923    callback," or just "callback," which can be set, queried, or reset
924    using pseudomodule functions.
925
926    The custom callback is a callable object which, when invoked, is
927    passed a single argument: a string representing the contents of
928    what was found inside the custom markup '@<...>'.
929
930    To register a callback, call 'empy.registerCallback'.  To remove
931    one, call 'empy.deregisterCallback'.  To retrieve the callback (if
932    any) registered with the interpreter, use 'empy.getCallback'.
933    Finally, to invoke the callback just as if the custom markup were
934    encountered, call 'empy.invokeCallback'.  For instance, '@<This
935    text>' would be equivalent to the call '@empy.invokeCallback("This
936    text")'.
937
938    By default, to invoke a callback (either explicitly with
939    'empy.invokeCallback' or by processing a '@<...>' custom markup)
940    when no callback has been registered is an error.  This behavior
941    can be changed with the 'CALLBACK_OPT' option, or the
942    --no-callback-error command line option.
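
    A callback can be any callable that accepts the markup contents as
    a string; what it does with them is up to the user.  The sketch
    below simply records each string it is handed.  'TermCollector' is
    a hypothetical name, and the calls EmPy would make are simulated
    directly so the example runs without an interpreter.

```python
class TermCollector:
    """Callable that records the contents of each custom markup."""

    def __init__(self):
        self.terms = []

    def __call__(self, contents):
        self.terms.append(contents.strip())


collector = TermCollector()

# In a running EmPy system the callback would be registered with
#     empy.registerCallback(collector)
# and each '@<...>' markup would then hand its contents to it.
# Here those calls are made by hand.
collector("This text")
collector("  another term ")
```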
943
944
945Pseudomodule
946
947    The 'empy' pseudomodule is available only in an operating EmPy
948    system.  (The name of the module, by default 'empy', can be
949    changed with the -m option or the 'EMPY_PSEUDO' environment
950    variable).  It is called a pseudomodule because it is not actually
951    a module, but rather exports a module-like interface.  In fact,
952    the pseudomodule is actually the same internal object as the
953    interpreter itself.
954
955    The pseudomodule contains the following functions and objects (and
956    their signatures, with a suffixed 'opt' indicating an optional
957    argument):
958
959    First, basic identification:
960
961    **'VERSION'** -- A constant variable which contains a
962      string representation of the EmPy version.
963
964    **'SIGNIFICATOR_RE_STRING'** -- A constant variable representing a
965      regular expression string (using the default prefix) that can be
966      used to find significators in EmPy code.
967
968    **'SIGNIFICATOR_RE_SUFFIX'** -- The portion of the significator
969      regular expression string excluding the prefix, so that those
970      using non-standard prefix can build their own custom regular
971      expression string with 'myPrefix + empy.SIGNIFICATOR_RE_SUFFIX'.
972
973    **'interpreter'** -- The instance of the interpreter that is
974      currently being used to perform execution.  *Note:* This is now
975      obsolete; the pseudomodule is itself the interpreter.  Instead
976      of using 'empy.interpreter', simply use 'empy'.
977
978    **'argv'** -- A list consisting of the name of the primary EmPy
979      script and its command line arguments, in analogue to the
980      'sys.argv' list.
981
982    **'args'** -- A list of the command line arguments following the
983      primary EmPy script; this is equivalent to 'empy.argv[1:]'.
984
985    **'identify() -> string, integer'** -- Retrieve identification
986      information about the current parsing context.  Returns a
987      2-tuple consisting of a filename and a line number; if the file
988      is something other than from a physical file (*e.g.*, an
989      explicit expansion with 'empy.expand', a file-like object within
990      Python, or via the -E or -F command line options), a string
991      representation is presented surrounded by angle brackets.  Note
992      that the context only applies to the *EmPy* context, not the
993      Python context.
994
995    **'atExit(callable)'** -- Register a callable object (such as a
996      function) taking no arguments which will be called at the end of
997      a normal shutdown.  Callable objects registered in this way are
998      called in the reverse order in which they are added, so the
999      first callable registered with 'empy.atExit' is the last one to
1000      be called.  Note that although the functionality is related to
1001      hooks, 'empy.atExit' does not work via the hook mechanism, and
1002      you are guaranteed that the interpreter and stdout will be in a
1003      consistent state when the callable is invoked.
1004
1005    Context manipulation:
1006
1007    **'pushContext(name_opt, line_opt)'** -- Create a new context with
1008      the given name and line and push it on the stack.
1009
1010    **'popContext()'** -- Pop the top context and dispose of it.
1011
1012    **'setContextName(name)'** -- Manually set the name of the current
1013      context.
1014
1015    **'setContextLine(line)'** -- Manually set the line number of the
1016      current context; line must be a numeric value.  Note that
1017      afterward the line number will increment by one for each newline
1018      that is encountered, as before.
1019
1020    Globals manipulation:
1021
1022    **'getGlobals()'** -- Retrieve the globals dictionary for this
1023      interpreter.  Unlike when calling 'globals()' in Python, this
1024      dictionary *can* be manipulated and you *can* expect changes you
1025      make to it to be reflected in the interpreter that holds it.
1026
1027    **'setGlobals(globals)'** -- Reseat the globals dictionary
1028      associated with this interpreter to the provided mapping type.
1029
1030    **'updateGlobals(globals)'** -- Merge the given dictionary into
1031      this interpreter's globals.
1032
1033    **'clearGlobals(globals_opt)'** -- Clear out the globals
1034      (restoring, of course, the 'empy' pseudomodule).  Optionally,
1035      instead of starting with a fresh dictionary, use the
1036      dictionary provided.
1037
1038    **'saveGlobals(deep=True)'** -- Save a copy of the globals onto an
1039      internal history stack from which it can be restored later.  The
1040      optional 'deep' argument indicates whether or not the copying
1041      should be a deep copy (default) or a shallow one.  Copying is
1042      done with 'copy.deepcopy' or 'copy.copy', respectively.
1043
1044    **'restoreGlobals(destructive=True)'** -- Restore the most
1045      recently saved globals from the history stack as the current
1046      globals for this instance.  The optional 'destructive' argument
1047      indicates whether or not the restore should remove the restored
1048      globals from the history stack (default), or whether it should
1049      be left there for subsequent restores.
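
    The save/restore pair behaves like a stack of snapshots.  The
    model below is an illustrative sketch, not the interpreter's own
    code; 'GlobalsHistory' and its method names are hypothetical, and
    only the 'deep' and 'destructive' arguments are taken from the
    signatures above.

```python
import copy


class GlobalsHistory:
    """Toy model of the save/restore stack described above."""

    def __init__(self, globals_):
        self.globals = globals_
        self.history = []

    def save(self, deep=True):
        # Deep copies are independent; shallow copies share nested objects.
        copier = copy.deepcopy if deep else copy.copy
        self.history.append(copier(self.globals))

    def restore(self, destructive=True):
        if destructive:
            # Remove the snapshot from the stack while restoring it.
            self.globals = self.history.pop()
        else:
            # Leave the snapshot in place for subsequent restores.
            self.globals = self.history[-1]


state = GlobalsHistory({"items": [1, 2]})
state.save()                      # deep copy by default
state.globals["items"].append(3)  # does not touch the saved snapshot
state.restore()                   # destructive by default
```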
1050
1051    Types:
1052
1053    **'Interpreter'** -- The actual interpreter class.
1054
1055    The following functions allow direct execution; optional 'locals'
1056    arguments, if specified, are treated as the locals dictionary in
1057    evaluation and execution:
1058
1059    **'defined(name, locals_opt)'** -- Return true if the given name
1060      is defined either in the (optional) locals or the interpreter
1061      globals; return false otherwise.
1062
1063    **'evaluate(expression, locals_opt)'** -- Evaluate the given
1064      expression.
1065
1066    **'serialize(expression, locals_opt)'** -- Serialize the
1067      expression, just as the interpreter would:  If it is not None,
1068      convert it to a string with the 'str' builtin function, and then
1069      write out the result.  If it evaluates to None, do nothing.
1070
1071    **'execute(statements, locals_opt)'** -- Execute the given
1072      statement(s).
1073
1074    **'single(source, locals_opt)'** -- Interpret the "single" source
1075      code, just as the Python interactive interpreter would.
1076
1077    **'import_(name, locals_opt)'** -- Import a module.
1078
1079    **'atomic(name, value, locals_opt)'** -- Perform a single, atomic
1080      assignment.  In this case name is the string denoting the name
1081      of the (single) variable to be assigned to, and value is a
1082      Python object which the name is to be bound to.
1083
1084    **'assign(name, value, locals_opt)'** -- Perform general
1085      assignment.  This decays to atomic assignment (above) in the
1086      normal case, but supports "tuple unpacking" in the sense that if
1087      the name string contains commas, it is treated as a sequence of
1088      names, each of which is assigned the corresponding member of the
1089      value (still a Python object, but which must be a sequence).  This
1090      function will raise a 'TypeError' or 'ValueError' just like
1091      Python would if tuple unpacking is not possible (that is, if the
1092      value is not a sequence or is of an incompatible length,
1093      respectively).  This only supports the assignment of Python
1094      identifiers, not arbitrary Python lvalues.
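
    The unpacking rule for 'assign' can be sketched as follows.  This
    is a simplified stand-in operating on a plain dictionary, not the
    pseudomodule function itself; passing the namespace explicitly is
    an assumption made to keep the example self-contained.

```python
def assign(name, value, namespace):
    """Bind name(s) in namespace; a comma in name requests unpacking."""
    if "," not in name:
        # The normal case: a single, atomic assignment.
        namespace[name.strip()] = value
        return
    names = [part.strip() for part in name.split(",") if part.strip()]
    values = list(value)  # raises TypeError if value is not iterable
    if len(values) != len(names):
        raise ValueError(
            "expected %d values, got %d" % (len(names), len(values)))
    for key, item in zip(names, values):
        namespace[key] = item


scope = {}
assign("x", 1, scope)
assign("a, b", (2, 3), scope)
```

    As with ordinary Python unpacking, a non-sequence value raises
    'TypeError' and a length mismatch raises 'ValueError'.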
1095
1096    **'significate(key, value_opt, locals_opt)'** -- Do a manual
1097      signification.  If 'value' is not specified, it is treated as
1098      'None'.
1099
1100    The following functions relate to source manipulation:
1101
1102    **'include(file_or_filename, locals_opt)'** -- Include another
1103      EmPy file, by processing it in place.  The argument can either
1104      be a filename (which is then opened with 'open' in text mode) or
1105      a file object, which is used as is.  Once the included file is
1106      processed, processing of the current file continues.  Includes
1107      can be nested.  The call also takes an optional locals
1108      dictionary which will be passed into the evaluation function.
1109
1110    **'expand(string, locals_opt)' -> string** -- Explicitly invoke
1111      the EmPy parsing system to process the given string and return
1112      its expansion.  This allows multiple levels of expansion,
1113      *e.g.*, '@(empy.expand("@(2 + 2)"))'.  The call also takes an
1114      optional locals dictionary which will be passed into the
1115      evaluation function.  This is necessary when text is being
1116      expanded inside a function definition and it is desired that the
1117      function arguments (or just plain local variables) are available
1118      to be referenced within the expansion.
1119
1120    **'quote(string) -> string'** -- The inverse process of
1121      'empy.expand', this will take a string and return a new string
1122      that, when expanded, would expand to the original string.  In
1123      practice, this means that appearances of the prefix character
1124      are doubled, except when they appear inside a string literal.
1125
1126    **'escape(string, more_opt) -> string'** -- Given a string, quote
1127      the nonprintable characters contained within it with EmPy
1128      escapes.  The optional 'more' argument specifies additional
1129      characters that should be escaped.
1130
1131    **'flush()'** -- Do an explicit flush on the underlying stream.
1132
1133    **'string(string, name_opt, locals_opt)'** -- Explicitly process a
1134      string-like object.  This differs from 'empy.expand' in that the
1135      string is directly processed into the EmPy system, rather than
1136      being evaluated in an isolated context and then returned as a
1137      string.
1138
1139    Changing the behavior of the pseudomodule itself:
1140
1141    **'flatten(keys_opt)'** -- Perform the equivalent of 'from empy
1142      import ...' in code (which is not directly possible because
1143      'empy' is a pseudomodule).  If keys is omitted, it is taken as
1144      being everything in the 'empy' pseudomodule.  Each of the
1145      elements of this pseudomodule is flattened into the globals
1146      namespace; after a call to 'empy.flatten', they can be referred
1147      to simply as globals, *e.g.*, '@divert(3)' instead of
1148      '@empy.divert(3)'.  If any preexisting variables are bound to
1149      these names, they are silently overridden.  Doing this is
1150      tantamount to declaring a 'from ... import ...' which is often
1151      considered bad form in Python.
1152
1153    Prefix-related functions:
1154
1155    **'getPrefix() -> char'** -- Return the current prefix.
1156
1157    **'setPrefix(char)'** -- Set a new prefix.  Immediately after this
1158      call finishes, the prefix will be changed.  Changing the prefix
1159      affects only the current interpreter; any other created
1160      interpreters are unaffected.  Setting the prefix to None or the
1161      null string means that no further markups will be processed,
1162      equivalent to specifying the --no-prefix command line argument.
1163
1164    Diversions:
1165
1166    **'stopDiverting()'** -- Any diversions that are currently taking
1167      place are stopped; thereafter, output will go directly to the
1168      output file as normal.  It is never illegal to call this
1169      function.
1170
1171    **'createDiversion(name)'** -- Create a diversion, but do not
1172      begin diverting to it.  This is the equivalent of starting a
1173      diversion and then immediately stopping diversion; it is used in
1174      cases where you want to make sure that a diversion will exist
1175      for future replaying but may be empty.
1176
1177    **'startDiversion(name)'** -- Start diverting to the specified
1178      diversion name.  If such a diversion does not already exist, it
1179      is created; if it does, then additional material will be
1180      appended to the preexisting diversion.
1181
1182    **'playDiversion(name)'** -- Recall the specified diversion and
1183      then purge it.  The provided diversion name must exist.
1184
1185    **'replayDiversion(name)'** -- Recall the specified diversion
1186      without purging it.  The provided diversion name must exist.
1187
1188    **'purgeDiversion(name)'** -- Purge the specified diversion
1189      without recalling it.  The provided diversion name must exist.
1190
1191    **'playAllDiversions()'** -- Play (and purge) all existing
1192      diversions in the sorted order of their names.  This call does
1193      an implicit 'empy.stopDiverting' before executing.
1194
1195    **'replayAllDiversions()'** -- Replay (without purging) all
1196      existing diversions in the sorted order of their names.  This
1197      call does an implicit 'empy.stopDiverting' before executing.
1198
1199    **'purgeAllDiversions()'** -- Purge all existing diversions
1200      without recalling them.  This call does an implicit
1201      'empy.stopDiverting' before executing.
1202
1203    **'getCurrentDiversion() -> diversion'** -- Return the name of the
1204      current diversion.
1205
1206    **'getAllDiversions() -> sequence'** -- Return a sorted list of
1207      all existing diversions.
1208
1209    Filters:
1210
1211    **'getFilter() -> filter'** -- Retrieve the current filter.
1212      'None' indicates no filter is installed.
1213
1214    **'resetFilter()'** -- Reset the filter so that no filtering is
1215      done.
1216
1217    **'nullFilter()'** -- Install a special null filter, one which
1218      consumes all text and never sends any text to the output.
1219
1220    **'setFilter(shortcut)'** -- Install a new filter.  A filter is
1221      'None' or an empty sequence representing no filter, or '0' for a
1222      null filter, a function for a function filter, a string for a
1223      string filter, or an instance of 'empy.Filter' (or a workalike
1224      object).  If filter is a list of the above things, they will be
1225      chained together automatically; if it is only one, it will be
1226      presumed to be solitary or to have already been manually chained
1227      together.  See the "Filters" section for more information.
1228
1229    **'attachFilter(shortcut)'** -- Attach a single filter (sequences
1230      are not allowed here) to the end of a currently existing filter
1231      chain, or if there is no current chain, install it as
1232      'empy.setFilter' would.  As with 'empy.setFilter', the shortcut
1233      versions of filters are also allowed here.
1234
1235    Hooks:
1236
1237    **'areHooksEnabled()'** -- Return whether or not hooks are
1238      presently enabled.
1239
1240    **'enableHooks()'** -- Enable invocation of hooks.  By default
1241      hooks are enabled.
1242
1243    **'disableHooks()'** -- Disable invocation of hooks.  Hooks can
1244      still be added, removed, and queried, but invocation of hooks
1245      will not occur (even explicit invocation with
1246      'empy.invokeHook').
1247
1248    **'getHooks()'** -- Get a (copy of the) list of the hooks
1249      currently registered.
1250
1251    **'clearHooks()'** -- Clear all the hooks registered with this
1252      interpreter.
1253
1254    **'addHook(hook, prepend_opt)'** -- Add this hook to the hooks
1255      associated with this interpreter.  By default, the hook is
1256      appended to the end of the existing hooks, if any; if the
1257      optional 'prepend' argument is present and true, it will be
1258      prepended to the list instead.
1259
1260    **'removeHook(hook)'** -- Remove this hook from the hooks
1261      associated with this interpreter.
1262
1263    **'invokeHook(_name, ...)'** -- Manually invoke a hook method.
1264      The remaining arguments are treated as keyword arguments and the
1265      resulting dictionary is passed in as the second argument to the
1266      hooks.
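
    As a rough illustration of the bookkeeping described above, here
    is a toy registry.  'HookRegistry' is a hypothetical stand-in, not
    the real interpreter, and it simplifies hooks to plain callables
    that receive the event name and the keyword dictionary; the method
    names simply mirror the ones listed in this section.

```python
class HookRegistry:
    """Toy stand-in for the hook bookkeeping of one interpreter."""

    def __init__(self):
        self.hooks = []
        self.enabled = True  # hooks are enabled by default

    def addHook(self, hook, prepend=False):
        if prepend:
            self.hooks.insert(0, hook)
        else:
            self.hooks.append(hook)

    def removeHook(self, hook):
        self.hooks.remove(hook)

    def getHooks(self):
        return self.hooks[:]  # a copy, so callers cannot alter the registry

    def clearHooks(self):
        del self.hooks[:]

    def enableHooks(self):
        self.enabled = True

    def disableHooks(self):
        self.enabled = False

    def areHooksEnabled(self):
        return self.enabled

    def invokeHook(self, _name, **keywords):
        if not self.enabled:
            return  # disabled: even explicit invocation does nothing
        for hook in self.getHooks():
            hook(_name, keywords)  # the dictionary is the second argument


seen = []
registry = HookRegistry()
registry.addHook(lambda name, keywords: seen.append(("second", name, keywords)))
registry.addHook(lambda name, keywords: seen.append(("first", name, keywords)),
                 prepend=True)
registry.invokeHook("atStartup", reason="demo")
registry.disableHooks()
registry.invokeHook("atStartup", reason="ignored")  # no hook runs
print(seen)
```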
1267
1268    Custom markup callback:
1269
1270    **'getCallback() -> callback'** -- Retrieve the current callback
1271      associated with this interpreter, or 'None' if it does not yet
1272      have one.
1273
1274    **'registerCallback(callback)'** -- Register a callback to be
1275      called whenever a custom markup ('@<...>') is encountered.  When
1276      encountered, 'invokeCallback' is called.
1277
1278    **'deregisterCallback()'** -- Clear any callback previously
1279      registered with the interpreter for being called when a custom
1280      markup is encountered.
1281
1282    **'invokeCallback(contents)'** -- Invoke a custom callback.  This
1283      function is called whenever a custom markup ('@<...>') is
1284      encountered.  It in turn calls the registered callback, with a
1285      single argument, 'contents', which is a string representation of
1286      the contents of the custom markup.
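
    The callback methods above amount to a single slot that is
    filled, cleared, and called.  The sketch below models that slot in
    isolation.  'CallbackSlot' and its 'error_if_missing' flag are
    hypothetical names, and 'RuntimeError' merely stands in for
    whatever error EmPy itself raises when no callback is registered.

```python
class CallbackSlot:
    """Toy model of the single custom-markup callback of an interpreter."""

    def __init__(self, error_if_missing=True):
        self.callback = None
        self.error_if_missing = error_if_missing

    def registerCallback(self, callback):
        self.callback = callback

    def deregisterCallback(self):
        self.callback = None

    def getCallback(self):
        return self.callback

    def invokeCallback(self, contents):
        if self.callback is None:
            if self.error_if_missing:
                raise RuntimeError("custom markup used with no callback")
            return None  # silently ignore the markup
        return self.callback(contents)


slot = CallbackSlot()
slot.registerCallback(lambda contents: contents.upper())
# 'contents' is the text found between '@<' and '>'.
print(slot.invokeCallback("hello"))
```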
1287
1288
1289Invocation
1290
1291    Basic invocation involves running the interpreter on an EmPy file
1292    and some optional arguments.  If no file is specified, or the
1293    file is named '-', EmPy takes its input from stdin.  One can
1294    suppress option evaluation (to, say, specify a file that begins
1295    with a dash) by using the canonical '--' option.
1296
1297    **'-h'/'--help'** -- Print usage and exit.
1298
1299    **'-H'/'--extended-help'** -- Print extended usage and exit.
1300      Extended usage includes a rundown of all the legal expansions,
1301      escape sequences, pseudomodule contents, used hooks, and
1302      supported environment variables.
1303
1304    **'-v'/'--verbose'** -- The EmPy system will print all manner of
1305      details about what it is doing and what it is processing to
1306      stderr.
1307
1308    **'-V'/'--version'** -- Print version and exit.
1309
1310    **'-a'/'--append' (filename)** -- Open the specified file for
1311      append instead of using stdout.
1312
1313    **'-b'/'--buffered-output'** -- Fully buffer processing output,
1314      including the file open itself.  This is helpful when, should an
1315      error occur, you wish that no output file be generated at all
1316      (for instance, when using EmPy in conjunction with make).  When
1317      specified, either the -o or -a options must be specified, and
1318      the -b option must precede them.  This can also be specified
1319      through the existence of the 'EMPY_BUFFERED_OUTPUT' environment
1320      variable.
1321
1322    **'-f'/'--flatten'** -- Before processing, move the contents of
1323      the 'empy' pseudomodule into the globals, just as if
1324      'empy.flatten()' were executed immediately after starting the
1325      interpreter.  That is, *e.g.*, 'empy.include' can be referred to
1326      simply as 'include' when this flag is specified on the command
1327      line.  This can also be specified through the existence of the
1328      'EMPY_FLATTEN' environment variable.
1329
1330    **'-i'/'--interactive'** -- After the main EmPy file has been
1331      processed, the state of the interpreter is left intact and
1332      further processing is done from stdin.  This is analogous to the
1333      Python interpreter's -i option, which allows interactive
1334      inspection of the state of the system after a main module is
1335      executed.  This behaves as expected when the main file is stdin
1336      itself.  This can also be specified through the existence of the
1337      'EMPY_INTERACTIVE' environment variable.
1338
1339    **'-k'/'--suppress-errors'** -- Normally when an error is
1340      encountered, information about its location is printed and the
1341      EmPy interpreter exits.  With this option, when an error is
1342      encountered (except for keyboard interrupts), processing stops
1343      and the interpreter enters interactive mode, so the state of
1344      affairs can be assessed.  This is also helpful, for instance,
1345      when experimenting with EmPy in an interactive manner.  -k
1346      implies -i.
1347
1348    **'-n'/'--no-override-stdout'** -- Do not override 'sys.stdout'
1349      with a proxy object which the EmPy system interacts with.  If
1350      suppressed, this means that side effect printing will not be
1351      captured and routed through the EmPy system.  However, if this
1352      option is specified, EmPy can support multithreading.
1353
1354    **'-o'/'--output' (filename)** -- Open the specified file for
1355      output instead of using stdout.  If a file with that name
1356      already exists it is overwritten.
1357
1358    **'-p'/'--prefix' (prefix)** -- Change the prefix used to detect
1359      expansions.  The argument is the one-character string that will
1360      be used as the prefix.  Note that whatever it is changed to, the
1361      way to represent the prefix literally is to double it, so if '$'
1362      is the prefix, a literal dollar sign is represented with '$$'.
1363      Note that if the prefix is changed to one of the secondary
1364      characters (those that immediately follow the prefix to indicate
1365      the type of action EmPy should take), it will not be possible to
1366      represent literal prefix characters by doubling them (*e.g.*, if
1367      the prefix were inadvisedly changed to '#' then '##' would
1368      already have to represent a comment, so '##' could not represent
1369      a literal '#').  This can also be specified through the
1370      'EMPY_PREFIX' environment variable.
1371
1372    **'-r'/'--raw-errors'** -- Normally, EmPy catches Python
1373      exceptions and prints them alongside an error notation
1374      indicating the EmPy context in which it occurred.  This option
1375      causes EmPy to display the full Python traceback; this is
1376      sometimes helpful for debugging.  This can also be specified
1377      through the existence of the 'EMPY_RAW_ERRORS' environment
1378      variable.
1379
1380    **'-u'/'--unicode'** -- Enable the Unicode subsystem.  This option
1381      only need be present if you wish to enable the Unicode subsystem
1382      with the defaults; any other Unicode-related option (starting
1383      with --unicode...) will also enable the Unicode subsystem.
1384
1385    **'-D'/'--define' (assignment)** -- Execute a Python assignment of
1386      the form 'variable = expression'.  If only a variable name is
1387      provided (*i.e.*, the statement does not contain an '=' sign),
1388      then it is taken as being assigned to None.  The -D option is
1389      simply a specialized -E option that special cases the lack of an
1390      assignment operator.  Multiple -D options can be specified.
1391
1392    **'-E'/'--execute' (statement)** -- Execute the Python (not EmPy)
1393      statement before processing any files.  Multiple -E options can
1394      be specified.
1395
1396    **'-F'/'--execute-file' (filename)** -- Execute the Python (not
1397      EmPy) file before processing any files.  This is equivalent to
1398      '-E execfile("filename")' but provides a more readable context.
1399      Multiple -F options can be specified.
1400
1401    **'-I'/'--import' (module)** -- Imports the specified module name
1402      before processing any files.  Multiple modules can be specified
1403      by separating them by commas, or by specifying multiple -I
1404      options.
1405
1406    **'-P'/'--preprocess' (filename)** -- Process the EmPy file before
1407      processing the primary EmPy file on the command line.
1408
1409    **'--binary'** -- Treat the file as a binary file, and read in
1410      chunks rather than line by line.  In this mode, the "line"
1411      indicator represents the number of bytes read, not the number of
1412      lines processed.
1413
1414    **'--no-prefix'** -- Disable the prefixing system entirely; when
1415      specified, EmPy will not expand any markups.  This allows EmPy
1416      to merely act as a Unicode encoding translator.
1417
1418    **'--pause-at-end'** -- If present, then 'raw_input' will be
1419      called at the end of processing.  Useful in systems where the
1420      output window would otherwise be closed by the operating
1421      system/window manager immediately after EmPy exited.
1422
1423    **'--relative-path'** -- When present, the path the EmPy script
1424      being invoked is contained in will be prepended to 'sys.path'.
1425      This is analogous to Python's internal handling of 'sys.path'
1426      and scripts.  If input is from stdin ('-' for a filename or no
1427      filename is specified), then nothing is added to the path.
1428
1429    **'--no-callback-error'** -- Do not consider it an error if the
1430      custom markup ('@<...>') is invoked and there is no callback
1431      function registered for it.
1432
1433    **'--chunk-size' (chunk)** -- Use the specified binary chunk size
1434      rather than the default; implies --binary.
1435
1436    **'--unicode-encoding' (encoding)** -- Specify the Unicode
1437      encoding to be used for both input and output.
1438
1439    **'--unicode-input-encoding' (encoding)** -- Specify the Unicode
1440      encoding to be used for input.
1441
1442    **'--unicode-output-encoding' (encoding)** -- Specify the Unicode
1443      encoding to be used for output.
1444
1445    **'--unicode-input-errors' (errors)** -- Specify the Unicode error
1446      handling to be used for input.
1447
1448    **'--unicode-errors' (errors)** -- Specify the Unicode error
1449      handling to be used for both input and output.
1450
1451    **'--unicode-output-errors' (errors)** -- Specify the Unicode error
1452      handling to be used for output.
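
    The idea behind the -b option described above (buffering
    everything, "including the file open itself") can be sketched with
    a small file-like object.  'DeferredFile' is a hypothetical class
    written for this illustration, not EmPy's own proxy; it only shows
    why an aborted run leaves no output file behind.

```python
import os
import tempfile


class DeferredFile:
    """Collect writes in memory; create the real file only on close()."""

    def __init__(self, filename, mode="w"):
        self.filename = filename
        self.mode = mode  # "w" to overwrite (like -o), "a" to append (like -a)
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

    def abort(self):
        self.chunks = None  # discard everything; close() then does nothing

    def close(self):
        if self.chunks is None:
            return
        with open(self.filename, self.mode) as handle:
            handle.write("".join(self.chunks))
        self.chunks = None


target = os.path.join(tempfile.mkdtemp(), "page.html")

failed = DeferredFile(target)
failed.write("<p>partial output</p>")
failed.abort()  # simulate an error during processing
failed.close()
print(os.path.exists(target))  # the file was never created

finished = DeferredFile(target)
finished.write("<p>complete output</p>")
finished.close()
print(os.path.exists(target))  # now it exists, with the complete text
```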
1453
1454
1455Environment variables
1456
1457    EmPy also supports a few environment variables to predefine
1458    certain behaviors.  The settings chosen by environment variables
1459    can be overridden via command line arguments.  The following
1460    environment variables have meaning to EmPy:
1461
1462    **'EMPY_OPTIONS'** -- If present, the contents of this environment
1463      variable will be treated as options, just as if they were
1464      entered on the command line, *before* the actual command line
1465      arguments are processed.  Note that these arguments are *not*
1466      processed by the shell, so quoting, filename globbing, and the
1467      like, will not work.
1468
1469    **'EMPY_PREFIX'** -- If present, the value of this environment
1470      variable represents the prefix that will be used; this is
1471      equivalent to the -p command line option.
1472
1473    **'EMPY_PSEUDO'** -- If present, the value of this environment
1474      variable represents the name of the pseudomodule that will be
1475      incorporated into every running EmPy system; this is equivalent
1476      to the -m command line option.
1477
1478    **'EMPY_FLATTEN'** -- If defined, this is equivalent to including
1479      -f on the command line.
1480
1481    **'EMPY_RAW_ERRORS'** -- If defined, this is equivalent to
1482      including -r on the command line.
1483
1484    **'EMPY_INTERACTIVE'** -- If defined, this is equivalent to
1485      including -i on the command line.
1486
1487    **'EMPY_BUFFERED_OUTPUT'** -- If defined, this is equivalent to
1488      including -b on the command line.
1489
1490    **'EMPY_UNICODE'** -- If defined, this is equivalent to including
1491      -u on the command line.
1492
1493    **'EMPY_UNICODE_INPUT_ENCODING'** -- If present, the value of this
1494      environment variable indicates the name of the Unicode input
1495      encoding to be used.  This is equivalent to the
1496      --unicode-input-encoding command line option.
1497
1498    **'EMPY_UNICODE_OUTPUT_ENCODING'** -- If present, the value of
1499      this environment variable indicates the name of the Unicode
1500      output encoding to be used.  This is equivalent to the
1501      --unicode-output-encoding command line option.
1502
1503    **'EMPY_UNICODE_INPUT_ERRORS'** -- If present, the value of this
1504      environment variable indicates the name of the error handler to
1505      be used for input.  This is equivalent to the
1506      --unicode-input-errors command line option.
1507
1508    **'EMPY_UNICODE_OUTPUT_ERRORS'** -- If present, the value of this
1509      environment variable indicates the name of the error handler to
1510      be used for output.  This is equivalent to the
1511      --unicode-output-errors command line option.
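
    The note under 'EMPY_OPTIONS' about shell processing is easy to
    trip over, so here is a small sketch of the consequence.  It
    assumes the variable is split on plain whitespace, which is an
    assumption made for illustration; 'options_from_environment' is a
    hypothetical helper, not part of EmPy.

```python
import shlex


def options_from_environment(environ):
    """Hypothetical helper: extra arguments contributed by EMPY_OPTIONS."""
    return environ.get("EMPY_OPTIONS", "").split()


value = "-p $ -D 'title = 1'"  # what a user might put in EMPY_OPTIONS

# Plain whitespace splitting: quotes are ordinary characters, so the
# quoted assignment falls apart into separate arguments.
print(value.split())

# What a shell would have produced for the same text, for comparison.
print(shlex.split(value))

# Options from the environment come before the real command line, so
# the command line gets the last word.
command_line = ["-p", "@", "sample.em"]
print(options_from_environment({"EMPY_OPTIONS": "-f -p $"}) + command_line)
```

    Under that assumption, options that need embedded spaces (such as
    a -D assignment) are better given on the actual command line.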
1512
1513
1514Examples and testing EmPy
1515
1516    See the sample EmPy file 'sample.em' which is included with the
1517    distribution.  Run EmPy on it by typing something like::
1518
1519         ./em.py sample.em
1520
1521    and compare the results and the sample source file side by side.
1522    The sample content is intended to be self-documenting and to serve
1523    as an introduction to the basic features of EmPy while
1524    simultaneously exercising them.
1525
1526    The file 'sample.bench' is the benchmark output of the sample.
1527    Running the EmPy interpreter on the provided 'sample.em' file
1528    should produce precisely the same results.  You can run the
1529    provided test script to see if your EmPy environment is behaving
1530    as expected (presuming a Unix-like operating system)::
1531
1532        ./test.sh
1533
1534    By default this will test with the first Python interpreter
1535    available in the path; if you want to test with another
1536    interpreter, you can provide it as the first argument on the
1537    command line, *e.g.*::
1538
1539        ./test.sh python2.1
1540        ./test.sh /usr/bin/python1.5
1541        ./test.sh jython
1542
1543    A more comprehensive test suite and set of real-world examples is
1544    planned for a future version.
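
    On systems without a Unix-like shell, the same kind of check can
    be approximated in Python.  The contents of 'test.sh' are not
    reproduced in this document, so the sketch below only illustrates
    the comparison idea described above; 'compare_to_benchmark' is a
    hypothetical helper, and the short strings stand in for the real
    output of 'sample.em' and the contents of 'sample.bench'.

```python
import difflib


def compare_to_benchmark(actual, expected):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="sample.bench", tofile="em.py sample.em", lineterm=""))


benchmark = "line one\nline two\n"

# Identical output: nothing to report.
print(compare_to_benchmark("line one\nline two\n", benchmark))

# Differing output: the diff shows which lines changed.
for line in compare_to_benchmark("line one\nline 2\n", benchmark):
    print(line)
```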
1545
1546
1547Embedding EmPy
1548
1549    For atomic applications, the 'expand' function is provided (the
1550    extra keyword arguments passed in are treated as locals)::
1551
1552        import em
1553        print(em.expand("@x + @y is @(x + y).", x=2, y=3))
1554
1555    One can pass the other interpreter options (below) as well.  To
1556    make definitions persist from one call to the next, pass the same
1557    globals dictionary each time::
1558
1559        import em
1560        g = {}
1561        em.expand("@{x = 10}", g)
1562        print(em.expand("x is @x.", g))
1563
1564    The standalone 'expand' function, however, creates and destroys an
1565    'Interpreter' instance each time it is called.  For repeated
1566    expansions, this can be expensive.  Instead, you will probably
1567    want to use the full-fledged features of embedding.  An EmPy
1568    interpreter can be created with code as simple as::
1569
1570        import em
1571        interpreter = em.Interpreter()
1572        # The following prints the results to stdout:
1573        interpreter.string("@{x = 123}@x\n")
1574        # This expands to the same thing, but puts the results as a
1575        # string in the variable result:
1576        result = interpreter.expand("@{x = 123}@x\n")
1577        # This just prints the value of x directly:
1578        print(interpreter.globals['x'])
1579        # Process an actual file (and output to stdout):
1580        interpreter.file(open('/path/to/some/file'))
1581        interpreter.shutdown() # this is important; see below
1582
1583    One can capture the output of a run in something other than stdout
1584    by specifying the *output* parameter::
1585
1586        import em, StringIO
1587        output = StringIO.StringIO()
1588        interpreter = em.Interpreter(output=output)
1589        # Do something.
1590        interpreter.file(open('/path/to/some/file'))
1591        interpreter.shutdown() # again, this is important; see below
1592        print(output.getvalue()) # this is the result from the session
1593
1594    When you are finished with your interpreter, it is important to
1595    call its shutdown method::
1596
1597        interpreter.shutdown()
1598
1599    This will ensure that the interpreter cleans up all its overhead,
1600    entries in the 'sys.stdout' proxy, and so forth.  It is usually
1601    advisable that this be used in a try...finally clause::
1602
1603        interpreter = em.Interpreter(...)
1604        try:
1605            ...
1606        finally:
1607            interpreter.shutdown()
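
    The try...finally pattern above can also be wrapped in a small
    context manager so that the cleanup cannot be forgotten.  This is
    a sketch under stated assumptions: 'shutting_down' is a
    hypothetical helper, and 'StandInInterpreter' is a placeholder
    with a 'shutdown' method used here so the example runs without
    EmPy installed; in real code the object would be an
    'em.Interpreter'.

```python
from contextlib import contextmanager


class StandInInterpreter:
    """Placeholder with a shutdown() method; not the real em.Interpreter."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def shutting_down(interpreter):
    """Yield the interpreter and always call shutdown() on the way out."""
    try:
        yield interpreter
    finally:
        interpreter.shutdown()


interpreter = StandInInterpreter()
try:
    with shutting_down(interpreter) as active:
        raise ValueError("simulated failure during expansion")
except ValueError:
    pass
print(interpreter.is_shut_down)  # shutdown ran despite the exception
```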
1608
1609    The 'em.Interpreter' constructor takes the following arguments;
1610    all are optional.  Since options may be added in the future, it is
1611    highly recommended that the constructor be invoked via keyword
1612    arguments, rather than assuming their order.  The arguments are:
1613
1614    *output* -- The output file which the interpreter will be sending
1615     all its processed data to.  This need only be a file-like object;
1616     it need not be an actual file.  If omitted, 'sys.__stdout__' is
1617     used.
1618
1619    *argv* -- An argument list analogous to 'sys.argv', consisting of
1620     the script name and zero or more arguments.  These are available
1621     to executing interpreters via 'empy.argv' and 'empy.args'.  If
1622     omitted, a nondescript script name is used with no arguments.
1623
1624    *prefix* -- The prefix (a single-character string).  Defaults to
1625     '@'.  It is an error for this to be anything other than one
1626     character.
1627
1628    *pseudo* -- The name (string) of the pseudomodule.  Defaults to
1629     'empy'.
1630
1631    *options* -- A dictionary of options that can override the default
1632     behavior of the interpreter.  The names of the options are
1633     constant names ending in '_OPT' and their defaults are given in
1634     'Interpreter.DEFAULT_OPTIONS'.
1635
1636    *globals* -- By default, interpreters begin with a pristine
1637     dictionary of globals (except, of course, for the 'empy'
1638     pseudomodule).  Specifying this argument will allow the globals
1639     to start with more.
1640
1641    *hooks* -- A sequence of hooks (or 'None' for none) to register
1642     with the interpreter at startup.  Hooks can, of course, be added
1643     after the fact, but this allows the hooks to intercept the
1644     'atStartup' event (otherwise, the startup event would already
1645     have occurred by the time new hooks could be registered).
1646
1647    Many things can be done with EmPy interpreters; for the full
1648    developer documentation, see the generated documentation for the
1649    'em' module.
1650
1651
1652Interpreter options
1653
1654    The following options (passed in as part of the options dictionary
1655    to the Interpreter constructor) have the following meanings.  The
1656    defaults are shown below and are also indicated in an
1657    'Interpreter.DEFAULT_OPTIONS' dictionary.
1658
1659    **'BANGPATH_OPT'** -- Should a bangpath ('#!') as the first line
1660      of an EmPy file be treated as if it were an EmPy comment?  Note
1661      that '#!' sequences starting lines or appearing anywhere else in
1662      the file are untouched regardless of the value of this option.
1663      Default: true.
1664
1665    **'BUFFERED_OPT'** -- Should an 'abort' method be called upon
1666      failure?  This relates to the fully-buffered option, where all
1667      output can be buffered including the file open; this option only
1668      relates to the interpreter's behavior *after* that proxy file
1669      object has been created.  Default: false.
1670
1671    **'RAW_OPT'** -- Should errors be displayed as raw Python errors
1672      (that is, the exception is allowed to propagate through to the
1673      toplevel so that the user gets a standard Python traceback)?
1674      Default: false.
1675
1676    **'EXIT_OPT'** -- Upon an error, should the interpreter exit
1677      (rather than continue with the interpreter stacks purged)?  Note
1678      that even if this is false, the interpreter will halt upon
1679      receiving a 'KeyboardInterrupt'.  Default: true.
1680
1681    **'FLATTEN_OPT'** -- Upon initial startup, should the 'empy'
1682      pseudomodule namespace be flattened, *i.e.*, should
1683      'empy.flatten' be called?  Note this option only has an effect
1684      when the interpreter is first created; thereafter it is
1685      ignored.  Default: false.
1686
1687    **'OVERRIDE_OPT'** -- Should the 'sys.stdout' object be overridden
1688      with a proxy object?  If not, side effect output cannot be
1689      captured by the EmPy system, but EmPy will support
1690      multithreading.  Default: true.
1691
1692    **'CALLBACK_OPT'** -- If a callback is invoked when none has yet
1693      been registered, should an error be raised (true) or should the
1694      situation be ignored (false)?  Default: true.
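
    A caller typically overrides only a few of these options and
    leaves the rest at their defaults.  The sketch below shows that
    merge with ordinary dictionaries.  The string keys are
    placeholders chosen for this illustration (real code would use the
    '_OPT' constants from the 'em' module), the default values are
    copied from the list above, and 'effective_options' is a
    hypothetical helper.

```python
# Placeholder keys: in real code use the '_OPT' constants from the 'em'
# module rather than these strings.
DEFAULT_OPTIONS = {
    "BANGPATH_OPT": True,
    "BUFFERED_OPT": False,
    "RAW_OPT": False,
    "EXIT_OPT": True,
    "FLATTEN_OPT": False,
    "OVERRIDE_OPT": True,
    "CALLBACK_OPT": True,
}


def effective_options(overrides=None):
    """Return a copy of the defaults with the caller's overrides applied."""
    merged = dict(DEFAULT_OPTIONS)
    merged.update(overrides or {})
    return merged


# Ask for raw tracebacks and leave sys.stdout alone; keep everything else.
options = effective_options({"RAW_OPT": True, "OVERRIDE_OPT": False})
print(options["RAW_OPT"], options["OVERRIDE_OPT"], options["EXIT_OPT"])
```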
1695
1696
1697Data flow
1698
1699    **input -> interpreter -> diversions -> filters -> output**
1700
1701    Here, in summary, is how data flows through a working EmPy system:
1702
1703    1. Input comes from a source, such as an .em file on the command
1704       line, or via an 'empy.include' statement.
1705
1706    2. The interpreter processes this material as it comes in,
1707       expanding EmPy expansions as it goes.
1708
1709    3. After interpretation, data is then sent through the diversion
1710       layer, which may allow it directly through (if no diversion is
1711       in progress) or defer it temporarily.  Diversions that are
1712       recalled initiate from this point.
1713
1714    4. Any filters in place are then used to filter the data and
1715       produce filtered data as output.
1716
1717    5. Finally, any material surviving this far is sent to the output
1718       stream.  That stream is stdout by default, but can be changed
1719       with the -o or -a options, or may be fully buffered with the -b
1720       option (that is, the output file would not even be opened until
1721       the entire system is finished).
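
    Steps 3 through 5 can be pictured with a toy pipeline.  This is a
    simplified model written for illustration, not EmPy's internals:
    'Pipeline' and its method names are hypothetical, filters are
    plain string functions, and a list stands in for the output
    stream.

```python
class Pipeline:
    """Toy model of the diversion, filter, and output stages."""

    def __init__(self, filters=()):
        self.filters = list(filters)  # str -> str functions, applied in order
        self.diversions = {}          # name -> list of deferred chunks
        self.current = None           # name of the active diversion, if any
        self.output = []              # stands in for the output stream

    def start_diversion(self, name):
        self.current = name
        self.diversions.setdefault(name, [])

    def stop_diversion(self):
        self.current = None

    def write(self, data):
        """Step 3: defer the data or let it straight through."""
        if self.current is not None:
            self.diversions[self.current].append(data)
        else:
            self.emit(data)

    def replay_diversion(self, name):
        """Recalled diversions re-enter the flow just before the filters."""
        self.emit("".join(self.diversions[name]))

    def emit(self, data):
        """Steps 4 and 5: filter the data, then send it to the output."""
        for apply_filter in self.filters:
            data = apply_filter(data)
        self.output.append(data)


pipeline = Pipeline(filters=[str.upper])
pipeline.write("intro. ")
pipeline.start_diversion("footnotes")
pipeline.write("a deferred note. ")
pipeline.stop_diversion()
pipeline.write("body. ")
pipeline.replay_diversion("footnotes")
print("".join(pipeline.output))
```

    The deferred text appears after "body." and is still uppercased,
    because in this model diverted data only reaches the filters when
    it is replayed.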
1722
1723
1724Author's notes
1725
1726    I originally conceived EmPy as a replacement for my "Web
1727    templating system", http://www.alcyone.com/max/info/m4.html which
1728    uses "m4", http://www.seindal.dk/rene/gnu/ (a general
1729    macroprocessing system for Unix).
1730
1731    Most of my Web sites include a variety of m4 files, some of which
1732    are dynamically generated from databases, which are then scanned
1733    by a cataloging tool to organize them hierarchically (so that,
1734    say, a particular m4 file can understand where it is in the
1735    hierarchy, or what the titles of files related to it are without
1736    duplicating information); the results of the catalog are then
1737    written in database form as an m4 file (which every other m4 file
1738    implicitly includes), and then GNU make converts each m4 to an
1739    HTML file by processing it.
1740
1741    As the Web sites got more complicated, the use of m4 (which I had
1742    originally enjoyed for the challenge and abstractness) really
1743    started to become an impediment to serious work; while I am very
1744    knowledgeable about m4 -- having used it for so many years --
1745    getting even simple things done with it is awkward and difficult.
1746    Worse yet, as I started to use Python more and more over the
1747    years, the cataloging programs which scanned the m4 and built m4
1748    databases were migrated to Python and made almost trivial, but
1749    writing out huge awkward tables of m4 definitions simply to make
1750    them accessible in other m4 scripts started to become almost
1751    farcical -- especially when coupled with the difficulty in getting
1752    simple things done in m4.
1753
1754    It occurred to me what I really wanted was an all-Python solution.
1755    But replacing what used to be the m4 files with standalone Python
1756    programs would result in somewhat awkward programs normally
1757    consisting mostly of unprocessed text punctuated by small portions
1758    where variables and small amounts of code need to be substituted.
1759    Thus the idea was a sort of inverse of a Python interpreter: a
1760    program that normally would just pass text through unmolested, but
1761    when it found a special signifier would execute Python code in a
1762    normal environment.  I looked at existing Python templating
1763    systems, and didn't find anything that appealed to me -- I wanted
1764    something where the desired markups were simple and unobtrusive.
1765    After weighing several choices of signifiers, I settled on '@'
1766    and EmPy was born.
1767
1768    As I developed the tool, I realized it could have general appeal,
1769    even to those with widely varying problems to solve, provided the
1770    core tool they needed was an interpreter that could embed Python
1771    code inside templated text.  As I continue to use the tool, I have
1772    been adding features as unintrusively as possible as I see areas
1773    that can be improved.
1774
1775    A design goal of EmPy is that its feature set should work on
1776    several levels; at each level, if the user does not wish or need
1777    to use features from another level, they are under no obligation
1778    to do so.  If you have no need of diversions, for instance, you
1779    are under no obligation to use them.  If significators will not
1780    help you organize a set of EmPy scripts globally, then you need
1781    not use them.  New features are, whenever possible, added in a
1782    transparently backward-compatible way; if you do not need
1783    them, their introduction should not affect you in any way.  The
1784    use of unknown prefix sequences results in errors, guaranteeing
1785    that they are reserved for future use.
1786
1787
1788Glossary
1789
1790    **control** -- A control markup, used to direct high-level control
1791      flow within an EmPy session.  Control markups are expressed with
1792      the '@[...]' notation.
1793
1794    **diversion** -- A process by which output is deferred, and can be
1795      recalled later on demand, multiple times if necessary.
1796
1797    **document** -- The abstraction of an EmPy document as used by a
1798      processor.
1799
1800    **escape** -- A markup designed to expand to a single (usually
1801      non-printable) character, similar to escape sequences in C or
1802      other languages.
1803
1804    **expansion** -- The process of processing EmPy markups and
1805      producing output.
1806
1807    **expression** -- An expression markup represents a Python
1808      expression to be evaluated, and replaced with the 'str' of its
1809      value.  Expression markups are expressed with the '@(...)'
1810      notation.
1811
1812    **filter** -- A file-like object which can be chained to other
1813      objects (primarily the final stream) and can buffer, alter, or
1814      manipulate in any way the data sent.  Filters can also be
1815      chained together in arbitrary order.
1816
1817    **globals** -- The dictionary (or dictionary-like object) which
1818      resides inside the interpreter and holds the currently-defined
1819      variables.
1820
1821    **hook** -- A callable object that can be registered in a
1822      dictionary, and which will be invoked before, during, or after
1823      certain internal operations, identified by name with a string.
1824
1825    **interpreter** -- The application (or class instance) which
1826      processes EmPy markup.
1827
1828    **markup** -- EmPy substitutions set off with a prefix and
1829      appropriate delimiters.
1830
1831    **output** -- The final destination of the result of processing an
1832      EmPy file.
1833
1834    **prefix** -- The ASCII character used to set off expansions.
1835      By default, '@'.
1836
1837    **processor** -- An extensible system which processes a group of
1838      EmPy files, usually arranged in a filesystem, and scans them for
1839      significators.
1840
1841    **pseudomodule** -- The module-like object named 'empy' which is
1842      exposed internally inside every EmPy system.
1843
1844    **shortcut** -- A special object which takes the place of an
1845      instance of the 'Filter' class, to represent a special form of
1846      filter.  These include 0 for a null filter, a callable (function
1847      or lambda) to represent a callable filter, or a 256-character
1848      string which represents a translation filter.
1849
1850    **significator** -- A special form of an assignment markup in EmPy
1851      which can be easily parsed externally, primarily designed for
1852      representing uniform assignment across a collection of files.
1853      Significators are indicated with the '@%' markup.
1854
1855    **statement** -- A line of code that needs to be executed;
1856      statements do not have return values.  In EmPy, statements are
1857      set off with '@{...}'.
1858
1859
1860Acknowledgements
1861
1862    Questions, suggestions, bug reports, evangelism, and even
1863    complaints from many people have helped make EmPy what it is
1864    today.  Some, but by no means all, of these people are (in
1865    alphabetical order by surname):
1866
1867    - Biswapesh Chattopadhyay
1868
1869    - Beni Cherniavsky
1870
1871    - Dr. S. Candelaria de Ram
1872
1873    - Eric Eide
1874
1875    - Dinu Gherman
1876
1877    - Grzegorz Adam Hankiewicz
1878
1879    - Bohdan Kushnir
1880
1881    - Robert Kroeger
1882
1883    - Kouichi Takahashi
1884
1885    - Ville Vainio
1886
1887
1888Known issues and caveats
1889
1890    - EmPy was primarily intended for static processing of documents,
1891      rather than dynamic use, and hence speed of processing was not
1892      the primary consideration in its design.
1893
1894    - EmPy is not threadsafe by default.  This is because of the need
1895      for EmPy to override the 'sys.stdout' file with a proxy object
1896      which can capture effects of 'print' and other spooling to
1897      stdout.  This proxy can be suppressed with the -n option, which
1898      will result in EmPy being unable to do anything meaningful with
1899      this output, but will allow EmPy to be threadsafe.
1900
1901    - To function properly, EmPy must override 'sys.stdout' with a
1902      proxy file object, so that it can capture output of side effects
1903      and support diversions for each interpreter instance.  It is
1904      important that code executed in an environment *not* rebind
1905      'sys.stdout', although it is perfectly legal to invoke it
1906      explicitly (*e.g.*, '@sys.stdout.write("Hello world\n")').  If
1907      one really needs to access the "true" stdout, then use
1908      'sys.__stdout__' instead (which should also not be rebound).
1909      EmPy uses the standard Python error handlers when exceptions are
1910      raised in EmPy code, which print to 'sys.stderr'.
1911
1912    - Due to Python's curious handling of the 'print' statement --
1913      particularly the form with a trailing comma to suppress the
1914      final newline -- mixing statement expansions using prints inline
1915      with unexpanded text will often result in surprising behavior,
1916      such as extraneous (sometimes even deferred!) spaces.  This is a
1917      Python "feature," and occurs in non-EmPy applications as well;
1918      for finer control over output formatting, use 'sys.stdout.write'
1919      or 'empy.interpreter.write' directly.
1920
1921    - The 'empy' "module" exposed through the EmPy interface (*e.g.*,
1922      '@empy') is an artificial module.  It cannot be imported with
1923      the 'import' statement (and shouldn't -- it is an artifact of
1924      the EmPy processing system and does not correspond to any
1925      accessible .py file).
1926
1927    - For an EmPy statement expansion all alone on a line, *e.g.*,
1928      '@{a = 1}', note that this will expand to a blank line due to
1929      the newline following the closing curly brace.  To suppress this
1930      blank line, use the symmetric convention '@{a = 1}@'.
1931
1932    - When using EmPy with make, note that partial output may be
1933      created before an error occurs; this is a standard caveat when
1934      using make.  To avoid this, write to a temporary file and move
1935      when complete, delete the file in case of an error, use the -b
1936      option to fully buffer output (including the open), or (with GNU
1937      make) define a '.DELETE_ON_ERROR' target.
1938
1939    - 'empy.identify' tracks the context of executed *EmPy* code, not
1940      Python code.  This means that blocks of code delimited with '@{'
1941      and '}' will identify themselves as appearing on the line at
1942      which the '}' appears, and that pure Python code executed via
1943      the -D, -E and -F command line arguments will show up as all taking
1944      place on line 1.  If you're tracking errors and want more
1945      information about the location of the errors from the Python
1946      code, use the -r command line option, which will provide you
1947      with the full Python traceback.
1948
1949    - The conditional form of expression expansion '@(...?...!...)'
1950      allows the use of a colon instead of an exclamation point,
1951      *e.g.*, '@(...?...:...)'.  This behavior is supported for
1952      backward compatibility, but is deprecated.  The colon was an
1953      oversight and a poor choice, since colons can appear legally in
1954      expressions (*e.g.*, dictionary literals or lambda expressions).
1955
1956    - The '@[try]' construct only works with Python exceptions derived
1957      from 'Exception'.  It is not able to catch string exceptions.
1958
1959    - The '@[for]' variable specification supports tuples for tuple
1960      unpacking, even recursive tuples.  However, it is limited in
1961      that the names included may only be valid Python identifiers,
1962      not arbitrary Python lvalues.  Since the internal Python
1963      mechanism is very rarely used for this purpose (*e.g.*, 'for (x,
1964      l[0], q.a) in sequence'), it is not thought to be a significant
1965      limitation.
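
    The arbitrary-lvalue form mentioned in the last item is plain
    Python rather than EmPy.  As a minimal sketch, this is what Python
    itself accepts as loop targets and what '@[for]' does not; the
    names 'Holder', 'l', 'q', 'sequence', and 'seen' are made up for
    illustration only.

```python
class Holder:
    """Hypothetical object whose attribute serves as a loop target."""
    a = None

l = [None]      # l[0] is a subscription target
q = Holder()    # q.a is an attribute target
sequence = [(1, 2, 3), (4, 5, 6)]
seen = []

# Each element is unpacked into a plain name, a list slot, and an
# attribute, all of which are valid Python assignment targets.
for (x, l[0], q.a) in sequence:
    seen.append((x, l[0], q.a))
```

    In an EmPy template, only the plain-identifier form (for
    example, a tuple of names) would be usable in the '@[for]'
    variable specification.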
1966
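    The 'sys.stdout' proxy described in the notes above can be
    pictured with a small stand-alone sketch.  This is not EmPy's
    actual implementation; the 'CapturingProxy' class and the
    variable names are assumptions for illustration.  It shows the
    system-side role of installing a proxy, which is exactly why
    template code should not rebind 'sys.stdout' itself.

```python
import io
import sys

class CapturingProxy:
    """Hypothetical stand-in for a 'sys.stdout' proxy; not EmPy's own class."""

    def __init__(self):
        self.buffer = io.StringIO()

    def write(self, data):
        # Collect the text instead of sending it to the terminal.
        return self.buffer.write(data)

    def flush(self):
        # Nothing to flush for an in-memory buffer.
        pass

proxy = CapturingProxy()
saved = sys.stdout
sys.stdout = proxy
try:
    print("Hello world")                  # captured by the proxy
    sys.stdout.write("explicit write\n")  # explicit use is also captured
finally:
    sys.stdout = saved                    # always restore the original

captured = proxy.buffer.getvalue()
```

    Code running while the proxy is installed can still write through
    'sys.stdout' as usual; rebinding it would bypass the capture.
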
1967
1968Wish list
1969
1970    Here are some random ideas for future revisions of EmPy.  If any
1971    of these are of particular interest to you, your input would be
1972    appreciated.
1973
1974    - Some real-world examples should really be included for
1975      demonstrating the power and expressiveness of EmPy first-hand.
1976
1977    - More extensive help (rather than a ridiculously long README),
1978      probably inherently using the EmPy system itself for building to
1979      HTML and other formats, thereby acting as a help facility and a
1980      demonstration of the working system.
1981
1982    - A "trivial" mode, where all the EmPy system does is scan for
1983      simple symbols to replace them with evaluations/executions,
1984      rather than having to do the contextual scanning it does now.
1985      This has the down side of being much less configurable and
1986      powerful but the upside of being extremely efficient.
1987
1988    - A "debug" mode, where EmPy prints the contents of everything
1989      it's about to evaluate (probably to stderr) before it does?
1990
1991    - The ability to funnel all code through a configurable 'RExec'
1992      for user-controlled security control.  This would probably
1993      involve abstracting the execution functionality outside of the
1994      interpreter.  [This suggestion is on hold until the
1995      rexec/Bastion exploits are worked out.]
1996
1997    - Optimized handling of processing would be nice for the
1998      possibility of an Apache module devoted to EmPy processing.
1999
2000    - An EmPy emacs mode.
2001
2002    - An optimization of offloading diversions to files when they
2003      become truly huge.  (This is made possible by the abstraction of
2004      the 'Diversion' class.)
2005
2006    - Support for mapping filters (specified by dictionaries).
2007
2008    - Support for some sort of batch processing, where several EmPy
2009      files can be listed at once and all of them evaluated with the
2010      same initial (presumably expensive) environment.
2011      'empy.saveGlobals' and 'empy.restoreGlobals' have been
2012      introduced as a partial solution, but they need to be made more
2013      robust.
2014
2015    - A more elaborate interactive mode, perhaps with a prompt and
2016      readline support.
2017
2018    - A StructuredText and/or reStructuredText filter would be quite
2019      useful, as would SGML/HTML/XML/XHTML, s-expression, Python,
2020      etc. auto-indenter filters.
2021
2022    - An indexing filter, which can process text and pick out
2023      predefined keywords and thereby set up links to them.
2024
2025    - The ability to rerun diverted material back through the
2026      interpreter.  (This can be done, awkwardly, by manually creating
2027      a filter which itself contains an interpreter, but it might be
2028      helpful if this were an all-in-one operation.)
2029
2030    - A caching system that stores off the compilations of repeated
2031      evaluations and executions so that in a persistent environment
2032      the same code does not have to be repeatedly evaluated/executed.
2033      This would probably be a necessity in an Apache module-based
2034      solution.  Perhaps caching even to the point of generating pure
2035      Python bytecode?
2036
2037    - An option to change the format of the standard EmPy errors in a
2038      traceback.
2039
2040    - Support for some manner of implicitly processed /etc/empyrc
2041      and/or ~/.empyrc file, and of course an option to inhibit its
2042      processing.  This can already be accomplished (and with greater
2043      control) via use of EMPY_OPTIONS, though.
2044
2045    - More uniform handling of the preprocessing directives (-I, -D,
2046      -E, -F, and -P), probably mapping directly to methods in the
2047      'Interpreter' class.
2048
2049    - Support for integration with mod_python.
2050
2051    - In simple expressions, a '{...}' suffix has no meaning in Python
2052      (*e.g.*, in Python, '@x(...)' is a call, '@x[...]' is
2053      subscription, but '@x{...}' is illegal).  This could be
2054      exploited by having a '{...}' suffix in a simple expression
2055      representing an encapsulation of an expanded string; *e.g.*,
2056      '@bullet{There are @count people here}' would be equivalent to
2057      '@bullet(empy.expand("There are @count people here",
2058      locals()))'.
2059
2060    - A tool to collect significator information from a hierarchy of
2061      .em files and put them in a database form available for
2062      individual scripts would be extremely useful -- this tool should
2063      be extensible so that users can use it to, say, build ordered
2064      hierarchies of their EmPy files by detecting contextual
2065      information like application-specific links to other EmPy
2066      documents.
2067
2068    - Extensions of the basic EmPy concepts to projects for other
2069      interpreted languages, such as Java, Lua, Ruby, and/or Perl.
2070
2071    - Ignore 'SystemExit' when doing error handling, letting the
2072      exception propagate up?  So far no one seems to worry about
2073      this; deliberately exiting early in a template seems to be an
2074      unlikely occurrence.  (Furthermore, there are the 'os.abort' and
2075      'os._exit' facilities for terminating without exception
2076      propagation.)
2077
2078    - A new markup which is the equivalent of '$...:...$' in source
2079      control systems, where the left-hand portion represents a
2080      keyword and the right-hand portion represents its value which is
2081      substituted in by the EmPy system.
2082
2083    - The ability to obtain the filename (if relevant) and mode of the
2084      primary output file.
2085
2086    - The ability to redirect multiple streams of output; not
2087      diversions, but rather the ability to write to one file and then
2088      another.  Since output would be under the EmPy script's control,
2089      this would imply a useful --no-output option, where by default
2090      no output is written.  This would also suggest the usefulness of
2091      all the output file delegates (diversions, filters, abstract
2092      files, etc.) passing unrecognized method calls all the way down
2093      to the underlying file object.
2094
2095    - In addition to the em.py script, an additional support library
2096      (non-executable) should be included which includes ancillary
2097      functionality for more advanced features, but which is not
2098      necessary to use EmPy in its basic form as a standalone
2099      executable.  Such features would include things like
2100      significator processing, metadata scanning, and advanced
2101      prompting systems.
2102
2103
2104Release history
2105
2106    - 3.3.3; 2017 Feb 12.  Fix for 'defined' call.
2107
2108    - 3.3.2; 2014 Jan 24.  Additional fix for source compatibility
2109      between 2.x and 3.0.
2110
2111    - 3.3.1; 2014 Jan 22.  Source compatibility for 2.x and 3.0;
2112      1.x and Jython compatibility dropped.
2113
2114    - 3.3; 2003 Oct 27.  Custom markup '@<...>'; remove separate
2115      pseudomodule instance for greater transparency; deprecate
2116      'interpreter' attribute of pseudomodule; deprecate auxiliary
2117      class name attributes associated with pseudomodule in
2118      preparation for separate support library in 4.0; add
2119      --no-callback-error and --no-bangpath-processing command line
2120      options; add 'atToken' hook.
2121
2122    - 3.2; 2003 Oct 7.  Reengineer hooks support to use hook
2123      instances; add -v option; add --relative-path option; reversed
2124      PEP 317 style; modify Unicode support to give less confusing
2125      errors in the case of unknown encodings and error handlers;
2126      relicensed under LGPL.
2127
2128    - 3.1.1; 2003 Sep 20.  Add literal '@"..."' markup; add
2129      --pause-at-end command line option; fix improper globals
2130      collision error via the 'sys.stdout' proxy.
2131
2132    - 3.1; 2003 Aug 8.  Unicode support (Python 2.0 and above); add
2133      Document and Processor helper classes for processing
2134      significators; add --no-prefix option for suppressing all
2135      markups.
2136
2137    - 3.0.4; 2003 Aug 7.  Implement somewhat more robust lvalue
2138      parsing for '@[for]' construct (thanks to Beni Cherniavsky for
2139      inspiration).
2140
2141    - 3.0.3; 2003 Jul 9.  Fix bug regarding recursive tuple unpacking
2142      using '@[for]'; add 'empy.saveGlobals', 'empy.restoreGlobals',
2143      and 'empy.defined' functions.
2144
2145    - 3.0.2; 2003 Jun 19.  '@?' and '@!' markups for changing the
2146      current context name and line, respectively; add 'update' method
2147      to interpreter; new and renamed context operations,
2148      'empy.setContextName', 'empy.setContextLine',
2149      'empy.pushContext', 'empy.popContext'.
2150
2151    - 3.0.1; 2003 Jun 9.  Fix simple bug preventing command line
2152      preprocessing directives (-I, -D, -E, -F, -P) from executing
2153      properly; defensive PEP 317 compliance [defunct].
2154
2155    - 3.0; 2003 Jun 1.  Control markups with '@[...]'; remove
2156      substitutions (use control markups instead); support
2157      '@(...?...!...)' for conditional expressions in addition to the
2158      now-deprecated '@(...?...:...)' variety; add acknowledgements
2159      and glossary sections to documentation; rename buffering option
2160      back to -b; add -m option and 'EMPY_PSEUDO' environment variable
2161      for changing the pseudomodule name; add -n option and
2162      'EMPY_NO_OVERRIDE' environment variable for suppressing
2163      'sys.stdout' proxy; rename main error class to 'Error'; add
2164      standalone 'expand' function; add --binary and --chunk-size
2165      options; reengineer parsing system to use Tokens for easy
2166      extensibility; safeguard curly braces in simple expressions
2167      (meaningless in Python and thus likely a typographical error) by
2168      making them a parse error; fix bug involving custom Interpreter
2169      instances ignoring globals argument; distutils support.
2170
2171    - 2.3; 2003 Feb 20.  Proper and full support for concurrent and
2172      recursive interpreters; protection from closing the true stdout
2173      file object; detect edge cases of interpreter globals or
2174      'sys.stdout' proxy collisions; add globals manipulation
2175      functions 'empy.getGlobals', 'empy.setGlobals', and
2176      'empy.updateGlobals' which properly preserve the 'empy'
2177      pseudomodule; separate usage info out into easily accessible
2178      lists for easier presentation; have -h option show simple usage
2179      and -H show extended usage; add 'NullFile' utility class.
2180
2181    - 2.2.6; 2003 Jan 30.  Fix a bug in the 'Filter.detach' method
2182      (which would not normally be called anyway).
2183
2184    - 2.2.5; 2003 Jan 9.  Strip carriage returns out of executed code
2185      blocks for DOS/Windows compatibility.
2186
2187    - 2.2.4; 2002 Dec 23.  Abstract Filter interface to use methods
2188      only; add '@[noop: ...]' substitution for completeness and block
2189      commenting [defunct].
2190
2191    - 2.2.3; 2002 Dec 16.  Support compatibility with Jython by
2192      working around a minor difference between CPython and Jython in
2193      string splitting.
2194
2195    - 2.2.2; 2002 Dec 14.  Include better docstrings for pseudomodule
2196      functions; segue to a dictionary-based options system for
2197      interpreters; add 'empy.clearAllHooks' and 'empy.clearGlobals';
2198      include a short documentation section on embedding interpreters;
2199      fix a bug in significator regular expression.
2200
2201    - 2.2.1; 2002 Nov 30.  Tweak test script to avoid writing
2202      unnecessary temporary file; add 'Interpreter.single' method;
2203      expose 'evaluate', 'execute', 'substitute' [defunct], and
2204      'single' methods to the pseudomodule; add (rather obvious)
2205      'EMPY_OPTIONS' environment variable support; add
2206      'empy.enableHooks' and 'empy.disableHooks'; include optimization
2207      to transparently disable hooks until they are actually used.
2208
2209    - 2.2; 2002 Nov 21.  Switched to -V option for version
2210      information; 'empy.createDiversion' for creating initially empty
2211      diversion; direct access to diversion objects with
2212      'empy.retrieveDiversion'; environment variable support; removed
2213      --raw long argument (use --raw-errors instead); added quaternary
2214      escape code (well, why not).
2215
2216    - 2.1; 2002 Oct 18.  'empy.atExit' registry separate from hooks to
2217      allow for normal interpreter support; include a benchmark sample
2218      and test.sh verification script; expose 'empy.string' directly;
2219      -D option for explicit defines on command line; remove
2220      ill-conceived support for '@else:' separator in '@[if ...]'
2221      substitution [defunct]; handle nested substitutions properly
2222      [defunct]; '@[macro ...]' substitution for creating recallable
2223      expansions [defunct].
2224
2225    - 2.0.1; 2002 Oct 8.  Fix missing usage information; fix
2226      after_evaluate hook not getting called; add 'empy.atExit' call
2227      to register values.
2228
2229    - 2.0; 2002 Sep 30.  Parsing system completely revamped and
2230      simplified, eliminating a whole class of context-related bugs;
2231      builtin support for buffered filters; support for registering
2232      hooks; support for command line arguments; interactive mode with
2233      -i; significator value extended to be any valid Python
2234      expression.
2235
2236    - 1.5.1; 2002 Sep 24.  Allow '@]' to represent unbalanced close
2237      brackets in '@[...]' markups [defunct].
2238
2239    - 1.5; 2002 Sep 18.  Escape codes ('@\...'); conditional and
2240      repeated expansion substitutions [defunct; replaced with control
2241      markups]; fix a few bugs involving files which do not end in
2242      newlines.
2243
2244    - 1.4; 2002 Sep 7.  Fix bug with triple quotes; collapse
2245      conditional and protected expression syntaxes into the single
2246      generalized '@(...)' notation; 'empy.setName' and 'empy.setLine'
2247      functions [deprecated]; true support for multiple concurrent
2248      interpreters with improved sys.stdout proxy; proper support for
2249      'empy.expand' to return a string evaluated in a subinterpreter
2250      as intended; merged Context and Parser classes together, and
2251      separated out Scanner functionality.
2252
2253    - 1.3; 2002 Aug 24.  Pseudomodule as true instance; move toward
2254      more verbose (and clear) pseudomodule functions; fleshed out
2255      diversion model; filters; conditional expressions; protected
2256      expressions; preprocessing with -P (in preparation for
2257      possible support for command line arguments).
2258
2259    - 1.2; 2002 Aug 16.  Treat bangpaths as comments; 'empy.quote' for
2260      the opposite process of 'empy.expand'; significators ('@%...'
2261      sequences); -I option; -f option; much improved documentation.
2262
2263    - 1.1.5; 2002 Aug 15.  Add a separate 'invoke' function that can be
2264      called multiple times with arguments to simulate multiple runs.
2265
2266    - 1.1.4; 2002 Aug 12.  Handle strings thrown as exceptions
2267      properly; use getopt to process command line arguments; cleanup
2268      file buffering with AbstractFile; very slight documentation and
2269      code cleanup.
2270
2271    - 1.1.3; 2002 Aug 9.  Support for changing the prefix from within
2272      the 'empy' pseudomodule.
2273
2274    - 1.1.2; 2002 Aug 5.  Renamed buffering option [defunct], added -F
2275      option for interpreting Python files from the command line,
2276      fixed improper handling of exceptions from command line options
2277      (-E, -F).
2278
2279    - 1.1.1; 2002 Aug 4.  Typo bugfixes; documentation clarification.
2280
2281    - 1.1; 2002 Aug 4.  Added option for fully buffering output
2282      (including file opens), executing commands through the command
2283      line; some documentation errors fixed.
2284
2285    - 1.0; 2002 Jul 23.  Renamed project to EmPy.  Documentation and
2286      sample tweaks; added 'empy.flatten'.  Added -a option.
2287
2288    - 0.3; 2002 Apr 14.  Extended "simple expression" syntax,
2289      interpreter abstraction, proper context handling, better error
2290      handling, explicit file inclusion, extended samples.
2291
2292    - 0.2; 2002 Apr 13.  Bugfixes, support non-expansion of Nones,
2293      allow choice of alternate prefix.
2294
2295    - 0.1.1; 2002 Apr 12.  Bugfixes, support for Python 1.5.x, add -r
2296      option.
2297
2298    - 0.1; 2002 Apr 12.  Initial early access release.
2299
2300
2301Author
2302
2303    This module was written by "Erik Max Francis",
2304    http://www.alcyone.com/max/.  If you use this software, have
2305    suggestions for future releases, or bug reports, "I'd love to hear
2306    about it", mailto:software@alcyone.com.
2307
2308    Even if you try out EmPy for a project and find it unsuitable, I'd
2309    like to know what stumbling blocks you ran into so they can
2310    potentially be addressed in a future version.
2311
2312
2313Version
2314
2315    Version 3.3.3 $Date: 2014-01-24 13:39:38 -0800 (Fri, 24 Jan 2014) $ $Author: max $
2316