Copyright (c) 1980 The Regents of the University of California.
All rights reserved.

%sccs.include.redist.roff%

@(#)ch7.n 6.3 (Berkeley) 04/17/91

.\" $Header: ch7.n,v 1.3 83/07/01 11:22:58 layer Exp $
.Lc The Lisp Reader 7
.sh 2 Introduction \n(ch 1
.pp
The
.i read
function is responsible for converting a stream of characters into a Lisp expression.
.i Read
is table driven and the table it uses is called a
.i readtable .
The
.i print
function does the inverse of
.i read ;
it converts a Lisp expression into a stream of characters.
Typically the conversion is done in such a way that if that stream of characters were read by
.i read ,
the result would be an expression equal to the one
.i print
was given.
.i Print
must also refer to the readtable in order to determine how to format its output.
The
.i explode
function, which returns a list of characters rather than printing them, must also refer to the readtable.
.pp
A readtable is created with the
.i makereadtable
function, modified with the
.i setsyntax
function and interrogated with the
.i getsyntax
function.
The structure of a readtable is hidden from the user - a readtable should only be manipulated with the three functions mentioned above.
.pp
There is one distinguished readtable called the
.i current
.i readtable
whose value determines what
.i read ,
.i print
and
.i explode
do.
The current readtable is the value of the symbol
.i readtable .
Thus it is possible to rapidly change the current syntax by lambda binding a different readtable to the symbol
.i readtable .
When the binding is undone, the syntax reverts to its old form.
.sh +0 Syntax Classes
.pp
The readtable describes how each of the 128 ascii characters should be treated by the reader and printer.
Each character belongs to a
.i syntax
.i class
which has three properties:
.ip character class -
Tells what the reader should do when it sees this character.
There are a large number of character classes.
They are described below.
.ip separator -
Most types of tokens the reader constructs are one character long.
Four token types have an arbitrary length: number (1234), symbol print name (franz), escaped symbol print name (|franz|), and string ("franz").
The reader can easily determine when it has come to the end of one of the last two types: it just looks for the matching delimiter (| or ").
When the reader is reading a number or symbol print name, it stops reading when it comes to a character with the
.i separator
property.
The separator character is pushed back into the input stream and will be the first character read when the reader is called again.
.ip escape -
Tells the printer when to put escapes in front of, or around, a symbol whose print name contains this character.
There are three possibilities: always escape a symbol with this character in it, only escape a symbol if this is the only character in the symbol, and only escape a symbol if this is the first character in the symbol.
[note: The printer will always escape a symbol which, if printed out, would look like a valid number.]
.pp
When the Lisp system is built, Lisp code is added to a C-coded kernel and the result becomes the standard lisp system.
The readtable present in the C-coded kernel, called the
.i raw
.i readtable ,
contains the bare necessities for reading in Lisp code.
During the construction of the complete Lisp system, a copy is made of the raw readtable and then the copy is modified by adding macro characters.
The result is what is called the
.i standard
.i readtable .
When a new readtable is created with
.i makereadtable ,
a copy is made of either the raw readtable or the current readtable (which is likely to be the standard readtable).
.sh +0 Reader Operations
.pp
The reader has a very simple algorithm.
It is either
.i scanning
for a token,
.i collecting
a token, or
.i processing
a token.
Scanning involves reading characters and throwing away those which don't start tokens (such as blanks and tabs).
Collecting means gathering the characters which make up a token into a buffer.
Processing may involve creating symbols, strings, lists, fixnums, bignums or flonums or calling a user written function called a character macro.
.pp
The components of the syntax class determine when the reader switches between the scanning, collecting and processing states.
The reader will continue scanning as long as the character class of the characters it reads is
.i cseparator .
When it reads a character whose character class is not
.i cseparator
it stores that character in its buffer and begins the collecting phase.
.pp
If the character class of that first character is
.i ccharacter ,
.i cnumber ,
.i cperiod ,
or
.i csign ,
then it will continue collecting until it runs into a character whose syntax class has the
.i separator
property.
(That last character will be pushed back into the input buffer and will be the first character read next time.)
Now the reader goes into the processing phase, checking to see if the token it read is a number or symbol.
It is important to note that after the first character is collected the component of the syntax class which tells the reader to stop collecting is the
.i separator
property, not the character class.
.pp
If the character class of the character which stopped the scanning is not
.i ccharacter ,
.i cnumber ,
.i cperiod ,
or
.i csign ,
then the reader processes that character immediately.
The character classes
.i csingle-macro ,
.i csingle-splicing-macro ,
and
.i csingle-infix-macro
will act like
.i ccharacter
if the following character is not a
.i separator .
The processing which is done for a given character class is described in detail in the next section.
.sh +0 Character Classes
.de Cc
.tl '\\$1''raw readtable:\\$2'
.tl '''standard readtable:\\$3'
..
.pc
.Cc ccharacter A-Z a-z ^H !#$%&*,/:;<=>?@^_`{}~ A-Z a-z ^H !$%&*/:;<=>?@^_{}~
.pc %
A normal character.
.Cc cnumber 0-9 0-9
This type is a digit.
The syntax for an integer (fixnum or bignum) is a string of
.i cnumber
characters optionally followed by a
.i cperiod .
If the digits are not followed by a
.i cperiod ,
then they are interpreted in base
.i ibase
which must be eight or ten.
The syntax for a floating point number is zero or more
.i cnumber 's
followed by a
.i cperiod
and then followed by one or more
.i cnumber 's.
A floating point number may also be an integer or floating point number followed by 'e' or 'd', an optional '+' or '-' and then zero or more
.i cnumber 's.
.Cc csign +- +-
A leading sign for a number.
No other characters should be given this class.
.Cc cleft-paren ( (
A left parenthesis.
Tells the reader to begin forming a list.
.Cc cright-paren ) )
A right parenthesis.
Tells the reader that it has reached the end of a list.
.Cc cleft-bracket [ [
A left bracket.
Tells the reader that it should begin forming a list.
See the description of
.i cright-bracket
for the difference between
.i cleft-bracket
and
.i cleft-paren .
.Cc cright-bracket ] ]
A right bracket.
A
.i cright-bracket
finishes the formation of the current list and all enclosing lists until it finds one which begins with a
.i cleft-bracket
or until it reaches the top level list.
.Cc cperiod . .
The period is used to separate the elements of a cons cell [e.g. (a . (b . nil)) is the same as (a b)].
.i cperiod
is also used in numbers as described above.
.Cc cseparator ^I-^M esc space ^I-^M esc space
Separates tokens.
When the reader is scanning, these characters are passed over.
Note: there is a difference between the
.i cseparator
character class and the
.i separator
property of a syntax class.
.Cc csingle-quote \\' \\'
This causes
.i read
to be called recursively and the list (quote <value read>) to be returned.
.Cc csymbol-delimiter | |
This causes the reader to begin collecting characters and to stop only when another identical
.i csymbol-delimiter
is seen.
The only way to escape a
.i csymbol-delimiter
within a symbol name is with a
.i cescape
character.
The collected characters are converted into a string which becomes the print name of a symbol.
If a symbol with an identical print name already exists, then no new symbol is allocated; the existing symbol is used instead.
.Cc cescape \e \e
This causes the next character read in to be treated as a
.b vcharacter .
A character whose syntax class is
.b vcharacter
has a character class
.i ccharacter
and does not have the
.i separator
property so it will not separate symbols.
.Cc cstring-delimiter """" """"
This is the same as
.i csymbol-delimiter
except the result is returned as a string instead of a symbol.
.Cc csingle-character-symbol none none
This returns a symbol whose print name is the single character which has been collected.
.Cc cmacro none `,
The reader calls the macro function associated with this character and the current readtable, passing it no arguments.
The result of the macro is added to the structure the reader is building, just as if that form were directly read by the reader.
More details on macros are provided below.
.Cc csplicing-macro none #;
A
.i csplicing-macro
differs from a
.i cmacro
in the way the result is incorporated in the structure the reader is building.
A
.i csplicing-macro
must return a list of forms (possibly empty).
The reader acts as if it read each element of the list itself without the surrounding parentheses.
.Cc csingle-macro none none
This causes the reader to check the next character.
If it is a
.i cseparator
then this acts like a
.i cmacro .
Otherwise, it acts like a
.i ccharacter .
.Cc csingle-splicing-macro none none
This is triggered like a
.i csingle-macro ;
however, the result is spliced in like a
.i csplicing-macro .
.Cc cinfix-macro none none
This differs from a
.i cmacro
in that the macro function is passed a form representing what the reader has read so far.
The result of the macro replaces what the reader had read so far.
.Cc csingle-infix-macro none none
This differs from the
.i cinfix-macro
in that the macro will only be triggered if the character following the
.i csingle-infix-macro
character is a
.i cseparator .
.Cc cillegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout
These characters cause the reader to signal an error if read.
.sh +0 Syntax Classes
.pp
The readtable maps each character into a syntax class.
The syntax class contains three pieces of information: the character class, whether this is a separator, and the escape properties.
The first two properties are used by the reader, the last by the printer (and
.i explode ).
The initial lisp system has the following syntax classes defined.
The user may add syntax classes with
.i add-syntax-class .
For each syntax class, we list the properties of the class and which characters have this syntax class by default.
More information about each syntax class can be found under the description of the syntax class's character class.
.de Sy
.(b
.tl '\\$1''raw readtable:\\$2'
.tl '\\$4''standard readtable:\\$3'
.tl '\\$5'''
.)b
..
.pc
.Sy vcharacter A-Z a-z ^H !#$%&*,/:;<=>?@^_`{}~ A-Z a-z ^H !$%&*/:;<=>?@^_{}~ ccharacter
.pc %
.Sy vnumber 0-9 0-9 cnumber
.Sy vsign +- +- csign
.Sy vleft-paren ( ( cleft-paren escape-always separator
.Sy vright-paren ) ) cright-paren escape-always separator
.Sy vleft-bracket [ [ cleft-bracket escape-always separator
.Sy vright-bracket ] ] cright-bracket escape-always separator
.Sy vperiod . . cperiod escape-when-unique
.Sy vseparator ^I-^M esc space ^I-^M esc space cseparator escape-always separator
.Sy vsingle-quote \\' \\' csingle-quote escape-always separator
.Sy vsymbol-delimiter | | csymbol-delimiter escape-always
.Sy vescape \e \e cescape escape-always
.Sy vstring-delimiter """" """" cstring-delimiter escape-always
.Sy vsingle-character-symbol none none csingle-character-symbol separator
.Sy vmacro none `, cmacro escape-always separator
.Sy vsplicing-macro none #; csplicing-macro escape-always separator
.Sy vsingle-macro none none csingle-macro escape-when-unique
.Sy vsingle-splicing-macro none none csingle-splicing-macro escape-when-unique
.Sy vinfix-macro none none cinfix-macro escape-always separator
.Sy vsingle-infix-macro none none csingle-infix-macro escape-when-unique
.Sy villegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout cillegal escape-always separator
.sh +0 Character Macros
.pp
Character macros are user written functions which are executed during the reading process.
The value returned by a character macro may or may not be used by the reader, depending on the type of macro and the value returned.
Character macros are always attached to a single character with the
.i setsyntax
function.
.sh +1 Types
There are three types of character macros: normal, splicing and infix.
These types differ in the arguments they are given or in what is done with the result they return.
.sh +1 Normal
.pp
A normal macro is passed no arguments.
The value returned by a normal macro is simply used by the reader as if it had read the value itself.
Here is an example of a macro which returns the abbreviation for a given state.
.Eb
-> (de\kAfun stateabbrev nil
\h'|\nAu'(cdr (assq (read) '((california . ca) (pennsylvania . pa)))))
stateabbrev
-> (setsyntax '\e! 'vmacro 'stateabbrev)
t
-> '( ! california ! wyoming ! pennsylvania)
(ca nil pa)
.Ee
Notice what happened to ! wyoming.
Since it wasn't in the table, the associated function returned nil.
The creator of the macro may have wanted to leave the list alone in such a case, but could not with this type of reader macro.
The splicing macro, described next, allows a character macro function to return a value that is ignored.
.sh +0 Splicing
.pp
The value returned from a splicing macro must be a list or nil.
If the value is nil, then the value is ignored, otherwise the reader acts as if it read each object in the list.
Usually the list only contains one element.
If the reader is reading at the top level (i.e. not collecting elements of a list), then it is illegal for a splicing macro to return more than one element in the list.
The major advantage of a splicing macro over a normal macro is the ability of the splicing macro to return nothing.
The comment character (usually ;) is a splicing macro bound to a function which reads to the end of the line and always returns nil.
Here is the previous example written as a splicing macro:
.Eb
-> (de\kAfun stateabbrev nil
\h'|\nAu'(\kC(lam\kBbda (value)
\h'|\nBu'(cond \kA(value (list value))
\h'|\nAu'(t nil)))
\h'|\nCu'(cdr (assq (read) '((california . ca) (pennsylvania . pa))))))
-> (setsyntax '! 'vsplicing-macro 'stateabbrev)
-> '(!pennsylvania ! foo !california)
(pa ca)
-> '!foo !bar !pennsylvania
pa
->
.Ee
.sh +0 Infix
.pp
Infix macros are passed a
.i tconc
structure representing what has been read so far.
Briefly, a tconc structure is a single list cell whose car points to a list and whose cdr points to the last list cell in that list.
The interpretation by the reader of the value returned by an infix macro depends on whether the macro is called while the reader is constructing a list or whether it is called at the top level of the reader.
If the macro is called while a list is being constructed, then the value returned should be a tconc structure.
The car of that structure replaces the list of elements that the reader has been collecting.
If the macro is called at top level, then it will be passed the value nil, and the value it returns should either be nil or a tconc structure.
If the macro returns nil, then the value is ignored and the reader continues to read.
If the macro returns a tconc structure of one element (i.e. whose car is a list of one element), then that single element is returned as the value of
.i read .
If the macro returns a tconc structure of more than one element, then that list of elements is returned as the value of
.i read .
.Eb
-> (de\kAfun plusop (x)
\h'|\nAu'(cond \kB((null x) (tconc nil '\e+))
\h'|\nBu'(t (lconc nil (list 'plus (caar x) (read))))))
plusop
-> (setsyntax '\e+ 'vinfix-macro 'plusop)
t
-> '(a + b)
(plus a b)
-> '+
|+|
->
.Ee
.sh -1 Invocations
.pp
There are three different circumstances in which a macro function may be triggered.
.ip Always -
Whenever the macro character is seen, the macro should be invoked.
This is accomplished by using the character classes
.i cmacro ,
.i csplicing-macro ,
or
.i cinfix-macro ,
and by using the
.i separator
property.
The syntax classes
.b vmacro ,
.b vsplicing-macro ,
and
.b vinfix-macro
are defined this way.
.ip When first -
The macro should only be triggered when the macro character is the first character found after the scanning process.
A syntax class for a
.i when
.i first
macro would be defined using
.i cmacro ,
.i csplicing-macro ,
or
.i cinfix-macro
and not including the
.i separator
property.
.ip When unique -
The macro should only be triggered when the macro character is the only character collected in the token collection phase of the reader, i.e. the macro character is preceded by zero or more
.i cseparator s
and followed by a
.i separator .
A syntax class for a
.i when
.i unique
macro would be defined using
.i csingle-macro ,
.i csingle-splicing-macro ,
or
.i csingle-infix-macro
and not including the
.i separator
property.
The syntax classes so defined are
.b vsingle-macro ,
.b vsingle-splicing-macro ,
and
.b vsingle-infix-macro .
.sh -1 Functions
.Lf setsyntax 's_symbol 's_synclass ['ls_func]
.Wh
ls_func is the name of a function or a lambda body.
.Re
t
.Se
S_symbol should be a symbol whose print name is only one character.
The syntax class for that character is set to s_synclass in the current readtable.
If s_synclass is a class that requires a character macro, then ls_func must be supplied.
.No
The symbolic syntax codes are new to Opus 38.
For compatibility, s_synclass can be one of the fixnum syntax codes which appeared in older versions of the
.Fr
Manual.
This compatibility is only temporary: existing code which uses the fixnum syntax codes should be converted.
.Lf getsyntax 's_symbol
.Re
the syntax class of the first character of s_symbol's print name.
s_symbol's print name must be exactly one character long.
.No
This function is new to Opus 38.
It supersedes (status syntax) which no longer exists.
.Lf add-syntax-class 's_synclass 'l_properties
.Re
s_synclass
.Se
Defines the syntax class s_synclass to have properties l_properties.
The list l_properties should contain one of the character classes mentioned above.
l_properties may contain one of the escape properties:
.i escape-always ,
.i escape-when-unique ,
or
.i escape-when-first .
l_properties may contain the
.i separator
property.
After a syntax class has been defined with
.i add-syntax-class ,
the
.i setsyntax
function can be used to give characters that syntax class.
.Eb
; Define a non-separating macro character.
; This type of macro character is used in UCI-Lisp, and
; it corresponds to a FIRST MACRO in Interlisp
-> (add-syntax-class 'vuci-macro '(cmacro escape-when-first))
vuci-macro
->
.Ee
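.pp
The scanning and collecting phases described under Reader Operations can be modelled outside of Lisp.
The following is a minimal sketch in Python, not the reader's actual implementation.
The names SyntaxClass, make_table and tokens are hypothetical, the table covers only a handful of characters, and macros, strings, escapes and the printer's escape properties are left out.
It is meant only to show that collection is ended by the separator property rather than by the character class.

```python
from dataclasses import dataclass

# Character classes whose first character starts a multi-character token.
COLLECTING = {"ccharacter", "cnumber", "cperiod", "csign"}


@dataclass(frozen=True)
class SyntaxClass:
    """Hypothetical stand-in for a syntax class (escape properties omitted)."""
    char_class: str
    separator: bool = False


def make_table():
    """Build a tiny table loosely modelled on the standard readtable."""
    table = {}
    for ch in "abcdefghijklmnopqrstuvwxyz":
        table[ch] = SyntaxClass("ccharacter")
    for ch in "0123456789":
        table[ch] = SyntaxClass("cnumber")
    for ch in "+-":
        table[ch] = SyntaxClass("csign")
    table["."] = SyntaxClass("cperiod")
    for ch in " \t\n":
        table[ch] = SyntaxClass("cseparator", separator=True)
    table["("] = SyntaxClass("cleft-paren", separator=True)
    table[")"] = SyntaxClass("cright-paren", separator=True)
    return table


def tokens(text, table):
    """Split text into tokens; characters missing from table raise KeyError."""
    out = []
    i = 0
    while i < len(text):
        cls = table[text[i]]
        if cls.char_class == "cseparator":
            i += 1                      # scanning: skip this character
        elif cls.char_class in COLLECTING:
            start = i
            i += 1
            # collecting: only the separator property ends the token
            while i < len(text) and not table[text[i]].separator:
                i += 1
            out.append(text[start:i])   # the separator itself is left unread
        else:
            out.append(text[i])         # any other class is processed at once
            i += 1
    return out


table = make_table()
print(tokens("(foo 12 a+b)", table))    # ['(', 'foo', '12', 'a+b', ')']

# A "when first" style character: class cmacro, no separator property.
table["!"] = SyntaxClass("cmacro")
print(tokens("a!b !c", table))          # ['a!b', '!', 'c']
```

In the second call the character ! is absorbed into a!b because it lacks the separator property, but it stands alone when it is the first character found after scanning, which is the behaviour described above for a when first macro.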