This package provides a parser-generator, somewhat styled after
\texttt{yacc} or the many programs available for use with other
languages. You present it with a phrase structure grammar and it
generates a set of tables that can then be used by the function
\texttt{yyparse} to read in material in the syntax that you specified.
Internally it uses a very well established technique known as ``LALR'',
which takes the grammar and derives the description of a stack automaton
that can accept it. Details of the procedure can be found in standard
books on compiler construction, such as the one by Aho, Lam, Sethi and
Ullman.

At the time of writing this explanation the code is not in its final form,
so this section describes the current state and includes a few notes on
what might change in the future.

Building a parser is done in Reduce symbolic mode, so say "\texttt{symbolic;}"
or "\texttt{lisp;}" before starting your work.

To use the code here you call the function \texttt{lalr\_create\_parser},
giving it two arguments. The first indicates precedence information and
will be described later: for now just pass the value \var{nil}.
The second argument is a list of productions, and the first one of these
is taken to be the top-level target for the whole grammar.

Each production is of the form
\begin{verbatim}
  (LHS ((rhs1.1 rhs1.2 ...) a1.1 a1.2 ...)
       ((rhs2.1 rhs2.2 ...) a2.1 a2.2 ...)
       ...)
\end{verbatim}
\noindent which in regular publication style for grammars might be interpreted
as meaning
\begin{align*}
\text{LHS} & \Rightarrow & \text{rhs}_{1,1} \hspace{0.5em} \text{rhs}_{1,2} \ldots & \left\{ \text{a}_{1,1} \hspace{0.5em} \text{a}_{1,2} \ldots \right\} \\
 & \hspace{0.5em}| & \text{rhs}_{2,1} \hspace{0.5em} \text{rhs}_{2,2} \ldots & \left\{ \text{a}_{2,1} \hspace{0.5em} \text{a}_{2,2} \ldots \right\} \\
 & \ldots \\
 & \hspace{0.5em};
\end{align*}
The various lines specify different options for what the left hand side
(non-terminal symbol) might correspond to, while the items within the
braces are semantic actions that get obeyed or evaluated when the
production rule is used.

Each LHS is treated as a non-terminal symbol and is specified as a simple
name. Note that by default the Reduce parser folds characters within
names to lower case, so it is best to choose names for non-terminals
that are unambiguous even when case-folded; I would like to establish a
convention that in source code they are written in capitals.

The RHS items may be either non-terminals (identified because they are
present in the left hand side of some production) or terminals. Terminal
symbols can be specified in two different ways.

The lexer has built-in recipes that decode certain sequences of characters
and return the special markers !:symbol, !:number, !:string and !:list for
commonly used cases. In these cases the variable yylval gets left set
to the associated data, so for instance in the case of !:symbol it gets set
to the particular symbol concerned.
The token type !:list is used for Lisp or rlisp-like notation where the
input contains \verb+'expression+ or \verb+`expression+,
so for instance the input \verb+`(a b c)+ leads to the lexer returning !:list and
yylval being set to \verb+(backquote (a b c))+. This treatment is specialised for
handling rlisp-like syntax.
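For instance, a production whose alternatives are a plain number or a
plain symbol, with semantic actions that tag the value the lexer left in
yylval, might be sketched as follows (the name \texttt{item} is purely
illustrative, and \verb+!$1+ accesses the value associated with the first
right hand side item, as in the fuller example later in this document):
\begin{verbatim}
  (item ((!:number) (list 'num !$1))
        ((!:symbol) (list 'sym !$1)))
\end{verbatim}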
Other terminals are indicated by writing a string. That may either
consist of characters that would otherwise form a symbol (i.e.\ a letter
followed by letters, digits and underscores) or a sequence of
non-alphanumeric characters. In the latter case, if a sequence of three or
more punctuation marks makes up a terminal then all the shorter prefixes
of it will also be grouped to form single entities. So if "<-->" is a
terminal then '<', '<-' and '<--' will each be parsed as single tokens, and
any of them that are not used as terminals will be classified as !:symbol.

As well as terminals and non-terminals (which are written as symbols or
strings) it is possible to write one of

\begin{tabular}{l l}
 ({\ttfamily OPT} \textit{s1} \textit{s2} \ldots) & 0 or 1 instances of the sequence \textit{s1}, \ldots \\
 ({\ttfamily STAR} \textit{s1} \textit{s2} \ldots) & 0, 1, 2, \ldots instances. \\
 ({\ttfamily PLUS} \textit{s1} \textit{s2} \ldots) & 1, 2, 3, \ldots instances. \\
 ({\ttfamily LIST} \textit{sep} \textit{s1} \textit{s2} \ldots) & like ({\ttfamily STAR} \textit{s1} \textit{s2} \ldots) but with the single item \\
 & \textit{sep} between each instance. \\
 ({\ttfamily LISTPLUS} \textit{sep} \textit{s1} \ldots) & like ({\ttfamily PLUS} \textit{s1} \ldots) but with \textit{sep} interleaved. \\
 ({\ttfamily OR} \textit{s1} \textit{s2} \ldots) & one or other of the tokens shown.
\end{tabular}

When the lexer processes input it will return a numeric code that identifies
the type of the item seen, so in a production one might write
\begin{verbatim}
  (!:symbol ":=" EXPRESSION)
\end{verbatim}
and as it recognises the first two tokens the lexer will return a numeric
code for !:symbol (and set yylval to the actual symbol as seen) and then
a numeric code that it allocates for ":=". In the latter case it will
also set yylval to the symbol !:!= in case that is useful.
%
Precedence can be set using \texttt{lalr\_precedence}. See examples below.
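These constructions can appear as items within an ordinary right hand
side. As a sketch (with \texttt{expr} standing for a non-terminal assumed
to be defined elsewhere in the grammar), a parenthesised comma-separated
argument list relying on the default semantic action might be written
\begin{verbatim}
  (arglist (("(" (LIST "," expr) ")")))
\end{verbatim}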
\subsection{Limitations}
\begin{enumerate}
\item Grammar rules and semantic actions are specified in fairly raw Lisp.
\item The lexer is hand-written and cannot readily be reconfigured for
  use with languages other than rlisp. For instance it has the use of "!"
  as a character escape built into it.
\end{enumerate}


\subsection{An example}
\begin{verbatim}
% Here I set up a sample grammar
%   S' -> S
%   S  -> C C    { }
%   C  -> "c" C  { }
%       | "d"    { }
% This is example 4.42 from Aho, Sethi and Ullman's Red Dragon book.
% It is example 4.54 in the more recent Purple book.
%
%
grammar := '(
  (s  ((cc cc) )                 % Use default semantic action here
  )
  (cc (("c" cc) (list 'c !$2))   % First production for C
      (("d")    'd )             % Second production for C
  ))$

parsertables := lalr_create_parser(nil, grammar)$

<< lex_init();
   yyparse() >>;
c c c d c d ;
\end{verbatim}

\endinput


A grammar is given as a list of rules. Each rule has a single non-terminal
and then a sequence of productions for it. Each production can optionally
be provided with an associated semantic action.

\begin{verbatim}
  GRAMMAR ::= "(" GTAIL
            ;

  GTAIL   ::= ")"                 % Sequence of rules
            | RULE GTAIL
            ;

  RULE    ::= "(" nonterminal RULETAIL
            ;

  RULETAIL::= ")"                 % Sequence of productions
            | PRODUCTION RULETAIL
            ;

  PRODUCTION ::= "(" "(" PT1 PT2
            ;

  PT1     ::= ")"                 % Symbols to make one production
            | nonterminal PT1
            | terminal PT1
            ;

  PT2     ::= ")"                 % Semantic actions as Lisp code
            | lisp-s-expression PT2
            ;
\end{verbatim}

\ttindextype{LALR\_PRECEDENCE}{operator}
Before specifying a grammar it is possible to declare the precedence and
associativity of some of the terminal symbols it will use. Here is an
example:
\begin{verbatim}
  lalr_precedence '("."
!:right "^" !:left ("*" "/") ("+" "-"));
\end{verbatim}
\noindent will arrange that the token ``\verb@.@'' has the highest precedence
and that it is left associative. Next comes ``\verb@^@'', which is right
associative.
Both ``\verb@*@'' and ``\verb@/@'' have the same precedence and both are
left associative, and finally ``\verb@+@'' and ``\verb@-@'' are also equal
in precedence but lower than ``\verb@*@''.

\ttindextype{LALR\_CONSTRUCT\_PARSER}{operator}
With those precedences in place one could specify a grammar by
\begin{verbatim}
  lalr_construct_parser '(
    (S ((!:symbol))
       ((!:number))
       ((S "." S))
       ((S "^" S))
       ((S "*" S))
       ((S "/" S))
       ((S "+" S))
       ((S "-" S))));
\end{verbatim}

The special markers \verb+!:symbol+ and \verb+!:number+ will match any
symbols or numbers -- for commentary on what counts as such see the
discussion of the lexer later on. The strings stand for fixed tokens, and
by virtue of their being used in the grammar the lexer will recognise them
specially.
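As in the earlier example, such a parser can then be exercised by
initialising the lexer and calling \texttt{yyparse}. The following is a
sketch only, and assumes that \texttt{lalr\_construct\_parser} leaves its
tables where \texttt{yyparse} will find them, as
\texttt{lalr\_create\_parser} did above:
\begin{verbatim}
  << lex_init();
     yyparse() >>;
  a + b * c ^ d ;
\end{verbatim}
With the precedences declared earlier this input should group as
\verb+a + (b * (c ^ d))+.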