1
2This package provides a parser-generator, somewhat styled after
3\texttt{yacc} or the many programs available for use with other
4languages. You present it with a phrase structure grammar and it
5generates a set of tables that can then be used by the function
6\texttt{yyparse} to read in material in the syntax that you specified.
7Internally it uses a very well established technique known ``LALR'' which
8takes the grammar are derives the description of a stack automaton that
9can accept it. Details of the procedure can be found in standard books
10on compiler construction, such as the one by Aho, Ullman Lam and Sethi.
11
12At the time of writing this explanation the code is not in its final form,
13so this will describe the current state and include a few notes on what
14might chaneg in the future.
15
16Building a parser is done in Reduce symbolic mode, so say "\texttt{symbolic;}"
17or "\texttt{lisp;}" before starting your work.
18
19To use the code here you use a function \texttt{lalr\_create\_parser},
20giving it two arguments. The first indicates precedence information and
21will be described later: for now just pass the value \var{nil}.
22The second argument is a list of productions, and the first one of these
23is taken to be the top-level target for the whole grammar.
24
25Each production is in the form
26\begin{verbatim}
27    (LHS   ((rhs1.1 rhs1.2 ...) a1.1 a1.2 ...)
28           ((rhs2.1 rhs2.1 ...) a2.1 a2.2 ...)
29           ...)
30\end{verbatim}
31\noindent which in regular publication style for grammars might be interpreted
32as meaning
33\begin{align*}
34\text{LHS} & \Rightarrow & \text{rhs}_{1,1} \hspace{0.5em} \text{rhs}_{1,2} \ldots  & \left\{ \text{a}_{1,1} \hspace{0.5em} \text{a}_{1,2} \ldots \right\} \\
35    & \hspace{0.5em}| & \text{rhs}_{2,1} \hspace{0.5em} \text{rhs}_{2,2} \ldots & \left\{ \text{a}_{2,1} \hspace{0.5em} \text{a}_{2,2} \ldots \right\} \\
36    & \ldots  \\
37    & \hspace{0.5em};
38\end{align*}
39The various lines specify different options for what the left hand side
40(non-terminal symbol) might correspond to, while the items within the
41braces are sematic actions that get obeyed or evaluated when the
42production ruls is used.
43
44Each LHS is treated as a non-terminal symbol and is specified as a simple
45name. Note that by default the Reduce parser will be folding characters
46within names to lower case and so it will be best to choose names for
47non-terminals that are unambiguous even when case-folded, but I would like
48to establish a convention that in source code they are written in capitals.
49
50The RHS items may be either non-terminals (identified because they are
51present in the left hand side of some production) or terminals. Terminal
52symbols can be specified in two different ways.
53
54The lexer has built-in recipes that decode certain sequences of characters
55and return the special markers for !:symbol, !:number, !:string, !:list for
56commonly used cases. In these cases the variable yylval gets left set
57to associated data, so for instance in the case of !:symbol it gets set
58to the particular symbol concerned.
59The token type :list is used for Lisp or rlisp-like notation where the
60input contains
61    'expression
62or  `expression
63so for instance the input `(a b c) leads to the lexer returning !:list and
64yylvel being set to (backquote (a b c)). This treatment is specialised for
65handling rlisp-like syntax.
66
67Other terminals are indicated by writing a string. That may either
68consist of characters that would otherwise form a symbol (ie a letter
69followed by letters, digits and underscores) or a sequence of
70non-alphanumeric characters. In the latter case if a sequence of three or
71more punctuation marks make up a terminal then all the shorter prefixes
72of it will also be grouped to form single entities. So if "<-->" is a
73terminal then '<', '<-' and '<--' will each by parsed as single tokens, and
74any of them that are not used as terminals will be classified as !:symbol.
75
76As well as terminals and non-terminals (which are writtent as symbols or
77strings) it is possible to write one of
78
79\begin{tabular}{l l}
80    ({\ttfamily OPT} \textit{s1} \textit{s2} \ldots)      &     0 or 1 instances of the sequence \textit{s1}, \ldots \\
81    ({\ttfamily STAR} \textit{s1} \textit{s2} \ldots)     &     0, 1, 2, \ldots instances. \\
82    ({\ttfamily PLUS} \textit{s1} \textit{s2} \ldots)     &     1, 2, 3, \ldots instances. \\
83    ({\ttfamily LIST} \textit{sep} \textit{s1} \textit{s2} \ldots) &     like ({\ttfamily STAR} \textit{s1} \textit{s2} \ldots) but with the single item \\
84                            &     \textit{sep} between each instance. \\
85    ({\ttfamily LISTPLUS} \textit{sep} \textit{s1} \ldots) &    like ({\ttfamily PLUS} \textit{s2} \ldots) but with \textit{sep} interleaved. \\
86    ({\ttfamily OR} \textit{s1} \textit{s2} \ldots)        &    one or other of the tokens shown.
87\end{tabular}
88
89When the lexer processes input it will return a numeric code that identifies
90the type of the item seen, so in a production one might write
91    (!:symbol ":=" EXPRESSION)
92and as it recognises the first two tokens the lexer will return a numeric
93code for !:symbol (and set yylval to the actual symbol as seen) and then
94a numeric code that it allocates for ":=". In the latter case it will
95also set yylval to the symbol !:!= in case that is useful.
96%
97Precedence can be set using \texttt{lalr\_precedence}. See examples below.
98
99
100
101\subsection{Limitations}
102\begin{enumerate}
103\item Grammar rules and semantic actions are specified in fairly raw Lisp.
104\item The lexer is hand-written and can not readily be reconfigured for
105    use with languages other than rlisp. For instance it has use of "!"
106    as a character escape built into it.
107\end{enumerate}
108
109
110\subsection{An example}
111\begin{verbatim}
112% Here I set up a sample grammar
113%    S' -> S
114%    S  -> C C        { }
115%    C  -> "c" C      { }
116%        | "d"        { }
117% This is example 4.42 from Aho, Sethi and Ullman's Red Dragon book.
118% It is example 4.54 in the more recent Purple book.
119%
120%
121grammar := '(
122  (s  ((cc cc)  )   % Use default semantic action here
123  )
124  (cc (("c" cc) (list 'c !$2))   % First production for C
125      (("d")    'd           )   % Second production for C
126  ))$
127
128parsertables := lalr_create_parser(nil, grammar)$
129
130<< lex_init();
131   yyparse() >>;
132c c c d c d ;
133\end{verbatim}
134
135\endinput
136
137
138A grammar is given as a list of rules. Each rule has a single non-terminal
139and then a seqence of productions for it. Each production can optionally
140be provided with an associated semantic action.
141
142\begin{verbatim}
143 GRAMMAR ::= "(" GTAIL
144         ;
145
146 GTAIL   ::= ")"         % Sequence of rules
147         |   RULE GTAIL
148         ;
149
150 RULE    ::= "(" nonterminal RULETAIL
151         ;
152
153 RULETAIL::= ")"         % Sequence of productions
154         | PRODUCTION RULETAIL
155         ;
156
157 PRODUCTION ::= "(" "(" PT1 PT2
158         ;
159
160 PT1     ::= ")"         % Symbols to make one production
161         | nonterminal PT1
162         | terminal PT1
163         ;
164
165 PT2     ::= ")"         % Semantic actions as Lisp code
166         | lisp-s-expressoin PT2
167         ;
168\end{verbatim}
169
170\ttindextype{LALR\_PRECEDENCE}{operator}
171Before specifying a grammar it is possible to declare the precedence and
172associativity of some of the terminal symbols it will use. Here is an
173example:
174\begin{verbatim}
175 lalr_precedence '("." !:right "^" !:left ("*" "/") ("+" "-"));
176\end{verbatim}
177\noindent will arrange that the token ``\verb@.@'' has highest precedence
178and that it is left associative. Next comes ``\verb@^@'' which is right
179associative.
180Both ``\verb@*@'' and ``\verb@/@'' have the same precedence and both are
181left associative, and finally ``\verb@+@'' and ``\verb@-@'' are also equal
182in precedence but lower than ``\verb@*@''.
183
184\ttindextype{LALR\_CONSTRUCT\_PARSER}{operator}
185With those precedences in place one could specify a grammar by
186\begin{verbatim}
187 lalr_construct_parser '(
188    (S  ((!:symbol))
189        ((!:number))
190        ((S "." S))
191        ((S "^" S))
192        ((S "*" S))
193        ((S "/" S))
194        ((S "+" S))
195        ((S "-" S))));
196\end{verbatim}
197
198The special markers \verb+!:symbol+ and \verb+!:number+ will match any
199symbols or numbers -- for commentary on what count as such see the discussion
200of the lexter later on. The strings stand for fixed tokens and by virtue
201of them being used in the grammer the lexer will recognise them specially.
202
203