@c PSPP - a program for statistical analysis.
@c Copyright (C) 2019 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c

@node q2c Input Format
@appendix @code{q2c} Input Format

PSPP statistical procedures have a bizarre and somewhat irregular
syntax.  Despite this, a parser generator has been written that
adequately addresses many of the possibilities and tries to provide
hooks for the exceptional cases.  This parser generator is named
@code{q2c}.

@menu
* Invoking q2c::                q2c command-line syntax.
* q2c Input Structure::         High-level layout of the input file.
* Grammar Rules::               Syntax of the grammar rules.
@end menu

@node Invoking q2c
@section Invoking q2c

@example
q2c @var{input.q} @var{output.c}
@end example

@code{q2c} translates a @samp{.q} file into a @samp{.c} file.  It takes
exactly two command-line arguments, which are the input file name and
output file name, respectively.  @code{q2c} does not accept any
command-line options.

@node q2c Input Structure
@section @code{q2c} Input Structure

@code{q2c} input files are divided into two sections: the grammar rules
and the supporting code.  The @dfn{grammar rules}, which make up the
first part of the input, are used to define the syntax of the
statistical procedure to be parsed.  The @dfn{supporting code},
following the grammar rules, is copied largely unchanged to the output
file, except for certain escapes.

The most important lines in the grammar rules are used for defining
procedure syntax.  These lines can be prefixed with a dollar sign
(@samp{$}), which prevents Emacs' CC-mode from munging them.  Besides
this, a bang (@samp{!}) at the beginning of a line causes the line,
minus the bang, to be written verbatim to the output file (useful for
comments).  As a third special case, any line that begins with the exact
characters @code{/* *INDENT} is ignored and not written to the output.
This allows @code{.q} files to be processed through @code{indent}
without being munged.

The syntax of the grammar rules themselves is given in the following
sections.

The supporting code is passed into the output file largely unchanged.
However, the following escapes are supported.  Each escape must appear
on a line by itself.

@table @code
@item /* (header) */

Expands to a series of C @code{#include} directives which include the
headers that are required for the parser generated by @code{q2c}.

@item /* (decls @var{scope}) */

Expands to C variable and data type declarations for the variables and
@code{enum}s input and output by the @code{q2c} parser.  @var{scope}
must be either @code{local} or @code{global}.  @code{local} causes the
declarations to be output as function locals.  @code{global} causes them
to be declared as @code{static} module variables; thus, @code{global} is
a bit of a misnomer.

@item /* (parser) */

Expands to the entire parser.  Must be enclosed within a C function.

@item /* (free) */

Expands to a set of calls to the @code{free} function for variables
declared by the parser.  Only needs to be invoked if subcommands of type
@code{string} are used in the grammar rules.
@end table

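As a rough illustration of how these escapes fit together, the
supporting code of a @samp{.q} file might be laid out as in the
following sketch.  The function name @code{cmd_example} and its return
value are hypothetical and serve only to show where each escape could
go; the sketch is not taken from an actual PSPP source file.  The
escapes are kept flush left as a precaution, since each must appear on
a line by itself.

@example
/* (header) */

/* (decls global) */

/* Hypothetical entry point for the procedure. */
int
cmd_example (void)
@{
/* (parser) */

  /* Code that uses the parsed settings would go here. */

/* (free) */
  return 0;                     /* Placeholder return value. */
@}
@end example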
@node Grammar Rules
@section Grammar Rules

The grammar rules describe the format of the syntax that the parser
generated by @code{q2c} will understand.  The way that the grammar rules
are included in a @code{q2c} input file is described above.

The grammar rules are divided into tokens of the following types:

@table @asis
@item Identifier (@code{ID})

An identifier token is a sequence of letters, digits, and underscores
(@samp{_}).  Identifiers are @emph{not} case-sensitive.

@item String (@code{STRING})

String tokens are initiated by a double-quote character (@samp{"}) and
consist of all the characters between that double quote and the next
double quote, which must be on the same line as the first.  Within a
string, a backslash can be used as a ``literal escape''.  The only
reasons to use a literal escape are to include a double quote or a
backslash within a string.

@item Special character

Any other character, except white space, constitutes a token in
itself.

@end table

The syntax of the grammar rules is as follows:

@example
grammar-rules ::= command-name opt-prefix : subcommands .
command-name ::= ID
             ::= STRING
opt-prefix ::=
           ::= ( ID )
subcommands ::= subcommand
            ::= subcommands ; subcommand
@end example

The syntax begins with an ID token that gives the name of the
procedure to be parsed.  For command names that contain multiple
words, a STRING token may be used instead, e.g.@: @samp{"FILE
HANDLE"}.  Optionally, an ID in parentheses specifies a prefix used
for all file-scope identifiers declared by the emitted code.

The rest of the syntax consists of subcommands separated by semicolons
(@samp{;}) and terminated with a full stop (@samp{.}).

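For instance, a hypothetical procedure named @code{EXAMPLE}, using
@code{exm_} as the prefix for file-scope identifiers, could be declared
along the following lines.  The command name, the prefix, and both
subcommands are invented for illustration; the subcommand definitions
use forms that are described later in this section.

@example
example (exm_):
  variables=varlist;
  missing=miss:analysis/listwise.
@end example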
@example
subcommand ::= default-opt arity-opt ID sbc-defn
default-opt ::=
            ::= *
arity-opt ::=
          ::= +
          ::= ^
sbc-defn ::= opt-prefix = specifiers
         ::= [ ID ] = array-sbc
         ::= opt-prefix = sbc-special-form
@end example

A subcommand that begins with an asterisk (@samp{*}) is the default
subcommand.  The keyword used for the default subcommand can be omitted
in the PSPP syntax file.

A plus sign (@samp{+}) indicates that a subcommand can appear more than
once.  A caret (@samp{^}) indicates that a subcommand must appear exactly
once.  A subcommand marked with neither character may appear once or not
at all, but not more than once.

The subcommand name appears after the leading option characters.

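The following hypothetical subcommand declarations show the option
characters in position.  Here @code{tables} is the default subcommand
and must appear exactly once, @code{format} may be repeated, and
@code{missing} may appear at most once.  The subcommand names and their
definitions are invented for illustration.

@example
*^tables=custom;
+format=fmt:labels/nolabels;
missing=miss:analysis/listwise.
@end example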
There are three forms of subcommands.  The first and most common form
simply gives an equals sign (@samp{=}) and a list of specifiers, which
can each be set to a single setting.  The second form declares an array,
which is a set of flags that can be individually turned on by the user.
There are also several special forms that do not take a list of
specifiers.

Arrays require an additional @code{ID} argument.  This is used as a
prefix, prepended to the variable names constructed from the
specifiers.  The other forms also allow an optional prefix to be
specified.

@example
array-sbc ::= alternatives
          ::= array-sbc , alternatives
alternatives ::= ID
             ::= alternatives | ID
@end example

An array subcommand is a set of Boolean values that can independently be
turned on by the user, listed separated by commas (@samp{,}).  If a
value has more than one name, then these names are separated by pipes
(@samp{|}).

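As a sketch, a hypothetical @code{statistics} array subcommand with
prefix @code{st_} could be declared as below.  All of the names are
invented for illustration.  The second value has two names, so the user
could select it as either @code{stddev} or @code{sd}.

@example
+statistics[st_]=mean,stddev|sd,all,none;
@end example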
@example
specifiers ::= specifier
           ::= specifiers , specifier
specifier ::= opt-id : settings
opt-id ::=
       ::= ID
@end example

Ordinary subcommands (other than arrays and special forms) require a
list of specifiers.  Each specifier has an optional name and a list of
settings.  If the name is given then a correspondingly named variable
will be used to store the user's choice of setting.  If no name is given
then there is no way to tell which setting the user picked; in this case
the settings should probably have values attached.

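For example, the hypothetical subcommand below has two named
specifiers, @code{miss} and @code{incl}, each offering two settings.
The names are invented for illustration.  Writing
@code{:include/exclude} instead would leave the second specifier
unnamed.

@example
missing=miss:analysis/listwise,
        incl:include/exclude;
@end example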
@example
settings ::= setting
         ::= settings / setting
setting ::= setting-options ID setting-value
setting-options ::=
                ::= *
                ::= !
                ::= * !
@end example

Individual settings are separated by forward slashes (@samp{/}).  Each
setting can be as little as an @code{ID} token, but options and values
can optionally be included.  The @samp{*} option means that, for this
setting, the @code{ID} can be omitted.  The @samp{!} option means that
this setting is the default for its specifier.

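A minimal sketch of the @samp{!} option, using invented names: the
hypothetical subcommand below marks @code{labels} as the default
setting for the @code{fmt} specifier.

@example
format=fmt:!labels/nolabels;
@end example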
@example
setting-value ::=
              ::= ( setting-value-2 )
              ::= setting-value-2
setting-value-2 ::= setting-value-options setting-value-type : ID
setting-value-options ::=
                      ::= *
setting-value-type ::= N
                   ::= D
                   ::= S
@end example

Settings may have values.  If the value must be enclosed in parentheses,
then enclose the value declaration in parentheses.  Declare the setting
type as @samp{n}, @samp{d}, or @samp{s} for integer, floating-point,
or string type, respectively.  The given @code{ID} is used to
construct a variable name.
If option @samp{*} is given, then the value is optional; otherwise it
must be specified whenever the corresponding setting is specified.

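The hypothetical declarations below sketch settings with values.  In
the first, the @code{cin} setting of an unnamed specifier takes a
required floating-point value that the user writes in parentheses.  In
the second, @code{onepage} takes an optional integer value, also in
parentheses, while @code{condense} and the default @code{standard} take
none.  All names here are invented for illustration.

@example
criteria=:cin(d:criteria);
format=cond:condense/onepage(*n:onepage_limit)/!standard;
@end example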
@example
sbc-special-form ::= VAR
                 ::= VARLIST varlist-options
                 ::= INTEGER opt-list
                 ::= DOUBLE opt-list
                 ::= PINT
                 ::= STRING @r{(the literal word STRING)}
                 ::= CUSTOM
varlist-options ::=
                ::= ( STRING )
opt-list ::=
         ::= LIST
@end example

The special forms are of the following types:

@table @code
@item VAR

A single variable name.

@item VARLIST

A list of variables.  If given, the string can be used to provide
@code{PV_@var{*}} options to the call to @code{parse_variables}.

@item INTEGER

A single integer value.

@item INTEGER LIST

A list of integers separated by spaces or commas.

@item DOUBLE

A single floating-point value.

@item DOUBLE LIST

A list of floating-point values.

@item PINT

A single positive integer value.

@item STRING

A string value.

@item CUSTOM

A custom function is used to parse this subcommand.  The function must
have prototype @code{int custom_@var{name} (void)}.  It should return 0
on failure (when it has already issued an appropriate diagnostic), 1 on
success, or 2 if it fails and the calling function should issue a syntax
error on behalf of the custom handler.

@end table
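
As an illustration, the hypothetical subcommands below use several of
the special forms.  The subcommand names are invented, and
@code{PV_NUMERIC} is shown only as an example of an option string
passed to @code{parse_variables}.

@example
variables=varlist("PV_NUMERIC");
id=var;
testval=double;
breaks=integer list;
tables=custom;
@end example

A handler for the @code{tables} subcommand above would then follow the
prototype and return-value convention described for @code{CUSTOM}.  The
body is a placeholder, not working parsing code.

@example
int
custom_tables (void)
@{
  /* Parse the tokens of the subcommand here.  On an error that has
     already been reported, return 0; to have the caller report a
     syntax error instead, return 2.  */
  return 1;                     /* Success. */
@}
@end example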
@setfilename ignored
