@c PSPP - a program for statistical analysis.
@c Copyright (C) 2019 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".
@c

@node q2c Input Format
@appendix @code{q2c} Input Format

PSPP statistical procedures have a bizarre and somewhat irregular
syntax. Despite this, a parser generator has been written that
adequately addresses many of the possibilities and tries to provide
hooks for the exceptional cases. This parser generator is named
@code{q2c}.

@menu
* Invoking q2c::                q2c command-line syntax.
* q2c Input Structure::         High-level layout of the input file.
* Grammar Rules::               Syntax of the grammar rules.
@end menu

@node Invoking q2c
@section Invoking q2c

@example
q2c @var{input.q} @var{output.c}
@end example

@code{q2c} translates a @samp{.q} file into a @samp{.c} file. It takes
exactly two command-line arguments, which are the input file name and
output file name, respectively. @code{q2c} does not accept any
command-line options.

@node q2c Input Structure
@section @code{q2c} Input Structure

@code{q2c} input files are divided into two sections: the grammar rules
and the supporting code. The @dfn{grammar rules}, which make up the
first part of the input, are used to define the syntax of the
statistical procedure to be parsed. The @dfn{supporting code},
following the grammar rules, is copied largely unchanged to the output
file, except for certain escapes.

The most important lines in the grammar rules are used for defining
procedure syntax.
These lines can be prefixed with a dollar sign
(@samp{$}), which prevents Emacs' CC-mode from munging them. Besides
this, a bang (@samp{!}) at the beginning of a line causes the line,
minus the bang, to be written verbatim to the output file (useful for
comments). As a third special case, any line that begins with the exact
characters @code{/* *INDENT} is ignored and not written to the output.
This allows @code{.q} files to be processed through @code{indent}
without being munged.

The syntax of the grammar rules themselves is given in the following
sections.

The supporting code is passed into the output file largely unchanged.
However, the following escapes are supported. Each escape must appear
on a line by itself.

@table @code
@item /* (header) */

Expands to a series of C @code{#include} directives which include the
headers that are required for the parser generated by @code{q2c}.

@item /* (decls @var{scope}) */

Expands to C variable and data type declarations for the variables and
@code{enum}s input and output by the @code{q2c} parser. @var{scope}
must be either @code{local} or @code{global}. @code{local} causes the
declarations to be output as function locals. @code{global} causes them
to be declared as @code{static} module variables; thus, @code{global} is
a bit of a misnomer.

@item /* (parser) */

Expands to the entire parser. Must be enclosed within a C function.

@item /* (free) */

Expands to a set of calls to the @code{free} function for variables
declared by the parser. Only needs to be invoked if subcommands of type
@code{string} are used in the grammar rules.
@end table

@node Grammar Rules
@section Grammar Rules

The grammar rules describe the format of the syntax that the parser
generated by @code{q2c} will understand. The way that the grammar rules
are included in a @code{q2c} input file is described above.

The grammar rules are divided into tokens of the following types:

@table @asis
@item Identifier (@code{ID})

An identifier token is a sequence of letters, digits, and underscores
(@samp{_}). Identifiers are @emph{not} case-sensitive.

@item String (@code{STRING})

String tokens are initiated by a double-quote character (@samp{"}) and
consist of all the characters between that double quote and the next
double quote, which must be on the same line as the first. Within a
string, a backslash can be used as a ``literal escape''. The only
reasons to use a literal escape are to include a double quote or a
backslash within a string.

@item Special character

All other characters, except white space, constitute tokens in
themselves.

@end table

The syntax of the grammar rules is as follows:

@example
grammar-rules ::= command-name opt-prefix : subcommands .
command-name ::= ID
             ::= STRING
opt-prefix ::=
           ::= ( ID )
subcommands ::= subcommand
            ::= subcommands ; subcommand
@end example

The syntax begins with an ID token that gives the name of the
procedure to be parsed. For command names that contain multiple
words, a STRING token may be used instead, e.g.@: @samp{"FILE
HANDLE"}. Optionally, an ID in parentheses specifies a prefix used
for all file-scope identifiers declared by the emitted code.

The rest of the syntax consists of subcommands separated by semicolons
(@samp{;}) and terminated with a full stop (@samp{.}).

@example
subcommand ::= default-opt arity-opt ID sbc-defn
default-opt ::=
            ::= *
arity-opt ::=
          ::= +
          ::= ^
sbc-defn ::= opt-prefix = specifiers
         ::= [ ID ] = array-sbc
         ::= opt-prefix = sbc-special-form
@end example

A subcommand that begins with an asterisk (@samp{*}) is the default
subcommand.
The keyword used for the default subcommand can be omitted
in the PSPP syntax file.

A plus sign (@samp{+}) indicates that a subcommand can appear more than
once. A caret (@samp{^}) indicates that a subcommand must appear exactly
once. A subcommand marked with neither character may appear once or not
at all, but not more than once.

The subcommand name appears after the leading option characters.

There are three forms of subcommands. The first and most common form
simply gives an equals sign (@samp{=}) and a list of specifiers, which
can each be set to a single setting. The second form declares an array,
which is a set of flags that can be individually turned on by the user.
There are also several special forms that do not take a list of
specifiers.

Arrays require an additional @code{ID} argument. This is used as a
prefix, prepended to the variable names constructed from the
specifiers. The other forms also allow an optional prefix to be
specified.

@example
array-sbc ::= alternatives
          ::= array-sbc , alternatives
alternatives ::= ID
             ::= alternatives | ID
@end example

An array subcommand is a set of Boolean values that the user can turn on
independently. The values are separated by commas (@samp{,}). If a
value has more than one name, then these names are separated by pipes
(@samp{|}).

@example
specifiers ::= specifier
           ::= specifiers , specifier
specifier ::= opt-id : settings
opt-id ::=
       ::= ID
@end example

Ordinary subcommands (other than arrays and special forms) require a
list of specifiers. Each specifier has an optional name and a list of
settings. If the name is given then a correspondingly named variable
will be used to store the user's choice of setting. If no name is given
then there is no way to tell which setting the user picked; in this case
the settings should probably have values attached.
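
As an illustration of the subcommand forms described so far, the
following sketch declares a hypothetical command. The command name
@code{example}, the prefixes @code{exm_} and @code{pr_}, and every
subcommand, specifier, and setting name are invented for this
illustration and do not correspond to any real PSPP procedure. The
settings are written as bare @code{ID} tokens separated by slashes.

@example
example (exm_):
  +print[pr_]=mean|average,median,all;
  ^format=fmt:labels/nolabels,
          sort:ascending/descending.
@end example

In this sketch, @code{print} is an array subcommand that may appear more
than once. Its flags are @code{mean} (which has the second name
@code{average}), @code{median}, and @code{all}. @code{format} is an
ordinary subcommand that must appear exactly once. It has two named
specifiers, @code{fmt} and @code{sort}, each offering two settings.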

@example
settings ::= setting
         ::= settings / setting
setting ::= setting-options ID setting-value
setting-options ::=
                ::= *
                ::= !
                ::= * !
@end example

Individual settings are separated by forward slashes (@samp{/}). Each
setting can be as little as an @code{ID} token, but options and values
can also be included. The @samp{*} option means that, for this
setting, the @code{ID} can be omitted. The @samp{!} option means that
this setting is the default for its specifier.

@example
setting-value ::=
              ::= ( setting-value-2 )
              ::= setting-value-2
setting-value-2 ::= setting-value-options setting-value-type : ID
setting-value-options ::=
                      ::= *
setting-value-type ::= N
                   ::= D
                   ::= S
@end example

Settings may have values. If the value must be enclosed in parentheses,
then enclose the value declaration in parentheses. Declare the value
type as @samp{n}, @samp{d}, or @samp{s} for integer, floating-point,
or string type, respectively. The given @code{ID} is used to
construct a variable name.
If option @samp{*} is given, then the value is optional; otherwise it
must be specified whenever the corresponding setting is specified.

@example
sbc-special-form ::= VAR
                 ::= VARLIST varlist-options
                 ::= INTEGER opt-list
                 ::= DOUBLE opt-list
                 ::= PINT
                 ::= STRING @r{(the literal word STRING)}
                 ::= CUSTOM
varlist-options ::=
                ::= ( STRING )
opt-list ::=
         ::= LIST
@end example

The special forms are of the following types:

@table @code
@item VAR

A single variable name.

@item VARLIST

A list of variables. If given, the string can be used to provide
@code{PV_@var{*}} options to the call to @code{parse_variables}.

@item INTEGER

A single integer value.

@item INTEGER LIST

A list of integers separated by spaces or commas.

@item DOUBLE

A single floating-point value.

@item DOUBLE LIST

A list of floating-point values.

@item PINT

A single positive integer value.

@item STRING

A string value.

@item CUSTOM

A custom function is used to parse this subcommand. The function must
have prototype @code{int custom_@var{name} (void)}. It should return 0
on failure (when it has already issued an appropriate diagnostic), 1 on
success, or 2 if it fails and the calling function should issue a syntax
error on behalf of the custom handler.

@end table
@setfilename ignored
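
The following sketch combines setting options, setting values, and
several special forms in a single hypothetical grammar. The command
name @code{sample}, the prefix @code{smp_}, and all subcommand,
specifier, setting, and value names are invented for this illustration.
The string @code{"PV_NO_SCRATCH"} stands in for whatever
@code{PV_@var{*}} options a procedure needs; it is an assumption here,
not something defined in this appendix.

@example
sample (smp_):
  *^variables=varlist("PV_NO_SCRATCH");
  +missing=miss:!listwise/pairwise;
  criteria=:alpha(d:level)/iterate(*n:max_iter);
  width=integer;
  breaks=double list;
  title=string;
  tables=custom.
@end example

Read against the rules above, @code{variables} is the default subcommand
and must appear exactly once. @code{missing} may appear more than once,
and @code{listwise} is the default setting for its specifier
@code{miss}. The specifier under @code{criteria} has no name, so its
settings carry values instead: @code{alpha} takes a required
floating-point value in parentheses, and @code{iterate} takes an
optional integer value in parentheses. @code{width}, @code{breaks}, and
@code{title} use the @code{INTEGER}, @code{DOUBLE LIST}, and
@code{STRING} special forms, and @code{tables} would be handed to a
custom parsing function with the prototype given under @code{CUSTOM}.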