@(#)ssA 8.1 (Berkeley) 06/08/93
10: Advanced Topics
This section discusses a number of advanced features of Yacc.
Simulating Error and Accept in Actions
The parsing actions of error and accept can be simulated in an action by use of macros YYACCEPT and YYERROR. YYACCEPT causes yyparse to return the value 0; YYERROR causes the parser to behave as if the current input symbol had been a syntax error; yyerror is called, and error recovery takes place. These mechanisms can be used to simulate parsers with multiple endmarkers or context-sensitive syntax checking.
Accessing Values in Enclosing Rules
An action may refer to values returned by actions to the left of the current rule. The mechanism is simply the same as with ordinary actions, a dollar sign followed by a digit, but in this case the digit may be 0 or negative. Consider
    sent    :   adj  noun  verb  adj  noun
                    {  look at the sentence . . .  }
            ;
    adj     :   THE
                    {  $$ = THE;  }
            |   YOUNG
                    {  $$ = YOUNG;  }
            . . .
            ;
    noun    :   DOG
                    {  $$ = DOG;  }
            |   CRONE
                    {  if( $0 == YOUNG ){
                            printf( "what?\n" );
                            }
                       $$ = CRONE;
                    }
            ;
    . . .
In the action following the word CRONE, a check is made that the preceding token shifted was not YOUNG. Obviously, this is only possible when a great deal is known about what might precede the symbol noun in the input. There is also a distinctly unstructured flavor about this. Nevertheless, at times this mechanism will save a great deal of trouble, especially when a few combinations are to be excluded from an otherwise regular structure.
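One way to see why $0 is meaningful is to model the value stack by hand. The C sketch below is a toy model, not the code Yacc generates: the names vstack, vtop, push, dollar, and crone_action are all hypothetical, and the token codes are made up for illustration. It assumes only that, when a rule with n components is reduced, the values of those components occupy the top n slots of the value stack, so that $0 is the slot just beneath them.

```c
#include <stdio.h>

/* Hypothetical token codes; a real parser takes these from y.tab.h. */
enum { THE = 257, YOUNG, DOG, CRONE };

/* A toy value stack (an assumption of this sketch, not Yacc's own). */
static int vstack[16];
static int *vtop = vstack;      /* points at the topmost value */

static void push(int v) { *++vtop = v; }

/* Value of $i inside the action that ends a rule with n components:
   $n is the top of the stack, $1 lies n-1 slots below it, and $0
   (or a negative i) reaches into whatever was stacked before the rule. */
static int dollar(int i, int n) { return vtop[i - n]; }

/* Model of the action for  noun : CRONE  (one component, so n = 1).
   Returns the value the action assigns to $$. */
static int crone_action(void)
{
    if (dollar(0, 1) == YOUNG)
        printf("what?\n");
    return CRONE;
}
```

In this model, calling push(YOUNG) for the value left by the adj rule, then push(0) for the value of the token CRONE, and then crone_action() makes dollar(0, 1) yield YOUNG, which is the situation the grammar above is checking for.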
Support for Arbitrary Value Types
By default, the values returned by actions and the lexical analyzer are integers. Yacc can also support values of other types, including structures. In addition, Yacc keeps track of the types, and inserts appropriate union member names so that the resulting parser will be strictly type checked.
The Yacc value stack (see Section 4) is declared to be a union of the various types of values desired. The user declares the union, and associates union member names with each token and nonterminal symbol having a value. When the value is referenced through a $$ or $n construction, Yacc will automatically insert the appropriate union name, so that no unwanted conversions will take place. In addition, type checking commands such as Lint [Johnson, "Lint, a C Program Checker"] will be far more silent.
There are three mechanisms used to provide for this typing. First, there is a way of defining the union; this must be done by the user since other programs, notably the lexical analyzer, must know about the union member names. Second, there is a way of associating a union member name with tokens and nonterminals. Finally, there is a mechanism for describing the type of those few values where Yacc cannot easily determine the type.
To declare the union, the user includes in the declaration section:
    %union  {
            body of union ...
            }
This declares the Yacc value stack, and the external variables yylval and yyval, to have type equal to this union. If Yacc was invoked with the -d option, the union declaration is copied onto the y.tab.h file. Alternatively, the union may be declared in a header file, and a typedef used to define the type YYSTYPE to represent this union. Thus, the header file might also have said:
    typedef union {
            body of union ...
            } YYSTYPE;
The header file must be included in the declarations section, by use of %{ and %}.
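The lexical analyzer is the main outside consumer of the union, since it must store each token's value in the right member of yylval. The following C sketch shows what that might look like for a hand-written yylex. Everything specific in it is an assumption made for illustration: the member names ival and cval, the token names NUMBER and NAME and their codes, and the input buffer. In a real program the union and the token codes would come from y.tab.h, and yylval would be defined by the generated parser.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes; normally supplied by y.tab.h. */
enum { NUMBER = 257, NAME };

/* Hypothetical union body; normally written once in %union or a header. */
typedef union {
    int  ival;      /* value of a NUMBER token */
    char cval;      /* value of a NAME token (a single lower-case letter) */
} YYSTYPE;

YYSTYPE yylval;     /* defined here only to keep the sketch self-contained */

static const char *input = "";   /* hypothetical input buffer */

int yylex(void)
{
    while (*input == ' ')
        input++;
    if (*input == '\0')
        return 0;                           /* endmarker */
    if (isdigit((unsigned char)*input)) {
        char *end;
        yylval.ival = (int)strtol(input, &end, 10);
        input = end;
        return NUMBER;                      /* value is in member ival */
    }
    if (islower((unsigned char)*input)) {
        yylval.cval = *input++;
        return NAME;                        /* value is in member cval */
    }
    return (unsigned char)*input++;         /* literal character token */
}
```

The member stored by the lexer has to agree with the member name that the specification associates with that token; nothing in C itself checks that the two match.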
Once YYSTYPE is defined, the union member names must be associated with the various terminal and nonterminal names. The construction <name> is used to indicate a union member name. If this follows one of the keywords %token, %left, %right, and %nonassoc, the union member name is associated with the tokens listed. Thus, saying
    %left  <optype>  '+'  '-'
will cause any reference to values returned by these two tokens to be tagged with the union member name optype. Another keyword, %type, is used similarly to associate union member names with nonterminals. Thus, one might say
    %type  <nodetype>  expr  stat
There remain a couple of cases where these mechanisms are insufficient. If there is an action within a rule, the value returned by this action has no "a priori" type. Similarly, reference to left context values (such as $0; see the previous subsection) leaves Yacc with no easy way of knowing the type. In this case, a type can be imposed on the reference by inserting a union member name, between < and >, immediately after the first $. An example of this usage is
    rule    :   aaa  {  $<intval>$  =  3;  }  bbb
                    {  fun( $<intval>2, $<other>0 );  }
            ;
This syntax has little to recommend it, but the situation arises rarely.
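It may help to see what the explicit tags select. The C sketch below models the rule above by hand. It is not Yacc's output: the stack, the helper names, the C types chosen for the members intval and other, and the body of fun are all assumptions made for illustration. It relies on the action in the middle of the rule counting as a component of its own, so that the final action sees three components and $2 is the value left by the embedded action.

```c
/* Hypothetical union; the member names come from the rule above,
   but their C types are assumptions. */
typedef union {
    int    intval;
    double other;
} YYSTYPE;

static YYSTYPE vstack[16];       /* toy value stack, not Yacc's own */
static YYSTYPE *vtop = vstack;   /* points at the topmost value */
static YYSTYPE yyval;            /* stands in for $$ */

static void push(YYSTYPE v) { *++vtop = v; }

/* Hypothetical stand-in for fun() in the rule above. */
static double fun(int i, double d) { return i + d; }

/* The action in the middle of the rule:  { $<intval>$ = 3; }  */
static void mid_action(void)
{
    yyval.intval = 3;            /* the tag picks member intval of $$ */
    push(yyval);                 /* its value becomes component 2 */
}

/* The final action.  With components aaa, the embedded action, and bbb
   (n = 3), $2 is vtop[2 - 3] and $0 is vtop[0 - 3]. */
static double final_action(void)
{
    return fun(vtop[2 - 3].intval,    /* $<intval>2 */
               vtop[0 - 3].other);    /* $<other>0  */
}
```

Without the tags there would be no way to tell which member of each slot to read, which is exactly the gap the explicit syntax fills.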
A sample specification is given in Appendix C. The facilities in this subsection are not triggered until they are used: in particular, the use of %type will turn on these mechanisms. When they are used, there is a fairly strict level of checking. For example, use of $n or $$ to refer to something with no defined type is diagnosed. If these facilities are not triggered, the Yacc value stack is used to hold int's, as was true historically.