.\"	$OpenBSD: flex.1,v 1.43 2015/09/21 10:03:46 jmc Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Vern Paxson.
.\"
.\" The United States Government has rights in this work pursuant
.\" to contract no. DE-AC03-76SF00098 between the United States
.\" Department of Energy and the University of California.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE.
.\"
.Dd $Mdocdate: September 21 2015 $
.Dt FLEX 1
.Os
.Sh NAME
.Nm flex ,
.Nm flex++ ,
.Nm lex
.Nd fast lexical analyzer generator
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl 78BbdFfhIiLlnpsTtVvw+?
.Op Fl C Ns Op Cm aeFfmr
.Op Fl Fl help
.Op Fl Fl version
.Op Fl o Ns Ar output
.Op Fl P Ns Ar prefix
.Op Fl S Ns Ar skeleton
.Op Ar
.Ek
.Sh DESCRIPTION
.Nm
is a tool for generating
.Em scanners :
programs which recognize lexical patterns in text.
.Nm
reads the given input files, or its standard input if no file names are given,
for a description of a scanner to generate.
The description is in the form of pairs of regular expressions and C code,
called
.Em rules .
.Nm
generates as output a C source file,
.Pa lex.yy.c ,
which defines a routine
.Fn yylex .
This file is compiled and linked with the
.Fl lfl
library to produce an executable.
When the executable is run, it analyzes its input for occurrences
of the regular expressions.
Whenever it finds one, it executes the corresponding C code.
.Pp
.Nm lex
is a synonym for
.Nm flex .
.Nm flex++
is a synonym for
.Nm
.Fl + .
.Pp
The manual includes both tutorial and reference sections:
.Bl -ohang
.It Sy Some Simple Examples
.It Sy Format of the Input File
.It Sy Patterns
The extended regular expressions used by
.Nm .
.It Sy How the Input is Matched
The rules for determining what has been matched.
.It Sy Actions
How to specify what to do when a pattern is matched.
.It Sy The Generated Scanner
Details regarding the scanner that
.Nm
produces;
how to control the input source.
.It Sy Start Conditions
Introducing context into scanners, and managing
.Qq mini-scanners .
.It Sy Multiple Input Buffers
How to manipulate multiple input sources;
how to scan from strings instead of files.
.It Sy End-of-File Rules
Special rules for matching the end of the input.
.It Sy Miscellaneous Macros
A summary of macros available to the actions.
.It Sy Values Available to the User
A summary of values available to the actions.
.It Sy Interfacing with Yacc
Connecting flex scanners together with
.Xr yacc 1
parsers.
.It Sy Options
.Nm
command-line options, and the
.Dq %option
directive.
.It Sy Performance Considerations
How to make scanners go as fast as possible.
.It Sy Generating C++ Scanners
The
.Pq experimental
facility for generating C++ scanner classes.
.It Sy Incompatibilities with Lex and POSIX
How
.Nm
differs from
.At
.Nm lex
and the
.Tn POSIX
.Nm lex
standard.
.It Sy Files
Files used by
.Nm .
.It Sy Diagnostics
Those error messages produced by
.Nm
.Pq or scanners it generates
whose meanings might not be apparent.
.It Sy See Also
Other documentation, related tools.
.It Sy Authors
Includes contact information.
.It Sy Bugs
Known problems with
.Nm .
.El
.Sh SOME SIMPLE EXAMPLES
First some simple examples to get the flavor of how one uses
.Nm .
The following
.Nm
input specifies a scanner which, whenever it encounters the string
.Qq username ,
replaces it with the user's login name:
.Bd -literal -offset indent
%%
username    printf("%s", getlogin());
.Ed
.Pp
By default, any text not matched by a
.Nm
scanner is copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence of
.Qq username
expanded.
In this input, there is just one rule.
.Qq username
is the
.Em pattern
and the
.Qq printf
is the
.Em action .
The
.Qq %%
marks the beginning of the rules.
.Pp
Here's another simple example:
.Bd -literal -offset indent
%{
int num_lines = 0, num_chars = 0;
%}

%%
\en      ++num_lines; ++num_chars;
\&.       ++num_chars;

%%
int
main(void)
{
	yylex();
	printf("# of lines = %d, # of chars = %d\en",
	    num_lines, num_chars);
	return 0;
}
.Ed
.Pp
This scanner counts the number of characters and the number
of lines in its input
(it produces no output other than the final report on the counts).
The first line declares two globals,
.Qq num_lines
and
.Qq num_chars ,
which are accessible both inside
.Fn yylex
and in the
.Fn main
routine declared after the second
.Qq %% .
There are two rules, one which matches a newline
.Pq \&"\en\&"
and increments both the line count and the character count,
and one which matches any character other than a newline
(indicated by the
.Qq \&.
regular expression).
.Pp
A somewhat more complicated example:
.Bd -literal -offset indent
/* scanner for a toy Pascal-like language */

%{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
%}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+ {
	printf("An integer: %s (%d)\en", yytext,
	    atoi(yytext));
}

{DIGIT}+"."{DIGIT}* {
	printf("A float: %s (%g)\en", yytext,
	    atof(yytext));
}

if|then|begin|end|procedure|function {
	printf("A keyword: %s\en", yytext);
}

{ID}    printf("An identifier: %s\en", yytext);

"+"|"-"|"*"|"/"    printf("An operator: %s\en", yytext);

"{"[^}\en]*"}"    /* eat up one-line comments */

[ \et\en]+    /* eat up whitespace */

\&.    printf("Unrecognized character: %s\en", yytext);

%%

int
main(int argc, char *argv[])
{
	++argv; --argc;  /* skip over program name */
	if (argc > 0)
		yyin = fopen(argv[0], "r");
	else
		yyin = stdin;

	yylex();
	return 0;
}
.Ed
.Pp
This is the beginning of a simple scanner for a language like Pascal.
It identifies different types of
.Em tokens
and reports on what it has seen.
.Pp
The details of this example will be explained in the following sections.
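.Pp
To make the two numeric rules of the Pascal-like example concrete,
the following plain-C sketch shows what those two patterns accept
at the start of a string.
The names
.Fn match_integer
and
.Fn match_float
are hypothetical and exist only for this illustration;
a generated scanner matches patterns with tables, not with code like this.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

static const char digits[] = "0123456789";

/* Length of the prefix of s matched by {DIGIT}+, or 0 if none. */
size_t
match_integer(const char *s)
{
	return strspn(s, digits);
}

/* Length of the prefix of s matched by {DIGIT}+"."{DIGIT}*, or 0 if none. */
size_t
match_float(const char *s)
{
	size_t n = strspn(s, digits);

	/* at least one digit, then a literal dot, is required */
	if (n == 0 || s[n] != '.')
		return 0;
	/* digits after the dot are optional */
	return n + 1 + strspn(s + n + 1, digits);
}
```
.Ed
.Pp
On the input
.Qq 3.14
the integer pattern covers one character and the float pattern covers four,
which is why the scanner reports a float rather than an integer
followed by an unrecognized character.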
.Sh FORMAT OF THE INPUT FILE
The
.Nm
input file consists of three sections, separated by a line with just
.Qq %%
in it:
.Bd -unfilled -offset indent
definitions
%%
rules
%%
user code
.Ed
.Pp
The
.Em definitions
section contains declarations of simple
.Em name
definitions to simplify the scanner specification, and declarations of
.Em start conditions ,
which are explained in a later section.
.Pp
Name definitions have the form:
.Pp
.D1 name definition
.Pp
The
.Qq name
is a word beginning with a letter or an underscore
.Pq Sq _
followed by zero or more letters, digits,
.Sq _ ,
or
.Sq -
.Pq dash .
The definition is taken to begin at the first non-whitespace character
following the name and continuing to the end of the line.
The definition can subsequently be referred to using
.Qq {name} ,
which will expand to
.Qq (definition) .
For example:
.Bd -literal -offset indent
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
.Ed
.Pp
This defines
.Qq DIGIT
to be a regular expression which matches a single digit, and
.Qq ID
to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to
.Pp
.Dl {DIGIT}+"."{DIGIT}*
.Pp
is identical to
.Pp
.Dl ([0-9])+"."([0-9])*
.Pp
and matches one-or-more digits followed by a
.Sq .\&
followed by zero-or-more digits.
.Pp
The
.Em rules
section of the
.Nm
input contains a series of rules of the form:
.Pp
.Dl pattern action
.Pp
The pattern must be unindented and the action must begin
on the same line.
.Pp
See below for a further description of patterns and actions.
.Pp
Finally, the user code section is simply copied to
.Pa lex.yy.c
verbatim.
It is used for companion routines which call or are called by the scanner.
The presence of this section is optional;
if it is missing, the second
.Qq %%
in the input file may be skipped too.
.Pp
In the definitions and rules sections, any indented text or text enclosed in
.Sq %{
and
.Sq %}
is copied verbatim to the output
.Pq with the %{}'s removed .
The %{}'s must appear unindented on lines by themselves.
.Pp
In the rules section,
any indented or %{} text appearing before the first rule may be used to
declare variables which are local to the scanning routine and
.Pq after the declarations
code which is to be executed whenever the scanning routine is entered.
Other indented or %{} text in the rule section is still copied to the output,
but its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for
.Tn POSIX
compliance; see below for other such features).
.Pp
In the definitions section
.Pq but not in the rules section ,
an unindented comment
(i.e., a line beginning with
.Qq /* )
is also copied verbatim to the output up to the next
.Qq */ .
.Sh PATTERNS
The patterns in the input are written using an extended set of regular
expressions.
These are:
.Bl -tag -width "XXXXXXXX"
.It x
Match the character
.Sq x .
.It .\&
Any character
.Pq byte
except newline.
.It [xyz]
A
.Qq character class ;
in this case, the pattern matches either an
.Sq x ,
a
.Sq y ,
or a
.Sq z .
.It [abj-oZ]
A
.Qq character class
with a range in it; matches an
.Sq a ,
a
.Sq b ,
any letter from
.Sq j
through
.Sq o ,
or a
.Sq Z .
.It [^A-Z]
A
.Qq negated character class ,
i.e., any character but those in the class.
In this case, any character EXCEPT an uppercase letter.
.It [^A-Z\en]
Any character EXCEPT an uppercase letter or a newline.
.It r*
Zero or more r's, where
.Sq r
is any regular expression.
.It r+
One or more r's.
.It r?
Zero or one r's (that is,
.Qq an optional r ) .
.It r{2,5}
Anywhere from two to five r's.
.It r{2,}
Two or more r's.
.It r{4}
Exactly 4 r's.
.It {name}
The expansion of the
.Qq name
definition
.Pq see above .
.It \&"[xyz]\e\&"foo\&"
The literal string: [xyz]"foo.
.It \eX
If
.Sq X
is an
.Sq a ,
.Sq b ,
.Sq f ,
.Sq n ,
.Sq r ,
.Sq t ,
or
.Sq v ,
then the ANSI-C interpretation of
.Sq \eX .
Otherwise, a literal
.Sq X
(used to escape operators such as
.Sq * ) .
.It \e0
A NUL character
.Pq ASCII code 0 .
.It \e123
The character with octal value 123.
.It \ex2a
The character with hexadecimal value 2a.
.It (r)
Match an
.Sq r ;
parentheses are used to override precedence
.Pq see below .
.It rs
The regular expression
.Sq r
followed by the regular expression
.Sq s ;
called
.Qq concatenation .
.It r|s
Either an
.Sq r
or an
.Sq s .
.It r/s
An
.Sq r ,
but only if it is followed by an
.Sq s .
The text matched by
.Sq s
is included when determining whether this rule is the
.Qq longest match ,
but is then returned to the input before the action is executed.
So the action only sees the text matched by
.Sq r .
This type of pattern is called
.Qq trailing context .
(There are some combinations of r/s that
.Nm
cannot match correctly; see notes in the
.Sx BUGS
section below regarding
.Qq dangerous trailing context . )
.It ^r
An
.Sq r ,
but only at the beginning of a line
(i.e., just starting to scan, or right after a newline has been scanned).
.It r$
An
.Sq r ,
but only at the end of a line
.Pq i.e., just before a newline .
Equivalent to
.Qq r/\en .
.Pp
Note that
.Nm flex Ns 's
notion of
.Qq newline
is exactly whatever the C compiler used to compile
.Nm
interprets
.Sq \en
as.
.\" In particular, on some DOS systems you must either filter out \er's in the
.\" input yourself, or explicitly use r/\er\en for
.\" .Qq r$ .
.It <s>r
An
.Sq r ,
but only in start condition
.Sq s
.Pq see below for discussion of start conditions .
.It <s1,s2,s3>r
The same, but in any of start conditions s1, s2, or s3.
.It <*>r
An
.Sq r
in any start condition, even an exclusive one.
.It <<EOF>>
An end-of-file.
.It <s1,s2><<EOF>>
An end-of-file when in start condition s1 or s2.
.El
.Pp
Note that inside of a character class, all regular expression operators
lose their special meaning except escape
.Pq Sq \e
and the character class operators,
.Sq - ,
.Sq ]\& ,
and, at the beginning of the class,
.Sq ^ .
.Pp
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
For example,
.Pp
.D1 foo|bar*
.Pp
is the same as
.Pp
.D1 (foo)|(ba(r*))
.Pp
since the
.Sq *
operator has higher precedence than concatenation,
and concatenation higher than alternation
.Pq Sq |\& .
This pattern therefore matches
.Em either
the string
.Qq foo
.Em or
the string
.Qq ba
followed by zero-or-more r's.
To match
.Qq foo
or zero-or-more "bar"'s,
use:
.Pp
.D1 foo|(bar)*
.Pp
and to match zero-or-more "foo"'s-or-"bar"'s:
.Pp
.D1 (foo|bar)*
.Pp
In addition to characters and ranges of characters, character classes
can also contain character class
.Em expressions .
These are expressions enclosed inside
.Sq [:
and
.Sq :]
delimiters (which themselves must appear between the
.Sq \&[
and
.Sq ]\&
of the
character class; other elements may occur inside the character class, too).
The valid expressions are:
.Bd -unfilled -offset indent
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
.Ed
.Pp
These expressions all designate a set of characters equivalent to
the corresponding standard C
.Fn isXXX
function.
For example, [:alnum:] designates those characters for which
.Xr isalnum 3
returns true \- i.e., any alphabetic or numeric character.
Some systems don't provide
.Xr isblank 3 ,
so
.Nm
defines [:blank:] as a blank or a tab.
.Pp
For example, the following character classes are all equivalent:
.Bd -unfilled -offset indent
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
.Ed
.Pp
If the scanner is case-insensitive (the
.Fl i
flag), then [:upper:] and [:lower:] are equivalent to [:alpha:].
.Pp
Some notes on patterns:
.Bl -dash
.It
A negated character class such as the example
.Qq [^A-Z]
above will match a newline unless "\en"
.Pq or an equivalent escape sequence
is one of the characters explicitly present in the negated character class
(e.g.,
.Qq [^A-Z\en] ) .
This is unlike how many other regular expression tools treat negated character
classes, but unfortunately the inconsistency is historically entrenched.
Matching newlines means that a pattern like
.Qq [^"]*
can match the entire input unless there's another quote in the input.
.It
A rule can have at most one instance of trailing context
(the
.Sq /
operator or the
.Sq $
operator).
The start condition,
.Sq ^ ,
and
.Qq <<EOF>>
patterns can only occur at the beginning of a pattern and, as well as with
.Sq /
and
.Sq $ ,
cannot be grouped inside parentheses.
A
.Sq ^
which does not occur at the beginning of a rule or a
.Sq $
which does not occur at the end of a rule loses its special properties
and is treated as a normal character.
.It
The following are illegal:
.Bd -unfilled -offset indent
foo/bar$
<sc1>foo<sc2>bar
.Ed
.Pp
Note that the first of these can be written
.Qq foo/bar\en .
.It
The following will result in
.Sq $
or
.Sq ^
being treated as a normal character:
.Bd -unfilled -offset indent
foo|(bar$)
foo|^bar
.Ed
.Pp
If what's wanted is a
.Qq foo
or a bar-followed-by-a-newline, the following could be used
(the special
.Sq |\&
action is explained below):
.Bd -unfilled -offset indent
foo      |
bar$     /* action goes here */
.Ed
.Pp
A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
.El
.Sh HOW THE INPUT IS MATCHED
When the generated scanner is run,
it analyzes its input looking for strings which match any of its patterns.
If it finds more than one match,
it takes the one matching the most text
(for trailing context rules, this includes the length of the trailing part,
even though it will then be returned to the input).
If it finds two or more matches of the same length,
the rule listed first in the
.Nm
input file is chosen.
.Pp
Once the match is determined, the text corresponding to the match
(called the
.Em token )
is made available in the global character pointer
.Fa yytext ,
and its length in the global integer
.Fa yyleng .
The
.Em action
corresponding to the matched pattern is then executed
.Pq a more detailed description of actions follows ,
and then the remaining input is scanned for another match.
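.Pp
The selection rule described above (longest match first, then the rule
listed earliest) can be sketched in plain C.
This is only an illustration of the rule, not how a generated scanner
is implemented;
.Fn pick_rule
is a hypothetical helper, and it assumes that a length of 0 means
the rule did not match at the current position.
.Bd -literal -offset indent
```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of flex: match_len[i] is the number of
 * characters rule i can match at the current input position, with 0
 * taken to mean "no match".  Returns the index of the winning rule,
 * or -1 if no rule matches.
 */
int
pick_rule(const size_t *match_len, int nrules)
{
	int best = -1;
	size_t best_len = 0;
	int i;

	for (i = 0; i < nrules; i++) {
		/* only a strictly longer match replaces the current choice,
		 * so on a tie the rule listed first is kept */
		if (match_len[i] > best_len) {
			best_len = match_len[i];
			best = i;
		}
	}
	return best;
}
```
.Ed
.Pp
With lengths of 2, 4, 4, and 1 the function returns 1:
rules 1 and 2 tie for the longest match and rule 1 is listed first.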
.Pp
If no match is found, then the default rule is executed:
the next character in the input is considered matched and
copied to the standard output.
Thus, the simplest legal
.Nm
input is:
.Pp
.D1 %%
.Pp
which generates a scanner that simply copies its input
.Pq one character at a time
to its output.
.Pp
Note that
.Fa yytext
can be defined in two different ways:
either as a character pointer or as a character array.
Which definition
.Nm
uses can be controlled by including one of the special directives
.Dq %pointer
or
.Dq %array
in the first
.Pq definitions
section of flex input.
The default is
.Dq %pointer ,
unless the
.Fl l
.Nm lex
compatibility option is used, in which case
.Fa yytext
will be an array.
The advantage of using
.Dq %pointer
is substantially faster scanning and no buffer overflow when matching
very large tokens
.Pq unless not enough dynamic memory is available .
The disadvantage is that actions are restricted in how they can modify
.Fa yytext
.Pq see the next section ,
and calls to the
.Fn unput
function destroy the present contents of
.Fa yytext ,
which can be a considerable porting headache when moving between different
.Nm lex
versions.
.Pp
The advantage of
.Dq %array
is that
.Fa yytext
can be modified as much as wanted, and calls to
.Fn unput
do not destroy
.Fa yytext
.Pq see below .
Furthermore, existing
.Nm lex
programs sometimes access
.Fa yytext
externally using declarations of the form:
.Pp
.D1 extern char yytext[];
.Pp
This definition is erroneous when used with
.Dq %pointer ,
but correct for
.Dq %array .
.Pp
.Dq %array
defines
.Fa yytext
to be an array of
.Dv YYLMAX
characters, which defaults to a fairly large value.
The size can be changed by simply #define'ing
.Dv YYLMAX
to a different value in the first section of
.Nm
input.
As mentioned above, with
.Dq %pointer
yytext grows dynamically to accommodate large tokens.
While this means a
.Dq %pointer
scanner can accommodate very large tokens
.Pq such as matching entire blocks of comments ,
bear in mind that each time the scanner must resize
.Fa yytext
it also must rescan the entire token from the beginning, so matching such
tokens can prove slow.
.Fa yytext
presently does not dynamically grow if a call to
.Fn unput
results in too much text being pushed back; instead, a run-time error results.
.Pp
Also note that
.Dq %array
cannot be used with C++ scanner classes
.Pq the c++ option; see below .
.Sh ACTIONS
Each pattern in a rule has a corresponding action,
which can be any arbitrary C statement.
The pattern ends at the first non-escaped whitespace character;
the remainder of the line is its action.
If the action is empty,
then when the pattern is matched the input token is simply discarded.
For example, here is the specification for a program
which deletes all occurrences of
.Qq zap me
from its input:
.Bd -literal -offset indent
%%
"zap me"
.Ed
.Pp
(It will copy all other characters in the input to the output since
they will be matched by the default rule.)
.Pp
Here is a program which compresses multiple blanks and tabs down to
a single blank, and throws away whitespace found at the end of a line:
.Bd -literal -offset indent
%%
[ \et]+        putchar(' ');
[ \et]+$       /* ignore this token */
.Ed
.Pp
If the action contains a
.Sq { ,
then the action spans till the balancing
.Sq }
is found, and the action may cross multiple lines.
.Nm
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with
.Sq %{
and will consider the action to be all the text up to the next
.Sq %}
.Pq regardless of ordinary braces inside the action .
.Pp
An action consisting solely of a vertical bar
.Pq Sq |\&
means
.Qq same as the action for the next rule .
See below for an illustration.
.Pp
Actions can include arbitrary C code,
including return statements to return a value to whatever routine called
.Fn yylex .
Each time
.Fn yylex
is called, it continues processing tokens from where it last left off
until it either reaches the end of the file or executes a return.
.Pp
Actions are free to modify
.Fa yytext
except for lengthening it
(adding characters to its end \- these will overwrite later characters in the
input stream).
This, however, does not apply when using
.Dq %array
.Pq see above ;
in that case,
.Fa yytext
may be freely modified in any way.
.Pp
Actions are free to modify
.Fa yyleng
except they should not do so if the action also includes use of
.Fn yymore
.Pq see below .
.Pp
There are a number of special directives which can be included within
an action:
.Bl -tag -width Ds
.It ECHO
Copies
.Fa yytext
to the scanner's output.
.It BEGIN
Followed by the name of a start condition, places the scanner in the
corresponding start condition
.Pq see below .
.It REJECT
Directs the scanner to proceed on to the
.Qq second best
rule which matched the input
.Pq or a prefix of the input .
The rule is chosen as described above in
.Sx HOW THE INPUT IS MATCHED ,
and
.Fa yytext
and
.Fa yyleng
set up appropriately.
It may either be one which matched as much text
as the originally chosen rule but came later in the
.Nm
input file, or one which matched less text.
For example, the following will both count the
words in the input and call the routine
.Fn special
whenever
.Qq frob
is seen:
.Bd -literal -offset indent
	int word_count = 0;
%%

frob        special(); REJECT;
[^ \et\en]+   ++word_count;
.Ed
.Pp
Without the
.Em REJECT ,
any "frob"'s in the input would not be counted as words,
since the scanner normally executes only one action per token.
Multiple
.Em REJECT Ns 's
are allowed,
each one finding the next best choice to the currently active rule.
For example, when the following scanner scans the token
.Qq abcd ,
it will write
.Qq abcdabcaba
to the output:
.Bd -literal -offset indent
%%
a        |
ab       |
abc      |
abcd     ECHO; REJECT;
\&.|\en     /* eat up any unmatched character */
.Ed
.Pp
(The first three rules share the fourth's action since they use
the special
.Sq |\&
action.)
.Em REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in any of the scanner's actions it will slow down
all of the scanner's matching.
Furthermore,
.Em REJECT
cannot be used with the
.Fl Cf
or
.Fl CF
options
.Pq see below .
.Pp
Note also that unlike the other special actions,
.Em REJECT
is a
.Em branch ;
code immediately following it in the action will not be executed.
.It yymore()
Tells the scanner that the next time it matches a rule, the corresponding
token should be appended onto the current value of
.Fa yytext
rather than replacing it.
For example, given the input
.Qq mega-kludge
the following will write
.Qq mega-mega-kludge
to the output:
.Bd -literal -offset indent
%%
mega-    ECHO; yymore();
kludge   ECHO;
.Ed
.Pp
First
.Qq mega-
is matched and echoed to the output.
Then
.Qq kludge
is matched, but the previous
.Qq mega-
is still hanging around at the beginning of
.Fa yytext
so the
.Em ECHO
for the
.Qq kludge
rule will actually write
.Qq mega-kludge .
.Pp
Two notes regarding use of
.Fn yymore :
First,
.Fn yymore
depends on the value of
.Fa yyleng
correctly reflecting the size of the current token, so
.Fa yyleng
must not be modified when using
.Fn yymore .
Second, the presence of
.Fn yymore
in the scanner's action entails a minor performance penalty in the
scanner's matching speed.
.It yyless(n)
Returns all but the first
.Ar n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.Fa yytext
and
.Fa yyleng
are adjusted appropriately (e.g.,
.Fa yyleng
will now be equal to
.Ar n ) .
For example, on the input
.Qq foobar
the following will write out
.Qq foobarbar :
.Bd -literal -offset indent
%%
foobar    ECHO; yyless(3);
[a-z]+    ECHO;
.Ed
.Pp
An argument of 0 to
.Fa yyless
will cause the entire current input string to be scanned again.
Unless how the scanner will subsequently process its input has been changed
(using
.Em BEGIN ,
for example),
this will result in an endless loop.
.Pp
Note that
.Fa yyless
is a macro and can only be used in the
.Nm
input file, not from other source files.
.It unput(c)
Puts the character
.Ar c
back into the input stream.
It will be the next character scanned.
The following action will take the current token and cause it
to be rescanned enclosed in parentheses.
.Bd -literal -offset indent
{
	int i;
	char *yycopy;

	/* Copy yytext because unput() trashes yytext */
	if ((yycopy = strdup(yytext)) == NULL)
		err(1, NULL);
	unput(')');
	for (i = yyleng - 1; i >= 0; --i)
		unput(yycopy[i]);
	unput('(');
	free(yycopy);
}
.Ed
.Pp
Note that since each
.Fn unput
puts the given character back at the beginning of the input stream,
pushing back strings must be done back-to-front.
.Pp
An important potential problem when using
.Fn unput
is that if using
.Dq %pointer
.Pq the default ,
a call to
.Fn unput
destroys the contents of
.Fa yytext ,
starting with its rightmost character and devouring one character to
the left with each call.
If the value of
.Fa yytext
should be preserved after a call to
.Fn unput
.Pq as in the above example ,
it must either first be copied elsewhere, or the scanner must be built using
.Dq %array
instead (see
.Sx HOW THE INPUT IS MATCHED ) .
.Pp
Finally, note that EOF cannot be put back
to attempt to mark the input stream with an end-of-file.
.It input()
Reads the next character from the input stream.
For example, the following is one way to eat up C comments:
.Bd -literal -offset indent
%%
"/*" {
	int c;

	for (;;) {
		while ((c = input()) != '*' && c != EOF)
			; /* eat up text of comment */

		if (c == '*') {
			while ((c = input()) == '*')
				;
			if (c == '/')
				break; /* found the end */
		}

		if (c == EOF) {
			errx(1, "EOF in comment");
			break;
		}
	}
}
.Ed
.Pp
(Note that if the scanner is compiled using C++, then
.Fn input
is instead referred to as
.Fn yyinput ,
in order to avoid a name clash with the C++ stream by the name of input.)
.It YY_FLUSH_BUFFER
Flushes the scanner's internal buffer
so that the next time the scanner attempts to match a token,
it will first refill the buffer using
.Dv YY_INPUT
(see
.Sx THE GENERATED SCANNER ,
below).
This action is a special case of the more general
.Fn yy_flush_buffer
function, described below in the section
.Sx MULTIPLE INPUT BUFFERS .
.It yyterminate()
Can be used in lieu of a return statement in an action.
It terminates the scanner and returns a 0 to the scanner's caller, indicating
.Qq all done .
By default,
.Fn yyterminate
is also called when an end-of-file is encountered.
It is a macro and may be redefined.
.El
.Sh THE GENERATED SCANNER
The output of
.Nm
is the file
.Pa lex.yy.c ,
which contains the scanning routine
.Fn yylex ,
a number of tables used by it for matching tokens,
and a number of auxiliary routines and macros.
By default,
.Fn yylex
is declared as follows:
.Bd -unfilled -offset indent
int yylex()
{
    ... various definitions and the actions in here ...
}
.Ed
.Pp
(If the environment supports function prototypes, then it will
be "int yylex(void)".)
This definition may be changed by defining the
.Dv YY_DECL
macro.
For example:
.Bd -literal -offset indent
#define YY_DECL float lexscan(a, b) float a, b;
.Ed
.Pp
would give the scanning routine the name
.Em lexscan ,
returning a float, and taking two floats as arguments.
Note that if arguments are given to the scanning routine using a
K&R-style/non-prototyped function declaration,
the definition must be terminated with a semi-colon
.Pq Sq ;\& .
.Pp
Whenever
.Fn yylex
is called, it scans tokens from the global input file
.Pa yyin
.Pq which defaults to stdin .
1236It continues until it either reaches an end-of-file 1237.Pq at which point it returns the value 0 1238or one of its actions executes a 1239.Em return 1240statement. 1241.Pp 1242If the scanner reaches an end-of-file, subsequent calls are undefined 1243unless either 1244.Em yyin 1245is pointed at a new input file 1246.Pq in which case scanning continues from that file , 1247or 1248.Fn yyrestart 1249is called. 1250.Fn yyrestart 1251takes one argument, a 1252.Fa FILE * 1253pointer (which can be nil, if 1254.Dv YY_INPUT 1255has been set up to scan from a source other than 1256.Em yyin ) , 1257and initializes 1258.Em yyin 1259for scanning from that file. 1260Essentially there is no difference between just assigning 1261.Em yyin 1262to a new input file or using 1263.Fn yyrestart 1264to do so; the latter is available for compatibility with previous versions of 1265.Nm , 1266and because it can be used to switch input files in the middle of scanning. 1267It can also be used to throw away the current input buffer, 1268by calling it with an argument of 1269.Em yyin ; 1270but better is to use 1271.Dv YY_FLUSH_BUFFER 1272.Pq see above . 1273Note that 1274.Fn yyrestart 1275does not reset the start condition to 1276.Em INITIAL 1277(see 1278.Sx START CONDITIONS , 1279below). 1280.Pp 1281If 1282.Fn yylex 1283stops scanning due to executing a 1284.Em return 1285statement in one of the actions, the scanner may then be called again and it 1286will resume scanning where it left off. 1287.Pp 1288By default 1289.Pq and for purposes of efficiency , 1290the scanner uses block-reads rather than simple 1291.Xr getc 3 1292calls to read characters from 1293.Em yyin . 1294The nature of how it gets its input can be controlled by defining the 1295.Dv YY_INPUT 1296macro. 1297.Dv YY_INPUT Ns 's 1298calling sequence is 1299.Qq YY_INPUT(buf,result,max_size) . 
1300Its action is to place up to 1301.Dv max_size 1302characters in the character array 1303.Em buf 1304and return in the integer variable 1305.Em result 1306either the number of characters read or the constant 1307.Dv YY_NULL 1308(0 on 1309.Ux 1310systems) 1311to indicate 1312.Dv EOF . 1313The default 1314.Dv YY_INPUT 1315reads from the global file-pointer 1316.Qq yyin . 1317.Pp 1318A sample definition of 1319.Dv YY_INPUT 1320.Pq in the definitions section of the input file : 1321.Bd -unfilled -offset indent 1322%{ 1323#define YY_INPUT(buf,result,max_size) \e 1324{ \e 1325 int c = getchar(); \e 1326 result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \e 1327} 1328%} 1329.Ed 1330.Pp 1331This definition will change the input processing to occur 1332one character at a time. 1333.Pp 1334When the scanner receives an end-of-file indication from 1335.Dv YY_INPUT , 1336it then checks the 1337.Fn yywrap 1338function. 1339If 1340.Fn yywrap 1341returns false 1342.Pq zero , 1343then it is assumed that the function has gone ahead and set up 1344.Em yyin 1345to point to another input file, and scanning continues. 1346If it returns true 1347.Pq non-zero , 1348then the scanner terminates, returning 0 to its caller. 1349Note that in either case, the start condition remains unchanged; 1350it does not revert to 1351.Em INITIAL . 1352.Pp 1353If you do not supply your own version of 1354.Fn yywrap , 1355then you must either use 1356.Dq %option noyywrap 1357(in which case the scanner behaves as though 1358.Fn yywrap 1359returned 1), or you must link with 1360.Fl lfl 1361to obtain the default version of the routine, which always returns 1. 1362.Pp 1363Three routines are available for scanning from in-memory buffers rather 1364than files: 1365.Fn yy_scan_string , 1366.Fn yy_scan_bytes , 1367and 1368.Fn yy_scan_buffer . 1369See the discussion of them below in the section 1370.Sx MULTIPLE INPUT BUFFERS . 
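.Pp
As a rough sketch of the
.Fn yywrap
behaviour described above, a scanner that reads several files in turn
might supply its own version in the user code section.
The
.Va filelist
variable is hypothetical: it is assumed to be a NULL-terminated array of
the remaining file names, set up by the caller before scanning starts.
.Bd -literal -offset indent
%{
#include <stdio.h>

/* Hypothetical: remaining input file names, NULL-terminated. */
char **filelist;
%}

%%
\&...rules...
%%

int
yywrap(void)
{
	FILE *fp;

	while (filelist != NULL && *filelist != NULL) {
		if ((fp = fopen(*filelist++, "r")) == NULL)
			continue;	/* skip unreadable files */
		if (yyin != NULL && yyin != stdin)
			(void)fclose(yyin);
		yyin = fp;
		return 0;	/* more input; keep scanning */
	}
	return 1;		/* nothing left; yylex() returns 0 */
}
.Ed
.Pp
Returning 0 only after
.Fa yyin
has been pointed at the next file matches the contract described above.
Whether an unreadable file should be skipped or treated as fatal
is a choice left to the application.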
1371.Pp 1372The scanner writes its 1373.Em ECHO 1374output to the 1375.Em yyout 1376global 1377.Pq default, stdout , 1378which may be redefined by the user simply by assigning it to some other 1379.Va FILE 1380pointer. 1381.Sh START CONDITIONS 1382.Nm 1383provides a mechanism for conditionally activating rules. 1384Any rule whose pattern is prefixed with 1385.Qq Aq sc 1386will only be active when the scanner is in the start condition named 1387.Qq sc . 1388For example, 1389.Bd -literal -offset indent 1390<STRING>[^"]* { /* eat up the string body ... */ 1391 ... 1392} 1393.Ed 1394.Pp 1395will be active only when the scanner is in the 1396.Qq STRING 1397start condition, and 1398.Bd -literal -offset indent 1399<INITIAL,STRING,QUOTE>\e. { /* handle an escape ... */ 1400 ... 1401} 1402.Ed 1403.Pp 1404will be active only when the current start condition is either 1405.Qq INITIAL , 1406.Qq STRING , 1407or 1408.Qq QUOTE . 1409.Pp 1410Start conditions are declared in the definitions 1411.Pq first 1412section of the input using unindented lines beginning with either 1413.Sq %s 1414or 1415.Sq %x 1416followed by a list of names. 1417The former declares 1418.Em inclusive 1419start conditions, the latter 1420.Em exclusive 1421start conditions. 1422A start condition is activated using the 1423.Em BEGIN 1424action. 1425Until the next 1426.Em BEGIN 1427action is executed, rules with the given start condition will be active and 1428rules with other start conditions will be inactive. 1429If the start condition is inclusive, 1430then rules with no start conditions at all will also be active. 1431If it is exclusive, 1432then only rules qualified with the start condition will be active. 1433A set of rules contingent on the same exclusive start condition 1434describe a scanner which is independent of any of the other rules in the 1435.Nm 1436input. 
1437Because of this, exclusive start conditions make it easy to specify 1438.Qq mini-scanners 1439which scan portions of the input that are syntactically different 1440from the rest 1441.Pq e.g., comments . 1442.Pp 1443If the distinction between inclusive and exclusive start conditions 1444is still a little vague, here's a simple example illustrating the 1445connection between the two. 1446The set of rules: 1447.Bd -literal -offset indent 1448%s example 1449%% 1450 1451<example>foo do_something(); 1452 1453bar something_else(); 1454.Ed 1455.Pp 1456is equivalent to 1457.Bd -literal -offset indent 1458%x example 1459%% 1460 1461<example>foo do_something(); 1462 1463<INITIAL,example>bar something_else(); 1464.Ed 1465.Pp 1466Without the 1467.Aq INITIAL,example 1468qualifier, the 1469.Dq bar 1470pattern in the second example wouldn't be active 1471.Pq i.e., couldn't match 1472when in start condition 1473.Dq example . 1474If we just used 1475.Aq example 1476to qualify 1477.Dq bar , 1478though, then it would only be active in 1479.Dq example 1480and not in 1481.Em INITIAL , 1482while in the first example it's active in both, 1483because in the first example the 1484.Dq example 1485start condition is an inclusive 1486.Pq Sq %s 1487start condition. 1488.Pp 1489Also note that the special start-condition specifier 1490.Sq Aq * 1491matches every start condition. 1492Thus, the above example could also have been written: 1493.Bd -literal -offset indent 1494%x example 1495%% 1496 1497<example>foo do_something(); 1498 1499<*>bar something_else(); 1500.Ed 1501.Pp 1502The default rule (to 1503.Em ECHO 1504any unmatched character) remains active in start conditions. 1505It is equivalent to: 1506.Bd -literal -offset indent 1507<*>.|\en ECHO; 1508.Ed 1509.Pp 1510.Dq BEGIN(0) 1511returns to the original state where only the rules with 1512no start conditions are active. 
This state can also be referred to as the start-condition
.Em INITIAL ,
so
.Dq BEGIN(INITIAL)
is equivalent to
.Dq BEGIN(0) .
(The parentheses around the start condition name are not required but
are considered good style.)
.Pp
.Em BEGIN
actions can also be given as indented code at the beginning
of the rules section.
For example, the following will cause the scanner to enter the
.Qq SPECIAL
start condition whenever
.Fn yylex
is called and the global variable
.Fa enter_special
is true:
.Bd -literal -offset indent
	int enter_special;

%x SPECIAL
%%
	if (enter_special)
		BEGIN(SPECIAL);

<SPECIAL>blahblahblah
\&...more rules follow...
.Ed
.Pp
To illustrate the uses of start conditions,
here is a scanner which provides two different interpretations
of a string like
.Qq 123.456 .
By default it will treat it as three tokens: the integer
.Qq 123 ,
a dot
.Pq Sq .\& ,
and the integer
.Qq 456 .
But if the string is preceded earlier in the line by the string
.Qq expect-floats
it will treat it as a single token, the floating-point number 123.456:
.Bd -literal -offset indent
%{
#include <stdlib.h>
%}
%s expect

%%
expect-floats	BEGIN(expect);

<expect>[0-9]+"."[0-9]+ {
	printf("found a float, = %f\en",
	    atof(yytext));
}
<expect>\en {
	/*
	 * That's the end of the line, so
	 * we need another "expect-floats"
	 * before we'll recognize any more
	 * numbers.
	 */
	BEGIN(INITIAL);
}

[0-9]+ {
	printf("found an integer, = %d\en",
	    atoi(yytext));
}

"."	printf("found a dot\en");
.Ed
.Pp
Here is a scanner which recognizes
.Pq and discards
C comments while maintaining a count of the current input line:
.Bd -literal -offset indent
%x comment
%%
	int line_num = 1;

"/*"			BEGIN(comment);

<comment>[^*\en]*	/* eat anything that's not a '*' */
<comment>"*"+[^*/\en]*	/* eat up '*'s not followed by '/'s */
<comment>\en		++line_num;
<comment>"*"+"/"	BEGIN(INITIAL);
.Ed
.Pp
This scanner goes to a bit of trouble to match as much
text as possible with each rule.
In general, when attempting to write a high-speed scanner,
try to match as much as possible in each rule, as it's a big win.
.Pp
Note that start-condition names are really integer values and
can be stored as such.
Thus, the above could be extended in the following fashion:
.Bd -literal -offset indent
%x comment foo
%%
	int line_num = 1;
	int comment_caller;

"/*" {
	comment_caller = INITIAL;
	BEGIN(comment);
}

\&...

<foo>"/*" {
	comment_caller = foo;
	BEGIN(comment);
}

<comment>[^*\en]*	/* eat anything that's not a '*' */
<comment>"*"+[^*/\en]*	/* eat up '*'s not followed by '/'s */
<comment>\en		++line_num;
<comment>"*"+"/"	BEGIN(comment_caller);
.Ed
.Pp
Furthermore, the current start condition can be accessed by using
the integer-valued
.Dv YY_START
macro.
For example, the above assignments to
.Em comment_caller
could instead be written
.Pp
.Dl comment_caller = YY_START;
.Pp
Flex provides
.Dv YYSTATE
as an alias for
.Dv YY_START
(since that is what's used by
.At
.Nm lex ) .
.Pp
Note that start conditions do not have their own name-space;
%s's and %x's declare names in the same fashion as #define's.
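.Pp
To make the two preceding points concrete, the following is a minimal
sketch rather than a definitive recipe.
It assumes the exclusive start conditions
.Qq comment
and
.Qq foo
from the earlier example, and uses
.Dv YY_START
so that a single rule can enter
.Qq comment
from either
.Em INITIAL
or
.Qq foo
and later return to whichever one was active.
.Bd -literal -offset indent
%x comment foo
%{
/*
 * Start condition to return to.  The name is chosen so that it
 * differs from every start-condition name, since those names are
 * declared in the same fashion as #define's.
 */
static int comment_caller;
%}
%%
<INITIAL,foo>"/*" {
	comment_caller = YY_START;
	BEGIN(comment);
}

<comment>[^*\en]*	/* eat anything that's not a '*' */
<comment>"*"+[^*/\en]*	/* eat up '*'s not followed by '/'s */
<comment>\en		/* newlines inside comments are ignored */
<comment>"*"+"/"	BEGIN(comment_caller);
.Ed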
.Pp
Finally, here's an example of how to match C-style quoted strings using
exclusive start conditions, including expanded escape sequences
(but not including checking for a string that's too long):
.Bd -literal -offset indent
%x str

%%
	#define MAX_STR_CONST 1024
	char string_buf[MAX_STR_CONST];
	char *string_buf_ptr;

\e"	string_buf_ptr = string_buf; BEGIN(str);

<str>\e" { /* saw closing quote - all done */
	BEGIN(INITIAL);
	*string_buf_ptr = '\e0';
	/*
	 * return string constant token type and
	 * value to parser
	 */
}

<str>\en {
	/* error - unterminated string constant */
	/* generate error message */
}

<str>\e\e[0-7]{1,3} {
	/* octal escape sequence */
	unsigned int result;	/* %o requires an unsigned int */

	(void) sscanf(yytext + 1, "%o", &result);

	if (result > 0xff) {
		/* error, constant is out-of-bounds */
	} else
		*string_buf_ptr++ = result;
}

<str>\e\e[0-9]+ {
	/*
	 * generate error - bad escape sequence; something
	 * like '\e48' or '\e0777777'
	 */
}

<str>\e\en	*string_buf_ptr++ = '\en';
<str>\e\et	*string_buf_ptr++ = '\et';
<str>\e\er	*string_buf_ptr++ = '\er';
<str>\e\eb	*string_buf_ptr++ = '\eb';
<str>\e\ef	*string_buf_ptr++ = '\ef';

<str>\e\e(.|\en)	*string_buf_ptr++ = yytext[1];

<str>[^\e\e\en\e"]+ {
	char *yptr = yytext;

	while (*yptr)
		*string_buf_ptr++ = *yptr++;
}
.Ed
.Pp
Often, as in some of the examples above,
many rules are all preceded by the same start condition(s).
.Nm
makes this a little easier and cleaner by introducing a notion of
start condition
.Em scope .
A start condition scope is begun with:
.Pp
.Dl <SCs>{
.Pp
where
.Dq SCs
is a list of one or more start conditions.
1732Inside the start condition scope, every rule automatically has the prefix 1733.Aq SCs 1734applied to it, until a 1735.Sq } 1736which matches the initial 1737.Sq { . 1738So, for example, 1739.Bd -literal -offset indent 1740<ESC>{ 1741 "\e\en" return '\en'; 1742 "\e\er" return '\er'; 1743 "\e\ef" return '\ef'; 1744 "\e\e0" return '\e0'; 1745} 1746.Ed 1747.Pp 1748is equivalent to: 1749.Bd -literal -offset indent 1750<ESC>"\e\en" return '\en'; 1751<ESC>"\e\er" return '\er'; 1752<ESC>"\e\ef" return '\ef'; 1753<ESC>"\e\e0" return '\e0'; 1754.Ed 1755.Pp 1756Start condition scopes may be nested. 1757.Pp 1758Three routines are available for manipulating stacks of start conditions: 1759.Bl -tag -width Ds 1760.It void yy_push_state(int new_state) 1761Pushes the current start condition onto the top of the start condition 1762stack and switches to 1763.Fa new_state 1764as though 1765.Dq BEGIN new_state 1766had been used 1767.Pq recall that start condition names are also integers . 1768.It void yy_pop_state() 1769Pops the top of the stack and switches to it via 1770.Em BEGIN . 1771.It int yy_top_state() 1772Returns the top of the stack without altering the stack's contents. 1773.El 1774.Pp 1775The start condition stack grows dynamically and so has no built-in 1776size limitation. 1777If memory is exhausted, program execution aborts. 1778.Pp 1779To use start condition stacks, scanners must include a 1780.Dq %option stack 1781directive (see 1782.Sx OPTIONS 1783below). 1784.Sh MULTIPLE INPUT BUFFERS 1785Some scanners 1786(such as those which support 1787.Qq include 1788files) 1789require reading from several input streams. 1790As 1791.Nm 1792scanners do a large amount of buffering, one cannot control 1793where the next input will be read from by simply writing a 1794.Dv YY_INPUT 1795which is sensitive to the scanning context. 
1796.Dv YY_INPUT 1797is only called when the scanner reaches the end of its buffer, which 1798may be a long time after scanning a statement such as an 1799.Qq include 1800which requires switching the input source. 1801.Pp 1802To negotiate these sorts of problems, 1803.Nm 1804provides a mechanism for creating and switching between multiple 1805input buffers. 1806An input buffer is created by using: 1807.Pp 1808.D1 YY_BUFFER_STATE yy_create_buffer(FILE *file, int size) 1809.Pp 1810which takes a 1811.Fa FILE 1812pointer and a 1813.Fa size 1814and creates a buffer associated with the given file and large enough to hold 1815.Fa size 1816characters (when in doubt, use 1817.Dv YY_BUF_SIZE 1818for the size). 1819It returns a 1820.Dv YY_BUFFER_STATE 1821handle, which may then be passed to other routines 1822.Pq see below . 1823The 1824.Dv YY_BUFFER_STATE 1825type is a pointer to an opaque 1826.Dq struct yy_buffer_state 1827structure, so 1828.Dv YY_BUFFER_STATE 1829variables may be safely initialized to 1830.Dq ((YY_BUFFER_STATE) 0) 1831if desired, and the opaque structure can also be referred to in order to 1832correctly declare input buffers in source files other than that of scanners. 1833Note that the 1834.Fa FILE 1835pointer in the call to 1836.Fn yy_create_buffer 1837is only used as the value of 1838.Fa yyin 1839seen by 1840.Dv YY_INPUT ; 1841if 1842.Dv YY_INPUT 1843is redefined so that it no longer uses 1844.Fa yyin , 1845then a nil 1846.Fa FILE 1847pointer can safely be passed to 1848.Fn yy_create_buffer . 1849To select a particular buffer to scan: 1850.Pp 1851.D1 void yy_switch_to_buffer(YY_BUFFER_STATE new_buffer) 1852.Pp 1853It switches the scanner's input buffer so subsequent tokens will 1854come from 1855.Fa new_buffer . 1856Note that 1857.Fn yy_switch_to_buffer 1858may be used by 1859.Fn yywrap 1860to set things up for continued scanning, 1861instead of opening a new file and pointing 1862.Fa yyin 1863at it. 
1864Note also that switching input sources via either 1865.Fn yy_switch_to_buffer 1866or 1867.Fn yywrap 1868does not change the start condition. 1869.Pp 1870.D1 void yy_delete_buffer(YY_BUFFER_STATE buffer) 1871.Pp 1872is used to reclaim the storage associated with a buffer. 1873.Pf ( Fa buffer 1874can be nil, in which case the routine does nothing.) 1875To clear the current contents of a buffer: 1876.Pp 1877.D1 void yy_flush_buffer(YY_BUFFER_STATE buffer) 1878.Pp 1879This function discards the buffer's contents, 1880so the next time the scanner attempts to match a token from the buffer, 1881it will first fill the buffer anew using 1882.Dv YY_INPUT . 1883.Pp 1884.Fn yy_new_buffer 1885is an alias for 1886.Fn yy_create_buffer , 1887provided for compatibility with the C++ use of 1888.Em new 1889and 1890.Em delete 1891for creating and destroying dynamic objects. 1892.Pp 1893Finally, the 1894.Dv YY_CURRENT_BUFFER 1895macro returns a 1896.Dv YY_BUFFER_STATE 1897handle to the current buffer. 1898.Pp 1899Here is an example of using these features for writing a scanner 1900which expands include files (the 1901.Aq Aq EOF 1902feature is discussed below): 1903.Bd -literal -offset indent 1904/* 1905 * the "incl" state is used for picking up the name 1906 * of an include file 1907 */ 1908%x incl 1909 1910%{ 1911#define MAX_INCLUDE_DEPTH 10 1912YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH]; 1913int include_stack_ptr = 0; 1914%} 1915 1916%% 1917include BEGIN(incl); 1918 1919[a-z]+ ECHO; 1920[^a-z\en]*\en? 
ECHO; 1921 1922<incl>[ \et]* /* eat the whitespace */ 1923<incl>[^ \et\en]+ { /* got the include file name */ 1924 if (include_stack_ptr >= MAX_INCLUDE_DEPTH) 1925 errx(1, "Includes nested too deeply"); 1926 1927 include_stack[include_stack_ptr++] = 1928 YY_CURRENT_BUFFER; 1929 1930 yyin = fopen(yytext, "r"); 1931 1932 if (yyin == NULL) 1933 err(1, NULL); 1934 1935 yy_switch_to_buffer( 1936 yy_create_buffer(yyin, YY_BUF_SIZE)); 1937 1938 BEGIN(INITIAL); 1939} 1940 1941<<EOF>> { 1942 if (--include_stack_ptr < 0) 1943 yyterminate(); 1944 else { 1945 yy_delete_buffer(YY_CURRENT_BUFFER); 1946 yy_switch_to_buffer( 1947 include_stack[include_stack_ptr]); 1948 } 1949} 1950.Ed 1951.Pp 1952Three routines are available for setting up input buffers for 1953scanning in-memory strings instead of files. 1954All of them create a new input buffer for scanning the string, 1955and return a corresponding 1956.Dv YY_BUFFER_STATE 1957handle (which should be deleted afterwards using 1958.Fn yy_delete_buffer ) . 1959They also switch to the new buffer using 1960.Fn yy_switch_to_buffer , 1961so the next call to 1962.Fn yylex 1963will start scanning the string. 1964.Bl -tag -width Ds 1965.It yy_scan_string(const char *str) 1966Scans a NUL-terminated string. 1967.It yy_scan_bytes(const char *bytes, int len) 1968Scans 1969.Fa len 1970bytes 1971.Pq including possibly NUL's 1972starting at location 1973.Fa bytes . 1974.El 1975.Pp 1976Note that both of these functions create and scan a copy 1977of the string or bytes. 1978(This may be desirable, since 1979.Fn yylex 1980modifies the contents of the buffer it is scanning.) 1981The copy can be avoided by using: 1982.Bl -tag -width Ds 1983.It yy_scan_buffer(char *base, yy_size_t size) 1984Which scans the buffer starting at 1985.Fa base , 1986consisting of 1987.Fa size 1988bytes, the last two bytes of which must be 1989.Dv YY_END_OF_BUFFER_CHAR 1990.Pq ASCII NUL . 
These last two bytes are not scanned; thus, scanning consists of
base[0] through base[size-3], inclusive.
.Pp
If
.Fa base
is not set up in this manner
(i.e., the final two
.Dv YY_END_OF_BUFFER_CHAR
bytes are missing), then
.Fn yy_scan_buffer
returns a nil pointer instead of creating a new input buffer.
.Pp
The type
.Fa yy_size_t
is an integral type to which an integer expression
reflecting the size of the buffer can be cast.
.El
.Sh END-OF-FILE RULES
The special rule
.Qq Aq Aq EOF
indicates actions which are to be taken when an end-of-file is encountered and
.Fn yywrap
returns non-zero
.Pq i.e., indicates no further files to process .
The action must finish by doing one of four things:
.Bl -dash
.It
Assigning
.Em yyin
to a new input file
(in previous versions of
.Nm ,
after doing the assignment, it was necessary to call the special action
.Dv YY_NEW_FILE ;
this is no longer necessary).
.It
Executing a
.Em return
statement.
.It
Executing the special
.Fn yyterminate
action.
.It
Switching to a new buffer using
.Fn yy_switch_to_buffer
as shown in the example above.
.El
.Pp
.Aq Aq EOF
rules may not be used with other patterns;
they may only be qualified with a list of start conditions.
If an unqualified
.Aq Aq EOF
rule is given, it applies to all start conditions which do not already have
.Aq Aq EOF
actions.
To specify an
.Aq Aq EOF
rule for only the initial start condition, use
.Pp
.Dl <INITIAL><<EOF>>
.Pp
These rules are useful for catching things like unclosed comments.
An example:
.Bd -literal -offset indent
%x quote
%%

\&...other rules for dealing with quotes...
2061 2062<quote><<EOF>> { 2063 error("unterminated quote"); 2064 yyterminate(); 2065} 2066<<EOF>> { 2067 if (*++filelist) 2068 yyin = fopen(*filelist, "r"); 2069 else 2070 yyterminate(); 2071} 2072.Ed 2073.Sh MISCELLANEOUS MACROS 2074The macro 2075.Dv YY_USER_ACTION 2076can be defined to provide an action 2077which is always executed prior to the matched rule's action. 2078For example, 2079it could be #define'd to call a routine to convert yytext to lower-case. 2080When 2081.Dv YY_USER_ACTION 2082is invoked, the variable 2083.Fa yy_act 2084gives the number of the matched rule 2085.Pq rules are numbered starting with 1 . 2086For example, to profile how often each rule is matched, 2087the following would do the trick: 2088.Pp 2089.Dl #define YY_USER_ACTION ++ctr[yy_act] 2090.Pp 2091where 2092.Fa ctr 2093is an array to hold the counts for the different rules. 2094Note that the macro 2095.Dv YY_NUM_RULES 2096gives the total number of rules 2097(including the default rule, even if 2098.Fl s 2099is used), 2100so a correct declaration for 2101.Fa ctr 2102is: 2103.Pp 2104.Dl int ctr[YY_NUM_RULES]; 2105.Pp 2106The macro 2107.Dv YY_USER_INIT 2108may be defined to provide an action which is always executed before 2109the first scan 2110.Pq and before the scanner's internal initializations are done . 2111For example, it could be used to call a routine to read 2112in a data table or open a logging file. 2113.Pp 2114The macro 2115.Dv yy_set_interactive(is_interactive) 2116can be used to control whether the current buffer is considered 2117.Em interactive . 2118An interactive buffer is processed more slowly, 2119but must be used when the scanner's input source is indeed 2120interactive to avoid problems due to waiting to fill buffers 2121(see the discussion of the 2122.Fl I 2123flag below). 2124A non-zero value in the macro invocation marks the buffer as interactive, 2125a zero value as non-interactive. 
2126Note that use of this macro overrides 2127.Dq %option always-interactive 2128or 2129.Dq %option never-interactive 2130(see 2131.Sx OPTIONS 2132below). 2133.Fn yy_set_interactive 2134must be invoked prior to beginning to scan the buffer that is 2135.Pq or is not 2136to be considered interactive. 2137.Pp 2138The macro 2139.Dv yy_set_bol(at_bol) 2140can be used to control whether the current buffer's scanning 2141context for the next token match is done as though at the 2142beginning of a line. 2143A non-zero macro argument makes rules anchored with 2144.Sq ^ 2145active, while a zero argument makes 2146.Sq ^ 2147rules inactive. 2148.Pp 2149The macro 2150.Dv YY_AT_BOL 2151returns true if the next token scanned from the current buffer will have 2152.Sq ^ 2153rules active, false otherwise. 2154.Pp 2155In the generated scanner, the actions are all gathered in one large 2156switch statement and separated using 2157.Dv YY_BREAK , 2158which may be redefined. 2159By default, it is simply a 2160.Qq break , 2161to separate each rule's action from the following rules. 2162Redefining 2163.Dv YY_BREAK 2164allows, for example, C++ users to 2165.Dq #define YY_BREAK 2166to do nothing 2167(while being very careful that every rule ends with a 2168.Qq break 2169or a 2170.Qq return ! ) 2171to avoid suffering from unreachable statement warnings where because a rule's 2172action ends with 2173.Dq return , 2174the 2175.Dv YY_BREAK 2176is inaccessible. 2177.Sh VALUES AVAILABLE TO THE USER 2178This section summarizes the various values available to the user 2179in the rule actions. 2180.Bl -tag -width Ds 2181.It char *yytext 2182Holds the text of the current token. 2183It may be modified but not lengthened 2184.Pq characters cannot be appended to the end . 
2185.Pp 2186If the special directive 2187.Dq %array 2188appears in the first section of the scanner description, then 2189.Fa yytext 2190is instead declared 2191.Dq char yytext[YYLMAX] , 2192where 2193.Dv YYLMAX 2194is a macro definition that can be redefined in the first section 2195to change the default value 2196.Pq generally 8KB . 2197Using 2198.Dq %array 2199results in somewhat slower scanners, but the value of 2200.Fa yytext 2201becomes immune to calls to 2202.Fn input 2203and 2204.Fn unput , 2205which potentially destroy its value when 2206.Fa yytext 2207is a character pointer. 2208The opposite of 2209.Dq %array 2210is 2211.Dq %pointer , 2212which is the default. 2213.Pp 2214.Dq %array 2215cannot be used when generating C++ scanner classes 2216(the 2217.Fl + 2218flag). 2219.It int yyleng 2220Holds the length of the current token. 2221.It FILE *yyin 2222Is the file which by default 2223.Nm 2224reads from. 2225It may be redefined, but doing so only makes sense before 2226scanning begins or after an 2227.Dv EOF 2228has been encountered. 2229Changing it in the midst of scanning will have unexpected results since 2230.Nm 2231buffers its input; use 2232.Fn yyrestart 2233instead. 2234Once scanning terminates because an end-of-file 2235has been seen, 2236.Fa yyin 2237can be assigned as the new input file 2238and the scanner can be called again to continue scanning. 2239.It void yyrestart(FILE *new_file) 2240May be called to point 2241.Fa yyin 2242at the new input file. 2243The switch-over to the new file is immediate 2244.Pq any previously buffered-up input is lost . 2245Note that calling 2246.Fn yyrestart 2247with 2248.Fa yyin 2249as an argument thus throws away the current input buffer and continues 2250scanning the same input file. 2251.It FILE *yyout 2252Is the file to which 2253.Em ECHO 2254actions are done. 2255It can be reassigned by the user. 2256.It YY_CURRENT_BUFFER 2257Returns a 2258.Dv YY_BUFFER_STATE 2259handle to the current buffer. 
2260.It YY_START 2261Returns an integer value corresponding to the current start condition. 2262This value can subsequently be used with 2263.Em BEGIN 2264to return to that start condition. 2265.El 2266.Sh INTERFACING WITH YACC 2267One of the main uses of 2268.Nm 2269is as a companion to the 2270.Xr yacc 1 2271parser-generator. 2272yacc parsers expect to call a routine named 2273.Fn yylex 2274to find the next input token. 2275The routine is supposed to return the type of the next token 2276as well as putting any associated value in the global 2277.Fa yylval , 2278which is defined externally, 2279and can be a union or any other complex data structure. 2280To use 2281.Nm 2282with yacc, one specifies the 2283.Fl d 2284option to yacc to instruct it to generate the file 2285.Pa y.tab.h 2286containing definitions of all the 2287.Dq %tokens 2288appearing in the yacc input. 2289This file is then included in the 2290.Nm 2291scanner. 2292For example, if one of the tokens is 2293.Qq TOK_NUMBER , 2294part of the scanner might look like: 2295.Bd -literal -offset indent 2296%{ 2297#include "y.tab.h" 2298%} 2299 2300%% 2301 2302[0-9]+ yylval = atoi(yytext); return TOK_NUMBER; 2303.Ed 2304.Sh OPTIONS 2305.Nm 2306has the following options: 2307.Bl -tag -width Ds 2308.It Fl 7 2309Instructs 2310.Nm 2311to generate a 7-bit scanner, i.e., one which can only recognize 7-bit 2312characters in its input. 2313The advantage of using 2314.Fl 7 2315is that the scanner's tables can be up to half the size of those generated 2316using the 2317.Fl 8 2318option 2319.Pq see below . 2320The disadvantage is that such scanners often hang 2321or crash if their input contains an 8-bit character. 2322.Pp 2323Note, however, that unless generating a scanner using the 2324.Fl Cf 2325or 2326.Fl CF 2327table compression options, use of 2328.Fl 7 2329will save only a small amount of table space, 2330and make the scanner considerably less portable. 
.Nm flex Ns 's
default behavior is to generate an 8-bit scanner unless
.Fl Cf
or
.Fl CF
is specified, in which case
.Nm
defaults to generating 7-bit scanners unless it was
configured to generate 8-bit scanners
(as will often be the case with non-USA sites).
It is possible to tell whether
.Nm
generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the
.Fl v
output as described below.
.Pp
Note that if
.Fl Cfe
or
.Fl CFe
is used
(the table compression options, but also using equivalence classes as
discussed below),
.Nm
still defaults to generating an 8-bit scanner,
since usually with these compression options full 8-bit tables
are not much more expensive than 7-bit tables.
.It Fl 8
Instructs
.Nm
to generate an 8-bit scanner, i.e., one which can recognize 8-bit
characters.
This flag is only needed for scanners generated using
.Fl Cf
or
.Fl CF ,
as otherwise
.Nm
defaults to generating an 8-bit scanner anyway.
.Pp
See the discussion of
.Fl 7
above for
.Nm flex Ns 's
default behavior and the tradeoffs between 7-bit and 8-bit scanners.
.It Fl B
Instructs
.Nm
to generate a
.Em batch
scanner, the opposite of
.Em interactive
scanners generated by
.Fl I
.Pq see below .
In general,
.Fl B
is used when the scanner will never be used interactively,
and you want to squeeze a little more performance out of it.
If the aim is instead to squeeze out a lot more performance,
use the
.Fl Cf
or
.Fl CF
options
.Pq discussed below ,
which turn on
.Fl B
automatically anyway.
.It Fl b
Generate backing-up information to
.Pa lex.backup .
This is a list of scanner states which require backing up
and the input characters on which they do so.
2405By adding rules one can remove backing-up states. 2406If all backing-up states are eliminated and 2407.Fl Cf 2408or 2409.Fl CF 2410is used, the generated scanner will run faster (see the 2411.Fl p 2412flag). 2413Only users who wish to squeeze every last cycle out of their 2414scanners need worry about this option. 2415(See the section on 2416.Sx PERFORMANCE CONSIDERATIONS 2417below.) 2418.It Fl C Ns Op Cm aeFfmr 2419Controls the degree of table compression and, more generally, trade-offs 2420between small scanners and fast scanners. 2421.Bl -tag -width Ds 2422.It Fl Ca 2423Instructs 2424.Nm 2425to trade off larger tables in the generated scanner for faster performance 2426because the elements of the tables are better aligned for memory access 2427and computation. 2428On some 2429.Tn RISC 2430architectures, fetching and manipulating longwords is more efficient 2431than with smaller-sized units such as shortwords. 2432This option can double the size of the tables used by the scanner. 2433.It Fl Ce 2434Directs 2435.Nm 2436to construct 2437.Em equivalence classes , 2438i.e., sets of characters which have identical lexical properties 2439(for example, if the only appearance of digits in the 2440.Nm 2441input is in the character class 2442.Qq [0-9] 2443then the digits 2444.Sq 0 , 2445.Sq 1 , 2446.Sq ... , 2447.Sq 9 2448will all be put in the same equivalence class). 2449Equivalence classes usually give dramatic reductions in the final 2450table/object file sizes 2451.Pq typically a factor of 2\-5 2452and are pretty cheap performance-wise 2453.Pq one array look-up per character scanned . 2454.It Fl CF 2455Specifies that the alternate fast scanner representation 2456(described below under the 2457.Fl F 2458option) 2459should be used. 2460This option cannot be used with 2461.Fl + . 
.It Fl Cf
Specifies that the
.Em full
scanner tables should be generated \-
.Nm
should not compress the tables by taking advantage of
similar transition functions for different states.
.It Fl \&Cm
Directs
.Nm
to construct
.Em meta-equivalence classes ,
which are sets of equivalence classes
(or characters, if equivalence classes are not being used)
that are commonly used together.
Meta-equivalence classes are often a big win when using compressed tables,
but they have a moderate performance impact
(one or two
.Qq if
tests and one array look-up per character scanned).
.It Fl Cr
Causes the generated scanner to
.Em bypass
use of the standard I/O library
.Pq stdio
for input.
Instead of calling
.Xr fread 3
or
.Xr getc 3 ,
the scanner will use the
.Xr read 2
system call,
resulting in a performance gain which varies from system to system,
but in general is probably negligible unless
.Fl Cf
or
.Fl CF
are being used.
Using
.Fl Cr
can cause strange behavior if, for example,
.Fa yyin
is read using stdio prior to calling the scanner
(because the scanner will miss whatever text previous reads left
in the stdio input buffer).
.Pp
.Fl Cr
has no effect if
.Dv YY_INPUT
is defined
(see
.Sx THE GENERATED SCANNER
above).
.El
.Pp
A lone
.Fl C
specifies that the scanner tables should be compressed but neither
equivalence classes nor meta-equivalence classes should be used.
.Pp
The options
.Fl Cf
or
.Fl CF
and
.Fl \&Cm
do not make sense together \- there is no opportunity for meta-equivalence
classes if the table is not being compressed.
Otherwise the options may be freely mixed, and are cumulative.
.Pp
The default setting is
.Fl Cem
which specifies that
.Nm
should generate equivalence classes and meta-equivalence classes.
This setting provides the highest degree of table compression.
It is possible to trade off faster-executing scanners at the cost of
larger tables with the following generally being true:
.Bd -unfilled -offset indent
slowest & smallest
      -Cem
      -Cm
      -Ce
      -C
      -C{f,F}e
      -C{f,F}
      -C{f,F}a
fastest & largest
.Ed
.Pp
Note that scanners with the smallest tables are usually generated and
compiled the quickest,
so during development the default is usually best,
maximal compression.
.Pp
.Fl Cfe
is often a good compromise between speed and size for production scanners.
.It Fl d
Makes the generated scanner run in debug mode.
Whenever a pattern is recognized and the global
.Fa yy_flex_debug
is non-zero
.Pq which is the default ,
the scanner will write to stderr a line of the form:
.Pp
.D1 --accepting rule at line 53 ("the matched text")
.Pp
The line number refers to the location of the rule in the file
defining the scanner
(i.e., the file that was fed to
.Nm ) .
Messages are also generated when the scanner backs up,
accepts the default rule,
reaches the end of its input buffer
(or encounters a NUL;
at this point, the two look the same as far as the scanner's concerned),
or reaches an end-of-file.
.It Fl F
Specifies that the fast scanner table representation should be used
.Pq and stdio bypassed .
This representation is about as fast as the full table representation
.Pq Fl f ,
and for some sets of patterns will be considerably smaller
.Pq and for others, larger .
In general, if the pattern set contains both
.Qq keywords
and a catch-all,
.Qq identifier
rule, such as in the set:
.Bd -unfilled -offset indent
"case"    return TOK_CASE;
"switch"  return TOK_SWITCH;
\&...
"default" return TOK_DEFAULT;
[a-z]+    return TOK_ID;
.Ed
.Pp
then it's better to use the full table representation.
If only the
.Qq identifier
rule is present and a hash table or some such is used to detect the keywords,
it's better to use
.Fl F .
.Pp
This option is equivalent to
.Fl CFr
.Pq see above .
It cannot be used with
.Fl + .
.It Fl f
Specifies
.Em fast scanner .
No table compression is done and stdio is bypassed.
The result is large but fast.
This option is equivalent to
.Fl Cfr
.Pq see above .
.It Fl h
Generates a help summary of
.Nm flex Ns 's
options to stdout and then exits.
.Fl ?\&
and
.Fl Fl help
are synonyms for
.Fl h .
.It Fl I
Instructs
.Nm
to generate an
.Em interactive
scanner.
An interactive scanner is one that only looks ahead to decide
what token has been matched if it absolutely must.
It turns out that always looking one extra character ahead,
even if the scanner has already seen enough text
to disambiguate the current token, is a bit faster than
only looking ahead when necessary.
But scanners that always look ahead give dreadful interactive performance;
for example, when a user types a newline,
it is not recognized as a newline token until they enter
.Em another
token, which often means typing in another whole line.
.Pp
.Nm
scanners default to
.Em interactive
unless
.Fl Cf
or
.Fl CF
table-compression options are specified
.Pq see above .
That's because if high-performance is most important,
one of these options should be used,
so if they weren't,
.Nm
assumes it is preferable to trade off a bit of run-time performance for
intuitive interactive behavior.
Note also that
.Fl I
cannot be used in conjunction with
.Fl Cf
or
.Fl CF .
Thus, this option is not really needed; it is on by default for all those
cases in which it is allowed.
.Pp
A scanner can be forced to not be interactive by using
.Fl B
.Pq see above .
.It Fl i
Instructs
.Nm
to generate a case-insensitive scanner.
The case of letters given in the
.Nm
input patterns will be ignored,
and tokens in the input will be matched regardless of case.
The matched text given in
.Fa yytext
will have the preserved case
.Pq i.e., it will not be folded .
.It Fl L
Instructs
.Nm
not to generate
.Dq #line
directives.
Without this option,
.Nm
peppers the generated scanner with #line directives so error messages
in the actions will be correctly located with respect to either the original
.Nm
input file
(if the errors are due to code in the input file),
or
.Pa lex.yy.c
(if the errors are
.Nm flex Ns 's
fault \- these sorts of errors should be reported to the email address
given below).
.It Fl l
Turns on maximum compatibility with the original
.At
.Nm lex
implementation.
Note that this does not mean full compatibility.
Use of this option costs a considerable amount of performance,
and it cannot be used with the
.Fl + , f , F , Cf ,
or
.Fl CF
options.
For details on the compatibilities it provides, see the section
.Sx INCOMPATIBILITIES WITH LEX AND POSIX
below.
This option also results in the name
.Dv YY_FLEX_LEX_COMPAT
being #define'd in the generated scanner.
.It Fl n
Another do-nothing, deprecated option included only for
.Tn POSIX
compliance.
.It Fl o Ns Ar output
Directs
.Nm
to write the scanner to the file
.Ar output
instead of
.Pa lex.yy.c .
If
.Fl o
is combined with the
.Fl t
option, then the scanner is written to stdout but its
.Dq #line
directives
(see the
.Fl L
option above)
refer to the file
.Ar output .
.It Fl P Ns Ar prefix
Changes the default
.Qq yy
prefix used by
.Nm
for all globally visible variable and function names to instead be
.Ar prefix .
For example,
.Fl P Ns Ar foo
changes the name of
.Fa yytext
to
.Fa footext .
It also changes the name of the default output file from
.Pa lex.yy.c
to
.Pa lex.foo.c .
Here are all of the names affected:
.Bd -unfilled -offset indent
yy_create_buffer
yy_delete_buffer
yy_flex_debug
yy_init_buffer
yy_flush_buffer
yy_load_buffer_state
yy_switch_to_buffer
yyin
yyleng
yylex
yylineno
yyout
yyrestart
yytext
yywrap
.Ed
.Pp
(If using a C++ scanner, then only
.Fa yywrap
and
.Fa yyFlexLexer
are affected.)
Within the scanner itself, it is still possible to refer to the global variables
and functions using either version of their name; but externally, they
have the modified name.
.Pp
This option allows multiple
.Nm
programs to be easily linked together into the same executable.
Note, though, that using this option also renames
.Fn yywrap ,
so now either an
.Pq appropriately named
version of the routine for the scanner must be supplied, or
.Dq %option noyywrap
must be used, as linking with
.Fl lfl
no longer provides one by default.
.It Fl p
Generates a performance report to stderr.
The report consists of comments regarding features of the
.Nm
input file which will cause a serious loss of performance in the resulting
scanner.
If the flag is specified twice,
comments regarding features that lead to minor performance losses
will also be reported.
.Pp
Note that the use of
.Em REJECT ,
.Dq %option yylineno ,
and variable trailing context
(see the
.Sx BUGS
section below)
entails a substantial performance penalty; use of
.Fn yymore ,
the
.Sq ^
operator, and the
.Fl I
flag entail minor performance penalties.
.It Fl S Ns Ar skeleton
Overrides the default skeleton file from which
.Nm
constructs its scanners.
This option is needed only for
.Nm
maintenance or development.
.It Fl s
Causes the default rule
.Pq that unmatched scanner input is echoed to stdout
to be suppressed.
If the scanner encounters input that does not
match any of its rules, it aborts with an error.
This option is useful for finding holes in a scanner's rule set.
.It Fl T
Makes
.Nm
run in
.Em trace
mode.
It will generate a lot of messages to stderr concerning
the form of the input and the resultant non-deterministic and deterministic
finite automata.
This option is mostly for use in maintaining
.Nm .
.It Fl t
Instructs
.Nm
to write the scanner it generates to standard output instead of
.Pa lex.yy.c .
.It Fl V
Prints the version number to stdout and exits.
.Fl Fl version
is a synonym for
.Fl V .
.It Fl v
Specifies that
.Nm
should write to stderr
a summary of statistics regarding the scanner it generates.
Most of the statistics are meaningless to the casual
.Nm
user, but the first line identifies the version of
.Nm
(same as reported by
.Fl V ) ,
and the next line the flags used when generating the scanner,
including those that are on by default.
.It Fl w
Suppresses warning messages.
.It Fl +
Specifies that
.Nm
should generate a C++ scanner class.
See the section on
.Sx GENERATING C++ SCANNERS
below for details.
.El
.Pp
.Nm
also provides a mechanism for controlling options within the
scanner specification itself, rather than from the
.Nm
command line.
This is done by including
.Dq %option
directives in the first section of the scanner specification.
Multiple options can be specified with a single
.Dq %option
directive, and multiple directives may appear in the first section of the
.Nm
input file.
.Pp
Most options are given simply as names, optionally preceded by the word
.Qq no
.Pq with no intervening whitespace
to negate their meaning.
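.Pp
As a minimal sketch, the first section of a specification might contain
the following directives; the particular combination is chosen only for
illustration:
.Bd -literal -offset indent
%option 8bit nodefault
%option noyywrap
.Ed
.Pp
Going by the equivalences listed in this section,
the first directive corresponds to the
.Fl 8
and
.Fl s
flags, and the second tells the scanner not to call
.Fn yywrap
at end-of-file.
.Pp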
A number are equivalent to
.Nm
flags or their negation:
.Bd -unfilled -offset indent
7bit            -7 option
8bit            -8 option
align           -Ca option
backup          -b option
batch           -B option
c++             -+ option

caseful or
case-sensitive  opposite of -i (default)

case-insensitive or
caseless        -i option

debug           -d option
default         opposite of -s option
ecs             -Ce option
fast            -F option
full            -f option
interactive     -I option
lex-compat      -l option
meta-ecs        -Cm option
perf-report     -p option
read            -Cr option
stdout          -t option
verbose         -v option
warn            opposite of -w option
                (use "%option nowarn" for -w)

array           equivalent to "%array"
pointer         equivalent to "%pointer" (default)
.Ed
.Pp
Some %option's provide features otherwise not available:
.Bl -tag -width Ds
.It always-interactive
Instructs
.Nm
to generate a scanner which always considers its input
.Qq interactive .
Normally, on each new input file the scanner calls
.Fn isatty
in an attempt to determine whether the scanner's input source is interactive
and thus should be read a character at a time.
When this option is used, however, no such call is made.
.It main
Directs
.Nm
to provide a default
.Fn main
program for the scanner, which simply calls
.Fn yylex .
This option implies
.Dq noyywrap
.Pq see below .
.It never-interactive
Instructs
.Nm
to generate a scanner which never considers its input
.Qq interactive
(again, no call made to
.Fn isatty ) .
This is the opposite of
.Dq always-interactive .
.It stack
Enables the use of start condition stacks
(see
.Sx START CONDITIONS
above).
.It stdinit
If set (i.e.,
.Dq %option stdinit ) ,
initializes
.Fa yyin
and
.Fa yyout
to stdin and stdout, instead of the default of
.Dq nil .
Some existing
.Nm lex
programs depend on this behavior, even though it is not compliant with ANSI C,
which does not require stdin and stdout to be compile-time constant.
.It yylineno
Directs
.Nm
to generate a scanner that maintains the number of the current line
read from its input in the global variable
.Fa yylineno .
This option is implied by
.Dq %option lex-compat .
.It yywrap
If unset (i.e.,
.Dq %option noyywrap ) ,
makes the scanner not call
.Fn yywrap
upon an end-of-file, but simply assume that there are no more files to scan
(until the user points
.Fa yyin
at a new file and calls
.Fn yylex
again).
.El
.Pp
.Nm
scans rule actions to determine whether the
.Em REJECT
or
.Fn yymore
features are being used.
The
.Dq reject
and
.Dq yymore
options are available to override its decision as to whether to use the
options, either by setting them (e.g.,
.Dq %option reject )
to indicate the feature is indeed used,
or unsetting them to indicate it actually is not used
(e.g.,
.Dq %option noyymore ) .
.Pp
Three options take string-delimited values, offset with
.Sq = :
.Pp
.D1 %option outfile="ABC"
.Pp
is equivalent to
.Fl o Ns Ar ABC ,
and
.Pp
.D1 %option prefix="XYZ"
.Pp
is equivalent to
.Fl P Ns Ar XYZ .
Finally,
.Pp
.D1 %option yyclass="foo"
.Pp
only applies when generating a C++ scanner
.Pf ( Fl +
option).
It informs
.Nm
that
.Dq foo
has been derived as a subclass of yyFlexLexer, so
.Nm
will place actions in the member function
.Dq foo::yylex()
instead of
.Dq yyFlexLexer::yylex() .
It also generates a
.Dq yyFlexLexer::yylex()
member function that emits a run-time error (by invoking
.Dq yyFlexLexer::LexerError() )
if called.
See
.Sx GENERATING C++ SCANNERS ,
below, for additional information.
.Pp
A number of options are available for
lint
purists who want to suppress the appearance of unneeded routines
in the generated scanner.
Each of the following, if unset
(e.g.,
.Dq %option nounput ) ,
results in the corresponding routine not appearing in the generated scanner:
.Bd -unfilled -offset indent
input, unput
yy_push_state, yy_pop_state, yy_top_state
yy_scan_buffer, yy_scan_bytes, yy_scan_string
.Ed
.Pp
(though
.Fn yy_push_state
and friends won't appear anyway unless
.Dq %option stack
is being used).
.Sh PERFORMANCE CONSIDERATIONS
The main design goal of
.Nm
is that it generate high-performance scanners.
It has been optimized for dealing well with large sets of rules.
Aside from the effects on scanner speed of the table compression
.Fl C
options outlined above,
there are a number of options/actions which degrade performance.
These are, from most expensive to least:
.Bd -unfilled -offset indent
REJECT
%option yylineno
arbitrary trailing context

pattern sets that require backing up
%array
%option interactive
%option always-interactive

\&'^' beginning-of-line operator
yymore()
.Ed
.Pp
with the first three all being quite expensive
and the last two being quite cheap.
Note also that
.Fn unput
is implemented as a routine call that potentially does quite a bit of work,
while
.Fn yyless
is a quite-cheap macro; so if just putting back some excess text,
use
.Fn yyless .
.Pp
.Em REJECT
should be avoided at all costs when performance is important.
It is a particularly expensive option.
.Pp
Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner.
In principle, one begins by using the
.Fl b
flag to generate a
.Pa lex.backup
file.
For example, on the input
.Bd -literal -offset indent
%%
foo        return TOK_KEYWORD;
foobar     return TOK_KEYWORD;
.Ed
.Pp
the file looks like:
.Bd -literal -offset indent
State #6 is non-accepting -
 associated rule line numbers:
       2       3
 out-transitions: [ o ]
 jam-transitions: EOF [ \e001-n  p-\e177 ]

State #8 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ a ]
 jam-transitions: EOF [ \e001-`  b-\e177 ]

State #9 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ r ]
 jam-transitions: EOF [ \e001-q  s-\e177 ]

Compressed tables always back up.
.Ed
.Pp
The first few lines tell us that there's a scanner state in
which it can make a transition on an
.Sq o
but not on any other character,
and that in that state the currently scanned text does not match any rule.
The state occurs when trying to match the rules found
at lines 2 and 3 in the input file.
If the scanner is in that state and then reads something other than an
.Sq o ,
it will have to back up to find a rule which is matched.
With a bit of headscratching one can see that this must be the
state it's in when it has seen
.Sq fo .
When this has happened, if anything other than another
.Sq o
is seen, the scanner will have to back up to simply match the
.Sq f
.Pq by the default rule .
.Pp
The comment regarding State #8 indicates there's a problem when
.Qq foob
has been scanned.
Indeed, on any character other than an
.Sq a ,
the scanner will have to back up to accept
.Qq foo .
Similarly, the comment for State #9 concerns when
.Qq fooba
has been scanned and an
.Sq r
does not follow.
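.Pp
As a worked illustration of the State #6 entry, consider the hypothetical
input
.Qq fox
(chosen for illustration, not taken from a real run).
A sketch of what the scanner does, step by step:
.Bd -literal -offset indent
read 'f'    accepting: the default rule matches "f"
read 'o'    State #6: non-accepting, "fo" matches no rule
read 'x'    no transition on 'x'; back up to just after "f"
action      the default rule handles "f"; scanning resumes at "ox"
.Ed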
.Pp
The final comment reminds us that there's no point going to
all the trouble of removing backing up from the rules unless we're using
.Fl Cf
or
.Fl CF ,
since there's no performance gain doing so with compressed scanners.
.Pp
The way to remove the backing up is to add
.Qq error
rules:
.Bd -literal -offset indent
%%
foo        return TOK_KEYWORD;
foobar     return TOK_KEYWORD;

fooba      |
foob       |
fo         {
               /* false alarm, not really a keyword */
               return TOK_ID;
           }
.Ed
.Pp
Eliminating backing up among a list of keywords can also be done using a
.Qq catch-all
rule:
.Bd -literal -offset indent
%%
foo        return TOK_KEYWORD;
foobar     return TOK_KEYWORD;

[a-z]+     return TOK_ID;
.Ed
.Pp
This is usually the best solution when appropriate.
.Pp
Backing up messages tend to cascade.
With a complicated set of rules it's not uncommon to get hundreds of messages.
If one can decipher them, though,
it often only takes a dozen or so rules to eliminate the backing up
(though it's easy to make a mistake and have an error rule accidentally match
a valid token; a possible future
.Nm
feature will be to automatically add rules to eliminate backing up).
.Pp
It's important to keep in mind that the benefits of eliminating
backing up are gained only if
.Em every
instance of backing up is eliminated.
Leaving just one gains nothing.
.Pp
.Em Variable
trailing context
(where both the leading and trailing parts do not have a fixed length)
entails almost the same performance loss as
.Em REJECT
.Pq i.e., substantial .
So when possible a rule like:
.Bd -literal -offset indent
%%
mouse|rat/(cat|dog)   run();
.Ed
.Pp
is better written:
.Bd -literal -offset indent
%%
mouse/cat|dog         run();
rat/cat|dog           run();
.Ed
.Pp
or as
.Bd -literal -offset indent
%%
mouse|rat/cat         run();
mouse|rat/dog         run();
.Ed
.Pp
Note that here the special
.Sq |\&
action does not provide any savings, and can even make things worse (see
.Sx BUGS
below).
.Pp
Another area where the user can increase a scanner's performance
.Pq and one that's easier to implement
arises from the fact that the longer the tokens matched,
the faster the scanner will run.
This is because with long tokens the processing of most input
characters takes place in the
.Pq short
inner scanning loop, and does not often have to go through the additional work
of setting up the scanning environment (e.g.,
.Fa yytext )
for the action.
Recall the scanner for C comments:
.Bd -literal -offset indent
%x comment
%%
        int line_num = 1;

"/*"                    BEGIN(comment);

<comment>[^*\en]*
<comment>"*"+[^*/\en]*
<comment>\en             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
.Ed
.Pp
This could be sped up by writing it as:
.Bd -literal -offset indent
%x comment
%%
        int line_num = 1;

"/*"                    BEGIN(comment);

<comment>[^*\en]*
<comment>[^*\en]*\en      ++line_num;
<comment>"*"+[^*/\en]*
<comment>"*"+[^*/\en]*\en ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
.Ed
.Pp
Now instead of each newline requiring the processing of another action,
recognizing the newlines is
.Qq distributed
over the other rules to keep the matched text as long as possible.
Note that adding rules does
.Em not
slow down the scanner!
The speed of the scanner is independent of the number of rules or
(modulo the considerations given at the beginning of this section)
how complicated the rules are with regard to operators such as
.Sq *
and
.Sq |\& .
.Pp
A final example in speeding up a scanner:
scan through a file containing identifiers and keywords, one per line
and with no other extraneous characters, and recognize all the keywords.
A natural first approach is:
.Bd -literal -offset indent
%%
asm      |
auto     |
break    |
\&... etc ...
volatile |
while    /* it's a keyword */

\&.|\en     /* it's not a keyword */
.Ed
.Pp
To eliminate the back-tracking, introduce a catch-all rule:
.Bd -literal -offset indent
%%
asm      |
auto     |
break    |
\&... etc ...
volatile |
while    /* it's a keyword */

[a-z]+   |
\&.|\en     /* it's not a keyword */
.Ed
.Pp
Now, if it's guaranteed that there's exactly one word per line,
then we can reduce the total number of matches by a half by
merging in the recognition of newlines with that of the other tokens:
.Bd -literal -offset indent
%%
asm\en      |
auto\en     |
break\en    |
\&... etc ...
volatile\en |
while\en    /* it's a keyword */

[a-z]+\en   |
\&.|\en       /* it's not a keyword */
.Ed
.Pp
One has to be careful here,
as we have now reintroduced backing up into the scanner.
In particular, while we know that there will never be any characters
in the input stream other than letters or newlines,
.Nm
can't figure this out, and it will plan for possibly needing to back up
when it has scanned a token like
.Qq auto
and then the next character is something other than a newline or a letter.
Previously it would then just match the
.Qq auto
rule and be done, but now it has no
.Qq auto
rule, only an
.Qq auto\en
rule.
To eliminate the possibility of backing up,
we could either duplicate all rules but without final newlines or,
since we never expect to encounter such an input and therefore don't
care how it's classified, we can introduce one more catch-all rule,
this one which doesn't include a newline:
.Bd -literal -offset indent
%%
asm\en      |
auto\en     |
break\en    |
\&... etc ...
volatile\en |
while\en    /* it's a keyword */

[a-z]+\en   |
[a-z]+     |
\&.|\en       /* it's not a keyword */
.Ed
.Pp
Compiled with
.Fl Cf ,
this is about as fast as one can get a
.Nm
scanner to go for this particular problem.
.Pp
A final note:
.Nm
is slow when matching NUL's,
particularly when a token contains multiple NUL's.
It's best to write rules which match short
amounts of text if it's anticipated that the text will often include NUL's.
.Pp
Another final note regarding performance: as mentioned above in the section
.Sx HOW THE INPUT IS MATCHED ,
dynamically resizing
.Fa yytext
to accommodate huge tokens is a slow process because it presently requires that
the
.Pq huge
token be rescanned from the beginning.
Thus if performance is vital, it is better to attempt to match
.Qq large
quantities of text but not
.Qq huge
quantities, where the cutoff between the two is at about 8K characters/token.
.Sh GENERATING C++ SCANNERS
.Nm
provides two different ways to generate scanners for use with C++.
The first way is to simply compile a scanner generated by
.Nm
using a C++ compiler instead of a C compiler.
This should not generate any compilation errors
(please report any found to the email address given in the
.Sx AUTHORS
section below).
C++ code can then be used in rule actions instead of C code.
Note that the default input source for scanners remains
.Fa yyin ,
and default echoing is still done to
.Fa yyout .
Both of these remain
.Fa FILE *
variables and not C++ streams.
.Pp
.Nm
can also be used to generate a C++ scanner class, using the
.Fl +
option (or, equivalently,
.Dq %option c++ ) ,
which is automatically specified if the name of the flex executable ends in a
.Sq + ,
such as
.Nm flex++ .
When using this option,
.Nm
defaults to generating the scanner to the file
.Pa lex.yy.cc
instead of
.Pa lex.yy.c .
The generated scanner includes the header file
.In g++/FlexLexer.h ,
which defines the interface to two C++ classes.
.Pp
The first class,
.Em FlexLexer ,
provides an abstract base class defining the general scanner class interface.
It provides the following member functions:
.Bl -tag -width Ds
.It const char* YYText()
Returns the text of the most recently matched token, the equivalent of
.Fa yytext .
.It int YYLeng()
Returns the length of the most recently matched token, the equivalent of
.Fa yyleng .
.It int lineno() const
Returns the current input line number
(see
.Dq %option yylineno ) ,
or 1 if
.Dq %option yylineno
was not used.
.It void set_debug(int flag)
Sets the debugging flag for the scanner, equivalent to assigning to
.Fa yy_flex_debug
(see the
.Sx OPTIONS
section above).
Note that the scanner must be built using
.Dq %option debug
to include debugging information in it.
.It int debug() const
Returns the current setting of the debugging flag.
.El
.Pp
Also provided are member functions equivalent to
.Fn yy_switch_to_buffer ,
.Fn yy_create_buffer
(though the first argument is an
.Fa std::istream*
object pointer and not a
.Fa FILE* ) ,
.Fn yy_flush_buffer ,
.Fn yy_delete_buffer ,
and
.Fn yyrestart
(again, the first argument is an
.Fa std::istream*
object pointer).
.Pp
The second class defined in
.In g++/FlexLexer.h
is
.Fa yyFlexLexer ,
which is derived from
.Fa FlexLexer .
It defines the following additional member functions:
.Bl -tag -width Ds
.It "yyFlexLexer(std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0)"
Constructs a
.Fa yyFlexLexer
object using the given streams for input and output.
If not specified, the streams default to
.Fa cin
and
.Fa cout ,
respectively.
.It virtual int yylex()
Performs the same role as
.Fn yylex
does for ordinary flex scanners: it scans the input stream, consuming
tokens, until a rule's action returns a value.
If subclass
.Sq S
is derived from
.Fa yyFlexLexer ,
in order to access the member functions and variables of
.Sq S
inside
.Fn yylex ,
use
.Dq %option yyclass="S"
to inform
.Nm
that the
.Sq S
subclass will be used instead of
.Fa yyFlexLexer .
In this case, rather than generating
.Dq yyFlexLexer::yylex() ,
.Nm
generates
.Dq S::yylex()
(and also generates a dummy
.Dq yyFlexLexer::yylex()
that calls
.Dq yyFlexLexer::LexerError()
if called).
.It "virtual void switch_streams(std::istream* new_in = 0, std::ostream* new_out = 0)"
Reassigns
.Fa yyin
to
.Fa new_in
.Pq if non-nil
and
.Fa yyout
to
.Fa new_out
.Pq ditto ,
deleting the previous input buffer if
.Fa yyin
is reassigned.
.It int yylex(std::istream* new_in, std::ostream* new_out = 0)
First switches the input streams via
.Dq switch_streams(new_in, new_out)
and then returns the value of
.Fn yylex .
.El
.Pp
In addition,
.Fa yyFlexLexer
defines the following protected virtual functions which can be redefined
in derived classes to tailor the scanner:
.Bl -tag -width Ds
.It virtual int LexerInput(char* buf, int max_size)
Reads up to
.Fa max_size
characters into
.Fa buf
and returns the number of characters read.
To indicate end-of-input, return 0 characters.
Note that
.Qq interactive
scanners (see the
.Fl B
and
.Fl I
flags) define the macro
.Dv YY_INTERACTIVE .
If
.Fn LexerInput
has been redefined, and it's necessary to take different actions depending on
whether or not the scanner might be scanning an interactive input source,
it's possible to test for the presence of this name via
.Dq #ifdef .
.It virtual void LexerOutput(const char* buf, int size)
Writes out
.Fa size
characters from the buffer
.Fa buf ,
which, while NUL-terminated, may also contain
.Qq internal
NUL's if the scanner's rules can match text with NUL's in them.
.It virtual void LexerError(const char* msg)
Reports a fatal error message.
The default version of this function writes the message to the stream
.Fa cerr
and exits.
.El
.Pp
Note that a
.Fa yyFlexLexer
object contains its entire scanning state.
Thus such objects can be used to create reentrant scanners.
Multiple instances of the same
.Fa yyFlexLexer
class can be instantiated, and multiple C++ scanner classes can be combined
in the same program using the
.Fl P
option discussed above.
.Pp
Finally, note that the
.Dq %array
feature is not available to C++ scanner classes;
.Dq %pointer
must be used
.Pq the default .
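.Pp
Because each object carries its own scanning state,
two scanners of the same class can be driven side by side.
The following is only a sketch of a
.Fn main
for the user code section of a C++ scanner specification.
It assumes that the rules return non-zero token codes,
that
.In fstream
has been included in the definitions section,
and that two input file names are given on the command line;
the variable names are made up for illustration:
.Bd -literal -offset indent
int main(int argc, char** argv)
{
        if (argc < 3)
                return 1;

        std::ifstream in1(argv[1]);
        std::ifstream in2(argv[2]);

        /* Two independent scanners, each with its own state. */
        yyFlexLexer lexer1(&in1);
        yyFlexLexer lexer2(&in2);

        /* Alternate between them until both reach end-of-file. */
        int t1 = 1, t2 = 1;
        while (t1 != 0 || t2 != 0) {
                if (t1 != 0)
                        t1 = lexer1.yylex();
                if (t2 != 0)
                        t2 = lexer2.yylex();
        }
        return 0;
}
.Ed
.Pp
Each scanner is left alone once its
.Fn yylex
has returned 0, so neither is called again after its end-of-file.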
.Pp
Here is an example of a simple C++ scanner:
.Bd -literal -offset indent
// An example of using the flex C++ scanner class.

%{
#include <errno.h>
int mylineno = 0;
%}

string  \e"[^\en"]+\e"

ws      [ \et]+

alpha   [A-Za-z]
dig     [0-9]
name    ({alpha}|{dig}|\e$)({alpha}|{dig}|[_.\e-/$])*
num1    [-+]?{dig}+\e.?([eE][-+]?{dig}+)?
num2    [-+]?{dig}*\e.{dig}+([eE][-+]?{dig}+)?
number  {num1}|{num2}

%%

{ws}    /* skip blanks and tabs */

"/*" {
        int c;

        while ((c = yyinput()) != 0) {
                if(c == '\en')
                        ++mylineno;
                else if(c == '*') {
                        if ((c = yyinput()) == '/')
                                break;
                        else
                                unput(c);
                }
        }
}

{number}  cout << "number " << YYText() << '\en';

\en        mylineno++;

{name}    cout << "name " << YYText() << '\en';

{string}  cout << "string " << YYText() << '\en';

%%

int main(int /* argc */, char** /* argv */)
{
        FlexLexer* lexer = new yyFlexLexer;
        while(lexer->yylex() != 0)
                ;
        return 0;
}
.Ed
.Pp
To create multiple
.Pq different
lexer classes, use the
.Fl P
flag
(or the
.Dq prefix=
option)
to rename each
.Fa yyFlexLexer
to some other
.Fa xxFlexLexer .
.In g++/FlexLexer.h
can then be included in other sources once per lexer class, first renaming
.Fa yyFlexLexer
as follows:
.Bd -literal -offset indent
#undef yyFlexLexer
#define yyFlexLexer xxFlexLexer
#include <g++/FlexLexer.h>

#undef yyFlexLexer
#define yyFlexLexer zzFlexLexer
#include <g++/FlexLexer.h>
.Ed
.Pp
This example assumes that
.Dq %option prefix="xx"
is used for one scanner and
.Dq %option prefix="zz"
is used for the other.
.Pp
.Sy IMPORTANT :
the present form of the scanning class is experimental
and may change considerably between major releases.
.Sh INCOMPATIBILITIES WITH LEX AND POSIX
.Nm
is a rewrite of the
.At
.Nm lex
tool
(the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which are of concern
to those who wish to write scanners acceptable to either implementation.
.Nm
is fully compliant with the
.Tn POSIX
.Nm lex
specification, except that when using
.Dq %pointer
.Pq the default ,
a call to
.Fn unput
destroys the contents of
.Fa yytext ,
which is counter to the
.Tn POSIX
specification.
.Pp
In this section we discuss all of the known areas of incompatibility between
.Nm ,
.At
.Nm lex ,
and the
.Tn POSIX
specification.
.Pp
.Nm flex Ns 's
.Fl l
option turns on maximum compatibility with the original
.At
.Nm lex
implementation, at the cost of a major loss in the generated scanner's
performance.
We note below which incompatibilities can be overcome using the
.Fl l
option.
.Pp
.Nm
is fully compatible with
.Nm lex
with the following exceptions:
.Bl -dash
.It
The undocumented
.Nm lex
scanner internal variable
.Fa yylineno
is not supported unless
.Fl l
or
.Dq %option yylineno
is used.
.Pp
.Fa yylineno
should be maintained on a per-buffer basis, rather than a per-scanner
.Pq single global variable
basis.
.Pp
.Fa yylineno
is not part of the
.Tn POSIX
specification.
.It
The
.Fn input
routine is not redefinable, though it may be called to read characters
following whatever has been matched by a rule.
If
.Fn input
encounters an end-of-file, the normal
.Fn yywrap
processing is done.
A
.Dq real
end-of-file is returned by
.Fn input
as
.Dv EOF .
.Pp
Input is instead controlled by defining the
.Dv YY_INPUT
macro.
.Pp
The
.Nm
restriction that
.Fn input
cannot be redefined is in accordance with the
.Tn POSIX
specification, which simply does not specify any way of controlling the
scanner's input other than by making an initial assignment to
.Fa yyin .
.It
The
.Fn unput
routine is not redefinable.
This restriction is in accordance with
.Tn POSIX .
.It
.Nm
scanners are not as reentrant as
.Nm lex
scanners.
In particular, if a scanner is interactive and
an interrupt handler long-jumps out of the scanner,
and the scanner is subsequently called again,
the following error message may be displayed:
.Pp
.D1 fatal flex scanner internal error--end of buffer missed
.Pp
To reenter the scanner, first use
.Pp
.Dl yyrestart(yyin);
.Pp
Note that this call will throw away any buffered input;
usually this isn't a problem with an interactive scanner.
.Pp
Also note that flex C++ scanner classes are reentrant,
so if using C++ is an option, they should be used instead.
See
.Sx GENERATING C++ SCANNERS
above for details.
.It
.Fn output
is not supported.
Output from the
.Em ECHO
macro is done to the file-pointer
.Fa yyout
.Pq default stdout .
.Pp
.Fn output
is not part of the
.Tn POSIX
specification.
.It
.Nm lex
does not support exclusive start conditions
.Pq %x ,
though they are in the
.Tn POSIX
specification.
.It
When definitions are expanded,
.Nm
encloses them in parentheses.
With
.Nm lex ,
the following:
.Bd -literal -offset indent
NAME    [A-Z][A-Z0-9]*
%%
foo{NAME}?      printf("Found it\en");
%%
.Ed
.Pp
will not match the string
.Qq foo
because when the macro is expanded the rule is equivalent to
.Qq foo[A-Z][A-Z0-9]*?
and the precedence is such that the
.Sq ?\&
is associated with
.Qq [A-Z0-9]* .
With
.Nm ,
the rule will be expanded to
.Qq foo([A-Z][A-Z0-9]*)?
and so the string
.Qq foo
will match.
.Pp
Note that if the definition begins with
.Sq ^
or ends with
.Sq $
then it is not expanded with parentheses, to allow these operators to appear in
definitions without losing their special meanings.
But the
.Sq Aq s ,
.Sq / ,
and
.Aq Aq EOF
operators cannot be used in a
.Nm
definition.
.Pp
Using
.Fl l
results in the
.Nm lex
behavior of no parentheses around the definition.
.Pp
The
.Tn POSIX
specification is that the definition be enclosed in parentheses.
.It
Some implementations of
.Nm lex
allow a rule's action to begin on a separate line,
if the rule's pattern has trailing whitespace:
.Bd -literal -offset indent
%%
foo|bar<space here>
  { foobar_action(); }
.Ed
.Pp
.Nm
does not support this feature.
.It
The
.Nm lex
.Sq %r
.Pq generate a Ratfor scanner
option is not supported.
It is not part of the
.Tn POSIX
specification.
.It
After a call to
.Fn unput ,
.Fa yytext
is undefined until the next token is matched,
unless the scanner was built using
.Dq %array .
This is not the case with
.Nm lex
or the
.Tn POSIX
specification.
The
.Fl l
option does away with this incompatibility.
.It
The precedence of the
.Sq {}
.Pq numeric range
operator is different.
.Nm lex
interprets
.Qq abc{1,3}
as match one, two, or three occurrences of
.Sq abc ,
whereas
.Nm
interprets it as match
.Sq ab
followed by one, two, or three occurrences of
.Sq c .
The latter is in agreement with the
.Tn POSIX
specification.
.It
The precedence of the
.Sq ^
operator is different.
.Nm lex
interprets
.Qq ^foo|bar
as match either
.Sq foo
at the beginning of a line, or
.Sq bar
anywhere, whereas
.Nm
interprets it as match either
.Sq foo
or
.Sq bar
if they come at the beginning of a line.
The latter is in agreement with the
.Tn POSIX
specification.
.It
The special table-size declarations such as
.Sq %a
supported by
.Nm lex
are not required by
.Nm
scanners;
.Nm
ignores them.
.It
The name
.Dv FLEX_SCANNER
is #define'd so scanners may be written for use with either
.Nm
or
.Nm lex .
Scanners also include
.Dv YY_FLEX_MAJOR_VERSION
and
.Dv YY_FLEX_MINOR_VERSION
indicating which version of
.Nm
generated the scanner
(for example, for the 2.5 release, these defines would be 2 and 5,
respectively).
.El
.Pp
The following
.Nm
features are not included in
.Nm lex
or the
.Tn POSIX
specification:
.Bd -unfilled -offset indent
C++ scanners
%option
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
yyterminate()
yy_set_interactive()
yy_set_bol()
YY_AT_BOL()
<<EOF>>
<*>
YY_DECL
YY_START
YY_USER_ACTION
YY_USER_INIT
#line directives
%{}'s around actions
multiple actions on a line
.Ed
.Pp
plus almost all of the
.Nm
flags.
The last feature in the list refers to the fact that with
.Nm
multiple actions can be placed on the same line,
separated with semi-colons, while with
.Nm lex ,
the following
.Pp
.Dl foo handle_foo(); ++num_foos_seen;
.Pp
is
.Pq rather surprisingly
truncated to
.Pp
.Dl foo handle_foo();
.Pp
.Nm
does not truncate the action.
Actions that are not enclosed in braces
are simply terminated at the end of the line.
.Sh FILES
.Bl -tag -width "<g++/FlexLexer.h>"
.It Pa flex.skl
Skeleton scanner.
This file is only used when building flex, not when
.Nm
executes.
.It Pa lex.backup
Backing-up information for the
.Fl b
flag (called
.Pa lex.bck
on some systems).
.It Pa lex.yy.c
Generated scanner
(called
.Pa lexyy.c
on some systems).
.It Pa lex.yy.cc
Generated C++ scanner class, when using
.Fl + .
.It In g++/FlexLexer.h
Header file defining the C++ scanner base class,
.Fa FlexLexer ,
and its derived class,
.Fa yyFlexLexer .
.It Pa /usr/lib/libl.*
.Nm
libraries.
The
.Pa /usr/lib/libfl.*\&
libraries are links to these.
Scanners must be linked using either
.Fl \&ll
or
.Fl lfl .
.El
.Sh EXIT STATUS
.Ex -std flex
.Sh DIAGNOSTICS
.Bl -diag
.It warning, rule cannot be matched
Indicates that the given rule cannot be matched because it follows other rules
that will always match the same text as it.
For example, in the following
.Dq foo
cannot be matched because it comes after an identifier
.Qq catch-all
rule:
.Bd -literal -offset indent
[a-z]+    got_identifier();
foo       got_foo();
.Ed
.Pp
Using
.Em REJECT
in a scanner suppresses this warning.
.It "warning, \-s option given but default rule can be matched"
Means that it is possible
.Pq perhaps only in a particular start condition
that the default rule
.Pq match any single character
is the only one that will match a particular input.
Since
.Fl s
was given, presumably this is not intended.
.It reject_used_but_not_detected undefined
.It yymore_used_but_not_detected undefined
These errors can occur at compile time.
They indicate that the scanner uses
.Em REJECT
or
.Fn yymore
but that
.Nm
failed to notice the fact, meaning that
.Nm
scanned the first two sections looking for occurrences of these actions
and failed to find any, but somehow they snuck in
.Pq via an #include file, for example .
Use
.Dq %option reject
or
.Dq %option yymore
to indicate to
.Nm
that these features are really needed.
.It flex scanner jammed
A scanner compiled with
.Fl s
has encountered an input string which wasn't matched by any of its rules.
This error can also occur due to internal problems.
.It token too large, exceeds YYLMAX
The scanner uses
.Dq %array
and one of its rules matched a string longer than the
.Dv YYLMAX
constant
.Pq 8K bytes by default .
The value can be increased by #define'ing
.Dv YYLMAX
in the definitions section of
.Nm
input.
.It "scanner requires \-8 flag to use the character 'x'"
The scanner specification includes recognizing the 8-bit character
.Sq x
but the
.Fl 8
flag was not specified, and the scanner defaulted to 7-bit because the
.Fl Cf
or
.Fl CF
table compression options were used.
See the discussion of the
.Fl 7
flag for details.
.It flex scanner push-back overflow
.Fn unput
was used to push back so much text that the scanner's buffer
could not hold both the pushed-back text and the current token in
.Fa yytext .
Ideally the scanner should dynamically resize the buffer in this case,
but at present it does not.
.It "input buffer overflow, can't enlarge buffer because scanner uses REJECT"
The scanner was working on matching an extremely large token and needed
to expand the input buffer.
This doesn't work with scanners that use
.Em REJECT .
.It "fatal flex scanner internal error--end of buffer missed"
This can occur in a scanner which is reentered after a long-jump
has jumped out
.Pq or over
the scanner's activation frame.
Before reentering the scanner, use:
.Pp
.Dl yyrestart(yyin);
.Pp
or, as noted above, switch to using the C++ scanner class.
.It "too many start conditions in <> construct!"
More start conditions than exist were listed in a <> construct
(so at least one of them must have been listed twice).
.El
.Sh SEE ALSO
.Xr awk 1 ,
.Xr sed 1 ,
.Xr yacc 1
.Rs
.%A John Levine
.%A Tony Mason
.%A Doug Brown
.%B Lex & Yacc
.%I O'Reilly and Associates
.%N 2nd edition
.Re
.Rs
.%A Alfred Aho
.%A Ravi Sethi
.%A Jeffrey Ullman
.%B Compilers: Principles, Techniques and Tools
.%I Addison-Wesley
.%D 1986
.%O "Describes the pattern-matching techniques used by flex (deterministic finite automata)"
.Re
.Sh STANDARDS
The
.Nm lex
utility is compliant with the
.St -p1003.1-2008
specification,
though its presence is optional.
.Pp
The flags
.Op Fl 78BbCdFfhIiLloPpSsTVw+? ,
.Op Fl -help ,
and
.Op Fl -version
are extensions to that specification.
.Pp
See also the
.Sx INCOMPATIBILITIES WITH LEX AND POSIX
section, above.
.Sh AUTHORS
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson.
Original version by Jef Poskanzer.
The fast table representation is a partial implementation of a design done by
Van Jacobson.
The implementation was done by Kevin Gong and Vern Paxson.
.Pp
Thanks to the many
.Nm
beta-testers, feedbackers, and contributors, especially Francois Pinard,
Casey Leedom,
Robert Abramovitz,
Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
Neal Becker, Nelson H.F.
Beebe,
.Mt benson@odi.com ,
Karl Berry, Peter A. Bigot, Simon Blanchard,
Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
Brian Clapper, J.T. Conklin,
Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
Daniels, Chris G. Demetriou, Theo de Raadt,
Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
Jan Hajic, Charles Hemphill, NORO Hideo,
Jarkko Hietaniemi, Scott Hofmann,
Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
Amir Katz,
.Mt ken@ken.hilco.com ,
Kevin B. Kenny,
Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
David Loffredo, Mike Long,
Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
Bengt Martensson, Chris Metcalf,
Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
Richard Ohnemus, Karsten Pahnke,
Sven Panne, Roland Pesch, Walter Pelissero, Gaumond Pierre,
Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
Frederic Raimbault, Pat Rankin, Rick Richardson,
Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
Andreas Scherer, Darrell Schiebel, Raf Schietekat,
Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strvmquist,
Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
Chris Thewalt, Richard M.
Timoney, Jodi Tsai,
Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams,
Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn,
and those whose names have slipped my marginal mail-archiving skills
but whose contributions are appreciated all the
same.
.Pp
Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
John Gilmore, Craig Leres, John Levine, Bob Mulcahy,
G.T. Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with
various distribution headaches.
.Pp
Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
to Benson Margulies and Fred Burke for C++ support;
to Kent Williams and Tom Epperly for C++ class support;
to Ove Ewerlid for support of NUL's;
and to Eric Hughes for support of multiple buffers.
.Pp
This work was primarily done when I was with the Real Time Systems Group
at the Lawrence Berkeley Laboratory in Berkeley, CA.
Many thanks to all there for the support I received.
.Pp
Send comments to
.Aq Mt vern@ee.lbl.gov .
.Sh BUGS
Some trailing context patterns cannot be properly matched and generate
warning messages
.Pq "dangerous trailing context" .
These are patterns where the ending of the first part of the rule
matches the beginning of the second part, such as
.Qq zx*/xy* ,
where the
.Sq x*
matches the
.Sq x
at the beginning of the trailing context.
(Note that the POSIX draft states that the text matched by such patterns
is undefined.)
.Pp
For some trailing context rules, parts which are actually fixed-length are
not recognized as such, leading to the above mentioned performance loss.
In particular, parts using
.Sq |\&
or
.Sq {n}
(such as
.Qq foo{3} )
are always considered variable-length.
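.Pp
The ambiguity behind the
.Qq dangerous trailing context
warning can be illustrated outside of
.Nm .
The following C++ sketch is not how
.Nm
matches input: the helper
.Fn split_points
is a hypothetical name, and std::regex merely stands in for the scanner.
It lists every position at which the input can be divided so that the
first part matches the rule's pattern and the remainder begins with a
match of its trailing context.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return every position at which `input` can be split so that the text
// before the split matches `head` in full and the text after the split
// starts with a match of `tail`.
std::vector<std::size_t> split_points(const std::string& input,
                                      const std::string& head,
                                      const std::string& tail)
{
    const std::regex head_re(head);
    const std::regex tail_re(tail);
    std::vector<std::size_t> points;

    for (std::size_t i = 1; i <= input.size(); ++i) {
        const std::string before = input.substr(0, i);
        const std::string after = input.substr(i);
        // match_continuous forces the tail to match at the split itself.
        if (std::regex_match(before, head_re) &&
            std::regex_search(after, tail_re,
                              std::regex_constants::match_continuous))
            points.push_back(i);
    }
    return points;
}
```

For the parts
.Qq zx*
and
.Qq xy*
and the input
.Qq zxxy ,
the function reports two split points, one after
.Sq z
and one after
.Sq zx ,
so the extent of the matched token is ambiguous.
For the parts
.Qq zx
and
.Qq y+
and the input
.Qq zxyy
there is only one.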
.Pp
Combining trailing context with the special
.Sq |\&
action can result in fixed trailing context being turned into
the more expensive variable trailing context.
This happens, for example, in the following:
.Bd -literal -offset indent
%%
abc      |
xyz/def
.Ed
.Pp
Use of
.Fn unput
invalidates yytext and yyleng, unless the
.Dq %array
directive
or the
.Fl l
option has been used.
.Pp
Pattern-matching of NUL's is substantially slower than matching other
characters.
.Pp
Dynamic resizing of the input buffer is slow, as it entails rescanning
all the text matched so far by the current
.Pq generally huge
token.
.Pp
Due to both buffering of input and read-ahead,
it is not possible to intermix calls to
.In stdio.h
routines, such as, for example,
.Fn getchar ,
with
.Nm
rules and expect it to work.
Call
.Fn input
instead.
.Pp
The total number of table entries listed by the
.Fl v
flag excludes the number of table entries needed to determine
what rule has been matched.
The number of entries is equal to the number of DFA states
if the scanner does not use
.Em REJECT ,
and somewhat greater than the number of states if it does.
.Pp
.Em REJECT
cannot be used with the
.Fl f
or
.Fl F
options.
.Pp
The
.Nm
internal algorithms need documentation.
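.Pp
The read-ahead problem with
.In stdio.h
routines noted earlier in this section can be seen without
.Nm
at all.
In the following C++ sketch,
.Fn char_seen_by_stdio
is a hypothetical helper that imitates a scanner by reading a block of
input into a buffer of its own; a direct stdio read then resumes after
the whole block rather than after the characters the scanner has
actually matched.

```cpp
#include <cassert>
#include <cstdio>

// Model a scanner that reads ahead `block` bytes from `fp` but has so far
// consumed only the first of them, then let the caller read one character
// straight from the stream.  Returns that character (or EOF).
int char_seen_by_stdio(std::FILE* fp, std::size_t block)
{
    char lookahead[64];
    if (block > sizeof lookahead)
        block = sizeof lookahead;
    // The scanner's buffer now owns these bytes; stdio has moved past them.
    if (std::fread(lookahead, 1, block, fp) == 0)
        return EOF;
    return std::fgetc(fp);
}
```

With a stream holding
.Qq abcdef
and a block of four bytes, the function returns
.Sq e ,
not
.Sq b ,
even though the imitated scanner may have matched only
.Sq a .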