.\"	$OpenBSD: flex.1,v 1.33 2013/01/18 21:48:43 jmc Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Vern Paxson.
.\"
.\" The United States Government has rights in this work pursuant
.\" to contract no. DE-AC03-76SF00098 between the United States
.\" Department of Energy and the University of California.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE.
.\"
.Dd $Mdocdate: January 18 2013 $
.Dt FLEX 1
.Os
.Sh NAME
.Nm flex
.Nd fast lexical analyzer generator
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl 78BbdFfhIiLlnpsTtVvw+?
.Op Fl C Ns Op Cm aeFfmr
.Op Fl Fl help
.Op Fl Fl version
.Op Fl o Ns Ar output
.Op Fl P Ns Ar prefix
.Op Fl S Ns Ar skeleton
.Op Ar
.Ek
.Sh DESCRIPTION
.Nm
is a tool for generating
.Em scanners :
programs which recognize lexical patterns in text.
.Nm
reads the given input files, or its standard input if no file names are given,
for a description of a scanner to generate.
The description is in the form of pairs of regular expressions and C code,
called
.Em rules .
.Nm
generates as output a C source file,
.Pa lex.yy.c ,
which defines a routine
.Fn yylex .
This file is compiled and linked with the
.Fl lfl
library to produce an executable.
When the executable is run, it analyzes its input for occurrences
of the regular expressions.
Whenever it finds one, it executes the corresponding C code.
.Pp
The manual includes both tutorial and reference sections:
.Bl -ohang
.It Sy Some Simple Examples
.It Sy Format of the Input File
.It Sy Patterns
The extended regular expressions used by
.Nm .
.It Sy How the Input is Matched
The rules for determining what has been matched.
.It Sy Actions
How to specify what to do when a pattern is matched.
.It Sy The Generated Scanner
Details regarding the scanner that
.Nm
produces;
how to control the input source.
.It Sy Start Conditions
Introducing context into scanners, and managing
.Qq mini-scanners .
.It Sy Multiple Input Buffers
How to manipulate multiple input sources;
how to scan from strings instead of files.
.It Sy End-of-File Rules
Special rules for matching the end of the input.
.It Sy Miscellaneous Macros
A summary of macros available to the actions.
.It Sy Values Available to the User
A summary of values available to the actions.
.It Sy Interfacing with Yacc
Connecting flex scanners together with
.Xr yacc 1
parsers.
.It Sy Options
.Nm
command-line options, and the
.Dq %option
directive.
.It Sy Performance Considerations
How to make scanners go as fast as possible.
.It Sy Generating C++ Scanners
The
.Pq experimental
facility for generating C++ scanner classes.
.It Sy Incompatibilities with Lex and POSIX
How
.Nm
differs from AT&T lex and the
.Tn POSIX
lex standard.
.It Sy Files
Files used by
.Nm .
.It Sy Diagnostics
Those error messages produced by
.Nm
.Pq or scanners it generates
whose meanings might not be apparent.
.It Sy See Also
Other documentation, related tools.
.It Sy Authors
Includes contact information.
.It Sy Bugs
Known problems with
.Nm .
.El
.Sh SOME SIMPLE EXAMPLES
First some simple examples to get the flavor of how one uses
.Nm .
The following
.Nm
input specifies a scanner which, whenever it encounters the string
.Qq username ,
replaces it with the user's login name:
.Bd -literal -offset indent
%%
username    printf("%s", getlogin());
.Ed
.Pp
By default, any text not matched by a
.Nm
scanner is copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence of
.Qq username
expanded.
In this input, there is just one rule.
.Qq username
is the
.Em pattern
and the
.Qq printf
is the
.Em action .
The
.Qq %%
marks the beginning of the rules.
.Pp
Here's another simple example:
.Bd -literal -offset indent
%{
int num_lines = 0, num_chars = 0;
%}

%%
\en      ++num_lines; ++num_chars;
\&.       ++num_chars;

%%
int
main(void)
{
	yylex();
	printf("# of lines = %d, # of chars = %d\en",
	    num_lines, num_chars);
	return 0;
}
.Ed
.Pp
This scanner counts the number of characters and the number
of lines in its input
(it produces no output other than the final report on the counts).
The first line declares two globals,
.Qq num_lines
and
.Qq num_chars ,
which are accessible both inside
.Fn yylex
and in the
.Fn main
routine declared after the second
.Qq %% .
There are two rules, one which matches a newline
.Pq \&"\en\&"
and increments both the line count and the character count,
and one which matches any character other than a newline
(indicated by the
.Qq \&.
regular expression).
.Pp
A somewhat more complicated example:
.Bd -literal -offset indent
/* scanner for a toy Pascal-like language */

%{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
%}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+ {
	printf("An integer: %s (%d)\en", yytext,
	    atoi(yytext));
}

{DIGIT}+"."{DIGIT}* {
	printf("A float: %s (%g)\en", yytext,
	    atof(yytext));
}

if|then|begin|end|procedure|function {
	printf("A keyword: %s\en", yytext);
}

{ID} printf("An identifier: %s\en", yytext);

"+"|"-"|"*"|"/" printf("An operator: %s\en", yytext);

"{"[^}\en]*"}" /* eat up one-line comments */

[ \et\en]+ /* eat up whitespace */

\&. printf("Unrecognized character: %s\en", yytext);

%%

int
main(int argc, char *argv[])
{
	++argv; --argc;  /* skip over program name */
	if (argc > 0)
		yyin = fopen(argv[0], "r");
	else
		yyin = stdin;

	yylex();
	return 0;
}
.Ed
.Pp
This is the beginning of a simple scanner for a language like Pascal.
It identifies different types of
.Em tokens
and reports on what it has seen.
.Pp
The details of this example will be explained in the following sections.
.Sh FORMAT OF THE INPUT FILE
The
.Nm
input file consists of three sections, separated by a line with just
.Qq %%
in it:
.Bd -unfilled -offset indent
definitions
%%
rules
%%
user code
.Ed
.Pp
The
.Em definitions
section contains declarations of simple
.Em name
definitions to simplify the scanner specification, and declarations of
.Em start conditions ,
which are explained in a later section.
.Pp
Name definitions have the form:
.Pp
.D1 name definition
.Pp
The
.Qq name
is a word beginning with a letter or an underscore
.Pq Sq _
followed by zero or more letters, digits,
.Sq _ ,
or
.Sq -
.Pq dash .
The definition is taken to begin at the first non-whitespace character
following the name and continuing to the end of the line.
The definition can subsequently be referred to using
.Qq {name} ,
which will expand to
.Qq (definition) .
For example:
.Bd -literal -offset indent
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
.Ed
.Pp
This defines
.Qq DIGIT
to be a regular expression which matches a single digit, and
.Qq ID
to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to
.Pp
.Dl {DIGIT}+"."{DIGIT}*
.Pp
is identical to
.Pp
.Dl ([0-9])+"."([0-9])*
.Pp
and matches one-or-more digits followed by a
.Sq .\&
followed by zero-or-more digits.
.Pp
The
.Em rules
section of the
.Nm
input contains a series of rules of the form:
.Pp
.D1 pattern action
.Pp
The pattern must be unindented and the action must begin
on the same line.
.Pp
See below for a further description of patterns and actions.
.Pp
Finally, the user code section is simply copied to
.Pa lex.yy.c
verbatim.
It is used for companion routines which call or are called by the scanner.
The presence of this section is optional;
if it is missing, the second
.Qq %%
in the input file may be skipped too.
.Pp
In the definitions and rules sections, any indented text or text enclosed in
.Sq %{
and
.Sq %}
is copied verbatim to the output
.Pq with the %{}'s removed .
The %{}'s must appear unindented on lines by themselves.
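.Pp
To tie the three sections together, here is a minimal sketch of a
complete input file.
The names
.Qq WORD
and
.Qq nwords
are chosen purely for illustration, and the sketch assumes the scanner
is linked with
.Fl lfl :
.Bd -literal -offset indent
%{
/* copied verbatim to lex.yy.c, with the %{ and %} lines removed */
#include <stdio.h>

static int nwords = 0;		/* illustrative counter */
%}

WORD     [A-Za-z]+

%%
{WORD}   ++nwords;
\&.|\en     /* discard everything else */
%%
/* user code: copied verbatim to the end of lex.yy.c */
int
main(void)
{
	yylex();
	printf("%d words\en", nwords);
	return 0;
}
.Ed
.Pp
Here the
.Sq %{
block supplies declarations used by the actions,
.Qq {WORD}
expands to
.Qq ([A-Za-z]+) ,
and everything after the second
.Qq %%
is the user code section.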
.Pp
In the rules section,
any indented or %{} text appearing before the first rule may be used to
declare variables which are local to the scanning routine and
.Pq after the declarations
code which is to be executed whenever the scanning routine is entered.
Other indented or %{} text in the rule section is still copied to the output,
but its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for
.Tn POSIX
compliance; see below for other such features).
.Pp
In the definitions section
.Pq but not in the rules section ,
an unindented comment
(i.e., a line beginning with
.Qq /* )
is also copied verbatim to the output up to the next
.Qq */ .
.Sh PATTERNS
The patterns in the input are written using an extended set of regular
expressions.
These are:
.Bl -tag -width "XXXXXXXX"
.It x
Match the character
.Sq x .
.It .\&
Any character
.Pq byte
except newline.
.It [xyz]
A
.Qq character class ;
in this case, the pattern matches either an
.Sq x ,
a
.Sq y ,
or a
.Sq z .
.It [abj-oZ]
A
.Qq character class
with a range in it; matches an
.Sq a ,
a
.Sq b ,
any letter from
.Sq j
through
.Sq o ,
or a
.Sq Z .
.It [^A-Z]
A
.Qq negated character class ,
i.e., any character but those in the class.
In this case, any character EXCEPT an uppercase letter.
.It [^A-Z\en]
Any character EXCEPT an uppercase letter or a newline.
.It r*
Zero or more r's, where
.Sq r
is any regular expression.
.It r+
One or more r's.
.It r?
Zero or one r's (that is,
.Qq an optional r ) .
.It r{2,5}
Anywhere from two to five r's.
.It r{2,}
Two or more r's.
.It r{4}
Exactly 4 r's.
.It {name}
The expansion of the
.Qq name
definition
.Pq see above .
.It \&"[xyz]\e\&"foo\&"
The literal string: [xyz]"foo.
.It \eX
If
.Sq X
is an
.Sq a ,
.Sq b ,
.Sq f ,
.Sq n ,
.Sq r ,
.Sq t ,
or
.Sq v ,
then the ANSI-C interpretation of
.Sq \eX .
Otherwise, a literal
.Sq X
(used to escape operators such as
.Sq * ) .
.It \e0
A NUL character
.Pq ASCII code 0 .
.It \e123
The character with octal value 123.
.It \ex2a
The character with hexadecimal value 2a.
.It (r)
Match an
.Sq r ;
parentheses are used to override precedence
.Pq see below .
.It rs
The regular expression
.Sq r
followed by the regular expression
.Sq s ;
called
.Qq concatenation .
.It r|s
Either an
.Sq r
or an
.Sq s .
.It r/s
An
.Sq r ,
but only if it is followed by an
.Sq s .
The text matched by
.Sq s
is included when determining whether this rule is the
.Qq longest match ,
but is then returned to the input before the action is executed.
So the action only sees the text matched by
.Sq r .
This type of pattern is called
.Qq trailing context .
(There are some combinations of r/s that
.Nm
cannot match correctly; see notes in the
.Sx BUGS
section below regarding
.Qq dangerous trailing context . )
.It ^r
An
.Sq r ,
but only at the beginning of a line
(i.e., just starting to scan, or right after a newline has been scanned).
.It r$
An
.Sq r ,
but only at the end of a line
.Pq i.e., just before a newline .
Equivalent to
.Qq r/\en .
.Pp
Note that
.Nm flex Ns 's
notion of
.Qq newline
is exactly whatever the C compiler used to compile
.Nm
interprets
.Sq \en
as.
.\" In particular, on some DOS systems you must either filter out \er's in the
.\" input yourself, or explicitly use r/\er\en for
.\" .Qq r$ .
.It <s>r
An
.Sq r ,
but only in start condition
.Sq s
.Pq see below for discussion of start conditions .
.It <s1,s2,s3>r
The same, but in any of start conditions s1, s2, or s3.
.It <*>r
An
.Sq r
in any start condition, even an exclusive one.
.It <<EOF>>
An end-of-file.
.It <s1,s2><<EOF>>
An end-of-file when in start condition s1 or s2.
.El
.Pp
Note that inside of a character class, all regular expression operators
lose their special meaning except escape
.Pq Sq \e
and the character class operators,
.Sq - ,
.Sq ]\& ,
and, at the beginning of the class,
.Sq ^ .
.Pp
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
For example,
.Pp
.D1 foo|bar*
.Pp
is the same as
.Pp
.D1 (foo)|(ba(r*))
.Pp
since the
.Sq *
operator has higher precedence than concatenation,
and concatenation higher than alternation
.Pq Sq |\& .
This pattern therefore matches
.Em either
the string
.Qq foo
.Em or
the string
.Qq ba
followed by zero-or-more r's.
To match
.Qq foo
or zero-or-more "bar"'s,
use:
.Pp
.D1 foo|(bar)*
.Pp
and to match zero-or-more "foo"'s-or-"bar"'s:
.Pp
.D1 (foo|bar)*
.Pp
In addition to characters and ranges of characters, character classes
can also contain character class
.Em expressions .
These are expressions enclosed inside
.Sq [:
and
.Sq :]
delimiters (which themselves must appear between the
.Sq \&[
and
.Sq ]\&
of the
character class; other elements may occur inside the character class, too).
The valid expressions are:
.Bd -unfilled -offset indent
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
.Ed
.Pp
These expressions all designate a set of characters equivalent to
the corresponding standard C
.Fn isXXX
function.
For example, [:alnum:] designates those characters for which
.Xr isalnum 3
returns true \- i.e., any alphabetic or numeric.
Some systems don't provide
.Xr isblank 3 ,
so
.Nm
defines [:blank:] as a blank or a tab.
.Pp
For example, the following character classes are all equivalent:
.Bd -unfilled -offset indent
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
.Ed
.Pp
If the scanner is case-insensitive (the
.Fl i
flag), then [:upper:] and [:lower:] are equivalent to [:alpha:].
.Pp
Some notes on patterns:
.Bl -dash
.It
A negated character class such as the example
.Qq [^A-Z]
above will match a newline unless "\en"
.Pq or an equivalent escape sequence
is one of the characters explicitly present in the negated character class
(e.g.,
.Qq [^A-Z\en] ) .
This is unlike how many other regular expression tools treat negated character
classes, but unfortunately the inconsistency is historically entrenched.
Matching newlines means that a pattern like
.Qq [^"]*
can match the entire input unless there's another quote in the input.
.It
A rule can have at most one instance of trailing context
(the
.Sq /
operator or the
.Sq $
operator).
The start condition,
.Sq ^ ,
and
.Qq <<EOF>>
patterns can only occur at the beginning of a pattern, and, as well as with
.Sq /
and
.Sq $ ,
cannot be grouped inside parentheses.
A
.Sq ^
which does not occur at the beginning of a rule or a
.Sq $
which does not occur at the end of a rule loses its special properties
and is treated as a normal character.
.It
The following are illegal:
.Bd -unfilled -offset indent
foo/bar$
<sc1>foo<sc2>bar
.Ed
.Pp
Note that the first of these can be written
.Qq foo/bar\en .
.It
The following will result in
.Sq $
or
.Sq ^
being treated as a normal character:
.Bd -unfilled -offset indent
foo|(bar$)
foo|^bar
.Ed
.Pp
If what's wanted is a
.Qq foo
or a bar-followed-by-a-newline, the following could be used
(the special
.Sq |\&
action is explained below):
.Bd -unfilled -offset indent
foo      |
bar$     /* action goes here */
.Ed
.Pp
A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
.El
.Sh HOW THE INPUT IS MATCHED
When the generated scanner is run,
it analyzes its input looking for strings which match any of its patterns.
If it finds more than one match,
it takes the one matching the most text
(for trailing context rules, this includes the length of the trailing part,
even though it will then be returned to the input).
If it finds two or more matches of the same length,
the rule listed first in the
.Nm
input file is chosen.
.Pp
Once the match is determined, the text corresponding to the match
(called the
.Em token )
is made available in the global character pointer
.Fa yytext ,
and its length in the global integer
.Fa yyleng .
The
.Em action
corresponding to the matched pattern is then executed
.Pq a more detailed description of actions follows ,
and then the remaining input is scanned for another match.
.Pp
If no match is found, then the default rule is executed:
the next character in the input is considered matched and
copied to the standard output.
Thus, the simplest legal
.Nm
input is:
.Pp
.D1 %%
.Pp
which generates a scanner that simply copies its input
.Pq one character at a time
to its output.
.Pp
Note that
.Fa yytext
can be defined in two different ways:
either as a character pointer or as a character array.
Which definition
.Nm
uses can be controlled by including one of the special directives
.Dq %pointer
or
.Dq %array
in the first
.Pq definitions
section of flex input.
The default is
.Dq %pointer ,
unless the
.Fl l
lex compatibility option is used, in which case
.Fa yytext
will be an array.
The advantage of using
.Dq %pointer
is substantially faster scanning and no buffer overflow when matching
very large tokens
.Pq unless not enough dynamic memory is available .
The disadvantage is that actions are restricted in how they can modify
.Fa yytext
.Pq see the next section ,
and calls to the
.Fn unput
function destroy the present contents of
.Fa yytext ,
which can be a considerable porting headache when moving between different
.Nm lex
versions.
.Pp
The advantage of
.Dq %array
is that
.Fa yytext
can be modified as much as wanted, and calls to
.Fn unput
do not destroy
.Fa yytext
.Pq see below .
Furthermore, existing
.Nm lex
programs sometimes access
.Fa yytext
externally using declarations of the form:
.Pp
.D1 extern char yytext[];
.Pp
This definition is erroneous when used with
.Dq %pointer ,
but correct for
.Dq %array .
.Pp
.Dq %array
defines
.Fa yytext
to be an array of
.Dv YYLMAX
characters, which defaults to a fairly large value.
The size can be changed by simply #define'ing
.Dv YYLMAX
to a different value in the first section of
.Nm
input.
As mentioned above, with
.Dq %pointer
yytext grows dynamically to accommodate large tokens.
While this means a
.Dq %pointer
scanner can accommodate very large tokens
.Pq such as matching entire blocks of comments ,
bear in mind that each time the scanner must resize
.Fa yytext
it also must rescan the entire token from the beginning, so matching such
tokens can prove slow.
.Fa yytext
presently does not dynamically grow if a call to
.Fn unput
results in too much text being pushed back; instead, a run-time error results.
.Pp
Also note that
.Dq %array
cannot be used with C++ scanner classes
.Pq the c++ option; see below .
.Sh ACTIONS
Each pattern in a rule has a corresponding action,
which can be any arbitrary C statement.
The pattern ends at the first non-escaped whitespace character;
the remainder of the line is its action.
If the action is empty,
then when the pattern is matched the input token is simply discarded.
For example, here is the specification for a program
which deletes all occurrences of
.Qq zap me
from its input:
.Bd -literal -offset indent
%%
"zap me"
.Ed
.Pp
(It will copy all other characters in the input to the output since
they will be matched by the default rule.)
.Pp
Here is a program which compresses multiple blanks and tabs down to
a single blank, and throws away whitespace found at the end of a line:
.Bd -literal -offset indent
%%
[ \et]+        putchar(' ');
[ \et]+$       /* ignore this token */
.Ed
.Pp
If the action contains a
.Sq { ,
then the action spans till the balancing
.Sq }
is found, and the action may cross multiple lines.
.Nm
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with
.Sq %{
and will consider the action to be all the text up to the next
.Sq %}
.Pq regardless of ordinary braces inside the action .
.Pp
An action consisting solely of a vertical bar
.Pq Sq |\&
means
.Qq same as the action for the next rule .
See below for an illustration.
.Pp
Actions can include arbitrary C code,
including return statements to return a value to whatever routine called
.Fn yylex .
Each time
.Fn yylex
is called, it continues processing tokens from where it last left off
until it either reaches the end of the file or executes a return.
.Pp
Actions are free to modify
.Fa yytext
except for lengthening it
(adding characters to its end \- these will overwrite later characters in the
input stream).
This, however, does not apply when using
.Dq %array
.Pq see above ;
in that case,
.Fa yytext
may be freely modified in any way.
.Pp
Actions are free to modify
.Fa yyleng
except they should not do so if the action also includes use of
.Fn yymore
.Pq see below .
.Pp
There are a number of special directives which can be included within
an action:
.Bl -tag -width Ds
.It ECHO
Copies
.Fa yytext
to the scanner's output.
.It BEGIN
Followed by the name of a start condition, places the scanner in the
corresponding start condition
.Pq see below .
.It REJECT
Directs the scanner to proceed on to the
.Qq second best
rule which matched the input
.Pq or a prefix of the input .
The rule is chosen as described above in
.Sx HOW THE INPUT IS MATCHED ,
and
.Fa yytext
and
.Fa yyleng
set up appropriately.
It may either be one which matched as much text
as the originally chosen rule but came later in the
.Nm
input file, or one which matched less text.
For example, the following will both count the
words in the input and call the routine
.Fn special
whenever
.Qq frob
is seen:
.Bd -literal -offset indent
	int word_count = 0;
%%

frob          special(); REJECT;
[^ \et\en]+     ++word_count;
.Ed
.Pp
Without the
.Em REJECT ,
any "frob"'s in the input would not be counted as words,
since the scanner normally executes only one action per token.
Multiple
.Em REJECT Ns 's
are allowed,
each one finding the next best choice to the currently active rule.
For example, when the following scanner scans the token
.Qq abcd ,
it will write
.Qq abcdabcaba
to the output:
.Bd -literal -offset indent
%%
a        |
ab       |
abc      |
abcd     ECHO; REJECT;
\&.|\en     /* eat up any unmatched character */
.Ed
.Pp
(The first three rules share the fourth's action since they use
the special
.Sq |\&
action.)
.Em REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in any of the scanner's actions it will slow down
all of the scanner's matching.
Furthermore,
.Em REJECT
cannot be used with the
.Fl Cf
or
.Fl CF
options
.Pq see below .
.Pp
Note also that unlike the other special actions,
.Em REJECT
is a
.Em branch ;
code immediately following it in the action will not be executed.
.It yymore()
Tells the scanner that the next time it matches a rule, the corresponding
token should be appended onto the current value of
.Fa yytext
rather than replacing it.
For example, given the input
.Qq mega-kludge
the following will write
.Qq mega-mega-kludge
to the output:
.Bd -literal -offset indent
%%
mega-    ECHO; yymore();
kludge   ECHO;
.Ed
.Pp
First
.Qq mega-
is matched and echoed to the output.
Then
.Qq kludge
is matched, but the previous
.Qq mega-
is still hanging around at the beginning of
.Fa yytext
so the
.Em ECHO
for the
.Qq kludge
rule will actually write
.Qq mega-kludge .
.Pp
Two notes regarding use of
.Fn yymore :
First,
.Fn yymore
depends on the value of
.Fa yyleng
correctly reflecting the size of the current token, so
.Fa yyleng
must not be modified when using
.Fn yymore .
Second, the presence of
.Fn yymore
in the scanner's action entails a minor performance penalty in the
scanner's matching speed.
.It yyless(n)
Returns all but the first
.Ar n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.Fa yytext
and
.Fa yyleng
are adjusted appropriately (e.g.,
.Fa yyleng
will now be equal to
.Ar n ) .
For example, on the input
.Qq foobar
the following will write out
.Qq foobarbar :
.Bd -literal -offset indent
%%
foobar    ECHO; yyless(3);
[a-z]+    ECHO;
.Ed
.Pp
An argument of 0 to
.Fa yyless
will cause the entire current input string to be scanned again.
Unless how the scanner will subsequently process its input has been changed
(using
.Em BEGIN ,
for example),
this will result in an endless loop.
.Pp
Note that
.Fa yyless
is a macro and can only be used in the
.Nm
input file, not from other source files.
.It unput(c)
Puts the character
.Ar c
back into the input stream.
It will be the next character scanned.
The following action will take the current token and cause it
to be rescanned enclosed in parentheses.
.Bd -literal -offset indent
{
	int i;
	char *yycopy;

	/* Copy yytext because unput() trashes yytext */
	if ((yycopy = strdup(yytext)) == NULL)
		err(1, NULL);
	unput(')');
	for (i = yyleng - 1; i >= 0; --i)
		unput(yycopy[i]);
	unput('(');
	free(yycopy);
}
.Ed
.Pp
Note that since each
.Fn unput
puts the given character back at the beginning of the input stream,
pushing back strings must be done back-to-front.
.Pp
An important potential problem when using
.Fn unput
is that if using
.Dq %pointer
.Pq the default ,
a call to
.Fn unput
destroys the contents of
.Fa yytext ,
starting with its rightmost character and devouring one character to
the left with each call.
If the value of
.Fa yytext
should be preserved after a call to
.Fn unput
.Pq as in the above example ,
it must either first be copied elsewhere, or the scanner must be built using
.Dq %array
instead (see
.Sx HOW THE INPUT IS MATCHED ) .
.Pp
Finally, note that EOF cannot be put back
to attempt to mark the input stream with an end-of-file.
.It input()
Reads the next character from the input stream.
For example, the following is one way to eat up C comments:
.Bd -literal -offset indent
%%
"/*" {
	int c;

	for (;;) {
		while ((c = input()) != '*' && c != EOF)
			; /* eat up text of comment */

		if (c == '*') {
			while ((c = input()) == '*')
				;
			if (c == '/')
				break; /* found the end */
		}

		if (c == EOF) {
			errx(1, "EOF in comment");
			break;
		}
	}
}
.Ed
.Pp
(Note that if the scanner is compiled using C++, then
.Fn input
is instead referred to as
.Fn yyinput ,
in order to avoid a name clash with the C++ stream by the name of input.)
.It YY_FLUSH_BUFFER
Flushes the scanner's internal buffer
so that the next time the scanner attempts to match a token,
it will first refill the buffer using
.Dv YY_INPUT
(see
.Sx THE GENERATED SCANNER ,
below).
This action is a special case of the more general
.Fn yy_flush_buffer
function, described below in the section
.Sx MULTIPLE INPUT BUFFERS .
.It yyterminate()
Can be used in lieu of a return statement in an action.
It terminates the scanner and returns a 0 to the scanner's caller, indicating
.Qq all done .
By default,
.Fn yyterminate
is also called when an end-of-file is encountered.
It is a macro and may be redefined.
.El
.Sh THE GENERATED SCANNER
The output of
.Nm
is the file
.Pa lex.yy.c ,
which contains the scanning routine
.Fn yylex ,
a number of tables used by it for matching tokens,
and a number of auxiliary routines and macros.
By default,
.Fn yylex
is declared as follows:
.Bd -unfilled -offset indent
int yylex()
{
    ... various definitions and the actions in here ...
}
.Ed
.Pp
(If the environment supports function prototypes, then it will
be "int yylex(void)".)
This definition may be changed by defining the
.Dv YY_DECL
macro.
For example:
.Bd -literal -offset indent
#define YY_DECL float lexscan(a, b) float a, b;
.Ed
.Pp
would give the scanning routine the name
.Em lexscan ,
returning a float, and taking two floats as arguments.
Note that if arguments are given to the scanning routine using a
K&R-style/non-prototyped function declaration,
the definition must be terminated with a semi-colon
.Pq Sq ;\& .
.Pp
Whenever
.Fn yylex
is called, it scans tokens from the global input file
.Pa yyin
.Pq which defaults to stdin .
It continues until it either reaches an end-of-file
.Pq at which point it returns the value 0
or one of its actions executes a
.Em return
statement.
.Pp
If the scanner reaches an end-of-file, subsequent calls are undefined
unless either
.Em yyin
is pointed at a new input file
.Pq in which case scanning continues from that file ,
or
.Fn yyrestart
is called.
.Fn yyrestart
takes one argument, a
.Fa FILE *
pointer (which can be nil, if
.Dv YY_INPUT
has been set up to scan from a source other than
.Em yyin ) ,
and initializes
.Em yyin
for scanning from that file.
Essentially there is no difference between just assigning
.Em yyin
to a new input file and using
.Fn yyrestart
to do so; the latter is available for compatibility with previous versions of
.Nm ,
and because it can be used to switch input files in the middle of scanning.
It can also be used to throw away the current input buffer,
by calling it with an argument of
.Em yyin ;
but it is better to use
.Dv YY_FLUSH_BUFFER
.Pq see above .
Note that
.Fn yyrestart
does not reset the start condition to
.Em INITIAL
(see
.Sx START CONDITIONS ,
below).
.Pp
If
.Fn yylex
stops scanning due to executing a
.Em return
statement in one of the actions, the scanner may then be called again and it
will resume scanning where it left off.
.Pp
By default
.Pq and for purposes of efficiency ,
the scanner uses block-reads rather than simple
.Xr getc 3
calls to read characters from
.Em yyin .
How it gets its input can be controlled by defining the
.Dv YY_INPUT
macro.
.Dv YY_INPUT Ns 's
calling sequence is
.Qq YY_INPUT(buf,result,max_size) .
Its action is to place up to
.Dv max_size
characters in the character array
.Em buf
and return in the integer variable
.Em result
either the number of characters read or the constant
.Dv YY_NULL
(0 on
.Ux
systems)
to indicate
.Dv EOF .
The default
.Dv YY_INPUT
reads from the global file-pointer
.Qq yyin .
.Pp
A sample definition of
.Dv YY_INPUT
.Pq in the definitions section of the input file :
.Bd -unfilled -offset indent
%{
#define YY_INPUT(buf,result,max_size) \e
{ \e
    int c = getchar(); \e
    result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \e
}
%}
.Ed
.Pp
This definition will change the input processing to occur
one character at a time.
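.Pp
As a further illustration of the same calling sequence,
the following sketch makes
.Dv YY_INPUT
copy up to
.Dv max_size
characters per call from a block of memory instead of reading
.Em yyin .
The variables
.Va src
and
.Va src_left
are hypothetical names introduced only for this example;
they are not part of
.Nm ,
and the caller is assumed to set them before the first call to
.Fn yylex :
.Bd -literal -offset indent
%{
#include <string.h>

static const char *src;    /* hypothetical: text still to be scanned */
static size_t src_left;    /* hypothetical: bytes remaining at src */

#define YY_INPUT(buf,result,max_size) \e
{ \e
    size_t ncopy = (size_t)(max_size); \e
    if (ncopy > src_left) \e
        ncopy = src_left; \e
    if (ncopy > 0) { \e
        memcpy((buf), src, ncopy); \e
        src += ncopy; \e
        src_left -= ncopy; \e
    } \e
    (result) = (ncopy == 0) ? YY_NULL : (int)ncopy; \e
}
%}
.Ed
.Pp
Once no bytes remain,
.Em result
is set to
.Dv YY_NULL ,
which the scanner treats as an end-of-file indication.
For most purposes the
.Fn yy_scan_string
family described in
.Sx MULTIPLE INPUT BUFFERS
is likely to be the simpler way to scan in-memory text;
this sketch is only meant to show the macro's contract.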
1318.Pp 1319When the scanner receives an end-of-file indication from 1320.Dv YY_INPUT , 1321it then checks the 1322.Fn yywrap 1323function. 1324If 1325.Fn yywrap 1326returns false 1327.Pq zero , 1328then it is assumed that the function has gone ahead and set up 1329.Em yyin 1330to point to another input file, and scanning continues. 1331If it returns true 1332.Pq non-zero , 1333then the scanner terminates, returning 0 to its caller. 1334Note that in either case, the start condition remains unchanged; 1335it does not revert to 1336.Em INITIAL . 1337.Pp 1338If you do not supply your own version of 1339.Fn yywrap , 1340then you must either use 1341.Dq %option noyywrap 1342(in which case the scanner behaves as though 1343.Fn yywrap 1344returned 1), or you must link with 1345.Fl lfl 1346to obtain the default version of the routine, which always returns 1. 1347.Pp 1348Three routines are available for scanning from in-memory buffers rather 1349than files: 1350.Fn yy_scan_string , 1351.Fn yy_scan_bytes , 1352and 1353.Fn yy_scan_buffer . 1354See the discussion of them below in the section 1355.Sx MULTIPLE INPUT BUFFERS . 1356.Pp 1357The scanner writes its 1358.Em ECHO 1359output to the 1360.Em yyout 1361global 1362.Pq default, stdout , 1363which may be redefined by the user simply by assigning it to some other 1364.Va FILE 1365pointer. 1366.Sh START CONDITIONS 1367.Nm 1368provides a mechanism for conditionally activating rules. 1369Any rule whose pattern is prefixed with 1370.Qq Aq sc 1371will only be active when the scanner is in the start condition named 1372.Qq sc . 1373For example, 1374.Bd -literal -offset indent 1375<STRING>[^"]* { /* eat up the string body ... */ 1376 ... 1377} 1378.Ed 1379.Pp 1380will be active only when the scanner is in the 1381.Qq STRING 1382start condition, and 1383.Bd -literal -offset indent 1384<INITIAL,STRING,QUOTE>\e. { /* handle an escape ... */ 1385 ... 
1386} 1387.Ed 1388.Pp 1389will be active only when the current start condition is either 1390.Qq INITIAL , 1391.Qq STRING , 1392or 1393.Qq QUOTE . 1394.Pp 1395Start conditions are declared in the definitions 1396.Pq first 1397section of the input using unindented lines beginning with either 1398.Sq %s 1399or 1400.Sq %x 1401followed by a list of names. 1402The former declares 1403.Em inclusive 1404start conditions, the latter 1405.Em exclusive 1406start conditions. 1407A start condition is activated using the 1408.Em BEGIN 1409action. 1410Until the next 1411.Em BEGIN 1412action is executed, rules with the given start condition will be active and 1413rules with other start conditions will be inactive. 1414If the start condition is inclusive, 1415then rules with no start conditions at all will also be active. 1416If it is exclusive, 1417then only rules qualified with the start condition will be active. 1418A set of rules contingent on the same exclusive start condition 1419describe a scanner which is independent of any of the other rules in the 1420.Nm 1421input. 1422Because of this, exclusive start conditions make it easy to specify 1423.Qq mini-scanners 1424which scan portions of the input that are syntactically different 1425from the rest 1426.Pq e.g., comments . 1427.Pp 1428If the distinction between inclusive and exclusive start conditions 1429is still a little vague, here's a simple example illustrating the 1430connection between the two. 
1431The set of rules: 1432.Bd -literal -offset indent 1433%s example 1434%% 1435 1436<example>foo do_something(); 1437 1438bar something_else(); 1439.Ed 1440.Pp 1441is equivalent to 1442.Bd -literal -offset indent 1443%x example 1444%% 1445 1446<example>foo do_something(); 1447 1448<INITIAL,example>bar something_else(); 1449.Ed 1450.Pp 1451Without the 1452.Aq INITIAL,example 1453qualifier, the 1454.Dq bar 1455pattern in the second example wouldn't be active 1456.Pq i.e., couldn't match 1457when in start condition 1458.Dq example . 1459If we just used 1460.Aq example 1461to qualify 1462.Dq bar , 1463though, then it would only be active in 1464.Dq example 1465and not in 1466.Em INITIAL , 1467while in the first example it's active in both, 1468because in the first example the 1469.Dq example 1470start condition is an inclusive 1471.Pq Sq %s 1472start condition. 1473.Pp 1474Also note that the special start-condition specifier 1475.Sq Aq * 1476matches every start condition. 1477Thus, the above example could also have been written: 1478.Bd -literal -offset indent 1479%x example 1480%% 1481 1482<example>foo do_something(); 1483 1484<*>bar something_else(); 1485.Ed 1486.Pp 1487The default rule (to 1488.Em ECHO 1489any unmatched character) remains active in start conditions. 1490It is equivalent to: 1491.Bd -literal -offset indent 1492<*>.|\en ECHO; 1493.Ed 1494.Pp 1495.Dq BEGIN(0) 1496returns to the original state where only the rules with 1497no start conditions are active. 1498This state can also be referred to as the start-condition 1499.Em INITIAL , 1500so 1501.Dq BEGIN(INITIAL) 1502is equivalent to 1503.Dq BEGIN(0) . 1504(The parentheses around the start condition name are not required but 1505are considered good style.) 1506.Pp 1507.Em BEGIN 1508actions can also be given as indented code at the beginning 1509of the rules section. 
For example, the following will cause the scanner to enter the
.Qq SPECIAL
start condition whenever
.Fn yylex
is called and the global variable
.Fa enter_special
is true:
.Bd -literal -offset indent
        int enter_special;

%x SPECIAL
%%
        if (enter_special)
                BEGIN(SPECIAL);

<SPECIAL>blahblahblah
\&...more rules follow...
.Ed
.Pp
To illustrate the uses of start conditions,
here is a scanner which provides two different interpretations
of a string like
.Qq 123.456 .
By default it will treat it as three tokens: the integer
.Qq 123 ,
a dot
.Pq Sq .\& ,
and the integer
.Qq 456 .
But if the string is preceded earlier in the line by the string
.Qq expect-floats
it will treat it as a single token, the floating-point number 123.456:
.Bd -literal -offset indent
%{
#include <math.h>
#include <stdlib.h>
%}
%s expect

%%
expect-floats        BEGIN(expect);

<expect>[0-9]+"."[0-9]+ {
        printf("found a float, = %f\en",
            atof(yytext));
}
<expect>\en {
        /*
         * That's the end of the line, so
         * we need another "expect-floats"
         * before we'll recognize any more
         * floats.
         */
        BEGIN(INITIAL);
}

[0-9]+ {
        printf("found an integer, = %d\en",
            atoi(yytext));
}

"."     printf("found a dot\en");
.Ed
.Pp
Here is a scanner which recognizes
.Pq and discards
C comments while maintaining a count of the current input line:
.Bd -literal -offset indent
%x comment
%%
        int line_num = 1;

"/*"                    BEGIN(comment);

<comment>[^*\en]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\en]*   /* eat up '*'s not followed by '/'s */
<comment>\en             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
.Ed
.Pp
This scanner goes to a bit of trouble to match as much
text as possible with each rule.
In general, when writing a high-speed scanner,
try to match as much as possible in each rule; it is a big win.
.Pp
Note that start-condition names are really integer values and
can be stored as such.
Thus, the above could be extended in the following fashion:
.Bd -literal -offset indent
%x comment foo
%%
        int line_num = 1;
        int comment_caller;

"/*" {
        comment_caller = INITIAL;
        BEGIN(comment);
}

\&...

<foo>"/*" {
        comment_caller = foo;
        BEGIN(comment);
}

<comment>[^*\en]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\en]*   /* eat up '*'s not followed by '/'s */
<comment>\en             ++line_num;
<comment>"*"+"/"        BEGIN(comment_caller);
.Ed
.Pp
Furthermore, the current start condition can be accessed by using
the integer-valued
.Dv YY_START
macro.
For example, the above assignments to
.Em comment_caller
could instead be written
.Pp
.Dl comment_caller = YY_START;
.Pp
Flex provides
.Dv YYSTATE
as an alias for
.Dv YY_START
(since that is what's used by AT&T
.Nm lex ) .
.Pp
Note that start conditions do not have their own name-space;
%s's and %x's declare names in the same fashion as #define's.
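.Pp
Because of this shared name-space, a start-condition name can collide
with a C identifier used in the actions.
The following sketch shows one way to sidestep the problem.
The
.Dq SC_
prefix is merely a convention assumed for this example, not something
.Nm
requires:
.Bd -literal -offset indent
%x SC_COMMENT
%%
        int comment = 0;    /* comments seen so far; no clash */

"/*"                    { ++comment; BEGIN(SC_COMMENT); }
<SC_COMMENT>"*/"        BEGIN(INITIAL);
<SC_COMMENT>.|\en        /* discard the comment text */
.Ed
.Pp
Had the start condition instead been named
.Dq comment ,
the preprocessor would be expected to replace the variable name
in the declaration and in
.Dq ++comment
with the start condition's integer value,
and the generated scanner would fail to compile.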
1640.Pp 1641Finally, here's an example of how to match C-style quoted strings using 1642exclusive start conditions, including expanded escape sequences 1643(but not including checking for a string that's too long): 1644.Bd -literal -offset indent 1645%x str 1646 1647%% 1648#define MAX_STR_CONST 1024 1649char string_buf[MAX_STR_CONST]; 1650char *string_buf_ptr; 1651 1652\e" string_buf_ptr = string_buf; BEGIN(str); 1653 1654<str>\e" { /* saw closing quote - all done */ 1655 BEGIN(INITIAL); 1656 *string_buf_ptr = '\e0'; 1657 /* 1658 * return string constant token type and 1659 * value to parser 1660 */ 1661} 1662 1663<str>\en { 1664 /* error - unterminated string constant */ 1665 /* generate error message */ 1666} 1667 1668<str>\e\e[0-7]{1,3} { 1669 /* octal escape sequence */ 1670 int result; 1671 1672 (void) sscanf(yytext + 1, "%o", &result); 1673 1674 if (result > 0xff) { 1675 /* error, constant is out-of-bounds */ 1676 } else 1677 *string_buf_ptr++ = result; 1678} 1679 1680<str>\e\e[0-9]+ { 1681 /* 1682 * generate error - bad escape sequence; something 1683 * like '\e48' or '\e0777777' 1684 */ 1685} 1686 1687<str>\e\en *string_buf_ptr++ = '\en'; 1688<str>\e\et *string_buf_ptr++ = '\et'; 1689<str>\e\er *string_buf_ptr++ = '\er'; 1690<str>\e\eb *string_buf_ptr++ = '\eb'; 1691<str>\e\ef *string_buf_ptr++ = '\ef'; 1692 1693<str>\e\e(.|\en) *string_buf_ptr++ = yytext[1]; 1694 1695<str>[^\e\e\en\e"]+ { 1696 char *yptr = yytext; 1697 1698 while (*yptr) 1699 *string_buf_ptr++ = *yptr++; 1700} 1701.Ed 1702.Pp 1703Often, such as in some of the examples above, 1704a whole bunch of rules are all preceded by the same start condition(s). 1705.Nm 1706makes this a little easier and cleaner by introducing a notion of 1707start condition 1708.Em scope . 1709A start condition scope is begun with: 1710.Pp 1711.Dl <SCs>{ 1712.Pp 1713where 1714.Dq SCs 1715is a list of one or more start conditions. 
1716Inside the start condition scope, every rule automatically has the prefix 1717.Aq SCs 1718applied to it, until a 1719.Sq } 1720which matches the initial 1721.Sq { . 1722So, for example, 1723.Bd -literal -offset indent 1724<ESC>{ 1725 "\e\en" return '\en'; 1726 "\e\er" return '\er'; 1727 "\e\ef" return '\ef'; 1728 "\e\e0" return '\e0'; 1729} 1730.Ed 1731.Pp 1732is equivalent to: 1733.Bd -literal -offset indent 1734<ESC>"\e\en" return '\en'; 1735<ESC>"\e\er" return '\er'; 1736<ESC>"\e\ef" return '\ef'; 1737<ESC>"\e\e0" return '\e0'; 1738.Ed 1739.Pp 1740Start condition scopes may be nested. 1741.Pp 1742Three routines are available for manipulating stacks of start conditions: 1743.Bl -tag -width Ds 1744.It void yy_push_state(int new_state) 1745Pushes the current start condition onto the top of the start condition 1746stack and switches to 1747.Fa new_state 1748as though 1749.Dq BEGIN new_state 1750had been used 1751.Pq recall that start condition names are also integers . 1752.It void yy_pop_state() 1753Pops the top of the stack and switches to it via 1754.Em BEGIN . 1755.It int yy_top_state() 1756Returns the top of the stack without altering the stack's contents. 1757.El 1758.Pp 1759The start condition stack grows dynamically and so has no built-in 1760size limitation. 1761If memory is exhausted, program execution aborts. 1762.Pp 1763To use start condition stacks, scanners must include a 1764.Dq %option stack 1765directive (see 1766.Sx OPTIONS 1767below). 1768.Sh MULTIPLE INPUT BUFFERS 1769Some scanners 1770(such as those which support 1771.Qq include 1772files) 1773require reading from several input streams. 1774As 1775.Nm 1776scanners do a large amount of buffering, one cannot control 1777where the next input will be read from by simply writing a 1778.Dv YY_INPUT 1779which is sensitive to the scanning context. 
1780.Dv YY_INPUT 1781is only called when the scanner reaches the end of its buffer, which 1782may be a long time after scanning a statement such as an 1783.Qq include 1784which requires switching the input source. 1785.Pp 1786To negotiate these sorts of problems, 1787.Nm 1788provides a mechanism for creating and switching between multiple 1789input buffers. 1790An input buffer is created by using: 1791.Pp 1792.D1 YY_BUFFER_STATE yy_create_buffer(FILE *file, int size) 1793.Pp 1794which takes a 1795.Fa FILE 1796pointer and a 1797.Fa size 1798and creates a buffer associated with the given file and large enough to hold 1799.Fa size 1800characters (when in doubt, use 1801.Dv YY_BUF_SIZE 1802for the size). 1803It returns a 1804.Dv YY_BUFFER_STATE 1805handle, which may then be passed to other routines 1806.Pq see below . 1807The 1808.Dv YY_BUFFER_STATE 1809type is a pointer to an opaque 1810.Dq struct yy_buffer_state 1811structure, so 1812.Dv YY_BUFFER_STATE 1813variables may be safely initialized to 1814.Dq ((YY_BUFFER_STATE) 0) 1815if desired, and the opaque structure can also be referred to in order to 1816correctly declare input buffers in source files other than that of scanners. 1817Note that the 1818.Fa FILE 1819pointer in the call to 1820.Fn yy_create_buffer 1821is only used as the value of 1822.Fa yyin 1823seen by 1824.Dv YY_INPUT ; 1825if 1826.Dv YY_INPUT 1827is redefined so that it no longer uses 1828.Fa yyin , 1829then a nil 1830.Fa FILE 1831pointer can safely be passed to 1832.Fn yy_create_buffer . 1833To select a particular buffer to scan: 1834.Pp 1835.D1 void yy_switch_to_buffer(YY_BUFFER_STATE new_buffer) 1836.Pp 1837It switches the scanner's input buffer so subsequent tokens will 1838come from 1839.Fa new_buffer . 1840Note that 1841.Fn yy_switch_to_buffer 1842may be used by 1843.Fn yywrap 1844to set things up for continued scanning, 1845instead of opening a new file and pointing 1846.Fa yyin 1847at it. 
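.Pp
As a minimal sketch of that use, the following
.Fn yywrap
opens the next file from a list and switches to a fresh buffer for it.
The variable
.Va next_file
is a hypothetical name introduced for this example
(a NULL-terminated array of file names, assumed to be set up by the caller),
and error handling is reduced to treating an unreadable file as
the end of input.
The old buffer is released with
.Fn yy_delete_buffer ,
which is described below:
.Bd -literal -offset indent
%{
#include <stdio.h>

/* hypothetical: NULL-terminated list of files still to scan */
static char **next_file;
%}
%%
\&...rules...
%%
int
yywrap(void)
{
        YY_BUFFER_STATE old = YY_CURRENT_BUFFER;
        FILE *fp;

        if (next_file == NULL || *next_file == NULL)
                return 1;       /* nothing left: scanner terminates */
        if ((fp = fopen(*next_file++, "r")) == NULL)
                return 1;       /* unreadable file ends the input */

        yy_switch_to_buffer(yy_create_buffer(fp, YY_BUF_SIZE));
        yy_delete_buffer(old);  /* reclaim the exhausted buffer */
        return 0;               /* continue with the new buffer */
}
.Ed
.Pp
The sketch does not close the previous input file;
a real scanner would probably want to keep track of it and call
.Xr fclose 3 .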
1848Note also that switching input sources via either 1849.Fn yy_switch_to_buffer 1850or 1851.Fn yywrap 1852does not change the start condition. 1853.Pp 1854.D1 void yy_delete_buffer(YY_BUFFER_STATE buffer) 1855.Pp 1856is used to reclaim the storage associated with a buffer. 1857.Pf ( Fa buffer 1858can be nil, in which case the routine does nothing.) 1859To clear the current contents of a buffer: 1860.Pp 1861.D1 void yy_flush_buffer(YY_BUFFER_STATE buffer) 1862.Pp 1863This function discards the buffer's contents, 1864so the next time the scanner attempts to match a token from the buffer, 1865it will first fill the buffer anew using 1866.Dv YY_INPUT . 1867.Pp 1868.Fn yy_new_buffer 1869is an alias for 1870.Fn yy_create_buffer , 1871provided for compatibility with the C++ use of 1872.Em new 1873and 1874.Em delete 1875for creating and destroying dynamic objects. 1876.Pp 1877Finally, the 1878.Dv YY_CURRENT_BUFFER 1879macro returns a 1880.Dv YY_BUFFER_STATE 1881handle to the current buffer. 1882.Pp 1883Here is an example of using these features for writing a scanner 1884which expands include files (the 1885.Aq Aq EOF 1886feature is discussed below): 1887.Bd -literal -offset indent 1888/* 1889 * the "incl" state is used for picking up the name 1890 * of an include file 1891 */ 1892%x incl 1893 1894%{ 1895#define MAX_INCLUDE_DEPTH 10 1896YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH]; 1897int include_stack_ptr = 0; 1898%} 1899 1900%% 1901include BEGIN(incl); 1902 1903[a-z]+ ECHO; 1904[^a-z\en]*\en? 
ECHO; 1905 1906<incl>[ \et]* /* eat the whitespace */ 1907<incl>[^ \et\en]+ { /* got the include file name */ 1908 if (include_stack_ptr >= MAX_INCLUDE_DEPTH) 1909 errx(1, "Includes nested too deeply"); 1910 1911 include_stack[include_stack_ptr++] = 1912 YY_CURRENT_BUFFER; 1913 1914 yyin = fopen(yytext, "r"); 1915 1916 if (yyin == NULL) 1917 err(1, NULL); 1918 1919 yy_switch_to_buffer( 1920 yy_create_buffer(yyin, YY_BUF_SIZE)); 1921 1922 BEGIN(INITIAL); 1923} 1924 1925<<EOF>> { 1926 if (--include_stack_ptr < 0) 1927 yyterminate(); 1928 else { 1929 yy_delete_buffer(YY_CURRENT_BUFFER); 1930 yy_switch_to_buffer( 1931 include_stack[include_stack_ptr]); 1932 } 1933} 1934.Ed 1935.Pp 1936Three routines are available for setting up input buffers for 1937scanning in-memory strings instead of files. 1938All of them create a new input buffer for scanning the string, 1939and return a corresponding 1940.Dv YY_BUFFER_STATE 1941handle (which should be deleted afterwards using 1942.Fn yy_delete_buffer ) . 1943They also switch to the new buffer using 1944.Fn yy_switch_to_buffer , 1945so the next call to 1946.Fn yylex 1947will start scanning the string. 1948.Bl -tag -width Ds 1949.It yy_scan_string(const char *str) 1950Scans a NUL-terminated string. 1951.It yy_scan_bytes(const char *bytes, int len) 1952Scans 1953.Fa len 1954bytes 1955.Pq including possibly NUL's 1956starting at location 1957.Fa bytes . 1958.El 1959.Pp 1960Note that both of these functions create and scan a copy 1961of the string or bytes. 1962(This may be desirable, since 1963.Fn yylex 1964modifies the contents of the buffer it is scanning.) 1965The copy can be avoided by using: 1966.Bl -tag -width Ds 1967.It yy_scan_buffer(char *base, yy_size_t size) 1968Which scans the buffer starting at 1969.Fa base , 1970consisting of 1971.Fa size 1972bytes, the last two bytes of which must be 1973.Dv YY_END_OF_BUFFER_CHAR 1974.Pq ASCII NUL . 
1975These last two bytes are not scanned; thus, scanning consists of 1976base[0] through base[size-2], inclusive. 1977.Pp 1978If 1979.Fa base 1980is not set up in this manner 1981(i.e., forget the final two 1982.Dv YY_END_OF_BUFFER_CHAR 1983bytes), then 1984.Fn yy_scan_buffer 1985returns a nil pointer instead of creating a new input buffer. 1986.Pp 1987The type 1988.Fa yy_size_t 1989is an integral type which can be cast to an integer expression 1990reflecting the size of the buffer. 1991.El 1992.Sh END-OF-FILE RULES 1993The special rule 1994.Qq Aq Aq EOF 1995indicates actions which are to be taken when an end-of-file is encountered and 1996.Fn yywrap 1997returns non-zero 1998.Pq i.e., indicates no further files to process . 1999The action must finish by doing one of four things: 2000.Bl -dash 2001.It 2002Assigning 2003.Em yyin 2004to a new input file 2005(in previous versions of 2006.Nm , 2007after doing the assignment, it was necessary to call the special action 2008.Dv YY_NEW_FILE ; 2009this is no longer necessary). 2010.It 2011Executing a 2012.Em return 2013statement. 2014.It 2015Executing the special 2016.Fn yyterminate 2017action. 2018.It 2019Switching to a new buffer using 2020.Fn yy_switch_to_buffer 2021as shown in the example above. 2022.El 2023.Pp 2024.Aq Aq EOF 2025rules may not be used with other patterns; 2026they may only be qualified with a list of start conditions. 2027If an unqualified 2028.Aq Aq EOF 2029rule is given, it applies to all start conditions which do not already have 2030.Aq Aq EOF 2031actions. 2032To specify an 2033.Aq Aq EOF 2034rule for only the initial start condition, use 2035.Pp 2036.Dl <INITIAL><<EOF>> 2037.Pp 2038These rules are useful for catching things like unclosed comments. 2039An example: 2040.Bd -literal -offset indent 2041%x quote 2042%% 2043 2044\&...other rules for dealing with quotes... 
2045 2046<quote><<EOF>> { 2047 error("unterminated quote"); 2048 yyterminate(); 2049} 2050<<EOF>> { 2051 if (*++filelist) 2052 yyin = fopen(*filelist, "r"); 2053 else 2054 yyterminate(); 2055} 2056.Ed 2057.Sh MISCELLANEOUS MACROS 2058The macro 2059.Dv YY_USER_ACTION 2060can be defined to provide an action 2061which is always executed prior to the matched rule's action. 2062For example, 2063it could be #define'd to call a routine to convert yytext to lower-case. 2064When 2065.Dv YY_USER_ACTION 2066is invoked, the variable 2067.Fa yy_act 2068gives the number of the matched rule 2069.Pq rules are numbered starting with 1 . 2070For example, to profile how often each rule is matched, 2071the following would do the trick: 2072.Pp 2073.Dl #define YY_USER_ACTION ++ctr[yy_act] 2074.Pp 2075where 2076.Fa ctr 2077is an array to hold the counts for the different rules. 2078Note that the macro 2079.Dv YY_NUM_RULES 2080gives the total number of rules 2081(including the default rule, even if 2082.Fl s 2083is used), 2084so a correct declaration for 2085.Fa ctr 2086is: 2087.Pp 2088.Dl int ctr[YY_NUM_RULES]; 2089.Pp 2090The macro 2091.Dv YY_USER_INIT 2092may be defined to provide an action which is always executed before 2093the first scan 2094.Pq and before the scanner's internal initializations are done . 2095For example, it could be used to call a routine to read 2096in a data table or open a logging file. 2097.Pp 2098The macro 2099.Dv yy_set_interactive(is_interactive) 2100can be used to control whether the current buffer is considered 2101.Em interactive . 2102An interactive buffer is processed more slowly, 2103but must be used when the scanner's input source is indeed 2104interactive to avoid problems due to waiting to fill buffers 2105(see the discussion of the 2106.Fl I 2107flag below). 2108A non-zero value in the macro invocation marks the buffer as interactive, 2109a zero value as non-interactive. 
2110Note that use of this macro overrides 2111.Dq %option always-interactive 2112or 2113.Dq %option never-interactive 2114(see 2115.Sx OPTIONS 2116below). 2117.Fn yy_set_interactive 2118must be invoked prior to beginning to scan the buffer that is 2119.Pq or is not 2120to be considered interactive. 2121.Pp 2122The macro 2123.Dv yy_set_bol(at_bol) 2124can be used to control whether the current buffer's scanning 2125context for the next token match is done as though at the 2126beginning of a line. 2127A non-zero macro argument makes rules anchored with 2128.Sq ^ 2129active, while a zero argument makes 2130.Sq ^ 2131rules inactive. 2132.Pp 2133The macro 2134.Dv YY_AT_BOL 2135returns true if the next token scanned from the current buffer will have 2136.Sq ^ 2137rules active, false otherwise. 2138.Pp 2139In the generated scanner, the actions are all gathered in one large 2140switch statement and separated using 2141.Dv YY_BREAK , 2142which may be redefined. 2143By default, it is simply a 2144.Qq break , 2145to separate each rule's action from the following rules. 2146Redefining 2147.Dv YY_BREAK 2148allows, for example, C++ users to 2149.Dq #define YY_BREAK 2150to do nothing 2151(while being very careful that every rule ends with a 2152.Qq break 2153or a 2154.Qq return ! ) 2155to avoid suffering from unreachable statement warnings where because a rule's 2156action ends with 2157.Dq return , 2158the 2159.Dv YY_BREAK 2160is inaccessible. 2161.Sh VALUES AVAILABLE TO THE USER 2162This section summarizes the various values available to the user 2163in the rule actions. 2164.Bl -tag -width Ds 2165.It char *yytext 2166Holds the text of the current token. 2167It may be modified but not lengthened 2168.Pq characters cannot be appended to the end . 
2169.Pp 2170If the special directive 2171.Dq %array 2172appears in the first section of the scanner description, then 2173.Fa yytext 2174is instead declared 2175.Dq char yytext[YYLMAX] , 2176where 2177.Dv YYLMAX 2178is a macro definition that can be redefined in the first section 2179to change the default value 2180.Pq generally 8KB . 2181Using 2182.Dq %array 2183results in somewhat slower scanners, but the value of 2184.Fa yytext 2185becomes immune to calls to 2186.Fn input 2187and 2188.Fn unput , 2189which potentially destroy its value when 2190.Fa yytext 2191is a character pointer. 2192The opposite of 2193.Dq %array 2194is 2195.Dq %pointer , 2196which is the default. 2197.Pp 2198.Dq %array 2199cannot be used when generating C++ scanner classes 2200(the 2201.Fl + 2202flag). 2203.It int yyleng 2204Holds the length of the current token. 2205.It FILE *yyin 2206Is the file which by default 2207.Nm 2208reads from. 2209It may be redefined, but doing so only makes sense before 2210scanning begins or after an 2211.Dv EOF 2212has been encountered. 2213Changing it in the midst of scanning will have unexpected results since 2214.Nm 2215buffers its input; use 2216.Fn yyrestart 2217instead. 2218Once scanning terminates because an end-of-file 2219has been seen, 2220.Fa yyin 2221can be assigned as the new input file 2222and the scanner can be called again to continue scanning. 2223.It void yyrestart(FILE *new_file) 2224May be called to point 2225.Fa yyin 2226at the new input file. 2227The switch-over to the new file is immediate 2228.Pq any previously buffered-up input is lost . 2229Note that calling 2230.Fn yyrestart 2231with 2232.Fa yyin 2233as an argument thus throws away the current input buffer and continues 2234scanning the same input file. 2235.It FILE *yyout 2236Is the file to which 2237.Em ECHO 2238actions are done. 2239It can be reassigned by the user. 2240.It YY_CURRENT_BUFFER 2241Returns a 2242.Dv YY_BUFFER_STATE 2243handle to the current buffer. 
2244.It YY_START 2245Returns an integer value corresponding to the current start condition. 2246This value can subsequently be used with 2247.Em BEGIN 2248to return to that start condition. 2249.El 2250.Sh INTERFACING WITH YACC 2251One of the main uses of 2252.Nm 2253is as a companion to the 2254.Xr yacc 1 2255parser-generator. 2256yacc parsers expect to call a routine named 2257.Fn yylex 2258to find the next input token. 2259The routine is supposed to return the type of the next token 2260as well as putting any associated value in the global 2261.Fa yylval , 2262which is defined externally, 2263and can be a union or any other complex data structure. 2264To use 2265.Nm 2266with yacc, one specifies the 2267.Fl d 2268option to yacc to instruct it to generate the file 2269.Pa y.tab.h 2270containing definitions of all the 2271.Dq %tokens 2272appearing in the yacc input. 2273This file is then included in the 2274.Nm 2275scanner. 2276For example, if one of the tokens is 2277.Qq TOK_NUMBER , 2278part of the scanner might look like: 2279.Bd -literal -offset indent 2280%{ 2281#include "y.tab.h" 2282%} 2283 2284%% 2285 2286[0-9]+ yylval = atoi(yytext); return TOK_NUMBER; 2287.Ed 2288.Sh OPTIONS 2289.Nm 2290has the following options: 2291.Bl -tag -width Ds 2292.It Fl 7 2293Instructs 2294.Nm 2295to generate a 7-bit scanner, i.e., one which can only recognize 7-bit 2296characters in its input. 2297The advantage of using 2298.Fl 7 2299is that the scanner's tables can be up to half the size of those generated 2300using the 2301.Fl 8 2302option 2303.Pq see below . 2304The disadvantage is that such scanners often hang 2305or crash if their input contains an 8-bit character. 2306.Pp 2307Note, however, that unless generating a scanner using the 2308.Fl Cf 2309or 2310.Fl CF 2311table compression options, use of 2312.Fl 7 2313will save only a small amount of table space, 2314and make the scanner considerably less portable. 
.Nm flex Ns 's
default behavior is to generate an 8-bit scanner unless
.Fl Cf
or
.Fl CF
is specified, in which case
.Nm
defaults to generating 7-bit scanners unless it was
configured to generate 8-bit scanners
(as will often be the case with non-USA sites).
It is possible to tell whether
.Nm
generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the
.Fl v
output as described below.
.Pp
Note that if
.Fl Cfe
or
.Fl CFe
is used
(the table compression options, but also using equivalence classes as
discussed below),
.Nm
still defaults to generating an 8-bit scanner,
since usually with these compression options full 8-bit tables
are not much more expensive than 7-bit tables.
.It Fl 8
Instructs
.Nm
to generate an 8-bit scanner, i.e., one which can recognize 8-bit
characters.
This flag is only needed for scanners generated using
.Fl Cf
or
.Fl CF ,
as otherwise
.Nm
defaults to generating an 8-bit scanner anyway.
.Pp
See the discussion of
.Fl 7
above for
.Nm flex Ns 's
default behavior and the tradeoffs between 7-bit and 8-bit scanners.
.It Fl B
Instructs
.Nm
to generate a
.Em batch
scanner, the opposite of
.Em interactive
scanners generated by
.Fl I
.Pq see below .
In general,
.Fl B
is used when the scanner will never be used interactively,
and you want to squeeze a little more performance out of it.
If the aim is instead to squeeze out a lot more performance,
use the
.Fl Cf
or
.Fl CF
options
.Pq discussed below ,
which turn on
.Fl B
automatically anyway.
.It Fl b
Generate backing-up information to
.Pa lex.backup .
This is a list of scanner states which require backing up
and the input characters on which they do so.
2389By adding rules one can remove backing-up states. 2390If all backing-up states are eliminated and 2391.Fl Cf 2392or 2393.Fl CF 2394is used, the generated scanner will run faster (see the 2395.Fl p 2396flag). 2397Only users who wish to squeeze every last cycle out of their 2398scanners need worry about this option. 2399(See the section on 2400.Sx PERFORMANCE CONSIDERATIONS 2401below.) 2402.It Fl C Ns Op Cm aeFfmr 2403Controls the degree of table compression and, more generally, trade-offs 2404between small scanners and fast scanners. 2405.Bl -tag -width Ds 2406.It Fl Ca 2407Instructs 2408.Nm 2409to trade off larger tables in the generated scanner for faster performance 2410because the elements of the tables are better aligned for memory access 2411and computation. 2412On some 2413.Tn RISC 2414architectures, fetching and manipulating longwords is more efficient 2415than with smaller-sized units such as shortwords. 2416This option can double the size of the tables used by the scanner. 2417.It Fl Ce 2418Directs 2419.Nm 2420to construct 2421.Em equivalence classes , 2422i.e., sets of characters which have identical lexical properties 2423(for example, if the only appearance of digits in the 2424.Nm 2425input is in the character class 2426.Qq [0-9] 2427then the digits 2428.Sq 0 , 2429.Sq 1 , 2430.Sq ... , 2431.Sq 9 2432will all be put in the same equivalence class). 2433Equivalence classes usually give dramatic reductions in the final 2434table/object file sizes 2435.Pq typically a factor of 2\-5 2436and are pretty cheap performance-wise 2437.Pq one array look-up per character scanned . 2438.It Fl CF 2439Specifies that the alternate fast scanner representation 2440(described below under the 2441.Fl F 2442option) 2443should be used. 2444This option cannot be used with 2445.Fl + . 
2446.It Fl Cf 2447Specifies that the 2448.Em full 2449scanner tables should be generated \- 2450.Nm 2451should not compress the tables by taking advantage of 2452similar transition functions for different states. 2453.It Fl \&Cm 2454Directs 2455.Nm 2456to construct 2457.Em meta-equivalence classes , 2458which are sets of equivalence classes 2459(or characters, if equivalence classes are not being used) 2460that are commonly used together. 2461Meta-equivalence classes are often a big win when using compressed tables, 2462but they have a moderate performance impact 2463(one or two 2464.Qq if 2465tests and one array look-up per character scanned). 2466.It Fl Cr 2467Causes the generated scanner to 2468.Em bypass 2469use of the standard I/O library 2470.Pq stdio 2471for input. 2472Instead of calling 2473.Xr fread 3 2474or 2475.Xr getc 3 , 2476the scanner will use the 2477.Xr read 2 2478system call, 2479resulting in a performance gain which varies from system to system, 2480but in general is probably negligible unless 2481.Fl Cf 2482or 2483.Fl CF 2484are being used. 2485Using 2486.Fl Cr 2487can cause strange behavior if, for example, the program reads from 2488.Fa yyin 2489using stdio prior to calling the scanner 2490(because the scanner will miss whatever text previous reads left 2491in the stdio input buffer). 2492.Pp 2493.Fl Cr 2494has no effect if 2495.Dv YY_INPUT 2496is defined 2497(see 2498.Sx THE GENERATED SCANNER 2499above). 2500.El 2501.Pp 2502A lone 2503.Fl C 2504specifies that the scanner tables should be compressed but neither 2505equivalence classes nor meta-equivalence classes should be used. 2506.Pp 2507The options 2508.Fl Cf 2509or 2510.Fl CF 2511and 2512.Fl \&Cm 2513do not make sense together \- there is no opportunity for meta-equivalence 2514classes if the table is not being compressed. 2515Otherwise the options may be freely mixed, and are cumulative.
2516.Pp 2517The default setting is 2518.Fl Cem 2519which specifies that 2520.Nm 2521should generate equivalence classes and meta-equivalence classes. 2522This setting provides the highest degree of table compression. 2523It is possible to trade off faster-executing scanners at the cost of 2524larger tables with the following generally being true: 2525.Bd -unfilled -offset indent 2526slowest & smallest 2527 -Cem 2528 -Cm 2529 -Ce 2530 -C 2531 -C{f,F}e 2532 -C{f,F} 2533 -C{f,F}a 2534fastest & largest 2535.Ed 2536.Pp 2537Note that scanners with the smallest tables are usually generated and 2538compiled the quickest, 2539so during development the default is usually best, 2540maximal compression. 2541.Pp 2542.Fl Cfe 2543is often a good compromise between speed and size for production scanners. 2544.It Fl d 2545Makes the generated scanner run in debug mode. 2546Whenever a pattern is recognized and the global 2547.Fa yy_flex_debug 2548is non-zero 2549.Pq which is the default , 2550the scanner will write to stderr a line of the form: 2551.Pp 2552.D1 --accepting rule at line 53 ("the matched text") 2553.Pp 2554The line number refers to the location of the rule in the file 2555defining the scanner 2556(i.e., the file that was fed to 2557.Nm ) . 2558Messages are also generated when the scanner backs up, 2559accepts the default rule, 2560reaches the end of its input buffer 2561(or encounters a NUL; 2562at this point, the two look the same as far as the scanner's concerned), 2563or reaches an end-of-file. 2564.It Fl F 2565Specifies that the fast scanner table representation should be used 2566.Pq and stdio bypassed . 2567This representation is about as fast as the full table representation 2568.Pq Fl f , 2569and for some sets of patterns will be considerably smaller 2570.Pq and for others, larger . 
2571In general, if the pattern set contains both 2572.Qq keywords 2573and a catch-all, 2574.Qq identifier 2575rule, such as in the set: 2576.Bd -unfilled -offset indent 2577"case" return TOK_CASE; 2578"switch" return TOK_SWITCH; 2579\&... 2580"default" return TOK_DEFAULT; 2581[a-z]+ return TOK_ID; 2582.Ed 2583.Pp 2584then it's better to use the full table representation. 2585If only the 2586.Qq identifier 2587rule is present and a hash table or some such is used to detect the keywords, 2588it's better to use 2589.Fl F . 2590.Pp 2591This option is equivalent to 2592.Fl CFr 2593.Pq see above . 2594It cannot be used with 2595.Fl + . 2596.It Fl f 2597Specifies 2598.Em fast scanner . 2599No table compression is done and stdio is bypassed. 2600The result is large but fast. 2601This option is equivalent to 2602.Fl Cfr 2603.Pq see above . 2604.It Fl h 2605Generates a help summary of 2606.Nm flex Ns 's 2607options to stdout and then exits. 2608.Fl ?\& 2609and 2610.Fl Fl help 2611are synonyms for 2612.Fl h . 2613.It Fl I 2614Instructs 2615.Nm 2616to generate an 2617.Em interactive 2618scanner. 2619An interactive scanner is one that only looks ahead to decide 2620what token has been matched if it absolutely must. 2621It turns out that always looking one extra character ahead, 2622even if the scanner has already seen enough text 2623to disambiguate the current token, is a bit faster than 2624only looking ahead when necessary. 2625But scanners that always look ahead give dreadful interactive performance; 2626for example, when a user types a newline, 2627it is not recognized as a newline token until they enter 2628.Em another 2629token, which often means typing in another whole line. 2630.Pp 2631.Nm 2632scanners default to 2633.Em interactive 2634unless 2635.Fl Cf 2636or 2637.Fl CF 2638table-compression options are specified 2639.Pq see above . 
2640That's because if high-performance is most important, 2641one of these options should be used, 2642so if they weren't, 2643.Nm 2644assumes it is preferable to trade off a bit of run-time performance for 2645intuitive interactive behavior. 2646Note also that 2647.Fl I 2648cannot be used in conjunction with 2649.Fl Cf 2650or 2651.Fl CF . 2652Thus, this option is not really needed; it is on by default for all those 2653cases in which it is allowed. 2654.Pp 2655A scanner can be forced to not be interactive by using 2656.Fl B 2657.Pq see above . 2658.It Fl i 2659Instructs 2660.Nm 2661to generate a case-insensitive scanner. 2662The case of letters given in the 2663.Nm 2664input patterns will be ignored, 2665and tokens in the input will be matched regardless of case. 2666The matched text given in 2667.Fa yytext 2668will have the preserved case 2669.Pq i.e., it will not be folded . 2670.It Fl L 2671Instructs 2672.Nm 2673not to generate 2674.Dq #line 2675directives. 2676Without this option, 2677.Nm 2678peppers the generated scanner with #line directives so error messages 2679in the actions will be correctly located with respect to either the original 2680.Nm 2681input file 2682(if the errors are due to code in the input file), 2683or 2684.Pa lex.yy.c 2685(if the errors are 2686.Nm flex Ns 's 2687fault \- these sorts of errors should be reported to the email address 2688given below). 2689.It Fl l 2690Turns on maximum compatibility with the original AT&T 2691.Nm lex 2692implementation. 2693Note that this does not mean full compatibility. 2694Use of this option costs a considerable amount of performance, 2695and it cannot be used with the 2696.Fl + , f , F , Cf , 2697or 2698.Fl CF 2699options. 2700For details on the compatibilities it provides, see the section 2701.Sx INCOMPATIBILITIES WITH LEX AND POSIX 2702below. 2703This option also results in the name 2704.Dv YY_FLEX_LEX_COMPAT 2705being #define'd in the generated scanner. 
2706.It Fl n 2707Another do-nothing, deprecated option included only for 2708.Tn POSIX 2709compliance. 2710.It Fl o Ns Ar output 2711Directs 2712.Nm 2713to write the scanner to the file 2714.Ar output 2715instead of 2716.Pa lex.yy.c . 2717If 2718.Fl o 2719is combined with the 2720.Fl t 2721option, then the scanner is written to stdout but its 2722.Dq #line 2723directives 2724(see the 2725.Fl L 2726option above) 2727refer to the file 2728.Ar output . 2729.It Fl P Ns Ar prefix 2730Changes the default 2731.Qq yy 2732prefix used by 2733.Nm 2734for all globally visible variable and function names to instead be 2735.Ar prefix . 2736For example, 2737.Fl P Ns Ar foo 2738changes the name of 2739.Fa yytext 2740to 2741.Fa footext . 2742It also changes the name of the default output file from 2743.Pa lex.yy.c 2744to 2745.Pa lex.foo.c . 2746Here are all of the names affected: 2747.Bd -unfilled -offset indent 2748yy_create_buffer 2749yy_delete_buffer 2750yy_flex_debug 2751yy_init_buffer 2752yy_flush_buffer 2753yy_load_buffer_state 2754yy_switch_to_buffer 2755yyin 2756yyleng 2757yylex 2758yylineno 2759yyout 2760yyrestart 2761yytext 2762yywrap 2763.Ed 2764.Pp 2765(If using a C++ scanner, then only 2766.Fa yywrap 2767and 2768.Fa yyFlexLexer 2769are affected.) 2770Within the scanner itself, it is still possible to refer to the global variables 2771and functions using either version of their name; but externally, they 2772have the modified name. 2773.Pp 2774This option allows multiple 2775.Nm 2776programs to be easily linked together into the same executable. 2777Note, though, that using this option also renames 2778.Fn yywrap , 2779so now either an 2780.Pq appropriately named 2781version of the routine for the scanner must be supplied, or 2782.Dq %option noyywrap 2783must be used, as linking with 2784.Fl lfl 2785no longer provides one by default. 2786.It Fl p 2787Generates a performance report to stderr. 
2788The report consists of comments regarding features of the 2789.Nm 2790input file which will cause a serious loss of performance in the resulting 2791scanner. 2792If the flag is specified twice, 2793comments regarding features that lead to minor performance losses 2794will also be reported. 2795.Pp 2796Note that the use of 2797.Em REJECT , 2798.Dq %option yylineno , 2799and variable trailing context 2800(see the 2801.Sx BUGS 2802section below) 2803entails a substantial performance penalty; use of 2804.Fn yymore , 2805the 2806.Sq ^ 2807operator, and the 2808.Fl I 2809flag entail minor performance penalties. 2810.It Fl S Ns Ar skeleton 2811Overrides the default skeleton file from which 2812.Nm 2813constructs its scanners. 2814This option is needed only for 2815.Nm 2816maintenance or development. 2817.It Fl s 2818Causes the default rule 2819.Pq that unmatched scanner input is echoed to stdout 2820to be suppressed. 2821If the scanner encounters input that does not 2822match any of its rules, it aborts with an error. 2823This option is useful for finding holes in a scanner's rule set. 2824.It Fl T 2825Makes 2826.Nm 2827run in 2828.Em trace 2829mode. 2830It will generate a lot of messages to stderr concerning 2831the form of the input and the resultant non-deterministic and deterministic 2832finite automata. 2833This option is mostly for use in maintaining 2834.Nm . 2835.It Fl t 2836Instructs 2837.Nm 2838to write the scanner it generates to standard output instead of 2839.Pa lex.yy.c . 2840.It Fl V 2841Prints the version number to stdout and exits. 2842.Fl Fl version 2843is a synonym for 2844.Fl V . 2845.It Fl v 2846Specifies that 2847.Nm 2848should write to stderr 2849a summary of statistics regarding the scanner it generates.
2850Most of the statistics are meaningless to the casual 2851.Nm 2852user, but the first line identifies the version of 2853.Nm 2854(same as reported by 2855.Fl V ) , 2856and the next line the flags used when generating the scanner, 2857including those that are on by default. 2858.It Fl w 2859Suppresses warning messages. 2860.It Fl + 2861Specifies that 2862.Nm 2863should generate a C++ scanner class. 2864See the section on 2865.Sx GENERATING C++ SCANNERS 2866below for details. 2867.El 2868.Pp 2869.Nm 2870also provides a mechanism for controlling options within the 2871scanner specification itself, rather than from the 2872.Nm 2873command line. 2874This is done by including 2875.Dq %option 2876directives in the first section of the scanner specification. 2877Multiple options can be specified with a single 2878.Dq %option 2879directive, and multiple directives in the first section of the 2880.Nm 2881input file. 2882.Pp 2883Most options are given simply as names, optionally preceded by the word 2884.Qq no 2885.Pq with no intervening whitespace 2886to negate their meaning. 
2887A number are equivalent to 2888.Nm 2889flags or their negation: 2890.Bd -unfilled -offset indent 28917bit -7 option 28928bit -8 option 2893align -Ca option 2894backup -b option 2895batch -B option 2896c++ -+ option 2897 2898caseful or 2899case-sensitive opposite of -i (default) 2900 2901case-insensitive or 2902caseless -i option 2903 2904debug -d option 2905default opposite of -s option 2906ecs -Ce option 2907fast -F option 2908full -f option 2909interactive -I option 2910lex-compat -l option 2911meta-ecs -Cm option 2912perf-report -p option 2913read -Cr option 2914stdout -t option 2915verbose -v option 2916warn opposite of -w option 2917 (use "%option nowarn" for -w) 2918 2919array equivalent to "%array" 2920pointer equivalent to "%pointer" (default) 2921.Ed 2922.Pp 2923Some %option's provide features otherwise not available: 2924.Bl -tag -width Ds 2925.It always-interactive 2926Instructs 2927.Nm 2928to generate a scanner which always considers its input 2929.Qq interactive . 2930Normally, on each new input file the scanner calls 2931.Fn isatty 2932in an attempt to determine whether the scanner's input source is interactive 2933and thus should be read a character at a time. 2934When this option is used, however, no such call is made. 2935.It main 2936Directs 2937.Nm 2938to provide a default 2939.Fn main 2940program for the scanner, which simply calls 2941.Fn yylex . 2942This option implies 2943.Dq noyywrap 2944.Pq see below . 2945.It never-interactive 2946Instructs 2947.Nm 2948to generate a scanner which never considers its input 2949.Qq interactive 2950(again, no call made to 2951.Fn isatty ) . 2952This is the opposite of 2953.Dq always-interactive . 2954.It stack 2955Enables the use of start condition stacks 2956(see 2957.Sx START CONDITIONS 2958above). 2959.It stdinit 2960If set (i.e., 2961.Dq %option stdinit ) , 2962initializes 2963.Fa yyin 2964and 2965.Fa yyout 2966to stdin and stdout, instead of the default of 2967.Dq nil . 
2968Some existing 2969.Nm lex 2970programs depend on this behavior, even though it is not compliant with ANSI C, 2971which does not require stdin and stdout to be compile-time constant. 2972.It yylineno 2973Directs 2974.Nm 2975to generate a scanner that maintains the number of the current line 2976read from its input in the global variable 2977.Fa yylineno . 2978This option is implied by 2979.Dq %option lex-compat . 2980.It yywrap 2981If unset (i.e., 2982.Dq %option noyywrap ) , 2983makes the scanner not call 2984.Fn yywrap 2985upon an end-of-file, but simply assume that there are no more files to scan 2986(until the user points 2987.Fa yyin 2988at a new file and calls 2989.Fn yylex 2990again). 2991.El 2992.Pp 2993.Nm 2994scans rule actions to determine whether the 2995.Em REJECT 2996or 2997.Fn yymore 2998features are being used. 2999The 3000.Dq reject 3001and 3002.Dq yymore 3003options are available to override its decision as to whether to use the 3004options, either by setting them (e.g., 3005.Dq %option reject ) 3006to indicate the feature is indeed used, 3007or unsetting them to indicate it actually is not used 3008(e.g., 3009.Dq %option noyymore ) . 3010.Pp 3011Three options take string-delimited values, offset with 3012.Sq = : 3013.Pp 3014.D1 %option outfile="ABC" 3015.Pp 3016is equivalent to 3017.Fl o Ns Ar ABC , 3018and 3019.Pp 3020.D1 %option prefix="XYZ" 3021.Pp 3022is equivalent to 3023.Fl P Ns Ar XYZ . 3024Finally, 3025.Pp 3026.D1 %option yyclass="foo" 3027.Pp 3028only applies when generating a C++ scanner 3029.Pf ( Fl + 3030option). 3031It informs 3032.Nm 3033that 3034.Dq foo 3035has been derived as a subclass of yyFlexLexer, so 3036.Nm 3037will place actions in the member function 3038.Dq foo::yylex() 3039instead of 3040.Dq yyFlexLexer::yylex() . 3041It also generates a 3042.Dq yyFlexLexer::yylex() 3043member function that emits a run-time error (by invoking 3044.Dq yyFlexLexer::LexerError() ) 3045if called. 
3046See 3047.Sx GENERATING C++ SCANNERS , 3048below, for additional information. 3049.Pp 3050A number of options are available for 3051lint 3052purists who want to suppress the appearance of unneeded routines 3053in the generated scanner. 3054Each of the following, if unset 3055(e.g., 3056.Dq %option nounput ) , 3057results in the corresponding routine not appearing in the generated scanner: 3058.Bd -unfilled -offset indent 3059input, unput 3060yy_push_state, yy_pop_state, yy_top_state 3061yy_scan_buffer, yy_scan_bytes, yy_scan_string 3062.Ed 3063.Pp 3064(though 3065.Fn yy_push_state 3066and friends won't appear anyway unless 3067.Dq %option stack 3068is being used). 3069.Sh PERFORMANCE CONSIDERATIONS 3070The main design goal of 3071.Nm 3072is that it generate high-performance scanners. 3073It has been optimized for dealing well with large sets of rules. 3074Aside from the effects on scanner speed of the table compression 3075.Fl C 3076options outlined above, 3077there are a number of options/actions which degrade performance. 3078These are, from most expensive to least: 3079.Bd -unfilled -offset indent 3080REJECT 3081%option yylineno 3082arbitrary trailing context 3083 3084pattern sets that require backing up 3085%array 3086%option interactive 3087%option always-interactive 3088 3089\&'^' beginning-of-line operator 3090yymore() 3091.Ed 3092.Pp 3093with the first three all being quite expensive 3094and the last two being quite cheap. 3095Note also that 3096.Fn unput 3097is implemented as a routine call that potentially does quite a bit of work, 3098while 3099.Fn yyless 3100is a quite-cheap macro; so if just putting back some excess text, 3101use 3102.Fn yyless . 3103.Pp 3104.Em REJECT 3105should be avoided at all costs when performance is important. 3106It is a particularly expensive option. 3107.Pp 3108Getting rid of backing up is messy and often may be an enormous 3109amount of work for a complicated scanner. 
3110In principle, one begins by using the 3111.Fl b 3112flag to generate a 3113.Pa lex.backup 3114file. 3115For example, on the input 3116.Bd -literal -offset indent 3117%% 3118foo return TOK_KEYWORD; 3119foobar return TOK_KEYWORD; 3120.Ed 3121.Pp 3122the file looks like: 3123.Bd -literal -offset indent 3124State #6 is non-accepting - 3125 associated rule line numbers: 3126 2 3 3127 out-transitions: [ o ] 3128 jam-transitions: EOF [ \e001-n p-\e177 ] 3129 3130State #8 is non-accepting - 3131 associated rule line numbers: 3132 3 3133 out-transitions: [ a ] 3134 jam-transitions: EOF [ \e001-` b-\e177 ] 3135 3136State #9 is non-accepting - 3137 associated rule line numbers: 3138 3 3139 out-transitions: [ r ] 3140 jam-transitions: EOF [ \e001-q s-\e177 ] 3141 3142Compressed tables always back up. 3143.Ed 3144.Pp 3145The first few lines tell us that there's a scanner state in 3146which it can make a transition on an 3147.Sq o 3148but not on any other character, 3149and that in that state the currently scanned text does not match any rule. 3150The state occurs when trying to match the rules found 3151at lines 2 and 3 in the input file. 3152If the scanner is in that state and then reads something other than an 3153.Sq o , 3154it will have to back up to find a rule which is matched. 3155With a bit of headscratching one can see that this must be the 3156state it's in when it has seen 3157.Sq fo . 3158When this has happened, if anything other than another 3159.Sq o 3160is seen, the scanner will have to back up to simply match the 3161.Sq f 3162.Pq by the default rule . 3163.Pp 3164The comment regarding State #8 indicates there's a problem when 3165.Qq foob 3166has been scanned. 3167Indeed, on any character other than an 3168.Sq a , 3169the scanner will have to back up to accept 3170.Qq foo . 3171Similarly, the comment for State #9 concerns when 3172.Qq fooba 3173has been scanned and an 3174.Sq r 3175does not follow.
3176.Pp 3177The final comment reminds us that there's no point going to 3178all the trouble of removing backing up from the rules unless we're using 3179.Fl Cf 3180or 3181.Fl CF , 3182since there's no performance gain doing so with compressed scanners. 3183.Pp 3184The way to remove the backing up is to add 3185.Qq error 3186rules: 3187.Bd -literal -offset indent 3188%% 3189foo return TOK_KEYWORD; 3190foobar return TOK_KEYWORD; 3191 3192fooba | 3193foob | 3194fo { 3195 /* false alarm, not really a keyword */ 3196 return TOK_ID; 3197} 3198.Ed 3199.Pp 3200Eliminating backing up among a list of keywords can also be done using a 3201.Qq catch-all 3202rule: 3203.Bd -literal -offset indent 3204%% 3205foo return TOK_KEYWORD; 3206foobar return TOK_KEYWORD; 3207 3208[a-z]+ return TOK_ID; 3209.Ed 3210.Pp 3211This is usually the best solution when appropriate. 3212.Pp 3213Backing up messages tend to cascade. 3214With a complicated set of rules it's not uncommon to get hundreds of messages. 3215If one can decipher them, though, 3216it often only takes a dozen or so rules to eliminate the backing up 3217(though it's easy to make a mistake and have an error rule accidentally match 3218a valid token; a possible future 3219.Nm 3220feature will be to automatically add rules to eliminate backing up). 3221.Pp 3222It's important to keep in mind that the benefits of eliminating 3223backing up are gained only if 3224.Em every 3225instance of backing up is eliminated. 3226Leaving just one gains nothing. 3227.Pp 3228.Em Variable 3229trailing context 3230(where both the leading and trailing parts do not have a fixed length) 3231entails almost the same performance loss as 3232.Em REJECT 3233.Pq i.e., substantial . 
3234So when possible a rule like: 3235.Bd -literal -offset indent 3236%% 3237mouse|rat/(cat|dog) run(); 3238.Ed 3239.Pp 3240is better written: 3241.Bd -literal -offset indent 3242%% 3243mouse/cat|dog run(); 3244rat/cat|dog run(); 3245.Ed 3246.Pp 3247or as 3248.Bd -literal -offset indent 3249%% 3250mouse|rat/cat run(); 3251mouse|rat/dog run(); 3252.Ed 3253.Pp 3254Note that here the special 3255.Sq |\& 3256action does not provide any savings, and can even make things worse (see 3257.Sx BUGS 3258below). 3259.Pp 3260Another area where the user can increase a scanner's performance 3261.Pq and one that's easier to implement 3262arises from the fact that the longer the tokens matched, 3263the faster the scanner will run. 3264This is because with long tokens the processing of most input 3265characters takes place in the 3266.Pq short 3267inner scanning loop, and does not often have to go through the additional work 3268of setting up the scanning environment (e.g., 3269.Fa yytext ) 3270for the action. 3271Recall the scanner for C comments: 3272.Bd -literal -offset indent 3273%x comment 3274%% 3275int line_num = 1; 3276 3277"/*" BEGIN(comment); 3278 3279<comment>[^*\en]* 3280<comment>"*"+[^*/\en]* 3281<comment>\en ++line_num; 3282<comment>"*"+"/" BEGIN(INITIAL); 3283.Ed 3284.Pp 3285This could be sped up by writing it as: 3286.Bd -literal -offset indent 3287%x comment 3288%% 3289int line_num = 1; 3290 3291"/*" BEGIN(comment); 3292 3293<comment>[^*\en]* 3294<comment>[^*\en]*\en ++line_num; 3295<comment>"*"+[^*/\en]* 3296<comment>"*"+[^*/\en]*\en ++line_num; 3297<comment>"*"+"/" BEGIN(INITIAL); 3298.Ed 3299.Pp 3300Now instead of each newline requiring the processing of another action, 3301recognizing the newlines is 3302.Qq distributed 3303over the other rules to keep the matched text as long as possible. 3304Note that adding rules does 3305.Em not 3306slow down the scanner! 
3307The speed of the scanner is independent of the number of rules or 3308(modulo the considerations given at the beginning of this section) 3309how complicated the rules are with regard to operators such as 3310.Sq * 3311and 3312.Sq |\& . 3313.Pp 3314A final example in speeding up a scanner: 3315scan through a file containing identifiers and keywords, one per line 3316and with no other extraneous characters, and recognize all the keywords. 3317A natural first approach is: 3318.Bd -literal -offset indent 3319%% 3320asm | 3321auto | 3322break | 3323\&... etc ... 3324volatile | 3325while /* it's a keyword */ 3326 3327\&.|\en /* it's not a keyword */ 3328.Ed 3329.Pp 3330To eliminate the back-tracking, introduce a catch-all rule: 3331.Bd -literal -offset indent 3332%% 3333asm | 3334auto | 3335break | 3336\&... etc ... 3337volatile | 3338while /* it's a keyword */ 3339 3340[a-z]+ | 3341\&.|\en /* it's not a keyword */ 3342.Ed 3343.Pp 3344Now, if it's guaranteed that there's exactly one word per line, 3345then we can reduce the total number of matches by a half by 3346merging in the recognition of newlines with that of the other tokens: 3347.Bd -literal -offset indent 3348%% 3349asm\en | 3350auto\en | 3351break\en | 3352\&... etc ... 3353volatile\en | 3354while\en /* it's a keyword */ 3355 3356[a-z]+\en | 3357\&.|\en /* it's not a keyword */ 3358.Ed 3359.Pp 3360One has to be careful here, 3361as we have now reintroduced backing up into the scanner. 3362In particular, while we know that there will never be any characters 3363in the input stream other than letters or newlines, 3364.Nm 3365can't figure this out, and it will plan for possibly needing to back up 3366when it has scanned a token like 3367.Qq auto 3368and then the next character is something other than a newline or a letter. 3369Previously it would then just match the 3370.Qq auto 3371rule and be done, but now it has no 3372.Qq auto 3373rule, only an 3374.Qq auto\en 3375rule. 
3376To eliminate the possibility of backing up, 3377we could either duplicate all rules but without final newlines, or, 3378since we never expect to encounter such an input and therefore don't care 3379how it's classified, we can introduce one more catch-all rule, 3380this one which doesn't include a newline: 3381.Bd -literal -offset indent 3382%% 3383asm\en | 3384auto\en | 3385break\en | 3386\&... etc ... 3387volatile\en | 3388while\en /* it's a keyword */ 3389 3390[a-z]+\en | 3391[a-z]+ | 3392\&.|\en /* it's not a keyword */ 3393.Ed 3394.Pp 3395Compiled with 3396.Fl Cf , 3397this is about as fast as one can get a 3398.Nm 3399scanner to go for this particular problem. 3400.Pp 3401A final note: 3402.Nm 3403is slow when matching NUL's, 3404particularly when a token contains multiple NUL's. 3405It's best to write rules which match short 3406amounts of text if it's anticipated that the text will often include NUL's. 3407.Pp 3408Another final note regarding performance: as mentioned above in the section 3409.Sx HOW THE INPUT IS MATCHED , 3410dynamically resizing 3411.Fa yytext 3412to accommodate huge tokens is a slow process because it presently requires that 3413the 3414.Pq huge 3415token be rescanned from the beginning. 3416Thus if performance is vital, it is better to attempt to match 3417.Qq large 3418quantities of text but not 3419.Qq huge 3420quantities, where the cutoff between the two is at about 8K characters/token. 3421.Sh GENERATING C++ SCANNERS 3422.Nm 3423provides two different ways to generate scanners for use with C++. 3424The first way is to simply compile a scanner generated by 3425.Nm 3426using a C++ compiler instead of a C compiler. 3427This should not generate any compilation errors 3428(please report any found to the email address given in the 3429.Sx AUTHORS 3430section below). 3431C++ code can then be used in rule actions instead of C code.
3432Note that the default input source for scanners remains 3433.Fa yyin , 3434and default echoing is still done to 3435.Fa yyout . 3436Both of these remain 3437.Fa FILE * 3438variables and not C++ streams. 3439.Pp 3440.Nm 3441can also be used to generate a C++ scanner class, using the 3442.Fl + 3443option (or, equivalently, 3444.Dq %option c++ ) , 3445which is automatically specified if the name of the flex executable ends in a 3446.Sq + , 3447such as 3448.Nm flex++ . 3449When using this option, 3450.Nm 3451defaults to generating the scanner to the file 3452.Pa lex.yy.cc 3453instead of 3454.Pa lex.yy.c . 3455The generated scanner includes the header file 3456.Aq Pa g++/FlexLexer.h , 3457which defines the interface to two C++ classes. 3458.Pp 3459The first class, 3460.Em FlexLexer , 3461provides an abstract base class defining the general scanner class interface. 3462It provides the following member functions: 3463.Bl -tag -width Ds 3464.It const char* YYText() 3465Returns the text of the most recently matched token, the equivalent of 3466.Fa yytext . 3467.It int YYLeng() 3468Returns the length of the most recently matched token, the equivalent of 3469.Fa yyleng . 3470.It int lineno() const 3471Returns the current input line number 3472(see 3473.Dq %option yylineno ) , 3474or 1 if 3475.Dq %option yylineno 3476was not used. 3477.It void set_debug(int flag) 3478Sets the debugging flag for the scanner, equivalent to assigning to 3479.Fa yy_flex_debug 3480(see the 3481.Sx OPTIONS 3482section above). 3483Note that the scanner must be built using 3484.Dq %option debug 3485to include debugging information in it. 3486.It int debug() const 3487Returns the current setting of the debugging flag. 
3488.El 3489.Pp 3490Also provided are member functions equivalent to 3491.Fn yy_switch_to_buffer , 3492.Fn yy_create_buffer 3493(though the first argument is an 3494.Fa std::istream* 3495object pointer and not a 3496.Fa FILE* ) , 3497.Fn yy_flush_buffer , 3498.Fn yy_delete_buffer , 3499and 3500.Fn yyrestart 3501(again, the first argument is an 3502.Fa std::istream* 3503object pointer). 3504.Pp 3505The second class defined in 3506.Aq Pa g++/FlexLexer.h 3507is 3508.Fa yyFlexLexer , 3509which is derived from 3510.Fa FlexLexer . 3511It defines the following additional member functions: 3512.Bl -tag -width Ds 3513.It "yyFlexLexer(std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0)" 3514Constructs a 3515.Fa yyFlexLexer 3516object using the given streams for input and output. 3517If not specified, the streams default to 3518.Fa cin 3519and 3520.Fa cout , 3521respectively. 3522.It virtual int yylex() 3523Performs the same role as 3524.Fn yylex 3525does for ordinary flex scanners: it scans the input stream, consuming 3526tokens, until a rule's action returns a value. 3527If subclass 3528.Sq S 3529is derived from 3530.Fa yyFlexLexer , 3531in order to access the member functions and variables of 3532.Sq S 3533inside 3534.Fn yylex , 3535use 3536.Dq %option yyclass="S" 3537to inform 3538.Nm 3539that the 3540.Sq S 3541subclass will be used instead of 3542.Fa yyFlexLexer . 3543In this case, rather than generating 3544.Dq yyFlexLexer::yylex() , 3545.Nm 3546generates 3547.Dq S::yylex() 3548(and also generates a dummy 3549.Dq yyFlexLexer::yylex() 3550that calls 3551.Dq yyFlexLexer::LexerError() 3552if called). 3553.It "virtual void switch_streams(std::istream* new_in = 0, std::ostream* new_out = 0)" 3554Reassigns 3555.Fa yyin 3556to 3557.Fa new_in 3558.Pq if non-nil 3559and 3560.Fa yyout 3561to 3562.Fa new_out 3563.Pq ditto , 3564deleting the previous input buffer if 3565.Fa yyin 3566is reassigned. 
3567.It int yylex(std::istream* new_in, std::ostream* new_out = 0) 3568First switches the input streams via 3569.Dq switch_streams(new_in, new_out) 3570and then returns the value of 3571.Fn yylex . 3572.El 3573.Pp 3574In addition, 3575.Fa yyFlexLexer 3576defines the following protected virtual functions which can be redefined 3577in derived classes to tailor the scanner: 3578.Bl -tag -width Ds 3579.It virtual int LexerInput(char* buf, int max_size) 3580Reads up to 3581.Fa max_size 3582characters into 3583.Fa buf 3584and returns the number of characters read. 3585To indicate end-of-input, return 0 characters. 3586Note that 3587.Qq interactive 3588scanners (see the 3589.Fl B 3590and 3591.Fl I 3592flags) define the macro 3593.Dv YY_INTERACTIVE . 3594If 3595.Fn LexerInput 3596has been redefined, and it's necessary to take different actions depending on 3597whether or not the scanner might be scanning an interactive input source, 3598it's possible to test for the presence of this name via 3599.Dq #ifdef . 3600.It virtual void LexerOutput(const char* buf, int size) 3601Writes out 3602.Fa size 3603characters from the buffer 3604.Fa buf , 3605which, while NUL-terminated, may also contain 3606.Qq internal 3607NUL's if the scanner's rules can match text with NUL's in them. 3608.It virtual void LexerError(const char* msg) 3609Reports a fatal error message. 3610The default version of this function writes the message to the stream 3611.Fa cerr 3612and exits. 3613.El 3614.Pp 3615Note that a 3616.Fa yyFlexLexer 3617object contains its entire scanning state. 3618Thus such objects can be used to create reentrant scanners. 3619Multiple instances of the same 3620.Fa yyFlexLexer 3621class can be instantiated, and multiple C++ scanner classes can be combined 3622in the same program using the 3623.Fl P 3624option discussed above. 3625.Pp 3626Finally, note that the 3627.Dq %array 3628feature is not available to C++ scanner classes; 3629.Dq %pointer 3630must be used 3631.Pq the default . 
.Pp
Here is an example of a simple C++ scanner:
.Bd -literal -offset indent
// An example of using the flex C++ scanner class.

%{
#include <errno.h>
int mylineno = 0;
%}

string  \e"[^\en"]+\e"

ws      [ \et]+

alpha   [A-Za-z]
dig     [0-9]
name    ({alpha}|{dig}|\e$)({alpha}|{dig}|[_.\e-/$])*
num1    [-+]?{dig}+\e.?([eE][-+]?{dig}+)?
num2    [-+]?{dig}*\e.{dig}+([eE][-+]?{dig}+)?
number  {num1}|{num2}

%%

{ws}    /* skip blanks and tabs */

"/*" {
        int c;

        while ((c = yyinput()) != 0) {
                if(c == '\en')
                        ++mylineno;
                else if(c == '*') {
                        if ((c = yyinput()) == '/')
                                break;
                        else
                                unput(c);
                }
        }
}

{number}  cout << "number " << YYText() << '\en';

\en        mylineno++;

{name}    cout << "name " << YYText() << '\en';

{string}  cout << "string " << YYText() << '\en';

%%

int main(int /* argc */, char** /* argv */)
{
        FlexLexer* lexer = new yyFlexLexer;
        while(lexer->yylex() != 0)
                ;
        return 0;
}
.Ed
.Pp
To create multiple
.Pq different
lexer classes, use the
.Fl P
flag
(or the
.Dq prefix=
option)
to rename each
.Fa yyFlexLexer
to some other
.Fa xxFlexLexer .
.Aq Pa g++/FlexLexer.h
can then be included in other sources once per lexer class, first renaming
.Fa yyFlexLexer
as follows:
.Bd -literal -offset indent
#undef yyFlexLexer
#define yyFlexLexer xxFlexLexer
#include <g++/FlexLexer.h>

#undef yyFlexLexer
#define yyFlexLexer zzFlexLexer
#include <g++/FlexLexer.h>
.Ed
.Pp
This assumes, for example, that
.Dq %option prefix="xx"
is used for one scanner and
.Dq %option prefix="zz"
is used for the other.
.Pp
.Sy IMPORTANT :
the present form of the scanning class is experimental
and may change considerably between major releases.
.Sh INCOMPATIBILITIES WITH LEX AND POSIX
.Nm
is a rewrite of the
.At
.Nm lex
tool
(the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which are of concern
to those who wish to write scanners acceptable to either implementation.
.Nm
is fully compliant with the
.Tn POSIX
.Nm lex
specification, except that when using
.Dq %pointer
.Pq the default ,
a call to
.Fn unput
destroys the contents of
.Fa yytext ,
which is counter to the
.Tn POSIX
specification.
.Pp
In this section we discuss all of the known areas of incompatibility between
.Nm ,
AT&T
.Nm lex ,
and the
.Tn POSIX
specification.
.Pp
.Nm flex Ns 's
.Fl l
option turns on maximum compatibility with the original AT&T
.Nm lex
implementation, at the cost of a major loss in the generated scanner's
performance.
We note below which incompatibilities can be overcome using the
.Fl l
option.
.Pp
.Nm
is fully compatible with
.Nm lex
with the following exceptions:
.Bl -dash
.It
The undocumented
.Nm lex
scanner internal variable
.Fa yylineno
is not supported unless
.Fl l
or
.Dq %option yylineno
is used.
.Pp
.Fa yylineno
should be maintained on a per-buffer basis, rather than a per-scanner
.Pq single global variable
basis.
.Pp
.Fa yylineno
is not part of the
.Tn POSIX
specification.
.It
The
.Fn input
routine is not redefinable, though it may be called to read characters
following whatever has been matched by a rule.
If
.Fn input
encounters an end-of-file, the normal
.Fn yywrap
processing is done.
A
.Dq real
end-of-file is returned by
.Fn input
as
.Dv EOF .
.Pp
Input is instead controlled by defining the
.Dv YY_INPUT
macro.
.Pp
The
.Nm
restriction that
.Fn input
cannot be redefined is in accordance with the
.Tn POSIX
specification, which simply does not specify any way of controlling the
scanner's input other than by making an initial assignment to
.Fa yyin .
.It
The
.Fn unput
routine is not redefinable.
This restriction is in accordance with
.Tn POSIX .
.It
.Nm
scanners are not as reentrant as
.Nm lex
scanners.
In particular, if a scanner is interactive and
an interrupt handler long-jumps out of the scanner,
and the scanner is subsequently called again,
the following error message may be displayed:
.Pp
.D1 fatal flex scanner internal error--end of buffer missed
.Pp
To reenter the scanner, first use
.Pp
.Dl yyrestart(yyin);
.Pp
Note that this call will throw away any buffered input;
usually this isn't a problem with an interactive scanner.
.Pp
Also note that flex C++ scanner classes are reentrant,
so if using C++ is an option, they should be used instead.
See
.Sx GENERATING C++ SCANNERS
above for details.
.It
.Fn output
is not supported.
Output from the
.Em ECHO
macro is done to the file-pointer
.Fa yyout
.Pq default stdout .
.Pp
.Fn output
is not part of the
.Tn POSIX
specification.
.It
.Nm lex
does not support exclusive start conditions
.Pq %x ,
though they are in the
.Tn POSIX
specification.
.It
When definitions are expanded,
.Nm
encloses them in parentheses.
With
.Nm lex ,
the following:
.Bd -literal -offset indent
NAME    [A-Z][A-Z0-9]*
%%
foo{NAME}?      printf("Found it\en");
%%
.Ed
.Pp
will not match the string
.Qq foo
because when the macro is expanded the rule is equivalent to
.Qq foo[A-Z][A-Z0-9]*?
and the precedence is such that the
.Sq ?\&
is associated with
.Qq [A-Z0-9]* .
With
.Nm ,
the rule will be expanded to
.Qq foo([A-Z][A-Z0-9]*)?
and so the string
.Qq foo
will match.
.Pp
Note that if the definition begins with
.Sq ^
or ends with
.Sq $
then it is not expanded with parentheses, to allow these operators to appear in
definitions without losing their special meanings.
But the
.Sq Aq s ,
.Sq / ,
and
.Aq Aq EOF
operators cannot be used in a
.Nm
definition.
.Pp
Using
.Fl l
results in the
.Nm lex
behavior of no parentheses around the definition.
.Pp
The
.Tn POSIX
specification is that the definition be enclosed in parentheses.
.It
Some implementations of
.Nm lex
allow a rule's action to begin on a separate line,
if the rule's pattern has trailing whitespace:
.Bd -literal -offset indent
%%
foo|bar<space here>
  { foobar_action(); }
.Ed
.Pp
.Nm
does not support this feature.
.It
The
.Nm lex
.Sq %r
.Pq generate a Ratfor scanner
option is not supported.
It is not part of the
.Tn POSIX
specification.
.It
After a call to
.Fn unput ,
.Fa yytext
is undefined until the next token is matched,
unless the scanner was built using
.Dq %array .
This is not the case with
.Nm lex
or the
.Tn POSIX
specification.
The
.Fl l
option does away with this incompatibility.
.It
The precedence of the
.Sq {}
.Pq numeric range
operator is different.
.Nm lex
interprets
.Qq abc{1,3}
as match one, two, or three occurrences of
.Sq abc ,
whereas
.Nm
interprets it as match
.Sq ab
followed by one, two, or three occurrences of
.Sq c .
The latter is in agreement with the
.Tn POSIX
specification.
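.Pp
The two readings of the numeric range operator described in this item
can be tried out without
.Nm
by using the POSIX
.Xr regcomp 3
interface.
The following C program is only a sketch: it assumes that POSIX extended
regular expressions bind
.Sq {}
in the same way as described for
.Nm ,
and the helper name
.Fn matches
is hypothetical, chosen for this illustration.
.Bd -literal -offset indent
#include <regex.h>
#include <stdio.h>

/*
 * Return 1 if text matches the POSIX extended regular expression
 * pattern, 0 if it does not, and -1 if pattern fails to compile.
 * Callers anchor the pattern with ^ and $ to test the whole string.
 */
static int
matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}

int
main(void)
{
	/* Range bound to the last character: "ab", then 1-3 times "c". */
	puts(matches("^abc{1,3}$", "abccc") == 1 ?
	    "abc{1,3} accepts abccc" : "abc{1,3} rejects abccc");
	puts(matches("^abc{1,3}$", "abcabc") == 1 ?
	    "abc{1,3} accepts abcabc" : "abc{1,3} rejects abcabc");

	/* Explicit grouping asks for 1-3 repetitions of "abc". */
	puts(matches("^(abc){1,3}$", "abcabc") == 1 ?
	    "(abc){1,3} accepts abcabc" : "(abc){1,3} rejects abcabc");
	return 0;
}
.Ed
.Pp
When a pattern must behave the same under either interpretation,
explicit parentheses as in the last call state the intended grouping.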
.It
The precedence of the
.Sq ^
operator is different.
.Nm lex
interprets
.Qq ^foo|bar
as match either
.Sq foo
at the beginning of a line, or
.Sq bar
anywhere, whereas
.Nm
interprets it as match either
.Sq foo
or
.Sq bar
if they come at the beginning of a line.
The latter is in agreement with the
.Tn POSIX
specification.
.It
The special table-size declarations such as
.Sq %a
supported by
.Nm lex
are not required by
.Nm
scanners;
.Nm
ignores them.
.It
The name
.Dv FLEX_SCANNER
is #define'd so scanners may be written for use with either
.Nm
or
.Nm lex .
Scanners also include
.Dv YY_FLEX_MAJOR_VERSION
and
.Dv YY_FLEX_MINOR_VERSION
indicating which version of
.Nm
generated the scanner
(for example, for the 2.5 release, these defines would be 2 and 5,
respectively).
.El
.Pp
The following
.Nm
features are not included in
.Nm lex
or the
.Tn POSIX
specification:
.Bd -unfilled -offset indent
C++ scanners
%option
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
yyterminate()
yy_set_interactive()
yy_set_bol()
YY_AT_BOL()
<<EOF>>
<*>
YY_DECL
YY_START
YY_USER_ACTION
YY_USER_INIT
#line directives
%{}'s around actions
multiple actions on a line
.Ed
.Pp
plus almost all of the
.Nm
flags.
The last feature in the list refers to the fact that with
.Nm
multiple actions can be placed on the same line,
separated with semicolons, while with
.Nm lex ,
the following
.Pp
.Dl foo handle_foo(); ++num_foos_seen;
.Pp
is
.Pq rather surprisingly
truncated to
.Pp
.Dl foo handle_foo();
.Pp
.Nm
does not truncate the action.
Actions that are not enclosed in braces
are simply terminated at the end of the line.
.Sh FILES
.Bl -tag -width "<g++/FlexLexer.h>"
.It flex.skl
Skeleton scanner.
This file is only used when building flex, not when
.Nm
executes.
.It lex.backup
Backing-up information for the
.Fl b
flag (called
.Pa lex.bck
on some systems).
.It lex.yy.c
Generated scanner
(called
.Pa lexyy.c
on some systems).
.It lex.yy.cc
Generated C++ scanner class, when using
.Fl + .
.It Aq g++/FlexLexer.h
Header file defining the C++ scanner base class,
.Fa FlexLexer ,
and its derived class,
.Fa yyFlexLexer .
.It /usr/lib/libl.*
.Nm
libraries.
The
.Pa /usr/lib/libfl.*\&
libraries are links to these.
Scanners must be linked using either
.Fl \&ll
or
.Fl lfl .
.El
.Sh EXIT STATUS
.Ex -std flex
.Sh DIAGNOSTICS
.Bl -diag
.It warning, rule cannot be matched
Indicates that the given rule cannot be matched because it follows other rules
that will always match the same text as it.
For example, in the following
.Dq foo
cannot be matched because it comes after an identifier
.Qq catch-all
rule:
.Bd -literal -offset indent
[a-z]+    got_identifier();
foo       got_foo();
.Ed
.Pp
Using
.Em REJECT
in a scanner suppresses this warning.
.It "warning, \-s option given but default rule can be matched"
Means that it is possible
.Pq perhaps only in a particular start condition
that the default rule
.Pq match any single character
is the only one that will match a particular input.
Since
.Fl s
was given, presumably this is not intended.
.It reject_used_but_not_detected undefined
.It yymore_used_but_not_detected undefined
These errors can occur at compile time.
They indicate that the scanner uses
.Em REJECT
or
.Fn yymore
but that
.Nm
failed to notice the fact, meaning that
.Nm
scanned the first two sections looking for occurrences of these actions
and failed to find any, but somehow they snuck in
.Pq via an #include file, for example .
Use
.Dq %option reject
or
.Dq %option yymore
to indicate to
.Nm
that these features are really needed.
.It flex scanner jammed
A scanner compiled with
.Fl s
has encountered an input string which wasn't matched by any of its rules.
This error can also occur due to internal problems.
.It token too large, exceeds YYLMAX
The scanner uses
.Dq %array
and one of its rules matched a string longer than the
.Dv YYLMAX
constant
.Pq 8K bytes by default .
The value can be increased by #define'ing
.Dv YYLMAX
in the definitions section of
.Nm
input.
.It "scanner requires \-8 flag to use the character 'x'"
The scanner specification includes recognizing the 8-bit character
.Sq x
and the
.Fl 8
flag was not specified, and defaulted to 7-bit because the
.Fl Cf
or
.Fl CF
table compression options were used.
See the discussion of the
.Fl 7
flag for details.
.It flex scanner push-back overflow
unput() was used to push back so much text that the scanner's buffer
could not hold both the pushed-back text and the current token in
.Fa yytext .
Ideally the scanner should dynamically resize the buffer in this case,
but at present it does not.
.It "input buffer overflow, can't enlarge buffer because scanner uses REJECT"
The scanner was working on matching an extremely large token and needed
to expand the input buffer.
This doesn't work with scanners that use
.Em REJECT .
.It "fatal flex scanner internal error--end of buffer missed"
This can occur in a scanner which is reentered after a long-jump
has jumped out
.Pq or over
the scanner's activation frame.
Before reentering the scanner, use:
.Pp
.Dl yyrestart(yyin);
.Pp
or, as noted above, switch to using the C++ scanner class.
.It "too many start conditions in <> construct!"
More start conditions than exist were listed in a <> construct
(so at least one of them must have been listed twice).
.El
.Sh SEE ALSO
.Xr awk 1 ,
.Xr sed 1 ,
.Xr yacc 1
.Rs
.%A John Levine
.%A Tony Mason
.%A Doug Brown
.%B Lex & Yacc
.%I O'Reilly and Associates
.%N 2nd edition
.Re
.Rs
.%A Alfred Aho
.%A Ravi Sethi
.%A Jeffrey Ullman
.%B Compilers: Principles, Techniques and Tools
.%I Addison-Wesley
.%D 1986
.%O "Describes the pattern-matching techniques used by flex (deterministic finite automata)"
.Re
.Sh STANDARDS
The
.Nm lex
utility is compliant with the
.St -p1003.1-2008
specification,
though its presence is optional.
.Pp
The flags
.Op Fl 78BbCdFfhIiLloPpSsTVw+? ,
.Op Fl -help ,
and
.Op Fl -version
are extensions to that specification.
.Sh AUTHORS
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson.
Original version by Jef Poskanzer.
The fast table representation is a partial implementation of a design done by
Van Jacobson.
The implementation was done by Kevin Gong and Vern Paxson.
.Pp
Thanks to the many
.Nm
beta-testers, feedbackers, and contributors, especially Francois Pinard,
Casey Leedom,
Robert Abramovitz,
Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
Neal Becker, Nelson H.F. Beebe, benson@odi.com,
Karl Berry, Peter A.
Bigot, Simon Blanchard,
Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
Brian Clapper, J.T. Conklin,
Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
Daniels, Chris G. Demetriou, Theo de Raadt,
Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
Jan Hajic, Charles Hemphill, NORO Hideo,
Jarkko Hietaniemi, Scott Hofmann,
Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
David Loffredo, Mike Long,
Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
Bengt Martensson, Chris Metcalf,
Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
Richard Ohnemus, Karsten Pahnke,
Sven Panne, Roland Pesch, Walter Pelissero, Gaumond Pierre,
Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
Frederic Raimbault, Pat Rankin, Rick Richardson,
Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
Andreas Scherer, Darrell Schiebel, Raf Schietekat,
Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strvmquist,
Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
Chris Thewalt, Richard M.
Timoney, Jodi Tsai,
Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams,
Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn,
and those whose names have slipped my marginal mail-archiving skills
but whose contributions are appreciated all the same.
.Pp
Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol,
Francois Pinard, Rich Salz, and Richard Stallman for help with various
distribution headaches.
.Pp
Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
to Benson Margulies and Fred Burke for C++ support;
to Kent Williams and Tom Epperly for C++ class support;
to Ove Ewerlid for support of NUL's;
and to Eric Hughes for support of multiple buffers.
.Pp
This work was primarily done when I was with the Real Time Systems Group
at the Lawrence Berkeley Laboratory in Berkeley, CA.
Many thanks to all there for the support I received.
.Pp
Send comments to
.Aq vern@ee.lbl.gov .
.Sh BUGS
Some trailing context patterns cannot be properly matched and generate
warning messages
.Pq "dangerous trailing context" .
These are patterns where the ending of the first part of the rule
matches the beginning of the second part, such as
.Qq zx*/xy* ,
where the
.Sq x*
matches the
.Sq x
at the beginning of the trailing context.
(Note that the POSIX draft states that the text matched by such patterns
is undefined.)
.Pp
For some trailing context rules, parts which are actually fixed-length are
not recognized as such, leading to the above mentioned performance loss.
In particular, parts using
.Sq |\&
or
.Sq {n}
(such as
.Qq foo{3} )
are always considered variable-length.
.Pp
Combining trailing context with the special
.Sq |\&
action can result in fixed trailing context being turned into
the more expensive variable trailing context.
This happens, for example, in the following:
.Bd -literal -offset indent
%%
abc      |
xyz/def
.Ed
.Pp
Use of
.Fn unput
invalidates yytext and yyleng, unless the
.Dq %array
directive
or the
.Fl l
option has been used.
.Pp
Pattern-matching of NUL's is substantially slower than matching other
characters.
.Pp
Dynamic resizing of the input buffer is slow, as it entails rescanning
all the text matched so far by the current
.Pq generally huge
token.
.Pp
Due to both buffering of input and read-ahead,
it is not possible to intermix calls to
.Aq Pa stdio.h
routines, such as, for example,
.Fn getchar ,
with
.Nm
rules and expect it to work.
Call
.Fn input
instead.
.Pp
The total number of table entries listed by the
.Fl v
flag excludes the table entries needed to determine
what rule has been matched.
The number of such entries is equal to the number of DFA states
if the scanner does not use
.Em REJECT ,
and somewhat greater than the number of states if it does.
.Pp
.Em REJECT
cannot be used with the
.Fl f
or
.Fl F
options.
.Pp
The
.Nm
internal algorithms need documentation.