.\" $OpenBSD: flex.1,v 1.37 2014/03/23 16:28:29 jmc Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Vern Paxson.
.\"
.\" The United States Government has rights in this work pursuant
.\" to contract no. DE-AC03-76SF00098 between the United States
.\" Department of Energy and the University of California.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE.
.\"
.Dd $Mdocdate: March 23 2014 $
.Dt FLEX 1
.Os
.Sh NAME
.Nm flex
.Nd fast lexical analyzer generator
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl 78BbdFfhIiLlnpsTtVvw+?
.Op Fl C Ns Op Cm aeFfmr
.Op Fl Fl help
.Op Fl Fl version
.Op Fl o Ns Ar output
.Op Fl P Ns Ar prefix
.Op Fl S Ns Ar skeleton
.Op Ar
.Ek
.Sh DESCRIPTION
.Nm
is a tool for generating
.Em scanners :
programs which recognize lexical patterns in text.
.Nm
reads the given input files, or its standard input if no file names are given,
for a description of a scanner to generate.
The description is in the form of pairs of regular expressions and C code,
called
.Em rules .
.Nm
generates as output a C source file,
.Pa lex.yy.c ,
which defines a routine
.Fn yylex .
This file is compiled and linked with the
.Fl lfl
library to produce an executable.
When the executable is run, it analyzes its input for occurrences
of the regular expressions.
Whenever it finds one, it executes the corresponding C code.
.Pp
The manual includes both tutorial and reference sections:
.Bl -ohang
.It Sy Some Simple Examples
.It Sy Format of the Input File
.It Sy Patterns
The extended regular expressions used by
.Nm .
.It Sy How the Input is Matched
The rules for determining what has been matched.
.It Sy Actions
How to specify what to do when a pattern is matched.
.It Sy The Generated Scanner
Details regarding the scanner that
.Nm
produces;
how to control the input source.
.It Sy Start Conditions
Introducing context into scanners, and managing
.Qq mini-scanners .
.It Sy Multiple Input Buffers
How to manipulate multiple input sources;
how to scan from strings instead of files.
.It Sy End-of-File Rules
Special rules for matching the end of the input.
.It Sy Miscellaneous Macros
A summary of macros available to the actions.
.It Sy Values Available to the User
A summary of values available to the actions.
.It Sy Interfacing with Yacc
Connecting flex scanners together with
.Xr yacc 1
parsers.
.It Sy Options
.Nm
command-line options, and the
.Dq %option
directive.
.It Sy Performance Considerations
How to make scanners go as fast as possible.
.It Sy Generating C++ Scanners
The
.Pq experimental
facility for generating C++ scanner classes.
.It Sy Incompatibilities with Lex and POSIX
How
.Nm
differs from
.At
.Nm lex
and the
.Tn POSIX
.Nm lex
standard.
.It Sy Files
Files used by
.Nm .
.It Sy Diagnostics
Those error messages produced by
.Nm
.Pq or scanners it generates
whose meanings might not be apparent.
.It Sy See Also
Other documentation, related tools.
.It Sy Authors
Includes contact information.
.It Sy Bugs
Known problems with
.Nm .
.El
.Sh SOME SIMPLE EXAMPLES
First some simple examples to get the flavor of how one uses
.Nm .
The following
.Nm
input specifies a scanner which, whenever it encounters the string
.Qq username ,
will replace it with the user's login name:
.Bd -literal -offset indent
%%
username    printf("%s", getlogin());
.Ed
.Pp
By default, any text not matched by a
.Nm
scanner is copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence of
.Qq username
expanded.
In this input, there is just one rule.
.Qq username
is the
.Em pattern
and the
.Qq printf
is the
.Em action .
The
.Qq %%
marks the beginning of the rules.
.Pp
Here's another simple example:
.Bd -literal -offset indent
%{
int num_lines = 0, num_chars = 0;
%}

%%
\en    ++num_lines; ++num_chars;
\&.     ++num_chars;

%%
int
main(void)
{
        yylex();
        printf("# of lines = %d, # of chars = %d\en",
            num_lines, num_chars);
        return 0;
}
.Ed
.Pp
This scanner counts the number of characters and the number
of lines in its input
(it produces no output other than the final report on the counts).
The first line declares two globals,
.Qq num_lines
and
.Qq num_chars ,
which are accessible both inside
.Fn yylex
and in the
.Fn main
routine declared after the second
.Qq %% .
There are two rules, one which matches a newline
.Pq \&"\en\&"
and increments both the line count and the character count,
and one which matches any character other than a newline
(indicated by the
.Qq \&.
regular expression).
.Pp
A somewhat more complicated example:
.Bd -literal -offset indent
/* scanner for a toy Pascal-like language */

%{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
%}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+ {
        printf("An integer: %s (%d)\en", yytext,
            atoi(yytext));
}

{DIGIT}+"."{DIGIT}* {
        printf("A float: %s (%g)\en", yytext,
            atof(yytext));
}

if|then|begin|end|procedure|function {
        printf("A keyword: %s\en", yytext);
}

{ID}    printf("An identifier: %s\en", yytext);

"+"|"-"|"*"|"/"    printf("An operator: %s\en", yytext);

"{"[^}\en]*"}"    /* eat up one-line comments */

[ \et\en]+    /* eat up whitespace */

\&.    printf("Unrecognized character: %s\en", yytext);

%%

int
main(int argc, char *argv[])
{
        ++argv; --argc;    /* skip over program name */
        if (argc > 0)
                yyin = fopen(argv[0], "r");
        else
                yyin = stdin;

        yylex();
        return 0;
}
.Ed
.Pp
This is the beginning of a simple scanner for a language like Pascal.
It identifies different types of
.Em tokens
and reports on what it has seen.
.Pp
The details of this example will be explained in the following sections.
.Sh FORMAT OF THE INPUT FILE
The
.Nm
input file consists of three sections, separated by a line with just
.Qq %%
in it:
.Bd -unfilled -offset indent
definitions
%%
rules
%%
user code
.Ed
.Pp
The
.Em definitions
section contains declarations of simple
.Em name
definitions to simplify the scanner specification, and declarations of
.Em start conditions ,
which are explained in a later section.
.Pp
Name definitions have the form:
.Pp
.D1 name definition
.Pp
The
.Qq name
is a word beginning with a letter or an underscore
.Pq Sq _
followed by zero or more letters, digits,
.Sq _ ,
or
.Sq -
.Pq dash .
The definition is taken to begin at the first non-whitespace character
following the name and continuing to the end of the line.
The definition can subsequently be referred to using
.Qq {name} ,
which will expand to
.Qq (definition) .
For example:
.Bd -literal -offset indent
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
.Ed
.Pp
This defines
.Qq DIGIT
to be a regular expression which matches a single digit, and
.Qq ID
to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to
.Pp
.Dl {DIGIT}+"."{DIGIT}*
.Pp
is identical to
.Pp
.Dl ([0-9])+"."([0-9])*
.Pp
and matches one-or-more digits followed by a
.Sq .\&
followed by zero-or-more digits.
.Pp
The
.Em rules
section of the
.Nm
input contains a series of rules of the form:
.Pp
.Dl pattern action
.Pp
The pattern must be unindented and the action must begin
on the same line.
.Pp
See below for a further description of patterns and actions.
.Pp
Finally, the user code section is simply copied to
.Pa lex.yy.c
verbatim.
It is used for companion routines which call or are called by the scanner.
The presence of this section is optional;
if it is missing, the second
.Qq %%
in the input file may be skipped too.
.Pp
In the definitions and rules sections, any indented text or text enclosed in
.Sq %{
and
.Sq %}
is copied verbatim to the output
.Pq with the %{}'s removed .
The %{}'s must appear unindented on lines by themselves.
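.Pp
As a rough sketch of the verbatim-copy rule just described
(the variable
.Va line_no
is a made-up name used only for illustration),
a definitions section might combine a
.Sq %{
block, an indented declaration, and an ordinary name definition:
.Bd -literal -offset indent
%{
/* copied verbatim; the %{ and %} lines themselves are dropped */
#include <stdio.h>
%}
    /* indented, so this line and the next are copied verbatim too */
    static int line_no = 1;

DIGIT    [0-9]

%%
\en          ++line_no; ECHO;
{DIGIT}+    printf("<number on line %d: %s>", line_no, yytext);
.Ed
.Pp
In this sketch only the unindented
.Qq DIGIT
line is read as a name definition;
removing the leading whitespace from the declaration of
.Va line_no
would make
.Nm
try to interpret that line as a definition as well.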
.Pp
In the rules section,
any indented or %{} text appearing before the first rule may be used to
declare variables which are local to the scanning routine and
.Pq after the declarations
code which is to be executed whenever the scanning routine is entered.
Other indented or %{} text in the rule section is still copied to the output,
but its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for
.Tn POSIX
compliance; see below for other such features).
.Pp
In the definitions section
.Pq but not in the rules section ,
an unindented comment
(i.e., a line beginning with
.Qq /* )
is also copied verbatim to the output up to the next
.Qq */ .
.Sh PATTERNS
The patterns in the input are written using an extended set of regular
expressions.
These are:
.Bl -tag -width "XXXXXXXX"
.It x
Match the character
.Sq x .
.It .\&
Any character
.Pq byte
except newline.
.It [xyz]
A
.Qq character class ;
in this case, the pattern matches either an
.Sq x ,
a
.Sq y ,
or a
.Sq z .
.It [abj-oZ]
A
.Qq character class
with a range in it; matches an
.Sq a ,
a
.Sq b ,
any letter from
.Sq j
through
.Sq o ,
or a
.Sq Z .
.It [^A-Z]
A
.Qq negated character class ,
i.e., any character but those in the class.
In this case, any character EXCEPT an uppercase letter.
.It [^A-Z\en]
Any character EXCEPT an uppercase letter or a newline.
.It r*
Zero or more r's, where
.Sq r
is any regular expression.
.It r+
One or more r's.
.It r?
Zero or one r's (that is,
.Qq an optional r ) .
.It r{2,5}
Anywhere from two to five r's.
.It r{2,}
Two or more r's.
.It r{4}
Exactly 4 r's.
.It {name}
The expansion of the
.Qq name
definition
.Pq see above .
.It \&"[xyz]\e\&"foo\&"
The literal string: [xyz]"foo.
.It \eX
If
.Sq X
is an
.Sq a ,
.Sq b ,
.Sq f ,
.Sq n ,
.Sq r ,
.Sq t ,
or
.Sq v ,
then the ANSI-C interpretation of
.Sq \eX .
Otherwise, a literal
.Sq X
(used to escape operators such as
.Sq * ) .
.It \e0
A NUL character
.Pq ASCII code 0 .
.It \e123
The character with octal value 123.
.It \ex2a
The character with hexadecimal value 2a.
.It (r)
Match an
.Sq r ;
parentheses are used to override precedence
.Pq see below .
.It rs
The regular expression
.Sq r
followed by the regular expression
.Sq s ;
called
.Qq concatenation .
.It r|s
Either an
.Sq r
or an
.Sq s .
.It r/s
An
.Sq r ,
but only if it is followed by an
.Sq s .
The text matched by
.Sq s
is included when determining whether this rule is the
.Qq longest match ,
but is then returned to the input before the action is executed.
So the action only sees the text matched by
.Sq r .
This type of pattern is called
.Qq trailing context .
(There are some combinations of r/s that
.Nm
cannot match correctly; see notes in the
.Sx BUGS
section below regarding
.Qq dangerous trailing context . )
.It ^r
An
.Sq r ,
but only at the beginning of a line
(i.e., just starting to scan, or right after a newline has been scanned).
.It r$
An
.Sq r ,
but only at the end of a line
.Pq i.e., just before a newline .
Equivalent to
.Qq r/\en .
.Pp
Note that
.Nm flex Ns 's
notion of
.Qq newline
is exactly whatever the C compiler used to compile
.Nm
interprets
.Sq \en
as.
.\" In particular, on some DOS systems you must either filter out \er's in the
.\" input yourself, or explicitly use r/\er\en for
.\" .Qq r$ .
.It <s>r
An
.Sq r ,
but only in start condition
.Sq s
.Pq see below for discussion of start conditions .
.It <s1,s2,s3>r
The same, but in any of start conditions s1, s2, or s3.
.It <*>r
An
.Sq r
in any start condition, even an exclusive one.
.It <<EOF>>
An end-of-file.
.It <s1,s2><<EOF>>
An end-of-file when in start condition s1 or s2.
.El
.Pp
Note that inside of a character class, all regular expression operators
lose their special meaning except escape
.Pq Sq \e
and the character class operators,
.Sq - ,
.Sq ]\& ,
and, at the beginning of the class,
.Sq ^ .
.Pp
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
For example,
.Pp
.D1 foo|bar*
.Pp
is the same as
.Pp
.D1 (foo)|(ba(r*))
.Pp
since the
.Sq *
operator has higher precedence than concatenation,
and concatenation higher than alternation
.Pq Sq |\& .
This pattern therefore matches
.Em either
the string
.Qq foo
.Em or
the string
.Qq ba
followed by zero-or-more r's.
To match
.Qq foo
or zero-or-more "bar"'s,
use:
.Pp
.D1 foo|(bar)*
.Pp
and to match zero-or-more "foo"'s-or-"bar"'s:
.Pp
.D1 (foo|bar)*
.Pp
In addition to characters and ranges of characters, character classes
can also contain character class
.Em expressions .
These are expressions enclosed inside
.Sq [:
and
.Sq :]
delimiters (which themselves must appear between the
.Sq \&[
and
.Sq ]\&
of the
character class; other elements may occur inside the character class, too).
The valid expressions are:
.Bd -unfilled -offset indent
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
.Ed
.Pp
These expressions all designate a set of characters equivalent to
the corresponding standard C
.Fn isXXX
function.
For example, [:alnum:] designates those characters for which
.Xr isalnum 3
returns true \- i.e., any alphabetic or numeric.
Some systems don't provide
.Xr isblank 3 ,
so
.Nm
defines [:blank:] as a blank or a tab.
.Pp
For example, the following character classes are all equivalent:
.Bd -unfilled -offset indent
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
.Ed
.Pp
If the scanner is case-insensitive (the
.Fl i
flag), then [:upper:] and [:lower:] are equivalent to [:alpha:].
.Pp
Some notes on patterns:
.Bl -dash
.It
A negated character class such as the example
.Qq [^A-Z]
above will match a newline unless "\en"
.Pq or an equivalent escape sequence
is one of the characters explicitly present in the negated character class
(e.g.,
.Qq [^A-Z\en] ) .
This is unlike how many other regular expression tools treat negated character
classes, but unfortunately the inconsistency is historically entrenched.
Matching newlines means that a pattern like
.Qq [^"]*
can match the entire input unless there's another quote in the input.
.It
A rule can have at most one instance of trailing context
(the
.Sq /
operator or the
.Sq $
operator).
The start condition,
.Sq ^ ,
and
.Qq <<EOF>>
patterns can only occur at the beginning of a pattern, and, as well as with
.Sq /
and
.Sq $ ,
cannot be grouped inside parentheses.
A
.Sq ^
which does not occur at the beginning of a rule or a
.Sq $
which does not occur at the end of a rule loses its special properties
and is treated as a normal character.
.It
The following are illegal:
.Bd -unfilled -offset indent
foo/bar$
<sc1>foo<sc2>bar
.Ed
.Pp
Note that the first of these can be written
.Qq foo/bar\en .
.It
The following will result in
.Sq $
or
.Sq ^
being treated as a normal character:
.Bd -unfilled -offset indent
foo|(bar$)
foo|^bar
.Ed
.Pp
If what's wanted is a
.Qq foo
or a bar-followed-by-a-newline, the following could be used
(the special
.Sq |\&
action is explained below):
.Bd -unfilled -offset indent
foo      |
bar$     /* action goes here */
.Ed
.Pp
A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
.El
.Sh HOW THE INPUT IS MATCHED
When the generated scanner is run,
it analyzes its input looking for strings which match any of its patterns.
If it finds more than one match,
it takes the one matching the most text
(for trailing context rules, this includes the length of the trailing part,
even though it will then be returned to the input).
If it finds two or more matches of the same length,
the rule listed first in the
.Nm
input file is chosen.
.Pp
Once the match is determined, the text corresponding to the match
(called the
.Em token )
is made available in the global character pointer
.Fa yytext ,
and its length in the global integer
.Fa yyleng .
The
.Em action
corresponding to the matched pattern is then executed
.Pq a more detailed description of actions follows ,
and then the remaining input is scanned for another match.
.Pp
If no match is found, then the default rule is executed:
the next character in the input is considered matched and
copied to the standard output.
Thus, the simplest legal
.Nm
input is:
.Pp
.D1 %%
.Pp
which generates a scanner that simply copies its input
.Pq one character at a time
to its output.
.Pp
Note that
.Fa yytext
can be defined in two different ways:
either as a character pointer or as a character array.
Which definition
.Nm
uses can be controlled by including one of the special directives
.Dq %pointer
or
.Dq %array
in the first
.Pq definitions
section of flex input.
The default is
.Dq %pointer ,
unless the
.Fl l
.Nm lex
compatibility option is used, in which case
.Fa yytext
will be an array.
The advantage of using
.Dq %pointer
is substantially faster scanning and no buffer overflow when matching
very large tokens
.Pq unless not enough dynamic memory is available .
The disadvantage is that actions are restricted in how they can modify
.Fa yytext
.Pq see the next section ,
and calls to the
.Fn unput
function destroy the present contents of
.Fa yytext ,
which can be a considerable porting headache when moving between different
.Nm lex
versions.
.Pp
The advantage of
.Dq %array
is that
.Fa yytext
can be modified as much as wanted, and calls to
.Fn unput
do not destroy
.Fa yytext
.Pq see below .
Furthermore, existing
.Nm lex
programs sometimes access
.Fa yytext
externally using declarations of the form:
.Pp
.D1 extern char yytext[];
.Pp
This definition is erroneous when used with
.Dq %pointer ,
but correct for
.Dq %array .
.Pp
.Dq %array
defines
.Fa yytext
to be an array of
.Dv YYLMAX
characters, which defaults to a fairly large value.
The size can be changed by simply #define'ing
.Dv YYLMAX
to a different value in the first section of
.Nm
input.
As mentioned above, with
.Dq %pointer
yytext grows dynamically to accommodate large tokens.
While this means a
.Dq %pointer
scanner can accommodate very large tokens
.Pq such as matching entire blocks of comments ,
bear in mind that each time the scanner must resize
.Fa yytext
it also must rescan the entire token from the beginning, so matching such
tokens can prove slow.
.Fa yytext
presently does not dynamically grow if a call to
.Fn unput
results in too much text being pushed back; instead, a run-time error results.
.Pp
Also note that
.Dq %array
cannot be used with C++ scanner classes
.Pq the c++ option; see below .
.Sh ACTIONS
Each pattern in a rule has a corresponding action,
which can be any arbitrary C statement.
The pattern ends at the first non-escaped whitespace character;
the remainder of the line is its action.
If the action is empty,
then when the pattern is matched the input token is simply discarded.
For example, here is the specification for a program
which deletes all occurrences of
.Qq zap me
from its input:
.Bd -literal -offset indent
%%
"zap me"
.Ed
.Pp
(It will copy all other characters in the input to the output since
they will be matched by the default rule.)
.Pp
Here is a program which compresses multiple blanks and tabs down to
a single blank, and throws away whitespace found at the end of a line:
.Bd -literal -offset indent
%%
[ \et]+        putchar(' ');
[ \et]+$       /* ignore this token */
.Ed
.Pp
If the action contains a
.Sq { ,
then the action spans till the balancing
.Sq }
is found, and the action may cross multiple lines.
.Nm
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with
.Sq %{
and will consider the action to be all the text up to the next
.Sq %}
.Pq regardless of ordinary braces inside the action .
.Pp
An action consisting solely of a vertical bar
.Pq Sq |\&
means
.Qq same as the action for the next rule .
See below for an illustration.
.Pp
Actions can include arbitrary C code,
including return statements to return a value to whatever routine called
.Fn yylex .
Each time
.Fn yylex
is called, it continues processing tokens from where it last left off
until it either reaches the end of the file or executes a return.
.Pp
Actions are free to modify
.Fa yytext
except for lengthening it
(adding characters to its end \- these will overwrite later characters in the
input stream).
This, however, does not apply when using
.Dq %array
.Pq see above ;
in that case,
.Fa yytext
may be freely modified in any way.
.Pp
Actions are free to modify
.Fa yyleng
except they should not do so if the action also includes use of
.Fn yymore
.Pq see below .
.Pp
There are a number of special directives which can be included within
an action:
.Bl -tag -width Ds
.It ECHO
Copies
.Fa yytext
to the scanner's output.
.It BEGIN
Followed by the name of a start condition, places the scanner in the
corresponding start condition
.Pq see below .
.It REJECT
Directs the scanner to proceed on to the
.Qq second best
rule which matched the input
.Pq or a prefix of the input .
The rule is chosen as described above in
.Sx HOW THE INPUT IS MATCHED ,
and
.Fa yytext
and
.Fa yyleng
set up appropriately.
It may either be one which matched as much text
as the originally chosen rule but came later in the
.Nm
input file, or one which matched less text.
For example, the following will both count the
words in the input and call the routine
.Fn special
whenever
.Qq frob
is seen:
.Bd -literal -offset indent
        int word_count = 0;
%%

frob           special(); REJECT;
[^ \et\en]+     ++word_count;
.Ed
.Pp
Without the
.Em REJECT ,
any "frob"'s in the input would not be counted as words,
since the scanner normally executes only one action per token.
Multiple
.Em REJECT Ns 's
are allowed,
each one finding the next best choice to the currently active rule.
For example, when the following scanner scans the token
.Qq abcd ,
it will write
.Qq abcdabcaba
to the output:
.Bd -literal -offset indent
%%
a        |
ab       |
abc      |
abcd     ECHO; REJECT;
\&.|\en     /* eat up any unmatched character */
.Ed
.Pp
(The first three rules share the fourth's action since they use
the special
.Sq |\&
action.)
.Em REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in any of the scanner's actions it will slow down
all of the scanner's matching.
Furthermore,
.Em REJECT
cannot be used with the
.Fl Cf
or
.Fl CF
options
.Pq see below .
.Pp
Note also that unlike the other special actions,
.Em REJECT
is a
.Em branch ;
code immediately following it in the action will not be executed.
.It yymore()
Tells the scanner that the next time it matches a rule, the corresponding
token should be appended onto the current value of
.Fa yytext
rather than replacing it.
For example, given the input
.Qq mega-kludge
the following will write
.Qq mega-mega-kludge
to the output:
.Bd -literal -offset indent
%%
mega-    ECHO; yymore();
kludge   ECHO;
.Ed
.Pp
First
.Qq mega-
is matched and echoed to the output.
Then
.Qq kludge
is matched, but the previous
.Qq mega-
is still hanging around at the beginning of
.Fa yytext
so the
.Em ECHO
for the
.Qq kludge
rule will actually write
.Qq mega-kludge .
.Pp
Two notes regarding use of
.Fn yymore :
First,
.Fn yymore
depends on the value of
.Fa yyleng
correctly reflecting the size of the current token, so
.Fa yyleng
must not be modified when using
.Fn yymore .
Second, the presence of
.Fn yymore
in the scanner's action entails a minor performance penalty in the
scanner's matching speed.
.It yyless(n)
Returns all but the first
.Ar n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.Fa yytext
and
.Fa yyleng
are adjusted appropriately (e.g.,
.Fa yyleng
will now be equal to
.Ar n ) .
For example, on the input
.Qq foobar
the following will write out
.Qq foobarbar :
.Bd -literal -offset indent
%%
foobar    ECHO; yyless(3);
[a-z]+    ECHO;
.Ed
.Pp
An argument of 0 to
.Fa yyless
will cause the entire current input string to be scanned again.
Unless how the scanner will subsequently process its input has been changed
(using
.Em BEGIN ,
for example),
this will result in an endless loop.
.Pp
Note that
.Fa yyless
is a macro and can only be used in the
.Nm
input file, not from other source files.
.It unput(c)
Puts the character
.Ar c
back into the input stream.
It will be the next character scanned.
The following action will take the current token and cause it
to be rescanned enclosed in parentheses.
.Bd -literal -offset indent
{
        int i;
        char *yycopy;

        /* Copy yytext because unput() trashes yytext */
        if ((yycopy = strdup(yytext)) == NULL)
                err(1, NULL);
        unput(')');
        for (i = yyleng - 1; i >= 0; --i)
                unput(yycopy[i]);
        unput('(');
        free(yycopy);
}
.Ed
.Pp
Note that since each
.Fn unput
puts the given character back at the beginning of the input stream,
pushing back strings must be done back-to-front.
.Pp
An important potential problem when using
.Fn unput
is that if using
.Dq %pointer
.Pq the default ,
a call to
.Fn unput
destroys the contents of
.Fa yytext ,
starting with its rightmost character and devouring one character to
the left with each call.
If the value of
.Fa yytext
should be preserved after a call to
.Fn unput
.Pq as in the above example ,
it must either first be copied elsewhere, or the scanner must be built using
.Dq %array
instead (see
.Sx HOW THE INPUT IS MATCHED ) .
.Pp
Finally, note that EOF cannot be put back
to attempt to mark the input stream with an end-of-file.
.It input()
Reads the next character from the input stream.
For example, the following is one way to eat up C comments:
.Bd -literal -offset indent
%%
"/*" {
        int c;

        for (;;) {
                while ((c = input()) != '*' && c != EOF)
                        ; /* eat up text of comment */

                if (c == '*') {
                        while ((c = input()) == '*')
                                ;
                        if (c == '/')
                                break; /* found the end */
                }

                if (c == EOF) {
                        errx(1, "EOF in comment");
                        break;
                }
        }
}
.Ed
.Pp
(Note that if the scanner is compiled using C++, then
.Fn input
is instead referred to as
.Fn yyinput ,
in order to avoid a name clash with the C++ stream by the name of input.)
.It YY_FLUSH_BUFFER
Flushes the scanner's internal buffer
so that the next time the scanner attempts to match a token,
it will first refill the buffer using
.Dv YY_INPUT
(see
.Sx THE GENERATED SCANNER ,
below).
This action is a special case of the more general
.Fn yy_flush_buffer
function, described below in the section
.Sx MULTIPLE INPUT BUFFERS .
.It yyterminate()
Can be used in lieu of a return statement in an action.
It terminates the scanner and returns a 0 to the scanner's caller, indicating
.Qq all done .
By default,
.Fn yyterminate
is also called when an end-of-file is encountered.
It is a macro and may be redefined.
.El
.Sh THE GENERATED SCANNER
The output of
.Nm
is the file
.Pa lex.yy.c ,
which contains the scanning routine
.Fn yylex ,
a number of tables used by it for matching tokens,
and a number of auxiliary routines and macros.
By default,
.Fn yylex
is declared as follows:
.Bd -unfilled -offset indent
int yylex()
{
    ... various definitions and the actions in here ...
}
.Ed
.Pp
(If the environment supports function prototypes, then it will
be "int yylex(void)".)
This definition may be changed by defining the
.Dv YY_DECL
macro.
For example:
.Bd -literal -offset indent
#define YY_DECL float lexscan(a, b) float a, b;
.Ed
.Pp
would give the scanning routine the name
.Em lexscan ,
returning a float, and taking two floats as arguments.
Note that if arguments are given to the scanning routine using a
K&R-style/non-prototyped function declaration,
the definition must be terminated with a semi-colon
.Pq Sq ;\& .
.Pp
Whenever
.Fn yylex
is called, it scans tokens from the global input file
.Pa yyin
.Pq which defaults to stdin .
It continues until it either reaches an end-of-file
.Pq at which point it returns the value 0
or one of its actions executes a
.Em return
statement.
.Pp
If the scanner reaches an end-of-file, subsequent calls are undefined
unless either
.Em yyin
is pointed at a new input file
.Pq in which case scanning continues from that file ,
or
.Fn yyrestart
is called.
.Fn yyrestart
takes one argument, a
.Fa FILE *
pointer (which can be
.Dv NULL ,
if
.Dv YY_INPUT
has been set up to scan from a source other than
.Em yyin ) ,
and initializes
.Em yyin
for scanning from that file.
Essentially there is no difference between just assigning
.Em yyin
to a new input file or using
.Fn yyrestart
to do so; the latter is available for compatibility with previous versions of
.Nm ,
and because it can be used to switch input files in the middle of scanning.
It can also be used to throw away the current input buffer,
by calling it with an argument of
.Em yyin ;
but it is better to use
.Dv YY_FLUSH_BUFFER
.Pq see above .
Note that
.Fn yyrestart
does not reset the start condition to
.Em INITIAL
(see
.Sx START CONDITIONS ,
below).
.Pp
If
.Fn yylex
stops scanning due to executing a
.Em return
statement in one of the actions, the scanner may then be called again and it
will resume scanning where it left off.
.Pp
By default
.Pq and for purposes of efficiency ,
the scanner uses block-reads rather than simple
.Xr getc 3
calls to read characters from
.Em yyin .
The nature of how it gets its input can be controlled by defining the
.Dv YY_INPUT
macro.
.Dv YY_INPUT Ns 's
calling sequence is
.Qq YY_INPUT(buf,result,max_size) .
Its action is to place up to
.Dv max_size
characters in the character array
.Em buf
and return in the integer variable
.Em result
either the number of characters read or the constant
.Dv YY_NULL
(0 on
.Ux
systems)
to indicate
.Dv EOF .
The default
.Dv YY_INPUT
reads from the global file-pointer
.Qq yyin .
.Pp
A sample definition of
.Dv YY_INPUT
.Pq in the definitions section of the input file :
.Bd -unfilled -offset indent
%{
#define YY_INPUT(buf,result,max_size) \e
{ \e
	int c = getchar(); \e
	result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \e
}
%}
.Ed
.Pp
This definition will change the input processing to occur
one character at a time.
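.Pp
As a further sketch of the
.Dv YY_INPUT
calling sequence, the following C fragment fills the buffer from an
in-memory string instead of
.Em yyin .
It is an illustration under stated assumptions, not part of
.Nm :
.Fn set_source
and
.Fn read_source
are hypothetical helpers invented here, and the fallback definition of
.Dv YY_NULL
exists only so that the fragment compiles on its own.

```c
#include <stddef.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0	/* assumption: stands in for the value flex defines */
#endif

/* Hypothetical input source: a NUL-terminated string and a read offset. */
static const char *src_text = "";
static size_t src_off;

/* Hypothetical helper: choose the string that later reads consume. */
void
set_source(const char *text)
{
	src_text = text;
	src_off = 0;
}

/*
 * Hypothetical helper: copy at most max_size bytes into buf and return
 * the number copied, or YY_NULL once the string is exhausted.
 */
int
read_source(char *buf, int max_size)
{
	size_t left = strlen(src_text) - src_off;
	size_t n;

	if (max_size <= 0 || left == 0)
		return YY_NULL;
	n = left < (size_t)max_size ? left : (size_t)max_size;
	memcpy(buf, src_text + src_off, n);
	src_off += n;
	return (int)n;
}

/* Same three arguments as the calling sequence described above. */
#define YY_INPUT(buf, result, max_size) ((result) = read_source(buf, max_size))
```

Placed between %{ and %} in the definitions section, a definition along
these lines would have the scanner refill its buffer in blocks of up to
.Dv max_size
bytes from the string selected with
.Fn set_source ,
rather than one character at a time.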
.Pp
When the scanner receives an end-of-file indication from
.Dv YY_INPUT ,
it then checks the
.Fn yywrap
function.
If
.Fn yywrap
returns false
.Pq zero ,
then it is assumed that the function has gone ahead and set up
.Em yyin
to point to another input file, and scanning continues.
If it returns true
.Pq non-zero ,
then the scanner terminates, returning 0 to its caller.
Note that in either case, the start condition remains unchanged;
it does not revert to
.Em INITIAL .
.Pp
If you do not supply your own version of
.Fn yywrap ,
then you must either use
.Dq %option noyywrap
(in which case the scanner behaves as though
.Fn yywrap
returned 1), or you must link with
.Fl lfl
to obtain the default version of the routine, which always returns 1.
.Pp
Three routines are available for scanning from in-memory buffers rather
than files:
.Fn yy_scan_string ,
.Fn yy_scan_bytes ,
and
.Fn yy_scan_buffer .
See the discussion of them below in the section
.Sx MULTIPLE INPUT BUFFERS .
.Pp
The scanner writes its
.Em ECHO
output to the
.Em yyout
global
.Pq default, stdout ,
which may be redefined by the user simply by assigning it to some other
.Va FILE
pointer.
.Sh START CONDITIONS
.Nm
provides a mechanism for conditionally activating rules.
Any rule whose pattern is prefixed with
.Qq Aq sc
will only be active when the scanner is in the start condition named
.Qq sc .
For example,
.Bd -literal -offset indent
<STRING>[^"]* { /* eat up the string body ... */
	...
}
.Ed
.Pp
will be active only when the scanner is in the
.Qq STRING
start condition, and
.Bd -literal -offset indent
<INITIAL,STRING,QUOTE>\e. { /* handle an escape ... */
	...
}
.Ed
.Pp
will be active only when the current start condition is either
.Qq INITIAL ,
.Qq STRING ,
or
.Qq QUOTE .
.Pp
Start conditions are declared in the definitions
.Pq first
section of the input using unindented lines beginning with either
.Sq %s
or
.Sq %x
followed by a list of names.
The former declares
.Em inclusive
start conditions, the latter
.Em exclusive
start conditions.
A start condition is activated using the
.Em BEGIN
action.
Until the next
.Em BEGIN
action is executed, rules with the given start condition will be active and
rules with other start conditions will be inactive.
If the start condition is inclusive,
then rules with no start conditions at all will also be active.
If it is exclusive,
then only rules qualified with the start condition will be active.
A set of rules contingent on the same exclusive start condition
describes a scanner which is independent of any of the other rules in the
.Nm
input.
Because of this, exclusive start conditions make it easy to specify
.Qq mini-scanners
which scan portions of the input that are syntactically different
from the rest
.Pq e.g., comments .
.Pp
If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two.
The set of rules:
.Bd -literal -offset indent
%s example
%%

<example>foo do_something();

bar something_else();
.Ed
.Pp
is equivalent to
.Bd -literal -offset indent
%x example
%%

<example>foo do_something();

<INITIAL,example>bar something_else();
.Ed
.Pp
Without the
.Aq INITIAL,example
qualifier, the
.Dq bar
pattern in the second example wouldn't be active
.Pq i.e., couldn't match
when in start condition
.Dq example .
If we just used
.Aq example
to qualify
.Dq bar ,
though, then it would only be active in
.Dq example
and not in
.Em INITIAL ,
while in the first example it's active in both,
because in the first example the
.Dq example
start condition is an inclusive
.Pq Sq %s
start condition.
.Pp
Also note that the special start-condition specifier
.Sq Aq *
matches every start condition.
Thus, the above example could also have been written:
.Bd -literal -offset indent
%x example
%%

<example>foo do_something();

<*>bar something_else();
.Ed
.Pp
The default rule (to
.Em ECHO
any unmatched character) remains active in start conditions.
It is equivalent to:
.Bd -literal -offset indent
<*>.|\en ECHO;
.Ed
.Pp
.Dq BEGIN(0)
returns to the original state where only the rules with
no start conditions are active.
This state can also be referred to as the start-condition
.Em INITIAL ,
so
.Dq BEGIN(INITIAL)
is equivalent to
.Dq BEGIN(0) .
(The parentheses around the start condition name are not required but
are considered good style.)
.Pp
.Em BEGIN
actions can also be given as indented code at the beginning
of the rules section.
For example, the following will cause the scanner to enter the
.Qq SPECIAL
start condition whenever
.Fn yylex
is called and the global variable
.Fa enter_special
is true:
.Bd -literal -offset indent
	int enter_special;

%x SPECIAL
%%
	if (enter_special)
		BEGIN(SPECIAL);

<SPECIAL>blahblahblah
\&...more rules follow...
.Ed
.Pp
To illustrate the uses of start conditions,
here is a scanner which provides two different interpretations
of a string like
.Qq 123.456 .
By default it will treat it as three tokens: the integer
.Qq 123 ,
a dot
.Pq Sq .\& ,
and the integer
.Qq 456 .
But if the string is preceded earlier in the line by the string
.Qq expect-floats
it will treat it as a single token, the floating-point number 123.456:
.Bd -literal -offset indent
%{
#include <stdlib.h>
%}
%s expect

%%
expect-floats BEGIN(expect);

<expect>[0-9]+"."[0-9]+ {
	printf("found a float, = %f\en",
	    atof(yytext));
}
<expect>\en {
	/*
	 * That's the end of the line, so
	 * we need another "expect-floats"
	 * before we'll recognize any more
	 * floats.
	 */
	BEGIN(INITIAL);
}

[0-9]+ {
	printf("found an integer, = %d\en",
	    atoi(yytext));
}

"." printf("found a dot\en");
.Ed
.Pp
Here is a scanner which recognizes
.Pq and discards
C comments while maintaining a count of the current input line:
.Bd -literal -offset indent
%x comment
%%
	int line_num = 1;

"/*" BEGIN(comment);

<comment>[^*\en]* /* eat anything that's not a '*' */
<comment>"*"+[^*/\en]* /* eat up '*'s not followed by '/'s */
<comment>\en ++line_num;
<comment>"*"+"/" BEGIN(INITIAL);
.Ed
.Pp
This scanner goes to a bit of trouble to match as much
text as possible with each rule.
In general, when attempting to write a high-speed scanner,
try to match as much as possible in each rule, as it's a big win.
.Pp
Note that start-condition names are really integer values and
can be stored as such.
Thus, the above could be extended in the following fashion:
.Bd -literal -offset indent
%x comment foo
%%
	int line_num = 1;
	int comment_caller;

"/*" {
	comment_caller = INITIAL;
	BEGIN(comment);
}

\&...

<foo>"/*" {
	comment_caller = foo;
	BEGIN(comment);
}

<comment>[^*\en]* /* eat anything that's not a '*' */
<comment>"*"+[^*/\en]* /* eat up '*'s not followed by '/'s */
<comment>\en ++line_num;
<comment>"*"+"/" BEGIN(comment_caller);
.Ed
.Pp
Furthermore, the current start condition can be accessed by using
the integer-valued
.Dv YY_START
macro.
For example, the above assignments to
.Em comment_caller
could instead be written
.Pp
.Dl comment_caller = YY_START;
.Pp
Flex provides
.Dv YYSTATE
as an alias for
.Dv YY_START
(since that is what's used by
.At
.Nm lex ) .
.Pp
Note that start conditions do not have their own name-space;
%s's and %x's declare names in the same fashion as #define's.
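.Pp
Because start conditions are plain integers, the caller-tracking idiom
above can be modelled in ordinary C.
The fragment below is only a sketch and is not code generated by
.Nm :
the enumeration, the variable
.Va current_sc ,
and the functions
.Fn begin_sc ,
.Fn enter_comment ,
.Fn leave_comment ,
and
.Fn active_sc
are hypothetical stand-ins for the declared start-condition names,
.Dv YY_START ,
and
.Em BEGIN .

```c
/* Hypothetical stand-ins for names declared with %x; not flex output. */
enum sc { SC_INITIAL, SC_COMMENT, SC_FOO };

static int current_sc = SC_INITIAL;	/* plays the role of YY_START */
static int comment_caller = SC_INITIAL;	/* condition to return to */

/* Plays the role of BEGIN: make sc the active start condition. */
void
begin_sc(int sc)
{
	current_sc = sc;
}

/* On an opening comment: remember the active condition, then switch. */
void
enter_comment(void)
{
	comment_caller = current_sc;
	begin_sc(SC_COMMENT);
}

/* On a closing comment: resume whichever condition was active before. */
void
leave_comment(void)
{
	begin_sc(comment_caller);
}

/* Accessor so callers can inspect the active condition. */
int
active_sc(void)
{
	return current_sc;
}
```

In a real scanner the same effect is obtained by assigning
.Dv YY_START
to
.Em comment_caller
before the
.Em BEGIN
action, so that one rule can serve every start condition in which a
comment may open.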
.Pp
Finally, here's an example of how to match C-style quoted strings using
exclusive start conditions, including expanded escape sequences
(but not including checking for a string that's too long):
.Bd -literal -offset indent
%x str

%%
	#define MAX_STR_CONST 1024
	char string_buf[MAX_STR_CONST];
	char *string_buf_ptr;

\e" string_buf_ptr = string_buf; BEGIN(str);

<str>\e" { /* saw closing quote - all done */
	BEGIN(INITIAL);
	*string_buf_ptr = '\e0';
	/*
	 * return string constant token type and
	 * value to parser
	 */
}

<str>\en {
	/* error - unterminated string constant */
	/* generate error message */
}

<str>\e\e[0-7]{1,3} {
	/* octal escape sequence */
	unsigned int result;

	(void) sscanf(yytext + 1, "%o", &result);

	if (result > 0xff) {
		/* error, constant is out-of-bounds */
	} else
		*string_buf_ptr++ = result;
}

<str>\e\e[0-9]+ {
	/*
	 * generate error - bad escape sequence; something
	 * like '\e48' or '\e0777777'
	 */
}

<str>\e\en *string_buf_ptr++ = '\en';
<str>\e\et *string_buf_ptr++ = '\et';
<str>\e\er *string_buf_ptr++ = '\er';
<str>\e\eb *string_buf_ptr++ = '\eb';
<str>\e\ef *string_buf_ptr++ = '\ef';

<str>\e\e(.|\en) *string_buf_ptr++ = yytext[1];

<str>[^\e\e\en\e"]+ {
	char *yptr = yytext;

	while (*yptr)
		*string_buf_ptr++ = *yptr++;
}
.Ed
.Pp
Often, such as in some of the examples above,
a whole bunch of rules are all preceded by the same start condition(s).
.Nm
makes this a little easier and cleaner by introducing a notion of
start condition
.Em scope .
A start condition scope is begun with:
.Pp
.Dl <SCs>{
.Pp
where
.Dq SCs
is a list of one or more start conditions.
Inside the start condition scope, every rule automatically has the prefix
.Aq SCs
applied to it, until a
.Sq }
which matches the initial
.Sq { .
So, for example,
.Bd -literal -offset indent
<ESC>{
	"\e\en" return '\en';
	"\e\er" return '\er';
	"\e\ef" return '\ef';
	"\e\e0" return '\e0';
}
.Ed
.Pp
is equivalent to:
.Bd -literal -offset indent
<ESC>"\e\en" return '\en';
<ESC>"\e\er" return '\er';
<ESC>"\e\ef" return '\ef';
<ESC>"\e\e0" return '\e0';
.Ed
.Pp
Start condition scopes may be nested.
.Pp
Three routines are available for manipulating stacks of start conditions:
.Bl -tag -width Ds
.It void yy_push_state(int new_state)
Pushes the current start condition onto the top of the start condition
stack and switches to
.Fa new_state
as though
.Dq BEGIN new_state
had been used
.Pq recall that start condition names are also integers .
.It void yy_pop_state()
Pops the top of the stack and switches to it via
.Em BEGIN .
.It int yy_top_state()
Returns the top of the stack without altering the stack's contents.
.El
.Pp
The start condition stack grows dynamically and so has no built-in
size limitation.
If memory is exhausted, program execution aborts.
.Pp
To use start condition stacks, scanners must include a
.Dq %option stack
directive (see
.Sx OPTIONS
below).
.Sh MULTIPLE INPUT BUFFERS
Some scanners
(such as those which support
.Qq include
files)
require reading from several input streams.
As
.Nm
scanners do a large amount of buffering, one cannot control
where the next input will be read from by simply writing a
.Dv YY_INPUT
which is sensitive to the scanning context.
.Dv YY_INPUT
is only called when the scanner reaches the end of its buffer, which
may be a long time after scanning a statement such as an
.Qq include
which requires switching the input source.
.Pp
To negotiate these sorts of problems,
.Nm
provides a mechanism for creating and switching between multiple
input buffers.
An input buffer is created by using:
.Pp
.D1 YY_BUFFER_STATE yy_create_buffer(FILE *file, int size)
.Pp
which takes a
.Fa FILE
pointer and a
.Fa size
and creates a buffer associated with the given file and large enough to hold
.Fa size
characters (when in doubt, use
.Dv YY_BUF_SIZE
for the size).
It returns a
.Dv YY_BUFFER_STATE
handle, which may then be passed to other routines
.Pq see below .
The
.Dv YY_BUFFER_STATE
type is a pointer to an opaque
.Dq struct yy_buffer_state
structure, so
.Dv YY_BUFFER_STATE
variables may be safely initialized to
.Dq ((YY_BUFFER_STATE) 0)
if desired, and the opaque structure can also be referred to in order to
correctly declare input buffers in source files other than that of scanners.
Note that the
.Fa FILE
pointer in the call to
.Fn yy_create_buffer
is only used as the value of
.Fa yyin
seen by
.Dv YY_INPUT ;
if
.Dv YY_INPUT
is redefined so that it no longer uses
.Fa yyin ,
then a nil
.Fa FILE
pointer can safely be passed to
.Fn yy_create_buffer .
To select a particular buffer to scan:
.Pp
.D1 void yy_switch_to_buffer(YY_BUFFER_STATE new_buffer)
.Pp
It switches the scanner's input buffer so subsequent tokens will
come from
.Fa new_buffer .
Note that
.Fn yy_switch_to_buffer
may be used by
.Fn yywrap
to set things up for continued scanning,
instead of opening a new file and pointing
.Fa yyin
at it.
Note also that switching input sources via either
.Fn yy_switch_to_buffer
or
.Fn yywrap
does not change the start condition.
.Pp
.D1 void yy_delete_buffer(YY_BUFFER_STATE buffer)
.Pp
is used to reclaim the storage associated with a buffer.
.Pf ( Fa buffer
can be nil, in which case the routine does nothing.)
To clear the current contents of a buffer:
.Pp
.D1 void yy_flush_buffer(YY_BUFFER_STATE buffer)
.Pp
This function discards the buffer's contents,
so the next time the scanner attempts to match a token from the buffer,
it will first fill the buffer anew using
.Dv YY_INPUT .
.Pp
.Fn yy_new_buffer
is an alias for
.Fn yy_create_buffer ,
provided for compatibility with the C++ use of
.Em new
and
.Em delete
for creating and destroying dynamic objects.
.Pp
Finally, the
.Dv YY_CURRENT_BUFFER
macro returns a
.Dv YY_BUFFER_STATE
handle to the current buffer.
.Pp
Here is an example of using these features for writing a scanner
which expands include files (the
.Aq Aq EOF
feature is discussed below):
.Bd -literal -offset indent
/*
 * the "incl" state is used for picking up the name
 * of an include file
 */
%x incl

%{
#include <err.h>

#define MAX_INCLUDE_DEPTH 10
YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
int include_stack_ptr = 0;
%}

%%
include BEGIN(incl);

[a-z]+ ECHO;
[^a-z\en]*\en? ECHO;

<incl>[ \et]* /* eat the whitespace */
<incl>[^ \et\en]+ { /* got the include file name */
	if (include_stack_ptr >= MAX_INCLUDE_DEPTH)
		errx(1, "Includes nested too deeply");

	include_stack[include_stack_ptr++] =
	    YY_CURRENT_BUFFER;

	yyin = fopen(yytext, "r");

	if (yyin == NULL)
		err(1, NULL);

	yy_switch_to_buffer(
	    yy_create_buffer(yyin, YY_BUF_SIZE));

	BEGIN(INITIAL);
}

<<EOF>> {
	if (--include_stack_ptr < 0)
		yyterminate();
	else {
		yy_delete_buffer(YY_CURRENT_BUFFER);
		yy_switch_to_buffer(
		    include_stack[include_stack_ptr]);
	}
}
.Ed
.Pp
Three routines are available for setting up input buffers for
scanning in-memory strings instead of files.
All of them create a new input buffer for scanning the string,
and return a corresponding
.Dv YY_BUFFER_STATE
handle (which should be deleted afterwards using
.Fn yy_delete_buffer ) .
They also switch to the new buffer using
.Fn yy_switch_to_buffer ,
so the next call to
.Fn yylex
will start scanning the string.
.Bl -tag -width Ds
.It yy_scan_string(const char *str)
Scans a NUL-terminated string.
.It yy_scan_bytes(const char *bytes, int len)
Scans
.Fa len
bytes
.Pq including possibly NUL's
starting at location
.Fa bytes .
.El
.Pp
Note that both of these functions create and scan a copy
of the string or bytes.
(This may be desirable, since
.Fn yylex
modifies the contents of the buffer it is scanning.)
The copy can be avoided by using:
.Bl -tag -width Ds
.It yy_scan_buffer(char *base, yy_size_t size)
Which scans the buffer starting at
.Fa base ,
consisting of
.Fa size
bytes, the last two bytes of which must be
.Dv YY_END_OF_BUFFER_CHAR
.Pq ASCII NUL .
These last two bytes are not scanned; thus, scanning consists of
base[0] through base[size-3], inclusive.
.Pp
If
.Fa base
is not set up in this manner
(i.e., the final two
.Dv YY_END_OF_BUFFER_CHAR
bytes are missing), then
.Fn yy_scan_buffer
returns a nil pointer instead of creating a new input buffer.
.Pp
The type
.Fa yy_size_t
is an integral type to which an integer expression
reflecting the size of the buffer can be cast.
.El
.Sh END-OF-FILE RULES
The special rule
.Qq Aq Aq EOF
indicates actions which are to be taken when an end-of-file is encountered and
.Fn yywrap
returns non-zero
.Pq i.e., indicates no further files to process .
The action must finish by doing one of four things:
.Bl -dash
.It
Assigning
.Em yyin
to a new input file
(in previous versions of
.Nm ,
after doing the assignment, it was necessary to call the special action
.Dv YY_NEW_FILE ;
this is no longer necessary).
.It
Executing a
.Em return
statement.
.It
Executing the special
.Fn yyterminate
action.
.It
Switching to a new buffer using
.Fn yy_switch_to_buffer
as shown in the example above.
.El
.Pp
.Aq Aq EOF
rules may not be used with other patterns;
they may only be qualified with a list of start conditions.
If an unqualified
.Aq Aq EOF
rule is given, it applies to all start conditions which do not already have
.Aq Aq EOF
actions.
To specify an
.Aq Aq EOF
rule for only the initial start condition, use
.Pp
.Dl <INITIAL><<EOF>>
.Pp
These rules are useful for catching things like unclosed comments.
An example:
.Bd -literal -offset indent
%x quote
%%

\&...other rules for dealing with quotes...

<quote><<EOF>> {
	error("unterminated quote");
	yyterminate();
}
<<EOF>> {
	if (*++filelist)
		yyin = fopen(*filelist, "r");
	else
		yyterminate();
}
.Ed
.Sh MISCELLANEOUS MACROS
The macro
.Dv YY_USER_ACTION
can be defined to provide an action
which is always executed prior to the matched rule's action.
For example,
it could be #define'd to call a routine to convert yytext to lower-case.
When
.Dv YY_USER_ACTION
is invoked, the variable
.Fa yy_act
gives the number of the matched rule
.Pq rules are numbered starting with 1 .
For example, to profile how often each rule is matched,
the following would do the trick:
.Pp
.Dl #define YY_USER_ACTION ++ctr[yy_act]
.Pp
where
.Fa ctr
is an array to hold the counts for the different rules.
Note that the macro
.Dv YY_NUM_RULES
gives the total number of rules
(including the default rule, even if
.Fl s
is used),
so a correct declaration for
.Fa ctr
is:
.Pp
.Dl int ctr[YY_NUM_RULES];
.Pp
The macro
.Dv YY_USER_INIT
may be defined to provide an action which is always executed before
the first scan
.Pq and before the scanner's internal initializations are done .
For example, it could be used to call a routine to read
in a data table or open a logging file.
.Pp
The macro
.Dv yy_set_interactive(is_interactive)
can be used to control whether the current buffer is considered
.Em interactive .
An interactive buffer is processed more slowly,
but must be used when the scanner's input source is indeed
interactive to avoid problems due to waiting to fill buffers
(see the discussion of the
.Fl I
flag below).
A non-zero value in the macro invocation marks the buffer as interactive,
a zero value as non-interactive.
Note that use of this macro overrides
.Dq %option always-interactive
or
.Dq %option never-interactive
(see
.Sx OPTIONS
below).
.Fn yy_set_interactive
must be invoked prior to beginning to scan the buffer that is
.Pq or is not
to be considered interactive.
.Pp
The macro
.Dv yy_set_bol(at_bol)
can be used to control whether the current buffer's scanning
context for the next token match is done as though at the
beginning of a line.
A non-zero macro argument makes rules anchored with
.Sq ^
active, while a zero argument makes
.Sq ^
rules inactive.
.Pp
The macro
.Dv YY_AT_BOL
returns true if the next token scanned from the current buffer will have
.Sq ^
rules active, false otherwise.
.Pp
In the generated scanner, the actions are all gathered in one large
switch statement and separated using
.Dv YY_BREAK ,
which may be redefined.
By default, it is simply a
.Qq break ,
to separate each rule's action from the following rules.
Redefining
.Dv YY_BREAK
allows, for example, C++ users to
.Dq #define YY_BREAK
to do nothing
(while being very careful that every rule ends with a
.Qq break
or a
.Qq return ! )
to avoid unreachable statement warnings in cases where, because a rule's
action ends with
.Dq return ,
the
.Dv YY_BREAK
is inaccessible.
.Sh VALUES AVAILABLE TO THE USER
This section summarizes the various values available to the user
in the rule actions.
.Bl -tag -width Ds
.It char *yytext
Holds the text of the current token.
It may be modified but not lengthened
.Pq characters cannot be appended to the end .
.Pp
If the special directive
.Dq %array
appears in the first section of the scanner description, then
.Fa yytext
is instead declared
.Dq char yytext[YYLMAX] ,
where
.Dv YYLMAX
is a macro definition that can be redefined in the first section
to change the default value
.Pq generally 8KB .
Using
.Dq %array
results in somewhat slower scanners, but the value of
.Fa yytext
becomes immune to calls to
.Fn input
and
.Fn unput ,
which potentially destroy its value when
.Fa yytext
is a character pointer.
The opposite of
.Dq %array
is
.Dq %pointer ,
which is the default.
.Pp
.Dq %array
cannot be used when generating C++ scanner classes
(the
.Fl +
flag).
.It int yyleng
Holds the length of the current token.
.It FILE *yyin
Is the file which by default
.Nm
reads from.
It may be redefined, but doing so only makes sense before
scanning begins or after an
.Dv EOF
has been encountered.
Changing it in the midst of scanning will have unexpected results since
.Nm
buffers its input; use
.Fn yyrestart
instead.
Once scanning terminates because an end-of-file
has been seen,
.Fa yyin
can be assigned as the new input file
and the scanner can be called again to continue scanning.
.It void yyrestart(FILE *new_file)
May be called to point
.Fa yyin
at the new input file.
The switch-over to the new file is immediate
.Pq any previously buffered-up input is lost .
Note that calling
.Fn yyrestart
with
.Fa yyin
as an argument thus throws away the current input buffer and continues
scanning the same input file.
.It FILE *yyout
Is the file to which
.Em ECHO
actions are done.
It can be reassigned by the user.
.It YY_CURRENT_BUFFER
Returns a
.Dv YY_BUFFER_STATE
handle to the current buffer.
.It YY_START
Returns an integer value corresponding to the current start condition.
This value can subsequently be used with
.Em BEGIN
to return to that start condition.
.El
.Sh INTERFACING WITH YACC
One of the main uses of
.Nm
is as a companion to the
.Xr yacc 1
parser-generator.
yacc parsers expect to call a routine named
.Fn yylex
to find the next input token.
The routine is supposed to return the type of the next token
as well as putting any associated value in the global
.Fa yylval ,
which is defined externally,
and can be a union or any other complex data structure.
To use
.Nm
with yacc, one specifies the
.Fl d
option to yacc to instruct it to generate the file
.Pa y.tab.h
containing definitions of all the
.Dq %tokens
appearing in the yacc input.
This file is then included in the
.Nm
scanner.
For example, if one of the tokens is
.Qq TOK_NUMBER ,
part of the scanner might look like:
.Bd -literal -offset indent
%{
#include "y.tab.h"
%}

%%

[0-9]+ yylval = atoi(yytext); return TOK_NUMBER;
.Ed
.Sh OPTIONS
.Nm
has the following options:
.Bl -tag -width Ds
.It Fl 7
Instructs
.Nm
to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
characters in its input.
The advantage of using
.Fl 7
is that the scanner's tables can be up to half the size of those generated
using the
.Fl 8
option
.Pq see below .
The disadvantage is that such scanners often hang
or crash if their input contains an 8-bit character.
.Pp
Note, however, that unless generating a scanner using the
.Fl Cf
or
.Fl CF
table compression options, use of
.Fl 7
will save only a small amount of table space,
and make the scanner considerably less portable.
.Nm flex Ns 's
default behavior is to generate an 8-bit scanner unless
.Fl Cf
or
.Fl CF
is specified, in which case
.Nm
defaults to generating 7-bit scanners unless it was
configured to generate 8-bit scanners
(as will often be the case with non-USA sites).
It is possible to tell whether
.Nm
generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the
.Fl v
output as described below.
.Pp
Note that if
.Fl Cfe
or
.Fl CFe
are used
(the table compression options, but also using equivalence classes as
discussed below),
.Nm
still defaults to generating an 8-bit scanner,
since usually with these compression options full 8-bit tables
are not much more expensive than 7-bit tables.
.It Fl 8
Instructs
.Nm
to generate an 8-bit scanner, i.e., one which can recognize 8-bit
characters.
This flag is only needed for scanners generated using
.Fl Cf
or
.Fl CF ,
as otherwise
.Nm
defaults to generating an 8-bit scanner anyway.
.Pp
See the discussion of
.Fl 7
above for
.Nm flex Ns 's
default behavior and the tradeoffs between 7-bit and 8-bit scanners.
.It Fl B
Instructs
.Nm
to generate a
.Em batch
scanner, the opposite of
.Em interactive
scanners generated by
.Fl I
.Pq see below .
In general,
.Fl B
is used when the scanner will never be used interactively,
and you want to squeeze a little more performance out of it.
If the aim is instead to squeeze out a lot more performance,
use the
.Fl Cf
or
.Fl CF
options
.Pq discussed below ,
which turn on
.Fl B
automatically anyway.
.It Fl b
Generate backing-up information to
.Pa lex.backup .
This is a list of scanner states which require backing up
and the input characters on which they do so.
By adding rules one can remove backing-up states.
If all backing-up states are eliminated and
.Fl Cf
or
.Fl CF
is used, the generated scanner will run faster (see the
.Fl p
flag).
Only users who wish to squeeze every last cycle out of their
scanners need worry about this option.
(See the section on
.Sx PERFORMANCE CONSIDERATIONS
below.)
.It Fl C Ns Op Cm aeFfmr
Controls the degree of table compression and, more generally, trade-offs
between small scanners and fast scanners.
.Bl -tag -width Ds
.It Fl Ca
Instructs
.Nm
to trade off larger tables in the generated scanner for faster performance
because the elements of the tables are better aligned for memory access
and computation.
On some
.Tn RISC
architectures, fetching and manipulating longwords is more efficient
than with smaller-sized units such as shortwords.
This option can double the size of the tables used by the scanner.
.It Fl Ce
Directs
.Nm
to construct
.Em equivalence classes ,
i.e., sets of characters which have identical lexical properties
(for example, if the only appearance of digits in the
.Nm
input is in the character class
.Qq [0-9]
then the digits
.Sq 0 ,
.Sq 1 ,
.Sq ... ,
.Sq 9
will all be put in the same equivalence class).
Equivalence classes usually give dramatic reductions in the final
table/object file sizes
.Pq typically a factor of 2\-5
and are pretty cheap performance-wise
.Pq one array look-up per character scanned .
.It Fl CF
Specifies that the alternate fast scanner representation
(described below under the
.Fl F
option)
should be used.
This option cannot be used with
.Fl + .
2452.It Fl Cf 2453Specifies that the 2454.Em full 2455scanner tables should be generated \- 2456.Nm 2457should not compress the tables by taking advantage of 2458similar transition functions for different states. 2459.It Fl \&Cm 2460Directs 2461.Nm 2462to construct 2463.Em meta-equivalence classes , 2464which are sets of equivalence classes 2465(or characters, if equivalence classes are not being used) 2466that are commonly used together. 2467Meta-equivalence classes are often a big win when using compressed tables, 2468but they have a moderate performance impact 2469(one or two 2470.Qq if 2471tests and one array look-up per character scanned). 2472.It Fl Cr 2473Causes the generated scanner to 2474.Em bypass 2475use of the standard I/O library 2476.Pq stdio 2477for input. 2478Instead of calling 2479.Xr fread 3 2480or 2481.Xr getc 3 , 2482the scanner will use the 2483.Xr read 2 2484system call, 2485resulting in a performance gain which varies from system to system, 2486but in general is probably negligible unless 2487.Fl Cf 2488or 2489.Fl CF 2490are being used. 2491Using 2492.Fl Cr 2493can cause strange behavior if, for example, reading from 2494.Fa yyin 2495using stdio prior to calling the scanner 2496(because the scanner will miss whatever text previous reads left 2497in the stdio input buffer). 2498.Pp 2499.Fl Cr 2500has no effect if 2501.Dv YY_INPUT 2502is defined 2503(see 2504.Sx THE GENERATED SCANNER 2505above). 2506.El 2507.Pp 2508A lone 2509.Fl C 2510specifies that the scanner tables should be compressed but neither 2511equivalence classes nor meta-equivalence classes should be used. 2512.Pp 2513The options 2514.Fl Cf 2515or 2516.Fl CF 2517and 2518.Fl \&Cm 2519do not make sense together \- there is no opportunity for meta-equivalence 2520classes if the table is not being compressed. 2521Otherwise the options may be freely mixed, and are cumulative. 
2522.Pp 2523The default setting is 2524.Fl Cem 2525which specifies that 2526.Nm 2527should generate equivalence classes and meta-equivalence classes. 2528This setting provides the highest degree of table compression. 2529It is possible to trade off faster-executing scanners at the cost of 2530larger tables with the following generally being true: 2531.Bd -unfilled -offset indent 2532slowest & smallest 2533 -Cem 2534 -Cm 2535 -Ce 2536 -C 2537 -C{f,F}e 2538 -C{f,F} 2539 -C{f,F}a 2540fastest & largest 2541.Ed 2542.Pp 2543Note that scanners with the smallest tables are usually generated and 2544compiled the quickest, 2545so during development the default is usually best, 2546maximal compression. 2547.Pp 2548.Fl Cfe 2549is often a good compromise between speed and size for production scanners. 2550.It Fl d 2551Makes the generated scanner run in debug mode. 2552Whenever a pattern is recognized and the global 2553.Fa yy_flex_debug 2554is non-zero 2555.Pq which is the default , 2556the scanner will write to stderr a line of the form: 2557.Pp 2558.D1 --accepting rule at line 53 ("the matched text") 2559.Pp 2560The line number refers to the location of the rule in the file 2561defining the scanner 2562(i.e., the file that was fed to 2563.Nm ) . 2564Messages are also generated when the scanner backs up, 2565accepts the default rule, 2566reaches the end of its input buffer 2567(or encounters a NUL; 2568at this point, the two look the same as far as the scanner's concerned), 2569or reaches an end-of-file. 2570.It Fl F 2571Specifies that the fast scanner table representation should be used 2572.Pq and stdio bypassed . 2573This representation is about as fast as the full table representation 2574.Pq Fl f , 2575and for some sets of patterns will be considerably smaller 2576.Pq and for others, larger . 
2577In general, if the pattern set contains both 2578.Qq keywords 2579and a catch-all, 2580.Qq identifier 2581rule, such as in the set: 2582.Bd -unfilled -offset indent 2583"case" return TOK_CASE; 2584"switch" return TOK_SWITCH; 2585\&... 2586"default" return TOK_DEFAULT; 2587[a-z]+ return TOK_ID; 2588.Ed 2589.Pp 2590then it's better to use the full table representation. 2591If only the 2592.Qq identifier 2593rule is present and a hash table or some such is used to detect the keywords, 2594it's better to use 2595.Fl F . 2596.Pp 2597This option is equivalent to 2598.Fl CFr 2599.Pq see above . 2600It cannot be used with 2601.Fl + . 2602.It Fl f 2603Specifies 2604.Em fast scanner . 2605No table compression is done and stdio is bypassed. 2606The result is large but fast. 2607This option is equivalent to 2608.Fl Cfr 2609.Pq see above . 2610.It Fl h 2611Generates a help summary of 2612.Nm flex Ns 's 2613options to stdout and then exits. 2614.Fl ?\& 2615and 2616.Fl Fl help 2617are synonyms for 2618.Fl h . 2619.It Fl I 2620Instructs 2621.Nm 2622to generate an 2623.Em interactive 2624scanner. 2625An interactive scanner is one that only looks ahead to decide 2626what token has been matched if it absolutely must. 2627It turns out that always looking one extra character ahead, 2628even if the scanner has already seen enough text 2629to disambiguate the current token, is a bit faster than 2630only looking ahead when necessary. 2631But scanners that always look ahead give dreadful interactive performance; 2632for example, when a user types a newline, 2633it is not recognized as a newline token until they enter 2634.Em another 2635token, which often means typing in another whole line. 2636.Pp 2637.Nm 2638scanners default to 2639.Em interactive 2640unless 2641.Fl Cf 2642or 2643.Fl CF 2644table-compression options are specified 2645.Pq see above . 
2646That's because if high-performance is most important, 2647one of these options should be used, 2648so if they weren't, 2649.Nm 2650assumes it is preferable to trade off a bit of run-time performance for 2651intuitive interactive behavior. 2652Note also that 2653.Fl I 2654cannot be used in conjunction with 2655.Fl Cf 2656or 2657.Fl CF . 2658Thus, this option is not really needed; it is on by default for all those 2659cases in which it is allowed. 2660.Pp 2661A scanner can be forced to not be interactive by using 2662.Fl B 2663.Pq see above . 2664.It Fl i 2665Instructs 2666.Nm 2667to generate a case-insensitive scanner. 2668The case of letters given in the 2669.Nm 2670input patterns will be ignored, 2671and tokens in the input will be matched regardless of case. 2672The matched text given in 2673.Fa yytext 2674will have the preserved case 2675.Pq i.e., it will not be folded . 2676.It Fl L 2677Instructs 2678.Nm 2679not to generate 2680.Dq #line 2681directives. 2682Without this option, 2683.Nm 2684peppers the generated scanner with #line directives so error messages 2685in the actions will be correctly located with respect to either the original 2686.Nm 2687input file 2688(if the errors are due to code in the input file), 2689or 2690.Pa lex.yy.c 2691(if the errors are 2692.Nm flex Ns 's 2693fault \- these sorts of errors should be reported to the email address 2694given below). 2695.It Fl l 2696Turns on maximum compatibility with the original 2697.At 2698.Nm lex 2699implementation. 2700Note that this does not mean full compatibility. 2701Use of this option costs a considerable amount of performance, 2702and it cannot be used with the 2703.Fl + , f , F , Cf , 2704or 2705.Fl CF 2706options. 2707For details on the compatibilities it provides, see the section 2708.Sx INCOMPATIBILITIES WITH LEX AND POSIX 2709below. 2710This option also results in the name 2711.Dv YY_FLEX_LEX_COMPAT 2712being #define'd in the generated scanner. 
2713.It Fl n 2714Another do-nothing, deprecated option included only for 2715.Tn POSIX 2716compliance. 2717.It Fl o Ns Ar output 2718Directs 2719.Nm 2720to write the scanner to the file 2721.Ar output 2722instead of 2723.Pa lex.yy.c . 2724If 2725.Fl o 2726is combined with the 2727.Fl t 2728option, then the scanner is written to stdout but its 2729.Dq #line 2730directives 2731(see the 2732.Fl L 2733option above) 2734refer to the file 2735.Ar output . 2736.It Fl P Ns Ar prefix 2737Changes the default 2738.Qq yy 2739prefix used by 2740.Nm 2741for all globally visible variable and function names to instead be 2742.Ar prefix . 2743For example, 2744.Fl P Ns Ar foo 2745changes the name of 2746.Fa yytext 2747to 2748.Fa footext . 2749It also changes the name of the default output file from 2750.Pa lex.yy.c 2751to 2752.Pa lex.foo.c . 2753Here are all of the names affected: 2754.Bd -unfilled -offset indent 2755yy_create_buffer 2756yy_delete_buffer 2757yy_flex_debug 2758yy_init_buffer 2759yy_flush_buffer 2760yy_load_buffer_state 2761yy_switch_to_buffer 2762yyin 2763yyleng 2764yylex 2765yylineno 2766yyout 2767yyrestart 2768yytext 2769yywrap 2770.Ed 2771.Pp 2772(If using a C++ scanner, then only 2773.Fa yywrap 2774and 2775.Fa yyFlexLexer 2776are affected.) 2777Within the scanner itself, it is still possible to refer to the global variables 2778and functions using either version of their name; but externally, they 2779have the modified name. 2780.Pp 2781This option allows multiple 2782.Nm 2783programs to be easily linked together into the same executable. 2784Note, though, that using this option also renames 2785.Fn yywrap , 2786so now either an 2787.Pq appropriately named 2788version of the routine for the scanner must be supplied, or 2789.Dq %option noyywrap 2790must be used, as linking with 2791.Fl lfl 2792no longer provides one by default. 2793.It Fl p 2794Generates a performance report to stderr. 
The report consists of comments regarding features of the
.Nm
input file which will cause a serious loss of performance in the resulting
scanner.
If the flag is specified twice,
comments regarding features that lead to minor performance losses
will also be reported.
.Pp
Note that the use of
.Em REJECT ,
.Dq %option yylineno ,
and variable trailing context
(see the
.Sx BUGS
section below)
entails a substantial performance penalty; use of
.Fn yymore ,
the
.Sq ^
operator, and the
.Fl I
flag entail minor performance penalties.
.It Fl S Ns Ar skeleton
Overrides the default skeleton file from which
.Nm
constructs its scanners.
This option is needed only for
.Nm
maintenance or development.
.It Fl s
Causes the default rule
.Pq that unmatched scanner input is echoed to stdout
to be suppressed.
If the scanner encounters input that does not
match any of its rules, it aborts with an error.
This option is useful for finding holes in a scanner's rule set.
.It Fl T
Makes
.Nm
run in
.Em trace
mode.
It will generate a lot of messages to stderr concerning
the form of the input and the resultant non-deterministic and deterministic
finite automata.
This option is mostly for use in maintaining
.Nm .
.It Fl t
Instructs
.Nm
to write the scanner it generates to standard output instead of
.Pa lex.yy.c .
.It Fl V
Prints the version number to stdout and exits.
.Fl Fl version
is a synonym for
.Fl V .
.It Fl v
Specifies that
.Nm
should write to stderr
a summary of statistics regarding the scanner it generates.
2857Most of the statistics are meaningless to the casual 2858.Nm 2859user, but the first line identifies the version of 2860.Nm 2861(same as reported by 2862.Fl V ) , 2863and the next line the flags used when generating the scanner, 2864including those that are on by default. 2865.It Fl w 2866Suppresses warning messages. 2867.It Fl + 2868Specifies that 2869.Nm 2870should generate a C++ scanner class. 2871See the section on 2872.Sx GENERATING C++ SCANNERS 2873below for details. 2874.El 2875.Pp 2876.Nm 2877also provides a mechanism for controlling options within the 2878scanner specification itself, rather than from the 2879.Nm 2880command line. 2881This is done by including 2882.Dq %option 2883directives in the first section of the scanner specification. 2884Multiple options can be specified with a single 2885.Dq %option 2886directive, and multiple directives in the first section of the 2887.Nm 2888input file. 2889.Pp 2890Most options are given simply as names, optionally preceded by the word 2891.Qq no 2892.Pq with no intervening whitespace 2893to negate their meaning. 
2894A number are equivalent to 2895.Nm 2896flags or their negation: 2897.Bd -unfilled -offset indent 28987bit -7 option 28998bit -8 option 2900align -Ca option 2901backup -b option 2902batch -B option 2903c++ -+ option 2904 2905caseful or 2906case-sensitive opposite of -i (default) 2907 2908case-insensitive or 2909caseless -i option 2910 2911debug -d option 2912default opposite of -s option 2913ecs -Ce option 2914fast -F option 2915full -f option 2916interactive -I option 2917lex-compat -l option 2918meta-ecs -Cm option 2919perf-report -p option 2920read -Cr option 2921stdout -t option 2922verbose -v option 2923warn opposite of -w option 2924 (use "%option nowarn" for -w) 2925 2926array equivalent to "%array" 2927pointer equivalent to "%pointer" (default) 2928.Ed 2929.Pp 2930Some %option's provide features otherwise not available: 2931.Bl -tag -width Ds 2932.It always-interactive 2933Instructs 2934.Nm 2935to generate a scanner which always considers its input 2936.Qq interactive . 2937Normally, on each new input file the scanner calls 2938.Fn isatty 2939in an attempt to determine whether the scanner's input source is interactive 2940and thus should be read a character at a time. 2941When this option is used, however, no such call is made. 2942.It main 2943Directs 2944.Nm 2945to provide a default 2946.Fn main 2947program for the scanner, which simply calls 2948.Fn yylex . 2949This option implies 2950.Dq noyywrap 2951.Pq see below . 2952.It never-interactive 2953Instructs 2954.Nm 2955to generate a scanner which never considers its input 2956.Qq interactive 2957(again, no call made to 2958.Fn isatty ) . 2959This is the opposite of 2960.Dq always-interactive . 2961.It stack 2962Enables the use of start condition stacks 2963(see 2964.Sx START CONDITIONS 2965above). 2966.It stdinit 2967If set (i.e., 2968.Dq %option stdinit ) , 2969initializes 2970.Fa yyin 2971and 2972.Fa yyout 2973to stdin and stdout, instead of the default of 2974.Dq nil . 
2975Some existing 2976.Nm lex 2977programs depend on this behavior, even though it is not compliant with ANSI C, 2978which does not require stdin and stdout to be compile-time constant. 2979.It yylineno 2980Directs 2981.Nm 2982to generate a scanner that maintains the number of the current line 2983read from its input in the global variable 2984.Fa yylineno . 2985This option is implied by 2986.Dq %option lex-compat . 2987.It yywrap 2988If unset (i.e., 2989.Dq %option noyywrap ) , 2990makes the scanner not call 2991.Fn yywrap 2992upon an end-of-file, but simply assume that there are no more files to scan 2993(until the user points 2994.Fa yyin 2995at a new file and calls 2996.Fn yylex 2997again). 2998.El 2999.Pp 3000.Nm 3001scans rule actions to determine whether the 3002.Em REJECT 3003or 3004.Fn yymore 3005features are being used. 3006The 3007.Dq reject 3008and 3009.Dq yymore 3010options are available to override its decision as to whether to use the 3011options, either by setting them (e.g., 3012.Dq %option reject ) 3013to indicate the feature is indeed used, 3014or unsetting them to indicate it actually is not used 3015(e.g., 3016.Dq %option noyymore ) . 3017.Pp 3018Three options take string-delimited values, offset with 3019.Sq = : 3020.Pp 3021.D1 %option outfile="ABC" 3022.Pp 3023is equivalent to 3024.Fl o Ns Ar ABC , 3025and 3026.Pp 3027.D1 %option prefix="XYZ" 3028.Pp 3029is equivalent to 3030.Fl P Ns Ar XYZ . 3031Finally, 3032.Pp 3033.D1 %option yyclass="foo" 3034.Pp 3035only applies when generating a C++ scanner 3036.Pf ( Fl + 3037option). 3038It informs 3039.Nm 3040that 3041.Dq foo 3042has been derived as a subclass of yyFlexLexer, so 3043.Nm 3044will place actions in the member function 3045.Dq foo::yylex() 3046instead of 3047.Dq yyFlexLexer::yylex() . 3048It also generates a 3049.Dq yyFlexLexer::yylex() 3050member function that emits a run-time error (by invoking 3051.Dq yyFlexLexer::LexerError() ) 3052if called. 
3053See 3054.Sx GENERATING C++ SCANNERS , 3055below, for additional information. 3056.Pp 3057A number of options are available for 3058lint 3059purists who want to suppress the appearance of unneeded routines 3060in the generated scanner. 3061Each of the following, if unset 3062(e.g., 3063.Dq %option nounput ) , 3064results in the corresponding routine not appearing in the generated scanner: 3065.Bd -unfilled -offset indent 3066input, unput 3067yy_push_state, yy_pop_state, yy_top_state 3068yy_scan_buffer, yy_scan_bytes, yy_scan_string 3069.Ed 3070.Pp 3071(though 3072.Fn yy_push_state 3073and friends won't appear anyway unless 3074.Dq %option stack 3075is being used). 3076.Sh PERFORMANCE CONSIDERATIONS 3077The main design goal of 3078.Nm 3079is that it generate high-performance scanners. 3080It has been optimized for dealing well with large sets of rules. 3081Aside from the effects on scanner speed of the table compression 3082.Fl C 3083options outlined above, 3084there are a number of options/actions which degrade performance. 3085These are, from most expensive to least: 3086.Bd -unfilled -offset indent 3087REJECT 3088%option yylineno 3089arbitrary trailing context 3090 3091pattern sets that require backing up 3092%array 3093%option interactive 3094%option always-interactive 3095 3096\&'^' beginning-of-line operator 3097yymore() 3098.Ed 3099.Pp 3100with the first three all being quite expensive 3101and the last two being quite cheap. 3102Note also that 3103.Fn unput 3104is implemented as a routine call that potentially does quite a bit of work, 3105while 3106.Fn yyless 3107is a quite-cheap macro; so if just putting back some excess text, 3108use 3109.Fn yyless . 3110.Pp 3111.Em REJECT 3112should be avoided at all costs when performance is important. 3113It is a particularly expensive option. 3114.Pp 3115Getting rid of backing up is messy and often may be an enormous 3116amount of work for a complicated scanner. 
In principle, one begins by using the
.Fl b
flag to generate a
.Pa lex.backup
file.
For example, on the input
.Bd -literal -offset indent
%%
foo        return TOK_KEYWORD;
foobar     return TOK_KEYWORD;
.Ed
.Pp
the file looks like:
.Bd -literal -offset indent
State #6 is non-accepting -
 associated rule line numbers:
       2       3
 out-transitions: [ o ]
 jam-transitions: EOF [ \e001-n  p-\e177 ]

State #8 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ a ]
 jam-transitions: EOF [ \e001-`  b-\e177 ]

State #9 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ r ]
 jam-transitions: EOF [ \e001-q  s-\e177 ]

Compressed tables always back up.
.Ed
.Pp
The first few lines tell us that there's a scanner state in
which it can make a transition on an
.Sq o
but not on any other character,
and that in that state the currently scanned text does not match any rule.
The state occurs when trying to match the rules found
at lines 2 and 3 in the input file.
If the scanner is in that state and then reads something other than an
.Sq o ,
it will have to back up to find a rule which is matched.
With a bit of headscratching one can see that this must be the
state it's in when it has seen
.Sq fo .
When this has happened, if anything other than another
.Sq o
is seen, the scanner will have to back up to simply match the
.Sq f
.Pq by the default rule .
.Pp
The comment regarding State #8 indicates there's a problem when
.Qq foob
has been scanned.
Indeed, on any character other than an
.Sq a ,
the scanner will have to back up to accept
.Qq foo .
Similarly, the comment for State #9 concerns when
.Qq fooba
has been scanned and an
.Sq r
does not follow.
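.Pp
The three states reported above can be illustrated with a small
hand-written sketch.
This is not the code that
.Nm
generates; it is a simplified C++ model, written only for illustration,
of longest-match scanning for the rules
.Qq foo
and
.Qq foobar
plus the default rule.
The names
.Fa Match
and
.Fa match_at
are hypothetical.
The function follows the input along
.Qq foobar ,
remembers the length of the last accepted match,
and reports how many characters it had to give back
when it jammed in a non-accepting state:

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the accepted token length and the number of
// characters that were scanned past it and then given back.
struct Match {
    std::size_t length;
    std::size_t backed_up;
};

// Simplified model of longest-match scanning for the rules "foo" and
// "foobar" plus the default rule (any single character).
// Illustration only; this is not flex-generated code.
Match match_at(const std::string& in, std::size_t start) {
    static const std::string path = "foobar";

    if (start >= in.size())
        return Match{0, 0};            // nothing left to scan

    // The default rule accepts any single character, so the state reached
    // after one character is always accepting.
    std::size_t scanned = 1;
    std::size_t last_accept = 1;

    if (in[start] != 'f')
        return Match{1, 0};

    // Follow the input along "foobar" until a character jams or input ends.
    while (scanned < path.size() &&
           start + scanned < in.size() &&
           in[start + scanned] == path[scanned]) {
        ++scanned;
        if (scanned == 3 || scanned == 6)   // "foo" or "foobar": accepting
            last_accept = scanned;
    }

    // Characters scanned beyond the last accepting state must be returned
    // to the input: this is the backing up that lex.backup reports.
    return Match{last_accept, scanned - last_accept};
}
```

.Pp
Under this model,
.Qq fox
yields a one-character match with one character backed up, and
.Qq foobaz
yields
.Qq foo
with two characters backed up,
which corresponds to States #6 and #9 in the listing.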
3183.Pp 3184The final comment reminds us that there's no point going to 3185all the trouble of removing backing up from the rules unless we're using 3186.Fl Cf 3187or 3188.Fl CF , 3189since there's no performance gain doing so with compressed scanners. 3190.Pp 3191The way to remove the backing up is to add 3192.Qq error 3193rules: 3194.Bd -literal -offset indent 3195%% 3196foo return TOK_KEYWORD; 3197foobar return TOK_KEYWORD; 3198 3199fooba | 3200foob | 3201fo { 3202 /* false alarm, not really a keyword */ 3203 return TOK_ID; 3204} 3205.Ed 3206.Pp 3207Eliminating backing up among a list of keywords can also be done using a 3208.Qq catch-all 3209rule: 3210.Bd -literal -offset indent 3211%% 3212foo return TOK_KEYWORD; 3213foobar return TOK_KEYWORD; 3214 3215[a-z]+ return TOK_ID; 3216.Ed 3217.Pp 3218This is usually the best solution when appropriate. 3219.Pp 3220Backing up messages tend to cascade. 3221With a complicated set of rules it's not uncommon to get hundreds of messages. 3222If one can decipher them, though, 3223it often only takes a dozen or so rules to eliminate the backing up 3224(though it's easy to make a mistake and have an error rule accidentally match 3225a valid token; a possible future 3226.Nm 3227feature will be to automatically add rules to eliminate backing up). 3228.Pp 3229It's important to keep in mind that the benefits of eliminating 3230backing up are gained only if 3231.Em every 3232instance of backing up is eliminated. 3233Leaving just one gains nothing. 3234.Pp 3235.Em Variable 3236trailing context 3237(where both the leading and trailing parts do not have a fixed length) 3238entails almost the same performance loss as 3239.Em REJECT 3240.Pq i.e., substantial . 
3241So when possible a rule like: 3242.Bd -literal -offset indent 3243%% 3244mouse|rat/(cat|dog) run(); 3245.Ed 3246.Pp 3247is better written: 3248.Bd -literal -offset indent 3249%% 3250mouse/cat|dog run(); 3251rat/cat|dog run(); 3252.Ed 3253.Pp 3254or as 3255.Bd -literal -offset indent 3256%% 3257mouse|rat/cat run(); 3258mouse|rat/dog run(); 3259.Ed 3260.Pp 3261Note that here the special 3262.Sq |\& 3263action does not provide any savings, and can even make things worse (see 3264.Sx BUGS 3265below). 3266.Pp 3267Another area where the user can increase a scanner's performance 3268.Pq and one that's easier to implement 3269arises from the fact that the longer the tokens matched, 3270the faster the scanner will run. 3271This is because with long tokens the processing of most input 3272characters takes place in the 3273.Pq short 3274inner scanning loop, and does not often have to go through the additional work 3275of setting up the scanning environment (e.g., 3276.Fa yytext ) 3277for the action. 3278Recall the scanner for C comments: 3279.Bd -literal -offset indent 3280%x comment 3281%% 3282int line_num = 1; 3283 3284"/*" BEGIN(comment); 3285 3286<comment>[^*\en]* 3287<comment>"*"+[^*/\en]* 3288<comment>\en ++line_num; 3289<comment>"*"+"/" BEGIN(INITIAL); 3290.Ed 3291.Pp 3292This could be sped up by writing it as: 3293.Bd -literal -offset indent 3294%x comment 3295%% 3296int line_num = 1; 3297 3298"/*" BEGIN(comment); 3299 3300<comment>[^*\en]* 3301<comment>[^*\en]*\en ++line_num; 3302<comment>"*"+[^*/\en]* 3303<comment>"*"+[^*/\en]*\en ++line_num; 3304<comment>"*"+"/" BEGIN(INITIAL); 3305.Ed 3306.Pp 3307Now instead of each newline requiring the processing of another action, 3308recognizing the newlines is 3309.Qq distributed 3310over the other rules to keep the matched text as long as possible. 3311Note that adding rules does 3312.Em not 3313slow down the scanner! 
3314The speed of the scanner is independent of the number of rules or 3315(modulo the considerations given at the beginning of this section) 3316how complicated the rules are with regard to operators such as 3317.Sq * 3318and 3319.Sq |\& . 3320.Pp 3321A final example in speeding up a scanner: 3322scan through a file containing identifiers and keywords, one per line 3323and with no other extraneous characters, and recognize all the keywords. 3324A natural first approach is: 3325.Bd -literal -offset indent 3326%% 3327asm | 3328auto | 3329break | 3330\&... etc ... 3331volatile | 3332while /* it's a keyword */ 3333 3334\&.|\en /* it's not a keyword */ 3335.Ed 3336.Pp 3337To eliminate the back-tracking, introduce a catch-all rule: 3338.Bd -literal -offset indent 3339%% 3340asm | 3341auto | 3342break | 3343\&... etc ... 3344volatile | 3345while /* it's a keyword */ 3346 3347[a-z]+ | 3348\&.|\en /* it's not a keyword */ 3349.Ed 3350.Pp 3351Now, if it's guaranteed that there's exactly one word per line, 3352then we can reduce the total number of matches by a half by 3353merging in the recognition of newlines with that of the other tokens: 3354.Bd -literal -offset indent 3355%% 3356asm\en | 3357auto\en | 3358break\en | 3359\&... etc ... 3360volatile\en | 3361while\en /* it's a keyword */ 3362 3363[a-z]+\en | 3364\&.|\en /* it's not a keyword */ 3365.Ed 3366.Pp 3367One has to be careful here, 3368as we have now reintroduced backing up into the scanner. 3369In particular, while we know that there will never be any characters 3370in the input stream other than letters or newlines, 3371.Nm 3372can't figure this out, and it will plan for possibly needing to back up 3373when it has scanned a token like 3374.Qq auto 3375and then the next character is something other than a newline or a letter. 3376Previously it would then just match the 3377.Qq auto 3378rule and be done, but now it has no 3379.Qq auto 3380rule, only an 3381.Qq auto\en 3382rule. 
To eliminate the possibility of backing up,
we could either duplicate all rules but without final newlines, or,
since we never expect to encounter such an input and therefore don't
care how it's classified, we can introduce one more catch-all rule,
this one without a newline:
.Bd -literal -offset indent
%%
asm\en    |
auto\en   |
break\en  |
\&... etc ...
volatile\en |
while\en  /* it's a keyword */

[a-z]+\en |
[a-z]+   |
\&.|\en     /* it's not a keyword */
.Ed
.Pp
Compiled with
.Fl Cf ,
this is about as fast as one can get a
.Nm
scanner to go for this particular problem.
.Pp
A final note:
.Nm
is slow when matching NUL's,
particularly when a token contains multiple NUL's.
It's best to write rules which match short
amounts of text if it's anticipated that the text will often include NUL's.
.Pp
Another final note regarding performance: as mentioned above in the section
.Sx HOW THE INPUT IS MATCHED ,
dynamically resizing
.Fa yytext
to accommodate huge tokens is a slow process because it presently requires that
the
.Pq huge
token be rescanned from the beginning.
Thus if performance is vital, it is better to attempt to match
.Qq large
quantities of text but not
.Qq huge
quantities, where the cutoff between the two is at about 8K characters/token.
.Sh GENERATING C++ SCANNERS
.Nm
provides two different ways to generate scanners for use with C++.
The first way is to simply compile a scanner generated by
.Nm
using a C++ compiler instead of a C compiler.
This should not generate any compilation errors
(please report any found to the email address given in the
.Sx AUTHORS
section below).
C++ code can then be used in rule actions instead of C code.
3439Note that the default input source for scanners remains 3440.Fa yyin , 3441and default echoing is still done to 3442.Fa yyout . 3443Both of these remain 3444.Fa FILE * 3445variables and not C++ streams. 3446.Pp 3447.Nm 3448can also be used to generate a C++ scanner class, using the 3449.Fl + 3450option (or, equivalently, 3451.Dq %option c++ ) , 3452which is automatically specified if the name of the flex executable ends in a 3453.Sq + , 3454such as 3455.Nm flex++ . 3456When using this option, 3457.Nm 3458defaults to generating the scanner to the file 3459.Pa lex.yy.cc 3460instead of 3461.Pa lex.yy.c . 3462The generated scanner includes the header file 3463.Aq Pa g++/FlexLexer.h , 3464which defines the interface to two C++ classes. 3465.Pp 3466The first class, 3467.Em FlexLexer , 3468provides an abstract base class defining the general scanner class interface. 3469It provides the following member functions: 3470.Bl -tag -width Ds 3471.It const char* YYText() 3472Returns the text of the most recently matched token, the equivalent of 3473.Fa yytext . 3474.It int YYLeng() 3475Returns the length of the most recently matched token, the equivalent of 3476.Fa yyleng . 3477.It int lineno() const 3478Returns the current input line number 3479(see 3480.Dq %option yylineno ) , 3481or 1 if 3482.Dq %option yylineno 3483was not used. 3484.It void set_debug(int flag) 3485Sets the debugging flag for the scanner, equivalent to assigning to 3486.Fa yy_flex_debug 3487(see the 3488.Sx OPTIONS 3489section above). 3490Note that the scanner must be built using 3491.Dq %option debug 3492to include debugging information in it. 3493.It int debug() const 3494Returns the current setting of the debugging flag. 
3495.El 3496.Pp 3497Also provided are member functions equivalent to 3498.Fn yy_switch_to_buffer , 3499.Fn yy_create_buffer 3500(though the first argument is an 3501.Fa std::istream* 3502object pointer and not a 3503.Fa FILE* ) , 3504.Fn yy_flush_buffer , 3505.Fn yy_delete_buffer , 3506and 3507.Fn yyrestart 3508(again, the first argument is an 3509.Fa std::istream* 3510object pointer). 3511.Pp 3512The second class defined in 3513.Aq Pa g++/FlexLexer.h 3514is 3515.Fa yyFlexLexer , 3516which is derived from 3517.Fa FlexLexer . 3518It defines the following additional member functions: 3519.Bl -tag -width Ds 3520.It "yyFlexLexer(std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0)" 3521Constructs a 3522.Fa yyFlexLexer 3523object using the given streams for input and output. 3524If not specified, the streams default to 3525.Fa cin 3526and 3527.Fa cout , 3528respectively. 3529.It virtual int yylex() 3530Performs the same role as 3531.Fn yylex 3532does for ordinary flex scanners: it scans the input stream, consuming 3533tokens, until a rule's action returns a value. 3534If subclass 3535.Sq S 3536is derived from 3537.Fa yyFlexLexer , 3538in order to access the member functions and variables of 3539.Sq S 3540inside 3541.Fn yylex , 3542use 3543.Dq %option yyclass="S" 3544to inform 3545.Nm 3546that the 3547.Sq S 3548subclass will be used instead of 3549.Fa yyFlexLexer . 3550In this case, rather than generating 3551.Dq yyFlexLexer::yylex() , 3552.Nm 3553generates 3554.Dq S::yylex() 3555(and also generates a dummy 3556.Dq yyFlexLexer::yylex() 3557that calls 3558.Dq yyFlexLexer::LexerError() 3559if called). 3560.It "virtual void switch_streams(std::istream* new_in = 0, std::ostream* new_out = 0)" 3561Reassigns 3562.Fa yyin 3563to 3564.Fa new_in 3565.Pq if non-nil 3566and 3567.Fa yyout 3568to 3569.Fa new_out 3570.Pq ditto , 3571deleting the previous input buffer if 3572.Fa yyin 3573is reassigned. 
3574.It int yylex(std::istream* new_in, std::ostream* new_out = 0) 3575First switches the input streams via 3576.Dq switch_streams(new_in, new_out) 3577and then returns the value of 3578.Fn yylex . 3579.El 3580.Pp 3581In addition, 3582.Fa yyFlexLexer 3583defines the following protected virtual functions which can be redefined 3584in derived classes to tailor the scanner: 3585.Bl -tag -width Ds 3586.It virtual int LexerInput(char* buf, int max_size) 3587Reads up to 3588.Fa max_size 3589characters into 3590.Fa buf 3591and returns the number of characters read. 3592To indicate end-of-input, return 0 characters. 3593Note that 3594.Qq interactive 3595scanners (see the 3596.Fl B 3597and 3598.Fl I 3599flags) define the macro 3600.Dv YY_INTERACTIVE . 3601If 3602.Fn LexerInput 3603has been redefined, and it's necessary to take different actions depending on 3604whether or not the scanner might be scanning an interactive input source, 3605it's possible to test for the presence of this name via 3606.Dq #ifdef . 3607.It virtual void LexerOutput(const char* buf, int size) 3608Writes out 3609.Fa size 3610characters from the buffer 3611.Fa buf , 3612which, while NUL-terminated, may also contain 3613.Qq internal 3614NUL's if the scanner's rules can match text with NUL's in them. 3615.It virtual void LexerError(const char* msg) 3616Reports a fatal error message. 3617The default version of this function writes the message to the stream 3618.Fa cerr 3619and exits. 3620.El 3621.Pp 3622Note that a 3623.Fa yyFlexLexer 3624object contains its entire scanning state. 3625Thus such objects can be used to create reentrant scanners. 3626Multiple instances of the same 3627.Fa yyFlexLexer 3628class can be instantiated, and multiple C++ scanner classes can be combined 3629in the same program using the 3630.Fl P 3631option discussed above. 3632.Pp 3633Finally, note that the 3634.Dq %array 3635feature is not available to C++ scanner classes; 3636.Dq %pointer 3637must be used 3638.Pq the default . 
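.Pp
The
.Fn LexerInput
contract described above (copy at most
.Fa max_size
characters, return the count, return 0 at end of input)
can be sketched as follows.
To keep the sketch compilable without
.Aq Pa g++/FlexLexer.h ,
it uses a hypothetical stand-alone class,
.Fa StringSource ,
rather than a real subclass of
.Fa yyFlexLexer ;
in an actual scanner the same body would presumably go into an override of
.Fn LexerInput
in the derived class.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a class derived from yyFlexLexer. The base
// class is omitted so that this sketch compiles on its own.
class StringSource {
public:
    explicit StringSource(const std::string& text) : text_(text), pos_(0) {}

    // Same contract as LexerInput(): copy up to max_size characters into
    // buf, return how many were copied, and return 0 at end of input.
    int LexerInput(char* buf, int max_size) {
        if (max_size <= 0)
            return 0;
        std::size_t remaining = text_.size() - pos_;
        std::size_t n = std::min(remaining,
                                 static_cast<std::size_t>(max_size));
        std::memcpy(buf, text_.data() + pos_, n);
        pos_ += n;
        return static_cast<int>(n);
    }

private:
    std::string text_;   // the text to be scanned
    std::size_t pos_;    // index of the next unread character
};
```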
.Pp
Here is an example of a simple C++ scanner:
.Bd -literal -offset indent
// An example of using the flex C++ scanner class.

%{
#include <errno.h>
int mylineno = 0;
%}

string  \e"[^\en"]+\e"

ws      [ \et]+

alpha   [A-Za-z]
dig     [0-9]
name    ({alpha}|{dig}|\e$)({alpha}|{dig}|[_.\e-/$])*
num1    [-+]?{dig}+\e.?([eE][-+]?{dig}+)?
num2    [-+]?{dig}*\e.{dig}+([eE][-+]?{dig}+)?
number  {num1}|{num2}

%%

{ws}    /* skip blanks and tabs */

"/*" {
        int c;

        while ((c = yyinput()) != 0) {
                if(c == '\en')
                        ++mylineno;
                else if(c == '*') {
                        if ((c = yyinput()) == '/')
                                break;
                        else
                                unput(c);
                }
        }
}

{number}  cout << "number " << YYText() << '\en';

\en        mylineno++;

{name}    cout << "name " << YYText() << '\en';

{string}  cout << "string " << YYText() << '\en';

%%

int main(int /* argc */, char** /* argv */)
{
        FlexLexer* lexer = new yyFlexLexer;
        while(lexer->yylex() != 0)
                ;
        return 0;
}
.Ed
.Pp
To create multiple
.Pq different
lexer classes, use the
.Fl P
flag
(or the
.Dq prefix=
option)
to rename each
.Fa yyFlexLexer
to some other
.Fa xxFlexLexer .
.Aq Pa g++/FlexLexer.h
can then be included in other sources once per lexer class, first renaming
.Fa yyFlexLexer
as follows:
.Bd -literal -offset indent
#undef yyFlexLexer
#define yyFlexLexer xxFlexLexer
#include <g++/FlexLexer.h>

#undef yyFlexLexer
#define yyFlexLexer zzFlexLexer
#include <g++/FlexLexer.h>
.Ed
.Pp
This example assumes that
.Dq %option prefix="xx"
is used for one scanner and
.Dq %option prefix="zz"
is used for the other.
.Pp
.Sy IMPORTANT :
the present form of the scanning class is experimental
and may change considerably between major releases.
.Sh INCOMPATIBILITIES WITH LEX AND POSIX
.Nm
is a rewrite of the
.At
.Nm lex
tool
(the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which are of concern
to those who wish to write scanners acceptable to either implementation.
.Nm
is fully compliant with the
.Tn POSIX
.Nm lex
specification, except that when using
.Dq %pointer
.Pq the default ,
a call to
.Fn unput
destroys the contents of
.Fa yytext ,
which is counter to the
.Tn POSIX
specification.
.Pp
In this section we discuss all of the known areas of incompatibility between
.Nm ,
.At
.Nm lex ,
and the
.Tn POSIX
specification.
.Pp
.Nm flex Ns 's
.Fl l
option turns on maximum compatibility with the original
.At
.Nm lex
implementation, at the cost of a major loss in the generated scanner's
performance.
We note below which incompatibilities can be overcome using the
.Fl l
option.
.Pp
.Nm
is fully compatible with
.Nm lex
with the following exceptions:
.Bl -dash
.It
The undocumented
.Nm lex
scanner internal variable
.Fa yylineno
is not supported unless
.Fl l
or
.Dq %option yylineno
is used.
.Pp
.Fa yylineno
should be maintained on a per-buffer basis, rather than a per-scanner
.Pq single global variable
basis.
.Pp
.Fa yylineno
is not part of the
.Tn POSIX
specification.
.It
The
.Fn input
routine is not redefinable, though it may be called to read characters
following whatever has been matched by a rule.
If
.Fn input
encounters an end-of-file, the normal
.Fn yywrap
processing is done.
A
.Dq real
end-of-file is returned by
.Fn input
as
.Dv EOF .
.Pp
Input is instead controlled by defining the
.Dv YY_INPUT
macro.
.Pp
The
.Nm
restriction that
.Fn input
cannot be redefined is in accordance with the
.Tn POSIX
specification, which simply does not specify any way of controlling the
scanner's input other than by making an initial assignment to
.Fa yyin .
.It
The
.Fn unput
routine is not redefinable.
This restriction is in accordance with
.Tn POSIX .
.It
.Nm
scanners are not as reentrant as
.Nm lex
scanners.
In particular, if a scanner is interactive and
an interrupt handler long-jumps out of the scanner,
and the scanner is subsequently called again,
the following error message may be displayed:
.Pp
.D1 fatal flex scanner internal error--end of buffer missed
.Pp
To reenter the scanner, first use
.Pp
.Dl yyrestart(yyin);
.Pp
Note that this call will throw away any buffered input;
usually this isn't a problem with an interactive scanner.
.Pp
Also note that flex C++ scanner classes are reentrant,
so if using C++ is an option, they should be used instead.
See
.Sx GENERATING C++ SCANNERS
above for details.
.It
.Fn output
is not supported.
Output from the
.Em ECHO
macro is done to the file-pointer
.Fa yyout
.Pq default stdout .
.Pp
.Fn output
is not part of the
.Tn POSIX
specification.
.It
.Nm lex
does not support exclusive start conditions
.Pq %x ,
though they are in the
.Tn POSIX
specification.
.It
When definitions are expanded,
.Nm
encloses them in parentheses.
With
.Nm lex ,
the following:
.Bd -literal -offset indent
NAME    [A-Z][A-Z0-9]*
%%
foo{NAME}?      printf("Found it\en");
%%
.Ed
.Pp
will not match the string
.Qq foo
because when the macro is expanded the rule is equivalent to
.Qq foo[A-Z][A-Z0-9]*?
and the precedence is such that the
.Sq ?\&
is associated with
.Qq [A-Z0-9]* .
With
.Nm ,
the rule will be expanded to
.Qq foo([A-Z][A-Z0-9]*)?
and so the string
.Qq foo
will match.
.Pp
Note that if the definition begins with
.Sq ^
or ends with
.Sq $
then it is not expanded with parentheses, to allow these operators to appear in
definitions without losing their special meanings.
But the
.Sq Aq s ,
.Sq / ,
and
.Aq Aq EOF
operators cannot be used in a
.Nm
definition.
.Pp
Using
.Fl l
results in the
.Nm lex
behavior of no parentheses around the definition.
.Pp
The
.Tn POSIX
specification is that the definition be enclosed in parentheses.
.It
Some implementations of
.Nm lex
allow a rule's action to begin on a separate line,
if the rule's pattern has trailing whitespace:
.Bd -literal -offset indent
%%
foo|bar<space here>
  { foobar_action(); }
.Ed
.Pp
.Nm
does not support this feature.
.It
The
.Nm lex
.Sq %r
.Pq generate a Ratfor scanner
option is not supported.
It is not part of the
.Tn POSIX
specification.
.It
After a call to
.Fn unput ,
.Fa yytext
is undefined until the next token is matched,
unless the scanner was built using
.Dq %array .
This is not the case with
.Nm lex
or the
.Tn POSIX
specification.
The
.Fl l
option does away with this incompatibility.
.It
The precedence of the
.Sq {}
.Pq numeric range
operator is different.
.Nm lex
interprets
.Qq abc{1,3}
as match one, two, or three occurrences of
.Sq abc ,
whereas
.Nm
interprets it as match
.Sq ab
followed by one, two, or three occurrences of
.Sq c .
The latter is in agreement with the
.Tn POSIX
specification.
.It
The precedence of the
.Sq ^
operator is different.
.Nm lex
interprets
.Qq ^foo|bar
as match either
.Sq foo
at the beginning of a line, or
.Sq bar
anywhere, whereas
.Nm
interprets it as match either
.Sq foo
or
.Sq bar
if they come at the beginning of a line.
The latter is in agreement with the
.Tn POSIX
specification.
.It
The special table-size declarations such as
.Sq %a
supported by
.Nm lex
are not required by
.Nm
scanners;
.Nm
ignores them.
.It
The name
.Dv FLEX_SCANNER
is #define'd so scanners may be written for use with either
.Nm
or
.Nm lex .
Scanners also include
.Dv YY_FLEX_MAJOR_VERSION
and
.Dv YY_FLEX_MINOR_VERSION
indicating which version of
.Nm
generated the scanner
(for example, for the 2.5 release, these defines would be 2 and 5,
respectively).
.El
.Pp
The following
.Nm
features are not included in
.Nm lex
or the
.Tn POSIX
specification:
.Bd -unfilled -offset indent
C++ scanners
%option
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
yyterminate()
yy_set_interactive()
yy_set_bol()
YY_AT_BOL()
<<EOF>>
<*>
YY_DECL
YY_START
YY_USER_ACTION
YY_USER_INIT
#line directives
%{}'s around actions
multiple actions on a line
.Ed
.Pp
plus almost all of the
.Nm
flags.
The last feature in the list refers to the fact that with
.Nm
multiple actions can be placed on the same line,
separated with semi-colons, while with
.Nm lex ,
the following
.Pp
.Dl foo handle_foo(); ++num_foos_seen;
.Pp
is
.Pq rather surprisingly
truncated to
.Pp
.Dl foo handle_foo();
.Pp
.Nm
does not truncate the action.
Actions that are not enclosed in braces
are simply terminated at the end of the line.
.Sh FILES
.Bl -tag -width "<g++/FlexLexer.h>"
.It flex.skl
Skeleton scanner.
This file is only used when building flex, not when
.Nm
executes.
.It lex.backup
Backing-up information for the
.Fl b
flag (called
.Pa lex.bck
on some systems).
.It lex.yy.c
Generated scanner
(called
.Pa lexyy.c
on some systems).
.It lex.yy.cc
Generated C++ scanner class, when using
.Fl + .
.It Aq g++/FlexLexer.h
Header file defining the C++ scanner base class,
.Fa FlexLexer ,
and its derived class,
.Fa yyFlexLexer .
.It /usr/lib/libl.*
.Nm
libraries.
The
.Pa /usr/lib/libfl.*\&
libraries are links to these.
Scanners must be linked using either
.Fl \&ll
or
.Fl lfl .
.El
.Sh EXIT STATUS
.Ex -std flex
.Sh DIAGNOSTICS
.Bl -diag
.It warning, rule cannot be matched
Indicates that the given rule cannot be matched because it follows other rules
that will always match the same text as it.
For example, in the following
.Dq foo
cannot be matched because it comes after an identifier
.Qq catch-all
rule:
.Bd -literal -offset indent
[a-z]+    got_identifier();
foo       got_foo();
.Ed
.Pp
Using
.Em REJECT
in a scanner suppresses this warning.
.It "warning, \-s option given but default rule can be matched"
Means that it is possible
.Pq perhaps only in a particular start condition
that the default rule
.Pq match any single character
is the only one that will match a particular input.
Since
.Fl s
was given, presumably this is not intended.
.It reject_used_but_not_detected undefined
.It yymore_used_but_not_detected undefined
These errors can occur at compile time.
They indicate that the scanner uses
.Em REJECT
or
.Fn yymore
but that
.Nm
failed to notice the fact, meaning that
.Nm
scanned the first two sections looking for occurrences of these actions
and failed to find any, but somehow they snuck in
.Pq via an #include file, for example .
Use
.Dq %option reject
or
.Dq %option yymore
to indicate to
.Nm
that these features are really needed.
.It flex scanner jammed
A scanner compiled with
.Fl s
has encountered an input string which wasn't matched by any of its rules.
This error can also occur due to internal problems.
.It token too large, exceeds YYLMAX
The scanner uses
.Dq %array
and one of its rules matched a string longer than the
.Dv YYLMAX
constant
.Pq 8K bytes by default .
The value can be increased by #define'ing
.Dv YYLMAX
in the definitions section of
.Nm
input.
.It "scanner requires \-8 flag to use the character 'x'"
The scanner specification includes recognizing the 8-bit character
.Sq x
and the
.Fl 8
flag was not specified, and defaulted to 7-bit because the
.Fl Cf
or
.Fl CF
table compression options were used.
See the discussion of the
.Fl 7
flag for details.
.It flex scanner push-back overflow
unput() was used to push back so much text that the scanner's buffer
could not hold both the pushed-back text and the current token in
.Fa yytext .
Ideally the scanner should dynamically resize the buffer in this case,
but at present it does not.
.It "input buffer overflow, can't enlarge buffer because scanner uses REJECT"
The scanner was working on matching an extremely large token and needed
to expand the input buffer.
This doesn't work with scanners that use
.Em REJECT .
.It "fatal flex scanner internal error--end of buffer missed"
This can occur in a scanner which is reentered after a long-jump
has jumped out
.Pq or over
the scanner's activation frame.
Before reentering the scanner, use:
.Pp
.Dl yyrestart(yyin);
.Pp
or, as noted above, switch to using the C++ scanner class.
.It "too many start conditions in <> construct!"
More start conditions than exist were listed in a <> construct
(so at least one of them must have been listed twice).
.El
.Sh SEE ALSO
.Xr awk 1 ,
.Xr sed 1 ,
.Xr yacc 1
.Rs
.%A John Levine
.%A Tony Mason
.%A Doug Brown
.%B Lex & Yacc
.%I O'Reilly and Associates
.%N 2nd edition
.Re
.Rs
.%A Alfred Aho
.%A Ravi Sethi
.%A Jeffrey Ullman
.%B Compilers: Principles, Techniques and Tools
.%I Addison-Wesley
.%D 1986
.%O "Describes the pattern-matching techniques used by flex (deterministic finite automata)"
.Re
.Sh STANDARDS
The
.Nm lex
utility is compliant with the
.St -p1003.1-2008
specification,
though its presence is optional.
.Pp
The flags
.Op Fl 78BbCdFfhIiLloPpSsTVw+? ,
.Op Fl -help ,
and
.Op Fl -version
are extensions to that specification.
.Pp
See also the
.Sx INCOMPATIBILITIES WITH LEX AND POSIX
section, above.
.Sh AUTHORS
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson.
Original version by Jef Poskanzer.
The fast table representation is a partial implementation of a design done by
Van Jacobson.
The implementation was done by Kevin Gong and Vern Paxson.
.Pp
Thanks to the many
.Nm
beta-testers, feedbackers, and contributors, especially Francois Pinard,
Casey Leedom,
Robert Abramovitz,
Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
Neal Becker, Nelson H.F.
Beebe, benson@odi.com,
Karl Berry, Peter A. Bigot, Simon Blanchard,
Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
Brian Clapper, J.T. Conklin,
Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
Daniels, Chris G. Demetriou, Theo de Raadt,
Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
Jan Hajic, Charles Hemphill, NORO Hideo,
Jarkko Hietaniemi, Scott Hofmann,
Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
David Loffredo, Mike Long,
Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
Bengt Martensson, Chris Metcalf,
Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
Richard Ohnemus, Karsten Pahnke,
Sven Panne, Roland Pesch, Walter Pelissero, Gaumond Pierre,
Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
Frederic Raimbault, Pat Rankin, Rick Richardson,
Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
Andreas Scherer, Darrell Schiebel, Raf Schietekat,
Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strvmquist,
Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
Chris Thewalt, Richard M.
Timoney, Jodi Tsai,
Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams,
Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn,
and those whose names have slipped my marginal mail-archiving skills
but whose contributions are appreciated all the
same.
.Pp
Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
distribution headaches.
.Pp
Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
to Benson Margulies and Fred Burke for C++ support;
to Kent Williams and Tom Epperly for C++ class support;
to Ove Ewerlid for support of NUL's;
and to Eric Hughes for support of multiple buffers.
.Pp
This work was primarily done when I was with the Real Time Systems Group
at the Lawrence Berkeley Laboratory in Berkeley, CA.
Many thanks to all there for the support I received.
.Pp
Send comments to
.Aq Mt vern@ee.lbl.gov .
.Sh BUGS
Some trailing context patterns cannot be properly matched and generate
warning messages
.Pq "dangerous trailing context" .
These are patterns where the ending of the first part of the rule
matches the beginning of the second part, such as
.Qq zx*/xy* ,
where the
.Sq x*
matches the
.Sq x
at the beginning of the trailing context.
(Note that the POSIX draft states that the text matched by such patterns
is undefined.)
.Pp
For some trailing context rules, parts which are actually fixed-length are
not recognized as such, leading to the above-mentioned performance loss.
In particular, parts using
.Sq |\&
or
.Sq {n}
(such as
.Qq foo{3} )
are always considered variable-length.
.Pp
Combining trailing context with the special
.Sq |\&
action can result in fixed trailing context being turned into
the more expensive variable trailing context.
For example, this happens in the following:
.Bd -literal -offset indent
%%
abc      |
xyz/def
.Ed
.Pp
Use of
.Fn unput
invalidates yytext and yyleng, unless the
.Dq %array
directive
or the
.Fl l
option has been used.
.Pp
Pattern-matching of NUL's is substantially slower than matching other
characters.
.Pp
Dynamic resizing of the input buffer is slow, as it entails rescanning
all the text matched so far by the current
.Pq generally huge
token.
.Pp
Due to both buffering of input and read-ahead,
it is not possible to intermix calls to
.Aq Pa stdio.h
routines, such as
.Fn getchar ,
with
.Nm
rules and expect it to work.
Call
.Fn input
instead.
.Pp
The total number of table entries listed by the
.Fl v
flag excludes the table entries needed to determine
what rule has been matched.
The number of entries is equal to the number of DFA states
if the scanner does not use
.Em REJECT ,
and somewhat greater than the number of states if it does.
.Pp
.Em REJECT
cannot be used with the
.Fl f
or
.Fl F
options.
.Pp
The
.Nm
internal algorithms need documentation.