1This is flex.info, produced by makeinfo version 6.1 from flex.texi.
2
3The flex manual is placed under the same licensing conditions as the
4rest of flex:
5
6   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
7Project.
8
9   Copyright (C) 1990, 1997 The Regents of the University of California.
10All rights reserved.
11
12   This code is derived from software contributed to Berkeley by Vern
13Paxson.
14
15   The United States Government has rights in this work pursuant to
16contract no.  DE-AC03-76SF00098 between the United States Department of
17Energy and the University of California.
18
19   Redistribution and use in source and binary forms, with or without
20modification, are permitted provided that the following conditions are
21met:
22
23  1. Redistributions of source code must retain the above copyright
24     notice, this list of conditions and the following disclaimer.
25
26  2. Redistributions in binary form must reproduce the above copyright
27     notice, this list of conditions and the following disclaimer in the
28     documentation and/or other materials provided with the
29     distribution.
30
31   Neither the name of the University nor the names of its contributors
32may be used to endorse or promote products derived from this software
33without specific prior written permission.
34
35   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
36WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
37MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
38INFO-DIR-SECTION Programming
39START-INFO-DIR-ENTRY
40* flex: (flex).      Fast lexical analyzer generator (lex replacement).
41END-INFO-DIR-ENTRY
42
43
44File: flex.info,  Node: Top,  Next: Copyright,  Prev: (dir),  Up: (dir)
45
46flex
47****
48
49This manual describes 'flex', a tool for generating programs that
50perform pattern-matching on text.  The manual includes both tutorial and
51reference sections.
52
53   This edition of 'The flex Manual' documents 'flex' version 2.6.4.  It
54was last updated on 6 May 2017.
55
56   This manual was written by Vern Paxson, Will Estes and John Millaway.
57
58* Menu:
59
60* Copyright::
61* Reporting Bugs::
62* Introduction::
63* Simple Examples::
64* Format::
65* Patterns::
66* Matching::
67* Actions::
68* Generated Scanner::
69* Start Conditions::
70* Multiple Input Buffers::
71* EOF::
72* Misc Macros::
73* User Values::
74* Yacc::
75* Scanner Options::
76* Performance::
77* Cxx::
78* Reentrant::
79* Lex and Posix::
80* Memory Management::
81* Serialized Tables::
82* Diagnostics::
83* Limitations::
84* Bibliography::
85* FAQ::
86* Appendices::
87* Indices::
88
89 -- The Detailed Node Listing --
90
91Format of the Input File
92
93* Definitions Section::
94* Rules Section::
95* User Code Section::
96* Comments in the Input::
97
98Scanner Options
99
100* Options for Specifying Filenames::
101* Options Affecting Scanner Behavior::
102* Code-Level And API Options::
103* Options for Scanner Speed and Size::
104* Debugging Options::
105* Miscellaneous Options::
106
107Reentrant C Scanners
108
109* Reentrant Uses::
110* Reentrant Overview::
111* Reentrant Example::
112* Reentrant Detail::
113* Reentrant Functions::
114
115The Reentrant API in Detail
116
117* Specify Reentrant::
118* Extra Reentrant Argument::
119* Global Replacement::
120* Init and Destroy Functions::
121* Accessor Methods::
122* Extra Data::
123* About yyscan_t::
124
125Memory Management
126
127* The Default Memory Management::
128* Overriding The Default Memory Management::
129* A Note About yytext And Memory::
130
131Serialized Tables
132
133* Creating Serialized Tables::
134* Loading and Unloading Serialized Tables::
135* Tables File Format::
136
137FAQ
138
139* When was flex born?::
140* How do I expand backslash-escape sequences in C-style quoted strings?::
141* Why do flex scanners call fileno if it is not ANSI compatible?::
142* Does flex support recursive pattern definitions?::
143* How do I skip huge chunks of input (tens of megabytes) while using flex?::
144* Flex is not matching my patterns in the same order that I defined them.::
145* My actions are executing out of order or sometimes not at all.::
146* How can I have multiple input sources feed into the same scanner at the same time?::
147* Can I build nested parsers that work with the same input file?::
148* How can I match text only at the end of a file?::
149* How can I make REJECT cascade across start condition boundaries?::
150* Why cant I use fast or full tables with interactive mode?::
151* How much faster is -F or -f than -C?::
152* If I have a simple grammar cant I just parse it with flex?::
153* Why doesn't yyrestart() set the start state back to INITIAL?::
154* How can I match C-style comments?::
155* The period isn't working the way I expected.::
156* Can I get the flex manual in another format?::
157* Does there exist a "faster" NDFA->DFA algorithm?::
158* How does flex compile the DFA so quickly?::
159* How can I use more than 8192 rules?::
160* How do I abandon a file in the middle of a scan and switch to a new file?::
161* How do I execute code only during initialization (only before the first scan)?::
162* How do I execute code at termination?::
163* Where else can I find help?::
164* Can I include comments in the "rules" section of the file?::
165* I get an error about undefined yywrap().::
166* How can I change the matching pattern at run time?::
167* How can I expand macros in the input?::
168* How can I build a two-pass scanner?::
169* How do I match any string not matched in the preceding rules?::
170* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
171* Is there a way to make flex treat NULL like a regular character?::
172* Whenever flex can not match the input it says "flex scanner jammed".::
173* Why doesn't flex have non-greedy operators like perl does?::
174* Memory leak - 16386 bytes allocated by malloc.::
175* How do I track the byte offset for lseek()?::
176* How do I use my own I/O classes in a C++ scanner?::
177* How do I skip as many chars as possible?::
178* deleteme00::
179* Are certain equivalent patterns faster than others?::
180* Is backing up a big deal?::
181* Can I fake multi-byte character support?::
182* deleteme01::
183* Can you discuss some flex internals?::
184* unput() messes up yy_at_bol::
185* The | operator is not doing what I want::
186* Why can't flex understand this variable trailing context pattern?::
187* The ^ operator isn't working::
188* Trailing context is getting confused with trailing optional patterns::
189* Is flex GNU or not?::
190* ERASEME53::
191* I need to scan if-then-else blocks and while loops::
192* ERASEME55::
193* ERASEME56::
194* ERASEME57::
195* Is there a repository for flex scanners?::
196* How can I conditionally compile or preprocess my flex input file?::
197* Where can I find grammars for lex and yacc?::
198* I get an end-of-buffer message for each character scanned.::
199* unnamed-faq-62::
200* unnamed-faq-63::
201* unnamed-faq-64::
202* unnamed-faq-65::
203* unnamed-faq-66::
204* unnamed-faq-67::
205* unnamed-faq-68::
206* unnamed-faq-69::
207* unnamed-faq-70::
208* unnamed-faq-71::
209* unnamed-faq-72::
210* unnamed-faq-73::
211* unnamed-faq-74::
212* unnamed-faq-75::
213* unnamed-faq-76::
214* unnamed-faq-77::
215* unnamed-faq-78::
216* unnamed-faq-79::
217* unnamed-faq-80::
218* unnamed-faq-81::
219* unnamed-faq-82::
220* unnamed-faq-83::
221* unnamed-faq-84::
222* unnamed-faq-85::
223* unnamed-faq-86::
224* unnamed-faq-87::
225* unnamed-faq-88::
226* unnamed-faq-90::
227* unnamed-faq-91::
228* unnamed-faq-92::
229* unnamed-faq-93::
230* unnamed-faq-94::
231* unnamed-faq-95::
232* unnamed-faq-96::
233* unnamed-faq-97::
234* unnamed-faq-98::
235* unnamed-faq-99::
236* unnamed-faq-100::
237* unnamed-faq-101::
238* What is the difference between YYLEX_PARAM and YY_DECL?::
239* Why do I get "conflicting types for yylex" error?::
240* How do I access the values set in a Flex action from within a Bison action?::
241
242Appendices
243
244* Makefiles and Flex::
245* Bison Bridge::
246* M4 Dependency::
247* Common Patterns::
248
249Indices
250
251* Concept Index::
252* Index of Functions and Macros::
253* Index of Variables::
254* Index of Data Types::
255* Index of Hooks::
256* Index of Scanner Options::
257
258
259
260File: flex.info,  Node: Copyright,  Next: Reporting Bugs,  Prev: Top,  Up: Top
261
2621 Copyright
263***********
264
265The flex manual is placed under the same licensing conditions as the
266rest of flex:
267
268   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
269Project.
270
271   Copyright (C) 1990, 1997 The Regents of the University of California.
272All rights reserved.
273
274   This code is derived from software contributed to Berkeley by Vern
275Paxson.
276
277   The United States Government has rights in this work pursuant to
278contract no.  DE-AC03-76SF00098 between the United States Department of
279Energy and the University of California.
280
281   Redistribution and use in source and binary forms, with or without
282modification, are permitted provided that the following conditions are
283met:
284
285  1. Redistributions of source code must retain the above copyright
286     notice, this list of conditions and the following disclaimer.
287
288  2. Redistributions in binary form must reproduce the above copyright
289     notice, this list of conditions and the following disclaimer in the
290     documentation and/or other materials provided with the
291     distribution.
292
293   Neither the name of the University nor the names of its contributors
294may be used to endorse or promote products derived from this software
295without specific prior written permission.
296
297   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
298WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
299MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
300
301
302File: flex.info,  Node: Reporting Bugs,  Next: Introduction,  Prev: Copyright,  Up: Top
303
3042 Reporting Bugs
305****************
306
If you find a bug in 'flex', please report it using GitHub's issue
tracking facility at <https://github.com/westes/flex/issues/>.
309
310
311File: flex.info,  Node: Introduction,  Next: Simple Examples,  Prev: Reporting Bugs,  Up: Top
312
3133 Introduction
314**************
315
316'flex' is a tool for generating "scanners".  A scanner is a program
317which recognizes lexical patterns in text.  The 'flex' program reads the
318given input files, or its standard input if no file names are given, for
319a description of a scanner to generate.  The description is in the form
320of pairs of regular expressions and C code, called "rules".  'flex'
321generates as output a C source file, 'lex.yy.c' by default, which
322defines a routine 'yylex()'.  This file can be compiled and linked with
323the flex runtime library to produce an executable.  When the executable
324is run, it analyzes its input for occurrences of the regular
325expressions.  Whenever it finds one, it executes the corresponding C
326code.
327
328
329File: flex.info,  Node: Simple Examples,  Next: Format,  Prev: Introduction,  Up: Top
330
3314 Some Simple Examples
332**********************
333
334First some simple examples to get the flavor of how one uses 'flex'.
335
   The following 'flex' input specifies a scanner which, when it
encounters the string 'username', will replace it with the user's login
name:
339
         %%
         username    printf( "%s", getlogin() );
342
343   By default, any text not matched by a 'flex' scanner is copied to the
344output, so the net effect of this scanner is to copy its input file to
345its output with each occurrence of 'username' expanded.  In this input,
346there is just one rule.  'username' is the "pattern" and the 'printf' is
347the "action".  The '%%' symbol marks the beginning of the rules.
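
   The net effect of this one-rule scanner can be approximated by a
hand-written C function.  The following is only a sketch and is not the
code that 'flex' generates: the name 'expand_username' is hypothetical,
and the login name is passed in as a parameter rather than obtained from
'getlogin()', so that the behavior is easy to check.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing every occurrence of
   "username" with `login`.  Text that does not match is copied through
   unchanged, which mirrors the scanner's default behavior.  Output is
   truncated if `out` is too small.  Returns the number of bytes written,
   not counting the terminating NUL. */
size_t expand_username(const char *in, const char *login,
                       char *out, size_t outsize)
{
    static const char pattern[] = "username";
    const size_t patlen = sizeof pattern - 1;
    const size_t loginlen = strlen(login);
    size_t n = 0;

    assert(outsize > 0);
    while (*in != '\0') {
        if (strncmp(in, pattern, patlen) == 0) {
            /* the "action": emit the login name instead of the match */
            if (n + loginlen >= outsize)
                break;
            memcpy(out + n, login, loginlen);
            n += loginlen;
            in += patlen;
        } else {
            /* the default rule: copy one unmatched character */
            if (n + 1 >= outsize)
                break;
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

   Calling 'expand_username' on the text 'hi username!' with the login
name 'bob' leaves 'hi bob!' in the output buffer.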
348
349   Here's another simple example:
350
                 int num_lines = 0, num_chars = 0;

         %%
         \n      ++num_lines; ++num_chars;
         .       ++num_chars;

         %%

         int main()
                 {
                 yylex();
                 printf( "# of lines = %d, # of chars = %d\n",
                         num_lines, num_chars );
                 }
365
366   This scanner counts the number of characters and the number of lines
367in its input.  It produces no output other than the final report on the
368character and line counts.  The first line declares two globals,
369'num_lines' and 'num_chars', which are accessible both inside 'yylex()'
370and in the 'main()' routine declared after the second '%%'.  There are
371two rules, one which matches a newline ('\n') and increments both the
372line count and the character count, and one which matches any character
373other than a newline (indicated by the '.' regular expression).
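
   For comparison, the same counts can be computed by a plain C loop.
This is a sketch of what the two rules add up to, not the scanner that
'flex' emits; the names 'struct counts' and 'count_input' are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct counts {
    int lines;   /* corresponds to num_lines */
    int chars;   /* corresponds to num_chars */
};

/* Count newlines and characters in a NUL-terminated string. */
struct counts count_input(const char *text)
{
    struct counts c = {0, 0};

    assert(text != NULL);
    for (; *text != '\0'; ++text) {
        if (*text == '\n')
            ++c.lines;   /* the \n rule also counts the line */
        ++c.chars;       /* both rules count the character */
    }
    return c;
}
```

   Every input character is matched by exactly one of the two rules, so
the character count always equals the input length.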
374
375   A somewhat more complicated example:
376
         /* scanner for a toy Pascal-like language */

         %{
         /* need this for the calls to atoi() and atof() below */
         #include <stdlib.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

         %%

         {DIGIT}+    {
                     printf( "An integer: %s (%d)\n", yytext,
                             atoi( yytext ) );
                     }

         {DIGIT}+"."{DIGIT}*        {
                     printf( "A float: %s (%g)\n", yytext,
                             atof( yytext ) );
                     }

         if|then|begin|end|procedure|function        {
                     printf( "A keyword: %s\n", yytext );
                     }

         {ID}        printf( "An identifier: %s\n", yytext );

         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

         "{"[^{}\n]*"}"     /* eat up one-line comments */

         [ \t\n]+          /* eat up whitespace */

         .           printf( "Unrecognized character: %s\n", yytext );

         %%

         int main( int argc, char **argv )
             {
             ++argv, --argc;  /* skip over program name */
             if ( argc > 0 )
                     yyin = fopen( argv[0], "r" );
             else
                     yyin = stdin;

             yylex();
             }
425
   This is the beginning of a simple scanner for a language like
Pascal.  It identifies different types of "tokens" and reports on what
it has seen.
429
430   The details of this example will be explained in the following
431sections.
432
433
434File: flex.info,  Node: Format,  Next: Patterns,  Prev: Simple Examples,  Up: Top
435
4365 Format of the Input File
437**************************
438
439The 'flex' input file consists of three sections, separated by a line
440containing only '%%'.
441
442         definitions
443         %%
444         rules
445         %%
446         user code
447
448* Menu:
449
450* Definitions Section::
451* Rules Section::
452* User Code Section::
453* Comments in the Input::
454
455
456File: flex.info,  Node: Definitions Section,  Next: Rules Section,  Prev: Format,  Up: Format
457
4585.1 Format of the Definitions Section
459=====================================
460
461The "definitions section" contains declarations of simple "name"
462definitions to simplify the scanner specification, and declarations of
463"start conditions", which are explained in a later section.
464
465   Name definitions have the form:
466
467         name definition
468
469   The 'name' is a word beginning with a letter or an underscore ('_')
470followed by zero or more letters, digits, '_', or '-' (dash).  The
471definition is taken to begin at the first non-whitespace character
472following the name and continuing to the end of the line.  The
473definition can subsequently be referred to using '{name}', which will
474expand to '(definition)'.  For example,
475
476         DIGIT    [0-9]
477         ID       [a-z][a-z0-9]*
478
479   Defines 'DIGIT' to be a regular expression which matches a single
480digit, and 'ID' to be a regular expression which matches a letter
481followed by zero-or-more letters-or-digits.  A subsequent reference to
482
483         {DIGIT}+"."{DIGIT}*
484
485   is identical to
486
487         ([0-9])+"."([0-9])*
488
489   and matches one-or-more digits followed by a '.' followed by
490zero-or-more digits.
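
   The substitution described above, where '{name}' expands to
'(definition)', can be sketched as a small C function.  This is an
illustration under simplifying assumptions, not how 'flex' implements
it: 'struct name_def' and 'expand_names' are hypothetical, and the
sketch does not treat braces inside quoted strings or character classes
specially.

```c
#include <assert.h>
#include <string.h>

struct name_def {
    const char *name;
    const char *definition;
};

/* Copy `pat` to `out`, replacing each {name} found in `defs` with
   (definition).  Anything else, including repeat counts such as {2,5},
   is copied unchanged.  Output is truncated if `out` is too small. */
size_t expand_names(const char *pat, const struct name_def *defs,
                    size_t ndefs, char *out, size_t outsize)
{
    size_t n = 0;

    assert(outsize > 0);
    while (*pat != '\0') {
        const char *close = (*pat == '{') ? strchr(pat, '}') : NULL;
        const struct name_def *hit = NULL;

        if (close != NULL) {
            size_t len = (size_t)(close - pat - 1);
            for (size_t i = 0; i < ndefs; ++i) {
                if (strlen(defs[i].name) == len &&
                    strncmp(defs[i].name, pat + 1, len) == 0) {
                    hit = &defs[i];
                    break;
                }
            }
        }
        if (hit != NULL) {
            size_t deflen = strlen(hit->definition);
            if (n + deflen + 2 >= outsize)
                break;
            out[n++] = '(';               /* parentheses keep precedence */
            memcpy(out + n, hit->definition, deflen);
            n += deflen;
            out[n++] = ')';
            pat = close + 1;
        } else {
            if (n + 1 >= outsize)
                break;
            out[n++] = *pat++;
        }
    }
    out[n] = '\0';
    return n;
}
```

   The added parentheses are what make '{DIGIT}+' mean "one or more
digits" even when the definition itself contains operators of lower
precedence.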
491
492   An unindented comment (i.e., a line beginning with '/*') is copied
493verbatim to the output up to the next '*/'.
494
495   Any _indented_ text or text enclosed in '%{' and '%}' is also copied
496verbatim to the output (with the %{ and %} symbols removed).  The %{ and
497%} symbols must appear unindented on lines by themselves.
498
   A '%top' block is similar to a '%{' ...  '%}' block, except that the
code in a '%top' block is relocated to the _top_ of the generated file,
before any flex definitions (1).  The '%top' block is useful when you
want certain preprocessor macros to be defined or certain files to be
included before the generated code.  The single characters '{' and '}'
are used to delimit the '%top' block, as shown in the example below:
505
         %top{
             /* This code goes at the "top" of the generated file. */
             #include <stdint.h>
             #include <inttypes.h>
         }
511
512   Multiple '%top' blocks are allowed, and their order is preserved.
513
514   ---------- Footnotes ----------
515
516   (1) Actually, 'yyIN_HEADER' is defined before the '%top' block.
517
518
519File: flex.info,  Node: Rules Section,  Next: User Code Section,  Prev: Definitions Section,  Up: Format
520
5215.2 Format of the Rules Section
522===============================
523
524The "rules" section of the 'flex' input contains a series of rules of
525the form:
526
527         pattern   action
528
529   where the pattern must be unindented and the action must begin on the
530same line.  *Note Patterns::, for a further description of patterns and
531actions.
532
533   In the rules section, any indented or %{ %} enclosed text appearing
534before the first rule may be used to declare variables which are local
535to the scanning routine and (after the declarations) code which is to be
536executed whenever the scanning routine is entered.  Other indented or %{
537%} text in the rule section is still copied to the output, but its
538meaning is not well-defined and it may well cause compile-time errors
539(this feature is present for POSIX compliance.  *Note Lex and Posix::,
540for other such features).
541
542   Any _indented_ text or text enclosed in '%{' and '%}' is copied
543verbatim to the output (with the %{ and %} symbols removed).  The %{ and
544%} symbols must appear unindented on lines by themselves.
545
546
547File: flex.info,  Node: User Code Section,  Next: Comments in the Input,  Prev: Rules Section,  Up: Format
548
5495.3 Format of the User Code Section
550===================================
551
552The user code section is simply copied to 'lex.yy.c' verbatim.  It is
553used for companion routines which call or are called by the scanner.
554The presence of this section is optional; if it is missing, the second
555'%%' in the input file may be skipped, too.
556
557
558File: flex.info,  Node: Comments in the Input,  Prev: User Code Section,  Up: Format
559
5605.4 Comments in the Input
561=========================
562
563Flex supports C-style comments, that is, anything between '/*' and '*/'
564is considered a comment.  Whenever flex encounters a comment, it copies
565the entire comment verbatim to the generated source code.  Comments may
566appear just about anywhere, but with the following exceptions:
567
568   * Comments may not appear in the Rules Section wherever flex is
569     expecting a regular expression.  This means comments may not appear
570     at the beginning of a line, or immediately following a list of
571     scanner states.
572   * Comments may not appear on an '%option' line in the Definitions
573     Section.
574
   If you want to follow a simple rule, then always begin a comment on a
new line, with one or more whitespace characters before the initial
'/*'.  This rule will work anywhere in the input file.
578
579   All the comments in the following example are valid:
580
     %{
     /* code block */
     %}

     /* Definitions Section */
     %x STATE_X

     %%
         /* Rules Section */
     ruleA   /* after regex */ { /* code block */ } /* after code block */
             /* Rules Section (indented) */
     <STATE_X>{
     ruleC   ECHO;
     ruleD   ECHO;
     %{
     /* code block */
     %}
     }
     %%
     /* User Code Section */
601
602
603
604File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top
605
6066 Patterns
607**********
608
609The patterns in the input (see *note Rules Section::) are written using
610an extended set of regular expressions.  These are:
611
612'x'
613     match the character 'x'
614
615'.'
616     any character (byte) except newline
617
618'[xyz]'
619     a "character class"; in this case, the pattern matches either an
620     'x', a 'y', or a 'z'
621
622'[abj-oZ]'
623     a "character class" with a range in it; matches an 'a', a 'b', any
624     letter from 'j' through 'o', or a 'Z'
625
626'[^A-Z]'
627     a "negated character class", i.e., any character but those in the
628     class.  In this case, any character EXCEPT an uppercase letter.
629
630'[^A-Z\n]'
631     any character EXCEPT an uppercase letter or a newline
632
633'[a-z]{-}[aeiou]'
634     the lowercase consonants
635
636'r*'
637     zero or more r's, where r is any regular expression
638
639'r+'
640     one or more r's
641
642'r?'
643     zero or one r's (that is, "an optional r")
644
645'r{2,5}'
646     anywhere from two to five r's
647
648'r{2,}'
649     two or more r's
650
651'r{4}'
652     exactly 4 r's
653
654'{name}'
655     the expansion of the 'name' definition (*note Format::).
656
657'"[xyz]\"foo"'
658     the literal string: '[xyz]"foo'
659
660'\X'
661     if X is 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C
662     interpretation of '\x'.  Otherwise, a literal 'X' (used to escape
663     operators such as '*')
664
665'\0'
666     a NUL character (ASCII code 0)
667
668'\123'
669     the character with octal value 123
670
671'\x2a'
672     the character with hexadecimal value 2a
673
674'(r)'
675     match an 'r'; parentheses are used to override precedence (see
676     below)
677
678'(?r-s:pattern)'
679     apply option 'r' and omit option 's' while interpreting pattern.
680     Options may be zero or more of the characters 'i', 's', or 'x'.
681
682     'i' means case-insensitive.  '-i' means case-sensitive.
683
684     's' alters the meaning of the '.' syntax to match any single byte
685     whatsoever.  '-s' alters the meaning of '.' to match any byte
686     except '\n'.
687
688     'x' ignores comments and whitespace in patterns.  Whitespace is
689     ignored unless it is backslash-escaped, contained within '""'s, or
690     appears inside a character class.
691
692     The following are all valid:
693
694     (?:foo)         same as  (foo)
695     (?i:ab7)        same as  ([aA][bB]7)
696     (?-i:ab)        same as  (ab)
697     (?s:.)          same as  [\x00-\xFF]
698     (?-s:.)         same as  [^\n]
699     (?ix-s: a . b)  same as  ([Aa][^\n][bB])
700     (?x:a  b)       same as  ("ab")
701     (?x:a\ b)       same as  ("a b")
702     (?x:a" "b)      same as  ("a b")
703     (?x:a[ ]b)      same as  ("a b")
704     (?x:a
705         /* comment */
706         b
707         c)          same as  (abc)
708
709'(?# comment )'
     omit everything within '()'.  The first ')' character encountered
     ends the pattern.  It is not possible for the comment to contain a
     ')' character.  The comment may span lines.
713
714'rs'
715     the regular expression 'r' followed by the regular expression 's';
716     called "concatenation"
717
718'r|s'
719     either an 'r' or an 's'
720
721'r/s'
722     an 'r' but only if it is followed by an 's'.  The text matched by
723     's' is included when determining whether this rule is the longest
724     match, but is then returned to the input before the action is
725     executed.  So the action only sees the text matched by 'r'.  This
726     type of pattern is called "trailing context".  (There are some
727     combinations of 'r/s' that flex cannot match correctly.  *Note
728     Limitations::, regarding dangerous trailing context.)
729
730'^r'
731     an 'r', but only at the beginning of a line (i.e., when just
732     starting to scan, or right after a newline has been scanned).
733
734'r$'
735     an 'r', but only at the end of a line (i.e., just before a
736     newline).  Equivalent to 'r/\n'.
737
738     Note that 'flex''s notion of "newline" is exactly whatever the C
739     compiler used to compile 'flex' interprets '\n' as; in particular,
740     on some DOS systems you must either filter out '\r's in the input
741     yourself, or explicitly use 'r/\r\n' for 'r$'.
742
743'<s>r'
744     an 'r', but only in start condition 's' (see *note Start
745     Conditions:: for discussion of start conditions).
746
747'<s1,s2,s3>r'
748     same, but in any of start conditions 's1', 's2', or 's3'.
749
750'<*>r'
751     an 'r' in any start condition, even an exclusive one.
752
753'<<EOF>>'
754     an end-of-file.
755
756'<s1,s2><<EOF>>'
757     an end-of-file when in start condition 's1' or 's2'
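
   The backslash escapes in the table above ('\X', '\0', '\123' and
'\x2a') follow the C conventions, and the decoding can be sketched in a
few lines of C.  The function 'decode_escape' is hypothetical and
assumes a well-formed escape; it does not limit the number of digits it
reads and is not taken from the 'flex' sources.

```c
#include <assert.h>
#include <stdlib.h>

/* Decode one escape sequence; `esc` points at the backslash.
   Returns the character code the escape stands for. */
int decode_escape(const char *esc)
{
    assert(esc != NULL && esc[0] == '\\' && esc[1] != '\0');
    switch (esc[1]) {
    case 'a': return '\a';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    case 'v': return '\v';
    case 'x':                                   /* \x2a: hexadecimal */
        return (int)strtol(esc + 2, NULL, 16);
    default:
        if (esc[1] >= '0' && esc[1] <= '7')     /* \0, \123: octal */
            return (int)strtol(esc + 1, NULL, 8);
        return (unsigned char)esc[1];           /* \*: a literal '*' */
    }
}
```

   Octal 123 is decimal 83 and hexadecimal 2a is decimal 42, which in
ASCII are the characters 'S' and '*' respectively.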
758
   Note that inside of a character class, all regular expression
operators lose their special meaning except escape ('\') and the
character class operators, '-', ']', and, at the beginning of the
class, '^'.
763
764   The regular expressions listed above are grouped according to
765precedence, from highest precedence at the top to lowest at the bottom.
766Those grouped together have equal precedence (see special note on the
767precedence of the repeat operator, '{}', under the documentation for the
768'--posix' POSIX compliance option).  For example,
769
770         foo|bar*
771
772   is the same as
773
774         (foo)|(ba(r*))
775
776   since the '*' operator has higher precedence than concatenation, and
777concatenation higher than alternation ('|').  This pattern therefore
778matches _either_ the string 'foo' _or_ the string 'ba' followed by
779zero-or-more 'r''s.  To match 'foo' or zero-or-more repetitions of the
780string 'bar', use:
781
782         foo|(bar)*
783
784   And to match a sequence of zero or more repetitions of 'foo' and
785'bar':
786
787         (foo|bar)*
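
   The three groupings above can be checked experimentally.  The sketch
below uses the POSIX 'regcomp()' and 'regexec()' functions rather than
'flex' itself, on the assumption that POSIX extended regular expressions
give '*', concatenation and '|' the same relative precedence; the helper
name 'matches' is hypothetical, and the patterns are wrapped in '^(' and
')$' so that the whole string must match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text` matches the POSIX extended regular expression
   `pattern`, 0 if it does not or if the pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

   With this helper, '^(foo|bar*)$' accepts 'foo', 'ba' and 'barrr' but
rejects 'barbar', whereas '^(foo|(bar)*)$' accepts 'barbar' and rejects
'ba', and '^((foo|bar)*)$' accepts 'foobarfoo'.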
788
   In addition to characters and ranges of characters, character classes
can also contain "character class expressions".  These are expressions
enclosed inside '[:' and ':]' delimiters, which themselves must appear
between the '[' and ']' of the character class; other elements may occur
inside the character class, too.  The valid expressions are:
794
795         [:alnum:] [:alpha:] [:blank:]
796         [:cntrl:] [:digit:] [:graph:]
797         [:lower:] [:print:] [:punct:]
798         [:space:] [:upper:] [:xdigit:]
799
800   These expressions all designate a set of characters equivalent to the
801corresponding standard C 'isXXX' function.  For example, '[:alnum:]'
802designates those characters for which 'isalnum()' returns true - i.e.,
803any alphabetic or numeric character.  Some systems don't provide
804'isblank()', so flex defines '[:blank:]' as a blank or a tab.
805
806   For example, the following character classes are all equivalent:
807
808         [[:alnum:]]
809         [[:alpha:][:digit:]]
810         [[:alpha:][0-9]]
811         [a-zA-Z0-9]
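
   The equivalence of these four classes can be verified with the C
'isXXX' functions they are defined in terms of.  The sketch below
assumes the "C" locale and an ASCII character set and looks only at the
7-bit character codes; the helper names 'class_add' and
'class_add_range' are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Add to `set` every 7-bit character code accepted by `pred`,
   for example isalnum for [:alnum:]. */
void class_add(int (*pred)(int), unsigned char set[128])
{
    assert(pred != NULL);
    for (int c = 0; c < 128; ++c)
        if (pred(c))
            set[c] = 1;
}

/* Add the inclusive range lo..hi to `set`, as in [lo-hi]. */
void class_add_range(int lo, int hi, unsigned char set[128])
{
    assert(0 <= lo && lo <= hi && hi < 128);
    for (int c = lo; c <= hi; ++c)
        set[c] = 1;
}
```

   Building one set with 'class_add(isalnum, ...)' and another from the
ranges 'a'-'z', 'A'-'Z' and '0'-'9' yields identical arrays under the
stated assumptions.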
812
813   A word of caution.  Character classes are expanded immediately when
814seen in the 'flex' input.  This means the character classes are
815sensitive to the locale in which 'flex' is executed, and the resulting
816scanner will not be sensitive to the runtime locale.  This may or may
817not be desirable.
818
819   * If your scanner is case-insensitive (the '-i' flag), then
820     '[:upper:]' and '[:lower:]' are equivalent to '[:alpha:]'.
821
822   * Character classes with ranges, such as '[a-Z]', should be used with
823     caution in a case-insensitive scanner if the range spans upper or
824     lowercase characters.  Flex does not know if you want to fold all
825     upper and lowercase characters together, or if you want the literal
826     numeric range specified (with no case folding).  When in doubt,
827     flex will assume that you meant the literal numeric range, and will
828     issue a warning.  The exception to this rule is a character range
829     such as '[a-z]' or '[S-W]' where it is obvious that you want
830     case-folding to occur.  Here are some examples with the '-i' flag
831     enabled:
832
833     Range        Result      Literal Range        Alternate Range
834     '[a-t]'      ok          '[a-tA-T]'
835     '[A-T]'      ok          '[a-tA-T]'
836     '[A-t]'      ambiguous   '[A-Z\[\\\]_`a-t]'   '[a-tA-T]'
837     '[_-{]'      ambiguous   '[_`a-z{]'           '[_`a-zA-Z{]'
838     '[@-C]'      ambiguous   '[@ABC]'             '[@A-Z\[\\\]_`abc]'
839
840   * A negated character class such as the example '[^A-Z]' above _will_
841     match a newline unless '\n' (or an equivalent escape sequence) is
842     one of the characters explicitly present in the negated character
843     class (e.g., '[^A-Z\n]').  This is unlike how many other regular
844     expression tools treat negated character classes, but unfortunately
845     the inconsistency is historically entrenched.  Matching newlines
846     means that a pattern like '[^"]*' can match the entire input unless
847     there's another quote in the input.
848
849     Flex allows negation of character class expressions by prepending
850     '^' to the POSIX character class name.
851
852              [:^alnum:] [:^alpha:] [:^blank:]
853              [:^cntrl:] [:^digit:] [:^graph:]
854              [:^lower:] [:^print:] [:^punct:]
855              [:^space:] [:^upper:] [:^xdigit:]
856
857     Flex will issue a warning if the expressions '[:^upper:]' and
858     '[:^lower:]' appear in a case-insensitive scanner, since their
859     meaning is unclear.  The current behavior is to skip them entirely,
860     but this may change without notice in future revisions of flex.
861
   * The '{-}' operator computes the difference of two character
     classes.  For example, '[a-c]{-}[b-z]' represents all the
     characters in the class '[a-c]' that are not in the class '[b-z]'
     (which, in this case, is just the single character 'a').  The '{-}'
     operator is left associative, so '[abc]{-}[b]{-}[c]' is the same as
     '[a]'.  Be careful not to accidentally create an empty set, which
     will never match.
870
   * The '{+}' operator computes the union of two character classes.
     For example, '[a-z]{+}[0-9]' is the same as '[a-z0-9]'.  This
     operator is useful when preceded by the result of a difference
     operation, as in '[[:alpha:]]{-}[[:lower:]]{+}[q]', which is
     equivalent to '[A-Zq]' in the "C" locale.
877
   * A rule can have at most one instance of trailing context (the '/'
     operator or the '$' operator).  The start condition, '^', and
     '<<EOF>>' patterns can only occur at the beginning of a pattern
     and, like '/' and '$', cannot be grouped inside parentheses.  A '^'
     which does not occur at the beginning of a rule or a '$' which does
     not occur at the end of a rule loses its special properties and is
     treated as a normal character.
885
886   * The following are invalid:
887
888              foo/bar$
889              <sc1>foo<sc2>bar
890
891     Note that the first of these can be written 'foo/bar\n'.
892
893   * The following will result in '$' or '^' being treated as a normal
894     character:
895
896              foo|(bar$)
897              foo|^bar
898
899     If the desired meaning is a 'foo' or a 'bar'-followed-by-a-newline,
900     the following could be used (the special '|' action is explained
901     below, *note Actions::):
902
903              foo      |
904              bar$     /* action goes here */
905
906     A similar trick will work for matching a 'foo' or a
907     'bar'-at-the-beginning-of-a-line.
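   The '{-}' and '{+}' character class operators described earlier in
this list behave like set difference and set union.  The following C
sketch models a class as one membership flag per byte value; the type
'ccl' and the 'ccl_*' function names are invented for this illustration
and are not part of flex, whose internal representation may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A character class modeled as one membership flag per byte value.
 * Illustrative only; this is not flex's internal representation. */
typedef struct { bool member[256]; } ccl;

/* Build the class [lo-hi]. */
ccl ccl_range(unsigned char lo, unsigned char hi)
{
    ccl c;
    assert(lo <= hi);
    memset(&c, 0, sizeof c);
    for (int ch = lo; ch <= hi; ++ch)
        c.member[ch] = true;
    return c;
}

/* a{-}b: the characters of a that are not in b. */
ccl ccl_diff(ccl a, ccl b)
{
    for (int ch = 0; ch < 256; ++ch)
        a.member[ch] = a.member[ch] && !b.member[ch];
    return a;
}

/* a{+}b: the characters that are in a or in b. */
ccl ccl_union(ccl a, ccl b)
{
    for (int ch = 0; ch < 256; ++ch)
        a.member[ch] = a.member[ch] || b.member[ch];
    return a;
}

/* Number of characters in the class; 0 means it can never match. */
int ccl_count(ccl c)
{
    int n = 0;
    for (int ch = 0; ch < 256; ++ch)
        n += c.member[ch] ? 1 : 0;
    return n;
}
```

   Applying 'ccl_diff' from left to right mirrors the left
associativity of '{-}', and a result whose 'ccl_count' is zero
corresponds to the empty set that the text warns about.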
908
909
910File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top
911
9127 How the Input Is Matched
913**************************
914
915When the generated scanner is run, it analyzes its input looking for
916strings which match any of its patterns.  If it finds more than one
917match, it takes the one matching the most text (for trailing context
918rules, this includes the length of the trailing part, even though it
919will then be returned to the input).  If it finds two or more matches of
920the same length, the rule listed first in the 'flex' input file is
921chosen.
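
   The selection policy just described (longest match wins, ties go to
the rule listed first) can be sketched in C.  This is only a model
restricted to literal patterns; 'literal_match' and 'choose_rule' are
hypothetical helpers, and the generated scanner uses tables rather than
a loop like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the match of a literal rule at the start of input, or 0. */
size_t literal_match(const char *rule, const char *input)
{
    size_t n = strlen(rule);
    return strncmp(rule, input, n) == 0 ? n : 0;
}

/* Return the index of the rule chosen by the policy in the text:
 * the longest match wins; on equal lengths the earlier rule wins.
 * Returns -1 when no rule matches. */
int choose_rule(const char *const rules[], size_t nrules, const char *input)
{
    int best = -1;
    size_t best_len = 0;

    assert(rules != NULL && input != NULL);
    for (size_t i = 0; i < nrules; ++i) {
        size_t len = literal_match(rules[i], input);
        /* Strict '>' keeps the earlier rule when lengths are equal. */
        if (len > best_len) {
            best_len = len;
            best = (int)i;
        }
    }
    return best;
}
```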
922
923   Once the match is determined, the text corresponding to the match
924(called the "token") is made available in the global character pointer
925'yytext', and its length in the global integer 'yyleng'.  The "action"
926corresponding to the matched pattern is then executed (*note Actions::),
927and then the remaining input is scanned for another match.
928
929   If no match is found, then the "default rule" is executed: the next
930character in the input is considered matched and copied to the standard
931output.  Thus, the simplest valid 'flex' input is:
932
933         %%
934
935   which generates a scanner that simply copies its input (one character
936at a time) to its output.
937
938   Note that 'yytext' can be defined in two different ways: either as a
939character _pointer_ or as a character _array_.  You can control which
940definition 'flex' uses by including one of the special directives
941'%pointer' or '%array' in the first (definitions) section of your flex
942input.  The default is '%pointer', unless you use the '-l' lex
943compatibility option, in which case 'yytext' will be an array.  The
944advantage of using '%pointer' is substantially faster scanning and no
945buffer overflow when matching very large tokens (unless you run out of
946dynamic memory).  The disadvantage is that you are restricted in how
947your actions can modify 'yytext' (*note Actions::), and calls to the
948'unput()' function destroy the present contents of 'yytext', which can
949be a considerable porting headache when moving between different 'lex'
950versions.
951
952   The advantage of '%array' is that you can then modify 'yytext' to
953your heart's content, and calls to 'unput()' do not destroy 'yytext'
954(*note Actions::).  Furthermore, existing 'lex' programs sometimes
955access 'yytext' externally using declarations of the form:
956
957         extern char yytext[];
958
959   This definition is erroneous when used with '%pointer', but correct
960for '%array'.
961
962   The '%array' declaration defines 'yytext' to be an array of 'YYLMAX'
963characters, which defaults to a fairly large value.  You can change the
964size by simply #define'ing 'YYLMAX' to a different value in the first
965section of your 'flex' input.  As mentioned above, with '%pointer'
966'yytext' grows dynamically to accommodate large tokens.  While this means
967your '%pointer' scanner can accommodate very large tokens (such as
968matching entire blocks of comments), bear in mind that each time the
969scanner must resize 'yytext' it also must rescan the entire token from
970the beginning, so matching such tokens can prove slow.  'yytext'
971presently does _not_ dynamically grow if a call to 'unput()' results in
972too much text being pushed back; instead, a run-time error results.
973
974   Also note that you cannot use '%array' with C++ scanner classes
975(*note Cxx::).
976
977
978File: flex.info,  Node: Actions,  Next: Generated Scanner,  Prev: Matching,  Up: Top
979
9808 Actions
981*********
982
983Each pattern in a rule has a corresponding "action", which can be any
984arbitrary C statement.  The pattern ends at the first non-escaped
985whitespace character; the remainder of the line is its action.  If the
986action is empty, then when the pattern is matched the input token is
987simply discarded.  For example, here is the specification for a program
988which deletes all occurrences of 'zap me' from its input:
989
990         %%
991         "zap me"
992
993   This example will copy all other characters in the input to the
994output since they will be matched by the default rule.
995
996   Here is a program which compresses multiple blanks and tabs down to a
997single blank, and throws away whitespace found at the end of a line:
998
999         %%
1000         [ \t]+        putchar( ' ' );
1001         [ \t]+$       /* ignore this token */
1002
1003   If the action contains a '{', then the action spans till the
1004balancing '}' is found, and the action may cross multiple lines.  'flex'
1005knows about C strings and comments and won't be fooled by braces found
1006within them, but also allows actions to begin with '%{' and will
1007consider the action to be all the text up to the next '%}' (regardless
1008of ordinary braces inside the action).
1009
1010   An action consisting solely of a vertical bar ('|') means "same as
1011the action for the next rule".  See below for an illustration.
1012
1013   Actions can include arbitrary C code, including 'return' statements
1014to return a value to whatever routine called 'yylex()'.  Each time
1015'yylex()' is called it continues processing tokens from where it last
1016left off until it either reaches the end of the file or executes a
1017return.
1018
1019   Actions are free to modify 'yytext' except for lengthening it (adding
1020characters to its end; these will overwrite later characters in the input
1021stream).  This restriction does not apply when using '%array' (*note
1022Matching::).  In that case, 'yytext' may be freely modified in any way.
1023
1024   Actions are free to modify 'yyleng' except they should not do so if
1025the action also includes use of 'yymore()' (see below).
1026
1027   There are a number of special directives which can be included within
1028an action:
1029
1030'ECHO'
1031     copies yytext to the scanner's output.
1032
1033'BEGIN'
1034     followed by the name of a start condition places the scanner in the
1035     corresponding start condition (see below).
1036
1037'REJECT'
1038     directs the scanner to proceed on to the "second best" rule which
1039     matched the input (or a prefix of the input).  The rule is chosen
1040     as described above in *note Matching::, and 'yytext' and 'yyleng'
1041     set up appropriately.  It may either be one which matched as much
1042     text as the originally chosen rule but came later in the 'flex'
1043     input file, or one which matched less text.  For example, the
1044     following will both count the words in the input and call the
1045     routine 'special()' whenever 'frob' is seen:
1046
1047                      int word_count = 0;
1048              %%
1049
1050              frob        special(); REJECT;
1051              [^ \t\n]+   ++word_count;
1052
1053     Without the 'REJECT', any occurrences of 'frob' in the input would
1054     not be counted as words, since the scanner normally executes only
1055     one action per token.  Multiple uses of 'REJECT' are allowed, each
1056     one finding the next best choice to the currently active rule.  For
1057     example, when the following scanner scans the token 'abcd', it will
1058     write 'abcdabcaba' to the output:
1059
1060              %%
1061              a        |
1062              ab       |
1063              abc      |
1064              abcd     ECHO; REJECT;
1065              .|\n     /* eat up any unmatched character */
1066
1067     The first three rules share the fourth's action since they use the
1068     special '|' action.
1069
1070     'REJECT' is a particularly expensive feature in terms of scanner
1071     performance; if it is used in _any_ of the scanner's actions it
1072     will slow down _all_ of the scanner's matching.  Furthermore,
1073     'REJECT' cannot be used with the '-Cf' or '-CF' options (*note
1074     Scanner Options::).
1075
1076     Note also that unlike the other special actions, 'REJECT' is a
1077     _branch_.  Code immediately following it in the action will _not_
1078     be executed.
1079
1080'yymore()'
1081     tells the scanner that the next time it matches a rule, the
1082     corresponding token should be _appended_ onto the current value of
1083     'yytext' rather than replacing it.  For example, given the input
1084     'mega-kludge' the following will write 'mega-mega-kludge' to the
1085     output:
1086
1087              %%
1088              mega-    ECHO; yymore();
1089              kludge   ECHO;
1090
1091     First 'mega-' is matched and echoed to the output.  Then 'kludge'
1092     is matched, but the previous 'mega-' is still hanging around at the
1093     beginning of 'yytext' so the 'ECHO' for the 'kludge' rule will
1094     actually write 'mega-kludge'.
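
   Returning to the 'REJECT' example with the rules 'a', 'ab', 'abc'
and 'abcd': the order in which the alternatives are tried can be modeled
in plain C.  The function 'scan_with_reject' below is a hypothetical
stand-in written for this illustration; it hard-codes the four literal
rules and relies on their being listed from shortest to longest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the 'abcd' REJECT example using literal rules only.
 * Each of the four rules acts as "ECHO; REJECT;".  The catch-all
 * rule that follows them silently consumes one character. */
void scan_with_reject(const char *input, char *out, size_t outsz)
{
    static const char *const rules[] = { "a", "ab", "abc", "abcd" };
    const size_t nrules = sizeof rules / sizeof rules[0];
    size_t used = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (const char *p = input; *p != '\0'; ++p) {
        /* Walk the rules from the longest to the shortest: every
         * REJECT falls through to the next best (shorter) match. */
        for (size_t k = nrules; k-- > 0; ) {
            size_t len = strlen(rules[k]);
            /* The echo is skipped if the output buffer is full. */
            if (strncmp(rules[k], p, len) == 0 && used + len < outsz) {
                memcpy(out + used, p, len);    /* ECHO */
                used += len;
                out[used] = '\0';
            }                                  /* REJECT: keep going */
        }
        /* All alternatives rejected: the catch-all eats *p. */
    }
}
```

   For the input 'abcd' the model produces 'abcdabcaba', the same
output the text gives for the real scanner.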
1095
1096   Two notes regarding use of 'yymore()'.  First, 'yymore()' depends on
1097the value of 'yyleng' correctly reflecting the size of the current
1098token, so you must not modify 'yyleng' if you are using 'yymore()'.
1099Second, the presence of 'yymore()' in the scanner's action entails a
1100minor performance penalty in the scanner's matching speed.
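
   The append behavior of 'yymore()', and its reliance on 'yyleng', can
be modeled as follows.  The structure 'token_buf' and the functions
'set_match' and 'request_more' are assumptions made for this sketch;
they are not how the generated scanner is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal model of a token buffer with yymore() semantics. */
struct token_buf {
    char   text[64];   /* stands in for yytext */
    size_t leng;       /* stands in for yyleng */
    int    more;       /* set when the model's yymore() was called */
};

/* Model of yymore(): ask for the next match to be appended. */
void request_more(struct token_buf *t)
{
    t->more = 1;
}

/* Called when a rule matches.  If the previous action requested
 * "more", the new text is appended after the first leng characters;
 * otherwise it replaces the old token. */
void set_match(struct token_buf *t, const char *match)
{
    size_t n = strlen(match);
    size_t keep = t->more ? t->leng : 0;

    assert(keep + n < sizeof t->text);
    memcpy(t->text + keep, match, n);
    t->leng = keep + n;
    t->text[t->leng] = '\0';
    t->more = 0;       /* the request covers one match only */
}
```

   Because the append position is taken from 'leng', an action that
changed it before the next match would make the model append at the
wrong place, which is the reason for the first note above.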
1101
1102   'yyless(n)' returns all but the first 'n' characters of the current
1103token back to the input stream, where they will be rescanned when the
1104scanner looks for the next match.  'yytext' and 'yyleng' are adjusted
1105appropriately (e.g., 'yyleng' will now be equal to 'n').  For example,
1106on the input 'foobar' the following will write out 'foobarbar':
1107
1108         %%
1109         foobar    ECHO; yyless(3);
1110         [a-z]+    ECHO;
1111
1112   An argument of 0 to 'yyless()' will cause the entire current input
1113string to be scanned again.  Unless you've changed how the scanner will
1114subsequently process its input (using 'BEGIN', for example), this will
1115result in an endless loop.
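
   The effect of 'yyless(3)' in the 'foobar' example can be traced with
a small C model.  The function 'scan_with_yyless' is hypothetical and
hard-codes the two rules of the example plus the default rule; it is a
sketch of the behavior, not of flex's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Model of the 'foobar' example: rule 0 is the literal "foobar" with
 * the action "ECHO; yyless(3);", rule 1 is [a-z]+ with "ECHO;". */
void scan_with_yyless(const char *input, char *out, size_t outsz)
{
    size_t pos = 0, used = 0;

    assert(outsz > 0);
    out[0] = '\0';
    while (input[pos] != '\0') {
        size_t word = 0, len, advance;

        while (islower((unsigned char)input[pos + word]))
            ++word;                      /* length of the [a-z]+ match */

        if (word == 6 && strncmp(input + pos, "foobar", 6) == 0) {
            len = 6;                     /* tie: rule 0 is listed first */
            advance = 3;                 /* yyless(3): rescan the rest */
        } else if (word > 0) {
            len = advance = word;        /* rule 1 */
        } else {
            len = advance = 1;           /* default rule: copy one char */
        }
        if (used + len >= outsz)
            break;                       /* model only: output is full */
        memcpy(out + used, input + pos, len);   /* ECHO runs first */
        used += len;
        out[used] = '\0';
        pos += advance;
    }
}
```

   With 'advance' set to 0 instead of 3 the loop would never make
progress, which corresponds to the endless loop described above for
'yyless(0)'.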
1116
1117   Note that 'yyless()' is a macro and can only be used in the flex
1118input file, not from other source files.
1119
1120   'unput(c)' puts the character 'c' back onto the input stream.  It
1121will be the next character scanned.  The following action will take the
1122current token and cause it to be rescanned enclosed in parentheses.
1123
1124         {
1125         int i;
1126         /* Copy yytext because unput() trashes yytext */
1127         char *yycopy = strdup( yytext );
1128         unput( ')' );
1129         for ( i = yyleng - 1; i >= 0; --i )
1130             unput( yycopy[i] );
1131         unput( '(' );
1132         free( yycopy );
1133         }
1134
1135   Note that since each 'unput()' puts the given character back at the
1136_beginning_ of the input stream, pushing back strings must be done
1137back-to-front.
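
   Why strings must be pushed back-to-front can be seen in a small
model of an input stream with a pushback stack.  The names 'stream',
'push_back', 'next_char' and 'push_back_parenthesized' are invented for
this sketch, and the fixed-size stack is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of an input stream with pushback.  Characters given to
 * push_back() are read again before the rest of the input, the most
 * recently pushed one first (as with unput()). */
struct stream {
    const char *rest;        /* unread input */
    char        pushed[64];  /* pushback stack */
    size_t      npushed;
};

void push_back(struct stream *s, char c)
{
    assert(s->npushed < sizeof s->pushed);   /* model: fixed capacity */
    s->pushed[s->npushed++] = c;
}

/* Return the next character, or -1 at end of input. */
int next_char(struct stream *s)
{
    if (s->npushed > 0)
        return (unsigned char)s->pushed[--s->npushed];
    if (*s->rest == '\0')
        return -1;
    return (unsigned char)*s->rest++;
}

/* Push token back enclosed in parentheses, last character first. */
void push_back_parenthesized(struct stream *s, const char *token)
{
    push_back(s, ')');
    for (size_t i = strlen(token); i-- > 0; )
        push_back(s, token[i]);
    push_back(s, '(');
}
```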
1138
1139   An important potential problem when using 'unput()' is that if you
1140are using '%pointer' (the default), a call to 'unput()' _destroys_ the
1141contents of 'yytext', starting with its rightmost character and
1142devouring one character to the left with each call.  If you need the
1143value of 'yytext' preserved after a call to 'unput()' (as in the above
1144example), you must either first copy it elsewhere, or build your scanner
1145using '%array' instead (*note Matching::).
1146
1147   Finally, note that you cannot put back 'EOF' to attempt to mark the
1148input stream with an end-of-file.
1149
1150   'input()' reads the next character from the input stream.  For
1151example, the following is one way to eat up C comments:
1152
1153         %%
1154         "/*"        {
1155                     int c;
1156
1157                     for ( ; ; )
1158                         {
1159                         while ( (c = input()) != '*' &&
1160                                 c != EOF )
1161                             ;    /* eat up text of comment */
1162
1163                         if ( c == '*' )
1164                             {
1165                             while ( (c = input()) == '*' )
1166                                 ;
1167                             if ( c == '/' )
1168                                 break;    /* found the end */
1169                             }
1170
1171                         if ( c == EOF )
1172                             {
1173                             error( "EOF in comment" );
1174                             break;
1175                             }
1176                         }
1177                     }
1178
1179   (Note that if the scanner is compiled using 'C++', then 'input()' is
1180instead referred to as 'yyinput()', in order to avoid a name clash with
1181the 'C++' stream by the name of 'input'.)
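
   The control flow of the comment-eating action can be tested outside
a scanner by running the same loop over a string instead of calling
'input()'.  The function 'skip_comment' below is a hypothetical
stand-alone version written for that purpose.

```c
#include <assert.h>
#include <stddef.h>

/* text points just past the opening slash-star of a comment.
 * Return the index just past the closing star-slash, or -1 if the
 * text ends before the comment is closed. */
long skip_comment(const char *text)
{
    size_t i = 0;

    assert(text != NULL);
    for (;;) {
        while (text[i] != '*' && text[i] != '\0')
            ++i;                       /* eat up text of comment */
        if (text[i] == '\0')
            return -1;                 /* end of input in comment */
        while (text[i] == '*')
            ++i;                       /* skip a run of stars */
        if (text[i] == '/')
            return (long)(i + 1);      /* found the end */
    }
}
```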
1182
1183   'YY_FLUSH_BUFFER;' flushes the scanner's internal buffer so that the
1184next time the scanner attempts to match a token, it will first refill
1185the buffer using 'YY_INPUT()' (*note Generated Scanner::).  This action
1186is a special case of the more general 'yy_flush_buffer()' function,
1187described below (*note Multiple Input Buffers::).
1188
1189   'yyterminate()' can be used in lieu of a return statement in an
1190action.  It terminates the scanner and returns a 0 to the scanner's
1191caller, indicating "all done".  By default, 'yyterminate()' is also
1192called when an end-of-file is encountered.  It is a macro and may be
1193redefined.
1194
1195
1196File: flex.info,  Node: Generated Scanner,  Next: Start Conditions,  Prev: Actions,  Up: Top
1197
11989 The Generated Scanner
1199***********************
1200
1201The output of 'flex' is the file 'lex.yy.c', which contains the scanning
1202routine 'yylex()', a number of tables used by it for matching tokens,
1203and a number of auxiliary routines and macros.  By default, 'yylex()' is
1204declared as follows:
1205
1206         int yylex()
1207             {
1208             ... various definitions and the actions in here ...
1209             }
1210
1211   (If your environment supports function prototypes, then it will be
1212'int yylex( void )'.)  This definition may be changed by defining the
1213'YY_DECL' macro.  For example, you could use:
1214
1215         #define YY_DECL float lexscan( a, b ) float a, b;
1216
1217   to give the scanning routine the name 'lexscan', returning a float,
1218and taking two floats as arguments.  Note that if you give arguments to
1219the scanning routine using a K&R-style/non-prototyped function
1220declaration, you must terminate the definition with a semi-colon (;).
1221
1222   'flex' generates 'C99' function definitions by default.  Flex used to
1223have the ability to generate obsolete, er, 'traditional', function
1224definitions.  This was to support bootstrapping gcc on old systems.
1225Unfortunately, traditional definitions prevent us from using any
1226standard data types smaller than int (such as short, char, or bool) as
1227function arguments.  Furthermore, supporting traditional definitions added
1228extra complexity to the skeleton file.  For these reasons, current
1229versions of 'flex' generate standard C99 code only, leaving K&R-style
1230functions to the historians.
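
   Since the generated code uses C99 function definitions, the
'lexscan' example can also be written with a prototype-style parameter
list.  The sketch below shows only the shape of the declaration: in a
real scanner the '#define' goes in the definitions section and flex
supplies the function body, so the body and the 'demo_call' function
here are placeholders.

```c
#include <assert.h>

/* Prototype-style YY_DECL: no trailing semicolon is needed. */
#define YY_DECL float lexscan(float a, float b)

/* Placeholder body standing in for the generated scanning code. */
YY_DECL
{
    return a + b;
}

/* A caller sees an ordinary prototyped function. */
void demo_call(void)
{
    float r = lexscan(1.0f, 2.0f);
    assert(r == 3.0f);
}
```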
1231
1232   Whenever 'yylex()' is called, it scans tokens from the global input
1233file 'yyin' (which defaults to stdin).  It continues until it either
1234reaches an end-of-file (at which point it returns the value 0) or one of
1235its actions executes a 'return' statement.
1236
1237   If the scanner reaches an end-of-file, subsequent calls are undefined
1238unless either 'yyin' is pointed at a new input file (in which case
1239scanning continues from that file), or 'yyrestart()' is called.
1240'yyrestart()' takes one argument, a 'FILE *' pointer (which can be NULL,
1241if you've set up 'YY_INPUT' to scan from a source other than 'yyin'),
1242and initializes 'yyin' for scanning from that file.  Essentially there
1243is no difference between just assigning 'yyin' to a new input file or
1244using 'yyrestart()' to do so; the latter is available for compatibility
1245with previous versions of 'flex', and because it can be used to switch
1246input files in the middle of scanning.  It can also be used to throw
1247away the current input buffer, by calling it with an argument of 'yyin';
1248but it would be better to use 'YY_FLUSH_BUFFER' (*note Actions::).  Note
1249that 'yyrestart()' does _not_ reset the start condition to 'INITIAL'
1250(*note Start Conditions::).
1251
1252   If 'yylex()' stops scanning due to executing a 'return' statement in
1253one of the actions, the scanner may then be called again and it will
1254resume scanning where it left off.
1255
1256   By default (and for purposes of efficiency), the scanner uses
1257block-reads rather than simple 'getc()' calls to read characters from
1258'yyin'.  The nature of how it gets its input can be controlled by
1259defining the 'YY_INPUT' macro.  The calling sequence for 'YY_INPUT()' is
1260'YY_INPUT(buf,result,max_size)'.  Its action is to place up to
1261'max_size' characters in the character array 'buf' and return in the
1262integer variable 'result' either the number of characters read or the
1263constant 'YY_NULL' (0 on Unix systems) to indicate 'EOF'.  The default
1264'YY_INPUT' reads from the global file-pointer 'yyin'.
1265
1266   Here is a sample definition of 'YY_INPUT' (in the definitions section
1267of the input file):
1268
1269         %{
1270         #define YY_INPUT(buf,result,max_size) \
1271             { \
1272             int c = getchar(); \
1273             result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
1274             }
1275         %}
1276
1277   This definition will change the input processing to occur one
1278character at a time.
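
   A 'YY_INPUT' definition may also deliver several characters per
call.  The following sketch reads from a string held in memory; the
variables 'src_text' and 'src_pos' and the helpers 'set_source' and
'read_block' are assumptions made so that the macro can be exercised on
its own, and 'YY_NULL' is defined here only because no scanner is
present to define it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* normally provided by the generated scanner */
#endif

/* Hypothetical input source: a NUL-terminated string and a cursor. */
static const char *src_text = "";
static size_t src_pos = 0;

/* Copy up to max_size characters into buf; result receives the
 * number copied, or YY_NULL at end of input. */
#define YY_INPUT(buf, result, max_size) \
    do { \
        size_t left_ = strlen(src_text + src_pos); \
        size_t n_ = left_ < (size_t)(max_size) ? left_ : (size_t)(max_size); \
        memcpy((buf), src_text + src_pos, n_); \
        src_pos += n_; \
        (result) = n_ ? (int)n_ : YY_NULL; \
    } while (0)

/* Select the string to read from and rewind the cursor. */
void set_source(const char *text)
{
    src_text = text;
    src_pos = 0;
}

/* Invoke the macro the way a scanner would when its buffer is empty. */
int read_block(char *buf, int max_size)
{
    int result;

    assert(max_size > 0);
    YY_INPUT(buf, result, max_size);
    return result;
}
```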
1279
1280   When the scanner receives an end-of-file indication from 'YY_INPUT', it
1281then checks the 'yywrap()' function.  If 'yywrap()' returns false
1282(zero), then it is assumed that the function has gone ahead and set up
1283'yyin' to point to another input file, and scanning continues.  If it
1284returns true (non-zero), then the scanner terminates, returning 0 to its
1285caller.  Note that in either case, the start condition remains
1286unchanged; it does _not_ revert to 'INITIAL'.
1287
1288   If you do not supply your own version of 'yywrap()', then you must
1289either use '%option noyywrap' (in which case the scanner behaves as
1290though 'yywrap()' returned 1), or you must link with '-lfl' to obtain
1291the default version of the routine, which always returns 1.
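
   A user-supplied 'yywrap()' that moves on to the next file in a list
might look like the sketch below.  The list pointer 'next_file' and the
helper 'set_files' are hypothetical, and 'yyin' is defined here only so
that the fragment compiles by itself; in a real scanner the generated
code defines it.

```c
#include <assert.h>
#include <stdio.h>

/* Defined by the generated scanner in real use. */
FILE *yyin;

/* Hypothetical NULL-terminated list of remaining file names. */
static const char *const *next_file;

void set_files(const char *const *names)
{
    assert(names != NULL);
    next_file = names;
}

/* Return 0 after pointing yyin at the next readable file, or 1 when
 * no files remain.  Names that cannot be opened are skipped. */
int yywrap(void)
{
    while (next_file != NULL && *next_file != NULL) {
        FILE *f = fopen(*next_file++, "r");
        if (f != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);
            yyin = f;
            return 0;      /* more input: scanning continues */
        }
    }
    return 1;              /* no more input: the scanner terminates */
}
```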
1292
1293   For scanning from in-memory buffers (e.g., scanning strings), see
1294*note Scanning Strings::.  *Note Multiple Input Buffers::.
1295
1296   The scanner writes its 'ECHO' output to the 'yyout' global (default,
1297'stdout'), which may be redefined by the user simply by assigning it to
1298some other 'FILE' pointer.
1299
1300
1301File: flex.info,  Node: Start Conditions,  Next: Multiple Input Buffers,  Prev: Generated Scanner,  Up: Top
1302
130310 Start Conditions
1304*******************
1305
1306'flex' provides a mechanism for conditionally activating rules.  Any
1307rule whose pattern is prefixed with '<sc>' will only be active when the
1308scanner is in the "start condition" named 'sc'.  For example,
1309
1310         <STRING>[^"]*        { /* eat up the string body ... */
1311                     ...
1312                     }
1313
1314   will be active only when the scanner is in the 'STRING' start
1315condition, and
1316
1317         <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
1318                     ...
1319                     }
1320
1321   will be active only when the current start condition is either
1322'INITIAL', 'STRING', or 'QUOTE'.
1323
1324   Start conditions are declared in the definitions (first) section of
1325the input using unindented lines beginning with either '%s' or '%x'
1326followed by a list of names.  The former declares "inclusive" start
1327conditions, the latter "exclusive" start conditions.  A start condition
1328is activated using the 'BEGIN' action.  Until the next 'BEGIN' action is
1329executed, rules with the given start condition will be active and rules
1330with other start conditions will be inactive.  If the start condition is
1331inclusive, then rules with no start conditions at all will also be
1332active.  If it is exclusive, then _only_ rules qualified with the start
1333condition will be active.  A set of rules contingent on the same
1334exclusive start condition describe a scanner which is independent of any
1335of the other rules in the 'flex' input.  Because of this, exclusive
1336start conditions make it easy to specify "mini-scanners" which scan
1337portions of the input that are syntactically different from the rest
1338(e.g., comments).
1339
1340   If the distinction between inclusive and exclusive start conditions
1341is still a little vague, here's a simple example illustrating the
1342connection between the two.  The set of rules:
1343
1344         %s example
1345         %%
1346
1347         <example>foo   do_something();
1348
1349         bar            something_else();
1350
1351   is equivalent to
1352
1353         %x example
1354         %%
1355
1356         <example>foo   do_something();
1357
1358         <INITIAL,example>bar    something_else();
1359
1360   Without the '<INITIAL,example>' qualifier, the 'bar' pattern in the
1361second example wouldn't be active (i.e., couldn't match) when in start
1362condition 'example'.  If we just used '<example>' to qualify 'bar',
1363though, then it would only be active in 'example' and not in 'INITIAL',
1364while in the first example it's active in both, because in the first
1365example the 'example' start condition is an inclusive ('%s') start
1366condition.
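
   The activation rule for inclusive and exclusive start conditions can
be stated as a small predicate.  In this C sketch a rule's start
conditions are a bit mask, with 0 meaning that the rule has no '<sc>'
prefix; 'rule_active', 'SC_BIT' and the 'EXAMPLE' constant are names
chosen for the illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Start conditions as small integers; INITIAL is 0, as in flex. */
enum { INITIAL = 0, EXAMPLE = 1 };

#define SC_BIT(sc) (1u << (sc))

/* Is a rule active in the current start condition?
 * rule_mask is 0 for a rule without any <sc> prefix.
 * current_is_exclusive is true if current was declared with %x
 * (INITIAL counts as inclusive). */
bool rule_active(unsigned rule_mask, int current, bool current_is_exclusive)
{
    assert(current >= 0 && current < 32);
    if (rule_mask == 0)                  /* unqualified rule */
        return !current_is_exclusive;
    return (rule_mask & SC_BIT(current)) != 0;
}
```

   With 'current_is_exclusive' false, the unqualified 'bar' rule is
active in 'example'; with it true, 'bar' is active there only if its
mask names 'example' explicitly.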
1367
1368   Also note that the special start-condition specifier '<*>' matches
1369every start condition.  Thus, the above example could also have been
1370written:
1371
1372         %x example
1373         %%
1374
1375         <example>foo   do_something();
1376
1377         <*>bar    something_else();
1378
1379   The default rule (to 'ECHO' any unmatched character) remains active
1380in start conditions.  It is equivalent to:
1381
1382         <*>.|\n     ECHO;
1383
1384   'BEGIN(0)' returns to the original state where only the rules with no
1385start conditions are active.  This state can also be referred to as the
1386start-condition 'INITIAL', so 'BEGIN(INITIAL)' is equivalent to
1387'BEGIN(0)'.  (The parentheses around the start condition name are not
1388required but are considered good style.)
1389
1390   'BEGIN' actions can also be given as indented code at the beginning
1391of the rules section.  For example, the following will cause the scanner
1392to enter the 'SPECIAL' start condition whenever 'yylex()' is called and
1393the global variable 'enter_special' is true:
1394
1395                 int enter_special;
1396
1397         %x SPECIAL
1398         %%
1399                 if ( enter_special )
1400                     BEGIN(SPECIAL);
1401
1402         <SPECIAL>blahblahblah
1403         ...more rules follow...
1404
1405   To illustrate the uses of start conditions, here is a scanner which
1406provides two different interpretations of a string like '123.456'.  By
1407default it will treat it as three tokens, the integer '123', a dot
1408('.'), and the integer '456'.  But if the string is preceded earlier in
1409the line by the string 'expect-floats' it will treat it as a single
1410token, the floating-point number '123.456':
1411
1412         %{
1413         #include <math.h>
1414         %}
1415         %s expect
1416
1417         %%
1418         expect-floats        BEGIN(expect);
1419
1420         <expect>[0-9]+"."[0-9]+    {
1421                     printf( "found a float, = %f\n",
1422                             atof( yytext ) );
1423                     }
1424         <expect>\n           {
1425                     /* that's the end of the line, so
1426                      * we need another "expect-floats"
1427                      * before we'll recognize any more
1428                      * numbers
1429                      */
1430                     BEGIN(INITIAL);
1431                     }
1432
1433         [0-9]+      {
1434                     printf( "found an integer, = %d\n",
1435                             atoi( yytext ) );
1436                     }
1437
1438         "."         printf( "found a dot\n" );
1439
1440   Here is a scanner which recognizes (and discards) C comments while
1441maintaining a count of the current input line.
1442
1443         %x comment
1444         %%
1445                 int line_num = 1;
1446
1447         "/*"         BEGIN(comment);
1448
1449         <comment>[^*\n]*        /* eat anything that's not a '*' */
1450         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1451         <comment>\n             ++line_num;
1452         <comment>"*"+"/"        BEGIN(INITIAL);
1453
1454   This scanner goes to a bit of trouble to match as much text as
1455possible with each rule.  In general, when attempting to write a
1456high-speed scanner, try to match as much as possible in each rule, as
1457it's a big win.
1458
1459   Note that start condition names are really integer values and can be
1460stored as such.  Thus, the above could be extended in the following
1461fashion:
1462
1463         %x comment foo
1464         %%
1465                 int line_num = 1;
1466                 int comment_caller;
1467
1468         "/*"         {
1469                      comment_caller = INITIAL;
1470                      BEGIN(comment);
1471                      }
1472
1473         ...
1474
1475         <foo>"/*"    {
1476                      comment_caller = foo;
1477                      BEGIN(comment);
1478                      }
1479
1480         <comment>[^*\n]*        /* eat anything that's not a '*' */
1481         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1482         <comment>\n             ++line_num;
1483         <comment>"*"+"/"        BEGIN(comment_caller);
1484
1485   Furthermore, you can access the current start condition using the
1486integer-valued 'YY_START' macro.  For example, the above assignments to
1487'comment_caller' could instead be written
1488
1489         comment_caller = YY_START;
1490
1491   Flex provides 'YYSTATE' as an alias for 'YY_START' (since that is
1492what's used by AT&T 'lex').
1493
1494   For historical reasons, start conditions do not have their own
1495name-space within the generated scanner.  The start condition names are
1496unmodified in the generated scanner and generated header.  *Note
1497option-header::.  *Note option-prefix::.
1498
1499   Finally, here's an example of how to match C-style quoted strings
1500using exclusive start conditions, including expanded escape sequences
1501(but not including checking for a string that's too long):
1502
1503         %x str
1504
1505         %%
1506                 char string_buf[MAX_STR_CONST];
1507                 char *string_buf_ptr;
1508
1509
1510         \"      string_buf_ptr = string_buf; BEGIN(str);
1511
1512         <str>\"        { /* saw closing quote - all done */
1513                 BEGIN(INITIAL);
1514                 *string_buf_ptr = '\0';
1515                 /* return string constant token type and
1516                  * value to parser
1517                  */
1518                 }
1519
1520         <str>\n        {
1521                 /* error - unterminated string constant */
1522                 /* generate error message */
1523                 }
1524
1525         <str>\\[0-7]{1,3} {
1526                 /* octal escape sequence */
1527                 unsigned int result;  /* %o needs an unsigned int */
1528
1529                 (void) sscanf( yytext + 1, "%o", &result );
1530
1531                 if ( result > 0xff )
1532                         { /* error, constant is out-of-bounds */ }
1533                 else
1534                         *string_buf_ptr++ = result;
1535                 }
1536
1537         <str>\\[0-9]+ {
1538                 /* generate error - bad escape sequence; something
1539                  * like '\48' or '\0777777'
1540                  */
1541                 }
1542
1543         <str>\\n  *string_buf_ptr++ = '\n';
1544         <str>\\t  *string_buf_ptr++ = '\t';
1545         <str>\\r  *string_buf_ptr++ = '\r';
1546         <str>\\b  *string_buf_ptr++ = '\b';
1547         <str>\\f  *string_buf_ptr++ = '\f';
1548
1549         <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
1550
1551         <str>[^\\\n\"]+        {
1552                 char *yptr = yytext;
1553
1554                 while ( *yptr )
1555                         *string_buf_ptr++ = *yptr++;
1556                 }
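
   The octal-escape rule in the example above combines a conversion
with a range check.  The same step can be written as a stand-alone C
function; 'decode_octal_escape' is a hypothetical helper, and it
assumes its argument has already been matched by a pattern such as
'\\[0-7]{1,3}'.

```c
#include <assert.h>
#include <stdio.h>

/* Decode an octal escape: a backslash followed by one to three octal
 * digits.  Return the byte value, or -1 if there are no octal digits
 * or the value exceeds 0xff. */
int decode_octal_escape(const char *escape)
{
    unsigned int value;

    assert(escape != NULL && escape[0] == '\\');
    if (sscanf(escape + 1, "%3o", &value) != 1)
        return -1;                 /* no octal digit after the backslash */
    return value > 0xff ? -1 : (int)value;
}
```

   A scanner action could store the result with '*string_buf_ptr++'
when it is non-negative and report an error otherwise.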
1557
1558   Often, such as in some of the examples above, you wind up writing a
1559whole bunch of rules all preceded by the same start condition(s).  Flex
1560makes this a little easier and cleaner by introducing a notion of start
1561condition "scope".  A start condition scope is begun with:
1562
1563         <SCs>{
1564
1565   where '<SCs>' is a list of one or more start conditions.  Inside the
1566start condition scope, every rule automatically has the prefix '<SCs>'
1567applied to it, until a '}' which matches the initial '{'.  So, for
1568example,
1569
1570         <ESC>{
1571             "\\n"   return '\n';
1572             "\\r"   return '\r';
1573             "\\f"   return '\f';
1574             "\\0"   return '\0';
1575         }
1576
1577   is equivalent to:
1578
1579         <ESC>"\\n"  return '\n';
1580         <ESC>"\\r"  return '\r';
1581         <ESC>"\\f"  return '\f';
1582         <ESC>"\\0"  return '\0';
1583
1584   Start condition scopes may be nested.
1585
1586   The following routines are available for manipulating stacks of start
1587conditions:
1588
1589 -- Function: void yy_push_state ( int 'new_state' )
1590     pushes the current start condition onto the top of the start
1591     condition stack and switches to 'new_state' as though you had used
1592     'BEGIN new_state' (recall that start condition names are also
1593     integers).
1594
1595 -- Function: void yy_pop_state ()
1596     pops the top of the stack and switches to it via 'BEGIN'.
1597
1598 -- Function: int yy_top_state ()
1599     returns the top of the stack without altering the stack's contents.
1600
1601   The start condition stack grows dynamically and so has no built-in
1602size limitation.  If memory is exhausted, program execution aborts.
1603
1604   To use start condition stacks, your scanner must include a '%option
1605stack' directive (*note Scanner Options::).
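
   The behavior of the three stack routines can be modeled with a
growable array.  The structure 'sc_stack' and the functions 'sc_push',
'sc_pop' and 'sc_top' are stand-ins invented for this sketch; the
routines flex generates keep their state inside the scanner and may be
implemented differently.

```c
#include <assert.h>
#include <stdlib.h>

struct sc_stack {
    int   *items;        /* saved start conditions */
    size_t depth;        /* number of saved conditions */
    size_t capacity;     /* allocated slots */
    int    current;      /* stands in for the scanner's start condition */
};

/* Save the current condition, then switch to new_state. */
void sc_push(struct sc_stack *s, int new_state)
{
    if (s->depth == s->capacity) {
        size_t cap = s->capacity ? 2 * s->capacity : 8;
        int *p = realloc(s->items, cap * sizeof *p);
        if (p == NULL)
            abort();     /* mirrors "program execution aborts" */
        s->items = p;
        s->capacity = cap;
    }
    s->items[s->depth++] = s->current;
    s->current = new_state;
}

/* Pop the saved condition and switch back to it. */
void sc_pop(struct sc_stack *s)
{
    assert(s->depth > 0);
    s->current = s->items[--s->depth];
}

/* Return the top of the stack without changing it. */
int sc_top(const struct sc_stack *s)
{
    assert(s->depth > 0);
    return s->items[s->depth - 1];
}
```

   Note that the value returned by 'sc_top' is the condition that was
active before the most recent push, not the current one.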
1606
1607
1608File: flex.info,  Node: Multiple Input Buffers,  Next: EOF,  Prev: Start Conditions,  Up: Top
1609
161011 Multiple Input Buffers
1611*************************
1612
1613Some scanners (such as those which support "include" files) require
1614reading from several input streams.  As 'flex' scanners do a large
1615amount of buffering, one cannot control where the next input will be
1616read from by simply writing a 'YY_INPUT()' which is sensitive to the
1617scanning context.  'YY_INPUT()' is only called when the scanner reaches
1618the end of its buffer, which may be a long time after scanning a
1619statement such as an 'include' statement which requires switching the
1620input source.
1621
1622   To negotiate these sorts of problems, 'flex' provides a mechanism for
1623creating and switching between multiple input buffers.  An input buffer
1624is created by using:
1625
1626 -- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )
1627
1628   which takes a 'FILE' pointer and a size and creates a buffer
1629associated with the given file and large enough to hold 'size'
1630characters (when in doubt, use 'YY_BUF_SIZE' for the size).  It returns
1631a 'YY_BUFFER_STATE' handle, which may then be passed to other routines
1632(see below).  The 'YY_BUFFER_STATE' type is a pointer to an opaque
1633'struct yy_buffer_state' structure, so you may safely initialize
1634'YY_BUFFER_STATE' variables to '((YY_BUFFER_STATE) 0)' if you wish, and
1635also refer to the opaque structure in order to correctly declare input
1636buffers in source files other than that of your scanner.  Note that the
1637'FILE' pointer in the call to 'yy_create_buffer' is only used as the
1638value of 'yyin' seen by 'YY_INPUT'.  If you redefine 'YY_INPUT()' so it
1639no longer uses 'yyin', then you can safely pass a NULL 'FILE' pointer to
1640'yy_create_buffer'.  You select a particular buffer to scan from using:
1641
1642 -- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )
1643
1644   The above function switches the scanner's input buffer so subsequent
1645tokens will come from 'new_buffer'.  Note that 'yy_switch_to_buffer()'
1646may be used by 'yywrap()' to set things up for continued scanning,
1647instead of opening a new file and pointing 'yyin' at it.  If you are
1648looking for a stack of input buffers, then you want to use
1649'yypush_buffer_state()' instead of this function.  Note also that
1650switching input sources via either 'yy_switch_to_buffer()' or 'yywrap()'
1651does _not_ change the start condition.
1652
1653 -- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )
1654
   is used to reclaim the storage associated with a buffer.  ('buffer'
can be NULL, in which case the routine does nothing.)
1658
1659 -- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )
1660
1661   This function pushes the new buffer state onto an internal stack.
1662The pushed state becomes the new current state.  The stack is maintained
1663by flex and will grow as required.  This function is intended to be used
1664instead of 'yy_switch_to_buffer', when you want to change states, but
1665preserve the current state for later use.
1666
1667 -- Function: void yypop_buffer_state ( )
1668
1669   This function removes the current state from the top of the stack,
1670and deletes it by calling 'yy_delete_buffer'.  The next state on the
1671stack, if any, becomes the new current state.
1672
1673 -- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )
1674
1675   This function discards the buffer's contents, so the next time the
1676scanner attempts to match a token from the buffer, it will first fill
1677the buffer anew using 'YY_INPUT()'.
1678
1679 -- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )
1680
1681   is an alias for 'yy_create_buffer()', provided for compatibility with
1682the C++ use of 'new' and 'delete' for creating and destroying dynamic
1683objects.
1684
   The 'YY_CURRENT_BUFFER' macro returns a 'YY_BUFFER_STATE' handle to
the current buffer.  It should not be used as an lvalue.
1687
1688   Here are two examples of using these features for writing a scanner
1689which expands include files (the '<<EOF>>' feature is discussed below).
1690
   This first example uses 'yypush_buffer_state()' and
'yypop_buffer_state()'.  Flex maintains the stack internally.
1693
         /* the "incl" state is used for picking up the name
          * of an include file
          */
         %x incl
         %%
         include             BEGIN(incl);

         [a-z]+              ECHO;
         [^a-z\n]*\n?        ECHO;

         <incl>[ \t]*      /* eat the whitespace */
         <incl>[^ \t\n]+   { /* got the include file name */
                 yyin = fopen( yytext, "r" );

                 if ( ! yyin )
                     error( ... );

                 yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));

                 BEGIN(INITIAL);
                 }

         <<EOF>> {
                 yypop_buffer_state();

                 if ( !YY_CURRENT_BUFFER )
                     {
                     yyterminate();
                     }
                 }
1724
1725   The second example, below, does the same thing as the previous
1726example did, but manages its own input buffer stack manually (instead of
1727letting flex do it).
1728
         /* the "incl" state is used for picking up the name
          * of an include file
          */
         %x incl

         %{
         #define MAX_INCLUDE_DEPTH 10
         YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
         int include_stack_ptr = 0;
         %}

         %%
         include             BEGIN(incl);

         [a-z]+              ECHO;
         [^a-z\n]*\n?        ECHO;

         <incl>[ \t]*      /* eat the whitespace */
         <incl>[^ \t\n]+   { /* got the include file name */
                 if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                     {
                     fprintf( stderr, "Includes nested too deeply\n" );
                     exit( 1 );
                     }

                 include_stack[include_stack_ptr++] =
                     YY_CURRENT_BUFFER;

                 yyin = fopen( yytext, "r" );

                 if ( ! yyin )
                     error( ... );

                 yy_switch_to_buffer(
                     yy_create_buffer( yyin, YY_BUF_SIZE ) );

                 BEGIN(INITIAL);
                 }

         <<EOF>> {
                 /* a negative index means the outermost file has ended */
                 if ( --include_stack_ptr < 0 )
                     {
                     yyterminate();
                     }

                 else
                     {
                     yy_delete_buffer( YY_CURRENT_BUFFER );
                     yy_switch_to_buffer(
                          include_stack[include_stack_ptr] );
                     }
                 }
1781
1782   The following routines are available for setting up input buffers for
1783scanning in-memory strings instead of files.  All of them create a new
1784input buffer for scanning the string, and return a corresponding
1785'YY_BUFFER_STATE' handle (which you should delete with
1786'yy_delete_buffer()' when done with it).  They also switch to the new
1787buffer using 'yy_switch_to_buffer()', so the next call to 'yylex()' will
1788start scanning the string.
1789
1790 -- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
1791     scans a NUL-terminated string.
1792
1793 -- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len
1794          )
1795     scans 'len' bytes (including possibly 'NUL's) starting at location
1796     'bytes'.
1797
1798   Note that both of these functions create and scan a _copy_ of the
1799string or bytes.  (This may be desirable, since 'yylex()' modifies the
1800contents of the buffer it is scanning.)  You can avoid the copy by
1801using:
1802
 -- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t
          size)
     which scans in place the buffer starting at 'base', consisting of
     'size' bytes, the last two bytes of which _must_ be
     'YY_END_OF_BUFFER_CHAR' (ASCII NUL).  These last two bytes are not
     scanned; thus, scanning consists of 'base[0]' through
     'base[size-3]', inclusive.
1810
1811   If you fail to set up 'base' in this manner (i.e., forget the final
1812two 'YY_END_OF_BUFFER_CHAR' bytes), then 'yy_scan_buffer()' returns a
1813NULL pointer instead of creating a new input buffer.
1814
1815 -- Data type: yy_size_t
1816     is an integral type to which you can cast an integer expression
1817     reflecting the size of the buffer.
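
   The buffer layout that 'yy_scan_buffer()' expects can be prepared
with a small helper.  The following is a minimal sketch; the name
'make_scan_buffer' and its interface are hypothetical and are not part
of the 'flex' API.  It copies the text into a freshly allocated buffer
and appends the two required NUL bytes.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes of text into a newly allocated
 * buffer that ends in two NUL bytes, the layout yy_scan_buffer()
 * requires.  On success, stores the total size (len + 2) in *size_out
 * and returns the buffer, which the caller must free().  Returns NULL
 * if the size would overflow or the allocation fails.
 */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base;

    if (len > (size_t)-1 - 2)
        return NULL;            /* len + 2 would overflow */

    base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len] = '\0';           /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = '\0';       /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}
```

   In a scanner, the result would then be handed over as
'yy_scan_buffer( base, size )'.  Because the scanner works on the
buffer in place, it should stay allocated for as long as the returned
handle is in use.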
1818
1819
File: flex.info,  Node: EOF,  Next: Misc Macros,  Prev: Multiple Input Buffers,  Up: Top

12 End-of-File Rules
********************
1824
1825The special rule '<<EOF>>' indicates actions which are to be taken when
1826an end-of-file is encountered and 'yywrap()' returns non-zero (i.e.,
1827indicates no further files to process).  The action must finish by doing
1828one of the following things:
1829
1830   * assigning 'yyin' to a new input file (in previous versions of
1831     'flex', after doing the assignment you had to call the special
1832     action 'YY_NEW_FILE'.  This is no longer necessary.)
1833
1834   * executing a 'return' statement;
1835
1836   * executing the special 'yyterminate()' action.
1837
1838   * or, switching to a new buffer using 'yy_switch_to_buffer()' as
1839     shown in the example above.
1840
1841   <<EOF>> rules may not be used with other patterns; they may only be
1842qualified with a list of start conditions.  If an unqualified <<EOF>>
1843rule is given, it applies to _all_ start conditions which do not already
1844have <<EOF>> actions.  To specify an <<EOF>> rule for only the initial
1845start condition, use:
1846
1847         <INITIAL><<EOF>>
1848
1849   These rules are useful for catching things like unclosed comments.
1850An example:
1851
         %x quote
         %%

         ...other rules for dealing with quotes...

         <quote><<EOF>>   {
                  error( "unterminated quote" );
                  yyterminate();
                  }
         <<EOF>>  {
                  if ( *++filelist )
                      yyin = fopen( *filelist, "r" );
                  else
                      yyterminate();
                  }
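
   The file-list logic in the last rule can also be written as a
stand-alone C function.  The sketch below is illustrative only: 'yyin'
and 'filelist' are declared locally as stand-ins so that the fragment
compiles by itself (in a real scanner 'flex' defines 'yyin'), and the
name 'next_input' is hypothetical.  Unlike the rule above, it skips
names that cannot be opened.  Its return value follows the 'yywrap()'
convention: zero means there is more input, non-zero means there is
none.

```c
#include <stdio.h>

/*
 * Stand-ins so the sketch compiles on its own.  In a real scanner, flex
 * defines yyin, and filelist would be set up by the program, for
 * example from argv.
 */
static FILE *yyin;
static char **filelist;   /* NULL-terminated; points at the current name */

/*
 * Advance to the next file in the list that can be opened.  Returns 0
 * if yyin now refers to a new file, 1 if no inputs remain.
 */
static int next_input(void)
{
    if (filelist == NULL || *filelist == NULL)
        return 1;

    while (*++filelist != NULL) {
        FILE *next = fopen(*filelist, "r");

        if (next == NULL) {
            perror(*filelist);   /* report and try the following name */
            continue;
        }
        if (yyin != NULL && yyin != stdin)
            fclose(yyin);
        yyin = next;
        return 0;
    }
    return 1;
}
```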
1867
1868
File: flex.info,  Node: Misc Macros,  Next: User Values,  Prev: EOF,  Up: Top

13 Miscellaneous Macros
***********************
1873
1874The macro 'YY_USER_ACTION' can be defined to provide an action which is
1875always executed prior to the matched rule's action.  For example, it
1876could be #define'd to call a routine to convert yytext to lower-case.
1877When 'YY_USER_ACTION' is invoked, the variable 'yy_act' gives the number
1878of the matched rule (rules are numbered starting with 1).  Suppose you
1879want to profile how often each of your rules is matched.  The following
1880would do the trick:
1881
         #define YY_USER_ACTION ++ctr[yy_act]

   where 'ctr' is an array to hold the counts for the different rules.
Note that the macro 'YY_NUM_RULES' gives the total number of rules
(including the default rule, even if you use '-s'), so a correct
declaration for 'ctr' is:

         int ctr[YY_NUM_RULES];
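
   Once scanning has finished, the counts still have to be reported
somehow.  One possible way to do so is sketched below; the function
name 'report_rule_counts' and its parameters are hypothetical and not
something 'flex' provides.

```c
#include <stdio.h>

/*
 * Hypothetical reporting helper for a counter array such as ctr[].
 * Prints one line for every rule that matched at least once and
 * returns the total number of matches.  Index 0 is skipped because
 * rules are numbered starting with 1.
 */
static long report_rule_counts(FILE *out, const int *counts, int num_slots)
{
    long total = 0;

    for (int rule = 1; rule < num_slots; rule++) {
        if (counts[rule] > 0) {
            fprintf(out, "rule %d: %d\n", rule, counts[rule]);
            total += counts[rule];
        }
    }
    return total;
}
```

   With the declaration above, it could be called after 'yylex()'
returns as 'report_rule_counts( stderr, ctr, YY_NUM_RULES )'.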
1890
1891   The macro 'YY_USER_INIT' may be defined to provide an action which is
1892always executed before the first scan (and before the scanner's internal
1893initializations are done).  For example, it could be used to call a
1894routine to read in a data table or open a logging file.
1895
1896   The macro 'yy_set_interactive(is_interactive)' can be used to control
1897whether the current buffer is considered "interactive".  An interactive
1898buffer is processed more slowly, but must be used when the scanner's
1899input source is indeed interactive to avoid problems due to waiting to
1900fill buffers (see the discussion of the '-I' flag in *note Scanner
1901Options::).  A non-zero value in the macro invocation marks the buffer
1902as interactive, a zero value as non-interactive.  Note that use of this
1903macro overrides '%option always-interactive' or '%option
1904never-interactive' (*note Scanner Options::).  'yy_set_interactive()'
1905must be invoked prior to beginning to scan the buffer that is (or is
1906not) to be considered interactive.
1907
1908   The macro 'yy_set_bol(at_bol)' can be used to control whether the
1909current buffer's scanning context for the next token match is done as
1910though at the beginning of a line.  A non-zero macro argument makes
1911rules anchored with '^' active, while a zero argument makes '^' rules
1912inactive.
1913
1914   The macro 'YY_AT_BOL()' returns true if the next token scanned from
1915the current buffer will have '^' rules active, false otherwise.
1916
   In the generated scanner, the actions are all gathered in one large
switch statement and separated using 'YY_BREAK', which may be redefined.
By default, it is simply a 'break', to separate each rule's action from
the following rule's.  Redefining 'YY_BREAK' allows, for example, C++
users to '#define YY_BREAK' to do nothing (while being very careful that
every rule ends with a 'break' or a 'return'!).  This avoids unreachable
statement warnings in rules whose action ends with 'return', where the
'YY_BREAK' that follows can never be reached.
1925
1926
File: flex.info,  Node: User Values,  Next: Yacc,  Prev: Misc Macros,  Up: Top

14 Values Available To the User
*******************************
1931
1932This chapter summarizes the various values available to the user in the
1933rule actions.
1934
1935'char *yytext'
1936     holds the text of the current token.  It may be modified but not
1937     lengthened (you cannot append characters to the end).
1938
1939     If the special directive '%array' appears in the first section of
1940     the scanner description, then 'yytext' is instead declared 'char
1941     yytext[YYLMAX]', where 'YYLMAX' is a macro definition that you can
1942     redefine in the first section if you don't like the default value
1943     (generally 8KB). Using '%array' results in somewhat slower
1944     scanners, but the value of 'yytext' becomes immune to calls to
1945     'unput()', which potentially destroy its value when 'yytext' is a
1946     character pointer.  The opposite of '%array' is '%pointer', which
1947     is the default.
1948
1949     You cannot use '%array' when generating C++ scanner classes (the
1950     '-+' flag).
1951
1952'int yyleng'
1953     holds the length of the current token.
1954
1955'FILE *yyin'
     is the file which by default 'flex' reads from.  It may be
     reassigned, but doing so only makes sense before scanning begins or
     after an EOF has been encountered.  Changing it in the midst of
     scanning will have unexpected results since 'flex' buffers its
     input; use 'yyrestart()' instead.  Once scanning terminates because
     an end-of-file has been seen, you can point 'yyin' at the new
     input file and then call the scanner again to continue scanning.
1963
1964'void yyrestart( FILE *new_file )'
1965     may be called to point 'yyin' at the new input file.  The
1966     switch-over to the new file is immediate (any previously
1967     buffered-up input is lost).  Note that calling 'yyrestart()' with
1968     'yyin' as an argument thus throws away the current input buffer and
1969     continues scanning the same input file.
1970
1971'FILE *yyout'
1972     is the file to which 'ECHO' actions are done.  It can be reassigned
1973     by the user.
1974
1975'YY_CURRENT_BUFFER'
1976     returns a 'YY_BUFFER_STATE' handle to the current buffer.
1977
1978'YY_START'
1979     returns an integer value corresponding to the current start
1980     condition.  You can subsequently use this value with 'BEGIN' to
1981     return to that start condition.
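
   As noted for 'yytext' above, the token text may be modified as long
as it is not lengthened.  Converting it to lower case is a typical
in-place change.  The helper below is a minimal sketch; its name,
'lowercase_token', is hypothetical.

```c
#include <ctype.h>

/*
 * Hypothetical helper: lower-case the first leng characters of text in
 * place.  The length never changes, so this stays within the rule that
 * yytext may be modified but not lengthened.
 */
static void lowercase_token(char *text, int leng)
{
    for (int i = 0; i < leng; i++)
        text[i] = (char)tolower((unsigned char)text[i]);
}
```

   Inside a rule action it would be called as 'lowercase_token( yytext,
yyleng )'.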
1982
1983
File: flex.info,  Node: Yacc,  Next: Scanner Options,  Prev: User Values,  Up: Top

15 Interfacing with Yacc
************************
1988
One of the main uses of 'flex' is as a companion to the 'yacc'
parser-generator.  'yacc' parsers expect to call a routine named
'yylex()' to find the next input token.  The routine is supposed to
return the type of the next token as well as putting any associated
value in the global 'yylval'.  To use 'flex' with 'yacc', one specifies
the '-d' option to 'yacc' to instruct it to generate the file 'y.tab.h'
containing definitions of all the '%token's appearing in the 'yacc'
input.  This file is then included in the 'flex' scanner.  For example,
if one of the tokens is 'TOK_NUMBER', part of the scanner might look
like:
1999
         %{
         #include <stdlib.h>   /* for atoi() */
         #include "y.tab.h"
         %}

         %%

         [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
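
   'atoi()' gives no indication when the digits do not fit in an 'int'.
Where that matters, the conversion can be done with 'strtol()' instead.
The following is a sketch only; the helper name 'token_to_int' is
hypothetical, and how a failure should be reported to the parser is
left open.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert the text of a numeric token to an int.
 * Returns 1 and stores the value in *out on success; returns 0 if no
 * digits were converted, if characters are left over after the number,
 * or if the value does not fit in an int.
 */
static int token_to_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return 0;
    if (errno == ERANGE || value > INT_MAX || value < INT_MIN)
        return 0;
    *out = (int)value;
    return 1;
}
```

   An action for the '[0-9]+' rule could then call 'token_to_int(
yytext, &yylval )' and return 'TOK_NUMBER' only when the call
succeeds.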
2007
2008
File: flex.info,  Node: Scanner Options,  Next: Performance,  Prev: Yacc,  Up: Top

16 Scanner Options
******************
2013
The various 'flex' options are categorized by function in the following
menu.  If you want to look up a particular option by name, see *note
Index of Scanner Options::.
2017
2018* Menu:
2019
2020* Options for Specifying Filenames::
2021* Options Affecting Scanner Behavior::
2022* Code-Level And API Options::
2023* Options for Scanner Speed and Size::
2024* Debugging Options::
2025* Miscellaneous Options::
2026
2027   Even though there are many scanner options, a typical scanner might
2028only specify the following options:
2029
     %option   8bit reentrant bison-bridge
     %option   warn nodefault
     %option   yylineno
     %option   outfile="scanner.c" header-file="scanner.h"
2034
2035   The first line specifies the general type of scanner we want.  The
2036second line specifies that we are being careful.  The third line asks
2037flex to track line numbers.  The last line tells flex what to name the
2038files.  (The options can be specified in any order.  We just divided
2039them.)
2040
2041   'flex' also provides a mechanism for controlling options within the
2042scanner specification itself, rather than from the flex command-line.
2043This is done by including '%option' directives in the first section of
2044the scanner specification.  You can specify multiple options with a
2045single '%option' directive, and multiple directives in the first section
2046of your flex input file.
2047
2048   Most options are given simply as names, optionally preceded by the
2049word 'no' (with no intervening whitespace) to negate their meaning.  The
2050names are the same as their long-option equivalents (but without the
2051leading '--' ).
2052
   'flex' scans your rule actions to determine whether you use the
'REJECT' or 'yymore()' features.  The 'reject' and 'yymore' options are
available to override its decision as to whether you use these features,
either by setting them (e.g., '%option reject') to indicate the feature
is indeed used, or by unsetting them (e.g., '%option noyymore') to
indicate it actually is not used.
2059
2060   A number of options are available for lint purists who want to
2061suppress the appearance of unneeded routines in the generated scanner.
2062Each of the following, if unset (e.g., '%option nounput'), results in
2063the corresponding routine not appearing in the generated scanner:
2064
         input, unput
         yy_push_state, yy_pop_state, yy_top_state
         yy_scan_buffer, yy_scan_bytes, yy_scan_string

         yyget_extra, yyset_extra, yyget_leng, yyget_text,
         yyget_lineno, yyset_lineno, yyget_in, yyset_in,
         yyget_out, yyset_out, yyget_lval, yyset_lval,
         yyget_lloc, yyset_lloc, yyget_debug, yyset_debug

   (though 'yy_push_state()' and friends won't appear anyway unless you
use '%option stack').
2076
2077
File: flex.info,  Node: Options for Specifying Filenames,  Next: Options Affecting Scanner Behavior,  Prev: Scanner Options,  Up: Scanner Options

16.1 Options for Specifying Filenames
=====================================
2082
2083'--header-file=FILE, '%option header-file="FILE"''
2084     instructs flex to write a C header to 'FILE'.  This file contains
2085     function prototypes, extern variables, and types used by the
2086     scanner.  Only the external API is exported by the header file.
2087     Many macros that are usable from within scanner actions are not
2088     exported to the header file.  This is due to namespace problems and
2089     the goal of a clean external API.
2090
2091     While in the header, the macro 'yyIN_HEADER' is defined, where 'yy'
2092     is substituted with the appropriate prefix.
2093
2094     The '--header-file' option is not compatible with the '--c++'
2095     option, since the C++ scanner provides its own header in
2096     'yyFlexLexer.h'.
2097
'-oFILE, --outfile=FILE, '%option outfile="FILE"''
     directs flex to write the scanner to the file 'FILE' instead of
     'lex.yy.c'.  If you combine '--outfile' with the '--stdout' option,
     then the scanner is written to 'stdout' but its '#line' directives
     (see the '-L' option) refer to the file 'FILE'.
2103
2104'-t, --stdout, '%option stdout''
2105     instructs 'flex' to write the scanner it generates to standard
2106     output instead of 'lex.yy.c'.
2107
2108'-SFILE, --skel=FILE'
2109     overrides the default skeleton file from which 'flex' constructs
2110     its scanners.  You'll never need this option unless you are doing
2111     'flex' maintenance or development.
2112
'--tables-file=FILE'
     writes serialized scanner DFA tables to 'FILE'.  The generated
     scanner will not contain the tables, and requires them to be loaded
     at runtime.  *Note serialization::.

'--tables-verify'
     This option is for flex development.  We document it here in case
     you stumble upon it by accident or in case you suspect some
     inconsistency in the serialized tables.  Flex will serialize the
     scanner DFA tables but will also generate the in-code tables as it
     normally does.  At runtime, the scanner will verify that the
     serialized tables match the in-code tables, instead of loading
     them.
2126
2127
File: flex.info,  Node: Options Affecting Scanner Behavior,  Next: Code-Level And API Options,  Prev: Options for Specifying Filenames,  Up: Scanner Options

16.2 Options Affecting Scanner Behavior
=======================================
2132
2133'-i, --case-insensitive, '%option case-insensitive''
2134     instructs 'flex' to generate a "case-insensitive" scanner.  The
2135     case of letters given in the 'flex' input patterns will be ignored,
2136     and tokens in the input will be matched regardless of case.  The
2137     matched text given in 'yytext' will have the preserved case (i.e.,
2138     it will not be folded).  For tricky behavior, see *note case and
2139     character ranges::.
2140
2141'-l, --lex-compat, '%option lex-compat''
2142     turns on maximum compatibility with the original AT&T 'lex'
2143     implementation.  Note that this does not mean _full_ compatibility.
2144     Use of this option costs a considerable amount of performance, and
2145     it cannot be used with the '--c++', '--full', '--fast', '-Cf', or
2146     '-CF' options.  For details on the compatibilities it provides, see
2147     *note Lex and Posix::.  This option also results in the name
2148     'YY_FLEX_LEX_COMPAT' being '#define''d in the generated scanner.
2149
2150'-B, --batch, '%option batch''
2151     instructs 'flex' to generate a "batch" scanner, the opposite of
2152     _interactive_ scanners generated by '--interactive' (see below).
2153     In general, you use '-B' when you are _certain_ that your scanner
2154     will never be used interactively, and you want to squeeze a
2155     _little_ more performance out of it.  If your goal is instead to
2156     squeeze out a _lot_ more performance, you should be using the '-Cf'
2157     or '-CF' options, which turn on '--batch' automatically anyway.
2158
2159'-I, --interactive, '%option interactive''
2160     instructs 'flex' to generate an interactive scanner.  An
2161     interactive scanner is one that only looks ahead to decide what
2162     token has been matched if it absolutely must.  It turns out that
2163     always looking one extra character ahead, even if the scanner has
2164     already seen enough text to disambiguate the current token, is a
2165     bit faster than only looking ahead when necessary.  But scanners
2166     that always look ahead give dreadful interactive performance; for
2167     example, when a user types a newline, it is not recognized as a
2168     newline token until they enter _another_ token, which often means
2169     typing in another whole line.
2170
2171     'flex' scanners default to 'interactive' unless you use the '-Cf'
2172     or '-CF' table-compression options (*note Performance::).  That's
2173     because if you're looking for high-performance you should be using
2174     one of these options, so if you didn't, 'flex' assumes you'd rather
2175     trade off a bit of run-time performance for intuitive interactive
2176     behavior.  Note also that you _cannot_ use '--interactive' in
2177     conjunction with '-Cf' or '-CF'.  Thus, this option is not really
2178     needed; it is on by default for all those cases in which it is
2179     allowed.
2180
     You can force a scanner to _not_ be interactive by using '--batch'.
2182
'-7, --7bit, '%option 7bit''
     instructs 'flex' to generate a 7-bit scanner, i.e., one which can
     only recognize 7-bit characters in its input.  The advantage of
     using '--7bit' is that the scanner's tables can be up to half the
     size of those generated using '--8bit'.  The disadvantage is that
     such scanners often hang or crash if their input contains an 8-bit
     character.
2190
     Note, however, that unless you generate your scanner using the
     '-Cf' or '-CF' table compression options, use of '--7bit' will save
     only a small amount of table space, and make your scanner
     considerably less portable.  'Flex''s default behavior is to
     generate an 8-bit scanner unless you use the '-Cf' or '-CF'
     options, in which case 'flex' defaults to generating 7-bit scanners
     unless your site was always configured to generate 8-bit scanners
     (as will often be the case with non-USA sites).  You can tell
     whether flex generated a 7-bit or an 8-bit scanner by inspecting
     the flag summary in the '--verbose' output (*note Debugging
     Options::).
2201
2202     Note that if you use '-Cfe' or '-CFe' 'flex' still defaults to
2203     generating an 8-bit scanner, since usually with these compression
2204     options full 8-bit tables are not much more expensive than 7-bit
2205     tables.
2206
2207'-8, --8bit, '%option 8bit''
2208     instructs 'flex' to generate an 8-bit scanner, i.e., one which can
2209     recognize 8-bit characters.  This flag is only needed for scanners
2210     generated using '-Cf' or '-CF', as otherwise flex defaults to
2211     generating an 8-bit scanner anyway.
2212
2213     See the discussion of '--7bit' above for 'flex''s default behavior
2214     and the tradeoffs between 7-bit and 8-bit scanners.
2215
2216'--default, '%option default''
2217     generate the default rule.
2218
2219'--always-interactive, '%option always-interactive''
2220     instructs flex to generate a scanner which always considers its
2221     input _interactive_.  Normally, on each new input file the scanner
2222     calls 'isatty()' in an attempt to determine whether the scanner's
2223     input source is interactive and thus should be read a character at
2224     a time.  When this option is used, however, then no such call is
2225     made.
2226
'--never-interactive, '%option never-interactive''
     instructs flex to generate a scanner which never considers its
     input interactive.  This is the opposite of 'always-interactive'.
2230
2231'-X, --posix, '%option posix''
2232     turns on maximum compatibility with the POSIX 1003.2-1992
2233     definition of 'lex'.  Since 'flex' was originally designed to
2234     implement the POSIX definition of 'lex' this generally involves
2235     very few changes in behavior.  At the current writing the known
2236     differences between 'flex' and the POSIX standard are:
2237
2238        * In POSIX and AT&T 'lex', the repeat operator, '{}', has lower
2239          precedence than concatenation (thus 'ab{3}' yields 'ababab').
2240          Most POSIX utilities use an Extended Regular Expression (ERE)
2241          precedence that has the precedence of the repeat operator
2242          higher than concatenation (which causes 'ab{3}' to yield
2243          'abbb').  By default, 'flex' places the precedence of the
2244          repeat operator higher than concatenation which matches the
2245          ERE processing of other POSIX utilities.  When either
2246          '--posix' or '-l' are specified, 'flex' will use the
2247          traditional AT&T and POSIX-compliant precedence for the repeat
2248          operator where concatenation has higher precedence than the
2249          repeat operator.
2250
2251'--stack, '%option stack''
2252     enables the use of start condition stacks (*note Start
2253     Conditions::).
2254
2255'--stdinit, '%option stdinit''
2256     if set (i.e., %option stdinit) initializes 'yyin' and 'yyout' to
2257     'stdin' and 'stdout', instead of the default of 'NULL'.  Some
2258     existing 'lex' programs depend on this behavior, even though it is
2259     not compliant with ANSI C, which does not require 'stdin' and
2260     'stdout' to be compile-time constant.  In a reentrant scanner,
2261     however, this is not a problem since initialization is performed in
2262     'yylex_init' at runtime.
2263
'--yylineno, '%option yylineno''
     directs 'flex' to generate a scanner that maintains the number of
     the current line read from its input in the global variable
     'yylineno'.  This option is implied by '%option lex-compat'.  In a
     reentrant C scanner, the macro 'yylineno' is accessible regardless
     of the value of '%option yylineno'; however, its value is not
     modified by 'flex' unless '%option yylineno' is enabled.
2271
'--yywrap, '%option yywrap''
     if unset (i.e., '--noyywrap'), makes the scanner not call
     'yywrap()' upon an end-of-file, but simply assume that there are no
     more files to scan (until the user points 'yyin' at a new file and
     calls 'yylex()' again).
2277
2278
File: flex.info,  Node: Code-Level And API Options,  Next: Options for Scanner Speed and Size,  Prev: Options Affecting Scanner Behavior,  Up: Scanner Options

16.3 Code-Level And API Options
===============================
2283
2284'--ansi-definitions, '%option ansi-definitions''
2285     Deprecated, ignored
2286
2287'--ansi-prototypes, '%option ansi-prototypes''
2288     Deprecated, ignored
2289
2290'--bison-bridge, '%option bison-bridge''
2291     instructs flex to generate a C scanner that is meant to be called
2292     by a 'GNU bison' parser.  The scanner has minor API changes for
2293     'bison' compatibility.  In particular, the declaration of 'yylex'
2294     is modified to take an additional parameter, 'yylval'.  *Note Bison
2295     Bridge::.
2296
'--bison-locations, '%option bison-locations''
     instructs flex that 'GNU bison' '%locations' are being used.  This
     means 'yylex' will be passed an additional parameter, 'yylloc'.
     This option implies '%option bison-bridge'.  *Note Bison Bridge::.
2301
2302'-L, --noline, '%option noline''
2303     instructs 'flex' not to generate '#line' directives.  Without this
2304     option, 'flex' peppers the generated scanner with '#line'
2305     directives so error messages in the actions will be correctly
2306     located with respect to either the original 'flex' input file (if
2307     the errors are due to code in the input file), or 'lex.yy.c' (if
2308     the errors are 'flex''s fault - you should report these sorts of
2309     errors to the email address given in *note Reporting Bugs::).
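
     As a minimal illustration of what a '#line' directive does to
     compiler diagnostics, the C sketch below reads back the line and
     file that the compiler attributes to the code following a
     directive.  The file name 'scanner.l' and the function name are
     invented for this example; they are not part of 'flex' output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the line number the compiler associates
   with the statement right after the #line directive, and stores the
   file name it associates with that code in *file_out. */
static int line_after_directive(const char **file_out)
{
    assert(file_out != NULL);
#line 53 "scanner.l"
    int line = __LINE__;      /* the directive makes this line 53 */
    *file_out = __FILE__;     /* and makes the file "scanner.l" */
    return line;
}
```

     Calling the helper yields 53 and 'scanner.l'.  The same mechanism
     lets an error in an action be reported against the 'flex' input
     file rather than against 'lex.yy.c'.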
2310
2311'-R, --reentrant, '%option reentrant''
2312     instructs flex to generate a reentrant C scanner.  The generated
2313     scanner may safely be used in a multi-threaded environment.  The
2314     API for a reentrant scanner is different than for a non-reentrant
2315     scanner (*note Reentrant::).  Because of the API difference between
2316     reentrant and non-reentrant 'flex' scanners, non-reentrant flex
2317     code must be modified before it is suitable for use with this
2318     option.  This option is not compatible with the '--c++' option.
2319
2320     The option '--reentrant' does not affect the performance of the
2321     scanner.
2322
2323'-+, --c++, '%option c++''
2324     specifies that you want flex to generate a C++ scanner class.
2325     *Note Cxx::, for details.
2326
2327'--array, '%option array''
2328     specifies that you want 'yytext' to be an array instead of a 'char *'.
2329
2330'--pointer, '%option pointer''
2331     specifies that 'yytext' should be a 'char *', not an array.  This
2332     is the default.
2333
2334'-PPREFIX, --prefix=PREFIX, '%option prefix="PREFIX"''
2335     changes the default 'yy' prefix used by 'flex' for all
2336     globally-visible variable and function names to instead be
2337     'PREFIX'.  For example, '--prefix=foo' changes the name of 'yytext'
2338     to 'footext'.  It also changes the name of the default output file
2339     from 'lex.yy.c' to 'lex.foo.c'.  Here is a partial list of the
2340     names affected:
2341
2342              yy_create_buffer
2343              yy_delete_buffer
2344              yy_flex_debug
2345              yy_init_buffer
2346              yy_flush_buffer
2347              yy_load_buffer_state
2348              yy_switch_to_buffer
2349              yyin
2350              yyleng
2351              yylex
2352              yylineno
2353              yyout
2354              yyrestart
2355              yytext
2356              yywrap
2357              yyalloc
2358              yyrealloc
2359              yyfree
2360
2361     (If you are using a C++ scanner, then only 'yywrap' and
2362     'yyFlexLexer' are affected.)  Within your scanner itself, you can
2363     still refer to the global variables and functions using either
2364     version of their name; but externally, they have the modified name.
2365
2366     This option lets you easily link together multiple 'flex' programs
2367     into the same executable.  Note, though, that using this option
2368     also renames 'yywrap()', so you now _must_ either provide your own
2369     (appropriately-named) version of the routine for your scanner, or
2370     use '%option noyywrap', as linking with '-lfl' no longer provides
2371     one for you by default.
2372
2373'--main, '%option main''
2374     directs flex to provide a default 'main()' program for the scanner,
2375     which simply calls 'yylex()'.  This option implies 'noyywrap' (see
2376     above).
2377
2378'--nounistd, '%option nounistd''
2379     suppresses inclusion of the non-ANSI header file 'unistd.h'.  This
2380     option is meant to target environments in which 'unistd.h' does not
2381     exist.  Be aware that certain options may cause flex to generate
2382     code that relies on functions normally found in 'unistd.h' (e.g.,
2383     'isatty()', 'read()').  If you wish to use these functions, you
2384     will have to inform your compiler where to find them.  *Note
2385     option-always-interactive::.  *Note option-read::.
2386
2387'--yyclass=NAME, '%option yyclass="NAME"''
2388     only applies when generating a C++ scanner (the '--c++' option).
2389     It informs 'flex' that you have derived 'NAME' as a subclass of
2390     'yyFlexLexer', so 'flex' will place your actions in the member
2391     function 'NAME::yylex()' instead of 'yyFlexLexer::yylex()'.  It also
2392     generates a 'yyFlexLexer::yylex()' member function that emits a
2393     run-time error (by invoking 'yyFlexLexer::LexerError()') if called.
2394     *Note Cxx::.
2395
2396
2397File: flex.info,  Node: Options for Scanner Speed and Size,  Next: Debugging Options,  Prev: Code-Level And API Options,  Up: Scanner Options
2398
239916.4 Options for Scanner Speed and Size
2400=======================================
2401
2402'-C[aefFmr]'
2403     controls the degree of table compression and, more generally,
2404     trade-offs between small scanners and fast scanners.
2405
2406     '-C'
2407          A lone '-C' specifies that the scanner tables should be
2408          compressed but neither equivalence classes nor
2409          meta-equivalence classes should be used.
2410
2411     '-Ca, --align, '%option align''
2412          ("align") instructs flex to trade off larger tables in the
2413          generated scanner for faster performance because the elements
2414          of the tables are better aligned for memory access and
2415          computation.  On some RISC architectures, fetching and
2416          manipulating longwords is more efficient than with
2417          smaller-sized units such as shortwords.  This option can
2418          quadruple the size of the tables used by your scanner.
2419
2420     '-Ce, --ecs, '%option ecs''
2421          directs 'flex' to construct "equivalence classes", i.e., sets
2422          of characters which have identical lexical properties (for
2423          example, if the only appearance of digits in the 'flex' input
2424          is in the character class "[0-9]" then the digits '0', '1',
2425          ..., '9' will all be put in the same equivalence class).
2426          Equivalence classes usually give dramatic reductions in the
2427          final table/object file sizes (typically a factor of 2-5) and
2428          are pretty cheap performance-wise (one array look-up per
2429          character scanned).
2430
2431     '-Cf'
2432          specifies that the "full" scanner tables should be generated -
2433          'flex' should not compress the tables by taking advantage of
2434          similar transition functions for different states.
2435
2436     '-CF'
2437          specifies that the alternate fast scanner representation
2438          (described below under the '--fast' flag) should be used.
2439          This option cannot be used with '--c++'.
2440
2441     '-Cm, --meta-ecs, '%option meta-ecs''
2442          directs 'flex' to construct "meta-equivalence classes", which
2443          are sets of equivalence classes (or characters, if equivalence
2444          classes are not being used) that are commonly used together.
2445          Meta-equivalence classes are often a big win when using
2446          compressed tables, but they have a moderate performance impact
2447          (one or two 'if' tests and one array look-up per character
2448          scanned).
2449
2450     '-Cr, --read, '%option read''
2451          causes the generated scanner to _bypass_ use of the standard
2452          I/O library ('stdio') for input.  Instead of calling 'fread()'
2453          or 'getc()', the scanner will use the 'read()' system call,
2454          resulting in a performance gain which varies from system to
2455          system, but in general is probably negligible unless you are
2456          also using '-Cf' or '-CF'.  Using '-Cr' can cause strange
2457          behavior if, for example, you read from 'yyin' using 'stdio'
2458          prior to calling the scanner (because the scanner will miss
2459          whatever text your previous reads left in the 'stdio' input
2460          buffer).  '-Cr' has no effect if you define 'YY_INPUT()'
2461          (*note Generated Scanner::).
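
     The '-Cr' hazard described above can be reproduced without 'flex'.
     The sketch below is an assumption-laden model, not scanner code: it
     assumes a POSIX system, uses a temporary file in place of 'yyin',
     and relies on the C library buffering regular files, which is
     typical but not guaranteed.  The function name is invented.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno(); assumes a POSIX system */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Writes a short line to a temporary file, consumes one character with
   getc(), then asks read() for the rest.  Returns the byte count that
   read() reports, or -1 if the temporary file could not be set up. */
static long bytes_seen_by_read(void)
{
    static const char text[] = "first second\n";
    char buf[64];
    long n;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF || fflush(fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    /* stdio typically pulls the whole (small) file into its buffer here. */
    if (getc(fp) != 'f') {
        fclose(fp);
        return -1;
    }

    /* The descriptor's offset is now wherever stdio left it, usually at
       end-of-file, so the 12 unread characters are invisible to read(). */
    n = (long) read(fileno(fp), buf, sizeof buf);
    fclose(fp);
    assert(n <= (long) sizeof buf);
    return n;
}
```

     With a buffering C library the function is expected to report
     fewer than the 12 characters that remain unread, usually none at
     all.  A '-Cr' scanner started after such a 'stdio' read would miss
     that text in the same way.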
2462
2463     The options '-Cf' or '-CF' and '-Cm' do not make sense together -
2464     there is no opportunity for meta-equivalence classes if the table
2465     is not being compressed.  Otherwise the options may be freely
2466     mixed, and are cumulative.
2467
2468     The default setting is '-Cem', which specifies that 'flex' should
2469     generate equivalence classes and meta-equivalence classes.  This
2470     setting provides the highest degree of table compression.  You can
2471     trade off faster-executing scanners at the cost of larger tables
2472     with the following generally being true:
2473
2474              slowest & smallest
2475                    -Cem
2476                    -Cm
2477                    -Ce
2478                    -C
2479                    -C{f,F}e
2480                    -C{f,F}
2481                    -C{f,F}a
2482              fastest & largest
2483
2484     Note that scanners with the smallest tables are usually generated
2485     and compiled the quickest, so during development you will usually
2486     want to use the default, maximal compression.
2487
2488     '-Cfe' is often a good compromise between speed and size for
2489     production scanners.
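
     To make the effect of equivalence classes on table size concrete,
     here is a small C model.  It is only a sketch: 'flex' derives its
     classes from the patterns themselves, whereas this example compares
     columns of a hand-built transition table for the two rules
     '[0-9]+' and '[a-z]+'.  All names are invented, and an ASCII
     character set is assumed.

```c
#include <assert.h>

#define NSTATES 3
#define NCHARS  256

static int next_state[NSTATES][NCHARS];  /* full table; -1 means "jam" */
static int ec[NCHARS];                   /* character -> class number */

/* State 0 is the start state, state 1 is "in digits", state 2 is
   "in lowercase letters". */
static void build_table(void)
{
    int s, c;

    for (s = 0; s < NSTATES; ++s)
        for (c = 0; c < NCHARS; ++c)
            next_state[s][c] = -1;
    for (c = '0'; c <= '9'; ++c)
        next_state[0][c] = next_state[1][c] = 1;
    for (c = 'a'; c <= 'z'; ++c)
        next_state[0][c] = next_state[2][c] = 2;
}

/* Two characters are equivalent if every state treats them alike. */
static int same_column(int a, int b)
{
    int s;

    for (s = 0; s < NSTATES; ++s)
        if (next_state[s][a] != next_state[s][b])
            return 0;
    return 1;
}

/* Fills ec[] and returns the number of distinct classes. */
static int build_classes(void)
{
    int nclasses = 0, c, d;

    for (c = 0; c < NCHARS; ++c) {
        for (d = 0; d < c; ++d)
            if (same_column(c, d))
                break;
        ec[c] = (d < c) ? ec[d] : nclasses++;
    }
    assert(nclasses >= 1 && nclasses <= NCHARS);
    return nclasses;
}
```

     After 'build_table()', 'build_classes()' finds three classes
     (digits, lowercase letters, everything else).  The 3 x 256 table
     can then be stored as a 3 x 3 table plus the 256-entry 'ec' map,
     at the cost of one extra array look-up per character.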
2490
2491'-f, --full, '%option full''
2492     specifies "fast scanner".  No table compression is done and 'stdio'
2493     is bypassed.  The result is large but fast.  This option is
2494     equivalent to '-Cfr'.
2495
2496'-F, --fast, '%option fast''
2497     specifies that the _fast_ scanner table representation should be
2498     used (and 'stdio' bypassed).  This representation is about as fast
2499     as the full table representation ('--full'), and for some sets of
2500     patterns will be considerably smaller (and for others, larger).  In
2501     general, if the pattern set contains both _keywords_ and a
2502     catch-all, _identifier_ rule, such as in the set:
2503
2504              "case"    return TOK_CASE;
2505              "switch"  return TOK_SWITCH;
2506              ...
2507              "default" return TOK_DEFAULT;
2508              [a-z]+    return TOK_ID;
2509
2510     then you're better off using the full table representation.  If
2511     only the _identifier_ rule is present and you then use a hash table
2512     or some such to detect the keywords, you're better off using
2513     '--fast'.
2514
2515     This option is equivalent to '-CFr'.  It cannot be used with
2516     '--c++'.
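
     The "hash table or some such" approach mentioned above can be
     sketched in C as follows.  This version uses a sorted array and
     'bsearch()' rather than a hash table.  The token names are taken
     from the example above, but their numeric values, the 'keywords'
     table and 'classify_identifier()' are assumptions made for
     illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Token codes; the values are arbitrary for this sketch. */
enum { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name, because bsearch() requires it. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

/* bsearch() passes the search key first and a table element second. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *) key,
                  ((const struct keyword *) elem)->name);
}

/* Returns the keyword's token code, or TOK_ID for any other word. */
static int classify_identifier(const char *text)
{
    const struct keyword *kw;

    assert(text != NULL);
    kw = bsearch(text, keywords,
                 sizeof keywords / sizeof keywords[0],
                 sizeof keywords[0], compare_keyword);
    return kw != NULL ? kw->token : TOK_ID;
}
```

     The scanner then needs only the single rule
     '[a-z]+  return classify_identifier(yytext);', which is the
     situation in which '--fast' tends to pay off.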
2517
2518
2519File: flex.info,  Node: Debugging Options,  Next: Miscellaneous Options,  Prev: Options for Scanner Speed and Size,  Up: Scanner Options
2520
252116.5 Debugging Options
2522======================
2523
2524'-b, --backup, '%option backup''
2525     Generate backing-up information to 'lex.backup'.  This is a list of
2526     scanner states which require backing up and the input characters on
2527     which they do so.  By adding rules one can remove backing-up
2528     states.  If _all_ backing-up states are eliminated and '-Cf' or
2529     '-CF' is used, the generated scanner will run faster (see the
2530     '--perf-report' flag).  Only users who wish to squeeze every last
2531     cycle out of their scanners need worry about this option.  (*note
2532     Performance::).
2533
2534'-d, --debug, '%option debug''
2535     makes the generated scanner run in "debug" mode.  Whenever a
2536     pattern is recognized and the global variable 'yy_flex_debug' is
2537     non-zero (which is the default), the scanner will write to 'stderr'
2538     a line of the form:
2539
2540              -accepting rule at line 53 ("the matched text")
2541
2542     The line number refers to the location of the rule in the file
2543     defining the scanner (i.e., the file that was fed to flex).
2544     Messages are also generated when the scanner backs up, accepts the
2545     default rule, reaches the end of its input buffer (or encounters a
2546     NUL; at this point, the two look the same as far as the scanner's
2547     concerned), or reaches an end-of-file.
2548
2549'-p, --perf-report, '%option perf-report''
2550     generates a performance report to 'stderr'.  The report consists of
2551     comments regarding features of the 'flex' input file which will
2552     cause a serious loss of performance in the resulting scanner.  If
2553     you give the flag twice, you will also get comments regarding
2554     features that lead to minor performance losses.
2555
2556     Note that the use of 'REJECT' and variable trailing context (*note
2557     Limitations::) entails a substantial performance penalty; use of
2558     'yymore()', the '^' operator, and the '--interactive' flag entail
2559     minor performance penalties.
2560
2561'-s, --nodefault, '%option nodefault''
2562     causes the _default rule_ (that unmatched scanner input is echoed
2563     to 'stdout') to be suppressed.  If the scanner encounters input
2564     that does not match any of its rules, it aborts with an error.
2565     This option is useful for finding holes in a scanner's rule set.
2566
2567'-T, --trace, '%option trace''
2568     makes 'flex' run in "trace" mode.  It will generate a lot of
2569     messages to 'stderr' concerning the form of the input and the
2570     resultant non-deterministic and deterministic finite automata.
2571     This option is mostly for use in maintaining 'flex'.
2572
2573'-w, --nowarn, '%option nowarn''
2574     suppresses warning messages.
2575
2576'-v, --verbose, '%option verbose''
2577     specifies that 'flex' should write to 'stderr' a summary of
2578     statistics regarding the scanner it generates.  Most of the
2579     statistics are meaningless to the casual 'flex' user, but the first
2580     line identifies the version of 'flex' (same as reported by
2581     '--version'), and the next line the flags used when generating the
2582     scanner, including those that are on by default.
2583
2584'--warn, '%option warn''
2585     warn about certain things.  In particular, if the default rule can
2586     be matched but no default rule has been given, 'flex' will warn
2587     you.  We recommend using this option always.
2588
2589
2590File: flex.info,  Node: Miscellaneous Options,  Prev: Debugging Options,  Up: Scanner Options
2591
259216.6 Miscellaneous Options
2593==========================
2594
2595'-c'
2596     A do-nothing option included for POSIX compliance.
2597
2598'-h, -?, --help'
2599     generates a "help" summary of 'flex''s options to 'stdout' and then
2600     exits.
2601
2602'-n'
2603     Another do-nothing option included for POSIX compliance.
2604
2605'-V, --version'
2606     prints the version number to 'stdout' and exits.
2607
2608
2609File: flex.info,  Node: Performance,  Next: Cxx,  Prev: Scanner Options,  Up: Top
2610
261117 Performance Considerations
2612*****************************
2613
2614The main design goal of 'flex' is that it generate high-performance
2615scanners.  It has been optimized for dealing well with large sets of
2616rules.  Aside from the effects on scanner speed of the table compression
2617'-C' options outlined above, there are a number of options/actions which
2618degrade performance.  These are, from most expensive to least:
2619
2620         REJECT
2621         arbitrary trailing context
2622
2623         pattern sets that require backing up
2624         %option yylineno
2625         %array
2626
2627         %option interactive
2628         %option always-interactive
2629
2630         ^ beginning-of-line operator
2631         yymore()
2632
2633   with the first two being quite expensive and the last two being
2634quite cheap.  Note also that 'unput()' is implemented as a routine call
2635that potentially does quite a bit of work, while 'yyless()' is a
2636quite-cheap macro.  So if you are just putting back some excess text you
2637scanned, use 'yyless()'.
2638
2639   'REJECT' should be avoided at all costs when performance is
2640important.  It is a particularly expensive option.
2641
2642   There is one case when '%option yylineno' can be expensive.  That is
2643when your patterns match long tokens that could _possibly_ contain a
2644newline character.  There is no performance penalty for rules that can
2645not possibly match newlines, since flex does not need to check them for
2646newlines.  In general, you should avoid rules such as '[^f]+', which
2647match very long tokens, including newlines, and may possibly match your
2648entire file!  A better approach is to separate '[^f]+' into two rules:
2649
2650     %option yylineno
2651     %%
2652         [^f\n]+
2653         \n+
2654
2655   The above scanner does not incur a performance penalty.
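
   The cost in question can be pictured with the following C model.  It
is a simplified sketch, not the code 'flex' generates: the function and
parameter names are invented, and the flag stands in for whatever
per-rule knowledge the scanner has about newlines.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many line-number increments a matched token causes.
   If the rule can never match a newline, the token is not rescanned
   at all; otherwise every character of it is examined once more. */
static int count_newlines(const char *text, size_t len,
                          int rule_can_match_newline)
{
    size_t i;
    int n = 0;

    assert(text != NULL);
    if (!rule_can_match_newline)
        return 0;                 /* cheap path: no second pass */
    for (i = 0; i < len; ++i)     /* cost grows with token length */
        if (text[i] == '\n')
            ++n;
    return n;
}
```

   A rule such as '[^f\n]+' always takes the cheap path, while '\n+'
takes the rescanning path only on tokens made up of newlines.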
2656
2657   Getting rid of backing up is messy and often may be an enormous
2658amount of work for a complicated scanner.  In principle, one begins by
2659using the '-b' flag to generate a 'lex.backup' file.  For example, on
2660the input:
2661
2662         %%
2663         foo        return TOK_KEYWORD;
2664         foobar     return TOK_KEYWORD;
2665
2666   the file looks like:
2667
2668         State #6 is non-accepting -
2669          associated rule line numbers:
2670                2       3
2671          out-transitions: [ o ]
2672          jam-transitions: EOF [ \001-n  p-\177 ]
2673
2674         State #8 is non-accepting -
2675          associated rule line numbers:
2676                3
2677          out-transitions: [ a ]
2678          jam-transitions: EOF [ \001-`  b-\177 ]
2679
2680         State #9 is non-accepting -
2681          associated rule line numbers:
2682                3
2683          out-transitions: [ r ]
2684          jam-transitions: EOF [ \001-q  s-\177 ]
2685
2686         Compressed tables always back up.
2687
2688   The first few lines tell us that there's a scanner state in which it
2689can make a transition on an 'o' but not on any other character, and that
2690in that state the currently scanned text does not match any rule.  The
2691state occurs when trying to match the rules found at lines 2 and 3 in
2692the input file.  If the scanner is in that state and then reads
2693something other than an 'o', it will have to back up to find a rule
2694which is matched.  With a bit of headscratching one can see that this
2695must be the state it's in when it has seen 'fo'.  When this has
2696happened, if anything other than another 'o' is seen, the scanner will
2697have to back up to simply match the 'f' (by the default rule).
2698
2699   The comment regarding State #8 indicates there's a problem when
2700'foob' has been scanned.  Indeed, on any character other than an 'a',
2701the scanner will have to back up to accept "foo".  Similarly, the
2702comment for State #9 concerns when 'fooba' has been scanned and an 'r'
2703does not follow.
2704
2705   The final comment reminds us that there's no point going to all the
2706trouble of removing backing up from the rules unless we're using '-Cf'
2707or '-CF', since there's no performance gain doing so with compressed
2708scanners.
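
   The behavior that these messages describe can be imitated with a
hand-written matcher for the two rules 'foo' and 'foobar' plus the
default rule.  This is a sketch under simplifying assumptions, not the
automaton 'flex' builds: it has no state numbers corresponding to those
in 'lex.backup', and the function name is invented.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the longest match at the start of S and stores
   in *given_back how many characters were read past that match and had
   to be returned to the input (the "backing up"). */
static size_t longest_match(const char *s, size_t *given_back)
{
    static const char word[] = "foobar";
    size_t pos = 0;           /* characters consumed so far */
    size_t last_accept = 0;   /* length at the last accepting state */

    assert(s != NULL && given_back != NULL);

    /* Follow f-o-o-b-a-r for as long as the input agrees. */
    while (pos < sizeof word - 1 && s[pos] == word[pos]) {
        ++pos;
        if (pos == 1 || pos == 3 || pos == 6)
            last_accept = pos;   /* "f" (default rule), "foo", "foobar" */
    }

    /* Any other single character is matched by the default rule. */
    if (pos == 0 && s[0] != '\0')
        pos = last_accept = 1;

    *given_back = pos - last_accept;
    return last_accept;
}
```

   On the input 'foob!' the matcher consumes four characters, accepts
'foo', and gives one character back.  On 'fo.' it accepts only 'f' by
the default rule, again giving one character back.  On 'foobar' nothing
is given back.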
2709
2710   The way to remove the backing up is to add "error" rules:
2711
2712         %%
2713         foo         return TOK_KEYWORD;
2714         foobar      return TOK_KEYWORD;
2715
2716         fooba       |
2717         foob        |
2718         fo          {
2719                     /* false alarm, not really a keyword */
2720                     return TOK_ID;
2721                     }
2722
2723   Eliminating backing up among a list of keywords can also be done
2724using a "catch-all" rule:
2725
2726         %%
2727         foo         return TOK_KEYWORD;
2728         foobar      return TOK_KEYWORD;
2729
2730         [a-z]+      return TOK_ID;
2731
2732   This is usually the best solution when appropriate.
2733
2734   Backing up messages tend to cascade.  With a complicated set of rules
2735it's not uncommon to get hundreds of messages.  If one can decipher
2736them, though, it often only takes a dozen or so rules to eliminate the
2737backing up (though it's easy to make a mistake and have an error rule
2738accidentally match a valid token.  A possible future 'flex' feature will
2739be to automatically add rules to eliminate backing up).
2740
2741   It's important to keep in mind that you gain the benefits of
2742eliminating backing up only if you eliminate _every_ instance of backing
2743up.  Leaving just one means you gain nothing.
2744
2745   _Variable_ trailing context (where both the leading and trailing
2746parts do not have a fixed length) entails almost the same performance
2747loss as 'REJECT' (i.e., substantial).  So when possible a rule like:
2748
2749         %%
2750         mouse|rat/(cat|dog)   run();
2751
2752   is better written:
2753
2754         %%
2755         mouse/cat|dog         run();
2756         rat/cat|dog           run();
2757
2758   or as
2759
2760         %%
2761         mouse|rat/cat         run();
2762         mouse|rat/dog         run();
2763
2764   Note that here the special '|' action does _not_ provide any savings,
2765and can even make things worse (*note Limitations::).
2766
2767   Another area where the user can increase a scanner's performance (and
2768one that's easier to implement) arises from the fact that the longer the
2769tokens matched, the faster the scanner will run.  This is because with
2770long tokens the processing of most input characters takes place in the
2771(short) inner scanning loop, and does not often have to go through the
2772additional work of setting up the scanning environment (e.g., 'yytext')
2773for the action.  Recall the scanner for C comments:
2774
2775         %x comment
2776         %%
2777                 int line_num = 1;
2778
2779         "/*"         BEGIN(comment);
2780
2781         <comment>[^*\n]*
2782         <comment>"*"+[^*/\n]*
2783         <comment>\n             ++line_num;
2784         <comment>"*"+"/"        BEGIN(INITIAL);
2785
2786   This could be sped up by writing it as:
2787
2788         %x comment
2789         %%
2790                 int line_num = 1;
2791
2792         "/*"         BEGIN(comment);
2793
2794         <comment>[^*\n]*
2795         <comment>[^*\n]*\n      ++line_num;
2796         <comment>"*"+[^*/\n]*
2797         <comment>"*"+[^*/\n]*\n ++line_num;
2798         <comment>"*"+"/"        BEGIN(INITIAL);
2799
2800   Now instead of each newline requiring the processing of another
2801action, recognizing the newlines is distributed over the other rules to
2802keep the matched text as long as possible.  Note that _adding_ rules
2803does _not_ slow down the scanner!  The speed of the scanner is
2804independent of the number of rules or (modulo the considerations given
2805at the beginning of this section) how complicated the rules are with
2806regard to operators such as '*' and '|'.
2807
2808   A final example in speeding up a scanner: suppose you want to scan
2809through a file containing identifiers and keywords, one per line and
2810with no other extraneous characters, and recognize all the keywords.  A
2811natural first approach is:
2812
2813         %%
2814         asm      |
2815         auto     |
2816         break    |
2817         ... etc ...
2818         volatile |
2819         while    /* it's a keyword */
2820
2821         .|\n     /* it's not a keyword */
2822
2823   To eliminate the back-tracking, introduce a catch-all rule:
2824
2825         %%
2826         asm      |
2827         auto     |
2828         break    |
2829         ... etc ...
2830         volatile |
2831         while    /* it's a keyword */
2832
2833         [a-z]+   |
2834         .|\n     /* it's not a keyword */
2835
2836   Now, if it's guaranteed that there's exactly one word per line, then
2837we can reduce the total number of matches by a half by merging in the
2838recognition of newlines with that of the other tokens:
2839
2840         %%
2841         asm\n    |
2842         auto\n   |
2843         break\n  |
2844         ... etc ...
2845         volatile\n |
2846         while\n  /* it's a keyword */
2847
2848         [a-z]+\n |
2849         .|\n     /* it's not a keyword */
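
   The halving can be checked with a small counting model in C.  It is
a sketch that assumes, as the text does, that the input holds only
lowercase words, each followed by a newline.  The function name and its
flag are invented, and no real scanner is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many rule matches a scanner would make on INPUT.  With
   NEWLINE_MERGED zero, words and newlines are separate tokens;
   otherwise each newline is folded into the word before it. */
static size_t count_matches(const char *input, int newline_merged)
{
    size_t matches = 0;
    const char *p = input;

    assert(input != NULL);
    while (*p != '\0') {
        if (*p == '\n') {
            ++p;                     /* a lone newline: the .|\n rule */
        } else {
            while (*p != '\0' && *p != '\n')
                ++p;                 /* a keyword or [a-z]+ */
            if (newline_merged && *p == '\n')
                ++p;                 /* word and newline as one token */
        }
        ++matches;
    }
    return matches;
}
```

   For the three-line input 'auto', 'while', 'foo', the separate rules
give six matches and the merged rules give three.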
2850
2851   One has to be careful here, as we have now reintroduced backing up
2852into the scanner.  In particular, while _we_ know that there will never
2853be any characters in the input stream other than letters or newlines,
2854'flex' can't figure this out, and it will plan for possibly needing to
2855back up when it has scanned a token like 'auto' and then the next
2856character is something other than a newline or a letter.  Previously it
2857would then just match the 'auto' rule and be done, but now it has no
2858'auto' rule, only an 'auto\n' rule.  To eliminate the possibility of
2859backing up, we could either duplicate all rules but without final
2860newlines, or, since we never expect to encounter such an input and
2861therefore don't care how it's classified, we can introduce one more
2862catch-all rule, this one which doesn't include a newline:
2863
2864         %%
2865         asm\n    |
2866         auto\n   |
2867         break\n  |
2868         ... etc ...
2869         volatile\n |
2870         while\n  /* it's a keyword */
2871
2872         [a-z]+\n |
2873         [a-z]+   |
2874         .|\n     /* it's not a keyword */
2875
2876   Compiled with '-Cf', this is about as fast as one can get a 'flex'
2877scanner to go for this particular problem.
2878
2879   A final note: 'flex' is slow when matching 'NUL's, particularly when
2880a token contains multiple 'NUL's.  It's best to write rules which match
2881_short_ amounts of text if it's anticipated that the text will often
2882include 'NUL's.
2883
2884   Another final note regarding performance: as mentioned in *note
2885Matching::, dynamically resizing 'yytext' to accommodate huge tokens is
2886a slow process because it presently requires that the (huge) token be
2887rescanned from the beginning.  Thus if performance is vital, you should
2888attempt to match "large" quantities of text but not "huge" quantities,
2889where the cutoff between the two is at about 8K characters per token.
2890
2891
2892File: flex.info,  Node: Cxx,  Next: Reentrant,  Prev: Performance,  Up: Top
2893
289418 Generating C++ Scanners
2895**************************
2896
2897*IMPORTANT*: the present form of the scanning class is _experimental_
2898and may change considerably between major releases.
2899
2900   'flex' provides two different ways to generate scanners for use with
2901C++.  The first way is to simply compile a scanner generated by 'flex'
2902using a C++ compiler instead of a C compiler.  You should not encounter
2903any compilation errors (*note Reporting Bugs::).  You can then use C++
2904code in your rule actions instead of C code.  Note that the default
2905input source for your scanner remains 'yyin', and default echoing is
2906still done to 'yyout'.  Both of these remain 'FILE *' variables and not
2907C++ _streams_.
2908
2909   You can also use 'flex' to generate a C++ scanner class, using the
2910'-+' option (or, equivalently, '%option c++'), which is automatically
2911specified if the name of the 'flex' executable ends in a '+', such as
2912'flex++'.  When using this option, 'flex' defaults to generating the
2913scanner to the file 'lex.yy.cc' instead of 'lex.yy.c'.  The generated
2914scanner includes the header file 'FlexLexer.h', which defines the
2915interface to two C++ classes.
2916
2917   The first class in 'FlexLexer.h', 'FlexLexer', provides an abstract
2918base class defining the general scanner class interface.  It provides
2919the following member functions:
2920
2921'const char* YYText()'
2922     returns the text of the most recently matched token, the equivalent
2923     of 'yytext'.
2924
2925'int YYLeng()'
2926     returns the length of the most recently matched token, the
2927     equivalent of 'yyleng'.
2928
2929'int lineno() const'
2930     returns the current input line number (see '%option yylineno'), or
2931     '1' if '%option yylineno' was not used.
2932
2933'void set_debug( int flag )'
2934     sets the debugging flag for the scanner, equivalent to assigning to
2935     'yy_flex_debug' (*note Scanner Options::).  Note that you must
2936     build the scanner using '%option debug' to include debugging
2937     information in it.
2938
2939'int debug() const'
2940     returns the current setting of the debugging flag.
2941
2942   Also provided are member functions equivalent to
2943'yy_switch_to_buffer()', 'yy_create_buffer()' (though the first argument
2944is an 'istream&' object reference and not a 'FILE*'),
2945'yy_flush_buffer()', 'yy_delete_buffer()', and 'yyrestart()' (again, the
2946first argument is an 'istream&' object reference).
2947
2948   The second class defined in 'FlexLexer.h' is 'yyFlexLexer', which is
2949derived from 'FlexLexer'.  It defines the following additional member
2950functions:
2951
2952'yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
2953'yyFlexLexer( istream& arg_yyin, ostream& arg_yyout )'
2954     constructs a 'yyFlexLexer' object using the given streams for input
2955     and output.  If not specified, the streams default to 'cin' and
2956     'cout', respectively.  'yyFlexLexer' does not take ownership of its
2957     stream arguments.  It's up to the user to ensure the streams
2958     pointed to remain alive at least as long as the 'yyFlexLexer'
2959     instance.
2960
2961'virtual int yylex()'
2962     performs the same role as 'yylex()' does for ordinary 'flex'
2963     scanners: it scans the input stream, consuming tokens, until a
2964     rule's action returns a value.  If you derive a subclass 'S' from
2965     'yyFlexLexer' and want to access the member functions and variables
2966     of 'S' inside 'yylex()', then you need to use '%option yyclass="S"'
2967     to inform 'flex' that you will be using that subclass instead of
2968     'yyFlexLexer'.  In this case, rather than generating
2969     'yyFlexLexer::yylex()', 'flex' generates 'S::yylex()' (and also
2970     generates a dummy 'yyFlexLexer::yylex()' that calls
2971     'yyFlexLexer::LexerError()' if called).
2972
2973'virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
2974'virtual void switch_streams(istream& new_in, ostream& new_out)'
2975     reassigns 'yyin' to 'new_in' (if non-null) and 'yyout' to 'new_out'
2976     (if non-null), deleting the previous input buffer if 'yyin' is
2977     reassigned.
2978
2979'int yylex( istream* new_in, ostream* new_out = 0 )'
2980'int yylex( istream& new_in, ostream& new_out )'
2981     first switches the input streams via 'switch_streams( new_in,
2982     new_out )' and then returns the value of 'yylex()'.
2983
2984   In addition, 'yyFlexLexer' defines the following protected virtual
2985functions which you can redefine in derived classes to tailor the
2986scanner:
2987
2988'virtual int LexerInput( char* buf, int max_size )'
2989     reads up to 'max_size' characters into 'buf' and returns the number
2990     of characters read.  To indicate end-of-input, return 0 characters.
2991     Note that 'interactive' scanners (see the '-B' and '-I' flags in
2992     *note Scanner Options::) define the macro 'YY_INTERACTIVE'.  If you
2993     redefine 'LexerInput()' and need to take different actions
2994     depending on whether or not the scanner might be scanning an
2995     interactive input source, you can test for the presence of this
2996     name via '#ifdef' statements.
2997
2998'virtual void LexerOutput( const char* buf, int size )'
2999     writes out 'size' characters from the buffer 'buf', which, while
3000     'NUL'-terminated, may also contain internal 'NUL's if the scanner's
3001     rules can match text with 'NUL's in them.
3002
3003'virtual void LexerError( const char* msg )'
3004     reports a fatal error message.  The default version of this
3005     function writes the message to the stream 'cerr' and exits.
3006
3007   Note that a 'yyFlexLexer' object contains its _entire_ scanning
3008state.  Thus you can use such objects to create reentrant scanners, but
3009see also *note Reentrant::.  You can instantiate multiple instances of
3010the same 'yyFlexLexer' class, and you can also combine multiple C++
3011scanner classes together in the same program using the '-P' option
3012discussed above.
3013
3014   Finally, note that the '%array' feature is not available to C++
3015scanner classes; you must use '%pointer' (the default).
3016
3017   Here is an example of a simple C++ scanner:
3018
3019          // An example of using the flex C++ scanner class.
3020
3021         %{
3022         #include <iostream>
3023         using namespace std;
3024         int mylineno = 0;
3025         %}
3026
3027         %option noyywrap c++
3028
3029         string  \"[^\n"]+\"
3030
3031         ws      [ \t]+
3032
3033         alpha   [A-Za-z]
3034         dig     [0-9]
3035         name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
3036         num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
3037         num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
3038         number  {num1}|{num2}
3039
3040         %%
3041
3042         {ws}    /* skip blanks and tabs */
3043
3044         "/*"    {
3045                 int c;
3046
3047                 while((c = yyinput()) != 0)
3048                     {
3049                     if(c == '\n')
3050                         ++mylineno;
3051
3052                     else if(c == '*')
3053                         {
3054                         if((c = yyinput()) == '/')
3055                             break;
3056                         else
3057                             unput(c);
3058                         }
3059                     }
3060                 }
3061
3062         {number}  cout << "number " << YYText() << '\n';
3063
3064         \n        mylineno++;
3065
3066         {name}    cout << "name " << YYText() << '\n';
3067
3068         {string}  cout << "string " << YYText() << '\n';
3069
3070         %%
3071
         // This include is required if main() is in another source file.
         //#include <FlexLexer.h>
3074
3075         int main( int /* argc */, char** /* argv */ )
3076         {
             FlexLexer* lexer = new yyFlexLexer;
             while(lexer->yylex() != 0)
                 ;
             delete lexer;
             return 0;
3081         }
3082
3083   If you want to create multiple (different) lexer classes, you use the
3084'-P' flag (or the 'prefix=' option) to rename each 'yyFlexLexer' to some
3085other 'xxFlexLexer'.  You then can include '<FlexLexer.h>' in your other
3086sources once per lexer class, first renaming 'yyFlexLexer' as follows:
3087
3088         #undef yyFlexLexer
3089         #define yyFlexLexer xxFlexLexer
3090         #include <FlexLexer.h>
3091
3092         #undef yyFlexLexer
3093         #define yyFlexLexer zzFlexLexer
3094         #include <FlexLexer.h>
3095
3096   if, for example, you used '%option prefix="xx"' for one of your
3097scanners and '%option prefix="zz"' for the other.
3098
3099
3100File: flex.info,  Node: Reentrant,  Next: Lex and Posix,  Prev: Cxx,  Up: Top
3101
310219 Reentrant C Scanners
3103***********************
3104
3105'flex' has the ability to generate a reentrant C scanner.  This is
accomplished by specifying '%option reentrant' ('-R').  The generated
scanner is both portable and safe to use in one or more separate
3108threads of control.  The most common use for reentrant scanners is from
3109within multi-threaded applications.  Any thread may create and execute a
3110reentrant 'flex' scanner without the need for synchronization with other
3111threads.
3112
3113* Menu:
3114
3115* Reentrant Uses::
3116* Reentrant Overview::
3117* Reentrant Example::
3118* Reentrant Detail::
3119* Reentrant Functions::
3120
3121
3122File: flex.info,  Node: Reentrant Uses,  Next: Reentrant Overview,  Prev: Reentrant,  Up: Reentrant
3123
312419.1 Uses for Reentrant Scanners
3125================================
3126
Threads are not the only use for a reentrant scanner.  For example, you
3128could scan two or more files simultaneously to implement a 'diff' at the
3129token level (i.e., instead of at the character level):
3130
3131         /* Example of maintaining more than one active scanner. */
3132
         /* Declared outside the loop so the loop condition can see them. */
         int tok1, tok2;

         do {
             tok1 = yylex( scanner_1 );
             tok2 = yylex( scanner_2 );

             if( tok1 != tok2 ) {
                 printf("Files are different.\n");
                 break;
             }

         } while ( tok1 && tok2 );
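
   The fragment above leaves out the scanners themselves.  The following
stand-alone sketch shows the same idea with a hand-written tokenizer in
place of a 'flex'-generated one, so it only illustrates the pattern:
'demo_scanner', 'demo_next' and 'demo_same_tokens' are hypothetical
names, and a "token" here is assumed to be a run of non-blank
characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-scanner state.  Keeping it in a struct instead of
 * in globals is what lets two scans be interleaved. */
struct demo_scanner {
    const char *pos;    /* next unread character */
    const char *tok;    /* start of the most recent token */
    size_t      len;    /* length of the most recent token */
};

/* Returns 1 and records the next whitespace-delimited token, or
 * returns 0 at end of input. */
static int demo_next(struct demo_scanner *s)
{
    assert(s != NULL && s->pos != NULL);
    while (isspace((unsigned char)*s->pos))
        s->pos++;
    if (*s->pos == '\0')
        return 0;
    s->tok = s->pos;
    while (*s->pos != '\0' && !isspace((unsigned char)*s->pos))
        s->pos++;
    s->len = (size_t)(s->pos - s->tok);
    return 1;
}

/* Token-level comparison: returns 1 if both inputs yield the same
 * token sequence, ignoring differences in whitespace. */
static int demo_same_tokens(const char *a, const char *b)
{
    struct demo_scanner s1 = { a, NULL, 0 };
    struct demo_scanner s2 = { b, NULL, 0 };

    for (;;) {
        int got1 = demo_next(&s1);
        int got2 = demo_next(&s2);

        if (!got1 || !got2)
            return got1 == got2;    /* equal only if both ended */
        if (s1.len != s2.len || memcmp(s1.tok, s2.tok, s1.len) != 0)
            return 0;
    }
}
```

   With a real reentrant scanner, each 'struct demo_scanner' would be a
'yyscan_t' obtained from 'yylex_init', and 'demo_next' would be a call
to 'yylex'.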
3143
3144   Another use for a reentrant scanner is recursion.  (Note that a
3145recursive scanner can also be created using a non-reentrant scanner and
3146buffer states.  *Note Multiple Input Buffers::.)
3147
3148   The following crude scanner supports the 'eval' command by invoking
3149another instance of itself.
3150
3151         /* Example of recursive invocation. */
3152
3153         %option reentrant
3154
3155         %%
3156         "eval(".+")"  {
3157                           yyscan_t scanner;
3158                           YY_BUFFER_STATE buf;
3159
3160                           yylex_init( &scanner );
3161                           yytext[yyleng-1] = ' ';
3162
3163                           buf = yy_scan_string( yytext + 5, scanner );
3164                           yylex( scanner );
3165
3166                           yy_delete_buffer(buf,scanner);
3167                           yylex_destroy( scanner );
3168                      }
3169         ...
3170         %%
3171
3172
3173File: flex.info,  Node: Reentrant Overview,  Next: Reentrant Example,  Prev: Reentrant Uses,  Up: Reentrant
3174
317519.2 An Overview of the Reentrant API
3176=====================================
3177
The API for reentrant scanners differs from the API for non-reentrant
scanners.  Here is a quick overview of the API:

   * '%option reentrant' must be specified.

   * All functions take one additional argument: 'yyscanner'.
3184
3185   * All global variables are replaced by their macro equivalents.  (We
3186     tell you this because it may be important to you during debugging.)
3187
3188   * 'yylex_init' and 'yylex_destroy' must be called before and after
3189     'yylex', respectively.
3190
3191   * Accessor methods (get/set functions) provide access to common
3192     'flex' variables.
3193
3194   * User-specific data can be stored in 'yyextra'.
3195
3196
3197File: flex.info,  Node: Reentrant Example,  Next: Reentrant Detail,  Prev: Reentrant Overview,  Up: Reentrant
3198
319919.3 Reentrant Example
3200======================
3201
First, an example of a reentrant scanner:

         /* This scanner prints "//" comments. */
3204
3205         %option reentrant stack noyywrap
3206         %x COMMENT
3207
3208         %%
3209
3210         "//"                 yy_push_state( COMMENT, yyscanner);
3211         .|\n
3212
3213         <COMMENT>\n          yy_pop_state( yyscanner );
3214         <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);
3215
3216         %%
3217
3218         int main ( int argc, char * argv[] )
3219         {
3220             yyscan_t scanner;
3221
3222             yylex_init ( &scanner );
3223             yylex ( scanner );
3224             yylex_destroy ( scanner );
             return 0;
         }
3227
3228
3229File: flex.info,  Node: Reentrant Detail,  Next: Reentrant Functions,  Prev: Reentrant Example,  Up: Reentrant
3230
323119.4 The Reentrant API in Detail
3232================================
3233
3234Here are the things you need to do or know to use the reentrant C API of
3235'flex'.
3236
3237* Menu:
3238
3239* Specify Reentrant::
3240* Extra Reentrant Argument::
3241* Global Replacement::
3242* Init and Destroy Functions::
3243* Accessor Methods::
3244* Extra Data::
3245* About yyscan_t::
3246
3247
3248File: flex.info,  Node: Specify Reentrant,  Next: Extra Reentrant Argument,  Prev: Reentrant Detail,  Up: Reentrant Detail
3249
325019.4.1 Declaring a Scanner As Reentrant
3251---------------------------------------
3252
'%option reentrant' ('--reentrant') must be specified.
3254
   Notice that '%option reentrant' is specified in the above example
(*note Reentrant Example::).  Had this option not been specified, 'flex'
3257would have happily generated a non-reentrant scanner without
3258complaining.  You may explicitly specify '%option noreentrant', if you
3259do _not_ want a reentrant scanner, although it is not necessary.  The
3260default is to generate a non-reentrant scanner.
3261
3262
3263File: flex.info,  Node: Extra Reentrant Argument,  Next: Global Replacement,  Prev: Specify Reentrant,  Up: Reentrant Detail
3264
326519.4.2 The Extra Argument
3266-------------------------
3267
3268All functions take one additional argument: 'yyscanner'.
3269
3270   Notice that the calls to 'yy_push_state' and 'yy_pop_state' both have
an argument, 'yyscanner', that is not present in a non-reentrant
3272scanner.  Here are the declarations of 'yy_push_state' and
3273'yy_pop_state' in the reentrant scanner:
3274
3275         static void yy_push_state  ( int new_state , yyscan_t yyscanner ) ;
3276         static void yy_pop_state  ( yyscan_t yyscanner  ) ;
3277
3278   Notice that the argument 'yyscanner' appears in the declaration of
3279both functions.  In fact, all 'flex' functions in a reentrant scanner
3280have this additional argument.  It is always the last argument in the
3281argument list, it is always of type 'yyscan_t' (which is typedef'd to
3282'void *') and it is always named 'yyscanner'.  As you may have guessed,
3283'yyscanner' is a pointer to an opaque data structure encapsulating the
3284current state of the scanner.  For a list of function declarations, see
3285*note Reentrant Functions::.  Note that preprocessor macros, such as
3286'BEGIN', 'ECHO', and 'REJECT', do not take this additional argument.
3287
3288
3289File: flex.info,  Node: Global Replacement,  Next: Init and Destroy Functions,  Prev: Extra Reentrant Argument,  Up: Reentrant Detail
3290
329119.4.3 Global Variables Replaced By Macros
3292------------------------------------------
3293
3294All global variables in traditional flex have been replaced by macro
3295equivalents.
3296
3297   Note that in the above example, 'yyout' and 'yytext' are not plain
3298variables.  These are macros that will expand to their equivalent
3299lvalue.  All of the familiar 'flex' globals have been replaced by their
3300macro equivalents.  In particular, 'yytext', 'yyleng', 'yylineno',
3301'yyin', 'yyout', 'yyextra', 'yylval', and 'yylloc' are macros.  You may
3302safely use these macros in actions as if they were plain variables.  We
3303only tell you this so you don't expect to link to these variables
3304externally.  Currently, each macro expands to a member of an internal
3305struct, e.g.,
3306
3307     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)
3308
   One important thing to remember about 'yytext' and friends is that,
because 'yytext' is not a global variable in a reentrant scanner, you
cannot access it directly from outside an action or from other
functions.  You must use an accessor method, e.g., 'yyget_text', to
accomplish this.  (See below.)
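
   To see why 'yytext' behaves like a variable inside an action yet
cannot be linked to from outside, consider the following stand-alone
sketch of the pattern.  It is not 'flex''s code: 'demo_guts_t',
'demo_text', 'demo_leng', 'demo_action' and 'demo_get_text' are
hypothetical names that only imitate the shape of the real macros and
accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the internal per-scanner struct. */
struct demo_guts_t {
    char  *text_r;   /* plays the role of yytext_r */
    size_t leng_r;   /* plays the role of the yyleng member */
};

typedef void *demo_scan_t;   /* opaque handle, like yyscan_t */

/* Inside "action" code these names look like globals, but each one
 * expands to a member reached through the local name demo_scanner. */
#define demo_text (((struct demo_guts_t *)demo_scanner)->text_r)
#define demo_leng (((struct demo_guts_t *)demo_scanner)->leng_r)

/* Plays the role of an action: the macros work only because a
 * parameter named demo_scanner is in scope. */
static void demo_action(char *match, demo_scan_t demo_scanner)
{
    assert(demo_scanner != NULL);
    demo_text = match;              /* the macro expands to an lvalue */
    demo_leng = strlen(match);
}

/* Plays the role of an accessor such as yyget_text(): code outside an
 * action has to go through a function that takes the handle. */
static char *demo_get_text(demo_scan_t demo_scanner)
{
    return demo_text;
}
```

   Two handles keep two independent copies of the "globals", which is
the property that makes the scanner reentrant.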
3314
3315
3316File: flex.info,  Node: Init and Destroy Functions,  Next: Accessor Methods,  Prev: Global Replacement,  Up: Reentrant Detail
3317
331819.4.4 Init and Destroy Functions
3319---------------------------------
3320
3321'yylex_init' and 'yylex_destroy' must be called before and after
3322'yylex', respectively.
3323
3324         int yylex_init ( yyscan_t * ptr_yy_globals ) ;
3325         int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
3326         int yylex ( yyscan_t yyscanner ) ;
3327         int yylex_destroy ( yyscan_t yyscanner ) ;
3328
3329   The function 'yylex_init' must be called before calling any other
3330function.  The argument to 'yylex_init' is the address of an
3331uninitialized pointer to be filled in by 'yylex_init', overwriting any
3332previous contents.  The function 'yylex_init_extra' may be used instead,
3333taking as its first argument a variable of type 'YY_EXTRA_TYPE'.  See
3334the section on yyextra, below, for more details.
3335
3336   The value stored in 'ptr_yy_globals' should thereafter be passed to
3337'yylex' and 'yylex_destroy'.  Flex does not save the argument passed to
3338'yylex_init', so it is safe to pass the address of a local pointer to
3339'yylex_init' so long as it remains in scope for the duration of all
3340calls to the scanner, up to and including the call to 'yylex_destroy'.
3341
3342   The function 'yylex' should be familiar to you by now.  The reentrant
3343version takes one argument, which is the value returned (via an
3344argument) by 'yylex_init'.  Otherwise, it behaves the same as the
3345non-reentrant version of 'yylex'.
3346
   Both 'yylex_init' and 'yylex_init_extra' return 0 (zero) on success,
or non-zero on failure, in which case 'errno' is set to one of the
3349following values:
3350
3351   * ENOMEM Memory allocation error.  *Note memory-management::.
3352   * EINVAL Invalid argument.
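
   The return-value contract described above can be illustrated with a
stand-alone sketch.  This is not 'flex''s implementation: 'toy_scan_t',
'toy_guts_t', 'toy_init' and 'toy_destroy' are hypothetical names, and
plain 'malloc' and 'free' are assumed as the allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef void *toy_scan_t;           /* opaque handle, like yyscan_t */

/* Hypothetical internal state; callers never see this type. */
struct toy_guts_t {
    int lineno_r;
};

/* Follows the documented contract: 0 on success, non-zero with errno
 * set to EINVAL or ENOMEM on failure. */
static int toy_init(toy_scan_t *ptr_globals)
{
    if (ptr_globals == NULL) {
        errno = EINVAL;
        return 1;
    }
    *ptr_globals = malloc(sizeof(struct toy_guts_t));
    if (*ptr_globals == NULL) {
        errno = ENOMEM;
        return 1;
    }
    memset(*ptr_globals, 0, sizeof(struct toy_guts_t));
    return 0;
}

/* Releases everything owned by the handle; the handle must not be
 * used afterwards. */
static int toy_destroy(toy_scan_t scanner)
{
    assert(scanner != NULL);
    free(scanner);
    return 0;
}
```

   A caller of the real functions can check the result in the same way,
for example by testing whether 'yylex_init' returned non-zero and then
reporting 'errno' with 'perror'.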
3353
3354   The function 'yylex_destroy' should be called to free resources used
3355by the scanner.  After 'yylex_destroy' is called, the contents of
3356'yyscanner' should not be used.  Of course, there is no need to destroy
3357a scanner if you plan to reuse it.  A 'flex' scanner (both reentrant and
3358non-reentrant) may be restarted by calling 'yyrestart'.
3359
3360   Below is an example of a program that creates a scanner, uses it,
3361then destroys it when done:
3362
3363         int main ()
3364         {
3365             yyscan_t scanner;
3366             int tok;
3367
3368             yylex_init(&scanner);
3369
3370             while ((tok=yylex(scanner)) > 0)
3371                 printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));
3372
3373             yylex_destroy(scanner);
3374             return 0;
3375         }
3376
3377
3378File: flex.info,  Node: Accessor Methods,  Next: Extra Data,  Prev: Init and Destroy Functions,  Up: Reentrant Detail
3379
338019.4.5 Accessing Variables with Reentrant Scanners
3381--------------------------------------------------
3382
3383Accessor methods (get/set functions) provide access to common 'flex'
3384variables.
3385
3386   Many scanners that you build will be part of a larger project.
3387Portions of your project will need access to 'flex' values, such as
3388'yytext'.  In a non-reentrant scanner, these values are global, so there
3389is no problem accessing them.  However, in a reentrant scanner, there
are no global 'flex' values.  You cannot access them directly.
3391Instead, you must access 'flex' values using accessor methods (get/set
3392functions).  Each accessor method is named 'yyget_NAME' or 'yyset_NAME',
3393where 'NAME' is the name of the 'flex' variable you want.  For example:
3394
         /* Replace the last character of yytext with NUL. */
         void chop ( yyscan_t scanner )
         {
             int len = yyget_leng( scanner );
             if ( len > 0 )
                 yyget_text( scanner )[len - 1] = '\0';
         }
3401
3402   The above code may be called from within an action like this:
3403
3404         %%
3405         .+\n    { chop( yyscanner );}
3406
3407   You may find that '%option header-file' is particularly useful for
3408generating prototypes of all the accessor functions.  *Note
3409option-header::.
3410
3411
3412File: flex.info,  Node: Extra Data,  Next: About yyscan_t,  Prev: Accessor Methods,  Up: Reentrant Detail
3413
341419.4.6 Extra Data
3415-----------------
3416
3417User-specific data can be stored in 'yyextra'.
3418
3419   In a reentrant scanner, it is unwise to use global variables to
3420communicate with or maintain state between different pieces of your
3421program.  However, you may need access to external data or invoke
3422external functions from within the scanner actions.  Likewise, you may
3423need to pass information to your scanner (e.g., open file descriptors,
3424or database connections).  In a non-reentrant scanner, the only way to
3425do this would be through the use of global variables.  'Flex' allows you
3426to store arbitrary, "extra" data in a scanner.  This data is accessible
3427through the accessor methods 'yyget_extra' and 'yyset_extra' from
3428outside the scanner, and through the shortcut macro 'yyextra' from
3429within the scanner itself.  They are defined as follows:
3430
3431         #define YY_EXTRA_TYPE  void*
3432         YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
3433         void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);
3434
3435   In addition, an extra form of 'yylex_init' is provided,
3436'yylex_init_extra'.  This function is provided so that the yyextra value
3437can be accessed from within the very first yyalloc, used to allocate the
3438scanner itself.
3439
3440   By default, 'YY_EXTRA_TYPE' is defined as type 'void *'.  You may
3441redefine this type using '%option extra-type="your_type"' in the
3442scanner:
3443
3444         /* An example of overriding YY_EXTRA_TYPE. */
3445         %{
3446         #include <sys/stat.h>
3447         #include <unistd.h>
3448         %}
3449         %option reentrant
3450         %option extra-type="struct stat *"
3451         %%
3452
         __filesize__     printf( "%ld", (long) yyextra->st_size  );
         __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
         %%
         void scan_file( char* filename )
         {
             yyscan_t scanner;
             struct stat buf;
             FILE *in;

             in = fopen( filename, "r" );
             if ( in == NULL )
                 return;
             stat( filename, &buf );

             /* YY_EXTRA_TYPE is 'struct stat *', so pass a pointer. */
             yylex_init_extra( &buf, &scanner );
             yyset_in( in, scanner );
             yylex( scanner );
             yylex_destroy( scanner );

             fclose( in );
         }
3472
3473
3474File: flex.info,  Node: About yyscan_t,  Prev: Extra Data,  Up: Reentrant Detail
3475
347619.4.7 About yyscan_t
3477---------------------
3478
3479'yyscan_t' is defined as:
3480
3481          typedef void* yyscan_t;
3482
3483   It is initialized by 'yylex_init()' to point to an internal
3484structure.  You should never access this value directly.  In particular,
you should never attempt to free it (use 'yylex_destroy()' instead).
3486
3487
3488File: flex.info,  Node: Reentrant Functions,  Prev: Reentrant Detail,  Up: Reentrant
3489
349019.5 Functions and Macros Available in Reentrant C Scanners
3491===========================================================
3492
The following functions are available in a reentrant scanner:
3494
3495         char *yyget_text ( yyscan_t scanner );
3496         int yyget_leng ( yyscan_t scanner );
3497         FILE *yyget_in ( yyscan_t scanner );
3498         FILE *yyget_out ( yyscan_t scanner );
3499         int yyget_lineno ( yyscan_t scanner );
3500         YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
3501         int  yyget_debug ( yyscan_t scanner );
3502
3503         void yyset_debug ( int flag, yyscan_t scanner );
3504         void yyset_in  ( FILE * in_str , yyscan_t scanner );
3505         void yyset_out  ( FILE * out_str , yyscan_t scanner );
3506         void yyset_lineno ( int line_number , yyscan_t scanner );
3507         void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );
3508
3509   There are no "set" functions for yytext and yyleng.  This is
3510intentional.
3511
   The following macro shortcuts are available in actions in a reentrant
3513scanner:
3514
3515         yytext
3516         yyleng
3517         yyin
3518         yyout
3519         yylineno
3520         yyextra
3521         yy_flex_debug
3522
3523   In a reentrant C scanner, support for yylineno is always present
3524(i.e., you may access yylineno), but the value is never modified by
3525'flex' unless '%option yylineno' is enabled.  This is to allow the user
3526to maintain the line count independently of 'flex'.
3527
3528   The following functions and macros are made available when '%option
3529bison-bridge' ('--bison-bridge') is specified:
3530
3531         YYSTYPE * yyget_lval ( yyscan_t scanner );
3532         void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
3533         yylval
3534
3535   The following functions and macros are made available when '%option
3536bison-locations' ('--bison-locations') is specified:
3537
3538         YYLTYPE *yyget_lloc ( yyscan_t scanner );
3539         void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
3540         yylloc
3541
3542   Support for yylval assumes that 'YYSTYPE' is a valid type.  Support
for yylloc assumes that 'YYLTYPE' is a valid type.  Typically, these
3544types are generated by 'bison', and are included in section 1 of the
3545'flex' input.
3546
3547
3548File: flex.info,  Node: Lex and Posix,  Next: Memory Management,  Prev: Reentrant,  Up: Top
3549
355020 Incompatibilities with Lex and Posix
3551***************************************
3552
3553'flex' is a rewrite of the AT&T Unix _lex_ tool (the two implementations
3554do not share any code, though), with some extensions and
3555incompatibilities, both of which are of concern to those who wish to
3556write scanners acceptable to both implementations.  'flex' is fully
3557compliant with the POSIX 'lex' specification, except that when using
3558'%pointer' (the default), a call to 'unput()' destroys the contents of
3559'yytext', which is counter to the POSIX specification.  In this section
3560we discuss all of the known areas of incompatibility between 'flex',
3561AT&T 'lex', and the POSIX specification.  'flex''s '-l' option turns on
3562maximum compatibility with the original AT&T 'lex' implementation, at
3563the cost of a major loss in the generated scanner's performance.  We
3564note below which incompatibilities can be overcome using the '-l'
3565option.  'flex' is fully compatible with 'lex' with the following
3566exceptions:
3567
3568   * The undocumented 'lex' scanner internal variable 'yylineno' is not
3569     supported unless '-l' or '%option yylineno' is used.
3570
3571   * 'yylineno' should be maintained on a per-buffer basis, rather than
3572     a per-scanner (single global variable) basis.
3573
3574   * 'yylineno' is not part of the POSIX specification.
3575
3576   * The 'input()' routine is not redefinable, though it may be called
3577     to read characters following whatever has been matched by a rule.
3578     If 'input()' encounters an end-of-file the normal 'yywrap()'
3579     processing is done.  A "real" end-of-file is returned by 'input()'
3580     as 'EOF'.
3581
3582   * Input is instead controlled by defining the 'YY_INPUT()' macro.
3583
3584   * The 'flex' restriction that 'input()' cannot be redefined is in
3585     accordance with the POSIX specification, which simply does not
3586     specify any way of controlling the scanner's input other than by
3587     making an initial assignment to 'yyin'.
3588
3589   * The 'unput()' routine is not redefinable.  This restriction is in
3590     accordance with POSIX.
3591
3592   * 'flex' scanners are not as reentrant as 'lex' scanners.  In
3593     particular, if you have an interactive scanner and an interrupt
3594     handler which long-jumps out of the scanner, and the scanner is
3595     subsequently called again, you may get the following message:
3596
3597              fatal flex scanner internal error--end of buffer missed
3598
3599     To reenter the scanner, first use:
3600
3601              yyrestart( yyin );
3602
3603     Note that this call will throw away any buffered input; usually
3604     this isn't a problem with an interactive scanner.  *Note
3605     Reentrant::, for 'flex''s reentrant API.
3606
3607   * Also note that 'flex' C++ scanner classes _are_ reentrant, so if
3608     using C++ is an option for you, you should use them instead.  *Note
3609     Cxx::, and *note Reentrant:: for details.
3610
3611   * 'output()' is not supported.  Output from the ECHO macro is done to
     the file-pointer 'yyout' (default 'stdout').
3613
3614   * 'output()' is not part of the POSIX specification.
3615
3616   * 'lex' does not support exclusive start conditions (%x), though they
3617     are in the POSIX specification.
3618
3619   * When definitions are expanded, 'flex' encloses them in parentheses.
3620     With 'lex', the following:
3621
3622              NAME    [A-Z][A-Z0-9]*
3623              %%
3624              foo{NAME}?      printf( "Found it\n" );
3625              %%
3626
3627     will not match the string 'foo' because when the macro is expanded
3628     the rule is equivalent to 'foo[A-Z][A-Z0-9]*?' and the precedence
3629     is such that the '?' is associated with '[A-Z0-9]*'.  With 'flex',
3630     the rule will be expanded to 'foo([A-Z][A-Z0-9]*)?' and so the
3631     string 'foo' will match.
3632
3633   * Note that if the definition begins with '^' or ends with '$' then
3634     it is _not_ expanded with parentheses, to allow these operators to
3635     appear in definitions without losing their special meanings.  But
3636     the '<s>', '/', and '<<EOF>>' operators cannot be used in a 'flex'
3637     definition.
3638
3639   * Using '-l' results in the 'lex' behavior of no parentheses around
3640     the definition.
3641
3642   * The POSIX specification is that the definition be enclosed in
3643     parentheses.
3644
3645   * Some implementations of 'lex' allow a rule's action to begin on a
3646     separate line, if the rule's pattern has trailing whitespace:
3647
3648              %%
3649              foo|bar<space here>
3650                { foobar_action();}
3651
3652     'flex' does not support this feature.
3653
3654   * The 'lex' '%r' (generate a Ratfor scanner) option is not supported.
3655     It is not part of the POSIX specification.
3656
3657   * After a call to 'unput()', _yytext_ is undefined until the next
3658     token is matched, unless the scanner was built using '%array'.
3659     This is not the case with 'lex' or the POSIX specification.  The
3660     '-l' option does away with this incompatibility.
3661
3662   * The precedence of the '{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of 'lex' interpret 'abc{1,3}' as
     "match one, two, or three occurrences of 'abc'", whereas 'flex'
3665     interprets it as "match 'ab' followed by one, two, or three
3666     occurrences of 'c'".  The '-l' and '--posix' options do away with
3667     this incompatibility.
3668
3669   * The precedence of the '^' operator is different.  'lex' interprets
3670     '^foo|bar' as "match either 'foo' at the beginning of a line, or
3671     'bar' anywhere", whereas 'flex' interprets it as "match either
3672     'foo' or 'bar' if they come at the beginning of a line".  The
3673     latter is in agreement with the POSIX specification.
3674
3675   * The special table-size declarations such as '%a' supported by 'lex'
     are not required by 'flex' scanners.  'flex' ignores them.

   * The name 'FLEX_SCANNER' is '#define''d so scanners may be written
3678     for use with either 'flex' or 'lex'.  Scanners also include
3679     'YY_FLEX_MAJOR_VERSION', 'YY_FLEX_MINOR_VERSION' and
3680     'YY_FLEX_SUBMINOR_VERSION' indicating which version of 'flex'
3681     generated the scanner.  For example, for the 2.5.22 release, these
3682     defines would be 2, 5 and 22 respectively.  If the version of
3683     'flex' being used is a beta version, then the symbol 'FLEX_BETA' is
3684     defined.
3685
3686   * The symbols '[[' and ']]' in the code sections of the input may
3687     conflict with the m4 delimiters.  *Note M4 Dependency::.
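
   Two of the precedence differences listed above (parentheses around
expanded definitions, and the '{,}' operator) can be checked with POSIX
extended regular expressions.  This is only a sketch: 'demo_matches'
and the pattern names are hypothetical, the grouping that each tool
applies implicitly is spelled out by hand, and '<regex.h>' is assumed
to be available.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the POSIX extended regular expression 'pattern'
 * matches somewhere in 'text', 0 if it does not, and -1 if the
 * pattern fails to compile.  The patterns below are anchored with
 * ^ and $, so for them the whole string has to match. */
static int demo_matches(const char *pattern, const char *text)
{
    regex_t re;
    int     rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* NAME is [A-Z][A-Z0-9]* and the rule is foo{NAME}?
 * Textual expansion: the '?' binds to [A-Z0-9]* only. */
static const char demo_lex_style[]  = "^foo[A-Z]([A-Z0-9]*)?$";
/* Parenthesized expansion: the '?' covers the whole definition. */
static const char demo_flex_style[] = "^foo([A-Z][A-Z0-9]*)?$";

/* The rule abc{1,3} under the two readings of the range operator. */
static const char demo_range_group[] = "^(abc){1,3}$";  /* repeat abc */
static const char demo_range_char[]  = "^abc{1,3}$";    /* repeat c   */
```

   Only the parenthesized expansion accepts the bare string 'foo', and
only the grouped range accepts 'abcabc', while only the character-level
range accepts 'abcc'.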
3688
3689   The following 'flex' features are not included in 'lex' or the POSIX
3690specification:
3691
3692   * C++ scanners
3693   * %option
3694   * start condition scopes
3695   * start condition stacks
3696   * interactive/non-interactive scanners
3697   * yy_scan_string() and friends
3698   * yyterminate()
3699   * yy_set_interactive()
3700   * yy_set_bol()
   * YY_AT_BOL()
   * <<EOF>>
3702   * <*>
3703   * YY_DECL
3704   * YY_START
3705   * YY_USER_ACTION
3706   * YY_USER_INIT
3707   * #line directives
3708   * %{}'s around actions
3709   * reentrant C API
3710   * multiple actions on a line
3711   * almost all of the 'flex' command-line options
3712
3713   The feature "multiple actions on a line" refers to the fact that with
3714'flex' you can put multiple actions on the same line, separated with
3715semi-colons, while with 'lex', the following:
3716
3717         foo    handle_foo(); ++num_foos_seen;
3718
3719   is (rather surprisingly) truncated to
3720
3721         foo    handle_foo();
3722
3723   'flex' does not truncate the action.  Actions that are not enclosed
3724in braces are simply terminated at the end of the line.
3725
3726
3727File: flex.info,  Node: Memory Management,  Next: Serialized Tables,  Prev: Lex and Posix,  Up: Top
3728
372921 Memory Management
3730********************
3731
3732This chapter describes how flex handles dynamic memory, and how you can
3733override the default behavior.
3734
3735* Menu:
3736
3737* The Default Memory Management::
3738* Overriding The Default Memory Management::
3739* A Note About yytext And Memory::
3740
3741
3742File: flex.info,  Node: The Default Memory Management,  Next: Overriding The Default Memory Management,  Prev: Memory Management,  Up: Memory Management
3743
374421.1 The Default Memory Management
3745==================================
3746
3747Flex allocates dynamic memory during initialization, and once in a while
3748from within a call to yylex().  Initialization takes place during the
3749first call to yylex().  Thereafter, flex may reallocate more memory if
it needs to enlarge a buffer.  As of version 2.5.9, Flex will clean up
all memory when you call 'yylex_destroy'.  *Note faq-memory-leak::.
3752
   Flex allocates dynamic memory for five purposes, listed below (1)
3754
375516kB for the input buffer.
3756     Flex allocates memory for the character buffer used to perform
3757     pattern matching.  Flex must read ahead from the input stream and
3758     store it in a large character buffer.  This buffer is typically the
3759     largest chunk of dynamic memory flex consumes.  This buffer will
3760     grow if necessary, doubling the size each time.  Flex frees this
3761     memory when you call yylex_destroy().  The default size of this
3762     buffer (16384 bytes) is almost always too large.  The ideal size
3763     for this buffer is the length of the longest token expected, in
3764     bytes, plus a little more.  Flex will allocate a few extra bytes
3765     for housekeeping.  Currently, to override the size of the input
3766     buffer you must '#define YY_BUF_SIZE' to whatever number of bytes
3767     you want.  We don't plan to change this in the near future, but we
3768     reserve the right to do so if we ever add a more robust memory
3769     management API.
3770
64kB for the REJECT state.
     This will only be allocated if you use REJECT.  The size is large
     enough to hold the same number of states as characters in the
     input buffer.  If you override the size of the input buffer (via
     'YY_BUF_SIZE'), then you automatically override the size of this
     buffer as well.
3776
3777100 bytes for the start condition stack.
3778     Flex allocates memory for the start condition stack.  This is the
3779     stack used for pushing start states, i.e., with yy_push_state().
3780     It will grow if necessary.  Since the states are simply integers,
3781     this stack doesn't consume much memory.  This stack is not present
3782     if '%option stack' is not specified.  You will rarely need to tune
3783     this buffer.  The ideal size for this stack is the maximum depth
3784     expected.  The memory for this stack is automatically destroyed
3785     when you call yylex_destroy().  *Note option-stack::.
3786
378740 bytes for each YY_BUFFER_STATE.
     Flex allocates memory for each YY_BUFFER_STATE.  The buffer state
     itself is about 40 bytes, plus an additional large character buffer
     (described above).  The initial buffer state is created during
     initialization, and a further one by each call to
     yy_create_buffer().  You can't tune the size of the buffer state,
     but you can tune the character buffer
3793     as described above.  Any buffer state that you explicitly create by
3794     calling yy_create_buffer() is _NOT_ destroyed automatically.  You
3795     must call yy_delete_buffer() to free the memory.  The exception to
3796     this rule is that flex will delete the current buffer automatically
3797     when you call yylex_destroy().  If you delete the current buffer,
3798     be sure to set it to NULL. That way, flex will not try to delete
3799     the buffer a second time (possibly crashing your program!)  At the
3800     time of this writing, flex does not provide a growable stack for
3801     the buffer states.  You have to manage that yourself.  *Note
3802     Multiple Input Buffers::.
3803
380484 bytes for the reentrant scanner guts
3805     Flex allocates about 84 bytes for the reentrant scanner structure
3806     when you call yylex_init().  It is destroyed when the user calls
3807     yylex_destroy().
3808
3809   ---------- Footnotes ----------
3810
3811   (1) The quantities given here are approximate, and may vary due to
3812host architecture, compiler configuration, or due to future enhancements
3813to flex.
3814
3815
File: flex.info,  Node: Overriding The Default Memory Management,  Next: A Note About yytext And Memory,  Prev: The Default Memory Management,  Up: Memory Management

21.2 Overriding The Default Memory Management
=============================================

Flex calls the functions 'yyalloc', 'yyrealloc', and 'yyfree' when it
needs to allocate or free memory.  By default, these functions are
wrappers around the standard C functions, 'malloc', 'realloc', and
'free', respectively.  You can override the default implementations by
telling flex that you will provide your own implementations.

   To override the default implementations, you must do two things:

  1. Suppress the default implementations by specifying one or more of
     the following options:

        * '%option noyyalloc'
        * '%option noyyrealloc'
        * '%option noyyfree'.

  2. Provide your own implementation of the following functions: (1)

          // For a non-reentrant scanner
          void * yyalloc (size_t bytes);
          void * yyrealloc (void * ptr, size_t bytes);
          void   yyfree (void * ptr);

          // For a reentrant scanner
          void * yyalloc (size_t bytes, void * yyscanner);
          void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
          void   yyfree (void * ptr, void * yyscanner);

   In the following example, we will override all three memory routines.
We assume that there is a custom allocator with garbage collection.  In
order to make this example interesting, we will use a reentrant scanner,
passing a pointer to the custom allocator through 'yyextra'.

     %{
     #include "some_allocator.h"
     %}

     /* Suppress the default implementations. */
     %option noyyalloc noyyrealloc noyyfree
     %option reentrant

     /* Initialize the allocator. */
     %{
     #define YY_EXTRA_TYPE  struct allocator*
     #define YY_USER_INIT  yyextra = allocator_create();
     %}

     %%
     .|\n   ;
     %%

     /* Provide our own implementations.  The 'yyextra' shortcut is only
        usable inside the scanner itself, so these functions fetch the
        allocator with yyget_extra(). */
     void * yyalloc (size_t bytes, void* yyscanner) {
         return allocator_alloc (yyget_extra (yyscanner), bytes);
     }

     void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
         /* The old block is handed over so its contents carry across. */
         return allocator_realloc (yyget_extra (yyscanner), ptr, bytes);
     }

     void yyfree (void * ptr, void * yyscanner) {
         /* Do nothing -- we leave it to the garbage collector. */
     }


   ---------- Footnotes ----------

   (1) It is not necessary to override all (or any) of the memory
management routines.  You may, for example, override 'yyrealloc', but
not 'yyfree' or 'yyalloc'.


File: flex.info,  Node: A Note About yytext And Memory,  Prev: Overriding The Default Memory Management,  Up: Memory Management

21.3 A Note About yytext And Memory
===================================

When flex finds a match, 'yytext' points to the first character of the
match in the input buffer.  The string itself is part of the input
buffer, and is _NOT_ allocated separately.  The value of yytext will be
overwritten the next time yylex() is called.  In short, the value of
yytext is only valid from within the matched rule's action.

   Often, you want the value of yytext to persist for later processing,
e.g., by a parser with non-zero lookahead.  In order to preserve yytext,
you will have to copy it with strdup() or a similar function.  But this
introduces some headache because your parser is now responsible for
freeing the copy of yytext.  If you use a yacc or bison parser (commonly
used with flex), you will discover that the error recovery mechanisms
can cause memory to be leaked.

   To prevent memory leaks from strdup'd yytext, you will have to track
the memory somehow.  Our experience has shown that a garbage collection
mechanism or a pooled memory mechanism will save you a lot of grief when
writing parsers.
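
   One way to track copied tokens is a small pool that owns every copy
and releases them all together.  The following is a minimal sketch, not
part of flex: 'token_pool', 'pool_strndup' and 'pool_free_all' are
hypothetical names chosen for illustration.  An action would call
something like 'pool_strndup (&pool, yytext, yyleng)' instead of
strdup().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool: a singly linked list of copied token strings. */
struct pool_node {
    struct pool_node *next;
    char text[];                /* flexible array member holding the copy */
};

struct token_pool {
    struct pool_node *head;
    size_t count;               /* number of live copies */
};

/* Copy LEN bytes of TEXT into the pool and return a NUL-terminated copy.
 * Returns NULL if memory is exhausted. */
static char *pool_strndup(struct token_pool *pool, const char *text, size_t len)
{
    struct pool_node *node = malloc(sizeof *node + len + 1);
    if (node == NULL)
        return NULL;
    memcpy(node->text, text, len);
    node->text[len] = '\0';
    node->next = pool->head;
    pool->head = node;
    pool->count++;
    return node->text;
}

/* Release every copy at once, e.g. after the parse finishes or fails. */
static void pool_free_all(struct token_pool *pool)
{
    struct pool_node *node = pool->head;
    while (node != NULL) {
        struct pool_node *next = node->next;
        free(node);
        node = next;
    }
    pool->head = NULL;
    pool->count = 0;
}
```

   Because the pool owns the copies, parser error recovery may discard
semantic values freely; a single call to 'pool_free_all' after parsing
reclaims everything.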


File: flex.info,  Node: Serialized Tables,  Next: Diagnostics,  Prev: Memory Management,  Up: Top

22 Serialized Tables
********************

A 'flex' scanner has the ability to save the DFA tables to a file, and
load them at runtime when needed.  The motivation for this feature is to
reduce the runtime memory footprint.  Traditionally, these tables have
been compiled into the scanner as C arrays, and are sometimes quite
large.  Since the tables are compiled into the scanner, the memory used
by the tables can never be freed.  This is a waste of memory, especially
if an application uses several scanners, but none of them at the same
time.

   The serialization feature allows the tables to be loaded at runtime,
before scanning begins.  The tables may be discarded when scanning is
finished.

* Menu:

* Creating Serialized Tables::
* Loading and Unloading Serialized Tables::
* Tables File Format::


File: flex.info,  Node: Creating Serialized Tables,  Next: Loading and Unloading Serialized Tables,  Prev: Serialized Tables,  Up: Serialized Tables

22.1 Creating Serialized Tables
===============================

You may create a scanner with serialized tables by specifying:

         %option tables-file=FILE
     or
         --tables-file=FILE

   These options instruct flex to save the DFA tables to the file FILE.
The tables will _not_ be embedded in the generated scanner.  The scanner
will not function on its own.  The scanner will be dependent upon the
serialized tables.  You must load the tables from this file at runtime
before you can scan anything.

   If you do not specify a filename to '--tables-file', the tables will
be saved to 'lex.yy.tables', where 'yy' is the appropriate prefix.

   If your project uses several different scanners, you can concatenate
the serialized tables into one file, and flex will find the correct set
of tables, using the scanner prefix as part of the lookup key.  An
example follows:

     $ flex --tables-file --prefix=cpp cpp.l
     $ flex --tables-file --prefix=c   c.l
     $ cat lex.cpp.tables lex.c.tables  >  all.tables

   The above example created two scanners, 'cpp', and 'c'.  Since we did
not specify a filename, the tables were serialized to 'lex.cpp.tables'
and 'lex.c.tables', respectively.  Then, we concatenated the two files
together into 'all.tables', which we will distribute with our project.
At runtime, we will open the file and tell flex to load the tables from
it.  Flex will find the correct tables automatically.  (See next
section).


File: flex.info,  Node: Loading and Unloading Serialized Tables,  Next: Tables File Format,  Prev: Creating Serialized Tables,  Up: Serialized Tables

22.2 Loading and Unloading Serialized Tables
============================================

If you've built your scanner with '%option tables-file', then you must
load the scanner tables at runtime.  This can be accomplished with the
following function:

 -- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
     Locates scanner tables in the stream pointed to by FP and loads
     them.  Memory for the tables is allocated via 'yyalloc'.  You must
     call this function before the first call to 'yylex'.  The argument
     SCANNER only appears in the reentrant scanner.  This function
     returns '0' (zero) on success, or non-zero on error.

   The loaded tables are *not* automatically destroyed (unloaded) when
you call 'yylex_destroy'.  The reason is that you may create several
scanners of the same type (in a reentrant scanner), each of which needs
access to these tables.  To avoid a nasty memory leak, you must call the
following function:

 -- Function: int yytables_destroy ([yyscan_t SCANNER])
     Unloads the scanner tables.  The tables must be loaded again before
     you can scan any more data.  The argument SCANNER only appears in
     the reentrant scanner.  This function returns '0' (zero) on
     success, or non-zero on error.

   *The functions 'yytables_fload' and 'yytables_destroy' are not
thread-safe.*  You must ensure that these functions are called exactly
once (for each scanner type) in a threaded program, before any thread
calls 'yylex'.  After the tables are loaded, they are never written to,
and no thread protection is required thereafter - until you destroy
them.
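
   The call order described above can be sketched as follows.  This is
only an illustration under several assumptions: a non-reentrant scanner
with the default 'yy' prefix, and a hypothetical helper named
'scan_with_tables'.  So that the sketch compiles by itself, trivial
stand-ins replace the scanner functions unless 'HAVE_FLEX_SCANNER' (a
made-up macro) is defined; a real program links the flex-generated
scanner instead.

```c
#include <stdio.h>

#ifndef HAVE_FLEX_SCANNER
/* Stand-ins so this sketch compiles alone.  In a real program these
 * functions come from the flex-generated scanner. */
static int tables_loaded;
static int yytables_fload(FILE *fp) { (void)fp; tables_loaded = 1; return 0; }
static int yytables_destroy(void)   { tables_loaded = 0; return 0; }
static int yylex(void)              { return 0; }
static int yylex_destroy(void)      { return 0; }
#else
int yytables_fload(FILE *fp);
int yytables_destroy(void);
int yylex(void);
int yylex_destroy(void);
#endif

/* Load tables from PATH, scan, then release scanner state and tables.
 * Returns 0 on success, -1 on any failure. */
static int scan_with_tables(const char *path)
{
    FILE *fp = fopen(path, "rb");
    int rc;

    if (fp == NULL)
        return -1;
    rc = yytables_fload(fp);        /* must precede the first yylex() */
    fclose(fp);
    if (rc != 0)
        return -1;

    while (yylex() != 0)
        ;                           /* consume tokens until end of input */

    yylex_destroy();                /* does not unload the tables */
    return yytables_destroy() == 0 ? 0 : -1;
}
```

   In a threaded program, the load and the final 'yytables_destroy' call
would each be performed by a single thread, outside the period in which
other threads are scanning.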


File: flex.info,  Node: Tables File Format,  Prev: Loading and Unloading Serialized Tables,  Up: Serialized Tables

22.3 Tables File Format
=======================

This section defines the file format of serialized 'flex' tables.

   The tables format allows for one or more sets of tables to be
specified, where each set corresponds to a given scanner.  Scanners are
indexed by name, as described below.  The file format is as follows:

                      TABLE SET 1
                     +-------------------------------+
             Header  | uint32          th_magic;     |
                     | uint32          th_hsize;     |
                     | uint32          th_ssize;     |
                     | uint16          th_flags;     |
                     | char            th_version[]; |
                     | char            th_name[];    |
                     | uint8           th_pad64[];   |
                     +-------------------------------+
             Table 1 | uint16          td_id;        |
                     | uint16          td_flags;     |
                     | uint32          td_hilen;     |
                     | uint32          td_lolen;     |
                     | void            td_data[];    |
                     | uint8           td_pad64[];   |
                     +-------------------------------+
             Table 2 |                               |
                .    .                               .
                .    .                               .
                .    .                               .
                .    .                               .
             Table n |                               |
                     +-------------------------------+
                      TABLE SET 2
                           .
                           .
                           .
                      TABLE SET N

   The above diagram shows that a complete set of tables consists of a
header followed by multiple individual tables.  Furthermore, multiple
complete sets may be present in the same file, each set with its own
header and tables.  The sets are contiguous in the file.  The only way
to know if another set follows is to check the next four bytes for the
magic number (or check for EOF). The header and tables sections are
padded to 64-bit boundaries.  Below we describe each field in detail.
This format does not specify how the scanner will expand the given data,
e.g., data may be serialized as int8, but expanded to an int32 array at
runtime.  This is to reduce the size of the serialized data where
possible.  Remember, _all integer values are in network byte order_.

Fields of a table header:

'th_magic'
     Magic number, always 0xF13C57B1.

'th_hsize'
     Size of this entire header, in bytes, including all fields plus any
     padding.

'th_ssize'
     Size of this entire set, in bytes, including the header, all
     tables, plus any padding.

'th_flags'
     Bit flags for this table set.  Currently unused.

'th_version[]'
     Flex version as a NUL-terminated string, e.g., '2.5.13a'.  This is
     the version of flex that was used to create the serialized tables.

'th_name[]'
     Contains the name of this table set.  The default is 'yytables',
     and is prefixed accordingly, e.g., 'footables'.  Must be
     NUL-terminated.

'th_pad64[]'
     Zero or more NUL bytes, padding the entire header to the next
     64-bit boundary as calculated from the beginning of the header.
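
   The header layout above can be decoded from a byte buffer roughly as
follows.  This is a sketch written from the field descriptions only, not
flex's own loader; 'struct tbl_header', 'parse_tbl_header' and the
helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TBL_MAGIC 0xF13C57B1u   /* th_magic value given above */

/* Decoded view of a table-set header; names mirror the fields above. */
struct tbl_header {
    uint32_t th_magic;
    uint32_t th_hsize;
    uint32_t th_ssize;
    uint16_t th_flags;
    const char *th_version;     /* points into the caller's buffer */
    const char *th_name;        /* points into the caller's buffer */
};

/* Read big-endian (network order) integers from a byte buffer. */
static uint16_t get_u16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse one header from BUF (LEN bytes available).
 * Returns 0 on success, -1 if the data is truncated or malformed. */
static int parse_tbl_header(const unsigned char *buf, size_t len,
                            struct tbl_header *out)
{
    const size_t fixed = 4 + 4 + 4 + 2;   /* magic, hsize, ssize, flags */
    const unsigned char *end;
    const char *version, *name;

    if (len < fixed)
        return -1;
    out->th_magic = get_u32(buf);
    out->th_hsize = get_u32(buf + 4);
    out->th_ssize = get_u32(buf + 8);
    out->th_flags = get_u16(buf + 12);
    if (out->th_magic != TBL_MAGIC)
        return -1;
    /* The header must be padded to 64 bits and fit in the buffer. */
    if (out->th_hsize < fixed || out->th_hsize % 8 != 0 || out->th_hsize > len)
        return -1;

    /* Both strings must be NUL-terminated inside the header. */
    version = (const char *)buf + fixed;
    end = memchr(version, '\0', out->th_hsize - fixed);
    if (end == NULL)
        return -1;
    name = (const char *)end + 1;
    if (memchr(name, '\0',
               (size_t)(buf + out->th_hsize - (const unsigned char *)name)) == NULL)
        return -1;

    out->th_version = version;
    out->th_name = name;
    return 0;
}
```

   When several sets are concatenated, a reader can compare 'th_name'
with the name it wants and, on a mismatch, advance 'th_ssize' bytes from
the start of the header to reach the next set.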

Fields of a table:

'td_id'
     Specifies the table identifier.  Possible values are:
     'YYTD_ID_ACCEPT (0x01)'
          'yy_accept'
     'YYTD_ID_BASE (0x02)'
          'yy_base'
     'YYTD_ID_CHK (0x03)'
          'yy_chk'
     'YYTD_ID_DEF (0x04)'
          'yy_def'
     'YYTD_ID_EC (0x05)'
          'yy_ec'
     'YYTD_ID_META (0x06)'
          'yy_meta'
     'YYTD_ID_NUL_TRANS (0x07)'
          'yy_NUL_trans'
     'YYTD_ID_NXT (0x08)'
          'yy_nxt'.  This array may be two dimensional.  See the
          'td_hilen' field below.
     'YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
          'yy_rule_can_match_eol'
     'YYTD_ID_START_STATE_LIST (0x0A)'
          'yy_start_state_list'.  This array is handled specially
          because it is an array of pointers to structs.  See the
          'td_flags' field below.
     'YYTD_ID_TRANSITION (0x0B)'
          'yy_transition'.  This array is handled specially because it
          is an array of structs.  See the 'td_lolen' field below.
     'YYTD_ID_ACCLIST (0x0C)'
          'yy_acclist'

'td_flags'
     Bit flags describing how to interpret the data in 'td_data'.  The
     data arrays are one-dimensional by default, but may be two
     dimensional as specified in the 'td_hilen' field.

     'YYTD_DATA8 (0x01)'
          The data is serialized as an array of type int8.
     'YYTD_DATA16 (0x02)'
          The data is serialized as an array of type int16.
     'YYTD_DATA32 (0x04)'
          The data is serialized as an array of type int32.
     'YYTD_PTRANS (0x08)'
          The data is a list of indexes of entries in the expanded
          'yy_transition' array.  Each index should be expanded to a
          pointer to the corresponding entry in the 'yy_transition'
          array.  We count on the fact that the 'yy_transition' array
          has already been seen.
     'YYTD_STRUCT (0x10)'
          The data is a list of yy_trans_info structs, each of which
          consists of two integers.  There is no padding between struct
          elements or between structs.  The type of each member is
          determined by the 'YYTD_DATA*' bits.

'td_hilen'
     If 'td_hilen' is non-zero, then the data is a two-dimensional
     array.  Otherwise, the data is a one-dimensional array.  'td_hilen'
     contains the number of elements in the higher dimensional array,
     and 'td_lolen' contains the number of elements in the lowest
     dimension.

     Conceptually, 'td_data' is either 'sometype td_data[td_lolen]', or
     'sometype td_data[td_hilen][td_lolen]', where 'sometype' is
     specified by the 'td_flags' field.  It is possible for both
     'td_lolen' and 'td_hilen' to be zero, in which case 'td_data' is a
     zero length array, and no data is loaded, i.e., this table is
     simply skipped.  Flex does not currently generate tables of zero
     length.

'td_lolen'
     Specifies the number of elements in the lowest dimension array.  If
     this is a one-dimensional array, then it is simply the number of
     elements in this array.  The element size is determined by the
     'td_flags' field.

'td_data[]'
     The table data.  This array may be a one- or two-dimensional array,
     of type 'int8', 'int16', 'int32', 'struct yy_trans_info', or
     'struct yy_trans_info*', depending upon the values in the
     'td_flags', 'td_hilen', and 'td_lolen' fields.

'td_pad64[]'
     Zero or more NUL bytes, padding the entire table to the next
     64-bit boundary as calculated from the beginning of this table.
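
   The size of one serialized table follows from these fields.  The
sketch below is one reading of the descriptions above, not flex's own
code: the function names are hypothetical, and it assumes that a table
flagged 'YYTD_STRUCT' stores two integers for each of its
'td_hilen' x 'td_lolen' (or 'td_lolen') elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as listed above. */
#define YYTD_DATA8   0x01
#define YYTD_DATA16  0x02
#define YYTD_DATA32  0x04
#define YYTD_STRUCT  0x10

/* Size in bytes of one serialized integer, or 0 if no size bit is set. */
static size_t td_elem_size(uint16_t td_flags)
{
    if (td_flags & YYTD_DATA8)
        return 1;
    if (td_flags & YYTD_DATA16)
        return 2;
    if (td_flags & YYTD_DATA32)
        return 4;
    return 0;
}

/* Number of bytes occupied by td_data[], before padding. */
static size_t td_data_bytes(uint16_t td_flags, uint32_t td_hilen,
                            uint32_t td_lolen)
{
    size_t n = td_lolen;
    if (td_hilen != 0)
        n *= td_hilen;              /* two-dimensional array */
    if (td_flags & YYTD_STRUCT)
        n *= 2;                     /* assumed: two integers per struct */
    return n * td_elem_size(td_flags);
}

/* Bytes of td_pad64[] needed to bring TOTAL bytes to a 64-bit boundary. */
static size_t td_pad64_bytes(size_t total)
{
    return (8 - total % 8) % 8;
}

/* Full serialized size of one table: fixed fields, data, and padding. */
static size_t td_total_bytes(uint16_t td_flags, uint32_t td_hilen,
                             uint32_t td_lolen)
{
    const size_t fixed = 2 + 2 + 4 + 4;   /* td_id, td_flags, hilen, lolen */
    size_t total = fixed + td_data_bytes(td_flags, td_hilen, td_lolen);
    return total + td_pad64_bytes(total);
}
```

   A reader that does not need a particular table could use
'td_total_bytes' to skip directly to the next one.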


File: flex.info,  Node: Diagnostics,  Next: Limitations,  Prev: Serialized Tables,  Up: Top

23 Diagnostics
**************

The following is a list of 'flex' diagnostic messages:

   * 'warning, rule cannot be matched' indicates that the given rule
     cannot be matched because it follows other rules that will always
     match the same text as it.  For example, in the following, 'foo'
     cannot be matched because it comes after an identifier "catch-all"
     rule:

              [a-z]+    got_identifier();
              foo       got_foo();

     Using 'REJECT' in a scanner suppresses this warning.

   * 'warning, -s option given but default rule can be matched' means
     that it is possible (perhaps only in a particular start condition)
     that the default rule (match any single character) is the only one
     that will match a particular input.  Since '-s' was given,
     presumably this is not intended.

   * 'reject_used_but_not_detected undefined' or
     'yymore_used_but_not_detected undefined'.  These errors can occur
     at compile time.  They indicate that the scanner uses 'REJECT' or
     'yymore()' but that 'flex' failed to notice the fact, meaning that
     'flex' scanned the first two sections looking for occurrences of
     these actions and failed to find any, but somehow you snuck some in
     (via a #include file, for example).  Use '%option reject' or
     '%option yymore' to indicate to 'flex' that you really do use these
     features.

   * 'flex scanner jammed'.  A scanner compiled with '-s' has
     encountered an input string which wasn't matched by any of its
     rules.  This error can also occur due to internal problems.

   * 'token too large, exceeds YYLMAX'.  Your scanner uses '%array' and
     one of its rules matched a string longer than the 'YYLMAX' constant
     (8K bytes by default).  You can increase the value by #define'ing
     'YYLMAX' in the definitions section of your 'flex' input.

   * 'scanner requires -8 flag to use the character 'x''.  Your scanner
     specification includes recognizing the 8-bit character ''x'' and
     you did not specify the -8 flag, and your scanner defaulted to
     7-bit because you used the '-Cf' or '-CF' table compression
     options.  See the discussion of the '-7' flag, *note Scanner
     Options::, for details.

   * 'flex scanner push-back overflow'.  You used 'unput()' to push back
     so much text that the scanner's buffer could not hold both the
     pushed-back text and the current token in 'yytext'.  Ideally the
     scanner should dynamically resize the buffer in this case, but at
     present it does not.

   * 'input buffer overflow, can't enlarge buffer because scanner uses
     REJECT'.  The scanner was working on matching an extremely large
     token and needed to expand the input buffer.  This doesn't work
     with scanners that use 'REJECT'.

   * 'fatal flex scanner internal error--end of buffer missed'.  This
     can occur in a scanner which is reentered after a long-jump has
     jumped out (or over) the scanner's activation frame.  Before
     reentering the scanner, use:
              yyrestart( yyin );
     or, as noted above, switch to using the C++ scanner class.

   * 'too many start conditions in <> construct!'.  You listed more
     start conditions in a <> construct than exist (so you must have
     listed at least one of them twice).


File: flex.info,  Node: Limitations,  Next: Bibliography,  Prev: Diagnostics,  Up: Top

24 Limitations
**************

   * Some trailing context patterns cannot be properly matched and
     generate warning messages ('dangerous trailing context').  These
     are patterns where the ending of the first part of the rule matches
     the beginning of the second part, such as 'zx*/xy*', where the 'x*'
     matches the 'x' at the beginning of the trailing context.  (Note
     that the POSIX draft states that the text matched by such patterns
     is undefined.)

   * For some trailing context rules, parts which are actually
     fixed-length are not recognized as such, leading to the
     abovementioned performance loss.  In particular, parts using '|' or
     '{n}' (such as 'foo{3}') are always considered variable-length.

   * Combining trailing context with the special '|' action can result
     in _fixed_ trailing context being turned into the more expensive
     _variable_ trailing context.  For example, this happens in the
     following:

              %%
              abc      |
              xyz/def

   * Use of 'unput()' invalidates yytext and yyleng, unless the '%array'
     directive or the '-l' option has been used.

   * Pattern-matching of 'NUL's is substantially slower than matching
     other characters.

   * Dynamic resizing of the input buffer is slow, as it entails
     rescanning all the text matched so far by the current (generally
     huge) token.

   * Due to both buffering of input and read-ahead, you cannot intermix
     calls to '<stdio.h>' routines, such as getchar(), with 'flex' rules
     and expect it to work.  Call 'input()' instead.

   * The total table entries listed by the '-v' flag excludes the number
     of table entries needed to determine what rule has been matched.
     The number of entries is equal to the number of DFA states if the
     scanner does not use 'REJECT', and somewhat greater than the number
     of states if it does.

   * 'REJECT' cannot be used with the '-f' or '-F' options.

   * The 'flex' internal algorithms need documentation.


File: flex.info,  Node: Bibliography,  Next: FAQ,  Prev: Limitations,  Up: Top

25 Additional Reading
*********************

You may wish to read more about the following programs:
   * lex
   * yacc
   * sed
   * awk

   The following books may contain material of interest:

   John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
Associates.  Be sure to get the 2nd edition.

   M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_

   Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
Techniques and Tools_, Addison-Wesley (1986).  Describes the
pattern-matching techniques used by 'flex' (deterministic finite
automata).


File: flex.info,  Node: FAQ,  Next: Appendices,  Prev: Bibliography,  Up: Top

FAQ
***

From time to time, the 'flex' maintainer receives certain questions.
Rather than repeat answers to well-understood problems, we publish them
here.

* Menu:

* When was flex born?::
* How do I expand backslash-escape sequences in C-style quoted strings?::
* Why do flex scanners call fileno if it is not ANSI compatible?::
* Does flex support recursive pattern definitions?::
* How do I skip huge chunks of input (tens of megabytes) while using flex?::
* Flex is not matching my patterns in the same order that I defined them.::
* My actions are executing out of order or sometimes not at all.::
* How can I have multiple input sources feed into the same scanner at the same time?::
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
* If I have a simple grammar cant I just parse it with flex?::
* Why doesn't yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
* The period isn't working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
* How can I use more than 8192 rules?::
* How do I abandon a file in the middle of a scan and switch to a new file?::
* How do I execute code only during initialization (only before the first scan)?::
* How do I execute code at termination?::
* Where else can I find help?::
* Can I include comments in the "rules" section of the file?::
* I get an error about undefined yywrap().::
* How can I change the matching pattern at run time?::
* How can I expand macros in the input?::
* How can I build a two-pass scanner?::
* How do I match any string not matched in the preceding rules?::
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
* Why doesn't flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* How do I use my own I/O classes in a C++ scanner?::
* How do I skip as many chars as possible?::
* deleteme00::
* Are certain equivalent patterns faster than others?::
* Is backing up a big deal?::
* Can I fake multi-byte character support?::
* deleteme01::
* Can you discuss some flex internals?::
* unput() messes up yy_at_bol::
* The | operator is not doing what I want::
* Why can't flex understand this variable trailing context pattern?::
* The ^ operator isn't working::
* Trailing context is getting confused with trailing optional patterns::
* Is flex GNU or not?::
* ERASEME53::
* I need to scan if-then-else blocks and while loops::
* ERASEME55::
* ERASEME56::
* ERASEME57::
* Is there a repository for flex scanners?::
* How can I conditionally compile or preprocess my flex input file?::
* Where can I find grammars for lex and yacc?::
* I get an end-of-buffer message for each character scanned.::
* unnamed-faq-62::
* unnamed-faq-63::
* unnamed-faq-64::
* unnamed-faq-65::
* unnamed-faq-66::
* unnamed-faq-67::
* unnamed-faq-68::
* unnamed-faq-69::
* unnamed-faq-70::
* unnamed-faq-71::
* unnamed-faq-72::
* unnamed-faq-73::
* unnamed-faq-74::
* unnamed-faq-75::
* unnamed-faq-76::
* unnamed-faq-77::
* unnamed-faq-78::
* unnamed-faq-79::
* unnamed-faq-80::
* unnamed-faq-81::
* unnamed-faq-82::
* unnamed-faq-83::
* unnamed-faq-84::
* unnamed-faq-85::
* unnamed-faq-86::
* unnamed-faq-87::
* unnamed-faq-88::
* unnamed-faq-90::
* unnamed-faq-91::
* unnamed-faq-92::
* unnamed-faq-93::
* unnamed-faq-94::
* unnamed-faq-95::
* unnamed-faq-96::
* unnamed-faq-97::
* unnamed-faq-98::
* unnamed-faq-99::
* unnamed-faq-100::
* unnamed-faq-101::
* What is the difference between YYLEX_PARAM and YY_DECL?::
* Why do I get "conflicting types for yylex" error?::
* How do I access the values set in a Flex action from within a Bison action?::


File: flex.info,  Node: When was flex born?,  Next: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ

When was flex born?
===================

Vern Paxson took over the 'Software Tools' lex project from Jef
Poskanzer in 1982.  At that point it was written in Ratfor.  Around 1987
or so, Paxson translated it into C, and a legend was born :-).


File: flex.info,  Node: How do I expand backslash-escape sequences in C-style quoted strings?,  Next: Why do flex scanners call fileno if it is not ANSI compatible?,  Prev: When was flex born?,  Up: FAQ

How do I expand backslash-escape sequences in C-style quoted strings?
=====================================================================

A key point when scanning quoted strings is that you cannot (easily)
write a single rule that will precisely match the string if you allow
things like embedded escape sequences and newlines.  If you try to match
strings with a single rule then you'll wind up having to rescan the
string anyway to find any escape sequences.

   Instead you can use exclusive start conditions and a set of rules,
one for matching non-escaped text, one for matching a single escape, one
for matching an embedded newline, and one for recognizing the end of the
string.  Each of these rules is then faced with the question of where to
put its intermediary results.  The best solution is for the rules to
append their local value of 'yytext' to the end of a "string literal"
buffer.  A rule like the escape-matcher will append to the buffer the
meaning of the escape sequence rather than the literal text in 'yytext'.
In this way, 'yytext' does not need to be modified at all.
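
   The "string literal" buffer and the two kinds of append can be
sketched in plain C as below.  Everything here is hypothetical
illustration rather than flex API: 'struct str_buf', 'str_append' and
'str_append_escape' are made-up names, the buffer has a fixed capacity
for brevity, and only a few single-character escapes are handled.  The
rule for non-escaped text would call 'str_append' with 'yytext' and
'yyleng'; the escape rule would call 'str_append_escape' with 'yytext'.

```c
#include <stddef.h>

#define STR_MAX 1024            /* illustrative fixed capacity */

/* Hypothetical "string literal" buffer that the rules append to. */
struct str_buf {
    char   data[STR_MAX];
    size_t len;
};

/* Append LEN bytes of TEXT; returns 0 on success, -1 if the buffer is full. */
static int str_append(struct str_buf *sb, const char *text, size_t len)
{
    size_t i;
    if (len >= STR_MAX - sb->len)       /* keep room for the final NUL */
        return -1;
    for (i = 0; i < len; i++)
        sb->data[sb->len++] = text[i];
    sb->data[sb->len] = '\0';
    return 0;
}

/* Append the meaning of a two-character escape such as "\n".
 * ESC points at the backslash, as yytext would in the escape rule. */
static int str_append_escape(struct str_buf *sb, const char *esc)
{
    char c;
    switch (esc[1]) {
    case 'n':  c = '\n'; break;
    case 't':  c = '\t'; break;
    case 'r':  c = '\r'; break;
    default:   c = esc[1]; break;       /* \" \\ and unrecognized escapes */
    }
    return str_append(sb, &c, 1);
}
```

   The rule that opens a string would reset 'len' to zero, and the rule
that recognizes the closing quote would hand 'data' to the parser.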


File: flex.info,  Node: Why do flex scanners call fileno if it is not ANSI compatible?,  Next: Does flex support recursive pattern definitions?,  Prev: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ

Why do flex scanners call fileno if it is not ANSI compatible?
==============================================================

Flex scanners call 'fileno()' in order to get the file descriptor
corresponding to 'yyin'.  The file descriptor may be passed to
'isatty()' or 'read()', depending upon which '%options' you specified.
If your system does not have 'fileno()' support, you can avoid both
calls.  To get rid of the 'read()' call, do not specify '%option read'.
To get rid of the 'isatty()' call, you must specify one of
'%option always-interactive' or '%option never-interactive'.
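
   For reference, the descriptor-based check described above looks
roughly like this on a POSIX system.  This is an illustrative sketch,
not the code flex generates; 'stream_is_interactive' is a hypothetical
name.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fileno() under strict ISO C modes */
#include <stdio.h>
#include <unistd.h>              /* isatty() */

/* Return 1 if STREAM is connected to a terminal, 0 otherwise. */
static int stream_is_interactive(FILE *stream)
{
    int fd = fileno(stream);     /* POSIX, not ISO C */
    return fd >= 0 && isatty(fd) > 0;
}
```

   Neither 'fileno()' nor 'isatty()' is part of ISO C, which is why the
options above exist for systems that lack them.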


File: flex.info,  Node: Does flex support recursive pattern definitions?,  Next: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Prev: Why do flex scanners call fileno if it is not ANSI compatible?,  Up: FAQ

Does flex support recursive pattern definitions?
================================================

e.g.,

     %%
     block   "{"({block}|{statement})*"}"

   No.  You cannot have recursive definitions.  The pattern-matching
power of regular expressions in general (and therefore flex scanners,
too) is limited.  In particular, regular expressions cannot "balance"
parentheses to an arbitrary degree.  For example, it's impossible to
write a regular expression that matches all strings containing the same
number of '{'s as '}'s.  For more powerful pattern matching, you need a
parser, such as 'GNU bison'.
4502
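   To see what is missing, consider a brace check written by hand in C.
This is a minimal sketch; the function name 'braces_balanced' is
hypothetical.  It relies on a depth counter that can grow without
bound, which is exactly the kind of memory a finite automaton does not
have.

```c
#include <stddef.h>

/* Returns 1 if every '{' in s is closed by a later '}' and no '}'
 * appears without an open '{'; returns 0 otherwise. */
static int braces_balanced(const char *s)
{
    size_t depth = 0;

    for (; *s != '\0'; s++) {
        if (*s == '{') {
            depth++;
        } else if (*s == '}') {
            if (depth == 0)
                return 0;      /* closing brace with nothing open */
            depth--;
        }
    }
    return depth == 0;         /* nothing may be left open */
}
```

   A parser keeps the equivalent of this counter on its stack, which is
why nested constructs belong in the grammar rather than in the scanner.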
4503
4504File: flex.info,  Node: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Next: Flex is not matching my patterns in the same order that I defined them.,  Prev: Does flex support recursive pattern definitions?,  Up: FAQ
4505
4506How do I skip huge chunks of input (tens of megabytes) while using flex?
4507========================================================================
4508
Use 'fseek()' (or 'lseek()') to position 'yyin', then call 'yyrestart()'.
4510
4511
4512File: flex.info,  Node: Flex is not matching my patterns in the same order that I defined them.,  Next: My actions are executing out of order or sometimes not at all.,  Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Up: FAQ
4513
4514Flex is not matching my patterns in the same order that I defined them.
4515=======================================================================
4516
'flex' picks the rule that matches the most text (i.e., the longest
possible input string).  This is because 'flex' does not try the rules
one at a time in the order you wrote them.  Instead it uses
"deterministic finite automata", which in effect do all of the matching
simultaneously, in parallel.  (This may seem impossible, but it's
actually a fairly simple technique once you understand the principles.)
4523
4524   A side-effect of this parallel matching is that when the input
4525matches more than one rule, 'flex' scanners pick the rule that matched
4526the _most_ text.  This is explained further in the manual, in the
4527section *Note Matching::.
4528
4529   If you want 'flex' to choose a shorter match, then you can work
4530around this behavior by expanding your short rule to match more text,
4531then put back the extra:
4532
4533     data_.*        yyless( 5 ); BEGIN BLOCKIDSTATE;
4534
4535   Another fix would be to make the second rule active only during the
4536'<BLOCKIDSTATE>' start condition, and make that start condition
4537exclusive by declaring it with '%x' instead of '%s'.
4538
4539   A final fix is to change the input language so that the ambiguity for
4540'data_' is removed, by adding characters to it that don't match the
4541identifier rule, or by removing characters (such as '_') from the
4542identifier rule so it no longer matches 'data_'.  (Of course, you might
4543also not have the option of changing the input language.)
4544
4545
4546File: flex.info,  Node: My actions are executing out of order or sometimes not at all.,  Next: How can I have multiple input sources feed into the same scanner at the same time?,  Prev: Flex is not matching my patterns in the same order that I defined them.,  Up: FAQ
4547
4548My actions are executing out of order or sometimes not at all.
4549==============================================================
4550
4551Most likely, you have (in error) placed the opening '{' of the action
4552block on a different line than the rule, e.g.,
4553
4554     ^(foo|bar)
4555     {  <<<--- WRONG!
4556
4557     }
4558
4559   'flex' requires that the opening '{' of an action associated with a
4560rule begin on the same line as does the rule.  You need instead to write
4561your rules as follows:
4562
4563     ^(foo|bar)   {  // CORRECT!
4564
4565     }
4566
4567
4568File: flex.info,  Node: How can I have multiple input sources feed into the same scanner at the same time?,  Next: Can I build nested parsers that work with the same input file?,  Prev: My actions are executing out of order or sometimes not at all.,  Up: FAQ
4569
4570How can I have multiple input sources feed into the same scanner at the same time?
4571==================================================================================
4572
4573If ...
4574   * your scanner is free of backtracking (verified using 'flex''s '-b'
4575     flag),
4576   * AND you run your scanner interactively ('-I' option; default unless
4577     using special table compression options),
4578   * AND you feed it one character at a time by redefining 'YY_INPUT' to
4579     do so,
4580
4581   then every time it matches a token, it will have exhausted its input
4582buffer (because the scanner is free of backtracking).  This means you
can safely use 'select()' at that point and only call 'yylex()' for
4584another token if 'select()' indicates there's data available.
4585
4586   That is, move the 'select()' out from the input function to a point
4587where it determines whether 'yylex()' gets called for the next token.
4588
4589   With this approach, you will still have problems if your input can
4590arrive piecemeal; 'select()' could inform you that the beginning of a
4591token is available, you call 'yylex()' to get it, but it winds up
4592blocking waiting for the later characters in the token.
4593
4594   Here's another way: Move your input multiplexing inside of
'YY_INPUT'.  That is, whenever 'YY_INPUT' is called, it calls 'select()' to
4596see where input is available.  If input is available for the scanner, it
4597reads and returns the next byte.  If input is available from another
4598source, it calls whatever function is responsible for reading from that
4599source.  (If no input is available, it blocks until some input is
4600available.)  I've used this technique in an interpreter I wrote that
4601both reads keyboard input using a 'flex' scanner and IPC traffic from
4602sockets, and it works fine.
4603
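   The multiplexing step can be sketched with the POSIX 'select()'
call.  This is an illustration only: the helper name 'wait_readable' is
hypothetical, it handles exactly two descriptors, and it does not retry
when 'select()' is interrupted by a signal.

```c
#include <stddef.h>
#include <sys/select.h>

/* Block until fd_a or fd_b is readable.  Returns fd_a or fd_b
 * (fd_a wins if both are ready), or -1 if select() fails. */
static int wait_readable(int fd_a, int fd_b)
{
    fd_set readfds;
    int maxfd = fd_a > fd_b ? fd_a : fd_b;

    FD_ZERO(&readfds);
    FD_SET(fd_a, &readfds);
    FD_SET(fd_b, &readfds);

    /* No timeout: wait until at least one descriptor has data. */
    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
        return -1;
    return FD_ISSET(fd_a, &readfds) ? fd_a : fd_b;
}
```

   A 'YY_INPUT' definition built on this would loop: if the descriptor
behind 'yyin' is the one reported ready, read one byte from it and
return; otherwise hand the other descriptor to its own handler and wait
again.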
4604
4605File: flex.info,  Node: Can I build nested parsers that work with the same input file?,  Next: How can I match text only at the end of a file?,  Prev: How can I have multiple input sources feed into the same scanner at the same time?,  Up: FAQ
4606
4607Can I build nested parsers that work with the same input file?
4608==============================================================
4609
4610This is not going to work without some additional effort.  The reason is
4611that 'flex' block-buffers the input it reads from 'yyin'.  This means
4612that the "outermost" 'yylex()', when called, will automatically slurp up
4613the first 8K of input available on yyin, and subsequent calls to other
4614'yylex()''s won't see that input.  You might be tempted to work around
4615this problem by redefining 'YY_INPUT' to only return a small amount of
4616text, but it turns out that that approach is quite difficult.  Instead,
4617the best solution is to combine all of your scanners into one large
4618scanner, using a different exclusive start condition for each.
4619
4620
4621File: flex.info,  Node: How can I match text only at the end of a file?,  Next: How can I make REJECT cascade across start condition boundaries?,  Prev: Can I build nested parsers that work with the same input file?,  Up: FAQ
4622
4623How can I match text only at the end of a file?
4624===============================================
4625
4626There is no way to write a rule which is "match this text, but only if
4627it comes at the end of the file".  You can fake it, though, if you
4628happen to have a character lying around that you don't allow in your
4629input.  Then you redefine 'YY_INPUT' to call your own routine which, if
4630it sees an 'EOF', returns the magic character first (and remembers to
4631return a real 'EOF' next time it's called).  Then you could write:
4632
4633     <COMMENT>(.|\n)*{EOF_CHAR}    /* saw comment at EOF */
4634
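   The input routine described above might look like the following
sketch.  The names 'read_with_eof_marker' and 'marker_sent' are
hypothetical, and the choice of '\001' as the magic character is an
assumption; use whatever character cannot occur in your input.

```c
#include <stdio.h>

#define EOF_CHAR '\001'   /* assumed never to occur in real input */

/* Read up to max_size bytes from fp into buf and return the count.
 * The first time nothing more can be read, deliver EOF_CHAR instead;
 * on the call after that, return 0 so the caller sees a real end of
 * file. */
static size_t read_with_eof_marker(FILE *fp, char *buf, size_t max_size,
                                   int *marker_sent)
{
    size_t n = fread(buf, 1, max_size, fp);

    if (n == 0 && !*marker_sent && max_size > 0) {
        buf[0] = EOF_CHAR;
        *marker_sent = 1;
        return 1;
    }
    return n;
}
```

   'YY_INPUT(buf, result, max_size)' would then be redefined to store
the return value of this function in 'result'.  A return value of 0
corresponds to 'YY_NULL', which is how 'YY_INPUT' reports end of file.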
4635
4636File: flex.info,  Node: How can I make REJECT cascade across start condition boundaries?,  Next: Why cant I use fast or full tables with interactive mode?,  Prev: How can I match text only at the end of a file?,  Up: FAQ
4637
4638How can I make REJECT cascade across start condition boundaries?
4639================================================================
4640
4641You can do this as follows.  Suppose you have a start condition 'A', and
4642after exhausting all of the possible matches in '<A>', you want to try
4643matches in '<INITIAL>'.  Then you could use the following:
4644
4645     %x A
4646     %%
4647     <A>rule_that_is_long    ...; REJECT;
4648     <A>rule                 ...; REJECT; /* shorter rule */
4649     <A>etc.
4650     ...
4651     <A>.|\n  {
4652     /* Shortest and last rule in <A>, so
4653     * cascaded REJECTs will eventually
4654     * wind up matching this rule.  We want
4655     * to now switch to the initial state
4656     * and try matching from there instead.
4657     */
4658     yyless(0);    /* put back matched text */
4659     BEGIN(INITIAL);
4660     }
4661
4662
4663File: flex.info,  Node: Why cant I use fast or full tables with interactive mode?,  Next: How much faster is -F or -f than -C?,  Prev: How can I make REJECT cascade across start condition boundaries?,  Up: FAQ
4664
4665Why can't I use fast or full tables with interactive mode?
4666==========================================================
4667
4668One of the assumptions flex makes is that interactive applications are
4669inherently slow (they're waiting on a human after all).  It has to do
4670with how the scanner detects that it must be finished scanning a token.
4671For interactive scanners, after scanning each character the current
4672state is looked up in a table (essentially) to see whether there's a
4673chance of another input character possibly extending the length of the
4674match.  If not, the scanner halts.  For non-interactive scanners, the
4675end-of-token test is much simpler, basically a compare with 0, so no
4676memory bus cycles.  Since the test occurs in the innermost scanning
4677loop, one would like to make it go as fast as possible.
4678
4679   Still, it seems reasonable to allow the user to choose to trade off a
4680bit of performance in this area to gain the corresponding flexibility.
4681There might be another reason, though, why fast scanners don't support
4682the interactive option.
4683
4684
4685File: flex.info,  Node: How much faster is -F or -f than -C?,  Next: If I have a simple grammar cant I just parse it with flex?,  Prev: Why cant I use fast or full tables with interactive mode?,  Up: FAQ
4686
4687How much faster is -F or -f than -C?
4688====================================
4689
4690Much faster (factor of 2-3).
4691
4692
4693File: flex.info,  Node: If I have a simple grammar cant I just parse it with flex?,  Next: Why doesn't yyrestart() set the start state back to INITIAL?,  Prev: How much faster is -F or -f than -C?,  Up: FAQ
4694
4695If I have a simple grammar can't I just parse it with flex?
4696===========================================================
4697
4698Is your grammar recursive?  That's almost always a sign that you're
4699better off using a parser/scanner rather than just trying to use a
4700scanner alone.
4701
4702
4703File: flex.info,  Node: Why doesn't yyrestart() set the start state back to INITIAL?,  Next: How can I match C-style comments?,  Prev: If I have a simple grammar cant I just parse it with flex?,  Up: FAQ
4704
4705Why doesn't yyrestart() set the start state back to INITIAL?
4706============================================================
4707
4708There are two reasons.  The first is that there might be programs that
4709rely on the start state not changing across file changes.  The second is
4710that beginning with 'flex' version 2.4, use of 'yyrestart()' is no
4711longer required, so fixing the problem there doesn't solve the more
4712general problem.
4713
4714
4715File: flex.info,  Node: How can I match C-style comments?,  Next: The period isn't working the way I expected.,  Prev: Why doesn't yyrestart() set the start state back to INITIAL?,  Up: FAQ
4716
4717How can I match C-style comments?
4718=================================
4719
4720You might be tempted to try something like this:
4721
4722     "/*".*"*/"       // WRONG!
4723
4724   or, worse, this:
4725
     "/*"(.|\n)*"*/"   // WRONG!
4727
4728   The above rules will eat too much input, and blow up on things like:
4729
4730     /* a comment */ do_my_thing( "oops */" );
4731
4732   Here is one way which allows you to track line information:
4733
     %x IN_COMMENT
     %%
     <INITIAL>{
     "/*"      BEGIN(IN_COMMENT);
     }
     <IN_COMMENT>{
     "*/"      BEGIN(INITIAL);
     [^*\n]+   // eat comment in chunks
     "*"       // eat the lone star
     \n        yylineno++;
     }
4743
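   For comparison, here is roughly the same logic as a hand-written C
loop.  It is a sketch, not generated scanner code; the function name
'skip_comment' is hypothetical.  It stops at the first closing
delimiter, which is the behavior the start-condition rules provide and
the single-pattern attempts do not.

```c
#include <stddef.h>

/* p points just past the opening delimiter of a comment.  Returns a
 * pointer just past the closing delimiter, or NULL if the comment is
 * never closed.  *lineno is incremented for each newline seen. */
static const char *skip_comment(const char *p, int *lineno)
{
    while (*p != '\0') {
        if (p[0] == '*' && p[1] == '/')
            return p + 2;      /* first closing delimiter ends it */
        if (*p == '\n')
            (*lineno)++;       /* keep line information accurate */
        p++;                   /* ordinary text or a lone star */
    }
    return NULL;
}
```

   The three branches correspond to the closing-delimiter rule, the
newline rule, and the two "eat" rules in the '<IN_COMMENT>' block.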
4744
4745File: flex.info,  Node: The period isn't working the way I expected.,  Next: Can I get the flex manual in another format?,  Prev: How can I match C-style comments?,  Up: FAQ
4746
4747The '.' isn't working the way I expected.
4748=========================================
4749
4750Here are some tips for using '.':
4751
4752   * A common mistake is to place the grouping parenthesis AFTER an
4753     operator, when you really meant to place the parenthesis BEFORE the
4754     operator, e.g., you probably want this '(foo|bar)+' and NOT this
4755     '(foo|bar+)'.
4756
     The first pattern matches the words 'foo' or 'bar' one or more
     times, e.g., it matches the text 'barfoofoobarfoo'.  The second
     pattern matches a single instance of 'foo', or 'ba' followed by one
     or more 'r's, e.g., it matches the text 'barrrr'.
   * A '.' inside '[]' just means a literal '.' (period), and NOT "any
     character except newline".
   * Remember that '.' matches any character EXCEPT '\n' (and 'EOF').
     If you really want to match ANY character, including newlines, then
     use '(.|\n)'.  Beware that the regex '(.|\n)+' will match your
     entire input!
   * Finally, if you want to match a literal '.' (a period), then use
     '[.]' or '"."'.
4770
4771
4772File: flex.info,  Node: Can I get the flex manual in another format?,  Next: Does there exist a "faster" NDFA->DFA algorithm?,  Prev: The period isn't working the way I expected.,  Up: FAQ
4773
4774Can I get the flex manual in another format?
4775============================================
4776
4777The 'flex' source distribution includes a texinfo manual.  You are free
4778to convert that texinfo into whatever format you desire.  The 'texinfo'
4779package includes tools for conversion to a number of formats.
4780
4781
4782File: flex.info,  Node: Does there exist a "faster" NDFA->DFA algorithm?,  Next: How does flex compile the DFA so quickly?,  Prev: Can I get the flex manual in another format?,  Up: FAQ
4783
4784Does there exist a "faster" NDFA->DFA algorithm?
4785================================================
4786
4787There's no way around the potential exponential running time - it can
4788take you exponential time just to enumerate all of the DFA states.  In
4789practice, though, the running time is closer to linear, or sometimes
4790quadratic.
4791
4792
4793File: flex.info,  Node: How does flex compile the DFA so quickly?,  Next: How can I use more than 8192 rules?,  Prev: Does there exist a "faster" NDFA->DFA algorithm?,  Up: FAQ
4794
4795How does flex compile the DFA so quickly?
4796=========================================
4797
4798There are two big speed wins that 'flex' uses:
4799
4800  1. It analyzes the input rules to construct equivalence classes for
4801     those characters that always make the same transitions.  It then
4802     rewrites the NFA using equivalence classes for transitions instead
4803     of characters.  This cuts down the NFA->DFA computation time
4804     dramatically, to the point where, for uncompressed DFA tables, the
4805     DFA generation is often I/O bound in writing out the tables.
4806  2. It maintains hash values for previously computed DFA states, so
4807     testing whether a newly constructed DFA state is equivalent to a
4808     previously constructed state can be done very quickly, by first
4809     comparing hash values.
4810
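   The idea behind equivalence classes in step 1 can be sketched as
follows.  This is a deliberately naive illustration and not the
algorithm 'flex' itself uses; the function name 'build_equiv_classes'
and the table layout are assumptions.  Two characters land in the same
class exactly when they belong to the same character sets.

```c
#define NCHARS 256

/* sets[i][c] is nonzero if character c belongs to the i-th character
 * set used by the rules.  On return, class_of[c] holds a small integer
 * such that two characters share it exactly when they are members of
 * the same sets.  Returns the number of classes found. */
static int build_equiv_classes(unsigned char sets[][NCHARS], int nsets,
                               int class_of[NCHARS])
{
    int nclasses = 0;
    int rep[NCHARS];   /* rep[k] = first character placed in class k */

    for (int c = 0; c < NCHARS; c++) {
        int k;
        for (k = 0; k < nclasses; k++) {
            int same = 1;
            for (int i = 0; i < nsets && same; i++)
                same = (!sets[i][c] == !sets[i][rep[k]]);
            if (same)
                break;         /* c behaves like class k */
        }
        if (k == nclasses)
            rep[nclasses++] = c;   /* c starts a new class */
        class_of[c] = k;
    }
    return nclasses;
}
```

   With one set for lowercase letters and one for digits, 256 input
characters collapse to three classes (letters, digits, everything
else), so the automaton construction has far fewer transitions to
consider per state.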
4811
4812File: flex.info,  Node: How can I use more than 8192 rules?,  Next: How do I abandon a file in the middle of a scan and switch to a new file?,  Prev: How does flex compile the DFA so quickly?,  Up: FAQ
4813
4814How can I use more than 8192 rules?
4815===================================
4816
4817'Flex' is compiled with an upper limit of 8192 rules per scanner.  If
4818you need more than 8192 rules in your scanner, you'll have to recompile
4819'flex' with the following changes in 'flexdef.h':
4820
4821     <    #define YY_TRAILING_MASK 0x2000
4822     <    #define YY_TRAILING_HEAD_MASK 0x4000
4823     --
4824     >    #define YY_TRAILING_MASK 0x20000000
4825     >    #define YY_TRAILING_HEAD_MASK 0x40000000
4826
   This should work okay as long as your C compiler uses 32-bit
4828integers.  But you might want to think about whether using such a huge
4829number of rules is the best way to solve your problem.
4830
4831   The following may also be relevant:
4832
4833   With luck, you should be able to increase the definitions in
4834flexdef.h for:
4835
4836     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
4837     #define MAXIMUM_MNS 31999
4838     #define BAD_SUBSCRIPT -32767
4839
4840   recompile everything, and it'll all work.  Flex only has these
484116-bit-like values built into it because a long time ago it was
4842developed on a machine with 16-bit ints.  I've given this advice to
4843others in the past but haven't heard back from them whether it worked
4844okay or not...
4845
4846
4847File: flex.info,  Node: How do I abandon a file in the middle of a scan and switch to a new file?,  Next: How do I execute code only during initialization (only before the first scan)?,  Prev: How can I use more than 8192 rules?,  Up: FAQ
4848
4849How do I abandon a file in the middle of a scan and switch to a new file?
4850=========================================================================
4851
Just call 'yyrestart(newfile)'.  Be sure to reset the start state if you
want a "fresh start", since 'yyrestart()' does NOT reset the start state
back to 'INITIAL'.
4855
4856
4857File: flex.info,  Node: How do I execute code only during initialization (only before the first scan)?,  Next: How do I execute code at termination?,  Prev: How do I abandon a file in the middle of a scan and switch to a new file?,  Up: FAQ
4858
4859How do I execute code only during initialization (only before the first scan)?
4860==============================================================================
4861
4862You can specify an initial action by defining the macro 'YY_USER_INIT'
4863(though note that 'yyout' may not be available at the time this macro is
4864executed).  Or you can add to the beginning of your rules section:
4865
     %%
         /* Must be indented! */
         static int did_init = 0;

         if ( ! did_init ){
             do_my_init();
             did_init = 1;
         }
4874
4875
4876File: flex.info,  Node: How do I execute code at termination?,  Next: Where else can I find help?,  Prev: How do I execute code only during initialization (only before the first scan)?,  Up: FAQ
4877
4878How do I execute code at termination?
4879=====================================
4880
4881You can specify an action for the '<<EOF>>' rule.
4882
4883
4884File: flex.info,  Node: Where else can I find help?,  Next: Can I include comments in the "rules" section of the file?,  Prev: How do I execute code at termination?,  Up: FAQ
4885
4886Where else can I find help?
4887===========================
4888
4889You can find the flex homepage on the web at
4890<http://flex.sourceforge.net/>.  See that page for details about flex
4891mailing lists as well.
4892
4893
4894File: flex.info,  Node: Can I include comments in the "rules" section of the file?,  Next: I get an error about undefined yywrap().,  Prev: Where else can I find help?,  Up: FAQ
4895
4896Can I include comments in the "rules" section of the file?
4897==========================================================
4898
4899Yes, just about anywhere you want to.  See the manual for the specific
4900syntax.
4901
4902
4903File: flex.info,  Node: I get an error about undefined yywrap().,  Next: How can I change the matching pattern at run time?,  Prev: Can I include comments in the "rules" section of the file?,  Up: FAQ
4904
4905I get an error about undefined yywrap().
4906========================================
4907
4908You must supply a 'yywrap()' function of your own, or link to 'libfl.a'
4909(which provides one), or use
4910
4911     %option noyywrap
4912
4913   in your source to say you don't want a 'yywrap()' function.
4914
4915
4916File: flex.info,  Node: How can I change the matching pattern at run time?,  Next: How can I expand macros in the input?,  Prev: I get an error about undefined yywrap().,  Up: FAQ
4917
4918How can I change the matching pattern at run time?
4919==================================================
4920
You can't.  The patterns are compiled into static tables when 'flex'
builds the scanner.
4923
4924
4925File: flex.info,  Node: How can I expand macros in the input?,  Next: How can I build a two-pass scanner?,  Prev: How can I change the matching pattern at run time?,  Up: FAQ
4926
4927How can I expand macros in the input?
4928=====================================
4929
4930The best way to approach this problem is at a higher level, e.g., in the
4931parser.
4932
4933   However, you can do this using multiple input buffers.
4934
     %%
     macro/[a-z]+    {
         /* Saw the macro "macro" followed by extra stuff. */
         main_buffer = YY_CURRENT_BUFFER;
         expansion_buffer = yy_scan_string(expand(yytext));
         yy_switch_to_buffer(expansion_buffer);
     }

     <<EOF>>         {
         if ( expansion_buffer )
         {
             // We were doing an expansion, return to where
             // we were.
             yy_switch_to_buffer(main_buffer);
             yy_delete_buffer(expansion_buffer);
             expansion_buffer = 0;
         }
         else
             yyterminate();
     }
4955
   You will probably want a stack of expansion buffers to allow nested
macros, but the example above should make the basic idea clear.
4958
4959
4960File: flex.info,  Node: How can I build a two-pass scanner?,  Next: How do I match any string not matched in the preceding rules?,  Prev: How can I expand macros in the input?,  Up: FAQ
4961
4962How can I build a two-pass scanner?
4963===================================
4964
4965One way to do it is to filter the first pass to a temporary file, then
4966process the temporary file on the second pass.  You will probably see a
4967performance hit, due to all the disk I/O.
4968
   When you need to look far ahead like this, it almost always
4970means that the right solution is to build a parse tree of the entire
4971input, then walk it after the parse in order to generate the output.  In
4972a sense, this is a two-pass approach, once through the text and once
4973through the parse tree, but the performance hit for the latter is
4974usually an order of magnitude smaller, since everything is already
4975classified, in binary format, and residing in memory.
4976
4977
4978File: flex.info,  Node: How do I match any string not matched in the preceding rules?,  Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Prev: How can I build a two-pass scanner?,  Up: FAQ
4979
4980How do I match any string not matched in the preceding rules?
4981=============================================================
4982
One way to assign precedence is to place the more specific rules first.
4984If two rules would match the same input (same sequence of characters)
4985then the first rule listed in the 'flex' input wins, e.g.,
4986
4987     %%
4988     foo[a-zA-Z_]+    return FOO_ID;
4989     bar[a-zA-Z_]+    return BAR_ID;
4990     [a-zA-Z_]+       return GENERIC_ID;
4991
4992   Note that the rule '[a-zA-Z_]+' must come *after* the others.  It
4993will match the same amount of text as the more specific rules, and in
4994that case the 'flex' scanner will pick the first rule listed in your
4995scanner as the one to match.
4996
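   The selection rule can be written down in a few lines of C.  This
sketch assumes the match length of every rule at the current position
is already known (a real scanner computes all of them at once with its
automaton); the function name 'pick_rule' is hypothetical.

```c
#include <stddef.h>

/* match_len[i] is the number of characters rule i matched at the
 * current input position, with 0 meaning "no match".  Returns the
 * index of the winning rule: the longest match, and on ties the rule
 * listed first.  Returns -1 if no rule matched. */
static int pick_rule(const size_t *match_len, size_t nrules)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < nrules; i++) {
        /* Strictly longer only, so an earlier rule keeps a tie. */
        if (match_len[i] > best_len) {
            best_len = match_len[i];
            best = (int)i;
        }
    }
    return best;
}
```

   For the input 'foobar' and the three rules above, the lengths are 6,
0 and 6, so rule 0 ('FOO_ID') wins the tie against the generic rule.
Had the generic rule been listed first, it would have won instead.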
4997
4998File: flex.info,  Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Next: Is there a way to make flex treat NULL like a regular character?,  Prev: How do I match any string not matched in the preceding rules?,  Up: FAQ
4999
5000I am trying to port code from AT&T lex that uses yysptr and yysbuf.
5001===================================================================
5002
5003Those are internal variables pointing into the AT&T scanner's input
5004buffer.  I imagine they're being manipulated in user versions of the
5005'input()' and 'unput()' functions.  If so, what you need to do is
5006analyze those functions to figure out what they're doing, and then
5007replace 'input()' with an appropriate definition of 'YY_INPUT'.  You
5008shouldn't need to (and must not) replace 'flex''s 'unput()' function.
5009
5010
5011File: flex.info,  Node: Is there a way to make flex treat NULL like a regular character?,  Next: Whenever flex can not match the input it says "flex scanner jammed".,  Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Up: FAQ
5012
5013Is there a way to make flex treat NULL like a regular character?
5014================================================================
5015
5016Yes, '\0' and '\x00' should both do the trick.  Perhaps you have an
5017ancient version of 'flex'.  The latest release is version 2.6.4.
5018
5019
5020File: flex.info,  Node: Whenever flex can not match the input it says "flex scanner jammed".,  Next: Why doesn't flex have non-greedy operators like perl does?,  Prev: Is there a way to make flex treat NULL like a regular character?,  Up: FAQ
5021
5022Whenever flex can not match the input it says "flex scanner jammed".
5023====================================================================
5024
5025You need to add a rule that matches the otherwise-unmatched text, e.g.,
5026
5027     %option yylineno
5028     %%
5029     [[a bunch of rules here]]
5030
5031     .	printf("bad input character '%s' at line %d\n", yytext, yylineno);
5032
5033   See '%option default' for more information.
5034
5035
5036File: flex.info,  Node: Why doesn't flex have non-greedy operators like perl does?,  Next: Memory leak - 16386 bytes allocated by malloc.,  Prev: Whenever flex can not match the input it says "flex scanner jammed".,  Up: FAQ
5037
5038Why doesn't flex have non-greedy operators like perl does?
5039==========================================================
5040
5041A DFA can do a non-greedy match by stopping the first time it enters an
5042accepting state, instead of consuming input until it determines that no
5043further matching is possible (a "jam" state).  This is actually easier
5044to implement than longest leftmost match (which flex does).
5045
5046   But it's also much less useful than longest leftmost match.  In
5047general, when you find yourself wishing for non-greedy matching, that's
5048usually a sign that you're trying to make the scanner do some parsing.
5049That's generally the wrong approach, since it lacks the power to do a
5050decent job.  Better is to either introduce a separate parser, or to
5051split the scanner into multiple scanners using (exclusive) start
5052conditions.
5053
5054   You might have a separate start state once you've seen the 'BEGIN'.
5055In that state, you might then have a regex that will match 'END' (to
5056kick you out of the state), and perhaps '(.|\n)' to get a single
5057character within the chunk ...
5058
5059   This approach also has much better error-reporting properties.
5060
5061
5062File: flex.info,  Node: Memory leak - 16386 bytes allocated by malloc.,  Next: How do I track the byte offset for lseek()?,  Prev: Why doesn't flex have non-greedy operators like perl does?,  Up: FAQ
5063
5064Memory leak - 16386 bytes allocated by malloc.
5065==============================================
5066
5067UPDATED 2002-07-10: As of 'flex' version 2.5.9, this leak means that you
5068did not call 'yylex_destroy()'.  If you are using an earlier version of
5069'flex', then read on.
5070
5071   The leak is about 16426 bytes.  That is, (8192 * 2 + 2) for the
5072read-buffer, and about 40 for 'struct yy_buffer_state' (depending upon
5073alignment).  The leak is in the non-reentrant C scanner only (NOT in the
5074reentrant scanner, NOT in the C++ scanner).  Since 'flex' doesn't know
5075when you are done, the buffer is never freed.
5076
5077   However, the leak won't multiply since the buffer is reused no matter
5078how many times you call 'yylex()'.
5079
5080   If you want to reclaim the memory when you are completely done
5081scanning, then you might try this:
5082
5083     /* For non-reentrant C scanner only. */
5084     yy_delete_buffer(YY_CURRENT_BUFFER);
5085     yy_init = 1;
5086
5087   Note: 'yy_init' is an "internal variable", and hasn't been tested in
5088this situation.  It is possible that some other globals may need
5089resetting as well.
5090
5091
5092File: flex.info,  Node: How do I track the byte offset for lseek()?,  Next: How do I use my own I/O classes in a C++ scanner?,  Prev: Memory leak - 16386 bytes allocated by malloc.,  Up: FAQ
5093
5094How do I track the byte offset for lseek()?
5095===========================================
5096
5097     >   We thought that it would be possible to have this number through the
5098     >   evaluation of the following expression:
5099     >
5100     >   seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
5101
5102   While this is the right idea, it has two problems.  The first is that
5103it's possible that 'flex' will request less than 'YY_READ_BUF_SIZE'
5104during an invocation of 'YY_INPUT' (or that your input source will
5105return less even though 'YY_READ_BUF_SIZE' bytes were requested).  The
5106second problem is that when refilling its internal buffer, 'flex' keeps
5107some characters from the previous buffer (because usually it's in the
5108middle of a match, and needs those characters to construct 'yytext' for
5109the match once it's done).  Because of this, 'yy_c_buf_p -
5110YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
5111already read from the current buffer.
5112
5113   An alternative solution is to count the number of characters you've
5114matched since starting to scan.  This can be done by using
5115'YY_USER_ACTION'.  For example,
5116
5117     #define YY_USER_ACTION num_chars += yyleng;
5118
   (You need to be careful to update your bookkeeping if you use
'yymore()', 'yyless()', 'unput()', or 'input()'.)
5121
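   As one example of that bookkeeping, here is a sketch of the
adjustment needed for 'yyless()'.  The helper names 'count_match' and
'count_yyless' are hypothetical, and the sketch assumes the counter is
updated by 'YY_USER_ACTION' before the rule's action runs.

```c
#include <stddef.h>

static size_t num_chars = 0;   /* bytes consumed since scanning began */

/* What YY_USER_ACTION does for every match: count the whole token. */
static void count_match(size_t token_len)
{
    num_chars += token_len;
}

/* Call just before yyless(n).  Only the first n characters of the
 * token are kept; the rest go back to the input and will be counted
 * again when they are rescanned, so take them off now. */
static void count_yyless(size_t token_len, size_t n)
{
    num_chars -= token_len - n;
}
```

   In an action this would read 'count_yyless(yyleng, 5); yyless(5);'.
The other functions need similar adjustments of their own.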
5122
5123File: flex.info,  Node: How do I use my own I/O classes in a C++ scanner?,  Next: How do I skip as many chars as possible?,  Prev: How do I track the byte offset for lseek()?,  Up: FAQ
5124
5125How do I use my own I/O classes in a C++ scanner?
5126=================================================
5127
5128When the flex C++ scanning class rewrite finally happens, then this sort
5129of thing should become much easier.
5130
   You can do this by passing the various functions (such as
'LexerInput()' and 'LexerOutput()') NULL 'iostream*'s, and then dealing
with your own I/O classes surreptitiously (i.e., stashing them in
special member variables).  This works because the only assumption the
lexer makes about the iostreams is that they're ultimately passed to
'LexerInput()' and 'LexerOutput()', which then do whatever is necessary
with them.
5138
5139
5140File: flex.info,  Node: How do I skip as many chars as possible?,  Next: deleteme00,  Prev: How do I use my own I/O classes in a C++ scanner?,  Up: FAQ
5141
5142How do I skip as many chars as possible?
5143========================================
5144
5145How do I skip as many chars as possible - without interfering with the
5146other patterns?
5147
5148   In the example below, we want to skip over characters until we see
5149the phrase "endskip".  The following will _NOT_ work correctly (do you
5150see why not?)
5151
5152     /* INCORRECT SCANNER */
5153     %x SKIP
5154     %%
5155     <INITIAL>startskip   BEGIN(SKIP);
5156     ...
5157     <SKIP>"endskip"       BEGIN(INITIAL);
5158     <SKIP>.*             ;
5159
   The problem is that the pattern '.*' will eat up the word "endskip",
because flex always prefers the longest match.  The simplest (but slow)
fix is:
5162
5163     <SKIP>"endskip"      BEGIN(INITIAL);
5164     <SKIP>.              ;
5165
   A faster fix involves making the second rule match more, without
making it match "endskip" plus something else.  So for example:
5168
     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>[^e]+         ;
     <SKIP>.             ; /* so you eat up e's, too */
5172
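   To see why the second fix is faster, it can help to count how many
rule actions each version runs.  The following is a rough C simulation
of the two rule sets, not generated scanner code: 'actions_slow()' and
'actions_fast()' are hypothetical helpers, and the input is assumed to
contain no newlines (which '.' would not match).

```c
#include <assert.h>
#include <string.h>

static int at_endskip(const char *p)
{
    return strncmp(p, "endskip", 7) == 0;
}

/* Rules: "endskip" and '.'.  One action per skipped character. */
static int actions_slow(const char *s)
{
    int n = 0;
    assert(strchr(s, '\n') == NULL);
    while (*s) {
        n++;
        if (at_endskip(s))
            break;              /* "endskip" wins: it is the longer match */
        s++;                    /* '.' eats one character */
    }
    return n;
}

/* Rules: "endskip", '[^e]+' and '.'.  One action per run of non-e's. */
static int actions_fast(const char *s)
{
    int n = 0;
    assert(strchr(s, '\n') == NULL);
    while (*s) {
        n++;
        if (at_endskip(s))
            break;
        if (*s == 'e')
            s++;                /* '.' eats an 'e' that does not start "endskip" */
        else
            s += strcspn(s, "e");   /* '[^e]+' eats up to the next 'e' */
    }
    return n;
}
```

   For the input "aaaa bbbb endskip" the slow rules run eleven actions
and the fast rules run two.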
5173
5174File: flex.info,  Node: deleteme00,  Next: Are certain equivalent patterns faster than others?,  Prev: How do I skip as many chars as possible?,  Up: FAQ
5175
5176deleteme00
5177==========
5178
5179     QUESTION:
5180     When was flex born?
5181
5182     Vern Paxson took over
5183     the Software Tools lex project from Jef Poskanzer in 1982.  At that point it
5184     was written in Ratfor.  Around 1987 or so, Paxson translated it into C, and
5185     a legend was born :-).
5186
5187
5188File: flex.info,  Node: Are certain equivalent patterns faster than others?,  Next: Is backing up a big deal?,  Prev: deleteme00,  Up: FAQ
5189
5190Are certain equivalent patterns faster than others?
5191===================================================
5192
5193     To: Adoram Rogel <adoram@orna.hybridge.com>
5194     Subject: Re: Flex 2.5.2 performance questions
5195     In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
5196     Date: Wed, 18 Sep 96 10:51:02 PDT
5197     From: Vern Paxson <vern>
5198
5199     [Note, the most recent flex release is 2.5.4, which you can get from
5200     ftp.ee.lbl.gov.  It has bug fixes over 2.5.2 and 2.5.3.]
5201
5202     > 1. Using the pattern
5203     >    ([Ff](oot)?)?[Nn](ote)?(\.)?
5204     >    instead of
5205     >    (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
5206     >    (in a very complicated flex program) caused the program to slow from
5207     >    300K+/min to 100K/min (no other changes were done).
5208
5209     These two are not equivalent.  For example, the first can match "footnote."
5210     but the second can only match "footnote".  This is almost certainly the
     cause of the discrepancy - the slower scanner run is matching more tokens,
5212     and/or having to do more backing up.
5213
5214     > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
5215
5216     From a performance point of view, they're equivalent (modulo presumably
5217     minor effects such as memory cache hit rates; and the presence of trailing
5218     context, see below).  From a space point of view, the first is slightly
5219     preferable.
5220
5221     > 3. I have a pattern that look like this:
5222     >    pats {p1}|{p2}|{p3}|...|{p50}     (50 patterns ORd)
5223     >
5224     >    running yet another complicated program that includes the following rule:
5225     >    <snext>{and}/{no4}{bb}{pats}
5226     >
5227     >    gets me to "too complicated - over 32,000 states"...
5228
5229     I can't tell from this example whether the trailing context is variable-length
5230     or fixed-length (it could be the latter if {and} is fixed-length).  If it's
5231     variable length, which flex -p will tell you, then this reflects a basic
5232     performance problem, and if you can eliminate it by restructuring your
5233     scanner, you will see significant improvement.
5234
5235     >    so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
5236     >    10 patterns and changed the rule to be 5 rules.
5237     >    This did compile, but what is the rule of thumb here ?
5238
5239     The rule is to avoid trailing context other than fixed-length, in which for
     a/b, either the 'a' pattern or the 'b' pattern has a fixed length.  Use
5241     of the '|' operator automatically makes the pattern variable length, so in
5242     this case '[Ff]oot' is preferred to '(F|f)oot'.
5243
5244     > 4. I changed a rule that looked like this:
5245     >    <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
5246     >
5247     >    to the next 2 rules:
5248     >    <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
5249     >    <snext8>{and}{bb}/{ROMAN}         { BEGIN...
5250     >
5251     >    Again, I understand the using [^...] will cause a great performance loss
5252
5253     Actually, it doesn't cause any sort of performance loss.  It's a surprising
5254     fact about regular expressions that they always match in linear time
5255     regardless of how complex they are.
5256
5257     >    but are there any specific rules about it ?
5258
5259     See the "Performance Considerations" section of the man page, and also
5260     the example in MISC/fastwc/.
5261
5262     		Vern
5263
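   The claim in the first answer, that the two footnote patterns are not
equivalent, can be checked outside flex.  The sketch below uses the
POSIX 'regcomp()' and 'regexec()' functions, on the assumption that
POSIX extended syntax agrees with flex for these particular patterns.
The '^(...)$' wrappers and the names 'PAT1', 'PAT2' and
'matches_whole()' are additions for the test.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the pattern matches the text, else 0.  The patterns
   below are anchored, so a match means the whole text matched. */
static int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The two patterns from the question, each wrapped in ^( ... )$ */
static const char *PAT1 =
    "^(([Ff](oot)?)?[Nn](ote)?(\\.)?)$";
static const char *PAT2 =
    "^((((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\\.)|((F|f)(N|n)(\\.))))$";
```

   Both patterns accept "footnote", but only the first accepts
"footnote.", "fn" or a bare "n".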
5264
5265File: flex.info,  Node: Is backing up a big deal?,  Next: Can I fake multi-byte character support?,  Prev: Are certain equivalent patterns faster than others?,  Up: FAQ
5266
5267Is backing up a big deal?
5268=========================
5269
5270     To: Adoram Rogel <adoram@hybridge.com>
5271     Subject: Re: Flex 2.5.2 performance questions
5272     In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
5273     Date: Thu, 19 Sep 96 09:58:00 PDT
5274     From: Vern Paxson <vern>
5275
5276     > a lot about the backing up problem.
5277     > I believe that there lies my biggest problem, and I'll try to improve
5278     > it.
5279
5280     Since you have variable trailing context, this is a bigger performance
5281     problem.  Fixing it is usually easier than fixing backing up, which in a
5282     complicated scanner (yours seems to fit the bill) can be extremely
5283     difficult to do correctly.
5284
5285     You also don't mention what flags you are using for your scanner.
5286     -f makes a large speed difference, and -Cfe buys you nearly as much
5287     speed but the resulting scanner is considerably smaller.
5288
5289     > I have an | operator in {and} and in {pats} so both of them are variable
5290     > length.
5291
5292     -p should have reported this.
5293
5294     > Is changing one of them to fixed-length is enough ?
5295
5296     Yes.
5297
5298     > Is it possible to change the 32,000 states limit ?
5299
5300     Yes.  I've appended instructions on how.  Before you make this change,
5301     though, you should think about whether there are ways to fundamentally
5302     simplify your scanner - those are certainly preferable!
5303
5304     		Vern
5305
5306     To increase the 32K limit (on a machine with 32 bit integers), you increase
5307     the magnitude of the following in flexdef.h:
5308
5309     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
5310     #define MAXIMUM_MNS 31999
5311     #define BAD_SUBSCRIPT -32767
5312     #define MAX_SHORT 32700
5313
5314     Adding a 0 or two after each should do the trick.
5315
5316
5317File: flex.info,  Node: Can I fake multi-byte character support?,  Next: deleteme01,  Prev: Is backing up a big deal?,  Up: FAQ
5318
5319Can I fake multi-byte character support?
5320========================================
5321
5322     To: Heeman_Lee@hp.com
5323     Subject: Re: flex - multi-byte support?
5324     In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
5325     Date: Fri, 04 Oct 1996 11:42:18 PDT
5326     From: Vern Paxson <vern>
5327
5328     >      I assume as long as my *.l file defines the
5329     >      range of expected character code values (in octal format), flex will
5330     >      scan the file and read multi-byte characters correctly. But I have no
5331     >      confidence in this assumption.
5332
5333     Your lack of confidence is justified - this won't work.
5334
5335     Flex has in it a widespread assumption that the input is processed
5336     one byte at a time.  Fixing this is on the to-do list, but is involved,
5337     so it won't happen any time soon.  In the interim, the best I can suggest
5338     (unless you want to try fixing it yourself) is to write your rules in
5339     terms of pairs of bytes, using definitions in the first section:
5340
5341     	X	\xfe\xc2
5342     	...
5343     	%%
5344     	foo{X}bar	found_foo_fe_c2_bar();
5345
5346     etc.  Definitely a pain - sorry about that.
5347
5348     By the way, the email address you used for me is ancient, indicating you
5349     have a very old version of flex.  You can get the most recent, 2.5.4, from
5350     ftp.ee.lbl.gov.
5351
5352     		Vern
5353
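   What the 'foo{X}bar' rule above recognizes is simply a sequence of
eight bytes.  A minimal C sketch of the same byte-wise comparison
follows; 'is_foo_X_bar()' is a hypothetical helper, and it says nothing
about how a real multi-byte encoding should be decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The two bytes named by the definition  X  \xfe\xc2 */
static const unsigned char X[2] = { 0xfe, 0xc2 };

/* Returns 1 if buf holds exactly the bytes of foo{X}bar, else 0. */
static int is_foo_X_bar(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len == 8
        && memcmp(buf, "foo", 3) == 0
        && memcmp(buf + 3, X, sizeof X) == 0
        && memcmp(buf + 5, "bar", 3) == 0;
}
```

   Byte order matters: the same two bytes swapped are a different
"character" as far as the scanner is concerned.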
5354
5355File: flex.info,  Node: deleteme01,  Next: Can you discuss some flex internals?,  Prev: Can I fake multi-byte character support?,  Up: FAQ
5356
5357deleteme01
5358==========
5359
5360     To: moleary@primus.com
5361     Subject: Re: Flex / Unicode compatibility question
5362     In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
5363     Date: Tue, 22 Oct 1996 11:06:13 PDT
5364     From: Vern Paxson <vern>
5365
5366     Unfortunately flex at the moment has a widespread assumption within it
5367     that characters are processed 8 bits at a time.  I don't see any easy
5368     fix for this (other than writing your rules in terms of double characters -
5369     a pain).  I also don't know of a wider lex, though you might try surfing
5370     the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
5371     toolkit (try searching say Alta Vista for "Purdue Compiler Construction
5372     Toolkit").
5373
5374     Fixing flex to handle wider characters is on the long-term to-do list.
5375     But since flex is a strictly spare-time project these days, this probably
5376     won't happen for quite a while, unless someone else does it first.
5377
5378     		Vern
5379
5380
5381File: flex.info,  Node: Can you discuss some flex internals?,  Next: unput() messes up yy_at_bol,  Prev: deleteme01,  Up: FAQ
5382
5383Can you discuss some flex internals?
5384====================================
5385
5386     To: Johan Linde <jl@theophys.kth.se>
5387     Subject: Re: translation of flex
5388     In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
5389     Date: Mon, 11 Nov 1996 10:33:50 PST
5390     From: Vern Paxson <vern>
5391
5392     > I'm working for the Swedish team translating GNU program, and I'm currently
5393     > working with flex. I have a few questions about some of the messages which
5394     > I hope you can answer.
5395
5396     All of the things you're wondering about, by the way, concerning flex
5397     internals - probably the only person who understands what they mean in
5398     English is me!  So I wouldn't worry too much about getting them right.
5399     That said ...
5400
5401     > #: main.c:545
5402     > msgid "  %d protos created\n"
5403     >
5404     > Does proto mean prototype?
5405
5406     Yes - prototypes of state compression tables.
5407
5408     > #: main.c:539
5409     > msgid "  %d/%d (peak %d) template nxt-chk entries created\n"
5410     >
5411     > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
5412     > However, 'template next-check entries' doesn't make much sense to me. To be
5413     > able to find a good translation I need to know a little bit more about it.
5414
5415     There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
5416     scanner tables.  It involves creating two pairs of tables.  The first has
5417     "base" and "default" entries, the second has "next" and "check" entries.
5418     The "base" entry is indexed by the current state and yields an index into
5419     the next/check table.  The "default" entry gives what to do if the state
5420     transition isn't found in next/check.  The "next" entry gives the next
5421     state to enter, but only if the "check" entry verifies that this entry is
5422     correct for the current state.  Flex creates templates of series of
5423     next/check entries and then encodes differences from these templates as a
5424     way to compress the tables.
5425
5426     > #: main.c:533
5427     > msgid "  %d/%d base-def entries created\n"
5428     >
5429     > The same problem here for 'base-def'.
5430
5431     See above.
5432
5433     		Vern
5434
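   The base/default/next/check lookup described in the reply above can
be sketched in a few lines of C.  The tables here are made up for a
three-state, three-symbol example; they are not tables flex generated,
and the sketch leaves out the template ("proto") compression entirely.
All names are hypothetical.

```c
#include <assert.h>

#define JAM  (-1)   /* no transition exists */
#define NSYM 3      /* input symbols are 0, 1 and 2 */

/* Indexed by state: where its row starts in next/check, and which state
   to consult when the row has no entry for the symbol. */
static const int base_tbl[] = { 0, 1, 0 };
static const int def_tbl[]  = { JAM, 0, 0 };

/* Overlapping rows.  check_tbl[i] names the state that owns slot i. */
static const int next_tbl[]  = { 1, 2, 0, JAM };
static const int check_tbl[] = { 0, 0, 1, JAM };

static int next_state(int state, int sym)
{
    assert(sym >= 0 && sym < NSYM);
    while (state != JAM) {
        int slot = base_tbl[state] + sym;
        if (check_tbl[slot] == state)
            return next_tbl[slot];      /* entry really belongs to state */
        state = def_tbl[state];         /* otherwise try the default state */
    }
    return JAM;
}
```

   State 1 stores only the one transition on which it differs from state
0, and state 2 stores none at all, which is where the space saving comes
from.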
5435
5436File: flex.info,  Node: unput() messes up yy_at_bol,  Next: The | operator is not doing what I want,  Prev: Can you discuss some flex internals?,  Up: FAQ
5437
5438unput() messes up yy_at_bol
5439===========================
5440
5441     To: Xinying Li <xli@npac.syr.edu>
5442     Subject: Re: FLEX ?
5443     In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
5444     Date: Wed, 13 Nov 1996 19:51:54 PST
5445     From: Vern Paxson <vern>
5446
5447     > "unput()" them to input flow, question occurs. If I do this after I scan
5448     > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
5449     > means the carriage flag has gone.
5450
5451     You can control this by calling yy_set_bol().  It's described in the manual.
5452
5453     >      And if in pre-reading it goes to the end of file, is anything done
     > to control the end of current buffer and end of file?
5455
5456     No, there's no way to put back an end-of-file.
5457
5458     >      By the way I am using flex 2.5.2 and using the "-l".
5459
5460     The latest release is 2.5.4, by the way.  It fixes some bugs in 2.5.2 and
5461     2.5.3.  You can get it from ftp.ee.lbl.gov.
5462
5463     		Vern
5464
5465
5466File: flex.info,  Node: The | operator is not doing what I want,  Next: Why can't flex understand this variable trailing context pattern?,  Prev: unput() messes up yy_at_bol,  Up: FAQ
5467
5468The | operator is not doing what I want
5469=======================================
5470
5471     To: Alain.ISSARD@st.com
5472     Subject: Re: Start condition with FLEX
5473     In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
5474     Date: Mon, 18 Nov 1996 10:41:34 PST
5475     From: Vern Paxson <vern>
5476
5477     > I am not able to use the start condition scope and to use the | (OR) with
5478     > rules having start conditions.
5479
5480     The problem is that if you use '|' as a regular expression operator, for
5481     example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
5482     any blanks around it.  If you instead want the special '|' *action* (which
5483     from your scanner appears to be the case), which is a way of giving two
5484     different rules the same action:
5485
5486     	foo	|
5487     	bar	matched_foo_or_bar();
5488
5489     then '|' *must* be separated from the first rule by whitespace and *must*
5490     be followed by a new line.  You *cannot* write it as:
5491
5492     	foo | bar	matched_foo_or_bar();
5493
5494     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
5496     unlikely to be changed.
5497
5498     Your problems with start condition scope are simply due to syntax errors
5499     from your use of '|' later confusing flex.
5500
5501     Let me know if you still have problems.
5502
5503     		Vern
5504
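   One way to picture the special '|' action is as C 'case' labels that
fall through to a shared body.  The sketch below is only an analogy
under that assumption; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum rule { RULE_FOO, RULE_BAR, RULE_OTHER };

/* Stands in for whatever matched_foo_or_bar() would do. */
static int foo_or_bar_count = 0;

static enum rule classify(const char *token)
{
    assert(token != NULL);
    if (strcmp(token, "foo") == 0) return RULE_FOO;
    if (strcmp(token, "bar") == 0) return RULE_BAR;
    return RULE_OTHER;
}

static void run_action(enum rule r)
{
    switch (r) {
    case RULE_FOO:      /* foo  |                      (no body of its own) */
    case RULE_BAR:      /* bar  matched_foo_or_bar();                       */
        foo_or_bar_count++;
        break;
    case RULE_OTHER:
        break;
    }
}
```

   Seen this way, 'foo' and 'bar' remain two separate rules that happen
to share one action, which is different from the single pattern
'foo|bar'.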
5505
5506File: flex.info,  Node: Why can't flex understand this variable trailing context pattern?,  Next: The ^ operator isn't working,  Prev: The | operator is not doing what I want,  Up: FAQ
5507
5508Why can't flex understand this variable trailing context pattern?
5509=================================================================
5510
5511     To: Gregory Margo <gmargo@newton.vip.best.com>
5512     Subject: Re: flex-2.5.3 bug report
5513     In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
5514     Date: Sat, 23 Nov 1996 17:07:32 PST
5515     From: Vern Paxson <vern>
5516
5517     > Enclosed is a lex file that "real" lex will process, but I cannot get
5518     > flex to process it.  Could you try it and maybe point me in the right direction?
5519
5520     Your problem is that some of the definitions in the scanner use the '/'
5521     trailing context operator, and have it enclosed in ()'s.  Flex does not
5522     allow this operator to be enclosed in ()'s because doing so allows undefined
5523     regular expressions such as "(a/b)+".  So the solution is to remove the
5524     parentheses.  Note that you must also be building the scanner with the -l
5525     option for AT&T lex compatibility.  Without this option, flex automatically
5526     encloses the definitions in parentheses.
5527
5528     		Vern
5529
5530
5531File: flex.info,  Node: The ^ operator isn't working,  Next: Trailing context is getting confused with trailing optional patterns,  Prev: Why can't flex understand this variable trailing context pattern?,  Up: FAQ
5532
5533The ^ operator isn't working
5534============================
5535
5536     To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
5537     Subject: Re: Flex Bug ?
5538     In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
5539     Date: Tue, 26 Nov 1996 11:15:05 PST
5540     From: Vern Paxson <vern>
5541
5542     > In my lexer code, i have the line :
5543     > ^\*.*          { }
5544     >
     > Thus all lines starting with an asterisk (*) are comment lines.
5546     > This does not work !
5547
5548     I can't get this problem to reproduce - it works fine for me.  Note
5549     though that if what you have is slightly different:
5550
5551     	COMMENT	^\*.*
5552     	%%
5553     	{COMMENT}	{ }
5554
5555     then it won't work, because flex pushes back macro definitions enclosed
5556     in ()'s, so the rule becomes
5557
5558     	(^\*.*)		{ }
5559
5560     and now that the '^' operator is not at the immediate beginning of the
5561     line, it's interpreted as just a regular character.  You can avoid this
5562     behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
5563
5564     		Vern
5565
5566
5567File: flex.info,  Node: Trailing context is getting confused with trailing optional patterns,  Next: Is flex GNU or not?,  Prev: The ^ operator isn't working,  Up: FAQ
5568
5569Trailing context is getting confused with trailing optional patterns
5570====================================================================
5571
5572     To: Adoram Rogel <adoram@hybridge.com>
5573     Subject: Re: Flex 2.5.4 BOF ???
5574     In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
5575     Date: Wed, 27 Nov 1996 10:56:25 PST
5576     From: Vern Paxson <vern>
5577
5578     >     Organization(s)?/[a-z]
5579     >
5580     > This matched "Organizations" (looking in debug mode, the trailing s
5581     > was matched with trailing context instead of the optional (s) in the
5582     > end of the word.
5583
5584     That should only happen with lex.  Flex can properly match this pattern.
5585     (That might be what you're saying, I'm just not sure.)
5586
5587     > Is there a way to avoid this dangerous trailing context problem ?
5588
5589     Unfortunately, there's no easy way.  On the other hand, I don't see why
5590     it should be a problem.  Lex's matching is clearly wrong, and I'd hope
5591     that usually the intent remains the same as expressed with the pattern,
5592     so flex's matching will be correct.
5593
5594     		Vern
5595
5596
5597File: flex.info,  Node: Is flex GNU or not?,  Next: ERASEME53,  Prev: Trailing context is getting confused with trailing optional patterns,  Up: FAQ
5598
5599Is flex GNU or not?
5600===================
5601
5602     To: Cameron MacKinnon <mackin@interlog.com>
5603     Subject: Re: Flex documentation bug
5604     In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
5605     Date: Sun, 01 Dec 1996 22:29:39 PST
5606     From: Vern Paxson <vern>
5607
5608     > I'm not sure how or where to submit bug reports (documentation or
5609     > otherwise) for the GNU project stuff ...
5610
5611     Well, strictly speaking flex isn't part of the GNU project.  They just
5612     distribute it because no one's written a decent GPL'd lex replacement.
5613     So you should send bugs directly to me.  Those sent to the GNU folks
     sometimes find their way to me, but some may drop between the cracks.
5615
5616     > In GNU Info, under the section 'Start Conditions', and also in the man
5617     > page (mine's dated April '95) is a nice little snippet showing how to
5618     > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
5619     > size. Unfortunately, no overflow checking is ever done ...
5620
5621     This is already mentioned in the manual:
5622
5623     Finally, here's an example of how to  match  C-style  quoted
5624     strings using exclusive start conditions, including expanded
5625     escape sequences (but not including checking  for  a  string
5626     that's too long):
5627
5628     The reason for not doing the overflow checking is that it will needlessly
5629     clutter up an example whose main purpose is just to demonstrate how to
5630     use flex.
5631
5632     The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
5633
5634     		Vern
5635
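   If you do want the overflow check that the manual's example leaves
out, it might look something like the following.  This is a minimal
sketch: 'MAX_STR_CONST' is the name used in the manual, given a tiny
value here for demonstration, while 'struct str_buf', 'str_begin()' and
'str_append()' are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STR_CONST 8     /* deliberately small for this example */

struct str_buf {
    char   data[MAX_STR_CONST];
    size_t len;             /* characters stored, not counting the NUL */
};

static void str_begin(struct str_buf *b)
{
    assert(b != NULL);
    b->len = 0;
    b->data[0] = '\0';
}

/* Appends c and keeps the buffer NUL-terminated.  Returns 0 on success,
   or -1 if there is no room left for c plus the terminator. */
static int str_append(struct str_buf *b, char c)
{
    assert(b != NULL);
    if (b->len + 1 >= MAX_STR_CONST)
        return -1;
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 0;
}
```

   Each rule in the string start condition would then call
'str_append()' and report an error, or abandon the string, when it
returns -1.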
5636
5637File: flex.info,  Node: ERASEME53,  Next: I need to scan if-then-else blocks and while loops,  Prev: Is flex GNU or not?,  Up: FAQ
5638
5639ERASEME53
5640=========
5641
5642     To: tsv@cs.UManitoba.CA
5643     Subject: Re: Flex (reg)..
5644     In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
5645     Date: Thu, 06 Mar 1997 15:54:19 PST
5646     From: Vern Paxson <vern>
5647
5648     > [:alpha:] ([:alnum:] | \\_)*
5649
5650     If your rule really has embedded blanks as shown above, then it won't
5651     work, as the first blank delimits the rule from the action.  (It wouldn't
5652     even compile ...)  You need instead:
5653
5654     [:alpha:]([:alnum:]|\\_)*
5655
5656     and that should work fine - there's no restriction on what can go inside
5657     of ()'s except for the trailing context operator, '/'.
5658
5659     		Vern
5660
5661
5662File: flex.info,  Node: I need to scan if-then-else blocks and while loops,  Next: ERASEME55,  Prev: ERASEME53,  Up: FAQ
5663
5664I need to scan if-then-else blocks and while loops
5665==================================================
5666
5667     To: "Mike Stolnicki" <mstolnic@ford.com>
5668     Subject: Re: FLEX help
5669     In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
5670     Date: Fri, 30 May 1997 10:46:35 PDT
5671     From: Vern Paxson <vern>
5672
5673     > We'd like to add "if-then-else", "while", and "for" statements to our
5674     > language ...
5675     > We've investigated many possible solutions.  The one solution that seems
5676     > the most reasonable involves knowing the position of a TOKEN in yyin.
5677
5678     I strongly advise you to instead build a parse tree (abstract syntax tree)
5679     and loop over that instead.  You'll find this has major benefits in keeping
5680     your interpreter simple and extensible.
5681
5682     That said, the functionality you mention for get_position and set_position
5683     have been on the to-do list for a while.  As flex is a purely spare-time
5684     project for me, no guarantees when this will be added (in particular, it
5685     for sure won't be for many months to come).
5686
5687     		Vern
5688
5689
5690File: flex.info,  Node: ERASEME55,  Next: ERASEME56,  Prev: I need to scan if-then-else blocks and while loops,  Up: FAQ
5691
5692ERASEME55
5693=========
5694
5695     To: Colin Paul Adams <colin@colina.demon.co.uk>
5696     Subject: Re: Flex C++ classes and Bison
5697     In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
5698     Date: Fri, 15 Aug 1997 10:48:19 PDT
5699     From: Vern Paxson <vern>
5700
5701     > #define YY_DECL   int yylex (YYSTYPE *lvalp, struct parser_control
5702     > *parm)
5703     >
5704     > I have been trying  to get this to work as a C++ scanner, but it does
5705     > not appear to be possible (warning that it matches no declarations in
5706     > yyFlexLexer, or something like that).
5707     >
5708     > Is this supposed to be possible, or is it being worked on (I DID
5709     > notice the comment that scanner classes are still experimental, so I'm
5710     > not too hopeful)?
5711
5712     What you need to do is derive a subclass from yyFlexLexer that provides
5713     the above yylex() method, squirrels away lvalp and parm into member
5714     variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
5715
5716     		Vern
5717
5718
5719File: flex.info,  Node: ERASEME56,  Next: ERASEME57,  Prev: ERASEME55,  Up: FAQ
5720
5721ERASEME56
5722=========
5723
5724     To: Mikael.Latvala@lmf.ericsson.se
5725     Subject: Re: Possible mistake in Flex v2.5 document
5726     In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
5727     Date: Fri, 05 Sep 1997 10:01:54 PDT
5728     From: Vern Paxson <vern>
5729
5730     > In that example you show how to count comment lines when using
5731     > C style /* ... */ comments. My question is, shouldn't you take into
5732     > account a scenario where end of a comment marker occurs inside
5733     > character or string literals?
5734
5735     The scanner certainly needs to also scan character and string literals.
5736     However it does that (there's an example in the man page for strings), the
5737     lexer will recognize the beginning of the literal before it runs across the
5738     embedded "/*".  Consequently, it will finish scanning the literal before it
5739     even considers the possibility of matching "/*".
5740
5741     Example:
5742
5743     	'([^']*|{ESCAPE_SEQUENCE})'
5744
5745     will match all the text between the ''s (inclusive).  So the lexer
5746     considers this as a token beginning at the first ', and doesn't even
5747     attempt to match other tokens inside it.
5748
     I think this subtlety is not worth putting in the manual, as I suspect
5750     it would confuse more people than it would enlighten.
5751
5752     		Vern
5753
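   The point made in the reply, that a literal is consumed as one token
before any comment marker inside it is considered, can be illustrated
with a small hand-written C scanner.  This is a sketch under simplifying
assumptions (backslash escapes only, no trigraphs or line splicing), and
'count_comments()' is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Counts C-style comments in s, skipping over character and string
   literals so that comment markers inside them are ignored. */
static int count_comments(const char *s)
{
    int n = 0;
    assert(s != NULL);
    while (*s) {
        if (*s == '"' || *s == '\'') {
            char quote = *s++;
            while (*s && *s != quote) {
                if (*s == '\\' && s[1])
                    s++;                /* skip the escaped character */
                s++;
            }
            if (*s)
                s++;                    /* closing quote */
        } else if (s[0] == '/' && s[1] == '*') {
            s += 2;
            while (*s && !(s[0] == '*' && s[1] == '/'))
                s++;
            if (*s)
                s += 2;                 /* closing marker */
            n++;
        } else {
            s++;
        }
    }
    return n;
}
```

   Because the literal branch starts at the opening quote and runs to
the closing one, the comment branch never gets a chance to look inside
it.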
5754
5755File: flex.info,  Node: ERASEME57,  Next: Is there a repository for flex scanners?,  Prev: ERASEME56,  Up: FAQ
5756
5757ERASEME57
5758=========
5759
5760     To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
5761     Subject: Re: flex limitations
5762     In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
5763     Date: Mon, 08 Sep 1997 11:38:08 PDT
5764     From: Vern Paxson <vern>
5765
5766     > %%
5767     > [a-zA-Z]+       /* skip a line */
5768     >                 {  printf("got %s\n", yytext); }
5769     > %%
5770
5771     What version of flex are you using?  If I feed this to 2.5.4, it complains:
5772
5773     	"bug.l", line 5: EOF encountered inside an action
5774     	"bug.l", line 5: unrecognized rule
5775     	"bug.l", line 5: fatal parse error
5776
5777     Not the world's greatest error message, but it manages to flag the problem.
5778
5779     (With the introduction of start condition scopes, flex can't accommodate
5780     an action on a separate line, since it's ambiguous with an indented rule.)
5781
5782     You can get 2.5.4 from ftp.ee.lbl.gov.
5783
5784     		Vern
5785
5786
5787File: flex.info,  Node: Is there a repository for flex scanners?,  Next: How can I conditionally compile or preprocess my flex input file?,  Prev: ERASEME57,  Up: FAQ
5788
5789Is there a repository for flex scanners?
5790========================================
5791
5792Not that we know of.  You might try asking on comp.compilers.
5793
5794
5795File: flex.info,  Node: How can I conditionally compile or preprocess my flex input file?,  Next: Where can I find grammars for lex and yacc?,  Prev: Is there a repository for flex scanners?,  Up: FAQ
5796
5797How can I conditionally compile or preprocess my flex input file?
5798=================================================================
5799
5800Flex doesn't have a preprocessor like C does.  You might try using m4,
5801or the C preprocessor plus a sed script to clean up the result.
5802
5803
5804File: flex.info,  Node: Where can I find grammars for lex and yacc?,  Next: I get an end-of-buffer message for each character scanned.,  Prev: How can I conditionally compile or preprocess my flex input file?,  Up: FAQ
5805
5806Where can I find grammars for lex and yacc?
5807===========================================
5808
5809In the sources for flex and bison.
5810
5811
5812File: flex.info,  Node: I get an end-of-buffer message for each character scanned.,  Next: unnamed-faq-62,  Prev: Where can I find grammars for lex and yacc?,  Up: FAQ
5813
5814I get an end-of-buffer message for each character scanned.
5815==========================================================
5816
5817This will happen if your LexerInput() function returns only one
character at a time, which can happen either if your scanner is
5819"interactive", or if the streams library on your platform always returns
58201 for yyin->gcount().
5821
5822   Solution: override LexerInput() with a version that returns whole
5823buffers.
5824
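   The difference between the two behaviors can be shown with plain C
stdio.  This is only a sketch of the idea, not the C++ 'LexerInput()'
interface: 'read_block()' and 'count_refills()' are hypothetical, and
each refill stands for one trip through the scanner's end-of-buffer
handling.

```c
#include <assert.h>
#include <stdio.h>

/* Reads up to max_size bytes in one call, the way a whole-buffer
   input routine would. */
static size_t read_block(FILE *fp, char *buf, size_t max_size)
{
    assert(fp != NULL && max_size > 0);
    return fread(buf, 1, max_size, fp);
}

/* Counts the refills needed to drain fp when each refill asks for at
   most chunk bytes. */
static int count_refills(FILE *fp, size_t chunk)
{
    char buf[64];
    int calls = 0;
    assert(chunk > 0 && chunk <= sizeof buf);
    while (read_block(fp, buf, chunk) > 0)
        calls++;
    return calls;
}
```

   Draining a 100-byte file takes 100 refills at one byte per call, but
only two when each call may return up to 64 bytes.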
5825
5826File: flex.info,  Node: unnamed-faq-62,  Next: unnamed-faq-63,  Prev: I get an end-of-buffer message for each character scanned.,  Up: FAQ
5827
5828unnamed-faq-62
5829==============
5830
5831     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
5832     Subject: Re: Flex maximums
5833     In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
5834     Date: Mon, 17 Nov 1997 17:16:15 PST
5835     From: Vern Paxson <vern>
5836
5837     > I took a quick look into the flex-sources and altered some #defines in
5838     > flexdefs.h:
5839     >
5840     > 	#define INITIAL_MNS 64000
5841     > 	#define MNS_INCREMENT 1024000
5842     > 	#define MAXIMUM_MNS 64000
5843
5844     The things to fix are to add a couple of zeroes to:
5845
5846     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
5847     #define MAXIMUM_MNS 31999
5848     #define BAD_SUBSCRIPT -32767
5849     #define MAX_SHORT 32700
5850
5851     and, if you get complaints about too many rules, make the following change too:
5852
5853     	#define YY_TRAILING_MASK 0x200000
5854     	#define YY_TRAILING_HEAD_MASK 0x400000
5855
5856     - Vern
5857
5858
5859File: flex.info,  Node: unnamed-faq-63,  Next: unnamed-faq-64,  Prev: unnamed-faq-62,  Up: FAQ
5860
5861unnamed-faq-63
5862==============
5863
5864     To: jimmey@lexis-nexis.com (Jimmey Todd)
5865     Subject: Re: FLEX question regarding istream vs ifstream
5866     In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
5867     Date: Mon, 15 Dec 1997 13:21:35 PST
5868     From: Vern Paxson <vern>
5869
5870     >         stdin_handle = YY_CURRENT_BUFFER;
5871     >         ifstream fin( "aFile" );
5872     >         yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
5873     >
5874     > What I'm wanting to do, is pass the contents of a file thru one set
5875     > of rules and then pass stdin thru another set... It works great if, I
5876     > don't use the C++ classes. But since everything else that I'm doing is
5877     > in C++, I thought I'd be consistent.
5878     >
     > The problem is that 'yy_create_buffer' is expecting an istream* as its
5880     > first argument (as stated in the man page). However, fin is a ifstream
5881     > object. Any ideas on what I might be doing wrong? Any help would be
5882     > appreciated. Thanks!!
5883
5884     You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
5885     Then its type will be compatible with the expected istream*, because ifstream
5886     is derived from istream.
5887
5888     		Vern
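
The conversion described above can be sketched in isolation.  This is a
minimal illustration, not the scanner's own code: 'first_line' is a
hypothetical stand-in for any function that, like the C++ scanner's
'yy_create_buffer', expects an 'istream*', and the two wrapper functions
are likewise made up for the example.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a function that takes a pointer to the
// base class std::istream, as yy_create_buffer() does in a C++ scanner.
std::string first_line(std::istream* in)
{
    assert(in != nullptr);
    std::string line;
    std::getline(*in, line);
    return line;
}

// &fin has type std::ifstream*, which converts implicitly to
// std::istream* because std::ifstream is derived from std::istream.
std::string first_line_of_file(const char* path)
{
    std::ifstream fin(path);
    return first_line(&fin);   // pass &fin, not fin
}

// The same conversion applies to any other derived stream type.
std::string first_line_of_string(const std::string& text)
{
    std::istringstream sin(text);
    return first_line(&sin);
}
```

Passing 'fin' itself fails to compile because an object is not a
pointer; taking its address is all that is needed.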
5889
5890
5891File: flex.info,  Node: unnamed-faq-64,  Next: unnamed-faq-65,  Prev: unnamed-faq-63,  Up: FAQ
5892
5893unnamed-faq-64
5894==============
5895
5896     To: Enda Fadian <fadiane@piercom.ie>
5897     Subject: Re: Question related to Flex man page?
5898     In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
5899     Date: Tue, 16 Dec 1997 14:17:09 PST
5900     From: Vern Paxson <vern>
5901
5902     > Can you explain to me what is meant by a long-jump in relation to flex?
5903
5904     Using the longjmp() function while inside yylex() or a routine called by it.
5905
5906     > what is the flex activation frame.
5907
5908     Just yylex()'s stack frame.
5909
5910     > As far as I can see yyrestart will bring me back to the start of the input
5911     > file and using flex++ is not really an option!
5912
5913     No, yyrestart() doesn't imply a rewind, even though its name might sound
5914     like it does.  It tells the scanner to flush its internal buffers and
5915     start reading from the given file at its present location.
5916
5917     		Vern
5918
5919
5920File: flex.info,  Node: unnamed-faq-65,  Next: unnamed-faq-66,  Prev: unnamed-faq-64,  Up: FAQ
5921
5922unnamed-faq-65
5923==============
5924
5925     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
5926     Subject: Re: Need urgent Help
5927     In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
5928     Date: Sun, 21 Dec 1997 21:30:46 PST
5929     From: Vern Paxson <vern>
5930
5931     > /usr/lib/yaccpar: In function `int yyparse()':
5932     > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
5933     >
5934     > ld: Undefined symbol
5935     >    _yylex
5936     >    _yyparse
5937     >    _yyin
5938
5939     This is a known problem with Solaris C++ (and/or Solaris yacc).  I believe
5940     the fix is to explicitly insert some 'extern "C"' statements for the
5941     corresponding routines/symbols.
5942
5943     		Vern
5944
5945
5946File: flex.info,  Node: unnamed-faq-66,  Next: unnamed-faq-67,  Prev: unnamed-faq-65,  Up: FAQ
5947
5948unnamed-faq-66
5949==============
5950
5951     To: mc0307@mclink.it
5952     Cc: gnu@prep.ai.mit.edu
5953     Subject: Re: [mc0307@mclink.it: Help request]
5954     In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
5955     Date: Sun, 21 Dec 1997 22:33:37 PST
5956     From: Vern Paxson <vern>
5957
5958     > This is my definition for float and integer types:
5959     > . . .
5960     > NZD          [1-9]
5961     > ...
5962     > I've tested my program on other lex version (on UNIX Sun Solaris an HP
5963     > UNIX) and it work well, so I think that my definitions are correct.
5964     > There are any differences between Lex and Flex?
5965
5966     There are indeed differences, as discussed in the man page.  The one
5967     you are probably running into is that when flex expands a name definition,
5968     it puts parentheses around the expansion, while lex does not.  There's
5969     an example in the man page of how this can lead to different matching.
5970     Flex's behavior complies with the POSIX standard (or at least with the
5971     last POSIX draft I saw).
5972
5973     		Vern
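
The effect of the added parentheses can be illustrated with a made-up
definition 'AB  a|b' used in a rule 'x{AB}y'.  The sketch below uses
'std::regex' purely to show the operator precedence involved; it is not
how flex or lex match text, and the definition, rule and function names
are assumptions made for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical definition:   AB   a|b
// Hypothetical rule:         x{AB}y

// flex wraps the expansion in parentheses, giving x(a|b)y.
bool matches_flex_style(const std::string& s)
{
    static const std::regex re("x(a|b)y");
    return std::regex_match(s, re);
}

// A purely textual expansion, as described above for lex, gives xa|by,
// which is an alternation between "xa" and "by".
bool matches_textual_style(const std::string& s)
{
    static const std::regex re("xa|by");
    return std::regex_match(s, re);
}
```

With these patterns the input "xay" is accepted only by the
parenthesized form, while "xa" and "by" are accepted only by the
textual one.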
5974
5975
5976File: flex.info,  Node: unnamed-faq-67,  Next: unnamed-faq-68,  Prev: unnamed-faq-66,  Up: FAQ
5977
5978unnamed-faq-67
5979==============
5980
5981     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
5982     Subject: Re: Thanks
5983     In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
5984     Date: Mon, 22 Dec 1997 14:35:05 PST
5985     From: Vern Paxson <vern>
5986
5987     > Thank you very much for your help. I compile and link well with C++ while
5988     > declaring 'yylex ...' extern, But a little problem remains. I get a
5989     > segmentation default when executing ( I linked with lfl library) while it
5990     > works well when using LEX instead of flex. Do you have some ideas about the
5991     > reason for this ?
5992
5993     The one possible reason for this that comes to mind is if you've defined
5994     yytext as "extern char yytext[]" (which is what lex uses) instead of
5995     "extern char *yytext" (which is what flex uses).  If it's not that, then
5996     I'm afraid I don't know what the problem might be.
5997
5998     		Vern
5999
6000
6001File: flex.info,  Node: unnamed-faq-68,  Next: unnamed-faq-69,  Prev: unnamed-faq-67,  Up: FAQ
6002
6003unnamed-faq-68
6004==============
6005
6006     To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
6007     Subject: Re: flex 2.5: c++ scanners & start conditions
6008     In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
6009     Date: Tue, 06 Jan 1998 19:19:30 PST
6010     From: Vern Paxson <vern>
6011
6012     > The problem is that when I do this (using %option c++) start
6013     > conditions seem to not apply.
6014
6015     The BEGIN macro modifies the yy_start variable.  For C scanners, this
6016     is a static with scope visible through the whole file.  For C++ scanners,
6017     it's a member variable, so it only has visible scope within a member
6018     function.  Your lexbegin() routine is not a member function when you
6019     build a C++ scanner, so it's not modifying the correct yy_start.  The
6020     diagnostic that indicates this is that you found you needed to add
6021     a declaration of yy_start in order to get your scanner to compile when
6022     using C++; instead, the correct fix is to make lexbegin() a member
6023     function (by deriving from yyFlexLexer).
6024
6025     		Vern
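
The shape of the suggested fix can be sketched without a generated
scanner.  'ScannerBase' below is a hypothetical stand-in for
'yyFlexLexer', and 'MyScanner' and 'start_state' are made-up names; in a
real scanner the body of 'lexbegin' would use 'BEGIN' rather than
assigning to 'yy_start' directly.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated scanner class.  As the reply
// above explains, a C++ scanner keeps yy_start as a member variable.
class ScannerBase {
public:
    virtual ~ScannerBase() {}
    int start_state() const { return yy_start; }
protected:
    int yy_start = 0;
};

// Because lexbegin() is a member of a derived class, it changes the
// yy_start that belongs to this scanner object, not some unrelated
// variable declared just to make the code compile.
class MyScanner : public ScannerBase {
public:
    void lexbegin(int state)
    {
        assert(state >= 0);
        yy_start = state;
    }
};
```

Each scanner object then carries its own start state, so two scanners
can be in different start conditions at the same time.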
6026
6027
6028File: flex.info,  Node: unnamed-faq-69,  Next: unnamed-faq-70,  Prev: unnamed-faq-68,  Up: FAQ
6029
6030unnamed-faq-69
6031==============
6032
6033     To: "Boris Zinin" <boris@ippe.rssi.ru>
6034     Subject: Re: current position in flex buffer
6035     In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
6036     Date: Mon, 12 Jan 1998 12:03:15 PST
6037     From: Vern Paxson <vern>
6038
6039     > The problem is how to determine the current position in flex active
6040     > buffer when a rule is matched....
6041
6042     You will need to keep track of this explicitly, such as by redefining
6043     YY_USER_ACTION to count the number of characters matched.
6044
6045     The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
6046
6047     		Vern
6048
6049
6050File: flex.info,  Node: unnamed-faq-70,  Next: unnamed-faq-71,  Prev: unnamed-faq-69,  Up: FAQ
6051
6052unnamed-faq-70
6053==============
6054
6055     To: Bik.Dhaliwal@bis.org
6056     Subject: Re: Flex question
6057     In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
6058     Date: Tue, 27 Jan 1998 22:41:52 PST
6059     From: Vern Paxson <vern>
6060
6061     > That requirement involves knowing
6062     > the character position at which a particular token was matched
6063     > in the lexer.
6064
6065     The way you have to do this is by explicitly keeping track of where
6066     you are in the file, by counting the number of characters scanned
6067     for each token (available in yyleng).  It may prove convenient to
6068     do this by redefining YY_USER_ACTION, as described in the manual.
6069
6070     		Vern
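
The bookkeeping described above might look like the following sketch.
It runs without a generated scanner, so 'yytext' and 'yyleng' here are
plain stand-in variables and 'simulate_match' imitates what the scanner
does on each match; 'token_start' and 'token_offset' are hypothetical
names.  In a real '.l' file only the two counters and the
'YY_USER_ACTION' definition would be needed.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for the scanner's own variables (assumption: no generated
// scanner is linked into this example).
const char* yytext = "";
int yyleng = 0;

// Hypothetical counters: offset of the current token and of the next
// unscanned character.
long token_start = 0;
long token_offset = 0;

// flex executes YY_USER_ACTION before the action of each matched rule.
#define YY_USER_ACTION token_start = token_offset; token_offset += yyleng;

// Imitates one match: set yytext and yyleng, run YY_USER_ACTION, then
// behave like a rule action that reports where the token began.
long simulate_match(const char* text)
{
    assert(text != nullptr);
    yytext = text;
    yyleng = static_cast<int>(std::strlen(text));
    YY_USER_ACTION
    return token_start;
}
```

Actions that return text to the input, such as yyless(), would need to
adjust 'token_offset' themselves, since the counter only sees 'yyleng'
at match time.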
6071
6072
6073File: flex.info,  Node: unnamed-faq-71,  Next: unnamed-faq-72,  Prev: unnamed-faq-70,  Up: FAQ
6074
6075unnamed-faq-71
6076==============
6077
6078     To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
6079     Subject: Re: flex: how to control start condition from parser?
6080     In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
6081     Date: Tue, 27 Jan 1998 22:45:37 PST
6082     From: Vern Paxson <vern>
6083
6084     > It seems useful for the parser to be able to tell the lexer about such
6085     > context dependencies, because then they don't have to be limited to
6086     > local or sequential context.
6087
6088     One way to do this is to have the parser call a stub routine that's
6089     included in the scanner's .l file, and consequently that has access to
6090     BEGIN.  The only ugliness is that the parser can't pass in the state
6091     it wants, because those aren't visible - but if you don't have many
6092     such states, then using a different set of names doesn't seem like
6093     too much of a burden.
6094
6095     While generating a .h file like you suggest is certainly cleaner,
6096     flex development has come to a virtual stand-still :-(, so a workaround
6097     like the above is much more pragmatic than waiting for a new feature.
6098
6099     		Vern
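
A stub of the kind described above could be organized as follows.  All
names are hypothetical, and the start-condition values and 'BEGIN'
macro are simplified stand-ins so that the sketch compiles on its own;
in a real scanner they come from the generated code, and the stub would
live in the user-code section of the '.l' file.

```cpp
#include <cassert>

// Names the parser is allowed to see.  They deliberately differ from
// the start-condition names, which exist only inside the scanner.
enum LexMode { LEX_MODE_NORMAL, LEX_MODE_TYPE_CONTEXT };

// Stand-ins for what the generated scanner would provide.
enum { INITIAL = 0, TYPECTX = 1 };
int current_start = INITIAL;
#define BEGIN current_start =   /* simplified stand-in for flex's BEGIN */

// The stub: translates the parser's name into a start condition.
void lex_set_mode(LexMode mode)
{
    switch (mode) {
    case LEX_MODE_NORMAL:
        BEGIN(INITIAL);
        break;
    case LEX_MODE_TYPE_CONTEXT:
        BEGIN(TYPECTX);
        break;
    }
}

// Helper for the example only, so the effect can be observed.
int lex_current_start()
{
    return current_start;
}
```

The parser needs only the declaration of 'lex_set_mode' and the
'LexMode' names, which can be kept in a small hand-written header.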
6100
6101
6102File: flex.info,  Node: unnamed-faq-72,  Next: unnamed-faq-73,  Prev: unnamed-faq-71,  Up: FAQ
6103
6104unnamed-faq-72
6105==============
6106
6107     To: Barbara Denny <denny@3com.com>
6108     Subject: Re: freebsd flex bug?
6109     In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
6110     Date: Fri, 30 Jan 1998 12:42:32 PST
6111     From: Vern Paxson <vern>
6112
6113     > lex.yy.c:1996: parse error before `='
6114
6115     This is the key, identifying this error.  (It may help to pinpoint
6116     it by using flex -L, so it doesn't generate #line directives in its
6117     output.)  I will bet you heavy money that you have a start condition
6118     name that is also a variable name, or something like that; flex spits
6119     out #define's for each start condition name, mapping them to a number,
6120     so you can wind up with:
6121
6122     	%x foo
6123     	%%
6124     		...
6125     	%%
6126     	void bar()
6127     		{
6128     		int foo = 3;
6129     		}
6130
6131     and the penultimate line will turn into "int 1 = 3" after C preprocessing,
6132     since flex will put "#define foo 1" in the generated scanner.
6133
6134     		Vern
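
One way out of the clash is to rename either the start condition or the
variable.  The sketch below reproduces the '#define' that the reply
says flex emits and shows a renamed variable next to it; 'foo_count' is
a hypothetical replacement name.

```cpp
#include <cassert>

// What the generated scanner would contain for "%x foo", per the reply
// above.
#define foo 1

// Writing "int foo = 3;" here would be preprocessed into "int 1 = 3;".
// A different identifier is left alone, because the preprocessor only
// replaces whole tokens.
int bar()
{
    int foo_count = 3;        // hypothetical replacement name
    return foo_count + foo;   // foo still expands to 1
}
```

Giving start conditions a distinctive prefix or an all-uppercase
spelling makes this kind of collision much less likely.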
6135
6136
6137File: flex.info,  Node: unnamed-faq-73,  Next: unnamed-faq-74,  Prev: unnamed-faq-72,  Up: FAQ
6138
6139unnamed-faq-73
6140==============
6141
6142     To: Maurice Petrie <mpetrie@infoscigroup.com>
6143     Subject: Re: Lost flex .l file
6144     In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
6145     Date: Mon, 02 Feb 1998 11:15:12 PST
6146     From: Vern Paxson <vern>
6147
6148     > I am curious as to
6149     > whether there is a simple way to backtrack from the generated source to
6150     > reproduce the lost list of tokens we are searching on.
6151
6152     In theory, it's straight-forward to go from the DFA representation
6153     back to a regular-expression representation - the two are isomorphic.
6154     In practice, a huge headache, because you have to unpack all the tables
6155     back into a single DFA representation, and then write a program to munch
6156     on that and translate it into an RE.
6157
6158     Sorry for the less-than-happy news ...
6159
6160     		Vern
6161
6162
6163File: flex.info,  Node: unnamed-faq-74,  Next: unnamed-faq-75,  Prev: unnamed-faq-73,  Up: FAQ
6164
6165unnamed-faq-74
6166==============
6167
6168     To: jimmey@lexis-nexis.com (Jimmey Todd)
6169     Subject: Re: Flex performance question
6170     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
6171     Date: Thu, 19 Feb 1998 08:48:51 PST
6172     From: Vern Paxson <vern>
6173
6174     > What I have found, is that the smaller the data chunk, the faster the
6175     > program executes. This is the opposite of what I expected. Should this be
6176     > happening this way?
6177
6178     This is exactly what will happen if your input file has embedded NULs.
6179     From the man page:
6180
6181     A final note: flex is slow when matching NUL's, particularly
6182     when  a  token  contains multiple NUL's.  It's best to write
6183     rules which match short amounts of text if it's  anticipated
6184     that the text will often include NUL's.
6185
6186     So that's the first thing to look for.
6187
6188     		Vern
6189
6190
6191File: flex.info,  Node: unnamed-faq-75,  Next: unnamed-faq-76,  Prev: unnamed-faq-74,  Up: FAQ
6192
6193unnamed-faq-75
6194==============
6195
6196     To: jimmey@lexis-nexis.com (Jimmey Todd)
6197     Subject: Re: Flex performance question
6198     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
6199     Date: Thu, 19 Feb 1998 15:42:25 PST
6200     From: Vern Paxson <vern>
6201
6202     So there are several problems.
6203
6204     First, to go fast, you want to match as much text as possible, which
6205     your scanners don't in the case that what they're scanning is *not*
6206     a <RN> tag.  So you want a rule like:
6207
6208     	[^<]+
6209
6210     Second, C++ scanners are particularly slow if they're interactive,
6211     which they are by default.  Using -B speeds it up by a factor of 3-4
6212     on my workstation.
6213
6214     Third, C++ scanners that use the istream interface are slow, because
6215     of how poorly implemented istream's are.  I built two versions of
6216     the following scanner:
6217
6218     	%%
6219     	.*\n
6220     	.*
6221     	%%
6222
6223     and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
6224     The C++ istream version, using -B, takes 3.8 seconds.
6225
6226     		Vern
6227
6228
6229File: flex.info,  Node: unnamed-faq-76,  Next: unnamed-faq-77,  Prev: unnamed-faq-75,  Up: FAQ
6230
6231unnamed-faq-76
6232==============
6233
6234     To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com>
6235     Subject: Re: FLEX 2.5 & THE YEAR 2000
6236     In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
6237     Date: Wed, 03 Jun 1998 10:22:26 PDT
6238     From: Vern Paxson <vern>
6239
6240     > I am researching the Y2K problem with General Electric R&D
6241     > and need to know if there are any known issues concerning
6242     > the above mentioned software and Y2K regardless of version.
6243
6244     There shouldn't be; all it ever does with the date is ask the system
6245     for it and then print it out.
6246
6247     		Vern
6248
6249
6250File: flex.info,  Node: unnamed-faq-77,  Next: unnamed-faq-78,  Prev: unnamed-faq-76,  Up: FAQ
6251
6252unnamed-faq-77
6253==============
6254
6255     To: "Hans Dermot Doran" <htd@ibhdoran.com>
6256     Subject: Re: flex problem
6257     In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT.
6258     Date: Tue, 21 Jul 1998 14:23:34 PDT
6259     From: Vern Paxson <vern>
6260
6261     > To overcome this, I gets() the stdin into a string and lex the string. The
6262     > string is lexed OK except that the end of string isn't lexed properly
6263     > (yy_scan_string()), that is the lexer doesn't recognise the end of string.
6264
6265     Flex doesn't contain mechanisms for recognizing buffer endpoints.  But if
6266     you use fgets instead (which you should anyway, to protect against buffer
6267     overflows), then the final \n will be preserved in the string, and you can
6268     scan that in order to find the end of the string.
6269
6270     		Vern
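
The difference that matters here is that fgets() takes a buffer size
and stores the newline, if it read one, in the buffer.  The following
is a minimal sketch of reading a line that way; 'read_line' and
'ends_with_newline' are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// True if the string's last character is a newline.
bool ends_with_newline(const char* buf)
{
    assert(buf != nullptr);
    std::size_t len = std::strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}

// Reads at most size - 1 characters of one line into buf.  Returns
// false at end of file or on a read error.  On success, *complete
// tells whether the whole line, including its newline, fit.
bool read_line(std::FILE* in, char* buf, int size, bool* complete)
{
    assert(in != nullptr && buf != nullptr && size > 0);
    if (std::fgets(buf, size, in) == nullptr)
        return false;
    *complete = ends_with_newline(buf);
    return true;
}
```

The buffer filled this way can be handed to yy_scan_string(), and a
rule that matches '\n' then marks the end of the string.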
6271
6272
6273File: flex.info,  Node: unnamed-faq-78,  Next: unnamed-faq-79,  Prev: unnamed-faq-77,  Up: FAQ
6274
6275unnamed-faq-78
6276==============
6277
6278     To: soumen@almaden.ibm.com
6279     Subject: Re: Flex++ 2.5.3 instance member vs. static member
6280     In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT.
6281     Date: Tue, 28 Jul 1998 01:10:34 PDT
6282     From: Vern Paxson <vern>
6283
6284     > %{
6285     > int mylineno = 0;
6286     > %}
6287     > ws      [ \t]+
6288     > alpha   [A-Za-z]
6289     > dig     [0-9]
6290     > %%
6291     >
6292     > Now you'd expect mylineno to be a member of each instance of class
6293     > yyFlexLexer, but is this the case?  A look at the lex.yy.cc file seems to
6294     > indicate otherwise; unless I am missing something the declaration of
6295     > mylineno seems to be outside any class scope.
6296     >
6297     > How will this work if I want to run a multi-threaded application with each
6298     > thread creating a FlexLexer instance?
6299
6300     Derive your own subclass and make mylineno a member variable of it.
6301
6302     		Vern
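
A sketch of that arrangement follows.  'LexerBase' is a hypothetical
stand-in for 'yyFlexLexer' so that the example compiles on its own, and
'LineCountingLexer', 'note_newline' and 'line' are made-up names.

```cpp
#include <cassert>

// Hypothetical stand-in for yyFlexLexer.
class LexerBase {
public:
    virtual ~LexerBase() {}
};

// mylineno is now a member, so every lexer object has its own counter
// instead of sharing one file-scope variable.
class LineCountingLexer : public LexerBase {
public:
    void note_newline() { ++mylineno; }   // would be called from the \n rule
    int line() const { return mylineno; }
private:
    int mylineno = 0;
};
```

With a real scanner, the '%option yyclass' directive is the usual way
to have the generated yylex() become a member of such a subclass, so
that rule actions can reach its members.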
6303
6304
6305File: flex.info,  Node: unnamed-faq-79,  Next: unnamed-faq-80,  Prev: unnamed-faq-78,  Up: FAQ
6306
6307unnamed-faq-79
6308==============
6309
6310     To: Adoram Rogel <adoram@hybridge.com>
6311     Subject: Re: More than 32K states change hangs
6312     In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT.
6313     Date: Tue, 04 Aug 1998 22:28:45 PDT
6314     From: Vern Paxson <vern>
6315
6316     > Vern Paxson,
6317     >
6318     > I followed your advice, posted on Usenet by you, and emailed to me
6319     > personally by you, on how to overcome the 32K states limit. I'm running
6320     > on Linux machines.
6321     > I took the full source of version 2.5.4 and did the following changes in
6322     > flexdef.h:
6323     > #define JAMSTATE -327660
6324     > #define MAXIMUM_MNS 319990
6325     > #define BAD_SUBSCRIPT -327670
6326     > #define MAX_SHORT 327000
6327     >
6328     > and compiled.
6329     > All looked fine, including check and bigcheck, so I installed.
6330
6331     Hmmm, you shouldn't increase MAX_SHORT, though looking through my email
6332     archives I see that I did indeed recommend doing so.  Try setting it back
6333     to 32700; that should be enough that you no longer need -Ca.  If it still
6334     hangs, then the interesting question is - where?
6335
6336     > Compiling the same hanged program with a out-of-the-box (RedHat 4.2
6337     > distribution of Linux)
6338     > flex 2.5.4 binary works.
6339
6340     Since Linux comes with source code, you should diff it against what
6341     you have to see what problems they missed.
6342
6343     > Should I always compile with the -Ca option now ? even short and simple
6344     > filters ?
6345
6346     No, definitely not.  It's meant to be for those situations where you
6347     absolutely must squeeze every last cycle out of your scanner.
6348
6349     		Vern
6350
6351
6352File: flex.info,  Node: unnamed-faq-80,  Next: unnamed-faq-81,  Prev: unnamed-faq-79,  Up: FAQ
6353
6354unnamed-faq-80
6355==============
6356
6357     To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
6358     Subject: Re: flex output for static code portion
6359     In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
6360     Date: Mon, 17 Aug 1998 23:57:42 PDT
6361     From: Vern Paxson <vern>
6362
6363     > I would like to use flex under the hood to generate a binary file
6364     > containing the data structures that control the parse.
6365
6366     This has been on the wish-list for a long time.  In principle it's
6367     straight-forward - you redirect mkdata() et al's I/O to another file,
6368     and modify the skeleton to have a start-up function that slurps these
6369     into dynamic arrays.  The concerns are (1) the scanner generation code
6370     is hairy and full of corner cases, so it's easy to get surprised when
6371     going down this path :-( ; and (2) being careful about buffering so
6372     that when the tables change you make sure the scanner starts in the
6373     correct state and reading at the right point in the input file.
6374
6375     > I was wondering if you know of anyone who has used flex in this way.
6376
6377     I don't - but it seems like a reasonable project to undertake (unlike
6378     numerous other flex tweaks :-).
6379
6380     		Vern
6381
6382
6383File: flex.info,  Node: unnamed-faq-81,  Next: unnamed-faq-82,  Prev: unnamed-faq-80,  Up: FAQ
6384
6385unnamed-faq-81
6386==============
6387
6388     Received: from 131.173.17.11 (131.173.17.11 [131.173.17.11])
6389     	by ee.lbl.gov (8.9.1/8.9.1) with ESMTP id AAA03838
6390     	for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 00:47:57 -0700 (PDT)
6391     Received: from hal.cl-ki.uni-osnabrueck.de (hal.cl-ki.Uni-Osnabrueck.DE [131.173.141.2])
6392     	by deimos.rz.uni-osnabrueck.de (8.8.7/8.8.8) with ESMTP id JAA34694
6393     	for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 09:47:55 +0200
6394     Received: (from georg@localhost) by hal.cl-ki.uni-osnabrueck.de (8.6.12/8.6.12) id JAA34834 for vern@ee.lbl.gov; Thu, 20 Aug 1998 09:47:54 +0200
6395     From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
6396     Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de>
6397     Subject: "flex scanner push-back overflow"
6398     To: vern@ee.lbl.gov
6399     Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
6400     Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
6401     X-NoJunk: Do NOT send commercial mail, spam or ads to this address!
6402     X-URL: http://www.cl-ki.uni-osnabrueck.de/~georg/
6403     X-Mailer: ELM [version 2.4ME+ PL28 (25)]
6404     MIME-Version: 1.0
6405     Content-Type: text/plain; charset=US-ASCII
6406     Content-Transfer-Encoding: 7bit
6407
6408     Hi Vern,
6409
6410     Yesterday, I encountered a strange problem: I use the macro processor m4
6411     to include some lengthy lists into a .l file. Following is a flex macro
6412     definition that causes some serious pain in my neck:
6413
6414     AUTHOR           ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])
6415
6416     The complete list contains about 10kB. When I try to "flex" this file
6417     (on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
6418     some of the predefined values in flexdef.h)) I get the error:
6419
6420     myflex/flex -8  sentag.tmp.l
6421     flex scanner push-back overflow
6422
6423     When I remove the slashes in the macro definition everything works fine.
6424     As I understand it, the double quotes escape the slash-character so it
6425     really means "/" and not "trailing context". Furthermore, I tried to
6426     escape the slashes with backslashes, but with no use, the same error message
6427     appeared when flexing the code.
6428
6429     Do you have an idea what's going on here?
6430
6431     Greetings from Germany,
6432     	Georg
6433     --
6434     Georg Rehm                                     georg@cl-ki.uni-osnabrueck.de
6435     Institute for Semantic Information Processing, University of Osnabrueck, FRG
6436
6437
6438File: flex.info,  Node: unnamed-faq-82,  Next: unnamed-faq-83,  Prev: unnamed-faq-81,  Up: FAQ
6439
6440unnamed-faq-82
6441==============
6442
6443     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
6444     Subject: Re: "flex scanner push-back overflow"
6445     In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
6446     Date: Thu, 20 Aug 1998 07:05:35 PDT
6447     From: Vern Paxson <vern>
6448
6449     > myflex/flex -8  sentag.tmp.l
6450     > flex scanner push-back overflow
6451
6452     Flex itself uses a flex scanner.  That scanner is running out of buffer
6453     space when it tries to unput() the humongous macro you've defined.  When
6454     you remove the '/'s, you make it small enough so that it fits in the buffer;
6455     removing spaces would do the same thing.
6456
6457     The fix is either to rethink why you're using such a big macro (perhaps
6458     there's another, better way to do it), or to rebuild flex's own
6459     scan.c with a larger value for
6460
6461     	#define YY_BUF_SIZE 16384
6462
6463     - Vern
6464
6465
6466File: flex.info,  Node: unnamed-faq-83,  Next: unnamed-faq-84,  Prev: unnamed-faq-82,  Up: FAQ
6467
6468unnamed-faq-83
6469==============
6470
6471     To: Jan Kort <jan@research.techforce.nl>
6472     Subject: Re: Flex
6473     In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
6474     Date: Sat, 05 Sep 1998 00:59:49 PDT
6475     From: Vern Paxson <vern>
6476
6477     > %%
6478     >
6479     > "TEST1\n"       { fprintf(stderr, "TEST1\n"); yyless(5); }
6480     > ^\n             { fprintf(stderr, "empty line\n"); }
6481     > .               { }
6482     > \n              { fprintf(stderr, "new line\n"); }
6483     >
6484     > %%
6485     > -- input ---------------------------------------
6486     > TEST1
6487     > -- output --------------------------------------
6488     > TEST1
6489     > empty line
6490     > ------------------------------------------------
6491
6492     IMHO, it's not clear whether or not this is in fact a bug.  It depends
6493     on whether you view yyless() as backing up in the input stream, or as
6494     pushing new characters onto the beginning of the input stream.  Flex
6495     interprets it as the latter (for implementation convenience, I'll admit),
6496     and so considers the newline as in fact matching at the beginning of a
6497     line, as after all the last token scanned an entire line and so the
6498     scanner is now at the beginning of a new line.
6499
6500     I agree that this is counter-intuitive for yyless(), given its
6501     functional description (it's less so for unput(), depending on whether
6502     you're unput()'ing new text or scanned text).  But I don't plan to
6503     change it any time soon, as it's a pain to do so.  Consequently,
6504     you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
6505     your scanner into the behavior you desire.
6506
6507     Sorry for the less-than-completely-satisfactory answer.
6508
6509     		Vern
6510
6511
6512File: flex.info,  Node: unnamed-faq-84,  Next: unnamed-faq-85,  Prev: unnamed-faq-83,  Up: FAQ
6513
6514unnamed-faq-84
6515==============
6516
6517     To: Patrick Krusenotto <krusenot@mac-info-link.de>
6518     Subject: Re: Problems with restarting flex-2.5.2-generated scanner
6519     In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
6520     Date: Thu, 24 Sep 1998 23:28:43 PDT
6521     From: Vern Paxson <vern>
6522
6523     > I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
6524     > trying to make my scanner restart with a new file after my parser stops
6525     > with a parse error. When my compiler restarts, the parser always
6526     > receives the token after the token (in the old file!) that caused the
6527     > parser error.
6528
6529     I suspect the problem is that your parser has read ahead in order
6530     to attempt to resolve an ambiguity, and when it's restarted it picks
6531     up with that token rather than reading a fresh one.  If you're using
6532     yacc, then the special "error" production can sometimes be used to
6533     consume tokens in an attempt to get the parser into a consistent state.
6534
6535     		Vern
6536
6537
6538File: flex.info,  Node: unnamed-faq-85,  Next: unnamed-faq-86,  Prev: unnamed-faq-84,  Up: FAQ
6539
6540unnamed-faq-85
6541==============
6542
6543     To: Henric Jungheim <junghelh@pe-nelson.com>
6544     Subject: Re: flex 2.5.4a
6545     In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
6546     Date: Tue, 27 Oct 1998 16:50:14 PST
6547     From: Vern Paxson <vern>
6548
6549     > This brings up a feature request:  How about a command line
6550     > option to specify the filename when reading from stdin?  That way one
6551     > doesn't need to create a temporary file in order to get the "#line"
6552     > directives to make sense.
6553
6554     Use -o combined with -t (per the man page description of -o).
6555
6556     > P.S., Is there any simple way to use non-blocking IO to parse multiple
6557     > streams?
6558
6559     Simple, no.
6560
6561     One approach might be to return a magic character on EWOULDBLOCK and
6562     have a rule
6563
6564     	.*<magic-character>	// put back .*, eat magic character
6565
6566     This is off the top of my head, not sure it'll work.
6567
6568     		Vern
6569
6570
6571File: flex.info,  Node: unnamed-faq-86,  Next: unnamed-faq-87,  Prev: unnamed-faq-85,  Up: FAQ
6572
6573unnamed-faq-86
6574==============
6575
6576     To: "Repko, Billy D" <billy.d.repko@intel.com>
6577     Subject: Re: Compiling scanners
6578     In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
6579     Date: Thu, 14 Jan 1999 00:25:30 PST
6580     From: Vern Paxson <vern>
6581
6582     > It appears that maybe it cannot find the lfl library.
6583
6584     The Makefile in the distribution builds it, so you should have it.
6585     It's exceedingly trivial, just a main() that calls yylex() and
6586     a yywrap() that always returns 1.
6587
6588     > %%
6589     >       \n      ++num_lines; ++num_chars;
6590     >       .       ++num_chars;
6591
6592     You can't indent your rules like this - that's where the errors are coming
6593     from.  Flex copies indented text to the output file, it's how you do things
6594     like
6595
6596     	int num_lines_seen = 0;
6597
6598     to declare local variables.
6599
6600     		Vern
6601
6602
6603File: flex.info,  Node: unnamed-faq-87,  Next: unnamed-faq-88,  Prev: unnamed-faq-86,  Up: FAQ
6604
6605unnamed-faq-87
6606==============
6607
6608     To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
6609     Subject: Re: flex input buffer
6610     In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
6611     Date: Tue, 09 Feb 1999 21:03:37 PST
6612     From: Vern Paxson <vern>
6613
6614     > In the flex.skl file the size of the default input buffers is set.  Can you
6615     > explain why this size is set and why it is such a high number.
6616
6617     It's large to optimize performance when scanning large files.  You can
6618     safely make it a lot lower if needed.
6619
6620     		Vern
6621
6622
6623File: flex.info,  Node: unnamed-faq-88,  Next: unnamed-faq-90,  Prev: unnamed-faq-87,  Up: FAQ
6624
6625unnamed-faq-88
6626==============
6627
6628     To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
6629     Subject: Re: Flex error message
6630     In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
6631     Date: Thu, 25 Feb 1999 00:11:31 PST
6632     From: Vern Paxson <vern>
6633
6634     > I'm extending a larger scanner written in Flex and I keep running into
6635     > problems. More specifically, I get the error message:
6636     > "flex: input rules are too complicated (>= 32000 NFA states)"
6637
6638     Increase the definitions in flexdef.h for:
6639
6640     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
6642     #define MAXIMUM_MNS 31999
6643     #define BAD_SUBSCRIPT -32767
6644
6645     recompile everything, and it should all work.
6646
6647     		Vern
6648
6649
6650File: flex.info,  Node: unnamed-faq-90,  Next: unnamed-faq-91,  Prev: unnamed-faq-88,  Up: FAQ
6651
6652unnamed-faq-90
6653==============
6654
6655     To: "Dmitriy Goldobin" <gold@ems.chel.su>
6656     Subject: Re: FLEX trouble
6657     In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
6658     Date: Tue, 01 Jun 1999 00:15:07 PDT
6659     From: Vern Paxson <vern>
6660
6661     >   I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
6662     > but rule "/*"(.|\n)*"*/" don't work ?
6663
6664     The second of these will have to scan the entire input stream (because
6665     "(.|\n)*" matches an arbitrary amount of any text) in order to see if
6666     it ends with "*/", terminating the comment.  That potentially will overflow
6667     the input buffer.
6668
6669     >   More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
6670     > 'unrecognized rule'.
6671
6672     You can't use the '/' operator inside parentheses.  It's not clear
6673     what "(a/b)*" actually means.
6674
6675     >   I now use workaround with state <comment>, but single-rule is
6676     > better, i think.
6677
6678     Single-rule is nice but will always have the problem of either setting
6679     restrictions on comments (like not allowing multi-line comments) and/or
6680     running the risk of consuming the entire input stream, as noted above.
6681
6682     		Vern
6683
6684
6685File: flex.info,  Node: unnamed-faq-91,  Next: unnamed-faq-92,  Prev: unnamed-faq-90,  Up: FAQ
6686
6687unnamed-faq-91
6688==============
6689
6694     To: vern@ee.lbl.gov
6695     Date: Tue, 15 Jun 1999 08:55:43 -0700
6696     From: "Aki Niimura" <neko@my-deja.com>
6703     Subject: A question on flex C++ scanner
6708
     Dear Dr. Paxson,
6710
6711     I have been using flex for years.
6712     It works very well on many projects.
     In most cases, I used it to generate a scanner in C.
     However, for one project I needed to generate a scanner
     in C++. Thanks to your enhancement, flex did
     the job.
6717
6718     Currently, I'm working on enhancing my previous project.
6719     I need to deal with multiple input streams (recursive
6720     inclusion) in this scanner (C++).
6721     I did similar thing for another scanner (C) as you
6722     explained in your documentation.
6723
6724     The generated scanner (C++) has necessary methods:
6725     - switch_to_buffer(struct yy_buffer_state *b)
6726     - yy_create_buffer(istream *is, int sz)
6727     - yy_delete_buffer(struct yy_buffer_state *b)
6728
6729     However, I couldn't figure out how to access current
6730     buffer (yy_current_buffer).
6731
6732     yy_current_buffer is a protected member of yyFlexLexer.
6733     I can't access it directly.
6734     Then, I thought yy_create_buffer() with is = 0 might
6735     return current stream buffer. But it seems not as far
6736     as I checked the source. (flex 2.5.4)
6737
6738     I went through the Web in addition to Flex documentation.
6739     However, it hasn't been successful, so far.
6740
6741     It is not my intention to bother you, but, can you
6742     comment about how to obtain the current stream buffer?
6743
6744     Your response would be highly appreciated.
6745
6746     Best regards,
6747     Aki Niimura
6748
6751
6752
6753File: flex.info,  Node: unnamed-faq-92,  Next: unnamed-faq-93,  Prev: unnamed-faq-91,  Up: FAQ
6754
6755unnamed-faq-92
6756==============
6757
6758     To: neko@my-deja.com
6759     Subject: Re: A question on flex C++ scanner
6760     In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
6761     Date: Tue, 15 Jun 1999 09:04:24 PDT
6762     From: Vern Paxson <vern>
6763
6764     > However, I couldn't figure out how to access current
6765     > buffer (yy_current_buffer).
6766
6767     Derive your own subclass from yyFlexLexer.
6768
6769     		Vern
6770
6771
6772File: flex.info,  Node: unnamed-faq-93,  Next: unnamed-faq-94,  Prev: unnamed-faq-92,  Up: FAQ
6773
6774unnamed-faq-93
6775==============
6776
6777     To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
6778     Subject: Re: You're the man to see?
6779     In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
6780     Date: Wed, 23 Jun 1999 09:01:40 PDT
6781     From: Vern Paxson <vern>
6782
6783     > I hope you can help me.  I am using Flex and Bison to produce an interpreted
6784     > language.  However all goes well until I try to implement an IF statement or
6785     > a WHILE.  I cannot get this to work as the parser parses all the conditions
     > eg. the TRUE and FALSE conditions to check for a rule match.  So I cannot
6787     > make a decision!!
6788
     You need to use the parser to build a parse tree (= abstract syntax tree),
6790     and when that's all done you recursively evaluate the tree, binding variables
6791     to values at that time.
6792
6793     		Vern
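
The advice above can be pictured with a minimal C sketch: the parser's
actions would only build nodes, and a separate recursive function decides
which branch to run.  Every name here ('Node', 'eval', the node kinds, the
26-slot variable table) is hypothetical and is not part of flex or bison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for a toy language; not part of flex or bison. */
enum Kind { N_NUM, N_VAR, N_ADD, N_LESS, N_ASSIGN, N_SEQ, N_IF, N_WHILE };

typedef struct Node {
    enum Kind kind;
    int value;                     /* N_NUM: literal; N_VAR/N_ASSIGN: variable index 0..25 */
    const struct Node *a, *b, *c;  /* children; their meaning depends on kind */
} Node;

int vars[26];                      /* variable bindings, filled in at evaluation time */

/* Recursively evaluate a finished tree.  Nothing runs while parsing. */
int eval(const Node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case N_NUM:    return n->value;
    case N_VAR:    return vars[n->value];
    case N_ADD:    return eval(n->a) + eval(n->b);
    case N_LESS:   return eval(n->a) < eval(n->b);
    case N_ASSIGN: return vars[n->value] = eval(n->a);
    case N_SEQ:    eval(n->a); return eval(n->b);
    case N_IF:     /* a = condition, b = then, c = optional else */
        if (eval(n->a))
            return eval(n->b);
        return n->c != NULL ? eval(n->c) : 0;
    case N_WHILE: { /* a = condition, b = body */
        int last = 0;
        while (eval(n->a))
            last = eval(n->b);
        return last;
    }
    }
    return 0;
}
```

Because the IF node evaluates only the branch selected by its condition,
the parser is free to parse both branches without executing either.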
6794
6795
6796File: flex.info,  Node: unnamed-faq-94,  Next: unnamed-faq-95,  Prev: unnamed-faq-93,  Up: FAQ
6797
6798unnamed-faq-94
6799==============
6800
6801     To: Petr Danecek <petr@ics.cas.cz>
6802     Subject: Re: flex - question
6803     In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
6804     Date: Fri, 02 Jul 1999 16:52:13 PDT
6805     From: Vern Paxson <vern>
6806
6807     > file, it takes an enormous amount of time. It is funny, because the
     > source code has only 12 rules!!! I think it looks like an exponential
6809     > growth.
6810
6811     Right, that's the problem - some patterns (those with a lot of
     ambiguity, which yours have because at any given time the scanner can
6813     be in the middle of all sorts of combinations of the different
6814     rules) blow up exponentially.
6815
     For your rules, there is an easy fix.  Change the ".*" that comes after
6817     the directory name to "[^ ]*".  With that in place, the rules are no
6818     longer nearly so ambiguous, because then once one of the directories
6819     has been matched, no other can be matched (since they all require a
6820     leading blank).
6821
6822     If that's not an acceptable solution, then you can enter a start state
6823     to pick up the .*\n after each directory is matched.
6824
6825     Also note that for speed, you'll want to add a ".*" rule at the end,
     otherwise input that doesn't match any of the patterns will be matched
6827     very slowly, a character at a time.
6828
6829     		Vern
6830
6831
6832File: flex.info,  Node: unnamed-faq-95,  Next: unnamed-faq-96,  Prev: unnamed-faq-94,  Up: FAQ
6833
6834unnamed-faq-95
6835==============
6836
6837     To: Tielman Koekemoer <tielman@spi.co.za>
6838     Subject: Re: Please help.
6839     In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
6840     Date: Thu, 08 Jul 1999 08:20:39 PDT
6841     From: Vern Paxson <vern>
6842
6843     > I was hoping you could help me with my problem.
6844     >
6845     > I tried compiling (gnu)flex on a Solaris 2.4 machine
6846     > but when I ran make (after configure) I got an error.
6847     >
6848     > --------------------------------------------------------------
6849     > gcc -c -I. -I. -g -O parse.c
6850     > ./flex -t -p  ./scan.l >scan.c
6851     > sh: ./flex: not found
6852     > *** Error code 1
6853     > make: Fatal error: Command failed for target `scan.c'
6854     > -------------------------------------------------------------
6855     >
6856     > What's strange to me is that I'm only
     > trying to install flex now. I then edited the Makefile
6858     > and changed where it says "FLEX = flex" to "FLEX = lex"
6859     > ( lex: the native Solaris one ) but then it complains about
6860     > the "-p" option. Is there any way I can compile flex without
6861     > using flex or lex?
6862     >
6863     > Thanks so much for your time.
6864
6865     You managed to step on the bootstrap sequence, which first copies
6866     initscan.c to scan.c in order to build flex.  Try fetching a fresh
6867     distribution from ftp.ee.lbl.gov.  (Or you can first try removing
6868     ".bootstrap" and doing a make again.)
6869
6870     		Vern
6871
6872
6873File: flex.info,  Node: unnamed-faq-96,  Next: unnamed-faq-97,  Prev: unnamed-faq-95,  Up: FAQ
6874
6875unnamed-faq-96
6876==============
6877
6878     To: Tielman Koekemoer <tielman@spi.co.za>
6879     Subject: Re: Please help.
6880     In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT.
6881     Date: Fri, 09 Jul 1999 00:27:20 PDT
6882     From: Vern Paxson <vern>
6883
6884     > First I removed .bootstrap (and ran make) - no luck. I downloaded the
6885     > software but I still have the same problem. Is there anything else I
6886     > could try.
6887
6888     Try:
6889
6890     	cp initscan.c scan.c
6891     	touch scan.c
6892     	make scan.o
6893
6894     If this last tries to first build scan.c from scan.l using ./flex, then
6895     your "make" is broken, in which case compile scan.c to scan.o by hand.
6896
6897     		Vern
6898
6899
6900File: flex.info,  Node: unnamed-faq-97,  Next: unnamed-faq-98,  Prev: unnamed-faq-96,  Up: FAQ
6901
6902unnamed-faq-97
6903==============
6904
6905     To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
6906     Subject: Re: Error
6907     In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
6908     Date: Tue, 20 Jul 1999 00:18:26 PDT
6909     From: Vern Paxson <vern>
6910
6911     > I am getting a compilation error. The error is given as "unknown symbol- yylex".
6912
6913     The parser relies on calling yylex(), but you're instead using the C++ scanning
     class, so you need to supply a yylex() "glue" function that calls an instance
     of the scanner (e.g., "scanner->yylex()").
6916
6917     		Vern
6918
6919
6920File: flex.info,  Node: unnamed-faq-98,  Next: unnamed-faq-99,  Prev: unnamed-faq-97,  Up: FAQ
6921
6922unnamed-faq-98
6923==============
6924
6925     To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
6926     Subject: Re: lex
6927     In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
6928     Date: Tue, 23 Nov 1999 15:54:30 PST
6929     From: Vern Paxson <vern>
6930
6931     Well, your problem is the
6932
6933     switch (yybgin-yysvec-1) {      /* witchcraft */
6934
6935     at the beginning of lex rules.  "witchcraft" == "non-portable".  It's
6936     assuming knowledge of the AT&T lex's internal variables.
6937
6938     For flex, you can probably do the equivalent using a switch on YYSTATE.
6939
6940     		Vern
6941
6942
6943File: flex.info,  Node: unnamed-faq-99,  Next: unnamed-faq-100,  Prev: unnamed-faq-98,  Up: FAQ
6944
6945unnamed-faq-99
6946==============
6947
6948     To: archow@hss.hns.com
6949     Subject: Re: Regarding distribution of flex and yacc based grammars
6950     In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
6951     Date: Wed, 22 Dec 1999 01:56:24 PST
6952     From: Vern Paxson <vern>
6953
6954     > When we provide the customer with an object code distribution, is it
6955     > necessary for us to provide source
6956     > for the generated C files from flex and bison since they are generated by
6957     > flex and bison ?
6958
6959     For flex, no.  I don't know what the current state of this is for bison.
6960
     > Also, is there any requirement for us to necessarily provide source for
6962     > the grammar files which are fed into flex and bison ?
6963
6964     Again, for flex, no.
6965
6966     See the file "COPYING" in the flex distribution for the legalese.
6967
6968     		Vern
6969
6970
6971File: flex.info,  Node: unnamed-faq-100,  Next: unnamed-faq-101,  Prev: unnamed-faq-99,  Up: FAQ
6972
6973unnamed-faq-100
6974===============
6975
6976     To: Martin Gallwey <gallweym@hyperion.moe.ul.ie>
6977     Subject: Re: Flex, and self referencing rules
6978     In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST.
6979     Date: Sat, 19 Feb 2000 18:33:16 PST
6980     From: Vern Paxson <vern>
6981
6982     > However, I do not use unput anywhere. I do use self-referencing
6983     > rules like this:
6984     >
6985     > UnaryExpr               ({UnionExpr})|("-"{UnaryExpr})
6986
6987     You can't do this - flex is *not* a parser like yacc (which does indeed
6988     allow recursion), it is a scanner that's confined to regular expressions.
6989
6990     		Vern
6991
6992
6993File: flex.info,  Node: unnamed-faq-101,  Next: What is the difference between YYLEX_PARAM and YY_DECL?,  Prev: unnamed-faq-100,  Up: FAQ
6994
6995unnamed-faq-101
6996===============
6997
6998     To: slg3@lehigh.edu (SAMUEL L. GULDEN)
6999     Subject: Re: Flex problem
7000     In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST.
7001     Date: Thu, 02 Mar 2000 23:00:46 PST
7002     From: Vern Paxson <vern>
7003
7004     If this is exactly your program:
7005
7006     > digit [0-9]
7007     > digits {digit}+
7008     > whitespace [ \t\n]+
7009     >
7010     > %%
7011     > "[" { printf("open_brac\n");}
7012     > "]" { printf("close_brac\n");}
7013     > "+" { printf("addop\n");}
7014     > "*" { printf("multop\n");}
7015     > {digits} { printf("NUMBER = %s\n", yytext);}
7016     > whitespace ;
7017
7018     then the problem is that the last rule needs to be "{whitespace}" !
7019
7020     		Vern
7021
7022
7023File: flex.info,  Node: What is the difference between YYLEX_PARAM and YY_DECL?,  Next: Why do I get "conflicting types for yylex" error?,  Prev: unnamed-faq-101,  Up: FAQ
7024
7025What is the difference between YYLEX_PARAM and YY_DECL?
7026=======================================================
7027
YYLEX_PARAM is not a flex symbol.  It belongs to Bison, and tells Bison
which extra arguments to pass when the parser calls yylex().

   YY_DECL is the Flex declaration of yylex.  The default is similar to
this:

     #define YY_DECL int yylex ()
7035
7036
7037File: flex.info,  Node: Why do I get "conflicting types for yylex" error?,  Next: How do I access the values set in a Flex action from within a Bison action?,  Prev: What is the difference between YYLEX_PARAM and YY_DECL?,  Up: FAQ
7038
7039Why do I get "conflicting types for yylex" error?
7040=================================================
7041
7042This is a compiler error regarding a generated Bison parser, not a Flex
7043scanner.  It means you need a prototype of yylex() in the top of the
7044Bison file.  Be sure the prototype matches YY_DECL.
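
   As a rough illustration of why the prototype and YY_DECL have to
agree, the following standalone C sketch uses one macro for both the
prototype and the definition.  The signature, the value 42, and the token
code 258 are made up for the example; the function body merely stands in
for the scanner that flex would generate.

```c
#include <assert.h>

/* Hypothetical signature.  In a real project this #define would sit in the
 * definitions section of the .l file, or in a header shared with the parser. */
#define YY_DECL int yylex(int *lvalp)

/* What the parser side needs to see: a prototype identical to YY_DECL. */
YY_DECL;

/* Stand-in for the generated scanner, which is emitted as "YY_DECL { ... }". */
YY_DECL
{
    assert(lvalp != 0);
    *lvalp = 42;   /* pretend a NUMBER token with value 42 was scanned */
    return 258;    /* hypothetical token code */
}
```

   If the prototype were written by hand with a different parameter
list, the compiler would report the conflict at the point where both
declarations are visible.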
7045
7046
7047File: flex.info,  Node: How do I access the values set in a Flex action from within a Bison action?,  Prev: Why do I get "conflicting types for yylex" error?,  Up: FAQ
7048
7049How do I access the values set in a Flex action from within a Bison action?
7050===========================================================================
7051
Set 'yylval' in the Flex action; the Bison action then reads the value
as $1, $2, $3, etc. (Bison's "semantic values").  See *note (bison)Top::.
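
   The hand-written C mock-up below is only meant to show the data flow,
under the assumption of a non-reentrant scanner with a global 'yylval'.
It is not generated code: the token codes, the canned input "x = 7 ;",
and 'parse_assignment' are invented for the illustration.

```c
#include <assert.h>

typedef union { int num; const char *str; } YYSTYPE;  /* normally produced from %union */
YYSTYPE yylval;                                       /* written by scanner actions */

enum { STRING = 258, NUMBER = 259 };                  /* hypothetical token codes */

/* Stand-in scanner: yields the tokens of "x = 7 ;" one at a time. */
int yylex(void)
{
    static int pos = 0;
    switch (pos++) {
    case 0:  yylval.str = "x"; return STRING;
    case 1:  return '=';
    case 2:  yylval.num = 7;   return NUMBER;
    case 3:  return ';';
    default: return 0;         /* end of input */
    }
}

/* Stand-in for the rule  assignment: STRING '=' NUMBER ';'
 * After each yylex() call the parser saves yylval; slot i-1 is what $i names. */
int parse_assignment(YYSTYPE vals[4])
{
    static const int expected[4] = { STRING, '=', NUMBER, ';' };
    int i;

    assert(vals != 0);
    for (i = 0; i < 4; i++) {
        if (yylex() != expected[i])
            return 0;
        vals[i] = yylval;
    }
    return 1;
}
```

   In this mock-up 'vals[0].str' plays the role of $1 and 'vals[2].num'
the role of $3.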
7054
7055
7056File: flex.info,  Node: Appendices,  Next: Indices,  Prev: FAQ,  Up: Top
7057
7058Appendix A Appendices
7059*********************
7060
7061* Menu:
7062
7063* Makefiles and Flex::
7064* Bison Bridge::
7065* M4 Dependency::
7066* Common Patterns::
7067
7068
7069File: flex.info,  Node: Makefiles and Flex,  Next: Bison Bridge,  Prev: Appendices,  Up: Appendices
7070
7071A.1 Makefiles and Flex
7072======================
7073
7074In this appendix, we provide tips for writing Makefiles to build your
7075scanners.
7076
7077   In a traditional build environment, we say that the '.c' files are
7078the sources, and the '.o' files are the intermediate files.  When using
7079'flex', however, the '.l' files are the sources, and the generated '.c'
7080files (along with the '.o' files) are the intermediate files.  This
7081requires you to carefully plan your Makefile.
7082
7083   Modern 'make' programs understand that 'foo.l' is intended to
7084generate 'lex.yy.c' or 'foo.c', and will behave accordingly(1)(2).  The
7085following Makefile does not explicitly instruct 'make' how to build
'scan.c' from 'scan.l'.  Instead, it relies on the implicit rules of the
7087'make' program to build the intermediate file, 'scan.c':
7088
         # Basic Makefile -- relies on implicit rules
         # Creates "myprogram" from "scan.l" and "myprogram.c"
         #
         LEX=flex
         myprogram: scan.o myprogram.o
         scan.o: scan.l
7095
7096
7097   For simple cases, the above may be sufficient.  For other cases, you
7098may have to explicitly instruct 'make' how to build your scanner.  The
7099following is an example of a Makefile containing explicit rules:
7100
         # Basic Makefile -- provides explicit rules
         # Creates "myprogram" from "scan.l" and "myprogram.c"
         #
         LEX=flex
         myprogram: scan.o myprogram.o
                 $(CC) -o $@  $(LDFLAGS) $^

         myprogram.o: myprogram.c
                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^

         scan.o: scan.c
                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^

         scan.c: scan.l
                 $(LEX) $(LFLAGS) -o $@ $^

         clean:
                 $(RM) *.o scan.c
7119
7120
7121   Notice in the above example that 'scan.c' is in the 'clean' target.
7122This is because we consider the file 'scan.c' to be an intermediate
7123file.
7124
7125   Finally, we provide a realistic example of a 'flex' scanner used with
7126a 'bison' parser(3).  There is a tricky problem we have to deal with.
7127Since a 'flex' scanner will typically include a header file (e.g.,
7128'y.tab.h') generated by the parser, we need to be sure that the header
7129file is generated BEFORE the scanner is compiled.  We handle this case
7130in the following example:
7131
         # Makefile example -- scanner and parser.
         # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c"
         #
         LEX     = flex
         YACC    = bison -y
         YFLAGS  = -d
         objects = scan.o parse.o myprogram.o

         myprogram: $(objects)
         scan.o: scan.l parse.c
         parse.o: parse.y
         myprogram.o: myprogram.c
7144
7145
   In the above example, notice the line

         scan.o: scan.l parse.c

which lists the file 'parse.c' (the generated parser) as a
dependency of 'scan.o'.  This ensures that the parser, and with it the
header 'y.tab.h', is created before the scanner is compiled.  Implicit
rules differ between versions of 'make', so feel free to experiment
with your specific implementation.
7155
7156   For more details on writing Makefiles, see *note (make)Top::.
7157
7158   ---------- Footnotes ----------
7159
7160   (1) GNU 'make' and GNU 'automake' are two such programs that provide
7161implicit rules for flex-generated scanners.
7162
7163   (2) GNU 'automake' may generate code to execute flex in
7164lex-compatible mode, or to stdout.  If this is not what you want, then
you should provide an explicit rule in your Makefile.am.
7166
7167   (3) This example also applies to yacc parsers.
7168
7169
7170File: flex.info,  Node: Bison Bridge,  Next: M4 Dependency,  Prev: Makefiles and Flex,  Up: Appendices
7171
7172A.2 C Scanners with Bison Parsers
7173=================================
7174
7175This section describes the 'flex' features useful when integrating
7176'flex' with 'GNU bison'(1).  Skip this section if you are not using
7177'bison' with your scanner.  Here we discuss only the 'flex' half of the
7178'flex' and 'bison' pair.  We do not discuss 'bison' in any detail.  For
7179more information about generating 'bison' parsers, see *note
7180(bison)Top::.
7181
   A 'bison'-compatible scanner is generated by declaring '%option
7183bison-bridge' or by supplying '--bison-bridge' when invoking 'flex' from
7184the command line.  This instructs 'flex' that the macro 'yylval' may be
7185used.  The data type for 'yylval', 'YYSTYPE', is typically defined in a
7186header file, included in section 1 of the 'flex' input file.  For a list
7187of functions and macros available, *Note bison-functions::.
7188
7189   The declaration of yylex becomes,
7190
7191           int yylex ( YYSTYPE * lvalp, yyscan_t scanner );
7192
7193   If '%option bison-locations' is specified, then the declaration
7194becomes,
7195
7196           int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner );
7197
7198   Note that the macros 'yylval' and 'yylloc' evaluate to pointers.
7199Support for 'yylloc' is optional in 'bison', so it is optional in 'flex'
7200as well.  The following is an example of a 'flex' scanner that is
7201compatible with 'bison'.
7202
         /* Scanner for "C" assignment statements... sort of. */
         %{
         #include <stdlib.h>  /* atoi */
         #include <string.h>  /* strdup */
         #include "y.tab.h"  /* Generated by bison. */
         %}

         %option bison-bridge bison-locations
         %%

         [[:digit:]]+  { yylval->num = atoi(yytext);   return NUMBER;}
         [[:alnum:]]+  { yylval->str = strdup(yytext); return STRING;}
         "="|";"       { return yytext[0];}
         .  {}
         %%
7216
7217   As you can see, there really is no magic here.  We just use 'yylval'
7218as we would any other variable.  The data type of 'yylval' is generated
7219by 'bison', and included in the file 'y.tab.h'.  Here is the
7220corresponding 'bison' parser:
7221
         /* Parser to convert "C" assignments to lisp. */
         %{
         #include <stdio.h>  /* printf */
         /* Pass the argument to yyparse through to yylex. */
         #define YYPARSE_PARAM scanner
         #define YYLEX_PARAM   scanner
         %}
         %locations
         %pure_parser
         %union {
             int num;
             char* str;
         }
         %token <str> STRING
         %token <num> NUMBER
         %%
         assignment:
             STRING '=' NUMBER ';' {
                 printf( "(setf %s %d)", $1, $3 );
             }
         ;
7242
7243   ---------- Footnotes ----------
7244
7245   (1) The features described here are purely optional, and are by no
7246means the only way to use flex with bison.  We merely provide some glue
7247to ease development of your parser-scanner pair.
7248
7249
7250File: flex.info,  Node: M4 Dependency,  Next: Common Patterns,  Prev: Bison Bridge,  Up: Appendices
7251
7252A.3 M4 Dependency
7253=================
7254
7255The macro processor 'm4'(1) must be installed wherever flex is
7256installed.  'flex' invokes 'm4', found by searching the directories in
7257the 'PATH' environment variable.  Any code you place in section 1 or in
7258the actions will be sent through m4.  Please follow these rules to
7259protect your code from unwanted 'm4' processing.
7260
   * Do not use symbols that begin with 'm4_', such as 'm4_define' or
7262     'm4_include', since those are reserved for 'm4' macro names.  If
7263     for some reason you need m4_ as a prefix, use a preprocessor
7264     #define to get your symbol past m4 unmangled.
7265
7266   * Do not use the strings '[[' or ']]' anywhere in your code.  The
7267     former is not valid in C, except within comments and strings, but
7268     the latter is valid in code such as 'x[y[z]]'.  The solution is
7269     simple.  To get the literal string '"]]"', use '"]""]"'.  To get
7270     the array notation 'x[y[z]]', use 'x[y[z] ]'.  Flex will attempt to
7271     detect these sequences in user code, and escape them.  However,
7272     it's best to avoid this complexity where possible, by removing such
7273     sequences from your code.
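
   The two rewrites suggested in the list above do not change the
meaning of the C code.  The short sketch below checks this; the function
name and the array contents are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if both rewritten spellings behave like the originals. */
int safe_spellings_agree(void)
{
    static const int z = 1;
    static const int y[2] = { 0, 1 };
    static const int x[2] = { 10, 20 };

    const char *closing = "]""]";  /* adjacent literals concatenate into two ']' */
    int element = x[y[z] ];        /* white space between the brackets is ignored */

    assert(strlen(closing) == 2);
    return closing[0] == ']' && closing[1] == ']' && element == 20;
}
```

   Neither spelling contains two adjacent closing brackets, so the code
reaches the compiler without relying on flex's escaping.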
7274
7275   'm4' is only required at the time you run 'flex'.  The generated
7276scanner is ordinary C or C++, and does _not_ require 'm4'.
7277
7278   ---------- Footnotes ----------
7279
7280   (1) The use of m4 is subject to change in future revisions of flex.
7281It is not part of the public API of flex.  Do not depend on it.
7282
7283
7284File: flex.info,  Node: Common Patterns,  Prev: M4 Dependency,  Up: Appendices
7285
7286A.4 Common Patterns
7287===================
7288
7289This appendix provides examples of common regular expressions you might
7290use in your scanner.
7291
7292* Menu:
7293
7294* Numbers::
7295* Identifiers::
7296* Quoted Constructs::
7297* Addresses::
7298
7299
7300File: flex.info,  Node: Numbers,  Next: Identifiers,  Up: Common Patterns
7301
7302A.4.1 Numbers
7303-------------
7304
7305C99 decimal constant
7306     '([[:digit:]]{-}[0])[[:digit:]]*'
7307
7308C99 hexadecimal constant
7309     '0[xX][[:xdigit:]]+'
7310
7311C99 octal constant
7312     '0[01234567]*'
7313
7314C99 floating point constant
      {dseq}      ([[:digit:]]+)
      {dseq_opt}  ([[:digit:]]*)
      {frac}      (({dseq_opt}"."{dseq})|{dseq}".")
      {exp}       ([eE][+-]?{dseq})
      {exp_opt}   ({exp}?)
      {fsuff}     [flFL]
      {fsuff_opt} ({fsuff}?)
      {hpref}     (0[xX])
      {hdseq}     ([[:xdigit:]]+)
      {hdseq_opt} ([[:xdigit:]]*)
      {hfrac}     (({hdseq_opt}"."{hdseq})|({hdseq}"."))
      {bexp}      ([pP][+-]?{dseq})
      {dfc}       (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
      {hfc}       (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))

      {c99_floating_point_constant}  ({dfc}|{hfc})
7331
7332     See C99 section 6.4.4.2 for the gory details.
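
     To try the three integer patterns outside of a scanner, they can be
     translated to POSIX extended regular expressions.  This is only a
     sketch: 'classify_integer' is a made-up helper, the flex-only set
     difference '[[:digit:]]{-}[0]' is written as '[1-9]', and anchors
     are added because 'regexec' searches rather than matching a whole
     token.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 'd', 'x' or 'o' for a decimal, hexadecimal or octal constant
 * (without suffix), or 0 if the text is none of these. */
int classify_integer(const char *text)
{
    static const struct { int tag; const char *ere; } table[] = {
        { 'd', "^[1-9][[:digit:]]*$" },
        { 'x', "^0[xX][[:xdigit:]]+$" },
        { 'o', "^0[0-7]*$" },
    };
    size_t i;

    assert(text != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        regex_t re;
        int hit;

        if (regcomp(&re, table[i].ere, REG_EXTENDED | REG_NOSUB) != 0)
            continue;              /* pattern failed to compile; try the next one */
        hit = regexec(&re, text, 0, NULL, 0) == 0;
        regfree(&re);
        if (hit)
            return table[i].tag;
    }
    return 0;
}
```

     Note that a lone '0' is classified as octal, which agrees with the
     patterns above.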
7333
7334
7335File: flex.info,  Node: Identifiers,  Next: Quoted Constructs,  Prev: Numbers,  Up: Common Patterns
7336
7337A.4.2 Identifiers
7338-----------------
7339
7340C99 Identifier
     ucn        ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
     nondigit    [_[:alpha:]]
     c99_id     ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*
7344
7345     Technically, the above pattern does not encompass all possible C99
7346     identifiers, since C99 allows for "implementation-defined"
7347     characters.  In practice, C compilers follow the above pattern,
7348     with the addition of the '$' character.
7349
7350UTF-8 Encoded Unicode Code Point
7351     [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
7352
7353
7354File: flex.info,  Node: Quoted Constructs,  Next: Addresses,  Prev: Identifiers,  Up: Common Patterns
7355
7356A.4.3 Quoted Constructs
7357-----------------------
7358
7359C99 String Literal
     'L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([01234567]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"'
7361
7362C99 Comment
     '("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"[^\n]*)'
7364
7365     Note that in C99, a '//'-style comment may be split across lines,
7366     and, contrary to popular belief, does not include the trailing '\n'
7367     character.
7368
7369     A better way to scan '/* */' comments is by line, rather than
7370     matching possibly huge comments all at once.  This will allow you
7371     to scan comments of unlimited length, as long as line breaks appear
7372     at sane intervals.  This is also more efficient when used with
7373     automatic line number processing.  *Note option-yylineno::.
7374
     <INITIAL>{
         "/*"           BEGIN(COMMENT);
     }
     <COMMENT>{
         "*"+"/"        BEGIN(INITIAL);
         [^*\n]+        ;
         "*"+[^*/\n]*   ;
         \n             ;
     }
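
     The same two-state idea can be written as a plain C function, which
     may help when reasoning about edge cases such as a comment that
     ends in several asterisks.  This is a hand-written sketch, not what
     flex generates; 'strip_comments' is a made-up name, and string
     literals and '//' comments are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst with comments removed.  dst must have room for
 * strlen(src) + 1 bytes.  Returns the number of newlines that were
 * inside comments. */
size_t strip_comments(const char *src, char *dst)
{
    enum { INITIAL, COMMENT } state = INITIAL;
    size_t newlines = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (state == INITIAL && src[0] == '/' && src[1] == '*') {
            state = COMMENT;       /* comment opener: enter the COMMENT state */
            src += 2;
        } else if (state == COMMENT && src[0] == '*' && src[1] == '/') {
            state = INITIAL;       /* comment closer: back to the INITIAL state */
            src += 2;
        } else if (state == COMMENT) {
            if (*src == '\n')
                newlines++;        /* count lines so line numbers stay correct */
            src++;                 /* discard one character of comment text */
        } else {
            *dst++ = *src++;       /* ordinary text is copied through */
        }
    }
    *dst = '\0';
    return newlines;
}
```

     Because the closer is tested before any other comment text is
     consumed, a run of asterisks directly before the final '/' still
     ends the comment.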
7384
7385
7386File: flex.info,  Node: Addresses,  Prev: Quoted Constructs,  Up: Common Patterns
7387
7388A.4.4 Addresses
7389---------------
7390
7391IPv4 Address
     dec-octet     [0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
     IPv4address   {dec-octet}\.{dec-octet}\.{dec-octet}\.{dec-octet}
7394
7395IPv6 Address
     h16           [0-9A-Fa-f]{1,4}
     ls32          {h16}:{h16}|{IPv4address}
     IPv6address   ({h16}:){6}{ls32}|
                   ::({h16}:){5}{ls32}|
                   ({h16})?::({h16}:){4}{ls32}|
                   (({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|
                   (({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|
                   (({h16}:){0,3}{h16})?::{h16}:{ls32}|
                   (({h16}:){0,4}{h16})?::{ls32}|
                   (({h16}:){0,5}{h16})?::{h16}|
                   (({h16}:){0,6}{h16})?::
7407
7408     See RFC 2373 (http://www.ietf.org/rfc/rfc2373.txt) for details.
7409     Note that you have to fold the definition of 'IPv6address' into one
7410     line and that it also matches the "unspecified address" "::".
7411
7412URI
7413     '(([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
7414
7415     This pattern is nearly useless, since it allows just about any
7416     character to appear in a URI, including spaces and control
7417     characters.  See RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt) for
7418     details.
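
     As a quick check of the IPv4 definitions earlier in this section,
     they can be translated to a POSIX extended regular expression and
     tried from C.  This is a sketch under assumptions: 'is_ipv4' and
     'DEC_OCTET' are made-up names, anchors are added because 'regexec'
     searches rather than matching a whole token, and the parentheses
     that flex places around an expanded definition are written out by
     hand.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The dec-octet alternatives, grouped as flex would group an expanded name. */
#define DEC_OCTET "([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

/* Returns 1 if text is exactly a dotted-quad IPv4 address, else 0. */
int is_ipv4(const char *text)
{
    static const char ere[] =
        "^" DEC_OCTET "\\." DEC_OCTET "\\." DEC_OCTET "\\." DEC_OCTET "$";
    regex_t re;
    int ok;

    assert(text != NULL);
    if (regcomp(&re, ere, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                  /* treat a compile failure as "no match" */
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

     The pattern rejects octets above 255 and octets with a leading
     zero, which is the behavior the 'dec-octet' alternatives describe.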
7419
7420
7421File: flex.info,  Node: Indices,  Prev: Appendices,  Up: Top
7422
7423Indices
7424*******
7425
7426* Menu:
7427
7428* Concept Index::
7429* Index of Functions and Macros::
7430* Index of Variables::
7431* Index of Data Types::
7432* Index of Hooks::
7433* Index of Scanner Options::
7434
7435
7436File: flex.info,  Node: Concept Index,  Next: Index of Functions and Macros,  Prev: Indices,  Up: Indices
7437
7438Concept Index
7439=============
7440
[index]
7442* Menu:
7443
7444* $ as normal character in patterns:     Patterns.            (line 275)
7445* %array, advantages of:                 Matching.            (line  43)
7446* %array, use of:                        Matching.            (line  29)
7447* %array, with C++:                      Matching.            (line  65)
* %option noyywrap:                      Generated Scanner.   (line  93)
7449* %pointer, and unput():                 Actions.             (line 162)
7450* %pointer, use of:                      Matching.            (line  29)
7451* %top:                                  Definitions Section. (line  44)
7452* %{ and %}, in Definitions Section:     Definitions Section. (line  40)
7453* %{ and %}, in Rules Section:           Actions.             (line  26)
7454* <<EOF>>, use of:                       EOF.                 (line  33)
7455* [] in patterns:                        Patterns.            (line  15)
7456* ^ as non-special character in patterns: Patterns.           (line 275)
7457* |, in actions:                         Actions.             (line  33)
7458* |, use of:                             Actions.             (line  83)
7459* accessor functions, use of:            Accessor Methods.    (line  18)
7460* actions:                               Actions.             (line   6)
7461* actions, embedded C strings:           Actions.             (line  26)
7462* actions, redefining YY_BREAK:          Misc Macros.         (line  49)
7463* actions, use of { and }:               Actions.             (line  26)
7464* aliases, how to define:                Definitions Section. (line  10)
7465* arguments, command-line:               Scanner Options.     (line   6)
7466* array, default size for yytext:        User Values.         (line  13)
7467* backing up, eliminating:               Performance.         (line  54)
7468* backing up, eliminating by adding error rules: Performance. (line 104)
7469* backing up, eliminating with catch-all rule: Performance.   (line 118)
7470* backing up, example of eliminating:    Performance.         (line  49)
7471* BEGIN:                                 Actions.             (line  57)
7472* BEGIN, explanation:                    Start Conditions.    (line  84)
7473* beginning of line, in patterns:        Patterns.            (line 127)
7474* bison, bridging with flex:             Bison Bridge.        (line   6)
7475* bison, parser:                         Bison Bridge.        (line  53)
7476* bison, scanner to be called from bison: Bison Bridge.       (line  34)
7477* BOL, checking the BOL flag:            Misc Macros.         (line  46)
7478* BOL, in patterns:                      Patterns.            (line 127)
7479* BOL, setting it:                       Misc Macros.         (line  40)
7480* braces in patterns:                    Patterns.            (line  42)
7481* bugs, reporting:                       Reporting Bugs.      (line   6)
7482* C code in flex input:                  Definitions Section. (line  40)
* C++:                                   Cxx.                 (line   9)
* C++ and %array:                        User Values.         (line  23)
* C++ I/O, customizing:                  How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* C++ scanners, including multiple scanners: Cxx.             (line 197)
* C++ scanners, use of:                  Cxx.                 (line 128)
* c++, experimental form of scanner class: Cxx.               (line   6)
* C++, multiple different scanners:      Cxx.                 (line 192)
* C-strings, in actions:                 Actions.             (line  26)
* case-insensitive, effect on character classes: Patterns.    (line 216)
* character classes in patterns:         Patterns.            (line 186)
* character classes in patterns, syntax of: Patterns.         (line  15)
* character classes, equivalence of:     Patterns.            (line 205)
* clearing an input buffer:              Multiple Input Buffers.
                                                              (line  66)
* command-line options:                  Scanner Options.     (line   6)
* comments in flex input:                Definitions Section. (line  37)
* comments in the input:                 Comments in the Input.
                                                              (line  24)
* comments, discarding:                  Actions.             (line 176)
* comments, example of scanning C comments: Start Conditions. (line 140)
* comments, in actions:                  Actions.             (line  26)
* comments, in rules section:            Comments in the Input.
                                                              (line  11)
* comments, syntax of:                   Comments in the Input.
                                                              (line   6)
* comments, valid uses of:               Comments in the Input.
                                                              (line  24)
* compressing whitespace:                Actions.             (line  22)
* concatenation, in patterns:            Patterns.            (line 111)
* copyright of flex:                     Copyright.           (line   6)
* counting characters and lines:         Simple Examples.     (line  23)
* customizing I/O in C++ scanners:       How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* default rule:                          Simple Examples.     (line  15)
* default rule <1>:                      Matching.            (line  20)
* defining pattern aliases:              Definitions Section. (line  21)
* Definitions, in flex input:            Definitions Section. (line   6)
* deleting lines from input:             Actions.             (line  13)
* discarding C comments:                 Actions.             (line 176)
* distributing flex:                     Copyright.           (line   6)
* ECHO:                                  Actions.             (line  54)
* ECHO, and yyout:                       Generated Scanner.   (line 101)
* embedding C code in flex input:        Definitions Section. (line  40)
* end of file, in patterns:              Patterns.            (line 150)
* end of line, in negated character classes: Patterns.        (line 237)
* end of line, in patterns:              Patterns.            (line 131)
* end-of-file, and yyrestart():          Generated Scanner.   (line  42)
* EOF and yyrestart():                   Generated Scanner.   (line  42)
* EOF in patterns, syntax of:            Patterns.            (line 150)
* EOF, example using multiple input buffers: Multiple Input Buffers.
                                                              (line  81)
* EOF, explanation:                      EOF.                 (line   6)
* EOF, pushing back:                     Actions.             (line 170)
* EOL, in negated character classes:     Patterns.            (line 237)
* EOL, in patterns:                      Patterns.            (line 131)
* error messages, end of buffer missed:  Lex and Posix.       (line  50)
* error reporting, diagnostic messages:  Diagnostics.         (line   6)
* error reporting, in C++:               Cxx.                 (line 112)
* error rules, to eliminate backing up:  Performance.         (line 102)
* escape sequences in patterns, syntax of: Patterns.          (line  57)
* exiting with yyterminate():            Actions.             (line 212)
* experimental form of c++ scanner class: Cxx.                (line   6)
* extended scope of start conditions:    Start Conditions.    (line 270)
* file format:                           Format.              (line   6)
* file format, serialized tables:        Tables File Format.  (line   6)
* flushing an input buffer:              Multiple Input Buffers.
                                                              (line  66)
* flushing the internal buffer:          Actions.             (line 206)
* format of flex input:                  Format.              (line   6)
* format of input file:                  Format.              (line   9)
* freeing tables:                        Loading and Unloading Serialized Tables.
                                                              (line   6)
* getting current start state with YY_START: Start Conditions.
                                                              (line 189)
* halting with yyterminate():            Actions.             (line 212)
* handling include files with multiple input buffers: Multiple Input Buffers.
                                                              (line  87)
* handling include files with multiple input buffers <1>: Multiple Input Buffers.
                                                              (line 122)
* header files, with C++:                Cxx.                 (line 197)
* include files, with C++:               Cxx.                 (line 197)
* input file, Definitions section:       Definitions Section. (line   6)
* input file, Rules Section:             Rules Section.       (line   6)
* input file, user code Section:         User Code Section.   (line   6)
* input():                               Actions.             (line 173)
* input(), and C++:                      Actions.             (line 202)
* input, format of:                      Format.              (line   6)
* input, matching:                       Matching.            (line   6)
* keywords, for performance:             Performance.         (line 200)
* lex (traditional) and POSIX:           Lex and Posix.       (line   6)
* LexerInput, overriding:                How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* LexerOutput, overriding:               How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* limitations of flex:                   Limitations.         (line   6)
* literal text in patterns, syntax of:   Patterns.            (line  54)
* loading tables at runtime:             Loading and Unloading Serialized Tables.
                                                              (line   6)
* m4:                                    M4 Dependency.       (line   6)
* Makefile, example of implicit rules:   Makefiles and Flex.  (line  21)
* Makefile, explicit example:            Makefiles and Flex.  (line  33)
* Makefile, syntax:                      Makefiles and Flex.  (line   6)
* matching C-style double-quoted strings: Start Conditions.   (line 203)
* matching, and trailing context:        Matching.            (line   6)
* matching, length of:                   Matching.            (line   6)
* matching, multiple matches:            Matching.            (line   6)
* member functions, C++:                 Cxx.                 (line   9)
* memory management:                     Memory Management.   (line   6)
* memory, allocating input buffers:      Multiple Input Buffers.
                                                              (line  19)
* memory, considerations for reentrant scanners: Init and Destroy Functions.
                                                              (line   6)
* memory, deleting input buffers:        Multiple Input Buffers.
                                                              (line  46)
* memory, for start condition stacks:    Start Conditions.    (line 301)
* memory, serialized tables:             Serialized Tables.   (line   6)
* memory, serialized tables <1>:         Loading and Unloading Serialized Tables.
                                                              (line   6)
* methods, c++:                          Cxx.                 (line   9)
* minimal scanner:                       Matching.            (line  24)
* multiple input streams:                Multiple Input Buffers.
                                                              (line   6)
* name definitions, not POSIX:           Lex and Posix.       (line  75)
* negating ranges in patterns:           Patterns.            (line  23)
* newline, matching in patterns:         Patterns.            (line 135)
* non-POSIX features of flex:            Lex and Posix.       (line 142)
* noyywrap, %option:                     Generated Scanner.   (line  93)
* NULL character in patterns, syntax of: Patterns.            (line  62)
* octal characters in patterns:          Patterns.            (line  65)
* options, command-line:                 Scanner Options.     (line   6)
* overriding LexerInput:                 How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* overriding LexerOutput:                How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* overriding the memory routines:        Overriding The Default Memory Management.
                                                              (line  38)
* Pascal-like language:                  Simple Examples.     (line  49)
* pattern aliases, defining:             Definitions Section. (line  21)
* pattern aliases, expansion of:         Patterns.            (line  51)
* pattern aliases, how to define:        Definitions Section. (line  10)
* pattern aliases, use of:               Definitions Section. (line  28)
* patterns and actions on different lines: Lex and Posix.     (line 101)
* patterns, character class equivalence: Patterns.            (line 205)
* patterns, common:                      Common Patterns.     (line   6)
* patterns, end of line:                 Patterns.            (line 300)
* patterns, grouping and precedence:     Patterns.            (line 167)
* patterns, in rules section:            Patterns.            (line   6)
* patterns, invalid trailing context:    Patterns.            (line 285)
* patterns, matching:                    Matching.            (line   6)
* patterns, precedence of operators:     Patterns.            (line 161)
* patterns, repetitions with grouping:   Patterns.            (line 184)
* patterns, special characters treated as non-special: Patterns.
                                                              (line 293)
* patterns, syntax:                      Patterns.            (line   9)
* patterns, syntax <1>:                  Patterns.            (line   9)
* patterns, tuning for performance:      Performance.         (line  49)
* patterns, valid character classes:     Patterns.            (line 192)
* performance optimization, matching longer tokens: Performance.
                                                              (line 167)
* performance optimization, recognizing keywords: Performance.
                                                              (line 205)
* performance, backing up:               Performance.         (line  49)
* performance, considerations:           Performance.         (line   6)
* performance, using keywords:           Performance.         (line 200)
* popping an input buffer:               Multiple Input Buffers.
                                                              (line  60)
* POSIX and lex:                         Lex and Posix.       (line   6)
* POSIX compliance:                      Lex and Posix.       (line 142)
* POSIX, character classes in patterns, syntax of: Patterns.  (line  15)
* preprocessor macros, for use in actions: Actions.           (line  50)
* pushing an input buffer:               Multiple Input Buffers.
                                                              (line  52)
* pushing back characters with unput:    Actions.             (line 143)
* pushing back characters with unput():  Actions.             (line 147)
* pushing back characters with yyless:   Actions.             (line 131)
* pushing back EOF:                      Actions.             (line 170)
* ranges in patterns:                    Patterns.            (line  19)
* ranges in patterns, negating:          Patterns.            (line  23)
* recognizing C comments:                Start Conditions.    (line 143)
* reentrant scanners, multiple interleaved scanners: Reentrant Uses.
                                                              (line  10)
* reentrant scanners, recursive invocation: Reentrant Uses.   (line  30)
* reentrant, accessing flex variables:   Global Replacement.  (line   6)
* reentrant, accessor functions:         Accessor Methods.    (line   6)
* reentrant, API explanation:            Reentrant Overview.  (line   6)
* reentrant, calling functions:          Extra Reentrant Argument.
                                                              (line   6)
* reentrant, example of:                 Reentrant Example.   (line   6)
* reentrant, explanation:                Reentrant.           (line   6)
* reentrant, extra data:                 Extra Data.          (line   6)
* reentrant, initialization:             Init and Destroy Functions.
                                                              (line   6)
* regular expressions, in patterns:      Patterns.            (line   6)
* REJECT:                                Actions.             (line  61)
* REJECT, calling multiple times:        Actions.             (line  83)
* REJECT, performance costs:             Performance.         (line  12)
* reporting bugs:                        Reporting Bugs.      (line   6)
* restarting the scanner:                Lex and Posix.       (line  54)
* RETURN, within actions:                Generated Scanner.   (line  57)
* rules, default:                        Simple Examples.     (line  15)
* rules, in flex input:                  Rules Section.       (line   6)
* scanner, definition of:                Introduction.        (line   6)
* sections of flex input:                Format.              (line   6)
* serialization:                         Serialized Tables.   (line   6)
* serialization of tables:               Creating Serialized Tables.
                                                              (line   6)
* serialized tables, multiple scanners:  Creating Serialized Tables.
                                                              (line  26)
* stack, input buffer pop:               Multiple Input Buffers.
                                                              (line  60)
* stack, input buffer push:              Multiple Input Buffers.
                                                              (line  52)
* stacks, routines for manipulating:     Start Conditions.    (line 286)
* start condition, applying to multiple patterns: Start Conditions.
                                                              (line 258)
* start conditions:                      Start Conditions.    (line   6)
* start conditions, behavior of default rule: Start Conditions.
                                                              (line  82)
* start conditions, exclusive:           Start Conditions.    (line  53)
* start conditions, for different interpretations of same input: Start Conditions.
                                                              (line 112)
* start conditions, in patterns:         Patterns.            (line 140)
* start conditions, inclusive:           Start Conditions.    (line  44)
* start conditions, inclusive vs. exclusive: Start Conditions.
                                                              (line  24)
* start conditions, integer values:      Start Conditions.    (line 163)
* start conditions, multiple:            Start Conditions.    (line  17)
* start conditions, special wildcard condition: Start Conditions.
                                                              (line  68)
* start conditions, use of a stack:      Start Conditions.    (line 286)
* start conditions, use of wildcard condition (<*>): Start Conditions.
                                                              (line  72)
* start conditions, using BEGIN:         Start Conditions.    (line  95)
* stdin, default for yyin:               Generated Scanner.   (line  37)
* stdout, as default for yyout:          Generated Scanner.   (line 101)
* strings, scanning strings instead of files: Multiple Input Buffers.
                                                              (line 175)
* tables, creating serialized:           Creating Serialized Tables.
                                                              (line   6)
* tables, file format:                   Tables File Format.  (line   6)
* tables, freeing:                       Loading and Unloading Serialized Tables.
                                                              (line   6)
* tables, loading and unloading:         Loading and Unloading Serialized Tables.
                                                              (line   6)
* terminating with yyterminate():        Actions.             (line 212)
* token:                                 Matching.            (line  14)
* trailing context, in patterns:         Patterns.            (line 118)
* trailing context, limits of:           Patterns.            (line 275)
* trailing context, matching:            Matching.            (line   6)
* trailing context, performance costs:   Performance.         (line  12)
* trailing context, variable length:     Performance.         (line 141)
* unput():                               Actions.             (line 143)
* unput(), and %pointer:                 Actions.             (line 162)
* unput(), pushing back characters:      Actions.             (line 147)
* user code, in flex input:              User Code Section.   (line   6)
* username expansion:                    Simple Examples.     (line   8)
* using integer values of start condition names: Start Conditions.
                                                              (line 163)
* verbatim text in patterns, syntax of:  Patterns.            (line  54)
* warning, dangerous trailing context:   Limitations.         (line  20)
* warning, rule cannot be matched:       Diagnostics.         (line  14)
* warnings, diagnostic messages:         Diagnostics.         (line   6)
* whitespace, compressing:               Actions.             (line  22)
* yacc interface:                        Yacc.                (line  17)
* yacc, interface:                       Yacc.                (line   6)
* yyalloc, overriding:                   Overriding The Default Memory Management.
                                                              (line   6)
* yyfree, overriding:                    Overriding The Default Memory Management.
                                                              (line   6)
* yyin:                                  Generated Scanner.   (line  37)
* yyinput():                             Actions.             (line 202)
* yyleng:                                Matching.            (line  14)
* yyleng, modification of:               Actions.             (line  47)
* yyless():                              Actions.             (line 125)
* yyless(), pushing back characters:     Actions.             (line 131)
* yylex(), in generated scanner:         Generated Scanner.   (line   6)
* yylex(), overriding:                   Generated Scanner.   (line  16)
* yylex, overriding the prototype of:    Generated Scanner.   (line  20)
* yylineno, in a reentrant scanner:      Reentrant Functions. (line  36)
* yylineno, performance costs:           Performance.         (line  12)
* yymore():                              Actions.             (line 104)
* yymore() to append token to previous token: Actions.        (line 110)
* yymore(), mega-kludge:                 Actions.             (line 110)
* yymore, and yyleng:                    Actions.             (line  47)
* yymore, performance penalty of:        Actions.             (line 119)
* yyout:                                 Generated Scanner.   (line 101)
* yyrealloc, overriding:                 Overriding The Default Memory Management.
                                                              (line   6)
* yyrestart():                           Generated Scanner.   (line  42)
* yyterminate():                         Actions.             (line 212)
* yytext:                                Matching.            (line  14)
* yytext, default array size:            User Values.         (line  13)
* yytext, memory considerations:         A Note About yytext And Memory.
                                                              (line   6)
* yytext, modification of:               Actions.             (line  42)
* yytext, two types of:                  Matching.            (line  29)
* yywrap():                              Generated Scanner.   (line  85)
* yywrap, default for:                   Generated Scanner.   (line  93)
* YY_CURRENT_BUFFER, and multiple buffers: Multiple Input Buffers.
                                                              (line  78)
* YY_EXTRA_TYPE, defining your own type: Extra Data.          (line  33)
* YY_FLUSH_BUFFER:                       Actions.             (line 206)
* YY_INPUT:                              Generated Scanner.   (line  61)
* YY_INPUT, overriding:                  Generated Scanner.   (line  71)
* YY_START, example:                     Start Conditions.    (line 185)
* YY_USER_ACTION to track each time a rule is matched: Misc Macros.
                                                              (line  14)