1This is flex.info, produced by makeinfo version 4.13 from flex.texi.
2
3INFO-DIR-SECTION Programming
4START-INFO-DIR-ENTRY
5* flex: (flex).      Fast lexical analyzer generator (lex replacement).
6END-INFO-DIR-ENTRY
7
8   The flex manual is placed under the same licensing conditions as the
9rest of flex:
10
11   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The
12Flex Project.
13
14   Copyright (C) 1990, 1997 The Regents of the University of California.
15All rights reserved.
16
17   This code is derived from software contributed to Berkeley by Vern
18Paxson.
19
20   The United States Government has rights in this work pursuant to
21contract no. DE-AC03-76SF00098 between the United States Department of
22Energy and the University of California.
23
24   Redistribution and use in source and binary forms, with or without
25modification, are permitted provided that the following conditions are
26met:
27
28  1.  Redistributions of source code must retain the above copyright
29     notice, this list of conditions and the following disclaimer.
30
31  2. Redistributions in binary form must reproduce the above copyright
32     notice, this list of conditions and the following disclaimer in the
33     documentation and/or other materials provided with the
34     distribution.
35
36   Neither the name of the University nor the names of its contributors
37may be used to endorse or promote products derived from this software
38without specific prior written permission.
39
40   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
41WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
42MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
43
44
45File: flex.info,  Node: Top,  Next: Copyright,  Prev: (dir),  Up: (dir)
46
47flex
48****
49
50This manual describes `flex', a tool for generating programs that
51perform pattern-matching on text.  The manual includes both tutorial and
52reference sections.
53
54   This edition of `The flex Manual' documents `flex' version 2.5.39.
55It was last updated on 6 December 2012.
56
57   This manual was written by Vern Paxson, Will Estes and John Millaway.
58
59* Menu:
60
61* Copyright::
62* Reporting Bugs::
63* Introduction::
64* Simple Examples::
65* Format::
66* Patterns::
67* Matching::
68* Actions::
69* Generated Scanner::
70* Start Conditions::
71* Multiple Input Buffers::
72* EOF::
73* Misc Macros::
74* User Values::
75* Yacc::
76* Scanner Options::
77* Performance::
78* Cxx::
79* Reentrant::
80* Lex and Posix::
81* Memory Management::
82* Serialized Tables::
83* Diagnostics::
84* Limitations::
85* Bibliography::
86* FAQ::
87* Appendices::
88* Indices::
89
90 --- The Detailed Node Listing ---
91
92Format of the Input File
93
94* Definitions Section::
95* Rules Section::
96* User Code Section::
97* Comments in the Input::
98
99Scanner Options
100
101* Options for Specifying Filenames::
102* Options Affecting Scanner Behavior::
103* Code-Level And API Options::
104* Options for Scanner Speed and Size::
105* Debugging Options::
106* Miscellaneous Options::
107
108Reentrant C Scanners
109
110* Reentrant Uses::
111* Reentrant Overview::
112* Reentrant Example::
113* Reentrant Detail::
114* Reentrant Functions::
115
116The Reentrant API in Detail
117
118* Specify Reentrant::
119* Extra Reentrant Argument::
120* Global Replacement::
121* Init and Destroy Functions::
122* Accessor Methods::
123* Extra Data::
124* About yyscan_t::
125
126Memory Management
127
128* The Default Memory Management::
129* Overriding The Default Memory Management::
130* A Note About yytext And Memory::
131
132Serialized Tables
133
134* Creating Serialized Tables::
135* Loading and Unloading Serialized Tables::
136* Tables File Format::
137
138FAQ
139
140* When was flex born?::
141* How do I expand backslash-escape sequences in C-style quoted strings?::
142* Why do flex scanners call fileno if it is not ANSI compatible?::
143* Does flex support recursive pattern definitions?::
144* How do I skip huge chunks of input (tens of megabytes) while using flex?::
145* Flex is not matching my patterns in the same order that I defined them.::
146* My actions are executing out of order or sometimes not at all.::
147* How can I have multiple input sources feed into the same scanner at the same time?::
148* Can I build nested parsers that work with the same input file?::
149* How can I match text only at the end of a file?::
150* How can I make REJECT cascade across start condition boundaries?::
151* Why cant I use fast or full tables with interactive mode?::
152* How much faster is -F or -f than -C?::
153* If I have a simple grammar cant I just parse it with flex?::
154* Why doesn't yyrestart() set the start state back to INITIAL?::
155* How can I match C-style comments?::
156* The period isn't working the way I expected.::
157* Can I get the flex manual in another format?::
158* Does there exist a "faster" NDFA->DFA algorithm?::
159* How does flex compile the DFA so quickly?::
160* How can I use more than 8192 rules?::
161* How do I abandon a file in the middle of a scan and switch to a new file?::
162* How do I execute code only during initialization (only before the first scan)?::
163* How do I execute code at termination?::
164* Where else can I find help?::
165* Can I include comments in the "rules" section of the file?::
166* I get an error about undefined yywrap().::
167* How can I change the matching pattern at run time?::
168* How can I expand macros in the input?::
169* How can I build a two-pass scanner?::
170* How do I match any string not matched in the preceding rules?::
171* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
172* Is there a way to make flex treat NULL like a regular character?::
173* Whenever flex can not match the input it says "flex scanner jammed".::
174* Why doesn't flex have non-greedy operators like perl does?::
175* Memory leak - 16386 bytes allocated by malloc.::
176* How do I track the byte offset for lseek()?::
177* How do I use my own I/O classes in a C++ scanner?::
178* How do I skip as many chars as possible?::
179* deleteme00::
180* Are certain equivalent patterns faster than others?::
181* Is backing up a big deal?::
182* Can I fake multi-byte character support?::
183* deleteme01::
184* Can you discuss some flex internals?::
185* unput() messes up yy_at_bol::
186* The | operator is not doing what I want::
187* Why can't flex understand this variable trailing context pattern?::
188* The ^ operator isn't working::
189* Trailing context is getting confused with trailing optional patterns::
190* Is flex GNU or not?::
191* ERASEME53::
192* I need to scan if-then-else blocks and while loops::
193* ERASEME55::
194* ERASEME56::
195* ERASEME57::
196* Is there a repository for flex scanners?::
197* How can I conditionally compile or preprocess my flex input file?::
198* Where can I find grammars for lex and yacc?::
199* I get an end-of-buffer message for each character scanned.::
200* unnamed-faq-62::
201* unnamed-faq-63::
202* unnamed-faq-64::
203* unnamed-faq-65::
204* unnamed-faq-66::
205* unnamed-faq-67::
206* unnamed-faq-68::
207* unnamed-faq-69::
208* unnamed-faq-70::
209* unnamed-faq-71::
210* unnamed-faq-72::
211* unnamed-faq-73::
212* unnamed-faq-74::
213* unnamed-faq-75::
214* unnamed-faq-76::
215* unnamed-faq-77::
216* unnamed-faq-78::
217* unnamed-faq-79::
218* unnamed-faq-80::
219* unnamed-faq-81::
220* unnamed-faq-82::
221* unnamed-faq-83::
222* unnamed-faq-84::
223* unnamed-faq-85::
224* unnamed-faq-86::
225* unnamed-faq-87::
226* unnamed-faq-88::
227* unnamed-faq-90::
228* unnamed-faq-91::
229* unnamed-faq-92::
230* unnamed-faq-93::
231* unnamed-faq-94::
232* unnamed-faq-95::
233* unnamed-faq-96::
234* unnamed-faq-97::
235* unnamed-faq-98::
236* unnamed-faq-99::
237* unnamed-faq-100::
238* unnamed-faq-101::
239* What is the difference between YYLEX_PARAM and YY_DECL?::
240* Why do I get "conflicting types for yylex" error?::
241* How do I access the values set in a Flex action from within a Bison action?::
242
243Appendices
244
245* Makefiles and Flex::
246* Bison Bridge::
247* M4 Dependency::
248* Common Patterns::
249
250Indices
251
252* Concept Index::
253* Index of Functions and Macros::
254* Index of Variables::
255* Index of Data Types::
256* Index of Hooks::
257* Index of Scanner Options::
258
259
260File: flex.info,  Node: Copyright,  Next: Reporting Bugs,  Prev: Top,  Up: Top
261
2621 Copyright
263***********
264
265The flex manual is placed under the same licensing conditions as the
266rest of flex:
267
268   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The
269Flex Project.
270
271   Copyright (C) 1990, 1997 The Regents of the University of California.
272All rights reserved.
273
274   This code is derived from software contributed to Berkeley by Vern
275Paxson.
276
277   The United States Government has rights in this work pursuant to
278contract no. DE-AC03-76SF00098 between the United States Department of
279Energy and the University of California.
280
281   Redistribution and use in source and binary forms, with or without
282modification, are permitted provided that the following conditions are
283met:
284
285  1.  Redistributions of source code must retain the above copyright
286     notice, this list of conditions and the following disclaimer.
287
288  2. Redistributions in binary form must reproduce the above copyright
289     notice, this list of conditions and the following disclaimer in the
290     documentation and/or other materials provided with the
291     distribution.
292
293   Neither the name of the University nor the names of its contributors
294may be used to endorse or promote products derived from this software
295without specific prior written permission.
296
297   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
298WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
299MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
300
301
302File: flex.info,  Node: Reporting Bugs,  Next: Introduction,  Prev: Copyright,  Up: Top
303
3042 Reporting Bugs
305****************
306
307If you find a bug in `flex', please report it using the SourceForge Bug
308Tracking facilities which can be found on flex's SourceForge Page
309(http://sourceforge.net/projects/flex).
310
311
312File: flex.info,  Node: Introduction,  Next: Simple Examples,  Prev: Reporting Bugs,  Up: Top
313
3143 Introduction
315**************
316
317`flex' is a tool for generating "scanners".  A scanner is a program
318which recognizes lexical patterns in text.  The `flex' program reads
319the given input files, or its standard input if no file names are
320given, for a description of a scanner to generate.  The description is
321in the form of pairs of regular expressions and C code, called "rules".
322`flex' generates as output a C source file, `lex.yy.c' by default,
323which defines a routine `yylex()'.  This file can be compiled and
324linked with the flex runtime library to produce an executable.  When
325the executable is run, it analyzes its input for occurrences of the
326regular expressions.  Whenever it finds one, it executes the
327corresponding C code.
328
329
330File: flex.info,  Node: Simple Examples,  Next: Format,  Prev: Introduction,  Up: Top
331
3324 Some Simple Examples
333**********************
334
335First some simple examples to get the flavor of how one uses `flex'.
336
337   The following `flex' input specifies a scanner which, when it
338encounters the string `username' will replace it with the user's login
339name:
340
         %%
         username    printf( "%s", getlogin() );
343
344   By default, any text not matched by a `flex' scanner is copied to
345the output, so the net effect of this scanner is to copy its input file
346to its output with each occurrence of `username' expanded.  In this
347input, there is just one rule.  `username' is the "pattern" and the
348`printf' is the "action".  The `%%' symbol marks the beginning of the
349rules.
350
351   Here's another simple example:
352
                 int num_lines = 0, num_chars = 0;

         %%
         \n      ++num_lines; ++num_chars;
         .       ++num_chars;

         %%

         int main()
                 {
                 yylex();
                 printf( "# of lines = %d, # of chars = %d\n",
                         num_lines, num_chars );
                 }
367
368   This scanner counts the number of characters and the number of lines
369in its input. It produces no output other than the final report on the
370character and line counts.  The first line declares two globals,
371`num_lines' and `num_chars', which are accessible both inside `yylex()'
372and in the `main()' routine declared after the second `%%'.  There are
373two rules, one which matches a newline (`\n') and increments both the
374line count and the character count, and one which matches any character
375other than a newline (indicated by the `.' regular expression).
376
377   A somewhat more complicated example:
378
         /* scanner for a toy Pascal-like language */

         %{
         /* need this for the calls to atoi() and atof() below */
         #include <stdlib.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

         %%

         {DIGIT}+    {
                     printf( "An integer: %s (%d)\n", yytext,
                             atoi( yytext ) );
                     }

         {DIGIT}+"."{DIGIT}*        {
                     printf( "A float: %s (%g)\n", yytext,
                             atof( yytext ) );
                     }

         if|then|begin|end|procedure|function        {
                     printf( "A keyword: %s\n", yytext );
                     }

         {ID}        printf( "An identifier: %s\n", yytext );

         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

         "{"[^}\n]*"}"     /* eat up one-line comments */

         [ \t\n]+          /* eat up whitespace */

         .           printf( "Unrecognized character: %s\n", yytext );

         %%

         int main( int argc, char **argv )
             {
             ++argv, --argc;  /* skip over program name */
             if ( argc > 0 )
                     yyin = fopen( argv[0], "r" );
             else
                     yyin = stdin;

             yylex();
             }
427
   This is the beginning of a simple scanner for a language like
Pascal.  It identifies different types of "tokens" and reports on what
it has seen.
431
432   The details of this example will be explained in the following
433sections.
434
435
436File: flex.info,  Node: Format,  Next: Patterns,  Prev: Simple Examples,  Up: Top
437
4385 Format of the Input File
439**************************
440
441The `flex' input file consists of three sections, separated by a line
442containing only `%%'.
443
444         definitions
445         %%
446         rules
447         %%
448         user code
449
450* Menu:
451
452* Definitions Section::
453* Rules Section::
454* User Code Section::
455* Comments in the Input::
456
457
458File: flex.info,  Node: Definitions Section,  Next: Rules Section,  Prev: Format,  Up: Format
459
4605.1 Format of the Definitions Section
461=====================================
462
463The "definitions section" contains declarations of simple "name"
464definitions to simplify the scanner specification, and declarations of
465"start conditions", which are explained in a later section.
466
467   Name definitions have the form:
468
469         name definition
470
471   The `name' is a word beginning with a letter or an underscore (`_')
472followed by zero or more letters, digits, `_', or `-' (dash).  The
473definition is taken to begin at the first non-whitespace character
474following the name and continuing to the end of the line.  The
475definition can subsequently be referred to using `{name}', which will
476expand to `(definition)'.  For example,
477
         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

   defines `DIGIT' to be a regular expression which matches a single
digit, and `ID' to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.  A subsequent reference to
484
485         {DIGIT}+"."{DIGIT}*
486
487   is identical to
488
489         ([0-9])+"."([0-9])*
490
491   and matches one-or-more digits followed by a `.' followed by
492zero-or-more digits.
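
   The expansion rule can be illustrated with a small C sketch.  It is
not how `flex' itself is implemented: the `name_def' structure and the
`expand_names()' function are hypothetical names introduced for this
illustration, and the sketch ignores quoting and character classes.  It
shows only why the parentheses matter: each `{name}' is replaced by
`(definition)' as a unit, so a following `+' or `*' applies to the
whole definition.

```c
#include <stdio.h>
#include <string.h>

/* One name definition, as it would appear in the definitions section. */
struct name_def {
    const char *name;        /* e.g. "DIGIT" */
    const char *definition;  /* e.g. "[0-9]" */
};

/*
 * Copy `pattern' into `out', replacing each {name} found in `defs' with
 * (definition).  Braces that do not enclose a known name, such as the
 * repeat count in r{2,5}, are copied unchanged.
 * Returns 0 on success, -1 if `out' is too small.
 */
static int expand_names(const char *pattern,
                        const struct name_def *defs, size_t ndefs,
                        char *out, size_t outsz)
{
    size_t used = 0;

    while (*pattern != '\0') {
        const char *close = (*pattern == '{') ? strchr(pattern, '}') : NULL;
        const struct name_def *hit = NULL;

        if (close != NULL) {
            /* Length of the text between the braces. */
            size_t len = (size_t)(close - pattern - 1);
            for (size_t i = 0; i < ndefs; i++) {
                if (strlen(defs[i].name) == len &&
                    strncmp(defs[i].name, pattern + 1, len) == 0) {
                    hit = &defs[i];
                    break;
                }
            }
        }

        if (hit != NULL) {
            /* Known name: emit the definition wrapped in parentheses. */
            int n = snprintf(out + used, outsz - used, "(%s)",
                             hit->definition);
            if (n < 0 || (size_t)n >= outsz - used)
                return -1;
            used += (size_t)n;
            pattern = close + 1;
        } else {
            /* Anything else is copied through one character at a time. */
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *pattern++;
        }
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

   Given definitions for `DIGIT' and `ID', expanding
`{DIGIT}+"."{DIGIT}*' with this function yields `([0-9])+"."([0-9])*',
the form shown above, while the repeat count in a pattern such as
`{ID}{2,5}' is left alone because `2,5' is not a defined name.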
493
494   An unindented comment (i.e., a line beginning with `/*') is copied
495verbatim to the output up to the next `*/'.
496
497   Any _indented_ text or text enclosed in `%{' and `%}' is also copied
498verbatim to the output (with the %{ and %} symbols removed).  The %{
499and %} symbols must appear unindented on lines by themselves.
500
   A `%top' block is similar to a `%{' ... `%}' block, except that the
code in a `%top' block is relocated to the _top_ of the generated file,
before any flex definitions (1).  The `%top' block is useful when you
want certain preprocessor macros to be defined or certain files to be
included before the generated code.  The single characters `{' and
`}' are used to delimit the `%top' block, as shown in the example below:

         %top{
             /* This code goes at the "top" of the generated file. */
             #include <stdint.h>
             #include <inttypes.h>
         }
513
514   Multiple `%top' blocks are allowed, and their order is preserved.
515
516   ---------- Footnotes ----------
517
518   (1) Actually, `yyIN_HEADER' is defined before the `%top' block.
519
520
521File: flex.info,  Node: Rules Section,  Next: User Code Section,  Prev: Definitions Section,  Up: Format
522
5235.2 Format of the Rules Section
524===============================
525
526The "rules" section of the `flex' input contains a series of rules of
527the form:
528
529         pattern   action
530
531   where the pattern must be unindented and the action must begin on
532the same line.  *Note Patterns::, for a further description of patterns
533and actions.
534
535   In the rules section, any indented or %{ %} enclosed text appearing
536before the first rule may be used to declare variables which are local
537to the scanning routine and (after the declarations) code which is to be
538executed whenever the scanning routine is entered.  Other indented or
539%{ %} text in the rule section is still copied to the output, but its
540meaning is not well-defined and it may well cause compile-time errors
541(this feature is present for POSIX compliance. *Note Lex and Posix::,
542for other such features).
543
544   Any _indented_ text or text enclosed in `%{' and `%}' is copied
545verbatim to the output (with the %{ and %} symbols removed).  The %{
546and %} symbols must appear unindented on lines by themselves.
547
548
549File: flex.info,  Node: User Code Section,  Next: Comments in the Input,  Prev: Rules Section,  Up: Format
550
5515.3 Format of the User Code Section
552===================================
553
554The user code section is simply copied to `lex.yy.c' verbatim.  It is
555used for companion routines which call or are called by the scanner.
556The presence of this section is optional; if it is missing, the second
557`%%' in the input file may be skipped, too.
558
559
560File: flex.info,  Node: Comments in the Input,  Prev: User Code Section,  Up: Format
561
5625.4 Comments in the Input
563=========================
564
565Flex supports C-style comments, that is, anything between `/*' and `*/'
566is considered a comment. Whenever flex encounters a comment, it copies
567the entire comment verbatim to the generated source code. Comments may
568appear just about anywhere, but with the following exceptions:
569
570   * Comments may not appear in the Rules Section wherever flex is
571     expecting a regular expression. This means comments may not appear
572     at the beginning of a line, or immediately following a list of
573     scanner states.
574
575   * Comments may not appear on an `%option' line in the Definitions
576     Section.
577
   If you want to follow a simple rule, then always begin a comment on a
new line, with one or more whitespace characters before the initial
`/*'.  This rule will work anywhere in the input file.
581
582   All the comments in the following example are valid:
583
     %{
     /* code block */
     %}

     /* Definitions Section */
     %x STATE_X

     %%
         /* Rules Section */
     ruleA   /* after regex */ { /* code block */ } /* after code block */
             /* Rules Section (indented) */
     <STATE_X>{
     ruleC   ECHO;
     ruleD   ECHO;
     %{
     /* code block */
     %}
     }
     %%
     /* User Code Section */
604
605
606File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top
607
6086 Patterns
609**********
610
611The patterns in the input (see *note Rules Section::) are written using
612an extended set of regular expressions.  These are:
613
614`x'
615     match the character 'x'
616
617`.'
618     any character (byte) except newline
619
620`[xyz]'
621     a "character class"; in this case, the pattern matches either an
622     'x', a 'y', or a 'z'
623
624`[abj-oZ]'
625     a "character class" with a range in it; matches an 'a', a 'b', any
626     letter from 'j' through 'o', or a 'Z'
627
628`[^A-Z]'
629     a "negated character class", i.e., any character but those in the
630     class.  In this case, any character EXCEPT an uppercase letter.
631
632`[^A-Z\n]'
633     any character EXCEPT an uppercase letter or a newline
634
635`[a-z]{-}[aeiou]'
636     the lowercase consonants
637
638`r*'
639     zero or more r's, where r is any regular expression
640
641`r+'
642     one or more r's
643
644`r?'
645     zero or one r's (that is, "an optional r")
646
647`r{2,5}'
648     anywhere from two to five r's
649
650`r{2,}'
651     two or more r's
652
653`r{4}'
654     exactly 4 r's
655
656`{name}'
657     the expansion of the `name' definition (*note Format::).
658
659`"[xyz]\"foo"'
660     the literal string: `[xyz]"foo'
661
662`\X'
663     if X is `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C
664     interpretation of `\x'.  Otherwise, a literal `X' (used to escape
665     operators such as `*')
666
667`\0'
668     a NUL character (ASCII code 0)
669
670`\123'
671     the character with octal value 123
672
673`\x2a'
674     the character with hexadecimal value 2a
675
676`(r)'
677     match an `r'; parentheses are used to override precedence (see
678     below)
679
680`(?r-s:pattern)'
681     apply option `r' and omit option `s' while interpreting pattern.
682     Options may be zero or more of the characters `i', `s', or `x'.
683
684     `i' means case-insensitive. `-i' means case-sensitive.
685
686     `s' alters the meaning of the `.' syntax to match any single byte
687     whatsoever.  `-s' alters the meaning of `.' to match any byte
688     except `\n'.
689
690     `x' ignores comments and whitespace in patterns. Whitespace is
691     ignored unless it is backslash-escaped, contained within `""'s, or
692     appears inside a character class.
693
694     The following are all valid:
695
     (?:foo)         same as  (foo)
     (?i:ab7)        same as  ([aA][bB]7)
     (?-i:ab)        same as  (ab)
     (?s:.)          same as  [\x00-\xFF]
     (?-s:.)         same as  [^\n]
     (?ix-s: a . b)  same as  ([Aa][^\n][bB])
     (?x:a  b)       same as  ("ab")
     (?x:a\ b)       same as  ("a b")
     (?x:a" "b)      same as  ("a b")
     (?x:a[ ]b)      same as  ("a b")
     (?x:a
         /* comment */
         b
         c)          same as  (abc)
710
711`(?# comment )'
     omit everything within `()'. The first `)' character encountered
     ends the pattern. It is not possible for the comment to contain
     a `)' character. The comment may span lines.
715
716`rs'
717     the regular expression `r' followed by the regular expression `s';
718     called "concatenation"
719
720`r|s'
721     either an `r' or an `s'
722
723`r/s'
724     an `r' but only if it is followed by an `s'.  The text matched by
725     `s' is included when determining whether this rule is the longest
726     match, but is then returned to the input before the action is
727     executed.  So the action only sees the text matched by `r'.  This
728     type of pattern is called "trailing context".  (There are some
729     combinations of `r/s' that flex cannot match correctly. *Note
730     Limitations::, regarding dangerous trailing context.)
731
732`^r'
733     an `r', but only at the beginning of a line (i.e., when just
734     starting to scan, or right after a newline has been scanned).
735
736`r$'
737     an `r', but only at the end of a line (i.e., just before a
738     newline).  Equivalent to `r/\n'.
739
740     Note that `flex''s notion of "newline" is exactly whatever the C
741     compiler used to compile `flex' interprets `\n' as; in particular,
742     on some DOS systems you must either filter out `\r's in the input
743     yourself, or explicitly use `r/\r\n' for `r$'.
744
745`<s>r'
746     an `r', but only in start condition `s' (see *note Start
747     Conditions:: for discussion of start conditions).
748
749`<s1,s2,s3>r'
750     same, but in any of start conditions `s1', `s2', or `s3'.
751
752`<*>r'
753     an `r' in any start condition, even an exclusive one.
754
755`<<EOF>>'
756     an end-of-file.
757
758`<s1,s2><<EOF>>'
759     an end-of-file when in start condition `s1' or `s2'
760
   Note that inside of a character class, all regular expression
operators lose their special meaning except escape (`\') and the
character class operators, `-', `]', and, at the beginning of the
class, `^'.
765
766   The regular expressions listed above are grouped according to
767precedence, from highest precedence at the top to lowest at the bottom.
768Those grouped together have equal precedence (see special note on the
769precedence of the repeat operator, `{}', under the documentation for
770the `--posix' POSIX compliance option).  For example,
771
772         foo|bar*
773
774   is the same as
775
776         (foo)|(ba(r*))
777
778   since the `*' operator has higher precedence than concatenation, and
779concatenation higher than alternation (`|').  This pattern therefore
780matches _either_ the string `foo' _or_ the string `ba' followed by
781zero-or-more `r''s.  To match `foo' or zero-or-more repetitions of the
782string `bar', use:
783
784         foo|(bar)*
785
786   And to match a sequence of zero or more repetitions of `foo' and
787`bar':
788
789         (foo|bar)*
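
   These precedence rules can be checked outside of `flex' with POSIX
extended regular expressions, which give `*', concatenation and `|' the
same relative precedence.  The sketch below assumes a POSIX system that
provides `<regex.h>'; the helper name `full_match()' is hypothetical,
and POSIX regular expressions are only a stand-in here, not the
matching engine that `flex' generates.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Return 1 if `pattern' (a POSIX extended regular expression) matches
 * all of `text', 0 if it does not, and -1 on error.  The pattern is
 * wrapped as ^(pattern)$ so that partial matches do not count.
 */
static int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

   With this helper, `full_match("foo|bar*", "barrr")' returns 1 while
`full_match("foo|bar*", "barbar")' returns 0; of the three patterns
above, only `foo|(bar)*' and `(foo|bar)*' accept `barbar'.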
790
   In addition to characters and ranges of characters, character classes
can also contain "character class expressions".  These are expressions
enclosed inside `[:' and `:]' delimiters (which themselves must appear
between the `[' and `]' of the character class; other elements may
occur inside the character class, too).  The valid expressions are:
796
797         [:alnum:] [:alpha:] [:blank:]
798         [:cntrl:] [:digit:] [:graph:]
799         [:lower:] [:print:] [:punct:]
800         [:space:] [:upper:] [:xdigit:]
801
802   These expressions all designate a set of characters equivalent to the
803corresponding standard C `isXXX' function.  For example, `[:alnum:]'
804designates those characters for which `isalnum()' returns true - i.e.,
805any alphabetic or numeric character.  Some systems don't provide
806`isblank()', so flex defines `[:blank:]' as a blank or a tab.
807
808   For example, the following character classes are all equivalent:
809
810         [[:alnum:]]
811         [[:alpha:][:digit:]]
812         [[:alpha:][0-9]]
813         [a-zA-Z0-9]
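
   The equivalence of these classes can be checked against the `isXXX'
functions directly.  The following C sketch assumes the default "C"
locale and an ASCII execution character set; the helper names are
hypothetical.  It counts the byte values on which two membership tests
disagree.

```c
#include <ctype.h>

/* [[:alnum:]] */
static int in_alnum(int c)
{
    return isalnum(c) != 0;
}

/* [[:alpha:][:digit:]] */
static int in_alpha_or_digit(int c)
{
    return isalpha(c) || isdigit(c);
}

/* [a-zA-Z0-9], spelled out as literal ranges (ASCII assumed) */
static int in_literal_ranges(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Count the byte values 0..255 on which the two tests disagree. */
static int count_disagreements(int (*p)(int), int (*q)(int))
{
    int c, n = 0;

    for (c = 0; c < 256; c++)
        if ((p(c) != 0) != (q(c) != 0))
            n++;
    return n;
}
```

   In the "C" locale on an ASCII system, `count_disagreements()'
returns 0 for any pair of the three tests, which is what makes the
classes listed above interchangeable.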
814
815   A word of caution. Character classes are expanded immediately when
816seen in the `flex' input.  This means the character classes are
817sensitive to the locale in which `flex' is executed, and the resulting
818scanner will not be sensitive to the runtime locale.  This may or may
819not be desirable.
820
821   * If your scanner is case-insensitive (the `-i' flag), then
822     `[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.
823
824   * Character classes with ranges, such as `[a-Z]', should be used with
825     caution in a case-insensitive scanner if the range spans upper or
826     lowercase characters. Flex does not know if you want to fold all
827     upper and lowercase characters together, or if you want the
828     literal numeric range specified (with no case folding). When in
829     doubt, flex will assume that you meant the literal numeric range,
830     and will issue a warning. The exception to this rule is a
831     character range such as `[a-z]' or `[S-W]' where it is obvious
832     that you want case-folding to occur. Here are some examples with
833     the `-i' flag enabled:
834
835     Range        Result      Literal Range        Alternate Range
836     `[a-t]'      ok          `[a-tA-T]'
837     `[A-T]'      ok          `[a-tA-T]'
838     `[A-t]'      ambiguous   `[A-Z\[\\\]_`a-t]'   `[a-tA-T]'
839     `[_-{]'      ambiguous   `[_`a-z{]'           `[_`a-zA-Z{]'
840     `[@-C]'      ambiguous   `[@ABC]'             `[@A-Z\[\\\]_`abc]'
841
842   * A negated character class such as the example `[^A-Z]' above
843     _will_ match a newline unless `\n' (or an equivalent escape
844     sequence) is one of the characters explicitly present in the
845     negated character class (e.g., `[^A-Z\n]').  This is unlike how
846     many other regular expression tools treat negated character
847     classes, but unfortunately the inconsistency is historically
848     entrenched.  Matching newlines means that a pattern like `[^"]*'
849     can match the entire input unless there's another quote in the
850     input.
851
852     Flex allows negation of character class expressions by prepending
853     `^' to the POSIX character class name.
854
855              [:^alnum:] [:^alpha:] [:^blank:]
856              [:^cntrl:] [:^digit:] [:^graph:]
857              [:^lower:] [:^print:] [:^punct:]
858              [:^space:] [:^upper:] [:^xdigit:]
859
860     Flex will issue a warning if the expressions `[:^upper:]' and
861     `[:^lower:]' appear in a case-insensitive scanner, since their
862     meaning is unclear. The current behavior is to skip them entirely,
863     but this may change without notice in future revisions of flex.
864
865   *  The `{-}' operator computes the difference of two character
866     classes. For example, `[a-c]{-}[b-z]' represents all the
867     characters in the class `[a-c]' that are not in the class `[b-z]'
868     (which in this case, is just the single character `a'). The `{-}'
869     operator is left associative, so `[abc]{-}[b]{-}[c]' is the same
870     as `[a]'. Be careful not to accidentally create an empty set,
871     which will never match.
872
873   *  The `{+}' operator computes the union of two character classes.
874     For example, `[a-z]{+}[0-9]' is the same as `[a-z0-9]'. This
875     operator is useful when preceded by the result of a difference
876     operation, as in, `[[:alpha:]]{-}[[:lower:]]{+}[q]', which is
877     equivalent to `[A-Zq]' in the "C" locale.
878
   * A rule can have at most one instance of trailing context (the `/'
     operator or the `$' operator).  The start condition, `^', and
     `<<EOF>>' patterns can only occur at the beginning of a pattern
     and, like `/' and `$', cannot be grouped inside parentheses.  A
     `^' which does not occur at the beginning of a rule or a `$' which
     does not occur at the end of a rule loses its special properties
     and is treated as a normal character.
886
887   * The following are invalid:
888
889              foo/bar$
890              <sc1>foo<sc2>bar
891
892     Note that the first of these can be written `foo/bar\n'.
893
894   * The following will result in `$' or `^' being treated as a normal
895     character:
896
897              foo|(bar$)
898              foo|^bar
899
900     If the desired meaning is a `foo' or a
901     `bar'-followed-by-a-newline, the following could be used (the
902     special `|' action is explained below, *note Actions::):
903
904              foo      |
905              bar$     /* action goes here */
906
907     A similar trick will work for matching a `foo' or a
908     `bar'-at-the-beginning-of-a-line.
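
   The `{-}' and `{+}' operators described above can be pictured as
plain set operations on a 256-entry membership table.  The following C
sketch only illustrates that idea; it is not how `flex' itself
represents character classes, and the `cclass' type and the `cclass_*'
helper names are invented for this example.

```c
#include <stdbool.h>
#include <string.h>

/* A character class modeled as a membership table indexed by byte value. */
typedef struct { bool member[256]; } cclass;

/* Build a class from pairs of inclusive range bounds: "az09" is [a-z0-9]. */
static cclass cclass_from_ranges(const char *pairs)
{
    cclass c;
    memset(&c, 0, sizeof c);
    for (size_t i = 0; pairs[i] != '\0' && pairs[i + 1] != '\0'; i += 2)
        for (int ch = (unsigned char)pairs[i];
             ch <= (unsigned char)pairs[i + 1]; ++ch)
            c.member[ch] = true;
    return c;
}

/* a{-}b: members of a that are not members of b. */
static cclass cclass_diff(cclass a, cclass b)
{
    for (int ch = 0; ch < 256; ++ch)
        a.member[ch] = a.member[ch] && !b.member[ch];
    return a;
}

/* a{+}b: members of either class. */
static cclass cclass_union(cclass a, cclass b)
{
    for (int ch = 0; ch < 256; ++ch)
        a.member[ch] = a.member[ch] || b.member[ch];
    return a;
}

/* Number of members; 0 means the class is empty and can never match. */
static int cclass_count(cclass c)
{
    int n = 0;
    for (int ch = 0; ch < 256; ++ch)
        n += c.member[ch] ? 1 : 0;
    return n;
}
```

   Applying `cclass_diff' left to right mirrors the left associativity
of `{-}', and a count of zero flags the accidental empty set warned
about above.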
909
910
File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top

7 How the Input Is Matched
**************************
915
916When the generated scanner is run, it analyzes its input looking for
917strings which match any of its patterns.  If it finds more than one
918match, it takes the one matching the most text (for trailing context
919rules, this includes the length of the trailing part, even though it
920will then be returned to the input).  If it finds two or more matches of
921the same length, the rule listed first in the `flex' input file is
922chosen.
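
   The selection rule in the paragraph above (longest match wins, ties
go to the earlier rule) can be sketched in a few lines of C.  This is
an illustrative model only: the function name `choose_rule' and the
convention that a length of 0 means "this rule did not match" are
assumptions made for the example, not part of `flex'.

```c
#include <stddef.h>

/*
 * match_len[i] is the number of characters rule i matched at the current
 * input position (0 = no match; for trailing-context rules the length
 * would include the trailing part).  Returns the index of the rule to
 * run, or -1 if no rule matched and the default rule applies.
 */
static int choose_rule(const size_t *match_len, int nrules)
{
    int best = -1;

    for (int i = 0; i < nrules; ++i) {
        if (match_len[i] == 0)
            continue;
        /* Strict '>' keeps the earlier rule when lengths are equal. */
        if (best < 0 || match_len[i] > match_len[best])
            best = i;
    }
    return best;
}
```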
923
924   Once the match is determined, the text corresponding to the match
925(called the "token") is made available in the global character pointer
926`yytext', and its length in the global integer `yyleng'.  The "action"
927corresponding to the matched pattern is then executed (*note
928Actions::), and then the remaining input is scanned for another match.
929
930   If no match is found, then the "default rule" is executed: the next
931character in the input is considered matched and copied to the standard
932output.  Thus, the simplest valid `flex' input is:
933
934         %%
935
936   which generates a scanner that simply copies its input (one
937character at a time) to its output.
938
939   Note that `yytext' can be defined in two different ways: either as a
940character _pointer_ or as a character _array_. You can control which
941definition `flex' uses by including one of the special directives
942`%pointer' or `%array' in the first (definitions) section of your flex
943input.  The default is `%pointer', unless you use the `-l' lex
944compatibility option, in which case `yytext' will be an array.  The
945advantage of using `%pointer' is substantially faster scanning and no
946buffer overflow when matching very large tokens (unless you run out of
947dynamic memory).  The disadvantage is that you are restricted in how
948your actions can modify `yytext' (*note Actions::), and calls to the
`unput()' function destroy the present contents of `yytext', which can
950be a considerable porting headache when moving between different `lex'
951versions.
952
953   The advantage of `%array' is that you can then modify `yytext' to
954your heart's content, and calls to `unput()' do not destroy `yytext'
955(*note Actions::).  Furthermore, existing `lex' programs sometimes
956access `yytext' externally using declarations of the form:
957
958         extern char yytext[];
959
960   This definition is erroneous when used with `%pointer', but correct
961for `%array'.
962
963   The `%array' declaration defines `yytext' to be an array of `YYLMAX'
964characters, which defaults to a fairly large value.  You can change the
965size by simply #define'ing `YYLMAX' to a different value in the first
966section of your `flex' input.  As mentioned above, with `%pointer'
967yytext grows dynamically to accommodate large tokens.  While this means
968your `%pointer' scanner can accommodate very large tokens (such as
969matching entire blocks of comments), bear in mind that each time the
970scanner must resize `yytext' it also must rescan the entire token from
971the beginning, so matching such tokens can prove slow.  `yytext'
972presently does _not_ dynamically grow if a call to `unput()' results in
973too much text being pushed back; instead, a run-time error results.
974
975   Also note that you cannot use `%array' with C++ scanner classes
976(*note Cxx::).
977
978
File: flex.info,  Node: Actions,  Next: Generated Scanner,  Prev: Matching,  Up: Top

8 Actions
*********
983
984Each pattern in a rule has a corresponding "action", which can be any
985arbitrary C statement.  The pattern ends at the first non-escaped
986whitespace character; the remainder of the line is its action.  If the
987action is empty, then when the pattern is matched the input token is
988simply discarded.  For example, here is the specification for a program
989which deletes all occurrences of `zap me' from its input:
990
991         %%
992         "zap me"
993
994   This example will copy all other characters in the input to the
995output since they will be matched by the default rule.
996
997   Here is a program which compresses multiple blanks and tabs down to a
998single blank, and throws away whitespace found at the end of a line:
999
1000         %%
1001         [ \t]+        putchar( ' ' );
1002         [ \t]+$       /* ignore this token */
1003
1004   If the action contains a `{', then the action spans till the
1005balancing `}' is found, and the action may cross multiple lines.
1006`flex' knows about C strings and comments and won't be fooled by braces
1007found within them, but also allows actions to begin with `%{' and will
1008consider the action to be all the text up to the next `%}' (regardless
1009of ordinary braces inside the action).
1010
1011   An action consisting solely of a vertical bar (`|') means "same as
1012the action for the next rule".  See below for an illustration.
1013
1014   Actions can include arbitrary C code, including `return' statements
1015to return a value to whatever routine called `yylex()'.  Each time
1016`yylex()' is called it continues processing tokens from where it last
1017left off until it either reaches the end of the file or executes a
1018return.
1019
1020   Actions are free to modify `yytext' except for lengthening it
(adding characters to its end; these will overwrite later characters in
1022the input stream).  This however does not apply when using `%array'
1023(*note Matching::). In that case, `yytext' may be freely modified in
1024any way.
1025
1026   Actions are free to modify `yyleng' except they should not do so if
1027the action also includes use of `yymore()' (see below).
1028
1029   There are a number of special directives which can be included
1030within an action:
1031
1032`ECHO'
1033     copies yytext to the scanner's output.
1034
1035`BEGIN'
1036     followed by the name of a start condition places the scanner in the
1037     corresponding start condition (see below).
1038
1039`REJECT'
1040     directs the scanner to proceed on to the "second best" rule which
1041     matched the input (or a prefix of the input).  The rule is chosen
1042     as described above in *note Matching::, and `yytext' and `yyleng'
1043     set up appropriately.  It may either be one which matched as much
1044     text as the originally chosen rule but came later in the `flex'
1045     input file, or one which matched less text.  For example, the
1046     following will both count the words in the input and call the
1047     routine `special()' whenever `frob' is seen:
1048
1049                      int word_count = 0;
1050              %%
1051
1052              frob        special(); REJECT;
1053              [^ \t\n]+   ++word_count;
1054
1055     Without the `REJECT', any occurrences of `frob' in the input would
1056     not be counted as words, since the scanner normally executes only
1057     one action per token.  Multiple uses of `REJECT' are allowed, each
1058     one finding the next best choice to the currently active rule.  For
1059     example, when the following scanner scans the token `abcd', it will
1060     write `abcdabcaba' to the output:
1061
1062              %%
1063              a        |
1064              ab       |
1065              abc      |
1066              abcd     ECHO; REJECT;
1067              .|\n     /* eat up any unmatched character */
1068
1069     The first three rules share the fourth's action since they use the
1070     special `|' action.
1071
1072     `REJECT' is a particularly expensive feature in terms of scanner
1073     performance; if it is used in _any_ of the scanner's actions it
1074     will slow down _all_ of the scanner's matching.  Furthermore,
1075     `REJECT' cannot be used with the `-Cf' or `-CF' options (*note
1076     Scanner Options::).
1077
1078     Note also that unlike the other special actions, `REJECT' is a
1079     _branch_.  Code immediately following it in the action will _not_
1080     be executed.
1081
1082`yymore()'
1083     tells the scanner that the next time it matches a rule, the
1084     corresponding token should be _appended_ onto the current value of
1085     `yytext' rather than replacing it.  For example, given the input
1086     `mega-kludge' the following will write `mega-mega-kludge' to the
1087     output:
1088
1089              %%
1090              mega-    ECHO; yymore();
1091              kludge   ECHO;
1092
1093     First `mega-' is matched and echoed to the output.  Then `kludge'
1094     is matched, but the previous `mega-' is still hanging around at the
1095     beginning of `yytext' so the `ECHO' for the `kludge' rule will
1096     actually write `mega-kludge'.
1097
1098   Two notes regarding use of `yymore()'.  First, `yymore()' depends on
1099the value of `yyleng' correctly reflecting the size of the current
1100token, so you must not modify `yyleng' if you are using `yymore()'.
1101Second, the presence of `yymore()' in the scanner's action entails a
1102minor performance penalty in the scanner's matching speed.
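
   The following C sketch models why `yymore()' depends on `yyleng':
the text of the next match is appended at the offset given by the
current length.  It is a toy model under stated assumptions, not code
taken from a generated scanner; `token_state', `model_yymore' and
`model_match' are names invented for this example.

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-ins for the scanner's token state (not flex's real variables). */
typedef struct {
    char   text[64];   /* plays the role of yytext */
    size_t leng;       /* plays the role of yyleng */
    bool   more;       /* set by model_yymore(), consumed by the next match */
} token_state;

/* What an action requests when it calls yymore(). */
static void model_yymore(token_state *t)
{
    t->more = true;
}

/* What happens when the next rule matches `lexeme'. */
static void model_match(token_state *t, const char *lexeme)
{
    size_t keep = t->more ? t->leng : 0;  /* old text survives only after yymore() */
    size_t n = strlen(lexeme);

    if (keep + n >= sizeof t->text)       /* truncate rather than overflow */
        n = sizeof t->text - 1 - keep;
    memcpy(t->text + keep, lexeme, n);
    t->leng = keep + n;
    t->text[t->leng] = '\0';
    t->more = false;
}
```

   In this model, changing `leng' between the two matches would move
the append offset, which is the reason actions that use `yymore()' must
leave `yyleng' alone.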
1103
1104   `yyless(n)' returns all but the first `n' characters of the current
1105token back to the input stream, where they will be rescanned when the
1106scanner looks for the next match.  `yytext' and `yyleng' are adjusted
1107appropriately (e.g., `yyleng' will now be equal to `n').  For example,
1108on the input `foobar' the following will write out `foobarbar':
1109
1110         %%
1111         foobar    ECHO; yyless(3);
1112         [a-z]+    ECHO;
1113
1114   An argument of 0 to `yyless()' will cause the entire current input
1115string to be scanned again.  Unless you've changed how the scanner will
1116subsequently process its input (using `BEGIN', for example), this will
1117result in an endless loop.
1118
1119   Note that `yyless()' is a macro and can only be used in the flex
1120input file, not from other source files.
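
   The effect of `yyless(n)' on the scan position can be modeled with
two offsets.  The sketch below is a simplified illustration, assuming
the whole input sits in memory; `scan_pos', `next_scan_offset' and
`model_yyless' are hypothetical names, and the real macro also adjusts
`yytext'.

```c
#include <stddef.h>

/* Toy scanner position over an in-memory input (not flex's real state). */
typedef struct {
    const char *input;   /* the whole input */
    size_t      start;   /* offset where the current token begins */
    size_t      leng;    /* length of the current token (role of yyleng) */
} scan_pos;

/* Offset at which the scanner will look for its next match. */
static size_t next_scan_offset(const scan_pos *p)
{
    return p->start + p->leng;
}

/* Keep the first n characters of the token and give the rest back. */
static void model_yyless(scan_pos *p, size_t n)
{
    if (n < p->leng)
        p->leng = n;     /* the next scan now resumes inside the old token */
}
```

   With `n' equal to 0 the next scan offset equals the token's own
start, which is why `yyless(0)' rescans the same text and loops forever
unless the scanner's state has changed.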
1121
1122   `unput(c)' puts the character `c' back onto the input stream.  It
1123will be the next character scanned.  The following action will take the
1124current token and cause it to be rescanned enclosed in parentheses.
1125
1126         {
1127         int i;
1128         /* Copy yytext because unput() trashes yytext */
1129         char *yycopy = strdup( yytext );
1130         unput( ')' );
1131         for ( i = yyleng - 1; i >= 0; --i )
1132             unput( yycopy[i] );
1133         unput( '(' );
1134         free( yycopy );
1135         }
1136
1137   Note that since each `unput()' puts the given character back at the
1138_beginning_ of the input stream, pushing back strings must be done
1139back-to-front.
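
   The back-to-front requirement follows from the last-in, first-out
order of pushed-back characters.  The C sketch below demonstrates this
with a small stand-alone pushback store; it is not flex's buffer
implementation, and the `pushback' type and `pb_*' functions are
invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Toy pushback store: the most recently pushed character is read first. */
typedef struct {
    char   buf[64];
    size_t top;
} pushback;

/* Push one character back; returns -1 if the store is full. */
static int pb_unput(pushback *pb, char c)
{
    if (pb->top == sizeof pb->buf)
        return -1;
    pb->buf[pb->top++] = c;
    return 0;
}

/* Read the next pushed-back character, or -1 if none remain. */
static int pb_input(pushback *pb)
{
    return pb->top ? (unsigned char)pb->buf[--pb->top] : -1;
}

/* Push a whole string back so it is later read in its original order. */
static int pb_unput_string(pushback *pb, const char *s)
{
    for (size_t i = strlen(s); i > 0; --i)   /* last character first */
        if (pb_unput(pb, s[i - 1]) != 0)
            return -1;
    return 0;
}
```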
1140
1141   An important potential problem when using `unput()' is that if you
1142are using `%pointer' (the default), a call to `unput()' _destroys_ the
1143contents of `yytext', starting with its rightmost character and
1144devouring one character to the left with each call.  If you need the
1145value of `yytext' preserved after a call to `unput()' (as in the above
1146example), you must either first copy it elsewhere, or build your
1147scanner using `%array' instead (*note Matching::).
1148
1149   Finally, note that you cannot put back `EOF' to attempt to mark the
1150input stream with an end-of-file.
1151
1152   `input()' reads the next character from the input stream.  For
1153example, the following is one way to eat up C comments:
1154
1155         %%
1156         "/*"        {
1157                     register int c;
1158
1159                     for ( ; ; )
1160                         {
1161                         while ( (c = input()) != '*' &&
1162                                 c != EOF )
1163                             ;    /* eat up text of comment */
1164
1165                         if ( c == '*' )
1166                             {
1167                             while ( (c = input()) == '*' )
1168                                 ;
1169                             if ( c == '/' )
1170                                 break;    /* found the end */
1171                             }
1172
1173                         if ( c == EOF )
1174                             {
1175                             error( "EOF in comment" );
1176                             break;
1177                             }
1178                         }
1179                     }
1180
1181   (Note that if the scanner is compiled using `C++', then `input()' is
1182instead referred to as yyinput(), in order to avoid a name clash with
1183the `C++' stream by the name of `input'.)
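
   The comment-eating loop above can be tried outside a scanner by
substituting a string reader for `input()'.  In this sketch
`next_char' is a hypothetical stand-in for `input()' and
`skip_comment' is an invented name; the loop structure is the same as
in the action above, with the `error()' call replaced by a return
value.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for input(): next character of a string, or EOF. */
static int next_char(const char **p)
{
    return **p ? (unsigned char)*(*p)++ : EOF;
}

/*
 * Skip the body of a C comment whose opening delimiter was already
 * consumed.  Returns 0 once the closing delimiter has been read, or -1
 * if the input ends inside the comment.
 */
static int skip_comment(const char **p)
{
    int c;

    for ( ; ; ) {
        while ((c = next_char(p)) != '*' && c != EOF)
            ;                   /* eat up text of comment */

        if (c == '*') {
            while ((c = next_char(p)) == '*')
                ;               /* skip over a run of stars */
            if (c == '/')
                return 0;       /* found the end */
        }

        if (c == EOF)
            return -1;          /* EOF in comment */
    }
}
```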
1184
1185   `YY_FLUSH_BUFFER;' flushes the scanner's internal buffer so that the
1186next time the scanner attempts to match a token, it will first refill
1187the buffer using `YY_INPUT()' (*note Generated Scanner::).  This action
is a special case of the more general `yy_flush_buffer()' function,
described below (*note Multiple Input Buffers::).
1190
1191   `yyterminate()' can be used in lieu of a return statement in an
1192action.  It terminates the scanner and returns a 0 to the scanner's
1193caller, indicating "all done".  By default, `yyterminate()' is also
1194called when an end-of-file is encountered.  It is a macro and may be
1195redefined.
1196
1197
File: flex.info,  Node: Generated Scanner,  Next: Start Conditions,  Prev: Actions,  Up: Top

9 The Generated Scanner
***********************
1202
1203The output of `flex' is the file `lex.yy.c', which contains the
1204scanning routine `yylex()', a number of tables used by it for matching
1205tokens, and a number of auxiliary routines and macros.  By default,
1206`yylex()' is declared as follows:
1207
1208         int yylex()
1209             {
1210             ... various definitions and the actions in here ...
1211             }
1212
1213   (If your environment supports function prototypes, then it will be
1214`int yylex( void )'.)  This definition may be changed by defining the
1215`YY_DECL' macro.  For example, you could use:
1216
1217         #define YY_DECL float lexscan( a, b ) float a, b;
1218
1219   to give the scanning routine the name `lexscan', returning a float,
1220and taking two floats as arguments.  Note that if you give arguments to
1221the scanning routine using a K&R-style/non-prototyped function
1222declaration, you must terminate the definition with a semi-colon (;).
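
   For comparison, the same routine can be declared with a prototype,
in which case no trailing semicolon is written.  The fragment below is
a stand-alone sketch: in a real scanner `flex' supplies the function
body after `YY_DECL', so the body shown here is only a placeholder that
lets the fragment compile by itself.

```c
/* Prototype-style replacement for the K&R definition shown above. */
#define YY_DECL float lexscan(float a, float b)

/*
 * The generated scanner places YY_DECL directly in front of the scanning
 * routine's body.  The body below is a placeholder for illustration.
 */
YY_DECL
{
    return a + b;
}
```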
1223
   `flex' generates `C99' function definitions by default.  However, flex
1225does have the ability to generate obsolete, er, `traditional', function
1226definitions. This is to support bootstrapping gcc on old systems.
1227Unfortunately, traditional definitions prevent us from using any
1228standard data types smaller than int (such as short, char, or bool) as
1229function arguments.  For this reason, future versions of `flex' may
1230generate standard C99 code only, leaving K&R-style functions to the
1231historians.  Currently, if you do *not* want `C99' definitions, then
1232you must use `%option noansi-definitions'.
1233
1234   Whenever `yylex()' is called, it scans tokens from the global input
1235file `yyin' (which defaults to stdin).  It continues until it either
1236reaches an end-of-file (at which point it returns the value 0) or one
1237of its actions executes a `return' statement.
1238
1239   If the scanner reaches an end-of-file, subsequent calls are undefined
1240unless either `yyin' is pointed at a new input file (in which case
1241scanning continues from that file), or `yyrestart()' is called.
1242`yyrestart()' takes one argument, a `FILE *' pointer (which can be
1243NULL, if you've set up `YY_INPUT' to scan from a source other than
1244`yyin'), and initializes `yyin' for scanning from that file.
1245Essentially there is no difference between just assigning `yyin' to a
new input file and using `yyrestart()' to do so; the latter is available
1247for compatibility with previous versions of `flex', and because it can
1248be used to switch input files in the middle of scanning.  It can also
1249be used to throw away the current input buffer, by calling it with an
1250argument of `yyin'; but it would be better to use `YY_FLUSH_BUFFER'
1251(*note Actions::).  Note that `yyrestart()' does _not_ reset the start
1252condition to `INITIAL' (*note Start Conditions::).
1253
1254   If `yylex()' stops scanning due to executing a `return' statement in
1255one of the actions, the scanner may then be called again and it will
1256resume scanning where it left off.
1257
1258   By default (and for purposes of efficiency), the scanner uses
1259block-reads rather than simple `getc()' calls to read characters from
1260`yyin'.  The nature of how it gets its input can be controlled by
1261defining the `YY_INPUT' macro.  The calling sequence for `YY_INPUT()'
1262is `YY_INPUT(buf,result,max_size)'.  Its action is to place up to
1263`max_size' characters in the character array `buf' and return in the
1264integer variable `result' either the number of characters read or the
1265constant `YY_NULL' (0 on Unix systems) to indicate `EOF'.  The default
1266`YY_INPUT' reads from the global file-pointer `yyin'.
1267
1268   Here is a sample definition of `YY_INPUT' (in the definitions
1269section of the input file):
1270
1271         %{
1272         #define YY_INPUT(buf,result,max_size) \
1273             { \
1274             int c = getchar(); \
1275             result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
1276             }
1277         %}
1278
1279   This definition will change the input processing to occur one
1280character at a time.
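
   A block-oriented `YY_INPUT' follows the same contract: fill `buf'
with at most `max_size' characters and store the count, or `YY_NULL',
in `result'.  The sketch below reads from an in-memory string instead
of `yyin'.  The variables `src_ptr' and `src_left' are hypothetical
names for this example, and `YY_NULL' is defined here only so that the
fragment compiles outside a generated scanner.

```c
#include <stddef.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* flex defines this itself; repeated for stand-alone use */
#endif

/* Hypothetical in-memory source; these names are not part of flex. */
static const char *src_ptr  = "";
static size_t      src_left = 0;

/* Copy up to max_size bytes into buf; report the count, or YY_NULL at end. */
#define YY_INPUT(buf, result, max_size) \
    { \
    size_t n_ = src_left < (size_t)(max_size) ? src_left : (size_t)(max_size); \
    memcpy((buf), src_ptr, n_); \
    src_ptr  += n_; \
    src_left -= n_; \
    (result) = n_ ? (int)n_ : YY_NULL; \
    }
```

   In a `flex' input file the macro would sit inside `%{ ... %}' in the
definitions section, as in the sample above.  For scanning strings in
practice, the buffer routines referenced later in this chapter are
usually the better choice.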
1281
1282   When the scanner receives an end-of-file indication from YY_INPUT, it
1283then checks the `yywrap()' function.  If `yywrap()' returns false
1284(zero), then it is assumed that the function has gone ahead and set up
1285`yyin' to point to another input file, and scanning continues.  If it
1286returns true (non-zero), then the scanner terminates, returning 0 to
1287its caller.  Note that in either case, the start condition remains
1288unchanged; it does _not_ revert to `INITIAL'.
1289
1290   If you do not supply your own version of `yywrap()', then you must
1291either use `%option noyywrap' (in which case the scanner behaves as
1292though `yywrap()' returned 1), or you must link with `-lfl' to obtain
1293the default version of the routine, which always returns 1.
1294
1295   For scanning from in-memory buffers (e.g., scanning strings), see
1296*note Scanning Strings::. *Note Multiple Input Buffers::.
1297
1298   The scanner writes its `ECHO' output to the `yyout' global (default,
1299`stdout'), which may be redefined by the user simply by assigning it to
1300some other `FILE' pointer.
1301
1302
File: flex.info,  Node: Start Conditions,  Next: Multiple Input Buffers,  Prev: Generated Scanner,  Up: Top

10 Start Conditions
*******************
1307
1308`flex' provides a mechanism for conditionally activating rules.  Any
1309rule whose pattern is prefixed with `<sc>' will only be active when the
1310scanner is in the "start condition" named `sc'.  For example,
1311
1312         <STRING>[^"]*        { /* eat up the string body ... */
1313                     ...
1314                     }
1315
1316   will be active only when the scanner is in the `STRING' start
1317condition, and
1318
1319         <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
1320                     ...
1321                     }
1322
1323   will be active only when the current start condition is either
1324`INITIAL', `STRING', or `QUOTE'.
1325
1326   Start conditions are declared in the definitions (first) section of
1327the input using unindented lines beginning with either `%s' or `%x'
1328followed by a list of names.  The former declares "inclusive" start
1329conditions, the latter "exclusive" start conditions.  A start condition
1330is activated using the `BEGIN' action.  Until the next `BEGIN' action
1331is executed, rules with the given start condition will be active and
1332rules with other start conditions will be inactive.  If the start
1333condition is inclusive, then rules with no start conditions at all will
1334also be active.  If it is exclusive, then _only_ rules qualified with
1335the start condition will be active.  A set of rules contingent on the
1336same exclusive start condition describe a scanner which is independent
1337of any of the other rules in the `flex' input.  Because of this,
1338exclusive start conditions make it easy to specify "mini-scanners"
1339which scan portions of the input that are syntactically different from
1340the rest (e.g., comments).
1341
1342   If the distinction between inclusive and exclusive start conditions
1343is still a little vague, here's a simple example illustrating the
1344connection between the two.  The set of rules:
1345
1346         %s example
1347         %%
1348
1349         <example>foo   do_something();
1350
1351         bar            something_else();
1352
1353   is equivalent to
1354
1355         %x example
1356         %%
1357
1358         <example>foo   do_something();
1359
1360         <INITIAL,example>bar    something_else();
1361
1362   Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
1363second example wouldn't be active (i.e., couldn't match) when in start
1364condition `example'.  If we just used `<example>' to qualify `bar',
1365though, then it would only be active in `example' and not in `INITIAL',
1366while in the first example it's active in both, because in the first
example the `example' start condition is an inclusive (`%s') start
1368condition.
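
   The activation rule for inclusive and exclusive start conditions can
be written as a small predicate.  This C sketch is a model of the rule
as described above, not code from a generated scanner; `rule_is_active'
and its parameters are invented for the example, and the `<*>'
specifier is not modeled.

```c
#include <stdbool.h>

enum { INITIAL = 0 };   /* the original state; other values are arbitrary here */

/*
 * rule_scs, nscs:       start conditions named in the rule's <...> prefix
 *                       (nscs == 0 means the rule has no prefix).
 * current:              the scanner's current start condition.
 * current_is_exclusive: true if `current' was declared with %x.
 */
static bool rule_is_active(const int *rule_scs, int nscs,
                           int current, bool current_is_exclusive)
{
    if (nscs == 0)
        /* Unqualified rules are active in INITIAL and in %s conditions. */
        return !current_is_exclusive;

    for (int i = 0; i < nscs; ++i)
        if (rule_scs[i] == current)
            return true;
    return false;
}
```

   Under this model the unqualified `bar' rule is active in `example'
only when `example' is declared with `%s', matching the two equivalent
rule sets shown above.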
1369
1370   Also note that the special start-condition specifier `<*>' matches
1371every start condition.  Thus, the above example could also have been
1372written:
1373
1374         %x example
1375         %%
1376
1377         <example>foo   do_something();
1378
1379         <*>bar    something_else();
1380
1381   The default rule (to `ECHO' any unmatched character) remains active
1382in start conditions.  It is equivalent to:
1383
1384         <*>.|\n     ECHO;
1385
1386   `BEGIN(0)' returns to the original state where only the rules with
1387no start conditions are active.  This state can also be referred to as
1388the start-condition `INITIAL', so `BEGIN(INITIAL)' is equivalent to
1389`BEGIN(0)'.  (The parentheses around the start condition name are not
1390required but are considered good style.)
1391
1392   `BEGIN' actions can also be given as indented code at the beginning
1393of the rules section.  For example, the following will cause the scanner
1394to enter the `SPECIAL' start condition whenever `yylex()' is called and
1395the global variable `enter_special' is true:
1396
1397                 int enter_special;
1398
1399         %x SPECIAL
1400         %%
1401                 if ( enter_special )
1402                     BEGIN(SPECIAL);
1403
1404         <SPECIAL>blahblahblah
1405         ...more rules follow...
1406
1407   To illustrate the uses of start conditions, here is a scanner which
1408provides two different interpretations of a string like `123.456'.  By
1409default it will treat it as three tokens, the integer `123', a dot
1410(`.'), and the integer `456'.  But if the string is preceded earlier in
1411the line by the string `expect-floats' it will treat it as a single
1412token, the floating-point number `123.456':
1413
1414         %{
1415         #include <math.h>
1416         %}
1417         %s expect
1418
1419         %%
1420         expect-floats        BEGIN(expect);
1421
         <expect>[0-9]+"."[0-9]+    {
1423                     printf( "found a float, = %f\n",
1424                             atof( yytext ) );
1425                     }
1426         <expect>\n           {
1427                     /* that's the end of the line, so
                      * we need another "expect-floats"
1429                      * before we'll recognize any more
1430                      * numbers
1431                      */
1432                     BEGIN(INITIAL);
1433                     }
1434
1435         [0-9]+      {
1436                     printf( "found an integer, = %d\n",
1437                             atoi( yytext ) );
1438                     }
1439
1440         "."         printf( "found a dot\n" );
1441
1442   Here is a scanner which recognizes (and discards) C comments while
1443maintaining a count of the current input line.
1444
1445         %x comment
1446         %%
1447                 int line_num = 1;
1448
1449         "/*"         BEGIN(comment);
1450
1451         <comment>[^*\n]*        /* eat anything that's not a '*' */
1452         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1453         <comment>\n             ++line_num;
1454         <comment>"*"+"/"        BEGIN(INITIAL);
1455
1456   This scanner goes to a bit of trouble to match as much text as
1457possible with each rule.  In general, when attempting to write a
high-speed scanner, try to match as much as possible in each rule, as it's
1459a big win.
1460
   Note that start condition names are really integer values and can
1462be stored as such.  Thus, the above could be extended in the following
1463fashion:
1464
1465         %x comment foo
1466         %%
1467                 int line_num = 1;
1468                 int comment_caller;
1469
1470         "/*"         {
1471                      comment_caller = INITIAL;
1472                      BEGIN(comment);
1473                      }
1474
1475         ...
1476
1477         <foo>"/*"    {
1478                      comment_caller = foo;
1479                      BEGIN(comment);
1480                      }
1481
1482         <comment>[^*\n]*        /* eat anything that's not a '*' */
1483         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1484         <comment>\n             ++line_num;
1485         <comment>"*"+"/"        BEGIN(comment_caller);
1486
1487   Furthermore, you can access the current start condition using the
1488integer-valued `YY_START' macro.  For example, the above assignments to
1489`comment_caller' could instead be written
1490
1491         comment_caller = YY_START;
1492
1493   Flex provides `YYSTATE' as an alias for `YY_START' (since that is
1494what's used by AT&T `lex').
1495
1496   For historical reasons, start conditions do not have their own
1497name-space within the generated scanner. The start condition names are
1498unmodified in the generated scanner and generated header.  *Note
1499option-header::. *Note option-prefix::.
1500
1501   Finally, here's an example of how to match C-style quoted strings
1502using exclusive start conditions, including expanded escape sequences
1503(but not including checking for a string that's too long):
1504
1505         %x str
1506
1507         %%
1508                 char string_buf[MAX_STR_CONST];
1509                 char *string_buf_ptr;
1510
1511
1512         \"      string_buf_ptr = string_buf; BEGIN(str);
1513
1514         <str>\"        { /* saw closing quote - all done */
1515                 BEGIN(INITIAL);
1516                 *string_buf_ptr = '\0';
1517                 /* return string constant token type and
1518                  * value to parser
1519                  */
1520                 }
1521
1522         <str>\n        {
1523                 /* error - unterminated string constant */
1524                 /* generate error message */
1525                 }
1526
         <str>\\[0-7]{1,3} {
                 /* octal escape sequence */
                 unsigned int result;

                 (void) sscanf( yytext + 1, "%o", &result );

                 if ( result > 0xff )
                         {
                         /* error, constant is out-of-bounds */
                         }
                 else
                         *string_buf_ptr++ = result;
                 }
1538
1539         <str>\\[0-9]+ {
1540                 /* generate error - bad escape sequence; something
1541                  * like '\48' or '\0777777'
1542                  */
1543                 }
1544
1545         <str>\\n  *string_buf_ptr++ = '\n';
1546         <str>\\t  *string_buf_ptr++ = '\t';
1547         <str>\\r  *string_buf_ptr++ = '\r';
1548         <str>\\b  *string_buf_ptr++ = '\b';
1549         <str>\\f  *string_buf_ptr++ = '\f';
1550
1551         <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
1552
1553         <str>[^\\\n\"]+        {
1554                 char *yptr = yytext;
1555
1556                 while ( *yptr )
1557                         *string_buf_ptr++ = *yptr++;
1558                 }
1559
1560   Often, such as in some of the examples above, you wind up writing a
1561whole bunch of rules all preceded by the same start condition(s).  Flex
1562makes this a little easier and cleaner by introducing a notion of start
1563condition "scope".  A start condition scope is begun with:
1564
1565         <SCs>{
1566
1567   where `SCs' is a list of one or more start conditions.  Inside the
start condition scope, every rule automatically has the prefix `<SCs>'
1569applied to it, until a `}' which matches the initial `{'.  So, for
1570example,
1571
1572         <ESC>{
1573             "\\n"   return '\n';
1574             "\\r"   return '\r';
1575             "\\f"   return '\f';
1576             "\\0"   return '\0';
1577         }
1578
1579   is equivalent to:
1580
1581         <ESC>"\\n"  return '\n';
1582         <ESC>"\\r"  return '\r';
1583         <ESC>"\\f"  return '\f';
1584         <ESC>"\\0"  return '\0';
1585
1586   Start condition scopes may be nested.
1587
1588   The following routines are available for manipulating stacks of
1589start conditions:
1590
1591 -- Function: void yy_push_state ( int `new_state' )
1592     pushes the current start condition onto the top of the start
1593     condition stack and switches to `new_state' as though you had used
1594     `BEGIN new_state' (recall that start condition names are also
1595     integers).
1596
1597 -- Function: void yy_pop_state ()
1598     pops the top of the stack and switches to it via `BEGIN'.
1599
1600 -- Function: int yy_top_state ()
1601     returns the top of the stack without altering the stack's contents.
1602
1603   The start condition stack grows dynamically and so has no built-in
1604size limitation.  If memory is exhausted, program execution aborts.
1605
1606   To use start condition stacks, your scanner must include a `%option
1607stack' directive (*note Scanner Options::).
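
   The behavior of the three stack routines can be modeled in plain C
as follows.  This is a hypothetical stand-alone model: a real scanner
obtains `yy_push_state()', `yy_pop_state()' and `yy_top_state()' from
`flex' itself, and the `sc_*' names and `current_sc' variable here are
invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

static int    current_sc = 0;    /* plays the role of YY_START; 0 is INITIAL */
static int   *sc_stack   = NULL; /* grows on demand, so no fixed depth limit */
static size_t sc_depth   = 0;
static size_t sc_cap     = 0;

/* Save the current start condition, then switch to new_state. */
static void sc_push(int new_state)
{
    if (sc_depth == sc_cap) {
        size_t cap = sc_cap ? 2 * sc_cap : 8;
        int *p = realloc(sc_stack, cap * sizeof *p);

        if (p == NULL) {         /* memory exhausted: abort execution */
            fputs("out of memory expanding start condition stack\n", stderr);
            exit(EXIT_FAILURE);
        }
        sc_stack = p;
        sc_cap = cap;
    }
    sc_stack[sc_depth++] = current_sc;
    current_sc = new_state;      /* as though BEGIN(new_state) had been used */
}

/* Pop the saved start condition and switch back to it. */
static void sc_pop(void)
{
    if (sc_depth == 0) {
        fputs("start condition stack underflow\n", stderr);
        exit(EXIT_FAILURE);
    }
    current_sc = sc_stack[--sc_depth];
}

/* Peek at the saved start condition; the stack must not be empty. */
static int sc_top(void)
{
    return sc_stack[sc_depth - 1];
}
```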
1608
1609
File: flex.info,  Node: Multiple Input Buffers,  Next: EOF,  Prev: Start Conditions,  Up: Top

11 Multiple Input Buffers
*************************
1614
1615Some scanners (such as those which support "include" files) require
1616reading from several input streams.  As `flex' scanners do a large
1617amount of buffering, one cannot control where the next input will be
1618read from by simply writing a `YY_INPUT()' which is sensitive to the
1619scanning context.  `YY_INPUT()' is only called when the scanner reaches
1620the end of its buffer, which may be a long time after scanning a
1621statement such as an `include' statement which requires switching the
1622input source.
1623
1624   To negotiate these sorts of problems, `flex' provides a mechanism
1625for creating and switching between multiple input buffers.  An input
1626buffer is created by using:
1627
1628 -- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )
1629
1630   which takes a `FILE' pointer and a size and creates a buffer
1631associated with the given file and large enough to hold `size'
1632characters (when in doubt, use `YY_BUF_SIZE' for the size).  It returns
1633a `YY_BUFFER_STATE' handle, which may then be passed to other routines
1634(see below).  The `YY_BUFFER_STATE' type is a pointer to an opaque
1635`struct yy_buffer_state' structure, so you may safely initialize
1636`YY_BUFFER_STATE' variables to `((YY_BUFFER_STATE) 0)' if you wish, and
1637also refer to the opaque structure in order to correctly declare input
1638buffers in source files other than that of your scanner.  Note that the
1639`FILE' pointer in the call to `yy_create_buffer' is only used as the
1640value of `yyin' seen by `YY_INPUT'.  If you redefine `YY_INPUT()' so it
1641no longer uses `yyin', then you can safely pass a NULL `FILE' pointer to
1642`yy_create_buffer'.  You select a particular buffer to scan from using:
1643
1644 -- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )
1645
1646   The above function switches the scanner's input buffer so subsequent
1647tokens will come from `new_buffer'.  Note that `yy_switch_to_buffer()'
1648may be used by `yywrap()' to set things up for continued scanning,
1649instead of opening a new file and pointing `yyin' at it. If you are
1650looking for a stack of input buffers, then you want to use
1651`yypush_buffer_state()' instead of this function. Note also that
1652switching input sources via either `yy_switch_to_buffer()' or
1653`yywrap()' does _not_ change the start condition.
1654
1655 -- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )
1656
   is used to reclaim the storage associated with a buffer.  (`buffer'
can be NULL, in which case the routine does nothing.)  To switch to a
new buffer while saving the current one for later, use:
1660
1661 -- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )
1662
1663   This function pushes the new buffer state onto an internal stack.
1664The pushed state becomes the new current state. The stack is maintained
1665by flex and will grow as required. This function is intended to be used
1666instead of `yy_switch_to_buffer', when you want to change states, but
1667preserve the current state for later use.
1668
1669 -- Function: void yypop_buffer_state ( )
1670
1671   This function removes the current state from the top of the stack,
1672and deletes it by calling `yy_delete_buffer'.  The next state on the
1673stack, if any, becomes the new current state.
1674
1675 -- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )
1676
1677   This function discards the buffer's contents, so the next time the
1678scanner attempts to match a token from the buffer, it will first fill
1679the buffer anew using `YY_INPUT()'.
1680
1681 -- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )
1682
1683   is an alias for `yy_create_buffer()', provided for compatibility
1684with the C++ use of `new' and `delete' for creating and destroying
1685dynamic objects.
1686
   The `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE' handle to
the current buffer.  It should not be used as an lvalue.
1689
1690   Here are two examples of using these features for writing a scanner
1691which expands include files (the `<<EOF>>' feature is discussed below).
1692
1693   This first example uses yypush_buffer_state and yypop_buffer_state.
1694Flex maintains the stack internally.
1695
         /* the "incl" state is used for picking up the name
          * of an include file
          */
         %x incl
         %%
         include             BEGIN(incl);

         [a-z]+              ECHO;
         [^a-z\n]*\n?        ECHO;

         <incl>[ \t]*      /* eat the whitespace */
         <incl>[^ \t\n]+   { /* got the include file name */
                 yyin = fopen( yytext, "r" );

                 if ( ! yyin )
                     error( ... );

                 yypush_buffer_state(
                     yy_create_buffer( yyin, YY_BUF_SIZE ) );

                 BEGIN(INITIAL);
                 }

         <<EOF>> {
                 yypop_buffer_state();

                 if ( !YY_CURRENT_BUFFER )
                     {
                     yyterminate();
                     }
                 }
1726
1727   The second example, below, does the same thing as the previous
1728example did, but manages its own input buffer stack manually (instead
1729of letting flex do it).
1730
         /* the "incl" state is used for picking up the name
          * of an include file
          */
         %x incl

         %{
         #define MAX_INCLUDE_DEPTH 10
         YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
         int include_stack_ptr = 0;
         %}

         %%
         include             BEGIN(incl);

         [a-z]+              ECHO;
         [^a-z\n]*\n?        ECHO;

         <incl>[ \t]*      /* eat the whitespace */
         <incl>[^ \t\n]+   { /* got the include file name */
                 if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                     {
                     fprintf( stderr, "Includes nested too deeply\n" );
                     exit( 1 );
                     }

                 include_stack[include_stack_ptr++] =
                     YY_CURRENT_BUFFER;

                 yyin = fopen( yytext, "r" );

                 if ( ! yyin )
                     error( ... );

                 yy_switch_to_buffer(
                     yy_create_buffer( yyin, YY_BUF_SIZE ) );

                 BEGIN(INITIAL);
                 }

         <<EOF>> {
                 if ( --include_stack_ptr < 0 )
                     {
                     yyterminate();
                     }

                 else
                     {
                     yy_delete_buffer( YY_CURRENT_BUFFER );
                     yy_switch_to_buffer(
                          include_stack[include_stack_ptr] );
                     }
                 }
1783
1784   The following routines are available for setting up input buffers for
1785scanning in-memory strings instead of files.  All of them create a new
1786input buffer for scanning the string, and return a corresponding
1787`YY_BUFFER_STATE' handle (which you should delete with
1788`yy_delete_buffer()' when done with it).  They also switch to the new
1789buffer using `yy_switch_to_buffer()', so the next call to `yylex()'
1790will start scanning the string.
1791
1792 -- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
1793     scans a NUL-terminated string.
1794
1795 -- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int
1796          len )
1797     scans `len' bytes (including possibly `NUL's) starting at location
1798     `bytes'.
1799
1800   Note that both of these functions create and scan a _copy_ of the
1801string or bytes.  (This may be desirable, since `yylex()' modifies the
1802contents of the buffer it is scanning.)  You can avoid the copy by
1803using:
1804
1805 -- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t
1806          size)
1807     which scans in place the buffer starting at `base', consisting of
1808     `size' bytes, the last two bytes of which _must_ be
1809     `YY_END_OF_BUFFER_CHAR' (ASCII NUL).  These last two bytes are not
1810     scanned; thus, scanning consists of `base[0]' through
1811     `base[size-2]', inclusive.
1812
1813   If you fail to set up `base' in this manner (i.e., forget the final
1814two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()' returns a
1815NULL pointer instead of creating a new input buffer.
1816
1817 -- Data type: yy_size_t
1818     is an integral type to which you can cast an integer expression
1819     reflecting the size of the buffer.
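
   The layout that `yy_scan_buffer()' expects can be prepared with a
small helper.  The following is a minimal sketch in plain C; the name
`make_scan_buffer' is hypothetical and not part of `flex'.  It copies
`len' bytes and appends the two NUL bytes the text above requires,
reporting the total size to pass along.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd buffer holding text[0..len-1]
 * followed by two NUL bytes, and stores the total size (len + 2) in
 * *size_out.  Returns NULL if allocation fails.  The caller frees it. */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base;

    assert(text != NULL && size_out != NULL);

    base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len] = '\0';        /* first end-of-buffer byte */
    base[len + 1] = '\0';    /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}
```

   Inside a scanner, the result would then be handed over with
something like `yy_scan_buffer( base, (yy_size_t) size )'.  Because the
buffer is scanned in place, it must stay allocated until the
corresponding `YY_BUFFER_STATE' has been deleted.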
1820
1821
1822File: flex.info,  Node: EOF,  Next: Misc Macros,  Prev: Multiple Input Buffers,  Up: Top
1823
12 End-of-File Rules
********************
1826
1827The special rule `<<EOF>>' indicates actions which are to be taken when
1828an end-of-file is encountered and `yywrap()' returns non-zero (i.e.,
1829indicates no further files to process).  The action must finish by
1830doing one of the following things:
1831
1832   * assigning `yyin' to a new input file (in previous versions of
1833     `flex', after doing the assignment you had to call the special
1834     action `YY_NEW_FILE'.  This is no longer necessary.)
1835
1836   * executing a `return' statement;
1837
1838   * executing the special `yyterminate()' action.
1839
1840   * or, switching to a new buffer using `yy_switch_to_buffer()' as
1841     shown in the example above.
1842
1843   <<EOF>> rules may not be used with other patterns; they may only be
1844qualified with a list of start conditions.  If an unqualified <<EOF>>
1845rule is given, it applies to _all_ start conditions which do not
1846already have <<EOF>> actions.  To specify an <<EOF>> rule for only the
1847initial start condition, use:
1848
1849         <INITIAL><<EOF>>
1850
1851   These rules are useful for catching things like unclosed comments.
1852An example:
1853
         %x quote
         %%

         ...other rules for dealing with quotes...

         <quote><<EOF>>   {
                  error( "unterminated quote" );
                  yyterminate();
                  }
         <<EOF>>  {
                  if ( *++filelist )
                      yyin = fopen( *filelist, "r" );
                  else
                      yyterminate();
                  }
1869
1870
1871File: flex.info,  Node: Misc Macros,  Next: User Values,  Prev: EOF,  Up: Top
1872
13 Miscellaneous Macros
***********************
1875
1876The macro `YY_USER_ACTION' can be defined to provide an action which is
1877always executed prior to the matched rule's action.  For example, it
1878could be #define'd to call a routine to convert yytext to lower-case.
1879When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
1880number of the matched rule (rules are numbered starting with 1).
1881Suppose you want to profile how often each of your rules is matched.
1882The following would do the trick:
1883
1884         #define YY_USER_ACTION ++ctr[yy_act]
1885
   where `ctr' is an array to hold the counts for the different rules.
Note that the macro `YY_NUM_RULES' gives the total number of rules
(including the default rule), even if you use `-s', so a correct
declaration for `ctr' is:
1890
1891         int ctr[YY_NUM_RULES];
1892
1893   The macro `YY_USER_INIT' may be defined to provide an action which
1894is always executed before the first scan (and before the scanner's
1895internal initializations are done).  For example, it could be used to
1896call a routine to read in a data table or open a logging file.
1897
1898   The macro `yy_set_interactive(is_interactive)' can be used to
1899control whether the current buffer is considered "interactive".  An
1900interactive buffer is processed more slowly, but must be used when the
1901scanner's input source is indeed interactive to avoid problems due to
1902waiting to fill buffers (see the discussion of the `-I' flag in *note
1903Scanner Options::).  A non-zero value in the macro invocation marks the
1904buffer as interactive, a zero value as non-interactive.  Note that use
1905of this macro overrides `%option always-interactive' or `%option
1906never-interactive' (*note Scanner Options::).  `yy_set_interactive()'
1907must be invoked prior to beginning to scan the buffer that is (or is
1908not) to be considered interactive.
1909
1910   The macro `yy_set_bol(at_bol)' can be used to control whether the
1911current buffer's scanning context for the next token match is done as
1912though at the beginning of a line.  A non-zero macro argument makes
1913rules anchored with `^' active, while a zero argument makes `^' rules
1914inactive.
1915
1916   The macro `YY_AT_BOL()' returns true if the next token scanned from
1917the current buffer will have `^' rules active, false otherwise.
1918
   In the generated scanner, the actions are all gathered in one large
switch statement and separated using `YY_BREAK', which may be
redefined.  By default, it is simply a `break', to separate each rule's
action from the following rule's.  Redefining `YY_BREAK' allows, for
example, C++ users to #define YY_BREAK to do nothing (while being very
careful that every rule ends with a `break' or a `return'!) and so
avoid unreachable-statement warnings in rules whose action ends with
`return', which makes the following `YY_BREAK' unreachable.
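
   The effect of an empty `YY_BREAK' can be seen in a simplified model
of the action switch.  This is a sketch only: `run_action', `echoed'
and the token codes 257 and 258 are hypothetical, and the real
generated switch contains considerably more than this.

```c
#include <assert.h>

/* Redefine YY_BREAK to nothing, as the text suggests a C++ user might. */
#define YY_BREAK

static int echoed = 0;   /* hypothetical side effect of a non-returning action */

/* Simplified model of the generated action switch; `act' plays the
 * role of yy_act.  Returns a token code, or 0 when the action leaves
 * the switch without returning. */
static int run_action(int act)
{
    assert(act >= 1 && act <= 3);

    switch (act) {
    case 1:
        return 257;      /* ends with return, so nothing unreachable follows */
        YY_BREAK
    case 2:
        ++echoed;
        break;           /* must be explicit now that YY_BREAK is empty */
        YY_BREAK
    case 3:
        return 258;
        YY_BREAK
    }
    return 0;
}
```

   If the explicit `break' in the second case were omitted, control
would fall through into the third action, which is the hazard the
parenthetical warning above refers to.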
1927
1928
1929File: flex.info,  Node: User Values,  Next: Yacc,  Prev: Misc Macros,  Up: Top
1930
14 Values Available To the User
*******************************
1933
1934This chapter summarizes the various values available to the user in the
1935rule actions.
1936
1937`char *yytext'
1938     holds the text of the current token.  It may be modified but not
1939     lengthened (you cannot append characters to the end).
1940
1941     If the special directive `%array' appears in the first section of
1942     the scanner description, then `yytext' is instead declared `char
1943     yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
1944     redefine in the first section if you don't like the default value
1945     (generally 8KB).  Using `%array' results in somewhat slower
1946     scanners, but the value of `yytext' becomes immune to calls to
1947     `unput()', which potentially destroy its value when `yytext' is a
1948     character pointer.  The opposite of `%array' is `%pointer', which
1949     is the default.
1950
1951     You cannot use `%array' when generating C++ scanner classes (the
1952     `-+' flag).
1953
1954`int yyleng'
1955     holds the length of the current token.
1956
1957`FILE *yyin'
     is the file which by default `flex' reads from.  It may be
     reassigned but doing so only makes sense before scanning begins or
     after an EOF has been encountered.  Changing it in the midst of
     scanning will have unexpected results since `flex' buffers its
     input; use `yyrestart()' instead.  Once scanning terminates
     because an end-of-file has been seen, you can point `yyin' at the
     new input file and then call the scanner again to continue
     scanning.
1966
1967`void yyrestart( FILE *new_file )'
1968     may be called to point `yyin' at the new input file.  The
1969     switch-over to the new file is immediate (any previously
1970     buffered-up input is lost).  Note that calling `yyrestart()' with
1971     `yyin' as an argument thus throws away the current input buffer
1972     and continues scanning the same input file.
1973
1974`FILE *yyout'
1975     is the file to which `ECHO' actions are done.  It can be reassigned
1976     by the user.
1977
1978`YY_CURRENT_BUFFER'
1979     returns a `YY_BUFFER_STATE' handle to the current buffer.
1980
1981`YY_START'
1982     returns an integer value corresponding to the current start
1983     condition.  You can subsequently use this value with `BEGIN' to
1984     return to that start condition.
1985
1986
1987File: flex.info,  Node: Yacc,  Next: Scanner Options,  Prev: User Values,  Up: Top
1988
15 Interfacing with Yacc
************************
1991
1992One of the main uses of `flex' is as a companion to the `yacc'
1993parser-generator.  `yacc' parsers expect to call a routine named
1994`yylex()' to find the next input token.  The routine is supposed to
1995return the type of the next token as well as putting any associated
1996value in the global `yylval'.  To use `flex' with `yacc', one specifies
1997the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
1998containing definitions of all the `%tokens' appearing in the `yacc'
1999input.  This file is then included in the `flex' scanner.  For example,
2000if one of the tokens is `TOK_NUMBER', part of the scanner might look
2001like:
2002
2003         %{
2004         #include "y.tab.h"
2005         %}
2006
2007         %%
2008
2009         [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
2010
2011
2012File: flex.info,  Node: Scanner Options,  Next: Performance,  Prev: Yacc,  Up: Top
2013
16 Scanner Options
******************
2016
The various `flex' options are categorized by function in the following
menu.  If you want to look up a particular option by name, *Note Index of
Scanner Options::.
2020
2021* Menu:
2022
2023* Options for Specifying Filenames::
2024* Options Affecting Scanner Behavior::
2025* Code-Level And API Options::
2026* Options for Scanner Speed and Size::
2027* Debugging Options::
2028* Miscellaneous Options::
2029
2030   Even though there are many scanner options, a typical scanner might
2031only specify the following options:
2032
2033     %option   8bit reentrant bison-bridge
2034     %option   warn nodefault
2035     %option   yylineno
2036     %option   outfile="scanner.c" header-file="scanner.h"
2037
2038   The first line specifies the general type of scanner we want. The
2039second line specifies that we are being careful. The third line asks
2040flex to track line numbers. The last line tells flex what to name the
2041files. (The options can be specified in any order. We just divided
2042them.)
2043
2044   `flex' also provides a mechanism for controlling options within the
2045scanner specification itself, rather than from the flex command-line.
2046This is done by including `%option' directives in the first section of
2047the scanner specification.  You can specify multiple options with a
2048single `%option' directive, and multiple directives in the first
2049section of your flex input file.
2050
2051   Most options are given simply as names, optionally preceded by the
2052word `no' (with no intervening whitespace) to negate their meaning.
2053The names are the same as their long-option equivalents (but without the
2054leading `--' ).
2055
   `flex' scans your rule actions to determine whether you use the
`REJECT' or `yymore()' features.  The `reject' and `yymore' options are
available to override its decision as to whether you use these
features, either by setting them (e.g., `%option reject') to indicate
the feature is indeed used, or unsetting them to indicate it actually
is not used (e.g., `%option noyymore').
2062
2063   A number of options are available for lint purists who want to
2064suppress the appearance of unneeded routines in the generated scanner.
2065Each of the following, if unset (e.g., `%option nounput'), results in
2066the corresponding routine not appearing in the generated scanner:
2067
2068         input, unput
2069         yy_push_state, yy_pop_state, yy_top_state
2070         yy_scan_buffer, yy_scan_bytes, yy_scan_string
2071
2072         yyget_extra, yyset_extra, yyget_leng, yyget_text,
2073         yyget_lineno, yyset_lineno, yyget_in, yyset_in,
2074         yyget_out, yyset_out, yyget_lval, yyset_lval,
2075         yyget_lloc, yyset_lloc, yyget_debug, yyset_debug
2076
   (though `yy_push_state()' and friends won't appear anyway unless you
use `%option stack').
2079
2080
2081File: flex.info,  Node: Options for Specifying Filenames,  Next: Options Affecting Scanner Behavior,  Prev: Scanner Options,  Up: Scanner Options
2082
16.1 Options for Specifying Filenames
=====================================
2085
2086`--header-file=FILE, `%option header-file="FILE"''
2087     instructs flex to write a C header to `FILE'. This file contains
2088     function prototypes, extern variables, and types used by the
2089     scanner.  Only the external API is exported by the header file.
2090     Many macros that are usable from within scanner actions are not
2091     exported to the header file. This is due to namespace problems and
2092     the goal of a clean external API.
2093
2094     While in the header, the macro `yyIN_HEADER' is defined, where `yy'
2095     is substituted with the appropriate prefix.
2096
2097     The `--header-file' option is not compatible with the `--c++'
2098     option, since the C++ scanner provides its own header in
2099     `yyFlexLexer.h'.
2100
2101`-oFILE, --outfile=FILE, `%option outfile="FILE"''
     directs flex to write the scanner to the file `FILE' instead of
     `lex.yy.c'.  If you combine `--outfile' with the `--stdout' option,
     then the scanner is written to `stdout' but its `#line' directives
     (see the `-L' option) refer to the file `FILE'.
2106
2107`-t, --stdout, `%option stdout''
2108     instructs `flex' to write the scanner it generates to standard
2109     output instead of `lex.yy.c'.
2110
2111`-SFILE, --skel=FILE'
2112     overrides the default skeleton file from which `flex' constructs
2113     its scanners.  You'll never need this option unless you are doing
2114     `flex' maintenance or development.
2115
2116`--tables-file=FILE'
2117     Write serialized scanner dfa tables to FILE. The generated scanner
2118     will not contain the tables, and requires them to be loaded at
2119     runtime.  *Note serialization::.
2120
2121`--tables-verify'
2122     This option is for flex development. We document it here in case
2123     you stumble upon it by accident or in case you suspect some
2124     inconsistency in the serialized tables.  Flex will serialize the
2125     scanner dfa tables but will also generate the in-code tables as it
2126     normally does. At runtime, the scanner will verify that the
2127     serialized tables match the in-code tables, instead of loading
2128     them.
2129
2130
2131
2132File: flex.info,  Node: Options Affecting Scanner Behavior,  Next: Code-Level And API Options,  Prev: Options for Specifying Filenames,  Up: Scanner Options
2133
16.2 Options Affecting Scanner Behavior
=======================================
2136
2137`-i, --case-insensitive, `%option case-insensitive''
2138     instructs `flex' to generate a "case-insensitive" scanner.  The
2139     case of letters given in the `flex' input patterns will be ignored,
2140     and tokens in the input will be matched regardless of case.  The
2141     matched text given in `yytext' will have the preserved case (i.e.,
2142     it will not be folded).  For tricky behavior, see *note case and
2143     character ranges::.
2144
2145`-l, --lex-compat, `%option lex-compat''
2146     turns on maximum compatibility with the original AT&T `lex'
2147     implementation.  Note that this does not mean _full_ compatibility.
2148     Use of this option costs a considerable amount of performance, and
2149     it cannot be used with the `--c++', `--full', `--fast', `-Cf', or
2150     `-CF' options.  For details on the compatibilities it provides, see
2151     *note Lex and Posix::.  This option also results in the name
2152     `YY_FLEX_LEX_COMPAT' being `#define''d in the generated scanner.
2153
2154`-B, --batch, `%option batch''
2155     instructs `flex' to generate a "batch" scanner, the opposite of
2156     _interactive_ scanners generated by `--interactive' (see below).
2157     In general, you use `-B' when you are _certain_ that your scanner
2158     will never be used interactively, and you want to squeeze a
2159     _little_ more performance out of it.  If your goal is instead to
2160     squeeze out a _lot_ more performance, you should be using the
2161     `-Cf' or `-CF' options, which turn on `--batch' automatically
2162     anyway.
2163
2164`-I, --interactive, `%option interactive''
2165     instructs `flex' to generate an interactive scanner.  An
2166     interactive scanner is one that only looks ahead to decide what
2167     token has been matched if it absolutely must.  It turns out that
2168     always looking one extra character ahead, even if the scanner has
2169     already seen enough text to disambiguate the current token, is a
2170     bit faster than only looking ahead when necessary.  But scanners
2171     that always look ahead give dreadful interactive performance; for
2172     example, when a user types a newline, it is not recognized as a
2173     newline token until they enter _another_ token, which often means
2174     typing in another whole line.
2175
2176     `flex' scanners default to `interactive' unless you use the `-Cf'
2177     or `-CF' table-compression options (*note Performance::).  That's
2178     because if you're looking for high-performance you should be using
2179     one of these options, so if you didn't, `flex' assumes you'd
2180     rather trade off a bit of run-time performance for intuitive
2181     interactive behavior.  Note also that you _cannot_ use
2182     `--interactive' in conjunction with `-Cf' or `-CF'.  Thus, this
2183     option is not really needed; it is on by default for all those
2184     cases in which it is allowed.
2185
     You can force a scanner to _not_ be interactive by using `--batch'.
2187
2188`-7, --7bit, `%option 7bit''
     instructs `flex' to generate a 7-bit scanner, i.e., one which can
     only recognize 7-bit characters in its input.  The advantage of
     using `--7bit' is that the scanner's tables can be up to half the
     size of those generated using `--8bit'.  The disadvantage is that
     such scanners often hang or crash if their input contains an
     8-bit character.
2195
     Note, however, that unless you generate your scanner using the
     `-Cf' or `-CF' table compression options, use of `--7bit' will
     save only a small amount of table space, and make your scanner
     considerably less portable.  `Flex''s default behavior is to
     generate an 8-bit scanner unless you use `-Cf' or `-CF', in
     which case `flex' defaults to generating 7-bit scanners unless
     your site was always configured to generate 8-bit scanners (as will
     often be the case with non-USA sites).  You can tell whether flex
     generated a 7-bit or an 8-bit scanner by inspecting the flag
     summary in the `--verbose' output as described above.
2206
2207     Note that if you use `-Cfe' or `-CFe' `flex' still defaults to
2208     generating an 8-bit scanner, since usually with these compression
2209     options full 8-bit tables are not much more expensive than 7-bit
2210     tables.
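
     Since a 7-bit scanner may misbehave on 8-bit input, a caller can
     screen the data before scanning it.  The following is a minimal
     sketch in plain C; the helper name `has_8bit_char' is
     hypothetical and not part of `flex'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if any of the first `len' bytes of
 * `data' has its high bit set (is outside the 7-bit range), else 0. */
static int has_8bit_char(const char *data, size_t len)
{
    size_t i;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++) {
        if ((unsigned char) data[i] & 0x80)
            return 1;
    }
    return 0;
}
```

     A program using a 7-bit scanner could reject or transcode input
     for which this check returns 1 instead of passing it to `yylex()'.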
2211
2212`-8, --8bit, `%option 8bit''
2213     instructs `flex' to generate an 8-bit scanner, i.e., one which can
2214     recognize 8-bit characters.  This flag is only needed for scanners
2215     generated using `-Cf' or `-CF', as otherwise flex defaults to
2216     generating an 8-bit scanner anyway.
2217
2218     See the discussion of `--7bit' above for `flex''s default behavior
2219     and the tradeoffs between 7-bit and 8-bit scanners.
2220
2221`--default, `%option default''
2222     generate the default rule.
2223
2224`--always-interactive, `%option always-interactive''
2225     instructs flex to generate a scanner which always considers its
2226     input _interactive_.  Normally, on each new input file the scanner
2227     calls `isatty()' in an attempt to determine whether the scanner's
2228     input source is interactive and thus should be read a character at
2229     a time.  When this option is used, however, then no such call is
2230     made.
2231
`--never-interactive, `%option never-interactive''
2233     instructs flex to generate a scanner which never considers its
2234     input interactive.  This is the opposite of `always-interactive'.
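
     The kind of test the scanner normally performs can be sketched
     in plain C as follows.  The function name `looks_interactive' is
     hypothetical; the sketch assumes a POSIX system providing
     `isatty()' and `fileno()', and it is not the code that `flex'
     itself generates.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() under strict ISO C modes */

#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `fp' is attached to a terminal,
 * and so would be treated as interactive, otherwise 0. */
static int looks_interactive(FILE *fp)
{
    assert(fp != NULL);
    return isatty(fileno(fp)) ? 1 : 0;
}
```

     With `always-interactive' or `never-interactive' the answer is
     fixed when the scanner is generated, so no such test is made at
     run time.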
2235
2236`-X, --posix, `%option posix''
2237     turns on maximum compatibility with the POSIX 1003.2-1992
2238     definition of `lex'.  Since `flex' was originally designed to
2239     implement the POSIX definition of `lex' this generally involves
2240     very few changes in behavior.  At the current writing the known
2241     differences between `flex' and the POSIX standard are:
2242
        * In POSIX and AT&T `lex', the repeat operator, `{}', has lower
          precedence than concatenation (thus `ab{3}' yields `ababab').
          Most POSIX utilities use an Extended Regular Expression (ERE)
          precedence that has the precedence of the repeat operator
          higher than concatenation (which causes `ab{3}' to yield
          `abbb').  By default, `flex' places the precedence of the
          repeat operator higher than concatenation, which matches the
          ERE processing of other POSIX utilities.  When either
          `--posix' or `-l' is specified, `flex' will use the
          traditional AT&T and POSIX-compliant precedence for the
          repeat operator where concatenation has higher precedence
          than the repeat operator.
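
     The ERE precedence mentioned above can be checked outside `flex'
     with the POSIX `regcomp()' interface.  The following is a sketch
     in plain C; the helper name `ere_matches' is hypothetical.  It
     exercises the system's ERE engine, not `flex' itself, so it only
     demonstrates the behavior that `flex' follows by default.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text' matches the POSIX extended
 * regular expression `pattern', 0 otherwise (including when the
 * pattern fails to compile).  Patterns carry their own ^...$ anchors. */
static int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

     Under ERE rules `^ab{3}$' accepts `abbb' and rejects `ababab';
     the traditional `lex' reading corresponds to the ERE `^(ab){3}$'.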
2255
2256`--stack, `%option stack''
2257     enables the use of start condition stacks (*note Start
2258     Conditions::).
2259
2260`--stdinit, `%option stdinit''
2261     if set (i.e., %option stdinit) initializes `yyin' and `yyout' to
2262     `stdin' and `stdout', instead of the default of `NULL'.  Some
2263     existing `lex' programs depend on this behavior, even though it is
2264     not compliant with ANSI C, which does not require `stdin' and
2265     `stdout' to be compile-time constant. In a reentrant scanner,
2266     however, this is not a problem since initialization is performed
2267     in `yylex_init' at runtime.
2268
2269`--yylineno, `%option yylineno''
     directs `flex' to generate a scanner that maintains the number of
     the current line read from its input in the global variable
     `yylineno'.  This option is implied by `%option lex-compat'.  In a
     reentrant C scanner, the macro `yylineno' is accessible regardless
     of the value of `%option yylineno'; however, its value is not
     modified by `flex' unless `%option yylineno' is enabled.
2276
2277`--yywrap, `%option yywrap''
     if unset (i.e., `--noyywrap'), makes the scanner not call
     `yywrap()' upon an end-of-file, but simply assume that there are no
     more files to scan (until the user points `yyin' at a new file and
     calls `yylex()' again).
2282
2283
2284
2285File: flex.info,  Node: Code-Level And API Options,  Next: Options for Scanner Speed and Size,  Prev: Options Affecting Scanner Behavior,  Up: Scanner Options
2286
16.3 Code-Level And API Options
===============================
2289
2290`--ansi-definitions, `%option ansi-definitions''
     instructs flex to generate ANSI C99 definitions for functions.
2292     This option is enabled by default.  If `%option
2293     noansi-definitions' is specified, then the obsolete style is
2294     generated.
2295
2296`--ansi-prototypes, `%option ansi-prototypes''
2297     instructs flex to generate ANSI C99 prototypes for functions.
2298     This option is enabled by default.  If `noansi-prototypes' is
2299     specified, then prototypes will have empty parameter lists.
2300
2301`--bison-bridge, `%option bison-bridge''
2302     instructs flex to generate a C scanner that is meant to be called
2303     by a `GNU bison' parser. The scanner has minor API changes for
2304     `bison' compatibility. In particular, the declaration of `yylex'
2305     is modified to take an additional parameter, `yylval'.  *Note
2306     Bison Bridge::.
2307
2308`--bison-locations, `%option bison-locations''
     instructs flex that `GNU bison' `%locations' are being used.  This
2310     means `yylex' will be passed an additional parameter, `yylloc'.
2311     This option implies `%option bison-bridge'.  *Note Bison Bridge::.
2312
2313`-L, --noline, `%option noline''
2314     instructs `flex' not to generate `#line' directives.  Without this
2315     option, `flex' peppers the generated scanner with `#line'
2316     directives so error messages in the actions will be correctly
2317     located with respect to either the original `flex' input file (if
2318     the errors are due to code in the input file), or `lex.yy.c' (if
2319     the errors are `flex''s fault - you should report these sorts of
2320     errors to the email address given in *note Reporting Bugs::).
2321
2322`-R, --reentrant, `%option reentrant''
2323     instructs flex to generate a reentrant C scanner.  The generated
2324     scanner may safely be used in a multi-threaded environment. The
2325     API for a reentrant scanner is different than for a non-reentrant
2326     scanner (*note Reentrant::).  Because of the API difference between
2327     reentrant and non-reentrant `flex' scanners, non-reentrant flex
2328     code must be modified before it is suitable for use with this
2329     option.  This option is not compatible with the `--c++' option.
2330
2331     The option `--reentrant' does not affect the performance of the
2332     scanner.
2333
2334`-+, --c++, `%option c++''
2335     specifies that you want flex to generate a C++ scanner class.
2336     *Note Cxx::, for details.
2337
2338`--array, `%option array''
2339     specifies that you want `yytext' to be an array instead of a `char *'.
2340
2341`--pointer, `%option pointer''
2342     specifies that `yytext' should be a `char *', not an array.  This
2343     is the default.
2344
2345`-PPREFIX, --prefix=PREFIX, `%option prefix="PREFIX"''
2346     changes the default `yy' prefix used by `flex' for all
2347     globally-visible variable and function names to instead be
2348     `PREFIX'.  For example, `--prefix=foo' changes the name of
2349     `yytext' to `footext'.  It also changes the name of the default
2350     output file from `lex.yy.c' to `lex.foo.c'.  Here is a partial
2351     list of the names affected:
2352
2353              yy_create_buffer
2354              yy_delete_buffer
2355              yy_flex_debug
2356              yy_init_buffer
2357              yy_flush_buffer
2358              yy_load_buffer_state
2359              yy_switch_to_buffer
2360              yyin
2361              yyleng
2362              yylex
2363              yylineno
2364              yyout
2365              yyrestart
2366              yytext
2367              yywrap
2368              yyalloc
2369              yyrealloc
2370              yyfree
2371
2372     (If you are using a C++ scanner, then only `yywrap' and
2373     `yyFlexLexer' are affected.)  Within your scanner itself, you can
2374     still refer to the global variables and functions using either
2375     version of their name; but externally, they have the modified name.
2376
2377     This option lets you easily link together multiple `flex' programs
2378     into the same executable.  Note, though, that using this option
2379     also renames `yywrap()', so you now _must_ either provide your own
2380     (appropriately-named) version of the routine for your scanner, or
2381     use `%option noyywrap', as linking with `-lfl' no longer provides
2382     one for you by default.
2383
2384`--main, `%option main''
2385     directs flex to provide a default `main()' program for the
2386     scanner, which simply calls `yylex()'.  This option implies
2387     `noyywrap' (*note Options Affecting Scanner Behavior::).
2388
2389`--nounistd, `%option nounistd''
2390     suppresses inclusion of the non-ANSI header file `unistd.h'. This
2391     option is meant to target environments in which `unistd.h' does
2392     not exist. Be aware that certain options may cause flex to
2393     generate code that relies on functions normally found in
2394     `unistd.h', (e.g. `isatty()', `read()'.)  If you wish to use these
2395     functions, you will have to inform your compiler where to find
2396     them.  *Note option-always-interactive::. *Note option-read::.
2397
2398`--yyclass=NAME, `%option yyclass="NAME"''
2399     only applies when generating a C++ scanner (the `--c++' option).
2400     It informs `flex' that you have derived `NAME' as a subclass of
2401     `yyFlexLexer', so `flex' will place your actions in the member
2402     function `NAME::yylex()' instead of `yyFlexLexer::yylex()'.  It
2403     also generates a `yyFlexLexer::yylex()' member function that emits
2404     a run-time error (by invoking `yyFlexLexer::LexerError()') if
2405     called.  *Note Cxx::.
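
   As a rough illustration of the calling convention described under
`--bison-bridge' and `--bison-locations', the hand-written C function
below takes a semantic-value pointer and a location pointer as its
first two parameters.  It is not `flex' output: `DEMO_STYPE',
`DEMO_LTYPE', `DEMO_NUM', `demo_lex' and the `text'/`pos' arguments are
hypothetical, and only the four location field names follow bison's
default location type.

```c
#include <ctype.h>
#include <stddef.h>

typedef union { int ival; } DEMO_STYPE;              /* hypothetical value type */
typedef struct {
    int first_line, first_column, last_line, last_column;
} DEMO_LTYPE;                                        /* hypothetical location type */

enum { DEMO_EOF = 0, DEMO_NUM = 258 };               /* hypothetical token codes */

/*
 * Scans one token from a single-line string.  The value goes out
 * through `yylval' and the position through `yylloc', mirroring the
 * extra parameters a bridged yylex() receives.
 */
static int demo_lex(DEMO_STYPE *yylval, DEMO_LTYPE *yylloc,
                    const char *text, size_t *pos)
{
    while (text[*pos] == ' ')
        ++*pos;
    if (text[*pos] == '\0')
        return DEMO_EOF;

    yylloc->first_line = yylloc->last_line = 1;
    yylloc->first_column = (int)*pos + 1;

    if (isdigit((unsigned char)text[*pos])) {
        int value = 0;
        while (isdigit((unsigned char)text[*pos]))
            value = value * 10 + (text[(*pos)++] - '0');
        yylval->ival = value;
        yylloc->last_column = (int)*pos;
        return DEMO_NUM;
    }

    /* Any other character is returned as its own token code. */
    yylloc->last_column = (int)*pos + 1;
    return (unsigned char)text[(*pos)++];
}
```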
2406
2407
2408
2409File: flex.info,  Node: Options for Scanner Speed and Size,  Next: Debugging Options,  Prev: Code-Level And API Options,  Up: Scanner Options
2410
241116.4 Options for Scanner Speed and Size
2412=======================================
2413
2414`-C[aefFmr]'
2415     controls the degree of table compression and, more generally,
2416     trade-offs between small scanners and fast scanners.
2417
2418    `-C'
2419          A lone `-C' specifies that the scanner tables should be
2420          compressed but neither equivalence classes nor
2421          meta-equivalence classes should be used.
2422
2423    `-Ca, --align, `%option align''
2424          ("align") instructs flex to trade off larger tables in the
2425          generated scanner for faster performance because the elements
2426          of the tables are better aligned for memory access and
2427          computation.  On some RISC architectures, fetching and
2428          manipulating longwords is more efficient than with
2429          smaller-sized units such as shortwords.  This option can
2430          quadruple the size of the tables used by your scanner.
2431
2432    `-Ce, --ecs, `%option ecs''
2433          directs `flex' to construct "equivalence classes", i.e., sets
2434          of characters which have identical lexical properties (for
2435          example, if the only appearance of digits in the `flex' input
2436          is in the character class "[0-9]" then the digits '0', '1',
2437          ..., '9' will all be put in the same equivalence class).
2438          Equivalence classes usually give dramatic reductions in the
2439          final table/object file sizes (typically a factor of 2-5) and
2440          are pretty cheap performance-wise (one array look-up per
2441          character scanned).
2442
2443    `-Cf'
2444          specifies that the "full" scanner tables should be generated -
2445          `flex' should not compress the tables by taking advantage of
2446          similar transition functions for different states.
2447
2448    `-CF'
2449          specifies that the alternate fast scanner representation
2450          (described below under the `--fast' flag) should be used.
2451          This option cannot be used with `--c++'.
2452
2453    `-Cm, --meta-ecs, `%option meta-ecs''
2454          directs `flex' to construct "meta-equivalence classes", which
2455          are sets of equivalence classes (or characters, if equivalence
2456          classes are not being used) that are commonly used together.
2457          Meta-equivalence classes are often a big win when using
2458          compressed tables, but they have a moderate performance
2459          impact (one or two `if' tests and one array look-up per
2460          character scanned).
2461
2462    `-Cr, --read, `%option read''
2463          causes the generated scanner to _bypass_ use of the standard
2464          I/O library (`stdio') for input.  Instead of calling
2465          `fread()' or `getc()', the scanner will use the `read()'
2466          system call, resulting in a performance gain which varies
2467          from system to system, but in general is probably negligible
2468          unless you are also using `-Cf' or `-CF'.  Using `-Cr' can
2469          cause strange behavior if, for example, you read from `yyin'
2470          using `stdio' prior to calling the scanner (because the
2471          scanner will miss whatever text your previous reads left in
2472          the `stdio' input buffer).  `-Cr' has no effect if you define
2473          `YY_INPUT()' (*note Generated Scanner::).
2474
2475     The options `-Cf' or `-CF' and `-Cm' do not make sense together -
2476     there is no opportunity for meta-equivalence classes if the table
2477     is not being compressed.  Otherwise the options may be freely
2478     mixed, and are cumulative.
2479
2480     The default setting is `-Cem', which specifies that `flex' should
2481     generate equivalence classes and meta-equivalence classes.  This
2482     setting provides the highest degree of table compression.  You can
2483     trade off faster-executing scanners at the cost of larger tables
2484     with the following generally being true:
2485
2486              slowest & smallest
2487                    -Cem
2488                    -Cm
2489                    -Ce
2490                    -C
2491                    -C{f,F}e
2492                    -C{f,F}
2493                    -C{f,F}a
2494              fastest & largest
2495
2496     Note that scanners with the smallest tables are usually generated
2497     and compiled the quickest, so during development you will usually
2498     want to use the default, maximal compression.
2499
2500     `-Cfe' is often a good compromise between speed and size for
2501     production scanners.
2502
2503`-f, --full, `%option full''
2504     specifies "fast scanner".  No table compression is done and
2505     `stdio' is bypassed.  The result is large but fast.  This option
2506     is equivalent to `-Cfr'.
2507
2508`-F, --fast, `%option fast''
2509     specifies that the _fast_ scanner table representation should be
2510     used (and `stdio' bypassed).  This representation is about as fast
2511     as the full table representation `--full', and for some sets of
2512     patterns will be considerably smaller (and for others, larger).  In
2513     general, if the pattern set contains both _keywords_ and a
2514     catch-all, _identifier_ rule, such as in the set:
2515
2516              "case"    return TOK_CASE;
2517              "switch"  return TOK_SWITCH;
2518              ...
2519              "default" return TOK_DEFAULT;
2520              [a-z]+    return TOK_ID;
2521
2522     then you're better off using the full table representation.  If
2523     only the _identifier_ rule is present and you then use a hash
2524     table or some such to detect the keywords, you're better off using
2525     `--fast'.
2526
2527     This option is equivalent to `-CFr'.  It cannot be used with
2528     `--c++'.
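
   The second arrangement described under `--fast' (a lone identifier
rule followed by a table lookup) can be sketched in C roughly as
follows.  A sorted table searched with `bsearch()' stands in for the
hash table; `classify_word', the token values and the three-entry
keyword list are hypothetical, and in a real scanner the argument would
be `yytext' inside the `[a-z]+' action.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; the names follow the rule set shown above. */
enum { TOK_ID = 1, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int         token;
};

/* Must stay sorted by name, because bsearch() relies on the order. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

static int compare_keyword(const void *key, const void *entry)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)entry)->name);
}

/* Returns the keyword's token code, or TOK_ID for any other word. */
static int classify_word(const char *word)
{
    const struct keyword *hit =
        bsearch(word, keywords, sizeof keywords / sizeof keywords[0],
                sizeof keywords[0], compare_keyword);

    return hit != NULL ? hit->token : TOK_ID;
}
```

   The identifier action would then end with something like `return
classify_word(yytext);', keeping the keyword list out of the scanner
tables altogether.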
2529
2530
2531
2532File: flex.info,  Node: Debugging Options,  Next: Miscellaneous Options,  Prev: Options for Scanner Speed and Size,  Up: Scanner Options
2533
253416.5 Debugging Options
2535======================
2536
2537`-b, --backup, `%option backup''
2538     Generate backing-up information to `lex.backup'.  This is a list of
2539     scanner states which require backing up and the input characters on
2540     which they do so.  By adding rules one can remove backing-up
2541     states.  If _all_ backing-up states are eliminated and `-Cf' or
2542     `-CF' is used, the generated scanner will run faster (see the
2543     `--perf-report' flag).  Only users who wish to squeeze every last
2544     cycle out of their scanners need worry about this option.  (*note
2545     Performance::).
2546
2547`-d, --debug, `%option debug''
2548     makes the generated scanner run in "debug" mode.  Whenever a
2549     pattern is recognized and the global variable `yy_flex_debug' is
2550     non-zero (which is the default), the scanner will write to
2551     `stderr' a line of the form:
2552
2553              --accepting rule at line 53 ("the matched text")
2554
2555     The line number refers to the location of the rule in the file
2556     defining the scanner (i.e., the file that was fed to flex).
2557     Messages are also generated when the scanner backs up, accepts the
2558     default rule, reaches the end of its input buffer (or encounters a
2559     NUL; at this point, the two look the same as far as the scanner's
2560     concerned), or reaches an end-of-file.
2561
2562`-p, --perf-report, `%option perf-report''
2563     generates a performance report to `stderr'.  The report consists of
2564     comments regarding features of the `flex' input file which will
2565     cause a serious loss of performance in the resulting scanner.  If
2566     you give the flag twice, you will also get comments regarding
2567     features that lead to minor performance losses.
2568
2569     Note that the use of `REJECT', and variable trailing context
2570     (*note Limitations::) entails a substantial performance penalty;
2571     use of `yymore()', the `^' operator, and the `--interactive' flag
2572     entail minor performance penalties.
2573
2574`-s, --nodefault, `%option nodefault''
2575     causes the _default rule_ (that unmatched scanner input is echoed
2576     to `stdout') to be suppressed.  If the scanner encounters input
2577     that does not match any of its rules, it aborts with an error.
2578     This option is useful for finding holes in a scanner's rule set.
2579
2580`-T, --trace, `%option trace''
2581     makes `flex' run in "trace" mode.  It will generate a lot of
2582     messages to `stderr' concerning the form of the input and the
2583     resultant non-deterministic and deterministic finite automata.
2584     This option is mostly for use in maintaining `flex'.
2585
2586`-w, --nowarn, `%option nowarn''
2587     suppresses warning messages.
2588
2589`-v, --verbose, `%option verbose''
2590     specifies that `flex' should write to `stderr' a summary of
2591     statistics regarding the scanner it generates.  Most of the
2592     statistics are meaningless to the casual `flex' user, but the
2593     first line identifies the version of `flex' (same as reported by
2594     `--version'), and the next line the flags used when generating the
2595     scanner, including those that are on by default.
2596
2597`--warn, `%option warn''
2598     warns about certain things.  In particular, if the default rule can
2599     be matched but no default rule has been given, `flex' will warn
2600     you.  We recommend using this option always.
2601
2602
2603
2604File: flex.info,  Node: Miscellaneous Options,  Prev: Debugging Options,  Up: Scanner Options
2605
260616.6 Miscellaneous Options
2607==========================
2608
2609`-c'
2610     A do-nothing option included for POSIX compliance.
2611
2612`-h, -?, --help'
2613     generates a "help" summary of `flex''s options to `stdout' and
2614     then exits.
2615
2616`-n'
2617     Another do-nothing option included for POSIX compliance.
2618
2619`-V, --version'
2620     prints the version number to `stdout' and exits.
2621
2622
2623
2624File: flex.info,  Node: Performance,  Next: Cxx,  Prev: Scanner Options,  Up: Top
2625
262617 Performance Considerations
2627*****************************
2628
2629The main design goal of `flex' is that it generate high-performance
2630scanners.  It has been optimized for dealing well with large sets of
2631rules.  Aside from the effects on scanner speed of the table compression
2632`-C' options outlined above, there are a number of options/actions
2633which degrade performance.  These are, from most expensive to least:
2634
2635         REJECT
2636         arbitrary trailing context
2637
2638         pattern sets that require backing up
2639         %option yylineno
2640         %array
2641
2642         %option interactive
2643         %option always-interactive
2644
2645         ^ beginning-of-line operator
2646         yymore()
2647
2648   with the first two being quite expensive and the last two being
2649quite cheap.  Note also that `unput()' is implemented as a routine call
2650that potentially does quite a bit of work, while `yyless()' is a
2651quite-cheap macro. So if you are just putting back some excess text you
2652scanned, use `yyless()'.
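
   The difference in cost can be pictured with a simplified model of an
input buffer.  This is only a sketch under stated assumptions, not the
code `flex' generates: `struct demo_buffer', `demo_yyless' and
`demo_unput' are hypothetical, and the buffer is a small fixed array.
Giving text back in the style of `yyless()' changes a single index,
whereas pushing a character back in the style of `unput()' may first
have to move the unread text to make room in front of it.

```c
#include <string.h>

#define DEMO_BUF_SIZE 16

struct demo_buffer {
    char   data[DEMO_BUF_SIZE];
    size_t next;   /* index of the next character to scan */
    size_t end;    /* one past the last valid character */
};

/*
 * Modeled on yyless(n): keep the first n characters of the token that
 * starts at token_start and rescan the rest.  Only an index changes.
 */
static void demo_yyless(struct demo_buffer *b, size_t token_start, size_t n)
{
    b->next = token_start + n;
}

/*
 * Modeled on unput(c): store c in front of the unread input.  If there
 * is no room in front, the buffered text is first moved to the end of
 * the array.  Returns 0 on success, -1 if the array is full.
 */
static int demo_unput(struct demo_buffer *b, char c)
{
    if (b->next == 0) {
        size_t room = DEMO_BUF_SIZE - b->end;

        if (room == 0)
            return -1;
        memmove(b->data + room, b->data, b->end);
        b->next += room;
        b->end  += room;
    }
    b->data[--b->next] = c;
    return 0;
}
```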
2653
2654   `REJECT' should be avoided at all costs when performance is
2655important.  It is a particularly expensive option.
2656
2657   There is one case when `%option yylineno' can be expensive. That is
2658when your patterns match long tokens that could _possibly_ contain a
2659newline character. There is no performance penalty for rules that can
2660not possibly match newlines, since flex does not need to check them for
2661newlines.  In general, you should avoid rules such as `[^f]+', which
2662match very long tokens, including newlines, and may possibly match your
2663entire file! A better approach is to separate `[^f]+' into two rules:
2664
2665     %option yylineno
2666     %%
2667         [^f\n]+
2668         \n+
2669
2670   The above scanner does not incur a performance penalty.
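
   The reason is easier to see in a small C model of the line-counting
step.  This is a sketch of the idea only; `count_lines' and its
parameters are hypothetical names.  Tokens from rules that cannot
match a newline are skipped outright, while tokens from rules that can
are searched character by character, so the cost grows with the length
of the token.

```c
#include <stddef.h>

/*
 * Returns the updated line number after a token has been matched.
 * rule_can_match_newline is 0 for a rule such as [^f\n]+ and 1 for a
 * rule such as \n+ or [^f]+.
 */
static int count_lines(int lineno, int rule_can_match_newline,
                       const char *text, size_t length)
{
    size_t i;

    if (!rule_can_match_newline)
        return lineno;          /* nothing to check, no cost */

    for (i = 0; i < length; ++i)
        if (text[i] == '\n')
            ++lineno;           /* one test per character of the token */
    return lineno;
}
```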
2671
2672   Getting rid of backing up is messy and often may be an enormous
2673amount of work for a complicated scanner.  In principle, one begins by
2674using the `-b' flag to generate a `lex.backup' file.  For example, on
2675the input:
2676
2677         %%
2678         foo        return TOK_KEYWORD;
2679         foobar     return TOK_KEYWORD;
2680
2681   the file looks like:
2682
2683         State #6 is non-accepting -
2684          associated rule line numbers:
2685                2       3
2686          out-transitions: [ o ]
2687          jam-transitions: EOF [ \001-n  p-\177 ]
2688
2689         State #8 is non-accepting -
2690          associated rule line numbers:
2691                3
2692          out-transitions: [ a ]
2693          jam-transitions: EOF [ \001-`  b-\177 ]
2694
2695         State #9 is non-accepting -
2696          associated rule line numbers:
2697                3
2698          out-transitions: [ r ]
2699          jam-transitions: EOF [ \001-q  s-\177 ]
2700
2701         Compressed tables always back up.
2702
2703   The first few lines tell us that there's a scanner state in which it
2704can make a transition on an 'o' but not on any other character, and
2705that in that state the currently scanned text does not match any rule.
2706The state occurs when trying to match the rules found at lines 2 and 3
2707in the input file.  If the scanner is in that state and then reads
2708something other than an 'o', it will have to back up to find a rule
2709which is matched.  With a bit of headscratching one can see that this
2710must be the state it's in when it has seen `fo'.  When this has
2711happened, if anything other than another `o' is seen, the scanner will
2712have to back up to simply match the `f' (by the default rule).
2713
2714   The comment regarding State #8 indicates there's a problem when
2715`foob' has been scanned.  Indeed, on any character other than an `a',
2716the scanner will have to back up to accept "foo".  Similarly, the
2717comment for State #9 concerns when `fooba' has been scanned and an `r'
2718does not follow.
2719
2720   The final comment reminds us that there's no point going to all the
2721trouble of removing backing up from the rules unless we're using `-Cf'
2722or `-CF', since there's no performance gain doing so with compressed
2723scanners.
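
   To make the backing up concrete, here is a small hand-written C
model of longest-match scanning for the two rules `foo' and `foobar'
plus the default rule.  It is a sketch, not the automaton `flex'
builds: `struct match' and `scan_one' are hypothetical, the input is
assumed to be non-empty, and the extra character of lookahead a real
scanner reads is ignored.

```c
#include <stddef.h>

struct match {
    size_t length;      /* characters in the accepted token */
    int    is_keyword;  /* 1 if `foo' or `foobar' matched, else 0 */
    size_t backed_up;   /* characters consumed and then given back */
};

static struct match scan_one(const char *input)
{
    static const char word[] = "foobar";
    size_t seen = 0;        /* characters of `foobar' matched so far */
    size_t accepted = 0;    /* length at the last accepting state */
    struct match m;

    while (seen < 6 && input[seen] == word[seen]) {
        ++seen;
        if (seen == 3 || seen == 6)     /* after `foo' or `foobar' */
            accepted = seen;
    }

    m.is_keyword = accepted != 0;
    /* With no keyword, the default rule matches a single character. */
    m.length = m.is_keyword ? accepted : 1;
    m.backed_up = seen > m.length ? seen - m.length : 0;
    return m;
}
```

   On `foobaz' the model consumes `fooba', fails on the `z', and
returns to the end of `foo', giving back two characters; on `fox' it
gives back the `o' and lets the default rule take the `f'.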
2724
2725   The way to remove the backing up is to add "error" rules:
2726
2727         %%
2728         foo         return TOK_KEYWORD;
2729         foobar      return TOK_KEYWORD;
2730
2731         fooba       |
2732         foob        |
2733         fo          {
2734                     /* false alarm, not really a keyword */
2735                     return TOK_ID;
2736                     }
2737
2738   Eliminating backing up among a list of keywords can also be done
2739using a "catch-all" rule:
2740
2741         %%
2742         foo         return TOK_KEYWORD;
2743         foobar      return TOK_KEYWORD;
2744
2745         [a-z]+      return TOK_ID;
2746
2747   This is usually the best solution when appropriate.
2748
2749   Backing up messages tend to cascade.  With a complicated set of rules
2750it's not uncommon to get hundreds of messages.  If one can decipher
2751them, though, it often only takes a dozen or so rules to eliminate the
2752backing up (though it's easy to make a mistake and have an error rule
2753accidentally match a valid token.  A possible future `flex' feature
2754will be to automatically add rules to eliminate backing up).
2755
2756   It's important to keep in mind that you gain the benefits of
2757eliminating backing up only if you eliminate _every_ instance of
2758backing up.  Leaving just one means you gain nothing.
2759
2760   _Variable_ trailing context (where both the leading and trailing
2761parts do not have a fixed length) entails almost the same performance
2762loss as `REJECT' (i.e., substantial).  So when possible a rule like:
2763
2764         %%
2765         mouse|rat/(cat|dog)   run();
2766
2767   is better written:
2768
2769         %%
2770         mouse/cat|dog         run();
2771         rat/cat|dog           run();
2772
2773   or as
2774
2775         %%
2776         mouse|rat/cat         run();
2777         mouse|rat/dog         run();
2778
2779   Note that here the special '|' action does _not_ provide any
2780savings, and can even make things worse (*note Limitations::).
2781
2782   Another area where the user can increase a scanner's performance (and
2783one that's easier to implement) arises from the fact that the longer the
2784tokens matched, the faster the scanner will run.  This is because with
2785long tokens the processing of most input characters takes place in the
2786(short) inner scanning loop, and does not often have to go through the
2787additional work of setting up the scanning environment (e.g., `yytext')
2788for the action.  Recall the scanner for C comments:
2789
2790         %x comment
2791         %%
2792                 int line_num = 1;
2793
2794         "/*"         BEGIN(comment);
2795
2796         <comment>[^*\n]*
2797         <comment>"*"+[^*/\n]*
2798         <comment>\n             ++line_num;
2799         <comment>"*"+"/"        BEGIN(INITIAL);
2800
2801   This could be sped up by writing it as:
2802
2803         %x comment
2804         %%
2805                 int line_num = 1;
2806
2807         "/*"         BEGIN(comment);
2808
2809         <comment>[^*\n]*
2810         <comment>[^*\n]*\n      ++line_num;
2811         <comment>"*"+[^*/\n]*
2812         <comment>"*"+[^*/\n]*\n ++line_num;
2813         <comment>"*"+"/"        BEGIN(INITIAL);
2814
2815   Now instead of each newline requiring the processing of another
2816action, recognizing the newlines is distributed over the other rules to
2817keep the matched text as long as possible.  Note that _adding_ rules
2818does _not_ slow down the scanner!  The speed of the scanner is
2819independent of the number of rules or (modulo the considerations given
2820at the beginning of this section) how complicated the rules are with
2821regard to operators such as `*' and `|'.
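
   The saving can be estimated with a hand-written C model that counts
how many actions each rule set runs over a comment body.  This is a
sketch under stated assumptions: `count_actions' is a hypothetical
helper that imitates longest-match behavior for these particular rules
only, and its argument is the text following the opening delimiter, up
to and including the closing one.

```c
#include <string.h>

/*
 * merge_newlines == 0 models the first rule set (a newline is always a
 * token of its own); merge_newlines == 1 models the second (a newline
 * that follows other comment text joins that token).
 */
static size_t count_actions(const char *body, int merge_newlines)
{
    size_t pos = 0, actions = 0;

    while (body[pos] != '\0') {
        ++actions;
        if (body[pos] == '\n') {        /* a newline on its own */
            ++pos;
            continue;
        }
        if (body[pos] == '*') {
            pos += strspn(body + pos, "*");     /* one or more stars */
            if (body[pos] == '/')
                return actions;         /* stars then slash: comment ends */
            /* then anything except star, slash or newline */
            pos += strcspn(body + pos, "*/\n");
        } else {
            /* a run of anything except star or newline */
            pos += strcspn(body + pos, "*\n");
        }
        if (merge_newlines && body[pos] == '\n')
            ++pos;                      /* the newline joins this token */
    }
    return actions;
}
```

   For a body of two one-letter lines followed by the closing
delimiter, the model reports five actions for the first rule set and
three for the second.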
2822
2823   A final example in speeding up a scanner: suppose you want to scan
2824through a file containing identifiers and keywords, one per line and
2825with no other extraneous characters, and recognize all the keywords.  A
2826natural first approach is:
2827
2828         %%
2829         asm      |
2830         auto     |
2831         break    |
2832         ... etc ...
2833         volatile |
2834         while    /* it's a keyword */
2835
2836         .|\n     /* it's not a keyword */
2837
2838   To eliminate the back-tracking, introduce a catch-all rule:
2839
2840         %%
2841         asm      |
2842         auto     |
2843         break    |
2844         ... etc ...
2845         volatile |
2846         while    /* it's a keyword */
2847
2848         [a-z]+   |
2849         .|\n     /* it's not a keyword */
2850
2851   Now, if it's guaranteed that there's exactly one word per line, then
2852we can reduce the total number of matches by a half by merging in the
2853recognition of newlines with that of the other tokens:
2854
2855         %%
2856         asm\n    |
2857         auto\n   |
2858         break\n  |
2859         ... etc ...
2860         volatile\n |
2861         while\n  /* it's a keyword */
2862
2863         [a-z]+\n |
2864         .|\n     /* it's not a keyword */
2865
2866   One has to be careful here, as we have now reintroduced backing up
2867into the scanner.  In particular, while _we_ know that there will never
2868be any characters in the input stream other than letters or newlines,
2869`flex' can't figure this out, and it will plan for possibly needing to
2870back up when it has scanned a token like `auto' and then the next
2871character is something other than a newline or a letter.  Previously it
2872would then just match the `auto' rule and be done, but now it has no
2873`auto' rule, only an `auto\n' rule.  To eliminate the possibility of
2874backing up, we could either duplicate all rules but without final
2875newlines, or, since we never expect to encounter such an input and
2876therefore don't care how it's classified, we can introduce one more
2877catch-all rule, this one which doesn't include a newline:
2878
2879         %%
2880         asm\n    |
2881         auto\n   |
2882         break\n  |
2883         ... etc ...
2884         volatile\n |
2885         while\n  /* it's a keyword */
2886
2887         [a-z]+\n |
2888         [a-z]+   |
2889         .|\n     /* it's not a keyword */
2890
2891   Compiled with `-Cf', this is about as fast as one can get a `flex'
2892scanner to go for this particular problem.
2893
2894   A final note: `flex' is slow when matching `NUL's, particularly when
2895a token contains multiple `NUL's.  It's best to write rules which match
2896_short_ amounts of text if it's anticipated that the text will often
2897include `NUL's.
2898
2899   Another final note regarding performance: as mentioned in *note
2900Matching::, dynamically resizing `yytext' to accommodate huge tokens is
2901a slow process because it presently requires that the (huge) token be
2902rescanned from the beginning.  Thus if performance is vital, you should
2903attempt to match "large" quantities of text but not "huge" quantities,
2904where the cutoff between the two is at about 8K characters per token.
2905
2906
2907File: flex.info,  Node: Cxx,  Next: Reentrant,  Prev: Performance,  Up: Top
2908
290918 Generating C++ Scanners
2910**************************
2911
2912*IMPORTANT*: the present form of the scanning class is _experimental_
2913and may change considerably between major releases.
2914
2915   `flex' provides two different ways to generate scanners for use with
2916C++.  The first way is to simply compile a scanner generated by `flex'
2917using a C++ compiler instead of a C compiler.  You should not encounter
2918any compilation errors (*note Reporting Bugs::).  You can then use C++
2919code in your rule actions instead of C code.  Note that the default
2920input source for your scanner remains `yyin', and default echoing is
2921still done to `yyout'.  Both of these remain `FILE *' variables and not
2922C++ _streams_.
2923
2924   You can also use `flex' to generate a C++ scanner class, using the
2925`-+' option (or, equivalently, `%option c++'), which is automatically
2926specified if the name of the `flex' executable ends in a '+', such as
2927`flex++'.  When using this option, `flex' defaults to generating the
2928scanner to the file `lex.yy.cc' instead of `lex.yy.c'.  The generated
2929scanner includes the header file `FlexLexer.h', which defines the
2930interface to two C++ classes.
2931
2932   The first class, `FlexLexer', provides an abstract base class
2933defining the general scanner class interface.  It provides the
2934following member functions:
2935
2936`const char* YYText()'
2937     returns the text of the most recently matched token, the
2938     equivalent of `yytext'.
2939
2940`int YYLeng()'
2941     returns the length of the most recently matched token, the
2942     equivalent of `yyleng'.
2943
2944`int lineno() const'
2945     returns the current input line number (see `%option yylineno'), or
2946     `1' if `%option yylineno' was not used.
2947
2948`void set_debug( int flag )'
2949     sets the debugging flag for the scanner, equivalent to assigning to
2950     `yy_flex_debug' (*note Scanner Options::).  Note that you must
2951     build the scanner using `%option debug' to include debugging
2952     information in it.
2953
2954`int debug() const'
2955     returns the current setting of the debugging flag.
2956
2957   Also provided are member functions equivalent to
2958`yy_switch_to_buffer()', `yy_create_buffer()' (though the first
2959argument is an `istream*' object pointer and not a `FILE*'),
2960`yy_flush_buffer()', `yy_delete_buffer()', and `yyrestart()' (again,
2961the first argument is an `istream*' object pointer).
2962
2963   The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
2964derived from `FlexLexer'.  It defines the following additional member
2965functions:
2966
2967`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
2968     constructs a `yyFlexLexer' object using the given streams for input
2969     and output.  If not specified, the streams default to `cin' and
2970     `cout', respectively.
2971
2972`virtual int yylex()'
2973     performs the same role as `yylex()' does for ordinary `flex'
2974     scanners: it scans the input stream, consuming tokens, until a
2975     rule's action returns a value.  If you derive a subclass `S' from
2976     `yyFlexLexer' and want to access the member functions and variables
2977     of `S' inside `yylex()', then you need to use `%option
2978     yyclass="S"' to inform `flex' that you will be using that subclass
2979     instead of `yyFlexLexer'.  In this case, rather than generating
2980     `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
2981     generates a dummy `yyFlexLexer::yylex()' that calls
2982     `yyFlexLexer::LexerError()' if called).
2983
2984`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
2985     reassigns `yyin' to `new_in' (if non-null) and `yyout' to
2986     `new_out' (if non-null), deleting the previous input buffer if
2987     `yyin' is reassigned.
2988
2989`int yylex( istream* new_in, ostream* new_out = 0 )'
2990     first switches the input streams via `switch_streams( new_in,
2991     new_out )' and then returns the value of `yylex()'.
2992
2993   In addition, `yyFlexLexer' defines the following protected virtual
2994functions which you can redefine in derived classes to tailor the
2995scanner:
2996
2997`virtual int LexerInput( char* buf, int max_size )'
2998     reads up to `max_size' characters into `buf' and returns the
2999     number of characters read.  To indicate end-of-input, return 0
3000     characters.  Note that `interactive' scanners (see the `-B' and
3001     `-I' flags in *note Scanner Options::) define the macro
3002     `YY_INTERACTIVE'.  If you redefine `LexerInput()' and need to take
3003     different actions depending on whether or not the scanner might be
3004     scanning an interactive input source, you can test for the
3005     presence of this name via `#ifdef' statements.
3006
3007`virtual void LexerOutput( const char* buf, int size )'
3008     writes out `size' characters from the buffer `buf', which, while
3009     `NUL'-terminated, may also contain internal `NUL's if the
3010     scanner's rules can match text with `NUL's in them.
3011
3012`virtual void LexerError( const char* msg )'
3013     reports a fatal error message.  The default version of this
3014     function writes the message to the stream `cerr' and exits.
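
   The `LexerInput()' contract (fill `buf' with at most `max_size'
characters, return the count, and return 0 at end of input) is sketched
below in plain C rather than as an actual override, so that the example
stays self-contained.  `struct mem_source' and `mem_source_read' are
hypothetical names; a derived C++ class would place the same logic in
its own `LexerInput()' member function.

```c
#include <string.h>

/* Hypothetical in-memory input source. */
struct mem_source {
    const char *data;       /* next unread character */
    size_t      remaining;  /* characters still unread */
};

/*
 * Copies up to max_size characters into buf and returns how many were
 * copied.  A return value of 0 signals end of input.
 */
static int mem_source_read(struct mem_source *src, char *buf, int max_size)
{
    size_t n;

    if (max_size <= 0)
        return 0;
    n = src->remaining < (size_t)max_size ? src->remaining
                                          : (size_t)max_size;
    memcpy(buf, src->data, n);
    src->data      += n;
    src->remaining -= n;
    return (int)n;
}
```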
3015
3016   Note that a `yyFlexLexer' object contains its _entire_ scanning
3017state.  Thus you can use such objects to create reentrant scanners, but
3018see also *note Reentrant::.  You can instantiate multiple instances of
3019the same `yyFlexLexer' class, and you can also combine multiple C++
3020scanner classes together in the same program using the `-P' option
3021discussed above.
3022
3023   Finally, note that the `%array' feature is not available to C++
3024scanner classes; you must use `%pointer' (the default).
3025
3026   Here is an example of a simple C++ scanner:
3027
3028          // An example of using the flex C++ scanner class.
3029
3030         %{
3031         #include <iostream>
3032         using namespace std;
3033         int mylineno = 0;
3034         %}
3035
3036         %option noyywrap
3037
3038         string  \"[^\n"]+\"
3039
3040         ws      [ \t]+
3041
3042         alpha   [A-Za-z]
3043         dig     [0-9]
3044         name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
3045         num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
3046         num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
3047         number  {num1}|{num2}
3048
3049         %%
3050
3051         {ws}    /* skip blanks and tabs */
3052
3053         "/*"    {
3054                 int c;
3055
3056                 while((c = yyinput()) != 0)
3057                     {
3058                     if(c == '\n')
3059                         ++mylineno;
3060
3061                     else if(c == '*')
3062                         {
3063                         if((c = yyinput()) == '/')
3064                             break;
3065                         else
3066                             unput(c);
3067                         }
3068                     }
3069                 }
3070
3071         {number}  cout << "number " << YYText() << '\n';
3072
3073         \n        mylineno++;
3074
3075         {name}    cout << "name " << YYText() << '\n';
3076
3077         {string}  cout << "string " << YYText() << '\n';
3078
3079         %%
3080
3081         int main( int /* argc */, char** /* argv */ )
3082         {
3083             FlexLexer* lexer = new yyFlexLexer;
3084             while(lexer->yylex() != 0)
3085                 ;
3086             return 0;
3087         }
3088
3089   If you want to create multiple (different) lexer classes, you use the
3090`-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
3091some other `xxFlexLexer'.  You then can include `<FlexLexer.h>' in your
3092other sources once per lexer class, first renaming `yyFlexLexer' as
3093follows:
3094
3095         #undef yyFlexLexer
3096         #define yyFlexLexer xxFlexLexer
3097         #include <FlexLexer.h>
3098
3099         #undef yyFlexLexer
3100         #define yyFlexLexer zzFlexLexer
3101         #include <FlexLexer.h>
3102
3103   if, for example, you used `%option prefix="xx"' for one of your
3104scanners and `%option prefix="zz"' for the other.
3105
3106
3107File: flex.info,  Node: Reentrant,  Next: Lex and Posix,  Prev: Cxx,  Up: Top
3108
19 Reentrant C Scanners
***********************
3111
`flex' can generate a reentrant C scanner.  This is accomplished by
specifying `%option reentrant' (`-R').  The generated scanner is both
portable and safe to use in one or more separate threads of control.
The most common use for reentrant scanners is from within
multi-threaded applications.  Any thread may create and execute a
reentrant `flex' scanner without the need for synchronization with
other threads.
3119
3120* Menu:
3121
3122* Reentrant Uses::
3123* Reentrant Overview::
3124* Reentrant Example::
3125* Reentrant Detail::
3126* Reentrant Functions::
3127
3128
3129File: flex.info,  Node: Reentrant Uses,  Next: Reentrant Overview,  Prev: Reentrant,  Up: Reentrant
3130
19.1 Uses for Reentrant Scanners
================================
3133
Multi-threading is not the only use for a reentrant scanner.  For
example, you could scan two or more files simultaneously to implement a
`diff' at the token level (i.e., instead of at the character level):
3137
         /* Example of maintaining more than one active scanner. */

         int tok1, tok2;   /* declared outside the loop so that the
                              `while' condition can see them */

         do {
             tok1 = yylex( scanner_1 );
             tok2 = yylex( scanner_2 );

             if( tok1 != tok2 )
                 printf("Files are different.\n");

         } while ( tok1 && tok2 );
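
   The loop above presumes two scanners generated by `flex'.  As a
minimal, self-contained sketch of the same idea, the following plain C
uses a hand-written stand-in instead of a generated scanner.  The names
`demo_scanner', `demo_init', `demo_lex' and `same_token_kinds' are
hypothetical and not part of `flex'; only token kinds, not token text,
are compared.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a flex-generated reentrant scanner:
   all scanning state lives in this caller-owned struct, so any
   number of scanners can be active at once. */
typedef struct {
    const char *cur;   /* next unread character */
} demo_scanner;

enum { TOK_EOF = 0, TOK_WORD = 1, TOK_NUMBER = 2, TOK_OTHER = 3 };

void demo_init(demo_scanner *s, const char *text)
{
    assert(s != NULL && text != NULL);
    s->cur = text;
}

/* Return the kind of the next token, or TOK_EOF (0) at end of input. */
int demo_lex(demo_scanner *s)
{
    while (isspace((unsigned char)*s->cur))
        s->cur++;
    if (*s->cur == '\0')
        return TOK_EOF;
    if (isdigit((unsigned char)*s->cur)) {
        while (isdigit((unsigned char)*s->cur))
            s->cur++;
        return TOK_NUMBER;
    }
    if (isalpha((unsigned char)*s->cur)) {
        while (isalnum((unsigned char)*s->cur))
            s->cur++;
        return TOK_WORD;
    }
    s->cur++;          /* any other single character */
    return TOK_OTHER;
}

/* Return 1 if both inputs yield the same sequence of token kinds. */
int same_token_kinds(const char *a, const char *b)
{
    demo_scanner s1, s2;
    int tok1, tok2;

    demo_init(&s1, a);
    demo_init(&s2, b);
    do {
        tok1 = demo_lex(&s1);
        tok2 = demo_lex(&s2);
        if (tok1 != tok2)
            return 0;
    } while (tok1 != TOK_EOF);
    return 1;
}
```

   Because neither scanner touches global state, the two calls to
`demo_lex' can be interleaved freely, which is the property the
reentrant `flex' API provides for generated scanners.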
3150
3151   Another use for a reentrant scanner is recursion.  (Note that a
3152recursive scanner can also be created using a non-reentrant scanner and
3153buffer states. *Note Multiple Input Buffers::.)
3154
3155   The following crude scanner supports the `eval' command by invoking
3156another instance of itself.
3157
3158         /* Example of recursive invocation. */
3159
3160         %option reentrant
3161
3162         %%
3163         "eval(".+")"  {
3164                           yyscan_t scanner;
3165                           YY_BUFFER_STATE buf;
3166
3167                           yylex_init( &scanner );
3168                           yytext[yyleng-1] = ' ';
3169
3170                           buf = yy_scan_string( yytext + 5, scanner );
3171                           yylex( scanner );
3172
3173                           yy_delete_buffer(buf,scanner);
3174                           yylex_destroy( scanner );
3175                      }
3176         ...
3177         %%
3178
3179
3180File: flex.info,  Node: Reentrant Overview,  Next: Reentrant Example,  Prev: Reentrant Uses,  Up: Reentrant
3181
19.2 An Overview of the Reentrant API
=====================================
3184
The API for reentrant scanners is different from that for non-reentrant
scanners.  Here is a quick overview of the API:

   * `%option reentrant' must be specified.
3189
3190   * All functions take one additional argument: `yyscanner'
3191
3192   * All global variables are replaced by their macro equivalents.  (We
3193     tell you this because it may be important to you during debugging.)
3194
3195   * `yylex_init' and `yylex_destroy' must be called before and after
3196     `yylex', respectively.
3197
3198   * Accessor methods (get/set functions) provide access to common
3199     `flex' variables.
3200
3201   * User-specific data can be stored in `yyextra'.
3202
3203
3204File: flex.info,  Node: Reentrant Example,  Next: Reentrant Detail,  Prev: Reentrant Overview,  Up: Reentrant
3205
19.3 Reentrant Example
======================
3208
First, an example of a reentrant scanner:

         /* This scanner prints "//" comments. */

         %option reentrant stack noyywrap
         %x COMMENT

         %%

         "//"                 yy_push_state( COMMENT, yyscanner);
         .|\n

         <COMMENT>\n          yy_pop_state( yyscanner );
         <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);

         %%

         int main ( int argc, char * argv[] )
         {
             yyscan_t scanner;

             yylex_init ( &scanner );
             yylex ( scanner );
             yylex_destroy ( scanner );
             return 0;
         }
3234
3235
3236File: flex.info,  Node: Reentrant Detail,  Next: Reentrant Functions,  Prev: Reentrant Example,  Up: Reentrant
3237
19.4 The Reentrant API in Detail
================================
3240
3241Here are the things you need to do or know to use the reentrant C API of
3242`flex'.
3243
3244* Menu:
3245
3246* Specify Reentrant::
3247* Extra Reentrant Argument::
3248* Global Replacement::
3249* Init and Destroy Functions::
3250* Accessor Methods::
3251* Extra Data::
3252* About yyscan_t::
3253
3254
3255File: flex.info,  Node: Specify Reentrant,  Next: Extra Reentrant Argument,  Prev: Reentrant Detail,  Up: Reentrant Detail
3256
19.4.1 Declaring a Scanner As Reentrant
---------------------------------------

`%option reentrant' (`--reentrant') must be specified.

   Notice that `%option reentrant' is specified in the above example
(*note Reentrant Example::).  Had this option not been specified, `flex'
would have happily generated a non-reentrant scanner without
complaining.  You may explicitly specify `%option noreentrant' if you
do _not_ want a reentrant scanner, although it is not necessary.  The
default is to generate a non-reentrant scanner.
3268
3269
3270File: flex.info,  Node: Extra Reentrant Argument,  Next: Global Replacement,  Prev: Specify Reentrant,  Up: Reentrant Detail
3271
19.4.2 The Extra Argument
-------------------------
3274
3275All functions take one additional argument: `yyscanner'.
3276
   Notice that the calls to `yy_push_state' and `yy_pop_state' both
have an argument, `yyscanner', that is not present in a non-reentrant
scanner.  Here are the declarations of `yy_push_state' and
`yy_pop_state' in the reentrant scanner:
3281
3282         static void yy_push_state  ( int new_state , yyscan_t yyscanner ) ;
3283         static void yy_pop_state  ( yyscan_t yyscanner  ) ;
3284
3285   Notice that the argument `yyscanner' appears in the declaration of
3286both functions.  In fact, all `flex' functions in a reentrant scanner
3287have this additional argument.  It is always the last argument in the
3288argument list, it is always of type `yyscan_t' (which is typedef'd to
3289`void *') and it is always named `yyscanner'.  As you may have guessed,
3290`yyscanner' is a pointer to an opaque data structure encapsulating the
3291current state of the scanner.  For a list of function declarations, see
3292*note Reentrant Functions::. Note that preprocessor macros, such as
3293`BEGIN', `ECHO', and `REJECT', do not take this additional argument.
3294
3295
3296File: flex.info,  Node: Global Replacement,  Next: Init and Destroy Functions,  Prev: Extra Reentrant Argument,  Up: Reentrant Detail
3297
19.4.3 Global Variables Replaced By Macros
------------------------------------------
3300
3301All global variables in traditional flex have been replaced by macro
3302equivalents.
3303
3304   Note that in the above example, `yyout' and `yytext' are not plain
3305variables. These are macros that will expand to their equivalent lvalue.
3306All of the familiar `flex' globals have been replaced by their macro
3307equivalents. In particular, `yytext', `yyleng', `yylineno', `yyin',
3308`yyout', `yyextra', `yylval', and `yylloc' are macros. You may safely
3309use these macros in actions as if they were plain variables. We only
3310tell you this so you don't expect to link to these variables
3311externally. Currently, each macro expands to a member of an internal
3312struct, e.g.,
3313
3314     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)
3315
   One important thing to remember about `yytext' and friends is that
they are not global variables in a reentrant scanner, so you cannot
access them directly from outside an action or from other functions.
You must use an accessor method, e.g., `yyget_text', to accomplish
this.  (See below.)
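
   To make the macro mechanism concrete, here is a minimal sketch in
plain C.  It is not the code `flex' generates: `demo_guts_t',
`demo_scan_t', `demo_text', `demo_leng', `demo_action' and the
`demo_get_...' functions are hypothetical names chosen to mirror the
`#define' shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the scanner's internal struct; the real
   struct yyguts_t has many more members. */
struct demo_guts_t {
    char *text_r;
    int   leng_r;
};

typedef void *demo_scan_t;   /* opaque handle, like yyscan_t */

/* Wherever a variable named yyscanner is in scope, these macros can
   be used like ordinary variables, including as assignment targets. */
#define demo_text (((struct demo_guts_t *)yyscanner)->text_r)
#define demo_leng (((struct demo_guts_t *)yyscanner)->leng_r)

/* Plays the role of an action: it has a yyscanner parameter. */
void demo_action(char *match, int len, demo_scan_t yyscanner)
{
    demo_text = match;   /* expands to a struct member assignment */
    demo_leng = len;
}

/* Code outside the scanner has no yyscanner in scope, so the macros
   would not compile there; it uses accessor functions instead. */
char *demo_get_text(demo_scan_t scanner)
{
    assert(scanner != NULL);
    return ((struct demo_guts_t *)scanner)->text_r;
}

int demo_get_leng(demo_scan_t scanner)
{
    assert(scanner != NULL);
    return ((struct demo_guts_t *)scanner)->leng_r;
}
```

   The sketch also shows why such names cannot be linked to
externally: after preprocessing, no object called `demo_text' exists,
only member accesses through the handle.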
3321
3322
3323File: flex.info,  Node: Init and Destroy Functions,  Next: Accessor Methods,  Prev: Global Replacement,  Up: Reentrant Detail
3324
19.4.4 Init and Destroy Functions
---------------------------------
3327
3328`yylex_init' and `yylex_destroy' must be called before and after
3329`yylex', respectively.
3330
3331         int yylex_init ( yyscan_t * ptr_yy_globals ) ;
3332         int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
3333         int yylex ( yyscan_t yyscanner ) ;
3334         int yylex_destroy ( yyscan_t yyscanner ) ;
3335
3336   The function `yylex_init' must be called before calling any other
3337function. The argument to `yylex_init' is the address of an
3338uninitialized pointer to be filled in by `yylex_init', overwriting any
3339previous contents. The function `yylex_init_extra' may be used instead,
3340taking as its first argument a variable of type `YY_EXTRA_TYPE'.  See
3341the section on yyextra, below, for more details.
3342
3343   The value stored in `ptr_yy_globals' should thereafter be passed to
3344`yylex' and `yylex_destroy'.  Flex does not save the argument passed to
3345`yylex_init', so it is safe to pass the address of a local pointer to
3346`yylex_init' so long as it remains in scope for the duration of all
3347calls to the scanner, up to and including the call to `yylex_destroy'.
3348
3349   The function `yylex' should be familiar to you by now. The reentrant
3350version takes one argument, which is the value returned (via an
3351argument) by `yylex_init'.  Otherwise, it behaves the same as the
3352non-reentrant version of `yylex'.
3353
   Both `yylex_init' and `yylex_init_extra' return 0 (zero) on success,
or non-zero on failure, in which case `errno' is set to one of the
following values:
3357
3358   * ENOMEM Memory allocation error. *Note memory-management::.
3359
3360   * EINVAL Invalid argument.
3361
3362   The function `yylex_destroy' should be called to free resources used
3363by the scanner. After `yylex_destroy' is called, the contents of
3364`yyscanner' should not be used.  Of course, there is no need to destroy
3365a scanner if you plan to reuse it.  A `flex' scanner (both reentrant
3366and non-reentrant) may be restarted by calling `yyrestart'.
3367
3368   Below is an example of a program that creates a scanner, uses it,
3369then destroys it when done:
3370
3371         int main ()
3372         {
3373             yyscan_t scanner;
3374             int tok;
3375
3376             yylex_init(&scanner);
3377
3378             while ((tok=yylex(scanner)) > 0)
3379                 printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));
3380
3381             yylex_destroy(scanner);
3382             return 0;
3383         }
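
   The program above ignores the return value of `yylex_init'.  The
following sketch shows one way a caller might check it.  So that it
compiles without a generated scanner, it uses hypothetical stand-ins
(`demo_lex_init', `demo_lex_destroy', `demo_run') that merely follow
the contract described in this section; they are assumptions for
illustration, not `flex' functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef void *demo_scan_t;           /* opaque handle, like yyscan_t */

struct demo_guts_t { int lineno; };  /* hypothetical internal state */

/* Same contract as described for yylex_init: store a new handle
   through the pointer and return 0, or return non-zero with errno set
   to EINVAL (bad argument) or ENOMEM (allocation failure). */
int demo_lex_init(demo_scan_t *ptr)
{
    if (ptr == NULL) {
        errno = EINVAL;
        return 1;
    }
    *ptr = calloc(1, sizeof(struct demo_guts_t));
    if (*ptr == NULL) {
        errno = ENOMEM;
        return 1;
    }
    return 0;
}

/* Free the handle; it must not be used afterwards. */
int demo_lex_destroy(demo_scan_t scanner)
{
    assert(scanner != NULL);
    free(scanner);
    return 0;
}

/* Caller-side pattern: check the result before using the handle, and
   pair every successful init with exactly one destroy.  Returns 0 on
   success or the errno value reported by the failed init. */
int demo_run(demo_scan_t *where)
{
    if (demo_lex_init(where) != 0)
        return errno;
    /* ... calls to the scanner would go here ... */
    demo_lex_destroy(*where);
    *where = NULL;   /* guard against accidental reuse */
    return 0;
}
```

   With a real scanner the same shape applies: test the result of
`yylex_init', report `errno' on failure, and call `yylex_destroy' only
on a handle that was successfully initialized.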
3384
3385
3386File: flex.info,  Node: Accessor Methods,  Next: Extra Data,  Prev: Init and Destroy Functions,  Up: Reentrant Detail
3387
19.4.5 Accessing Variables with Reentrant Scanners
--------------------------------------------------
3390
3391Accessor methods (get/set functions) provide access to common `flex'
3392variables.
3393
3394   Many scanners that you build will be part of a larger project.
3395Portions of your project will need access to `flex' values, such as
3396`yytext'.  In a non-reentrant scanner, these values are global, so
3397there is no problem accessing them. However, in a reentrant scanner,
3398there are no global `flex' values. You can not access them directly.
3399Instead, you must access `flex' values using accessor methods (get/set
3400functions). Each accessor method is named `yyget_NAME' or `yyset_NAME',
3401where `NAME' is the name of the `flex' variable you want. For example:
3402
         /* Replace the last character of yytext with a NUL byte. */
         void chop ( yyscan_t scanner )
         {
             int len = yyget_leng( scanner );
             yyget_text( scanner )[len - 1] = '\0';
         }
3409
3410   The above code may be called from within an action like this:
3411
3412         %%
3413         .+\n    { chop( yyscanner );}
3414
3415   You may find that `%option header-file' is particularly useful for
3416generating prototypes of all the accessor functions. *Note
3417option-header::.
3418
3419
3420File: flex.info,  Node: Extra Data,  Next: About yyscan_t,  Prev: Accessor Methods,  Up: Reentrant Detail
3421
19.4.6 Extra Data
-----------------
3424
3425User-specific data can be stored in `yyextra'.
3426
3427   In a reentrant scanner, it is unwise to use global variables to
3428communicate with or maintain state between different pieces of your
3429program.  However, you may need access to external data or invoke
3430external functions from within the scanner actions.  Likewise, you may
3431need to pass information to your scanner (e.g., open file descriptors,
3432or database connections).  In a non-reentrant scanner, the only way to
3433do this would be through the use of global variables.  `Flex' allows
3434you to store arbitrary, "extra" data in a scanner.  This data is
3435accessible through the accessor methods `yyget_extra' and `yyset_extra'
3436from outside the scanner, and through the shortcut macro `yyextra' from
3437within the scanner itself. They are defined as follows:
3438
3439         #define YY_EXTRA_TYPE  void*
3440         YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
3441         void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);
3442
3443   In addition, an extra form of `yylex_init' is provided,
3444`yylex_init_extra'. This function is provided so that the yyextra value
3445can be accessed from within the very first yyalloc, used to allocate
3446the scanner itself.
3447
3448   By default, `YY_EXTRA_TYPE' is defined as type `void *'.  You may
3449redefine this type using `%option extra-type="your_type"' in the
3450scanner:
3451
         /* An example of overriding YY_EXTRA_TYPE. */
         %{
         #include <sys/stat.h>
         #include <unistd.h>
         %}
         %option reentrant
         %option extra-type="struct stat *"
         %%

         __filesize__     printf( "%ld", (long) yyextra->st_size  );
         __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
         %%
         void scan_file( char* filename )
         {
             yyscan_t scanner;
             struct stat buf;
             FILE *in;

             in = fopen( filename, "r" );
             if( in == NULL )
                 return;
             stat( filename, &buf );

             /* extra-type is `struct stat *', so pass a pointer. */
             yylex_init_extra( &buf, &scanner );
             yyset_in( in, scanner );
             yylex( scanner );
             yylex_destroy( scanner );

             fclose( in );
         }
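
   The design idea behind `yyextra', attaching user data to each
scanner instead of keeping it in globals, can be sketched in plain C as
follows.  This is an illustration under assumptions, not generated
code: `demo_stats', `demo_scanner', `demo_set_extra', `demo_get_extra',
`demo_scan' and `demo_scan_string' are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user data, the kind of thing yyextra might point to. */
struct demo_stats {
    int words;
    int lines;
};

/* Hypothetical scanner handle that carries one user-supplied pointer,
   the way a reentrant flex scanner carries yyextra. */
typedef struct {
    const char *cur;
    struct demo_stats *extra;
} demo_scanner;

void demo_set_extra(struct demo_stats *user, demo_scanner *s)
{
    s->extra = user;
}

struct demo_stats *demo_get_extra(const demo_scanner *s)
{
    return s->extra;
}

/* "Scan" the text, updating only the statistics attached to this
   scanner; no global variables are involved. */
void demo_scan(demo_scanner *s)
{
    int in_word = 0;

    assert(s != NULL && s->extra != NULL);
    for (; *s->cur != '\0'; s->cur++) {
        if (*s->cur == '\n')
            s->extra->lines++;
        if (*s->cur == ' ' || *s->cur == '\t' || *s->cur == '\n')
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            s->extra->words++;
        }
    }
}

/* Convenience wrapper: scan one string with its own private stats. */
struct demo_stats demo_scan_string(const char *text)
{
    struct demo_stats st = { 0, 0 };
    demo_scanner s;

    s.cur = text;
    demo_set_extra(&st, &s);
    demo_scan(&s);
    return st;
}
```

   Each call to `demo_scan_string' gets its own counters, so several
scans can proceed independently, which is the situation `yyextra' is
meant to support.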
3480
3481
3482File: flex.info,  Node: About yyscan_t,  Prev: Extra Data,  Up: Reentrant Detail
3483
19.4.7 About yyscan_t
---------------------
3486
3487`yyscan_t' is defined as:
3488
3489          typedef void* yyscan_t;
3490
   It is initialized by `yylex_init()' to point to an internal
structure.  You should never access this value directly.  In particular,
you should never attempt to free it (use `yylex_destroy()' instead).
3494
3495
3496File: flex.info,  Node: Reentrant Functions,  Prev: Reentrant Detail,  Up: Reentrant
3497
19.5 Functions and Macros Available in Reentrant C Scanners
===========================================================

The following functions are available in a reentrant scanner:
3502
3503         char *yyget_text ( yyscan_t scanner );
3504         int yyget_leng ( yyscan_t scanner );
3505         FILE *yyget_in ( yyscan_t scanner );
3506         FILE *yyget_out ( yyscan_t scanner );
3507         int yyget_lineno ( yyscan_t scanner );
3508         YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
3509         int  yyget_debug ( yyscan_t scanner );
3510
3511         void yyset_debug ( int flag, yyscan_t scanner );
3512         void yyset_in  ( FILE * in_str , yyscan_t scanner );
3513         void yyset_out  ( FILE * out_str , yyscan_t scanner );
3514         void yyset_lineno ( int line_number , yyscan_t scanner );
3515         void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );
3516
3517   There are no "set" functions for yytext and yyleng. This is
3518intentional.
3519
   The following macro shortcuts are available in actions in a reentrant
scanner:
3522
3523         yytext
3524         yyleng
3525         yyin
3526         yyout
3527         yylineno
3528         yyextra
3529         yy_flex_debug
3530
3531   In a reentrant C scanner, support for yylineno is always present
3532(i.e., you may access yylineno), but the value is never modified by
3533`flex' unless `%option yylineno' is enabled. This is to allow the user
3534to maintain the line count independently of `flex'.
3535
3536   The following functions and macros are made available when `%option
3537bison-bridge' (`--bison-bridge') is specified:
3538
3539         YYSTYPE * yyget_lval ( yyscan_t scanner );
3540         void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
3541         yylval
3542
3543   The following functions and macros are made available when `%option
3544bison-locations' (`--bison-locations') is specified:
3545
3546         YYLTYPE *yyget_lloc ( yyscan_t scanner );
3547         void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
3548         yylloc
3549
   Support for yylval assumes that `YYSTYPE' is a valid type.  Support
for yylloc assumes that `YYLTYPE' is a valid type.  Typically, these
types are generated by `bison', and are included in section 1 of the
`flex' input.
3554
3555
3556File: flex.info,  Node: Lex and Posix,  Next: Memory Management,  Prev: Reentrant,  Up: Top
3557
20 Incompatibilities with Lex and Posix
***************************************
3560
3561`flex' is a rewrite of the AT&T Unix _lex_ tool (the two
3562implementations do not share any code, though), with some extensions and
3563incompatibilities, both of which are of concern to those who wish to
3564write scanners acceptable to both implementations.  `flex' is fully
3565compliant with the POSIX `lex' specification, except that when using
3566`%pointer' (the default), a call to `unput()' destroys the contents of
3567`yytext', which is counter to the POSIX specification.  In this section
3568we discuss all of the known areas of incompatibility between `flex',
3569AT&T `lex', and the POSIX specification.  `flex''s `-l' option turns on
3570maximum compatibility with the original AT&T `lex' implementation, at
3571the cost of a major loss in the generated scanner's performance.  We
3572note below which incompatibilities can be overcome using the `-l'
3573option.  `flex' is fully compatible with `lex' with the following
3574exceptions:
3575
3576   * The undocumented `lex' scanner internal variable `yylineno' is not
3577     supported unless `-l' or `%option yylineno' is used.
3578
3579   * `yylineno' should be maintained on a per-buffer basis, rather than
3580     a per-scanner (single global variable) basis.
3581
3582   * `yylineno' is not part of the POSIX specification.
3583
3584   * The `input()' routine is not redefinable, though it may be called
3585     to read characters following whatever has been matched by a rule.
3586     If `input()' encounters an end-of-file the normal `yywrap()'
3587     processing is done.  A "real" end-of-file is returned by `input()'
3588     as `EOF'.
3589
3590   * Input is instead controlled by defining the `YY_INPUT()' macro.
3591
3592   * The `flex' restriction that `input()' cannot be redefined is in
3593     accordance with the POSIX specification, which simply does not
3594     specify any way of controlling the scanner's input other than by
3595     making an initial assignment to `yyin'.
3596
3597   * The `unput()' routine is not redefinable.  This restriction is in
3598     accordance with POSIX.
3599
3600   * `flex' scanners are not as reentrant as `lex' scanners.  In
3601     particular, if you have an interactive scanner and an interrupt
3602     handler which long-jumps out of the scanner, and the scanner is
3603     subsequently called again, you may get the following message:
3604
3605              fatal flex scanner internal error--end of buffer missed
3606
3607     To reenter the scanner, first use:
3608
3609              yyrestart( yyin );
3610
3611     Note that this call will throw away any buffered input; usually
3612     this isn't a problem with an interactive scanner. *Note
3613     Reentrant::, for `flex''s reentrant API.
3614
3615   * Also note that `flex' C++ scanner classes _are_ reentrant, so if
3616     using C++ is an option for you, you should use them instead.
3617     *Note Cxx::, and *note Reentrant::  for details.
3618
   * `output()' is not supported.  Output from the `ECHO' macro is done
     to the file-pointer `yyout' (default `stdout').
3621
3622   * `output()' is not part of the POSIX specification.
3623
3624   * `lex' does not support exclusive start conditions (%x), though they
3625     are in the POSIX specification.
3626
3627   * When definitions are expanded, `flex' encloses them in parentheses.
3628     With `lex', the following:
3629
3630              NAME    [A-Z][A-Z0-9]*
3631              %%
3632              foo{NAME}?      printf( "Found it\n" );
3633              %%
3634
3635     will not match the string `foo' because when the macro is expanded
3636     the rule is equivalent to `foo[A-Z][A-Z0-9]*?'  and the precedence
3637     is such that the `?' is associated with `[A-Z0-9]*'.  With `flex',
3638     the rule will be expanded to `foo([A-Z][A-Z0-9]*)?' and so the
3639     string `foo' will match.
3640
3641   * Note that if the definition begins with `^' or ends with `$' then
3642     it is _not_ expanded with parentheses, to allow these operators to
3643     appear in definitions without losing their special meanings.  But
3644     the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
3645     definition.
3646
3647   * Using `-l' results in the `lex' behavior of no parentheses around
3648     the definition.
3649
3650   * The POSIX specification is that the definition be enclosed in
3651     parentheses.
3652
3653   * Some implementations of `lex' allow a rule's action to begin on a
3654     separate line, if the rule's pattern has trailing whitespace:
3655
3656              %%
3657              foo|bar<space here>
3658                { foobar_action();}
3659
3660     `flex' does not support this feature.
3661
3662   * The `lex' `%r' (generate a Ratfor scanner) option is not
3663     supported.  It is not part of the POSIX specification.
3664
3665   * After a call to `unput()', _yytext_ is undefined until the next
3666     token is matched, unless the scanner was built using `%array'.
3667     This is not the case with `lex' or the POSIX specification.  The
3668     `-l' option does away with this incompatibility.
3669
   * The precedence of the `{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of `lex' interpret `abc{1,3}' as
     "match one, two, or three occurrences of `abc'", whereas `flex'
     interprets it as "match `ab' followed by one, two, or three
     occurrences of `c'".  The `-l' and `--posix' options do away with
     this incompatibility.
3676
   * The precedence of the `^' operator is different.  `lex' interprets
     `^foo|bar' as "match either `foo' at the beginning of a line, or
     `bar' anywhere", whereas `flex' interprets it as "match either
     `foo' or `bar' if they come at the beginning of a line".  The
     latter is in agreement with the POSIX specification.
3682
   * The special table-size declarations such as `%a' supported by
     `lex' are not required by `flex' scanners.  `flex' ignores them.
3685
3686   * The name `FLEX_SCANNER' is `#define''d so scanners may be written
3687     for use with either `flex' or `lex'.  Scanners also include
3688     `YY_FLEX_MAJOR_VERSION',  `YY_FLEX_MINOR_VERSION' and
3689     `YY_FLEX_SUBMINOR_VERSION' indicating which version of `flex'
3690     generated the scanner. For example, for the 2.5.22 release, these
3691     defines would be 2,  5 and 22 respectively. If the version of
3692     `flex' being used is a beta version, then the symbol `FLEX_BETA'
3693     is defined.
3694
3695   * The symbols `[[' and `]]' in the code sections of the input may
3696     conflict with the m4 delimiters. *Note M4 Dependency::.
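
   Two of the grouping differences listed above (definition expansion
and the `{,}' operator) can be checked with ordinary regular
expressions.  The sketch below does not run `lex' or `flex'; it assumes
a system that provides POSIX `regcomp' and `regexec', and it writes
each interpretation with explicit parentheses so that the grouping is
unambiguous.  `full_match' and the `RULE_...' and `RANGE_...' macros
are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of `text' matches the POSIX extended regular
   expression `pattern', 0 if it does not, and -1 if the pattern is
   too long or fails to compile. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Definition NAME = [A-Z][A-Z0-9]*, used in the rule  foo{NAME}?  */
#define RULE_LEX   "foo[A-Z]([A-Z0-9]*)?"  /* `?' binds to [A-Z0-9]* only */
#define RULE_FLEX  "foo([A-Z][A-Z0-9]*)?"  /* `?' covers the definition */

/* The rule  abc{1,3}  */
#define RANGE_LEX  "(abc){1,3}"            /* repeat all of `abc' */
#define RANGE_FLEX "ab(c{1,3})"            /* repeat only `c' */
```

   Under these assumptions, `full_match(RULE_FLEX, "foo")' succeeds
while `full_match(RULE_LEX, "foo")' does not, and `"abcabc"' matches
only `RANGE_LEX' while `"abcc"' matches only `RANGE_FLEX'.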
3697
3698
3699   The following `flex' features are not included in `lex' or the POSIX
3700specification:
3701
3702   * C++ scanners
3703
3704   * %option
3705
3706   * start condition scopes
3707
3708   * start condition stacks
3709
3710   * interactive/non-interactive scanners
3711
3712   * yy_scan_string() and friends
3713
3714   * yyterminate()
3715
3716   * yy_set_interactive()
3717
3718   * yy_set_bol()
3719
   * YY_AT_BOL()

   * <<EOF>>
3721
3722   * <*>
3723
3724   * YY_DECL
3725
3726   * YY_START
3727
3728   * YY_USER_ACTION
3729
3730   * YY_USER_INIT
3731
3732   * #line directives
3733
3734   * %{}'s around actions
3735
3736   * reentrant C API
3737
3738   * multiple actions on a line
3739
3740   * almost all of the `flex' command-line options
3741
3742   The feature "multiple actions on a line" refers to the fact that
3743with `flex' you can put multiple actions on the same line, separated
3744with semi-colons, while with `lex', the following:
3745
3746         foo    handle_foo(); ++num_foos_seen;
3747
3748   is (rather surprisingly) truncated to
3749
3750         foo    handle_foo();
3751
3752   `flex' does not truncate the action.  Actions that are not enclosed
3753in braces are simply terminated at the end of the line.
3754
3755
3756File: flex.info,  Node: Memory Management,  Next: Serialized Tables,  Prev: Lex and Posix,  Up: Top
3757
21 Memory Management
********************
3760
3761This chapter describes how flex handles dynamic memory, and how you can
3762override the default behavior.
3763
3764* Menu:
3765
3766* The Default Memory Management::
3767* Overriding The Default Memory Management::
3768* A Note About yytext And Memory::
3769
3770
3771File: flex.info,  Node: The Default Memory Management,  Next: Overriding The Default Memory Management,  Prev: Memory Management,  Up: Memory Management
3772
21.1 The Default Memory Management
==================================
3775
Flex allocates dynamic memory during initialization, and once in a
while from within a call to yylex().  Initialization takes place during
the first call to yylex().  Thereafter, flex may reallocate more memory
if it needs to enlarge a buffer.  As of version 2.5.9, flex will clean up
all memory when you call `yylex_destroy'.  *Note faq-memory-leak::.
3781
   Flex allocates dynamic memory for the purposes listed below (1):
3783
16kB for the input buffer.
3785     Flex allocates memory for the character buffer used to perform
3786     pattern matching.  Flex must read ahead from the input stream and
3787     store it in a large character buffer.  This buffer is typically
3788     the largest chunk of dynamic memory flex consumes. This buffer
3789     will grow if necessary, doubling the size each time.  Flex frees
3790     this memory when you call yylex_destroy().  The default size of
3791     this buffer (16384 bytes) is almost always too large.  The ideal
3792     size for this buffer is the length of the longest token expected,
3793     in bytes, plus a little more.  Flex will allocate a few extra
3794     bytes for housekeeping. Currently, to override the size of the
3795     input buffer you must `#define YY_BUF_SIZE' to whatever number of
3796     bytes you want. We don't plan to change this in the near future,
3797     but we reserve the right to do so if we ever add a more robust
3798     memory management API.
3799
64kB for the REJECT state.
     This will only be allocated if you use REJECT.  The size is large
     enough to hold the same number of states as characters in the
     input buffer.  If you override the size of the input buffer (via
     `YY_BUF_SIZE'), then you automatically override the size of this
     buffer as well.
3805
100 bytes for the start condition stack.
3807     Flex allocates memory for the start condition stack. This is the
3808     stack used for pushing start states, i.e., with yy_push_state().
3809     It will grow if necessary.  Since the states are simply integers,
3810     this stack doesn't consume much memory.  This stack is not present
3811     if `%option stack' is not specified.  You will rarely need to tune
3812     this buffer. The ideal size for this stack is the maximum depth
3813     expected.  The memory for this stack is automatically destroyed
3814     when you call yylex_destroy(). *Note option-stack::.
3815
40 bytes for each YY_BUFFER_STATE.
3817     Flex allocates memory for each YY_BUFFER_STATE. The buffer state
3818     itself is about 40 bytes, plus an additional large character
3819     buffer (described above.)  The initial buffer state is created
3820     during initialization, and with each call to yy_create_buffer().
3821     You can't tune the size of this, but you can tune the character
3822     buffer as described above. Any buffer state that you explicitly
3823     create by calling yy_create_buffer() is _NOT_ destroyed
3824     automatically. You must call yy_delete_buffer() to free the
3825     memory. The exception to this rule is that flex will delete the
3826     current buffer automatically when you call yylex_destroy(). If you
3827     delete the current buffer, be sure to set it to NULL.  That way,
3828     flex will not try to delete the buffer a second time (possibly
3829     crashing your program!) At the time of this writing, flex does not
3830     provide a growable stack for the buffer states.  You have to
3831     manage that yourself.  *Note Multiple Input Buffers::.
3832
84 bytes for the reentrant scanner guts.
     Flex allocates about 84 bytes for the reentrant scanner structure
     when you call yylex_init(). It is destroyed when you call
     yylex_destroy().
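
   The doubling rule for the character buffer (the first item above)
can be made concrete with a small calculation.  The C sketch below is
not part of flex; `doublings_needed' is a hypothetical helper that
counts how many times a buffer of a given initial size would have to
double before a token of a given length fits.  It may help when
choosing a value for `YY_BUF_SIZE'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex: count how many times a buffer
   that starts at `initial` bytes must double before it can hold
   `needed` bytes. */
static unsigned doublings_needed(size_t initial, size_t needed)
{
    unsigned count = 0;
    size_t size = initial;

    assert(initial > 0);        /* a zero-sized buffer never grows by doubling */

    while (size < needed) {
        if (size > (size_t)-1 / 2)
            break;              /* doubling again would overflow size_t */
        size *= 2;
        count++;
    }
    return count;
}
```

   With the default of 16384 bytes, a 100-byte token needs no growth.
Starting from 256 bytes, a 5000-byte token forces five doublings (up to
8192 bytes), and each one costs a reallocation.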
3837
3838
3839   ---------- Footnotes ----------
3840
3841   (1) The quantities given here are approximate, and may vary due to
3842host architecture, compiler configuration, or due to future
3843enhancements to flex.
3844
3845
3846File: flex.info,  Node: Overriding The Default Memory Management,  Next: A Note About yytext And Memory,  Prev: The Default Memory Management,  Up: Memory Management
3847
384821.2 Overriding The Default Memory Management
3849=============================================
3850
3851Flex calls the functions `yyalloc', `yyrealloc', and `yyfree' when it
3852needs to allocate or free memory. By default, these functions are
3853wrappers around the standard C functions, `malloc', `realloc', and
3854`free', respectively. You can override the default implementations by
3855telling flex that you will provide your own implementations.
3856
3857   To override the default implementations, you must do two things:
3858
3859  1. Suppress the default implementations by specifying one or more of
3860     the following options:
3861
3862        * `%option noyyalloc'
3863
3864        * `%option noyyrealloc'
3865
3866        * `%option noyyfree'.
3867
3868  2. Provide your own implementation of the following functions: (1)
3869
3870          // For a non-reentrant scanner
3871          void * yyalloc (size_t bytes);
3872          void * yyrealloc (void * ptr, size_t bytes);
3873          void   yyfree (void * ptr);
3874
3875          // For a reentrant scanner
3876          void * yyalloc (size_t bytes, void * yyscanner);
3877          void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
3878          void   yyfree (void * ptr, void * yyscanner);
3879
3880
3881   In the following example, we will override all three memory
3882routines. We assume that there is a custom allocator with garbage
3883collection. In order to make this example interesting, we will use a
3884reentrant scanner, passing a pointer to the custom allocator through
3885`yyextra'.
3886
     %{
     #include "some_allocator.h"

     /* Initialize the allocator. */
     #define YY_EXTRA_TYPE  struct allocator*
     #define YY_USER_INIT  yyextra = allocator_create();
     %}

     /* Suppress the default implementations. */
     %option noyyalloc noyyrealloc noyyfree
     %option reentrant

     %%
     .|\n   ;
     %%

     /* Provide our own implementations.  These functions live outside
        yylex(), so they reach the allocator through yyget_extra()
        rather than through the `yyextra' macro. */
     void * yyalloc (size_t bytes, void* yyscanner) {
         return allocator_alloc (yyget_extra (yyscanner), bytes);
     }

     void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
         /* Pass `ptr' along so that the existing block is the one
            resized (the argument order of allocator_realloc is assumed). */
         return allocator_realloc (yyget_extra (yyscanner), ptr, bytes);
     }

     void yyfree (void * ptr, void * yyscanner) {
         /* Do nothing -- we leave it to the garbage collector. */
     }
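
   For a non-reentrant scanner the same idea needs no `yyscanner'
argument.  The following C sketch is only an illustration: it wraps
`malloc', `realloc', and `free', and keeps a count of live blocks in
`yy_live_blocks', a hypothetical variable that is not part of flex.  It
assumes the scanner was generated with `%option noyyalloc noyyrealloc
noyyfree'.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping, not part of flex: blocks currently held. */
static size_t yy_live_blocks = 0;

void *yyalloc(size_t bytes)
{
    void *p = malloc(bytes);
    if (p != NULL)
        yy_live_blocks++;
    return p;
}

void *yyrealloc(void *ptr, size_t bytes)
{
    void *p = realloc(ptr, bytes);
    /* realloc(NULL, n) behaves like malloc(n), so a new block appears.
       Resizing an existing block leaves the count unchanged. */
    if (ptr == NULL && p != NULL)
        yy_live_blocks++;
    return p;
}

void yyfree(void *ptr)
{
    if (ptr != NULL) {
        assert(yy_live_blocks > 0);   /* freeing more than was allocated */
        yy_live_blocks--;
    }
    free(ptr);
}
```

   A non-zero `yy_live_blocks' after yylex_destroy() would then point
at memory that the scanner, or your own code, did not release.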
3915
3916   ---------- Footnotes ----------
3917
3918   (1) It is not necessary to override all (or any) of the memory
3919management routines.  You may, for example, override `yyrealloc', but
3920not `yyfree' or `yyalloc'.
3921
3922
3923File: flex.info,  Node: A Note About yytext And Memory,  Prev: Overriding The Default Memory Management,  Up: Memory Management
3924
392521.3 A Note About yytext And Memory
3926===================================
3927
3928When flex finds a match, `yytext' points to the first character of the
3929match in the input buffer. The string itself is part of the input
3930buffer, and is _NOT_ allocated separately. The value of yytext will be
3931overwritten the next time yylex() is called. In short, the value of
3932yytext is only valid from within the matched rule's action.
3933
   Often, you want the value of yytext to persist for later processing,
e.g., by a parser with non-zero lookahead. In order to preserve yytext,
you will have to copy it with strdup() or a similar function. But this
introduces some headache because your parser is now responsible for
freeing the copy of yytext. If you use a yacc or bison parser
(commonly used with flex), you will discover that the error recovery
mechanisms can cause memory to be leaked.
3941
3942   To prevent memory leaks from strdup'd yytext, you will have to track
3943the memory somehow. Our experience has shown that a garbage collection
3944mechanism or a pooled memory mechanism will save you a lot of grief
3945when writing parsers.
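
   A pooled mechanism of the kind mentioned above might look like the
following C sketch.  The names `str_pool', `pool_save', and
`pool_release' are hypothetical and not part of flex.  An action would
call `pool_save' with `yytext' and `yyleng'; the parser releases the
whole pool once, after parsing finishes or fails, so error recovery
cannot leak individual copies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One saved token.  The text is stored inline after the link. */
struct str_node {
    struct str_node *next;
    char text[];                /* C99 flexible array member */
};

/* Hypothetical pool type: a singly linked list of saved tokens. */
struct str_pool {
    struct str_node *head;
    size_t count;
};

/* Copy `len` bytes of `text` into the pool and return the copy, which
   stays valid until pool_release().  Returns NULL if out of memory. */
static char *pool_save(struct str_pool *pool, const char *text, size_t len)
{
    struct str_node *node;

    assert(pool != NULL && text != NULL);
    node = malloc(sizeof *node + len + 1);
    if (node == NULL)
        return NULL;
    memcpy(node->text, text, len);
    node->text[len] = '\0';
    node->next = pool->head;
    pool->head = node;
    pool->count++;
    return node->text;
}

/* Free every string saved so far. */
static void pool_release(struct str_pool *pool)
{
    assert(pool != NULL);
    while (pool->head != NULL) {
        struct str_node *next = pool->head->next;
        free(pool->head);
        pool->head = next;
    }
    pool->count = 0;
}
```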
3946
3947
3948File: flex.info,  Node: Serialized Tables,  Next: Diagnostics,  Prev: Memory Management,  Up: Top
3949
395022 Serialized Tables
3951********************
3952
3953A `flex' scanner has the ability to save the DFA tables to a file, and
3954load them at runtime when needed.  The motivation for this feature is
3955to reduce the runtime memory footprint.  Traditionally, these tables
3956have been compiled into the scanner as C arrays, and are sometimes
3957quite large.  Since the tables are compiled into the scanner, the
3958memory used by the tables can never be freed.  This is a waste of
3959memory, especially if an application uses several scanners, but none of
3960them at the same time.
3961
3962   The serialization feature allows the tables to be loaded at runtime,
3963before scanning begins. The tables may be discarded when scanning is
3964finished.
3965
3966* Menu:
3967
3968* Creating Serialized Tables::
3969* Loading and Unloading Serialized Tables::
3970* Tables File Format::
3971
3972
3973File: flex.info,  Node: Creating Serialized Tables,  Next: Loading and Unloading Serialized Tables,  Prev: Serialized Tables,  Up: Serialized Tables
3974
397522.1 Creating Serialized Tables
3976===============================
3977
3978You may create a scanner with serialized tables by specifying:
3979
3980         %option tables-file=FILE
3981     or
3982         --tables-file=FILE
3983
3984   These options instruct flex to save the DFA tables to the file FILE.
3985The tables will _not_ be embedded in the generated scanner. The scanner
3986will not function on its own. The scanner will be dependent upon the
3987serialized tables. You must load the tables from this file at runtime
3988before you can scan anything.
3989
3990   If you do not specify a filename to `--tables-file', the tables will
3991be saved to `lex.yy.tables', where `yy' is the appropriate prefix.
3992
3993   If your project uses several different scanners, you can concatenate
3994the serialized tables into one file, and flex will find the correct set
3995of tables, using the scanner prefix as part of the lookup key. An
3996example follows:
3997
3998     $ flex --tables-file --prefix=cpp cpp.l
3999     $ flex --tables-file --prefix=c   c.l
4000     $ cat lex.cpp.tables lex.c.tables  >  all.tables
4001
   The above example created two scanners, `cpp' and `c'. Since we did
not specify a filename, the tables were serialized to `lex.cpp.tables'
and `lex.c.tables', respectively. Then, we concatenated the two files
together into `all.tables', which we will distribute with our project.
At runtime, we will open the file and tell flex to load the tables from
it.  Flex will find the correct tables automatically. (See the next
section.)
4009
4010
4011File: flex.info,  Node: Loading and Unloading Serialized Tables,  Next: Tables File Format,  Prev: Creating Serialized Tables,  Up: Serialized Tables
4012
401322.2 Loading and Unloading Serialized Tables
4014============================================
4015
4016If you've built your scanner with `%option tables-file', then you must
4017load the scanner tables at runtime. This can be accomplished with the
4018following function:
4019
4020 -- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
4021     Locates scanner tables in the stream pointed to by FP and loads
4022     them.  Memory for the tables is allocated via `yyalloc'.  You must
4023     call this function before the first call to `yylex'. The argument
4024     SCANNER only appears in the reentrant scanner.  This function
4025     returns `0' (zero) on success, or non-zero on error.
4026
4027   The loaded tables are *not* automatically destroyed (unloaded) when
4028you call `yylex_destroy'. The reason is that you may create several
4029scanners of the same type (in a reentrant scanner), each of which needs
4030access to these tables.  To avoid a nasty memory leak, you must call
4031the following function:
4032
4033 -- Function: int yytables_destroy ([yyscan_t SCANNER])
4034     Unloads the scanner tables. The tables must be loaded again before
4035     you can scan any more data.  The argument SCANNER only appears in
4036     the reentrant scanner.  This function returns `0' (zero) on
4037     success, or non-zero on error.
4038
4039   *The functions `yytables_fload' and `yytables_destroy' are not
4040thread-safe.* You must ensure that these functions are called exactly
4041once (for each scanner type) in a threaded program, before any thread
4042calls `yylex'.  After the tables are loaded, they are never written to,
4043and no thread protection is required thereafter - until you destroy
4044them.
4045
4046
4047File: flex.info,  Node: Tables File Format,  Prev: Loading and Unloading Serialized Tables,  Up: Serialized Tables
4048
404922.3 Tables File Format
4050=======================
4051
4052This section defines the file format of serialized `flex' tables.
4053
4054   The tables format allows for one or more sets of tables to be
4055specified, where each set corresponds to a given scanner. Scanners are
4056indexed by name, as described below. The file format is as follows:
4057
4058                      TABLE SET 1
4059                     +-------------------------------+
4060             Header  | uint32          th_magic;     |
4061                     | uint32          th_hsize;     |
4062                     | uint32          th_ssize;     |
4063                     | uint16          th_flags;     |
4064                     | char            th_version[]; |
4065                     | char            th_name[];    |
4066                     | uint8           th_pad64[];   |
4067                     +-------------------------------+
4068             Table 1 | uint16          td_id;        |
4069                     | uint16          td_flags;     |
4070                     | uint32          td_hilen;     |
4071                     | uint32          td_lolen;     |
4072                     | void            td_data[];    |
4073                     | uint8           td_pad64[];   |
4074                     +-------------------------------+
4075             Table 2 |                               |
4076                .    .                               .
4077                .    .                               .
4078                .    .                               .
4079                .    .                               .
4080             Table n |                               |
4081                     +-------------------------------+
4082                      TABLE SET 2
4083                           .
4084                           .
4085                           .
4086                      TABLE SET N
4087
4088   The above diagram shows that a complete set of tables consists of a
4089header followed by multiple individual tables. Furthermore, multiple
4090complete sets may be present in the same file, each set with its own
4091header and tables. The sets are contiguous in the file. The only way to
4092know if another set follows is to check the next four bytes for the
4093magic number (or check for EOF). The header and tables sections are
4094padded to 64-bit boundaries. Below we describe each field in detail.
4095This format does not specify how the scanner will expand the given
4096data, i.e., data may be serialized as int8, but expanded to an int32
4097array at runtime. This is to reduce the size of the serialized data
4098where possible.  Remember, _all integer values are in network byte
4099order_.
4100
4101Fields of a table header:
4102
4103`th_magic'
4104     Magic number, always 0xF13C57B1.
4105
4106`th_hsize'
4107     Size of this entire header, in bytes, including all fields plus
4108     any padding.
4109
4110`th_ssize'
4111     Size of this entire set, in bytes, including the header, all
4112     tables, plus any padding.
4113
4114`th_flags'
4115     Bit flags for this table set. Currently unused.
4116
`th_version[]'
     Flex version as a NULL-terminated string, e.g., `2.5.13a'.  This
     is the version of flex that was used to create the serialized
     tables.
4121
4122`th_name[]'
4123     Contains the name of this table set. The default is `yytables',
4124     and is prefixed accordingly, e.g., `footables'. Must be
4125     NULL-terminated.
4126
4127`th_pad64[]'
4128     Zero or more NULL bytes, padding the entire header to the next
4129     64-bit boundary as calculated from the beginning of the header.
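
   To make the header layout concrete, here is a C sketch that decodes
a header from a byte buffer and walks concatenated table sets by name.
It is written from the field descriptions above and is not flex's own
loader; `tbl_header', `parse_tbl_header', and `find_table_set' are
hypothetical names, and the bounds checks are our own additions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TBL_MAGIC  0xF13C57B1u  /* value of th_magic */
#define TBL_FIXED  14           /* th_magic + th_hsize + th_ssize + th_flags */

struct tbl_header {
    uint32_t th_magic, th_hsize, th_ssize;
    uint16_t th_flags;
    const char *th_version;     /* both strings point into the buffer */
    const char *th_name;
};

/* All integers are stored in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode one header.  Returns 0 on success, -1 if it is malformed. */
static int parse_tbl_header(const unsigned char *buf, size_t len,
                            struct tbl_header *out)
{
    const unsigned char *version, *name, *nul;

    assert(buf != NULL && out != NULL);
    if (len < TBL_FIXED)
        return -1;
    out->th_magic = get_be32(buf);
    out->th_hsize = get_be32(buf + 4);
    out->th_ssize = get_be32(buf + 8);
    out->th_flags = get_be16(buf + 12);
    if (out->th_magic != TBL_MAGIC)
        return -1;
    /* The header is padded to a 64-bit boundary and must fit in buf. */
    if (out->th_hsize < TBL_FIXED || out->th_hsize > len ||
        out->th_hsize % 8 != 0)
        return -1;
    /* Both strings must be terminated inside the header. */
    version = buf + TBL_FIXED;
    nul = memchr(version, '\0', out->th_hsize - TBL_FIXED);
    if (nul == NULL)
        return -1;
    name = nul + 1;
    if (memchr(name, '\0', (size_t)(buf + out->th_hsize - name)) == NULL)
        return -1;
    out->th_version = (const char *)version;
    out->th_name = (const char *)name;
    return 0;
}

/* Return the offset of the set called `name`, or -1 if it is absent. */
static long find_table_set(const unsigned char *buf, size_t len,
                           const char *name)
{
    size_t off = 0;
    struct tbl_header h;

    while (off < len) {
        if (parse_tbl_header(buf + off, len - off, &h) != 0)
            return -1;          /* no magic number: not a table set */
        if (strcmp(h.th_name, name) == 0)
            return (long)off;
        if (h.th_ssize < h.th_hsize || h.th_ssize > len - off)
            return -1;
        off += h.th_ssize;      /* th_ssize covers header, tables, padding */
    }
    return -1;
}
```

   For the `all.tables' file built earlier, the sets would be named
`cpptables' and `ctables', assuming the default name with the scanner
prefix applied.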
4130
4131Fields of a table:
4132
4133`td_id'
4134     Specifies the table identifier. Possible values are:
4135    `YYTD_ID_ACCEPT (0x01)'
4136          `yy_accept'
4137
4138    `YYTD_ID_BASE   (0x02)'
4139          `yy_base'
4140
4141    `YYTD_ID_CHK    (0x03)'
4142          `yy_chk'
4143
4144    `YYTD_ID_DEF    (0x04)'
4145          `yy_def'
4146
4147    `YYTD_ID_EC     (0x05)'
          `yy_ec'
4149
4150    `YYTD_ID_META   (0x06)'
4151          `yy_meta'
4152
4153    `YYTD_ID_NUL_TRANS (0x07)'
4154          `yy_NUL_trans'
4155
4156    `YYTD_ID_NXT (0x08)'
4157          `yy_nxt'. This array may be two dimensional. See the
4158          `td_hilen' field below.
4159
4160    `YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
4161          `yy_rule_can_match_eol'
4162
4163    `YYTD_ID_START_STATE_LIST (0x0A)'
4164          `yy_start_state_list'. This array is handled specially
4165          because it is an array of pointers to structs. See the
4166          `td_flags' field below.
4167
4168    `YYTD_ID_TRANSITION (0x0B)'
4169          `yy_transition'. This array is handled specially because it
4170          is an array of structs. See the `td_lolen' field below.
4171
4172    `YYTD_ID_ACCLIST (0x0C)'
4173          `yy_acclist'
4174
4175`td_flags'
4176     Bit flags describing how to interpret the data in `td_data'.  The
4177     data arrays are one-dimensional by default, but may be two
4178     dimensional as specified in the `td_hilen' field.
4179
4180    `YYTD_DATA8 (0x01)'
4181          The data is serialized as an array of type int8.
4182
4183    `YYTD_DATA16 (0x02)'
4184          The data is serialized as an array of type int16.
4185
4186    `YYTD_DATA32 (0x04)'
4187          The data is serialized as an array of type int32.
4188
4189    `YYTD_PTRANS (0x08)'
4190          The data is a list of indexes of entries in the expanded
4191          `yy_transition' array.  Each index should be expanded to a
4192          pointer to the corresponding entry in the `yy_transition'
4193          array. We count on the fact that the `yy_transition' array
4194          has already been seen.
4195
4196    `YYTD_STRUCT (0x10)'
4197          The data is a list of yy_trans_info structs, each of which
4198          consists of two integers. There is no padding between struct
4199          elements or between structs.  The type of each member is
4200          determined by the `YYTD_DATA*' bits.
4201
4202`td_hilen'
4203     If `td_hilen' is non-zero, then the data is a two-dimensional
4204     array.  Otherwise, the data is a one-dimensional array. `td_hilen'
4205     contains the number of elements in the higher dimensional array,
4206     and `td_lolen' contains the number of elements in the lowest
4207     dimension.
4208
4209     Conceptually, `td_data' is either `sometype td_data[td_lolen]', or
4210     `sometype td_data[td_hilen][td_lolen]', where `sometype' is
4211     specified by the `td_flags' field.  It is possible for both
4212     `td_lolen' and `td_hilen' to be zero, in which case `td_data' is a
4213     zero length array, and no data is loaded, i.e., this table is
4214     simply skipped. Flex does not currently generate tables of zero
4215     length.
4216
4217`td_lolen'
4218     Specifies the number of elements in the lowest dimension array. If
4219     this is a one-dimensional array, then it is simply the number of
4220     elements in this array.  The element size is determined by the
4221     `td_flags' field.
4222
4223`td_data[]'
4224     The table data. This array may be a one- or two-dimensional array,
4225     of type `int8', `int16', `int32', `struct yy_trans_info', or
4226     `struct yy_trans_info*',  depending upon the values in the
4227     `td_flags', `td_hilen', and `td_lolen' fields.
4228
4229`td_pad64[]'
4230     Zero or more NULL bytes, padding the entire table to the next
4231     64-bit boundary as calculated from the beginning of this table.
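
   The size rules for a single table can be summarized in a few lines
of C.  This is a sketch derived from the field descriptions above, not
flex's own code; the helper names are hypothetical, and we assume that
for `YYTD_STRUCT' data `td_lolen' counts structs, each holding two
integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as listed under td_flags. */
enum {
    YYTD_DATA8  = 0x01,
    YYTD_DATA16 = 0x02,
    YYTD_DATA32 = 0x04,
    YYTD_PTRANS = 0x08,
    YYTD_STRUCT = 0x10
};

/* Bytes per serialized integer, or 0 if no YYTD_DATA* bit is set. */
static size_t td_int_size(uint16_t flags)
{
    if (flags & YYTD_DATA8)  return 1;
    if (flags & YYTD_DATA16) return 2;
    if (flags & YYTD_DATA32) return 4;
    return 0;
}

/* Size of td_data[] in bytes. */
static size_t td_data_bytes(uint16_t flags, uint32_t hilen, uint32_t lolen)
{
    size_t n = lolen;

    assert(td_int_size(flags) != 0);  /* one YYTD_DATA* bit is required */
    if (hilen != 0)
        n *= hilen;             /* two-dimensional: hilen rows of lolen */
    if (flags & YYTD_STRUCT)
        n *= 2;                 /* each struct is two integers */
    return n * td_int_size(flags);
}

/* Number of pad bytes needed to reach the next 64-bit boundary. */
static size_t pad64(size_t nbytes)
{
    return (8 - nbytes % 8) % 8;
}

/* Whole table: td_id, td_flags, td_hilen, td_lolen (12 bytes), then the
   data, then the padding. */
static size_t td_total_bytes(uint16_t flags, uint32_t hilen, uint32_t lolen)
{
    size_t n = 12 + td_data_bytes(flags, hilen, lolen);
    return n + pad64(n);
}
```

   For example, a one-dimensional table of ten `int16' values occupies
12 + 20 = 32 bytes and needs no padding, while three `int8' values
occupy 15 bytes and are padded to 16.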
4232
4233
4234File: flex.info,  Node: Diagnostics,  Next: Limitations,  Prev: Serialized Tables,  Up: Top
4235
423623 Diagnostics
4237**************
4238
4239The following is a list of `flex' diagnostic messages:
4240
4241   * `warning, rule cannot be matched' indicates that the given rule
4242     cannot be matched because it follows other rules that will always
4243     match the same text as it.  For example, in the following `foo'
4244     cannot be matched because it comes after an identifier "catch-all"
4245     rule:
4246
4247              [a-z]+    got_identifier();
4248              foo       got_foo();
4249
4250     Using `REJECT' in a scanner suppresses this warning.
4251
4252   * `warning, -s option given but default rule can be matched' means
4253     that it is possible (perhaps only in a particular start condition)
4254     that the default rule (match any single character) is the only one
4255     that will match a particular input.  Since `-s' was given,
4256     presumably this is not intended.
4257
4258   * `reject_used_but_not_detected undefined' or
4259     `yymore_used_but_not_detected undefined'. These errors can occur
4260     at compile time.  They indicate that the scanner uses `REJECT' or
4261     `yymore()' but that `flex' failed to notice the fact, meaning that
4262     `flex' scanned the first two sections looking for occurrences of
4263     these actions and failed to find any, but somehow you snuck some in
4264     (via a #include file, for example).  Use `%option reject' or
4265     `%option yymore' to indicate to `flex' that you really do use
4266     these features.
4267
   * `flex scanner jammed'. A scanner compiled with `-s' has
     encountered an input string which wasn't matched by any of its
     rules.  This error can also occur due to internal problems.

   * `token too large, exceeds YYLMAX'. Your scanner uses `%array' and
     one of its rules matched a string longer than the `YYLMAX'
     constant (8K bytes by default).  You can increase the value by
     #define'ing `YYLMAX' in the definitions section of your `flex'
     input.
4277
4278   * `scanner requires -8 flag to use the character 'x''. Your scanner
4279     specification includes recognizing the 8-bit character `'x'' and
4280     you did not specify the -8 flag, and your scanner defaulted to
4281     7-bit because you used the `-Cf' or `-CF' table compression
4282     options.  See the discussion of the `-7' flag, *note Scanner
4283     Options::, for details.
4284
   * `flex scanner push-back overflow'. You used `unput()' to push back
     so much text that the scanner's buffer could not hold both the
     pushed-back text and the current token in `yytext'.  Ideally the
     scanner should dynamically resize the buffer in this case, but at
     present it does not.

   * `input buffer overflow, can't enlarge buffer because scanner uses
     REJECT'.  The scanner was working on matching an extremely large
     token and needed to expand the input buffer.  This doesn't work
     with scanners that use `REJECT'.
4295
4296   * `fatal flex scanner internal error--end of buffer missed'. This can
4297     occur in a scanner which is reentered after a long-jump has jumped
4298     out (or over) the scanner's activation frame.  Before reentering
4299     the scanner, use:
4300              yyrestart( yyin );
4301     or, as noted above, switch to using the C++ scanner class.
4302
   * `too many start conditions in <> construct!'  You listed more start
     conditions in a <> construct than exist (so you must have listed at
     least one of them twice).
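
   The first diagnostic in this list follows from how a rule is chosen:
the longest match wins, and among matches of equal length the rule
listed first wins.  The C sketch below only simulates that selection
with two hand-written matchers; it is not how flex works internally
(flex compiles all rules into a single DFA), and every name in it is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A matcher returns the length of the match at the start of `s`,
   or 0 if the rule does not match there. */
typedef size_t (*matcher)(const char *s);

/* Stand-in for the pattern [a-z]+ */
static size_t match_ident(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Stand-in for the pattern foo */
static size_t match_foo(const char *s)
{
    return strncmp(s, "foo", 3) == 0 ? 3 : 0;
}

/* Return the index of the winning rule, or -1 if none matches. */
static int pick_rule(const matcher *rules, int nrules, const char *input)
{
    int best = -1;
    size_t best_len = 0;

    assert(rules != NULL && input != NULL);
    for (int i = 0; i < nrules; i++) {
        size_t len = rules[i](input);
        if (len > best_len) {   /* strict '>' keeps the earlier rule on ties */
            best = i;
            best_len = len;
        }
    }
    return best;
}
```

   With the identifier rule listed first, the input `foo' always
selects the identifier rule, so the `foo' rule can never fire.  With
`foo' listed first, `foo' selects it, while `food' still selects the
identifier rule because that match is longer.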
4306
4307
4308File: flex.info,  Node: Limitations,  Next: Bibliography,  Prev: Diagnostics,  Up: Top
4309
431024 Limitations
4311**************
4312
Some trailing context patterns cannot be properly matched and generate
warning messages (`dangerous trailing context').  These are patterns
where the ending of the first part of the rule matches the beginning of
the second part, such as `zx*/xy*', where the 'x*' matches the 'x' at
the beginning of the trailing context.  (Note that the POSIX draft
states that the text matched by such patterns is undefined.)

   For some trailing context rules, parts which are actually
fixed-length are not recognized as such, so the rule may be handled as
the more expensive variable trailing context.  In particular, parts
using `|' or `{n}' (such as `foo{3}') are always considered
variable-length.

   Combining trailing context with the special `|' action can result in
_fixed_ trailing context being turned into the more expensive
_variable_ trailing context.  This happens, for example, in the
following:
4326
4327         %%
4328         abc      |
4329         xyz/def
4330
   * Use of `unput()' invalidates yytext and yyleng, unless the
     `%array' directive or the `-l' option has been used.

   * Pattern-matching of `NUL's is substantially slower than matching
     other characters.

   * Dynamic resizing of the input buffer is slow, as it entails
     rescanning all the text matched so far by the current (generally
     huge) token.

   * Due to both buffering of input and read-ahead, you cannot intermix
     calls to `<stdio.h>' routines, such as getchar(), with `flex'
     rules and expect it to work.  Call `input()' instead.

   * The total number of table entries listed by the `-v' flag excludes
     the entries needed to determine which rule has been matched.  The
     number of such entries is equal to the number of DFA states if the
     scanner does not use `REJECT', and somewhat greater than the
     number of states if it does.

   * `REJECT' cannot be used with the `-f' or `-F' options.
4344
4345   The `flex' internal algorithms need documentation.
4346
4347
4348File: flex.info,  Node: Bibliography,  Next: FAQ,  Prev: Limitations,  Up: Top
4349
435025 Additional Reading
4351*********************
4352
4353You may wish to read more about the following programs:
4354   * lex
4355
4356   * yacc
4357
4358   * sed
4359
4360   * awk
4361
4362   The following books may contain material of interest:
4363
4364   John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
4365Associates.  Be sure to get the 2nd edition.
4366
4367   M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_
4368
4369   Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
4370Techniques and Tools_, Addison-Wesley (1986).  Describes the
4371pattern-matching techniques used by `flex' (deterministic finite
4372automata).
4373
4374
4375File: flex.info,  Node: FAQ,  Next: Appendices,  Prev: Bibliography,  Up: Top
4376
4377FAQ
4378***
4379
4380From time to time, the `flex' maintainer receives certain questions.
4381Rather than repeat answers to well-understood problems, we publish them
4382here.
4383
4384* Menu:
4385
4386* When was flex born?::
4387* How do I expand backslash-escape sequences in C-style quoted strings?::
4388* Why do flex scanners call fileno if it is not ANSI compatible?::
4389* Does flex support recursive pattern definitions?::
4390* How do I skip huge chunks of input (tens of megabytes) while using flex?::
4391* Flex is not matching my patterns in the same order that I defined them.::
4392* My actions are executing out of order or sometimes not at all.::
4393* How can I have multiple input sources feed into the same scanner at the same time?::
4394* Can I build nested parsers that work with the same input file?::
4395* How can I match text only at the end of a file?::
4396* How can I make REJECT cascade across start condition boundaries?::
4397* Why cant I use fast or full tables with interactive mode?::
4398* How much faster is -F or -f than -C?::
4399* If I have a simple grammar cant I just parse it with flex?::
4400* Why doesn't yyrestart() set the start state back to INITIAL?::
4401* How can I match C-style comments?::
4402* The period isn't working the way I expected.::
4403* Can I get the flex manual in another format?::
4404* Does there exist a "faster" NDFA->DFA algorithm?::
4405* How does flex compile the DFA so quickly?::
4406* How can I use more than 8192 rules?::
4407* How do I abandon a file in the middle of a scan and switch to a new file?::
4408* How do I execute code only during initialization (only before the first scan)?::
4409* How do I execute code at termination?::
4410* Where else can I find help?::
4411* Can I include comments in the "rules" section of the file?::
4412* I get an error about undefined yywrap().::
4413* How can I change the matching pattern at run time?::
4414* How can I expand macros in the input?::
4415* How can I build a two-pass scanner?::
4416* How do I match any string not matched in the preceding rules?::
4417* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
4418* Is there a way to make flex treat NULL like a regular character?::
4419* Whenever flex can not match the input it says "flex scanner jammed".::
4420* Why doesn't flex have non-greedy operators like perl does?::
4421* Memory leak - 16386 bytes allocated by malloc.::
4422* How do I track the byte offset for lseek()?::
4423* How do I use my own I/O classes in a C++ scanner?::
4424* How do I skip as many chars as possible?::
4425* deleteme00::
4426* Are certain equivalent patterns faster than others?::
4427* Is backing up a big deal?::
4428* Can I fake multi-byte character support?::
4429* deleteme01::
4430* Can you discuss some flex internals?::
4431* unput() messes up yy_at_bol::
4432* The | operator is not doing what I want::
4433* Why can't flex understand this variable trailing context pattern?::
4434* The ^ operator isn't working::
4435* Trailing context is getting confused with trailing optional patterns::
4436* Is flex GNU or not?::
4437* ERASEME53::
4438* I need to scan if-then-else blocks and while loops::
4439* ERASEME55::
4440* ERASEME56::
4441* ERASEME57::
4442* Is there a repository for flex scanners?::
4443* How can I conditionally compile or preprocess my flex input file?::
4444* Where can I find grammars for lex and yacc?::
4445* I get an end-of-buffer message for each character scanned.::
4446* unnamed-faq-62::
4447* unnamed-faq-63::
4448* unnamed-faq-64::
4449* unnamed-faq-65::
4450* unnamed-faq-66::
4451* unnamed-faq-67::
4452* unnamed-faq-68::
4453* unnamed-faq-69::
4454* unnamed-faq-70::
4455* unnamed-faq-71::
4456* unnamed-faq-72::
4457* unnamed-faq-73::
4458* unnamed-faq-74::
4459* unnamed-faq-75::
4460* unnamed-faq-76::
4461* unnamed-faq-77::
4462* unnamed-faq-78::
4463* unnamed-faq-79::
4464* unnamed-faq-80::
4465* unnamed-faq-81::
4466* unnamed-faq-82::
4467* unnamed-faq-83::
4468* unnamed-faq-84::
4469* unnamed-faq-85::
4470* unnamed-faq-86::
4471* unnamed-faq-87::
4472* unnamed-faq-88::
4473* unnamed-faq-90::
4474* unnamed-faq-91::
4475* unnamed-faq-92::
4476* unnamed-faq-93::
4477* unnamed-faq-94::
4478* unnamed-faq-95::
4479* unnamed-faq-96::
4480* unnamed-faq-97::
4481* unnamed-faq-98::
4482* unnamed-faq-99::
4483* unnamed-faq-100::
4484* unnamed-faq-101::
4485* What is the difference between YYLEX_PARAM and YY_DECL?::
4486* Why do I get "conflicting types for yylex" error?::
4487* How do I access the values set in a Flex action from within a Bison action?::
4488
4489
4490File: flex.info,  Node: When was flex born?,  Next: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ
4491
4492When was flex born?
4493===================
4494
4495Vern Paxson took over the `Software Tools' lex project from Jef
4496Poskanzer in 1982.  At that point it was written in Ratfor.  Around
44971987 or so, Paxson translated it into C, and a legend was born :-).
4498
4499
4500File: flex.info,  Node: How do I expand backslash-escape sequences in C-style quoted strings?,  Next: Why do flex scanners call fileno if it is not ANSI compatible?,  Prev: When was flex born?,  Up: FAQ
4501
4502How do I expand backslash-escape sequences in C-style quoted strings?
4503=====================================================================
4504
4505A key point when scanning quoted strings is that you cannot (easily)
4506write a single rule that will precisely match the string if you allow
4507things like embedded escape sequences and newlines.  If you try to
4508match strings with a single rule then you'll wind up having to rescan
4509the string anyway to find any escape sequences.
4510
4511   Instead you can use exclusive start conditions and a set of rules,
4512one for matching non-escaped text, one for matching a single escape,
4513one for matching an embedded newline, and one for recognizing the end
4514of the string.  Each of these rules is then faced with the question of
4515where to put its intermediary results.  The best solution is for the
4516rules to append their local value of `yytext' to the end of a "string
4517literal" buffer.  A rule like the escape-matcher will append to the
4518buffer the meaning of the escape sequence rather than the literal text
4519in `yytext'.  In this way, `yytext' does not need to be modified at all.
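
   The "string literal" buffer and the escape translation might be
written along the following lines.  This is a minimal C sketch under
our own assumptions: `strbuf', `strbuf_append', and `escape_value' are
hypothetical helpers, and only a few escapes are handled.  The rule for
non-escaped text would append `yytext' as it stands, while the escape
rule would append the single character returned by `escape_value'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable buffer that collects the value of the string literal. */
struct strbuf {
    char *data;
    size_t len, cap;
};

/* Append n bytes and keep the buffer NUL-terminated.
   Returns 0 on success, -1 if out of memory. */
static int strbuf_append(struct strbuf *b, const char *text, size_t n)
{
    assert(b != NULL && text != NULL);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        char *p;

        while (cap < b->len + n + 1)
            cap *= 2;
        p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, text, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Meaning of the character that follows a backslash. */
static char escape_value(char c)
{
    switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    default:  return c;         /* \" and \\ stand for themselves */
    }
}
```

   The end-of-string rule would then hand `data' to the parser and
start a fresh buffer for the next literal.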
4520
4521
4522File: flex.info,  Node: Why do flex scanners call fileno if it is not ANSI compatible?,  Next: Does flex support recursive pattern definitions?,  Prev: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ
4523
4524Why do flex scanners call fileno if it is not ANSI compatible?
4525==============================================================
4526
4527Flex scanners call `fileno()' in order to get the file descriptor
4528corresponding to `yyin'. The file descriptor may be passed to
4529`isatty()' or `read()', depending upon which `%options' you specified.
4530If your system does not have `fileno()' support, to get rid of the
4531`read()' call, do not specify `%option read'. To get rid of the
4532`isatty()' call, you must specify one of `%option always-interactive' or
4533`%option never-interactive'.
4534
4535
4536File: flex.info,  Node: Does flex support recursive pattern definitions?,  Next: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Prev: Why do flex scanners call fileno if it is not ANSI compatible?,  Up: FAQ
4537
4538Does flex support recursive pattern definitions?
4539================================================
4540
4541e.g.,
4542
4543     %%
4544     block   "{"({block}|{statement})*"}"
4545
4546   No. You cannot have recursive definitions.  The pattern-matching
4547power of regular expressions in general (and therefore flex scanners,
4548too) is limited.  In particular, regular expressions cannot "balance"
4549parentheses to an arbitrary degree.  For example, it's impossible to
4550write a regular expression that matches all strings containing the same
4551number of '{'s as '}'s.  For more powerful pattern matching, you need a
4552parser, such as `GNU bison'.
4553
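   Balancing braces needs a counter that can grow without bound, which
is exactly what a regular expression lacks.  The sketch below shows
the idea in plain C; `braces_balanced' is a hypothetical helper, not
part of flex.  A scanner could keep a similar counter in its actions,
but a parser is usually the better tool.

```c
/* Returns 1 if every '{' in s is closed by a later '}', else 0. */
static int braces_balanced(const char *s)
{
    long depth = 0;                  /* number of currently open braces */

    for (; *s != '\0'; s++) {
        if (*s == '{') {
            depth++;
        } else if (*s == '}') {
            if (depth == 0)
                return 0;            /* closing brace with nothing open */
            depth--;
        }
    }
    return depth == 0;
}
```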
4554
4555File: flex.info,  Node: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Next: Flex is not matching my patterns in the same order that I defined them.,  Prev: Does flex support recursive pattern definitions?,  Up: FAQ
4556
4557How do I skip huge chunks of input (tens of megabytes) while using flex?
4558========================================================================
4559
Use `fseek()' (or `lseek()') to position `yyin', then call
`yyrestart(yyin)' so that the scanner discards any input it has already
buffered.
4561
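   A minimal sketch of the seeking half of this answer follows.  The
helper name `skip_bytes' is hypothetical.  The call to `yyrestart()'
appears only in a comment, because it exists only inside a generated
scanner.

```c
#include <stdio.h>

/* Move `in' forward by n bytes from its current position.
 * Returns 0 on success and nonzero on failure, as fseek() does.
 * In a real scanner, follow a successful call with yyrestart(in)
 * so that flex drops whatever it had already buffered. */
static int skip_bytes(FILE *in, long n)
{
    return fseek(in, n, SEEK_CUR);
}
```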
4562
4563File: flex.info,  Node: Flex is not matching my patterns in the same order that I defined them.,  Next: My actions are executing out of order or sometimes not at all.,  Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Up: FAQ
4564
4565Flex is not matching my patterns in the same order that I defined them.
4566=======================================================================
4567
`flex' picks the rule that matches the most text (i.e., the longest
possible input string), not the first rule that happens to match.  This
is because `flex' does not try your rules one at a time in the order
you wrote them.  It uses a different matching technique
("deterministic finite automata") that does all of the matching
simultaneously, in parallel.  (This seems impossible, but it's actually
a fairly simple technique once you understand the principles.)

   A side-effect of this parallel matching is that when the input
matches more than one rule, `flex' scanners pick the rule that matched
the _most_ text.  Only when two rules match the same amount of text
does the order matter: the rule listed first wins.  This is explained
further in the manual, in the section *Note Matching::.
4579
4580   If you want `flex' to choose a shorter match, then you can work
4581around this behavior by expanding your short rule to match more text,
4582then put back the extra:
4583
4584     data_.*        yyless( 5 ); BEGIN BLOCKIDSTATE;
4585
4586   Another fix would be to make the second rule active only during the
4587`<BLOCKIDSTATE>' start condition, and make that start condition
4588exclusive by declaring it with `%x' instead of `%s'.
4589
4590   A final fix is to change the input language so that the ambiguity for
4591`data_' is removed, by adding characters to it that don't match the
4592identifier rule, or by removing characters (such as `_') from the
4593identifier rule so it no longer matches `data_'.  (Of course, you might
4594also not have the option of changing the input language.)
4595
4596
4597File: flex.info,  Node: My actions are executing out of order or sometimes not at all.,  Next: How can I have multiple input sources feed into the same scanner at the same time?,  Prev: Flex is not matching my patterns in the same order that I defined them.,  Up: FAQ
4598
4599My actions are executing out of order or sometimes not at all.
4600==============================================================
4601
4602Most likely, you have (in error) placed the opening `{' of the action
4603block on a different line than the rule, e.g.,
4604
4605     ^(foo|bar)
4606     {  <<<--- WRONG!
4607
4608     }
4609
4610   `flex' requires that the opening `{' of an action associated with a
4611rule begin on the same line as does the rule.  You need instead to
4612write your rules as follows:
4613
4614     ^(foo|bar)   {  // CORRECT!
4615
4616     }
4617
4618
4619File: flex.info,  Node: How can I have multiple input sources feed into the same scanner at the same time?,  Next: Can I build nested parsers that work with the same input file?,  Prev: My actions are executing out of order or sometimes not at all.,  Up: FAQ
4620
4621How can I have multiple input sources feed into the same scanner at the same time?
4622==================================================================================
4623
4624If ...
4625   * your scanner is free of backtracking (verified using `flex''s `-b'
4626     flag),
4627
4628   * AND you run your scanner interactively (`-I' option; default
4629     unless using special table compression options),
4630
4631   * AND you feed it one character at a time by redefining `YY_INPUT'
4632     to do so,
4633
   then every time it matches a token, it will have exhausted its input
buffer (because the scanner is free of backtracking).  This means you
can safely use `select()' at that point and only call `yylex()' for
another token if `select()' indicates there's data available.
4638
4639   That is, move the `select()' out from the input function to a point
4640where it determines whether `yylex()' gets called for the next token.
4641
4642   With this approach, you will still have problems if your input can
4643arrive piecemeal; `select()' could inform you that the beginning of a
4644token is available, you call `yylex()' to get it, but it winds up
4645blocking waiting for the later characters in the token.
4646
4647   Here's another way:  Move your input multiplexing inside of
4648`YY_INPUT'.  That is, whenever `YY_INPUT' is called, it `select()''s to
4649see where input is available.  If input is available for the scanner,
4650it reads and returns the next byte.  If input is available from another
4651source, it calls whatever function is responsible for reading from that
4652source.  (If no input is available, it blocks until some input is
4653available.)  I've used this technique in an interpreter I wrote that
4654both reads keyboard input using a `flex' scanner and IPC traffic from
4655sockets, and it works fine.
4656
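   The second approach might look roughly like the POSIX sketch below.
The names `mux_read_byte', `drain_other' and `other_bytes' are
hypothetical, and the sketch assumes the handler always consumes the
data that made its descriptor readable.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

static int other_bytes = 0;          /* bytes seen on the other source */

/* Example handler: consume whatever is pending on the other source. */
static void drain_other(int fd)
{
    char tmp[64];
    ssize_t n = read(fd, tmp, sizeof tmp);

    if (n > 0)
        other_bytes += (int)n;
}

/* Block until scanner_fd or other_fd is readable.  Data on other_fd is
 * handed to on_other.  Returns 1 after storing one scanner byte in
 * *buf, 0 at end of file on scanner_fd, and -1 on error. */
static int mux_read_byte(int scanner_fd, int other_fd,
                         void (*on_other)(int fd), char *buf)
{
    int maxfd = scanner_fd > other_fd ? scanner_fd : other_fd;

    for (;;) {
        fd_set rd;

        FD_ZERO(&rd);
        FD_SET(scanner_fd, &rd);
        FD_SET(other_fd, &rd);
        if (select(maxfd + 1, &rd, NULL, NULL, NULL) < 0)
            return -1;

        if (FD_ISSET(other_fd, &rd))
            on_other(other_fd);      /* service the other source first */

        if (FD_ISSET(scanner_fd, &rd)) {
            ssize_t n = read(scanner_fd, buf, 1);
            return n < 0 ? -1 : (int)n;
        }
    }
}
```

   A `YY_INPUT' redefinition could call such a function and store the
returned count in its `result' argument, treating -1 as a fatal error.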
4657
4658File: flex.info,  Node: Can I build nested parsers that work with the same input file?,  Next: How can I match text only at the end of a file?,  Prev: How can I have multiple input sources feed into the same scanner at the same time?,  Up: FAQ
4659
4660Can I build nested parsers that work with the same input file?
4661==============================================================
4662
4663This is not going to work without some additional effort.  The reason is
4664that `flex' block-buffers the input it reads from `yyin'.  This means
4665that the "outermost" `yylex()', when called, will automatically slurp
4666up the first 8K of input available on yyin, and subsequent calls to
4667other `yylex()''s won't see that input.  You might be tempted to work
4668around this problem by redefining `YY_INPUT' to only return a small
4669amount of text, but it turns out that that approach is quite difficult.
4670Instead, the best solution is to combine all of your scanners into one
4671large scanner, using a different exclusive start condition for each.
4672
4673
4674File: flex.info,  Node: How can I match text only at the end of a file?,  Next: How can I make REJECT cascade across start condition boundaries?,  Prev: Can I build nested parsers that work with the same input file?,  Up: FAQ
4675
4676How can I match text only at the end of a file?
4677===============================================
4678
4679There is no way to write a rule which is "match this text, but only if
4680it comes at the end of the file".  You can fake it, though, if you
4681happen to have a character lying around that you don't allow in your
4682input.  Then you redefine `YY_INPUT' to call your own routine which, if
4683it sees an `EOF', returns the magic character first (and remembers to
4684return a real `EOF' next time it's called).  Then you could write:
4685
4686     <COMMENT>(.|\n)*{EOF_CHAR}    /* saw comment at EOF */
4687
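   The routine that `YY_INPUT' calls could be sketched as follows.  The
function name `read_with_eof_marker' and the choice of `\001' as the
magic character are assumptions for illustration.  The sketch also
treats a read error the same as end of file.

```c
#include <stdio.h>

#define EOF_CHAR '\001'              /* assumed never to occur in real input */

static int eof_marker_sent = 0;      /* set once the marker has been returned */

/* Fill buf with up to max bytes from in.  At end of file, return the
 * marker once, then 0 on every later call (the real end of input). */
static size_t read_with_eof_marker(FILE *in, char *buf, size_t max)
{
    size_t n;

    if (max == 0)
        return 0;
    n = fread(buf, 1, max, in);
    if (n > 0)
        return n;
    if (!eof_marker_sent) {
        eof_marker_sent = 1;
        buf[0] = EOF_CHAR;
        return 1;
    }
    return 0;
}
```

   `YY_INPUT' would then be redefined to assign the return value to its
`result' argument, and `EOF_CHAR' would be given a matching definition
in the definitions section of the scanner.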
4688
4689File: flex.info,  Node: How can I make REJECT cascade across start condition boundaries?,  Next: Why cant I use fast or full tables with interactive mode?,  Prev: How can I match text only at the end of a file?,  Up: FAQ
4690
4691How can I make REJECT cascade across start condition boundaries?
4692================================================================
4693
4694You can do this as follows.  Suppose you have a start condition `A', and
4695after exhausting all of the possible matches in `<A>', you want to try
4696matches in `<INITIAL>'.  Then you could use the following:
4697
     %x A
     %%
     <A>rule_that_is_long    ...; REJECT;
     <A>rule                 ...; REJECT; /* shorter rule */
     <A>etc.
     ...
     <A>.|\n  {
         /* Shortest and last rule in <A>, so
          * cascaded REJECTs will eventually
          * wind up matching this rule.  We want
          * to now switch to the initial state
          * and try matching from there instead.
          */
         yyless(0);    /* put back matched text */
         BEGIN(INITIAL);
     }
4714
4715
4716File: flex.info,  Node: Why cant I use fast or full tables with interactive mode?,  Next: How much faster is -F or -f than -C?,  Prev: How can I make REJECT cascade across start condition boundaries?,  Up: FAQ
4717
4718Why can't I use fast or full tables with interactive mode?
4719==========================================================
4720
4721One of the assumptions flex makes is that interactive applications are
4722inherently slow (they're waiting on a human after all).  It has to do
4723with how the scanner detects that it must be finished scanning a token.
4724For interactive scanners, after scanning each character the current
4725state is looked up in a table (essentially) to see whether there's a
4726chance of another input character possibly extending the length of the
4727match.  If not, the scanner halts.  For non-interactive scanners, the
4728end-of-token test is much simpler, basically a compare with 0, so no
4729memory bus cycles.  Since the test occurs in the innermost scanning
4730loop, one would like to make it go as fast as possible.
4731
4732   Still, it seems reasonable to allow the user to choose to trade off
4733a bit of performance in this area to gain the corresponding
4734flexibility.  There might be another reason, though, why fast scanners
4735don't support the interactive option.
4736
4737
4738File: flex.info,  Node: How much faster is -F or -f than -C?,  Next: If I have a simple grammar cant I just parse it with flex?,  Prev: Why cant I use fast or full tables with interactive mode?,  Up: FAQ
4739
4740How much faster is -F or -f than -C?
4741====================================
4742
4743Much faster (factor of 2-3).
4744
4745
4746File: flex.info,  Node: If I have a simple grammar cant I just parse it with flex?,  Next: Why doesn't yyrestart() set the start state back to INITIAL?,  Prev: How much faster is -F or -f than -C?,  Up: FAQ
4747
4748If I have a simple grammar can't I just parse it with flex?
4749===========================================================
4750
4751Is your grammar recursive? That's almost always a sign that you're
4752better off using a parser/scanner rather than just trying to use a
4753scanner alone.
4754
4755
4756File: flex.info,  Node: Why doesn't yyrestart() set the start state back to INITIAL?,  Next: How can I match C-style comments?,  Prev: If I have a simple grammar cant I just parse it with flex?,  Up: FAQ
4757
4758Why doesn't yyrestart() set the start state back to INITIAL?
4759============================================================
4760
4761There are two reasons.  The first is that there might be programs that
4762rely on the start state not changing across file changes.  The second
4763is that beginning with `flex' version 2.4, use of `yyrestart()' is no
4764longer required, so fixing the problem there doesn't solve the more
4765general problem.
4766
4767
4768File: flex.info,  Node: How can I match C-style comments?,  Next: The period isn't working the way I expected.,  Prev: Why doesn't yyrestart() set the start state back to INITIAL?,  Up: FAQ
4769
4770How can I match C-style comments?
4771=================================
4772
4773You might be tempted to try something like this:
4774
4775     "/*".*"*/"       // WRONG!
4776
4777   or, worse, this:
4778
     "/*"(.|\n)*"*/"   // WRONG!
4780
4781   The above rules will eat too much input, and blow up on things like:
4782
4783     /* a comment */ do_my_thing( "oops */" );
4784
4785   Here is one way which allows you to track line information:
4786
     %x IN_COMMENT
     %%
     <INITIAL>{
     "/*"              BEGIN(IN_COMMENT);
     }
     <IN_COMMENT>{
     "*/"      BEGIN(INITIAL);
     [^*\n]+   // eat comment in chunks
     "*"       // eat the lone star
     \n        yylineno++;
     }
4796
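   To make the state changes explicit, here is the same idea written as
a small hand-coded state machine in C.  This is only an illustration
of what the two start conditions do, not code that flex generates, and
`strip_comments' is a hypothetical name.

```c
/* Skip C-style comments in src, copying everything else to dst
 * (which must hold at least strlen(src) + 1 bytes).  Returns the
 * number of newlines seen inside comments. */
static int strip_comments(const char *src, char *dst)
{
    enum { CODE, IN_COMMENT } state = CODE;
    int lines = 0;

    while (*src != '\0') {
        if (state == CODE) {
            if (src[0] == '/' && src[1] == '*') {
                state = IN_COMMENT;  /* like BEGIN(IN_COMMENT) */
                src += 2;
            } else {
                *dst++ = *src++;
            }
        } else {
            if (src[0] == '*' && src[1] == '/') {
                state = CODE;        /* like BEGIN(INITIAL) */
                src += 2;
            } else {
                if (*src == '\n')
                    lines++;         /* track line information */
                src++;
            }
        }
    }
    *dst = '\0';
    return lines;
}
```

   Because the comment ends at the first `*/', the text after it,
including the string literal in the example above, is left untouched.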
4797
4798File: flex.info,  Node: The period isn't working the way I expected.,  Next: Can I get the flex manual in another format?,  Prev: How can I match C-style comments?,  Up: FAQ
4799
4800The '.' isn't working the way I expected.
4801=========================================
4802
4803Here are some tips for using `.':
4804
4805   * A common mistake is to place the grouping parenthesis AFTER an
4806     operator, when you really meant to place the parenthesis BEFORE
4807     the operator, e.g., you probably want this `(foo|bar)+' and NOT
4808     this `(foo|bar+)'.
4809
     The first pattern matches the words `foo' or `bar' one or more
     times, e.g., it matches the text `barfoofoobarfoo'.  The second
     pattern matches a single instance of `foo', or a single instance
     of `ba' followed by one or more `r's, e.g., it matches the text
     `barrrr'.

   * A `.' inside `[]''s just means a literal `.' (period), and NOT "any
     character except newline".

   * Remember that `.' matches any character EXCEPT `\n' (and `EOF').
     If you really want to match ANY character, including newlines,
     then use `(.|\n)'.  Beware that the regex `(.|\n)+' will match
     your entire input!

   * Finally, if you want to match a literal `.' (a period), then use
     `[.]' or `"."'.
4826
4827
4828File: flex.info,  Node: Can I get the flex manual in another format?,  Next: Does there exist a "faster" NDFA->DFA algorithm?,  Prev: The period isn't working the way I expected.,  Up: FAQ
4829
4830Can I get the flex manual in another format?
4831============================================
4832
The `flex' source distribution includes a texinfo manual.  You are free
to convert that texinfo into whatever format you desire.  The `texinfo'
package includes tools for conversion to a number of formats.
4836
4837
4838File: flex.info,  Node: Does there exist a "faster" NDFA->DFA algorithm?,  Next: How does flex compile the DFA so quickly?,  Prev: Can I get the flex manual in another format?,  Up: FAQ
4839
4840Does there exist a "faster" NDFA->DFA algorithm?
4841================================================
4842
4843There's no way around the potential exponential running time - it can
4844take you exponential time just to enumerate all of the DFA states.  In
4845practice, though, the running time is closer to linear, or sometimes
4846quadratic.
4847
4848
4849File: flex.info,  Node: How does flex compile the DFA so quickly?,  Next: How can I use more than 8192 rules?,  Prev: Does there exist a "faster" NDFA->DFA algorithm?,  Up: FAQ
4850
4851How does flex compile the DFA so quickly?
4852=========================================
4853
4854There are two big speed wins that `flex' uses:
4855
4856  1. It analyzes the input rules to construct equivalence classes for
4857     those characters that always make the same transitions.  It then
4858     rewrites the NFA using equivalence classes for transitions instead
4859     of characters.  This cuts down the NFA->DFA computation time
4860     dramatically, to the point where, for uncompressed DFA tables, the
4861     DFA generation is often I/O bound in writing out the tables.
4862
4863  2. It maintains hash values for previously computed DFA states, so
4864     testing whether a newly constructed DFA state is equivalent to a
4865     previously constructed state can be done very quickly, by first
4866     comparing hash values.
4867
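   The second point can be illustrated with the sketch below.  It is
not flex's actual code: the structure `state_set', the functions
`set_hash' and `same_state', and the hash function itself are all
assumptions chosen for brevity.  The point is only that a cheap hash
comparison rejects most candidates before the full comparison runs.

```c
#include <stddef.h>
#include <string.h>

/* A DFA state viewed as a sorted set of NFA state numbers. */
struct state_set {
    const int *nfa;                  /* sorted NFA state numbers */
    size_t     count;
    unsigned   hash;                 /* cached when the set is built */
};

/* Simple multiplicative hash over the member state numbers. */
static unsigned set_hash(const int *nfa, size_t count)
{
    unsigned h = 0;
    size_t i;

    for (i = 0; i < count; i++)
        h = h * 31u + (unsigned)nfa[i];
    return h;
}

/* Returns 1 if a and b contain the same NFA states. */
static int same_state(const struct state_set *a, const struct state_set *b)
{
    if (a->hash != b->hash || a->count != b->count)
        return 0;                    /* cheap rejection */
    return memcmp(a->nfa, b->nfa, a->count * sizeof *a->nfa) == 0;
}
```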
4868
4869File: flex.info,  Node: How can I use more than 8192 rules?,  Next: How do I abandon a file in the middle of a scan and switch to a new file?,  Prev: How does flex compile the DFA so quickly?,  Up: FAQ
4870
4871How can I use more than 8192 rules?
4872===================================
4873
4874`Flex' is compiled with an upper limit of 8192 rules per scanner.  If
4875you need more than 8192 rules in your scanner, you'll have to recompile
4876`flex' with the following changes in `flexdef.h':
4877
4878     <    #define YY_TRAILING_MASK 0x2000
4879     <    #define YY_TRAILING_HEAD_MASK 0x4000
4880     --
4881     >    #define YY_TRAILING_MASK 0x20000000
4882     >    #define YY_TRAILING_HEAD_MASK 0x40000000
4883
   This should work okay as long as your C compiler uses 32-bit
integers.  But you might want to think about whether using such a huge
number of rules is the best way to solve your problem.
4887
4888   The following may also be relevant:
4889
   With luck, you should be able to increase the definitions in
`flexdef.h' for:
4892
4893     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
4894     #define MAXIMUM_MNS 31999
4895     #define BAD_SUBSCRIPT -32767
4896
4897   recompile everything, and it'll all work.  Flex only has these
489816-bit-like values built into it because a long time ago it was
4899developed on a machine with 16-bit ints.  I've given this advice to
4900others in the past but haven't heard back from them whether it worked
4901okay or not...
4902
4903
4904File: flex.info,  Node: How do I abandon a file in the middle of a scan and switch to a new file?,  Next: How do I execute code only during initialization (only before the first scan)?,  Prev: How can I use more than 8192 rules?,  Up: FAQ
4905
4906How do I abandon a file in the middle of a scan and switch to a new file?
4907=========================================================================
4908
Just call `yyrestart(newfile)'.  Be sure to reset the start state if you
want a "fresh" start, since `yyrestart()' does NOT reset the start state
back to `INITIAL'.
4912
4913
4914File: flex.info,  Node: How do I execute code only during initialization (only before the first scan)?,  Next: How do I execute code at termination?,  Prev: How do I abandon a file in the middle of a scan and switch to a new file?,  Up: FAQ
4915
4916How do I execute code only during initialization (only before the first scan)?
4917==============================================================================
4918
4919You can specify an initial action by defining the macro `YY_USER_INIT'
4920(though note that `yyout' may not be available at the time this macro
4921is executed).  Or you can add to the beginning of your rules section:
4922
     %%
         /* Must be indented! */
         static int did_init = 0;

         if ( ! did_init ){
             do_my_init();
             did_init = 1;
         }
4931
4932
4933File: flex.info,  Node: How do I execute code at termination?,  Next: Where else can I find help?,  Prev: How do I execute code only during initialization (only before the first scan)?,  Up: FAQ
4934
4935How do I execute code at termination?
4936=====================================
4937
4938You can specify an action for the `<<EOF>>' rule.
4939
4940
4941File: flex.info,  Node: Where else can I find help?,  Next: Can I include comments in the "rules" section of the file?,  Prev: How do I execute code at termination?,  Up: FAQ
4942
4943Where else can I find help?
4944===========================
4945
4946You can find the flex homepage on the web at
4947`http://flex.sourceforge.net/'. See that page for details about flex
4948mailing lists as well.
4949
4950
4951File: flex.info,  Node: Can I include comments in the "rules" section of the file?,  Next: I get an error about undefined yywrap().,  Prev: Where else can I find help?,  Up: FAQ
4952
4953Can I include comments in the "rules" section of the file?
4954==========================================================
4955
4956Yes, just about anywhere you want to. See the manual for the specific
4957syntax.
4958
4959
4960File: flex.info,  Node: I get an error about undefined yywrap().,  Next: How can I change the matching pattern at run time?,  Prev: Can I include comments in the "rules" section of the file?,  Up: FAQ
4961
4962I get an error about undefined yywrap().
4963========================================
4964
4965You must supply a `yywrap()' function of your own, or link to `libfl.a'
4966(which provides one), or use
4967
4968     %option noyywrap
4969
4970   in your source to say you don't want a `yywrap()' function.
4971
4972
4973File: flex.info,  Node: How can I change the matching pattern at run time?,  Next: How can I expand macros in the input?,  Prev: I get an error about undefined yywrap().,  Up: FAQ
4974
4975How can I change the matching pattern at run time?
4976==================================================
4977
You can't.  The patterns are compiled into static tables when flex
builds the scanner.
4980
4981
4982File: flex.info,  Node: How can I expand macros in the input?,  Next: How can I build a two-pass scanner?,  Prev: How can I change the matching pattern at run time?,  Up: FAQ
4983
4984How can I expand macros in the input?
4985=====================================
4986
4987The best way to approach this problem is at a higher level, e.g., in
4988the parser.
4989
4990   However, you can do this using multiple input buffers.
4991
     %%
     macro/[a-z]+    {
         /* Saw the macro "macro" followed by extra stuff. */
         main_buffer = YY_CURRENT_BUFFER;
         expansion_buffer = yy_scan_string(expand(yytext));
         yy_switch_to_buffer(expansion_buffer);
     }

     <<EOF>>         {
         if ( expansion_buffer )
         {
             // We were doing an expansion, return to where
             // we were.
             yy_switch_to_buffer(main_buffer);
             yy_delete_buffer(expansion_buffer);
             expansion_buffer = 0;
         }
         else
             yyterminate();
     }
5012
   You will probably want a stack of expansion buffers to allow nested
macros, but the example above should make the basic idea clear.
5015
5016
5017File: flex.info,  Node: How can I build a two-pass scanner?,  Next: How do I match any string not matched in the preceding rules?,  Prev: How can I expand macros in the input?,  Up: FAQ
5018
5019How can I build a two-pass scanner?
5020===================================
5021
5022One way to do it is to filter the first pass to a temporary file, then
5023process the temporary file on the second pass. You will probably see a
5024performance hit, due to all the disk I/O.
5025
5026   When you need to look ahead far forward like this, it almost always
5027means that the right solution is to build a parse tree of the entire
5028input, then walk it after the parse in order to generate the output.
5029In a sense, this is a two-pass approach, once through the text and once
5030through the parse tree, but the performance hit for the latter is
5031usually an order of magnitude smaller, since everything is already
5032classified, in binary format, and residing in memory.
5033
5034
5035File: flex.info,  Node: How do I match any string not matched in the preceding rules?,  Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Prev: How can I build a two-pass scanner?,  Up: FAQ
5036
5037How do I match any string not matched in the preceding rules?
5038=============================================================
5039
One way to assign precedence is to place the more specific rules
first.  If two rules would match the same input (same sequence of
characters), then the first rule listed in the `flex' input wins, e.g.,
5043
5044     %%
5045     foo[a-zA-Z_]+    return FOO_ID;
5046     bar[a-zA-Z_]+    return BAR_ID;
5047     [a-zA-Z_]+       return GENERIC_ID;
5048
5049   Note that the rule `[a-zA-Z_]+' must come *after* the others.  It
5050will match the same amount of text as the more specific rules, and in
5051that case the `flex' scanner will pick the first rule listed in your
5052scanner as the one to match.
5053
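   The selection policy can be mimicked in a few lines of C.  The toy
matcher below is not how flex works internally (flex uses a single
DFA).  It only reproduces the outcome for the three rules above:
longest match first, then the earliest rule on a tie.  The names
`rule_prefix', `match_len' and `pick_rule' are hypothetical, and the
character test assumes ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Each toy rule is a literal prefix followed by [a-zA-Z_]+ . */
static const char *const rule_prefix[] = { "foo", "bar", "" };
enum { NUM_RULES = 3 };

static int is_id_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Length of the longest match of rule r at the start of s, or 0. */
static size_t match_len(int r, const char *s)
{
    size_t p = strlen(rule_prefix[r]);
    size_t n = p;

    if (strncmp(s, rule_prefix[r], p) != 0)
        return 0;
    while (is_id_char(s[n]))
        n++;
    return n > p ? n : 0;            /* need at least one more character */
}

/* Returns the index of the winning rule, or -1 if none matches. */
static int pick_rule(const char *s, size_t *len)
{
    int r, best = -1;

    *len = 0;
    for (r = 0; r < NUM_RULES; r++) {
        size_t n = match_len(r, s);
        if (n > *len) {              /* strict '>' keeps the earlier rule on ties */
            *len = n;
            best = r;
        }
    }
    return best;
}
```

   With the generic rule moved to index 0, it would win every tie and
the `foo' and `bar' rules would never be chosen.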
5054
5055File: flex.info,  Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Next: Is there a way to make flex treat NULL like a regular character?,  Prev: How do I match any string not matched in the preceding rules?,  Up: FAQ
5056
5057I am trying to port code from AT&T lex that uses yysptr and yysbuf.
5058===================================================================
5059
5060Those are internal variables pointing into the AT&T scanner's input
5061buffer.  I imagine they're being manipulated in user versions of the
5062`input()' and `unput()' functions.  If so, what you need to do is
5063analyze those functions to figure out what they're doing, and then
5064replace `input()' with an appropriate definition of `YY_INPUT'.  You
5065shouldn't need to (and must not) replace `flex''s `unput()' function.
5066
5067
5068File: flex.info,  Node: Is there a way to make flex treat NULL like a regular character?,  Next: Whenever flex can not match the input it says "flex scanner jammed".,  Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Up: FAQ
5069
5070Is there a way to make flex treat NULL like a regular character?
5071================================================================
5072
5073Yes, `\0' and `\x00' should both do the trick.  Perhaps you have an
5074ancient version of `flex'.  The latest release is version 2.5.39.
5075
5076
5077File: flex.info,  Node: Whenever flex can not match the input it says "flex scanner jammed".,  Next: Why doesn't flex have non-greedy operators like perl does?,  Prev: Is there a way to make flex treat NULL like a regular character?,  Up: FAQ
5078
5079Whenever flex can not match the input it says "flex scanner jammed".
5080====================================================================
5081
5082You need to add a rule that matches the otherwise-unmatched text, e.g.,
5083
5084     %option yylineno
5085     %%
5086     [[a bunch of rules here]]
5087
5088     .	printf("bad input character '%s' at line %d\n", yytext, yylineno);
5089
5090   See `%option default' for more information.
5091
5092
5093File: flex.info,  Node: Why doesn't flex have non-greedy operators like perl does?,  Next: Memory leak - 16386 bytes allocated by malloc.,  Prev: Whenever flex can not match the input it says "flex scanner jammed".,  Up: FAQ
5094
5095Why doesn't flex have non-greedy operators like perl does?
5096==========================================================
5097
5098A DFA can do a non-greedy match by stopping the first time it enters an
5099accepting state, instead of consuming input until it determines that no
5100further matching is possible (a "jam" state).  This is actually easier
5101to implement than longest leftmost match (which flex does).
5102
5103   But it's also much less useful than longest leftmost match.  In
5104general, when you find yourself wishing for non-greedy matching, that's
5105usually a sign that you're trying to make the scanner do some parsing.
5106That's generally the wrong approach, since it lacks the power to do a
5107decent job.  Better is to either introduce a separate parser, or to
5108split the scanner into multiple scanners using (exclusive) start
5109conditions.
5110
5111   You might have a separate start state once you've seen the `BEGIN'.
5112In that state, you might then have a regex that will match `END' (to
5113kick you out of the state), and perhaps `(.|\n)' to get a single
5114character within the chunk ...
5115
5116   This approach also has much better error-reporting properties.
5117
5118
5119File: flex.info,  Node: Memory leak - 16386 bytes allocated by malloc.,  Next: How do I track the byte offset for lseek()?,  Prev: Why doesn't flex have non-greedy operators like perl does?,  Up: FAQ
5120
5121Memory leak - 16386 bytes allocated by malloc.
5122==============================================
5123
5124UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
5125you did not call `yylex_destroy()'. If you are using an earlier version
5126of `flex', then read on.
5127
5128   The leak is about 16426 bytes.  That is, (8192 * 2 + 2) for the
5129read-buffer, and about 40 for `struct yy_buffer_state' (depending upon
5130alignment). The leak is in the non-reentrant C scanner only (NOT in the
5131reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
5132when you are done, the buffer is never freed.
5133
5134   However, the leak won't multiply since the buffer is reused no
5135matter how many times you call `yylex()'.
5136
5137   If you want to reclaim the memory when you are completely done
5138scanning, then you might try this:
5139
5140     /* For non-reentrant C scanner only. */
5141     yy_delete_buffer(YY_CURRENT_BUFFER);
5142     yy_init = 1;
5143
5144   Note: `yy_init' is an "internal variable", and hasn't been tested in
5145this situation. It is possible that some other globals may need
5146resetting as well.
5147
5148
5149File: flex.info,  Node: How do I track the byte offset for lseek()?,  Next: How do I use my own I/O classes in a C++ scanner?,  Prev: Memory leak - 16386 bytes allocated by malloc.,  Up: FAQ
5150
5151How do I track the byte offset for lseek()?
5152===========================================
5153
5154     >   We thought that it would be possible to have this number through the
5155     >   evaluation of the following expression:
5156     >
5157     >   seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
5158
5159   While this is the right idea, it has two problems.  The first is that
5160it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
5161during an invocation of `YY_INPUT' (or that your input source will
5162return less even though `YY_READ_BUF_SIZE' bytes were requested).  The
5163second problem is that when refilling its internal buffer, `flex' keeps
5164some characters from the previous buffer (because usually it's in the
5165middle of a match, and needs those characters to construct `yytext' for
5166the match once it's done).  Because of this, `yy_c_buf_p -
5167YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
5168already read from the current buffer.
5169
5170   An alternative solution is to count the number of characters you've
5171matched since starting to scan.  This can be done by using
5172`YY_USER_ACTION'.  For example,
5173
5174     #define YY_USER_ACTION num_chars += yyleng;
5175
   (You need to be careful to update your bookkeeping if you use
`yymore()', `yyless()', `unput()', or `input()'.)
5178
5179
5180File: flex.info,  Node: How do I use my own I/O classes in a C++ scanner?,  Next: How do I skip as many chars as possible?,  Prev: How do I track the byte offset for lseek()?,  Up: FAQ
5181
5182How do I use my own I/O classes in a C++ scanner?
5183=================================================
5184
5185When the flex C++ scanning class rewrite finally happens, then this
5186sort of thing should become much easier.
5187
   You can do this by passing the various functions (such as
`LexerInput()' and `LexerOutput()') NULL `iostream*'s, and then
dealing with your own I/O classes surreptitiously (i.e., stashing them
in special member variables).  This works because the only assumption
the lexer makes about the iostreams is that they're ultimately passed
to `LexerInput()' and `LexerOutput()', which then do whatever is
necessary with them.
5195
5196
5197File: flex.info,  Node: How do I skip as many chars as possible?,  Next: deleteme00,  Prev: How do I use my own I/O classes in a C++ scanner?,  Up: FAQ
5198
5199How do I skip as many chars as possible?
5200========================================
5201
5202How do I skip as many chars as possible - without interfering with the
5203other patterns?
5204
5205   In the example below, we want to skip over characters until we see
5206the phrase "endskip". The following will _NOT_ work correctly (do you
5207see why not?)
5208
5209     /* INCORRECT SCANNER */
5210     %x SKIP
5211     %%
5212     <INITIAL>startskip   BEGIN(SKIP);
5213     ...
5214     <SKIP>"endskip"       BEGIN(INITIAL);
5215     <SKIP>.*             ;
5216
5217   The problem is that the pattern .* will eat up the word "endskip."
5218The simplest (but slow) fix is:
5219
5220     <SKIP>"endskip"      BEGIN(INITIAL);
5221     <SKIP>.              ;
5222
   A faster fix involves making the second rule match more, without
making it match "endskip" plus something else.  So for example:
5225
5226     <SKIP>"endskip"     BEGIN(INITIAL);
5227     <SKIP>[^e]+         ;
     <SKIP>.             ; /* so you eat up e's, too */
5229
5230
5231File: flex.info,  Node: deleteme00,  Next: Are certain equivalent patterns faster than others?,  Prev: How do I skip as many chars as possible?,  Up: FAQ
5232
5233deleteme00
5234==========
5235
5236     QUESTION:
5237     When was flex born?
5238
5239     Vern Paxson took over
5240     the Software Tools lex project from Jef Poskanzer in 1982.  At that point it
5241     was written in Ratfor.  Around 1987 or so, Paxson translated it into C, and
5242     a legend was born :-).
5243
5244
5245File: flex.info,  Node: Are certain equivalent patterns faster than others?,  Next: Is backing up a big deal?,  Prev: deleteme00,  Up: FAQ
5246
5247Are certain equivalent patterns faster than others?
5248===================================================
5249
5250     To: Adoram Rogel <adoram@orna.hybridge.com>
5251     Subject: Re: Flex 2.5.2 performance questions
5252     In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
5253     Date: Wed, 18 Sep 96 10:51:02 PDT
5254     From: Vern Paxson <vern>
5255
5256     [Note, the most recent flex release is 2.5.4, which you can get from
5257     ftp.ee.lbl.gov.  It has bug fixes over 2.5.2 and 2.5.3.]
5258
5259     > 1. Using the pattern
5260     >    ([Ff](oot)?)?[Nn](ote)?(\.)?
5261     >    instead of
5262     >    (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
5263     >    (in a very complicated flex program) caused the program to slow from
5264     >    300K+/min to 100K/min (no other changes were done).
5265
     These two are not equivalent.  For example, the first can match "footnote."
     but the second can only match "footnote".  This is almost certainly the
     cause of the discrepancy - the slower scanner run is matching more tokens,
     and/or having to do more backing up.
5270
5271     > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
5272
5273     From a performance point of view, they're equivalent (modulo presumably
5274     minor effects such as memory cache hit rates; and the presence of trailing
5275     context, see below).  From a space point of view, the first is slightly
5276     preferable.
5277
5278     > 3. I have a pattern that look like this:
5279     >    pats {p1}|{p2}|{p3}|...|{p50}     (50 patterns ORd)
5280     >
5281     >    running yet another complicated program that includes the following rule:
5282     >    <snext>{and}/{no4}{bb}{pats}
5283     >
5284     >    gets me to "too complicated - over 32,000 states"...
5285
5286     I can't tell from this example whether the trailing context is variable-length
5287     or fixed-length (it could be the latter if {and} is fixed-length).  If it's
5288     variable length, which flex -p will tell you, then this reflects a basic
5289     performance problem, and if you can eliminate it by restructuring your
5290     scanner, you will see significant improvement.
5291
5292     >    so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
5293     >    10 patterns and changed the rule to be 5 rules.
5294     >    This did compile, but what is the rule of thumb here ?
5295
     The rule is to avoid trailing context other than fixed-length, in which for
     a/b, either the 'a' pattern or the 'b' pattern has a fixed length.  Use
     of the '|' operator automatically makes the pattern variable length, so in
     this case '[Ff]oot' is preferred to '(F|f)oot'.
5300
5301     > 4. I changed a rule that looked like this:
5302     >    <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
5303     >
5304     >    to the next 2 rules:
5305     >    <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
5306     >    <snext8>{and}{bb}/{ROMAN}         { BEGIN...
5307     >
5308     >    Again, I understand the using [^...] will cause a great performance loss
5309
5310     Actually, it doesn't cause any sort of performance loss.  It's a surprising
5311     fact about regular expressions that they always match in linear time
5312     regardless of how complex they are.
5313
5314     >    but are there any specific rules about it ?
5315
5316     See the "Performance Considerations" section of the man page, and also
5317     the example in MISC/fastwc/.
5318
5319     		Vern
5320
5321
5322File: flex.info,  Node: Is backing up a big deal?,  Next: Can I fake multi-byte character support?,  Prev: Are certain equivalent patterns faster than others?,  Up: FAQ
5323
5324Is backing up a big deal?
5325=========================
5326
5327     To: Adoram Rogel <adoram@hybridge.com>
5328     Subject: Re: Flex 2.5.2 performance questions
5329     In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
5330     Date: Thu, 19 Sep 96 09:58:00 PDT
5331     From: Vern Paxson <vern>
5332
5333     > a lot about the backing up problem.
5334     > I believe that there lies my biggest problem, and I'll try to improve
5335     > it.
5336
5337     Since you have variable trailing context, this is a bigger performance
5338     problem.  Fixing it is usually easier than fixing backing up, which in a
5339     complicated scanner (yours seems to fit the bill) can be extremely
5340     difficult to do correctly.
5341
5342     You also don't mention what flags you are using for your scanner.
5343     -f makes a large speed difference, and -Cfe buys you nearly as much
5344     speed but the resulting scanner is considerably smaller.
5345
5346     > I have an | operator in {and} and in {pats} so both of them are variable
5347     > length.
5348
5349     -p should have reported this.
5350
5351     > Is changing one of them to fixed-length is enough ?
5352
5353     Yes.
5354
5355     > Is it possible to change the 32,000 states limit ?
5356
5357     Yes.  I've appended instructions on how.  Before you make this change,
5358     though, you should think about whether there are ways to fundamentally
5359     simplify your scanner - those are certainly preferable!
5360
5361     		Vern
5362
5363     To increase the 32K limit (on a machine with 32 bit integers), you increase
5364     the magnitude of the following in flexdef.h:
5365
5366     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
5367     #define MAXIMUM_MNS 31999
5368     #define BAD_SUBSCRIPT -32767
5369     #define MAX_SHORT 32700
5370
5371     Adding a 0 or two after each should do the trick.
5372
5373
5374File: flex.info,  Node: Can I fake multi-byte character support?,  Next: deleteme01,  Prev: Is backing up a big deal?,  Up: FAQ
5375
5376Can I fake multi-byte character support?
5377========================================
5378
5379     To: Heeman_Lee@hp.com
5380     Subject: Re: flex - multi-byte support?
5381     In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
5382     Date: Fri, 04 Oct 1996 11:42:18 PDT
5383     From: Vern Paxson <vern>
5384
5385     >      I assume as long as my *.l file defines the
5386     >      range of expected character code values (in octal format), flex will
5387     >      scan the file and read multi-byte characters correctly. But I have no
5388     >      confidence in this assumption.
5389
5390     Your lack of confidence is justified - this won't work.
5391
5392     Flex has in it a widespread assumption that the input is processed
5393     one byte at a time.  Fixing this is on the to-do list, but is involved,
5394     so it won't happen any time soon.  In the interim, the best I can suggest
5395     (unless you want to try fixing it yourself) is to write your rules in
5396     terms of pairs of bytes, using definitions in the first section:
5397
5398     	X	\xfe\xc2
5399     	...
5400     	%%
5401     	foo{X}bar	found_foo_fe_c2_bar();
5402
5403     etc.  Definitely a pain - sorry about that.
5404
5405     By the way, the email address you used for me is ancient, indicating you
5406     have a very old version of flex.  You can get the most recent, 2.5.4, from
5407     ftp.ee.lbl.gov.
5408
5409     		Vern
5410
5411
5412File: flex.info,  Node: deleteme01,  Next: Can you discuss some flex internals?,  Prev: Can I fake multi-byte character support?,  Up: FAQ
5413
5414deleteme01
5415==========
5416
5417     To: moleary@primus.com
5418     Subject: Re: Flex / Unicode compatibility question
5419     In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
5420     Date: Tue, 22 Oct 1996 11:06:13 PDT
5421     From: Vern Paxson <vern>
5422
5423     Unfortunately flex at the moment has a widespread assumption within it
5424     that characters are processed 8 bits at a time.  I don't see any easy
5425     fix for this (other than writing your rules in terms of double characters -
5426     a pain).  I also don't know of a wider lex, though you might try surfing
     the Plan 9 stuff because I know it's a Unicode system, and also the PCCTS
     toolkit (try searching say Alta Vista for "Purdue Compiler Construction
     Tool Set").
5430
5431     Fixing flex to handle wider characters is on the long-term to-do list.
5432     But since flex is a strictly spare-time project these days, this probably
5433     won't happen for quite a while, unless someone else does it first.
5434
5435     		Vern
5436
5437
5438File: flex.info,  Node: Can you discuss some flex internals?,  Next: unput() messes up yy_at_bol,  Prev: deleteme01,  Up: FAQ
5439
5440Can you discuss some flex internals?
5441====================================
5442
5443     To: Johan Linde <jl@theophys.kth.se>
5444     Subject: Re: translation of flex
5445     In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
5446     Date: Mon, 11 Nov 1996 10:33:50 PST
5447     From: Vern Paxson <vern>
5448
5449     > I'm working for the Swedish team translating GNU program, and I'm currently
5450     > working with flex. I have a few questions about some of the messages which
5451     > I hope you can answer.
5452
5453     All of the things you're wondering about, by the way, concerning flex
5454     internals - probably the only person who understands what they mean in
5455     English is me!  So I wouldn't worry too much about getting them right.
5456     That said ...
5457
5458     > #: main.c:545
5459     > msgid "  %d protos created\n"
5460     >
5461     > Does proto mean prototype?
5462
5463     Yes - prototypes of state compression tables.
5464
5465     > #: main.c:539
5466     > msgid "  %d/%d (peak %d) template nxt-chk entries created\n"
5467     >
5468     > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
5469     > However, 'template next-check entries' doesn't make much sense to me. To be
5470     > able to find a good translation I need to know a little bit more about it.
5471
5472     There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
5473     scanner tables.  It involves creating two pairs of tables.  The first has
5474     "base" and "default" entries, the second has "next" and "check" entries.
5475     The "base" entry is indexed by the current state and yields an index into
5476     the next/check table.  The "default" entry gives what to do if the state
5477     transition isn't found in next/check.  The "next" entry gives the next
5478     state to enter, but only if the "check" entry verifies that this entry is
5479     correct for the current state.  Flex creates templates of series of
5480     next/check entries and then encodes differences from these templates as a
5481     way to compress the tables.
5482
5483     > #: main.c:533
5484     > msgid "  %d/%d base-def entries created\n"
5485     >
5486     > The same problem here for 'base-def'.
5487
5488     See above.
5489
5490     		Vern
5491
5492
5493File: flex.info,  Node: unput() messes up yy_at_bol,  Next: The | operator is not doing what I want,  Prev: Can you discuss some flex internals?,  Up: FAQ
5494
5495unput() messes up yy_at_bol
5496===========================
5497
5498     To: Xinying Li <xli@npac.syr.edu>
5499     Subject: Re: FLEX ?
5500     In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
5501     Date: Wed, 13 Nov 1996 19:51:54 PST
5502     From: Vern Paxson <vern>
5503
5504     > "unput()" them to input flow, question occurs. If I do this after I scan
5505     > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
5506     > means the carriage flag has gone.
5507
5508     You can control this by calling yy_set_bol().  It's described in the manual.
5509
5510     >      And if in pre-reading it goes to the end of file, is anything done
     > to control the end of current buffer and end of file?
5512
5513     No, there's no way to put back an end-of-file.
5514
5515     >      By the way I am using flex 2.5.2 and using the "-l".
5516
5517     The latest release is 2.5.4, by the way.  It fixes some bugs in 2.5.2 and
5518     2.5.3.  You can get it from ftp.ee.lbl.gov.
5519
5520     		Vern
5521
5522
5523File: flex.info,  Node: The | operator is not doing what I want,  Next: Why can't flex understand this variable trailing context pattern?,  Prev: unput() messes up yy_at_bol,  Up: FAQ
5524
5525The | operator is not doing what I want
5526=======================================
5527
5528     To: Alain.ISSARD@st.com
5529     Subject: Re: Start condition with FLEX
5530     In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
5531     Date: Mon, 18 Nov 1996 10:41:34 PST
5532     From: Vern Paxson <vern>
5533
5534     > I am not able to use the start condition scope and to use the | (OR) with
5535     > rules having start conditions.
5536
5537     The problem is that if you use '|' as a regular expression operator, for
5538     example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
5539     any blanks around it.  If you instead want the special '|' *action* (which
5540     from your scanner appears to be the case), which is a way of giving two
5541     different rules the same action:
5542
5543     	foo	|
5544     	bar	matched_foo_or_bar();
5545
5546     then '|' *must* be separated from the first rule by whitespace and *must*
5547     be followed by a new line.  You *cannot* write it as:
5548
5549     	foo | bar	matched_foo_or_bar();
5550
     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
     unlikely to be changed.
5554
5555     Your problems with start condition scope are simply due to syntax errors
5556     from your use of '|' later confusing flex.
5557
5558     Let me know if you still have problems.
5559
5560     		Vern
5561
5562
5563File: flex.info,  Node: Why can't flex understand this variable trailing context pattern?,  Next: The ^ operator isn't working,  Prev: The | operator is not doing what I want,  Up: FAQ
5564
5565Why can't flex understand this variable trailing context pattern?
5566=================================================================
5567
5568     To: Gregory Margo <gmargo@newton.vip.best.com>
5569     Subject: Re: flex-2.5.3 bug report
5570     In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
5571     Date: Sat, 23 Nov 1996 17:07:32 PST
5572     From: Vern Paxson <vern>
5573
5574     > Enclosed is a lex file that "real" lex will process, but I cannot get
5575     > flex to process it.  Could you try it and maybe point me in the right direction?
5576
5577     Your problem is that some of the definitions in the scanner use the '/'
5578     trailing context operator, and have it enclosed in ()'s.  Flex does not
5579     allow this operator to be enclosed in ()'s because doing so allows undefined
5580     regular expressions such as "(a/b)+".  So the solution is to remove the
5581     parentheses.  Note that you must also be building the scanner with the -l
5582     option for AT&T lex compatibility.  Without this option, flex automatically
5583     encloses the definitions in parentheses.
5584
5585     		Vern
5586
5587
5588File: flex.info,  Node: The ^ operator isn't working,  Next: Trailing context is getting confused with trailing optional patterns,  Prev: Why can't flex understand this variable trailing context pattern?,  Up: FAQ
5589
5590The ^ operator isn't working
5591============================
5592
5593     To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
5594     Subject: Re: Flex Bug ?
5595     In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
5596     Date: Tue, 26 Nov 1996 11:15:05 PST
5597     From: Vern Paxson <vern>
5598
5599     > In my lexer code, i have the line :
5600     > ^\*.*          { }
5601     >
     > Thus all lines starting with an asterisk (*) are comment lines.
5603     > This does not work !
5604
5605     I can't get this problem to reproduce - it works fine for me.  Note
5606     though that if what you have is slightly different:
5607
5608     	COMMENT	^\*.*
5609     	%%
5610     	{COMMENT}	{ }
5611
5612     then it won't work, because flex pushes back macro definitions enclosed
5613     in ()'s, so the rule becomes
5614
5615     	(^\*.*)		{ }
5616
5617     and now that the '^' operator is not at the immediate beginning of the
5618     line, it's interpreted as just a regular character.  You can avoid this
5619     behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
5620
5621     		Vern
5622
5623
5624File: flex.info,  Node: Trailing context is getting confused with trailing optional patterns,  Next: Is flex GNU or not?,  Prev: The ^ operator isn't working,  Up: FAQ
5625
5626Trailing context is getting confused with trailing optional patterns
5627====================================================================
5628
5629     To: Adoram Rogel <adoram@hybridge.com>
5630     Subject: Re: Flex 2.5.4 BOF ???
5631     In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
5632     Date: Wed, 27 Nov 1996 10:56:25 PST
5633     From: Vern Paxson <vern>
5634
5635     >     Organization(s)?/[a-z]
5636     >
5637     > This matched "Organizations" (looking in debug mode, the trailing s
5638     > was matched with trailing context instead of the optional (s) in the
5639     > end of the word.
5640
5641     That should only happen with lex.  Flex can properly match this pattern.
5642     (That might be what you're saying, I'm just not sure.)
5643
5644     > Is there a way to avoid this dangerous trailing context problem ?
5645
5646     Unfortunately, there's no easy way.  On the other hand, I don't see why
5647     it should be a problem.  Lex's matching is clearly wrong, and I'd hope
5648     that usually the intent remains the same as expressed with the pattern,
5649     so flex's matching will be correct.
5650
5651     		Vern
5652
5653
5654File: flex.info,  Node: Is flex GNU or not?,  Next: ERASEME53,  Prev: Trailing context is getting confused with trailing optional patterns,  Up: FAQ
5655
5656Is flex GNU or not?
5657===================
5658
5659     To: Cameron MacKinnon <mackin@interlog.com>
5660     Subject: Re: Flex documentation bug
5661     In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
5662     Date: Sun, 01 Dec 1996 22:29:39 PST
5663     From: Vern Paxson <vern>
5664
5665     > I'm not sure how or where to submit bug reports (documentation or
5666     > otherwise) for the GNU project stuff ...
5667
     Well, strictly speaking flex isn't part of the GNU project.  They just
     distribute it because no one's written a decent GPL'd lex replacement.
     So you should send bugs directly to me.  Those sent to the GNU folks
     sometimes find their way to me, but some may fall through the cracks.
5672
5673     > In GNU Info, under the section 'Start Conditions', and also in the man
5674     > page (mine's dated April '95) is a nice little snippet showing how to
5675     > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
5676     > size. Unfortunately, no overflow checking is ever done ...
5677
5678     This is already mentioned in the manual:
5679
5680     Finally, here's an example of how to  match  C-style  quoted
5681     strings using exclusive start conditions, including expanded
5682     escape sequences (but not including checking  for  a  string
5683     that's too long):
5684
5685     The reason for not doing the overflow checking is that it will needlessly
5686     clutter up an example whose main purpose is just to demonstrate how to
5687     use flex.
5688
5689     The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
5690
5691     		Vern
5692
5693
5694File: flex.info,  Node: ERASEME53,  Next: I need to scan if-then-else blocks and while loops,  Prev: Is flex GNU or not?,  Up: FAQ
5695
5696ERASEME53
5697=========
5698
5699     To: tsv@cs.UManitoba.CA
5700     Subject: Re: Flex (reg)..
5701     In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
5702     Date: Thu, 06 Mar 1997 15:54:19 PST
5703     From: Vern Paxson <vern>
5704
5705     > [:alpha:] ([:alnum:] | \\_)*
5706
     If your rule really has embedded blanks as shown above, then it won't
     work, as the first blank delimits the rule from the action.  (It wouldn't
     even compile ...)  Character class expressions such as [:alpha:] also
     only have their special meaning inside a bracket expression.  You need
     instead:

     [[:alpha:]]([[:alnum:]]|\\_)*

     and that should work fine - there's no restriction on what can go inside
     of ()'s except for the trailing context operator, '/'.
5715
5716     		Vern
5717
5718
5719File: flex.info,  Node: I need to scan if-then-else blocks and while loops,  Next: ERASEME55,  Prev: ERASEME53,  Up: FAQ
5720
5721I need to scan if-then-else blocks and while loops
5722==================================================
5723
5724     To: "Mike Stolnicki" <mstolnic@ford.com>
5725     Subject: Re: FLEX help
5726     In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
5727     Date: Fri, 30 May 1997 10:46:35 PDT
5728     From: Vern Paxson <vern>
5729
5730     > We'd like to add "if-then-else", "while", and "for" statements to our
5731     > language ...
5732     > We've investigated many possible solutions.  The one solution that seems
5733     > the most reasonable involves knowing the position of a TOKEN in yyin.
5734
     I strongly advise you to instead build a parse tree (abstract syntax tree)
     and loop over that.  You'll find this has major benefits in keeping
     your interpreter simple and extensible.

     That said, the functionality you mention for get_position and set_position
     has been on the to-do list for a while.  As flex is a purely spare-time
     project for me, no guarantees when this will be added (in particular, it
     for sure won't be for many months to come).
5743
5744     		Vern
5745
5746
5747File: flex.info,  Node: ERASEME55,  Next: ERASEME56,  Prev: I need to scan if-then-else blocks and while loops,  Up: FAQ
5748
5749ERASEME55
5750=========
5751
5752     To: Colin Paul Adams <colin@colina.demon.co.uk>
5753     Subject: Re: Flex C++ classes and Bison
5754     In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
5755     Date: Fri, 15 Aug 1997 10:48:19 PDT
5756     From: Vern Paxson <vern>
5757
5758     > #define YY_DECL   int yylex (YYSTYPE *lvalp, struct parser_control
5759     > *parm)
5760     >
5761     > I have been trying  to get this to work as a C++ scanner, but it does
5762     > not appear to be possible (warning that it matches no declarations in
5763     > yyFlexLexer, or something like that).
5764     >
5765     > Is this supposed to be possible, or is it being worked on (I DID
5766     > notice the comment that scanner classes are still experimental, so I'm
5767     > not too hopeful)?
5768
5769     What you need to do is derive a subclass from yyFlexLexer that provides
5770     the above yylex() method, squirrels away lvalp and parm into member
5771     variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
5772
5773     		Vern
5774
5775
5776File: flex.info,  Node: ERASEME56,  Next: ERASEME57,  Prev: ERASEME55,  Up: FAQ
5777
5778ERASEME56
5779=========
5780
5781     To: Mikael.Latvala@lmf.ericsson.se
5782     Subject: Re: Possible mistake in Flex v2.5 document
5783     In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
5784     Date: Fri, 05 Sep 1997 10:01:54 PDT
5785     From: Vern Paxson <vern>
5786
5787     > In that example you show how to count comment lines when using
5788     > C style /* ... */ comments. My question is, shouldn't you take into
5789     > account a scenario where end of a comment marker occurs inside
5790     > character or string literals?
5791
5792     The scanner certainly needs to also scan character and string literals.
5793     However it does that (there's an example in the man page for strings), the
5794     lexer will recognize the beginning of the literal before it runs across the
5795     embedded "/*".  Consequently, it will finish scanning the literal before it
5796     even considers the possibility of matching "/*".
5797
5798     Example:
5799
5800     	'([^']*|{ESCAPE_SEQUENCE})'
5801
5802     will match all the text between the ''s (inclusive).  So the lexer
5803     considers this as a token beginning at the first ', and doesn't even
5804     attempt to match other tokens inside it.
5805
     I think this subtlety is not worth putting in the manual, as I suspect
     it would confuse more people than it would enlighten.
5808
5809     		Vern
5810
5811
5812File: flex.info,  Node: ERASEME57,  Next: Is there a repository for flex scanners?,  Prev: ERASEME56,  Up: FAQ
5813
5814ERASEME57
5815=========
5816
5817     To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
5818     Subject: Re: flex limitations
5819     In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
5820     Date: Mon, 08 Sep 1997 11:38:08 PDT
5821     From: Vern Paxson <vern>
5822
5823     > %%
5824     > [a-zA-Z]+       /* skip a line */
5825     >                 {  printf("got %s\n", yytext); }
5826     > %%
5827
5828     What version of flex are you using?  If I feed this to 2.5.4, it complains:
5829
5830     	"bug.l", line 5: EOF encountered inside an action
5831     	"bug.l", line 5: unrecognized rule
5832     	"bug.l", line 5: fatal parse error
5833
5834     Not the world's greatest error message, but it manages to flag the problem.
5835
5836     (With the introduction of start condition scopes, flex can't accommodate
5837     an action on a separate line, since it's ambiguous with an indented rule.)
5838
5839     You can get 2.5.4 from ftp.ee.lbl.gov.
5840
5841     		Vern
5842
5843
5844File: flex.info,  Node: Is there a repository for flex scanners?,  Next: How can I conditionally compile or preprocess my flex input file?,  Prev: ERASEME57,  Up: FAQ
5845
5846Is there a repository for flex scanners?
5847========================================
5848
5849Not that we know of. You might try asking on comp.compilers.
5850
5851
5852File: flex.info,  Node: How can I conditionally compile or preprocess my flex input file?,  Next: Where can I find grammars for lex and yacc?,  Prev: Is there a repository for flex scanners?,  Up: FAQ
5853
5854How can I conditionally compile or preprocess my flex input file?
5855=================================================================
5856
5857Flex doesn't have a preprocessor like C does.  You might try using m4,
5858or the C preprocessor plus a sed script to clean up the result.
5859
5860
5861File: flex.info,  Node: Where can I find grammars for lex and yacc?,  Next: I get an end-of-buffer message for each character scanned.,  Prev: How can I conditionally compile or preprocess my flex input file?,  Up: FAQ
5862
5863Where can I find grammars for lex and yacc?
5864===========================================
5865
5866In the sources for flex and bison.
5867
5868
5869File: flex.info,  Node: I get an end-of-buffer message for each character scanned.,  Next: unnamed-faq-62,  Prev: Where can I find grammars for lex and yacc?,  Up: FAQ
5870
5871I get an end-of-buffer message for each character scanned.
5872==========================================================
5873
This will happen if your LexerInput() function returns only one
character at a time, which can happen either if your scanner is
"interactive", or if the streams library on your platform always
returns 1 for yyin->gcount().
5878
5879   Solution: override LexerInput() with a version that returns whole
5880buffers.
5881
5882
5883File: flex.info,  Node: unnamed-faq-62,  Next: unnamed-faq-63,  Prev: I get an end-of-buffer message for each character scanned.,  Up: FAQ
5884
5885unnamed-faq-62
5886==============
5887
5888     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
5889     Subject: Re: Flex maximums
5890     In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
5891     Date: Mon, 17 Nov 1997 17:16:15 PST
5892     From: Vern Paxson <vern>
5893
5894     > I took a quick look into the flex-sources and altered some #defines in
5895     > flexdefs.h:
5896     >
5897     > 	#define INITIAL_MNS 64000
5898     > 	#define MNS_INCREMENT 1024000
5899     > 	#define MAXIMUM_MNS 64000
5900
5901     The things to fix are to add a couple of zeroes to:
5902
5903     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
5904     #define MAXIMUM_MNS 31999
5905     #define BAD_SUBSCRIPT -32767
5906     #define MAX_SHORT 32700
5907
5908     and, if you get complaints about too many rules, make the following change too:
5909
5910     	#define YY_TRAILING_MASK 0x200000
5911     	#define YY_TRAILING_HEAD_MASK 0x400000
5912
5913     - Vern
5914
5915
5916File: flex.info,  Node: unnamed-faq-63,  Next: unnamed-faq-64,  Prev: unnamed-faq-62,  Up: FAQ
5917
5918unnamed-faq-63
5919==============
5920
5921     To: jimmey@lexis-nexis.com (Jimmey Todd)
5922     Subject: Re: FLEX question regarding istream vs ifstream
5923     In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
5924     Date: Mon, 15 Dec 1997 13:21:35 PST
5925     From: Vern Paxson <vern>
5926
5927     >         stdin_handle = YY_CURRENT_BUFFER;
5928     >         ifstream fin( "aFile" );
5929     >         yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
5930     >
5931     > What I'm wanting to do, is pass the contents of a file thru one set
5932     > of rules and then pass stdin thru another set... It works great if, I
5933     > don't use the C++ classes. But since everything else that I'm doing is
5934     > in C++, I thought I'd be consistent.
5935     >
     > The problem is that 'yy_create_buffer' is expecting an istream* as its
     > first argument (as stated in the man page). However, fin is an ifstream
5938     > object. Any ideas on what I might be doing wrong? Any help would be
5939     > appreciated. Thanks!!
5940
5941     You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
5942     Then its type will be compatible with the expected istream*, because ifstream
5943     is derived from istream.
5944
5945     		Vern
5946
5947
5948File: flex.info,  Node: unnamed-faq-64,  Next: unnamed-faq-65,  Prev: unnamed-faq-63,  Up: FAQ
5949
5950unnamed-faq-64
5951==============
5952
5953     To: Enda Fadian <fadiane@piercom.ie>
5954     Subject: Re: Question related to Flex man page?
5955     In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
5956     Date: Tue, 16 Dec 1997 14:17:09 PST
5957     From: Vern Paxson <vern>
5958
5959     > Can you explain to me what is meant by a long-jump in relation to flex?
5960
5961     Using the longjmp() function while inside yylex() or a routine called by it.
5962
5963     > what is the flex activation frame.
5964
5965     Just yylex()'s stack frame.
5966
5967     > As far as I can see yyrestart will bring me back to the start of the input
5968     > file and using flex++ is not really an option!
5969
5970     No, yyrestart() doesn't imply a rewind, even though its name might sound
5971     like it does.  It tells the scanner to flush its internal buffers and
5972     start reading from the given file at its present location.
5973
5974     		Vern
5975
5976
5977File: flex.info,  Node: unnamed-faq-65,  Next: unnamed-faq-66,  Prev: unnamed-faq-64,  Up: FAQ
5978
5979unnamed-faq-65
5980==============
5981
5982     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
5983     Subject: Re: Need urgent Help
5984     In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
5985     Date: Sun, 21 Dec 1997 21:30:46 PST
5986     From: Vern Paxson <vern>
5987
5988     > /usr/lib/yaccpar: In function `int yyparse()':
5989     > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
5990     >
5991     > ld: Undefined symbol
5992     >    _yylex
5993     >    _yyparse
5994     >    _yyin
5995
5996     This is a known problem with Solaris C++ (and/or Solaris yacc).  I believe
5997     the fix is to explicitly insert some 'extern "C"' statements for the
5998     corresponding routines/symbols.
5999
6000     		Vern
6001
6002
6003File: flex.info,  Node: unnamed-faq-66,  Next: unnamed-faq-67,  Prev: unnamed-faq-65,  Up: FAQ
6004
6005unnamed-faq-66
6006==============
6007
6008     To: mc0307@mclink.it
6009     Cc: gnu@prep.ai.mit.edu
6010     Subject: Re: [mc0307@mclink.it: Help request]
6011     In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
6012     Date: Sun, 21 Dec 1997 22:33:37 PST
6013     From: Vern Paxson <vern>
6014
6015     > This is my definition for float and integer types:
6016     > . . .
6017     > NZD          [1-9]
6018     > ...
6019     > I've tested my program on other lex version (on UNIX Sun Solaris an HP
6020     > UNIX) and it work well, so I think that my definitions are correct.
6021     > There are any differences between Lex and Flex?
6022
6023     There are indeed differences, as discussed in the man page.  The one
6024     you are probably running into is that when flex expands a name definition,
6025     it puts parentheses around the expansion, while lex does not.  There's
6026     an example in the man page of how this can lead to different matching.
6027     Flex's behavior complies with the POSIX standard (or at least with the
6028     last POSIX draft I saw).
6029
6030     		Vern
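
The effect of the added parentheses can be checked outside flex with POSIX
regular expressions.  The sketch below assumes the definition
NAME [A-Z][A-Z0-9]* used in a rule foo{NAME}?, which is the example the flex
documentation gives for this incompatibility.  The helper full_match() is a
hypothetical name, not part of flex or lex, and the unparenthesized expansion
is written with the binding of '?' made explicit, because adjacent repetition
operators are not portable in regcomp().

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if `text` matches `pattern` in full,
 * 0 if it does not, and -1 if the pattern is too long or does not compile. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Anchor the pattern so that only a whole-string match counts. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || n >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* flex-style expansion of foo{NAME}?  : the whole definition is optional. */
const char *flex_style = "foo([A-Z][A-Z0-9]*)?";

/* lex-style expansion of foo{NAME}?   : '?' binds only to [A-Z0-9]*,
 * so a letter after "foo" is still required. */
const char *lex_style = "foo[A-Z]([A-Z0-9]*)?";
```

Under these assumptions, full_match(flex_style, "foo") succeeds while
full_match(lex_style, "foo") does not; both accept "fooBAR1".  Wrapping a
definition in explicit parentheses in the .l file gives the same matching
under either tool.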
6031
6032
6033File: flex.info,  Node: unnamed-faq-67,  Next: unnamed-faq-68,  Prev: unnamed-faq-66,  Up: FAQ
6034
6035unnamed-faq-67
6036==============
6037
6038     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
6039     Subject: Re: Thanks
6040     In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
6041     Date: Mon, 22 Dec 1997 14:35:05 PST
6042     From: Vern Paxson <vern>
6043
6044     > Thank you very much for your help. I compile and link well with C++ while
6045     > declaring 'yylex ...' extern, But a little problem remains. I get a
6046     > segmentation default when executing ( I linked with lfl library) while it
6047     > works well when using LEX instead of flex. Do you have some ideas about the
6048     > reason for this ?
6049
6050     The one possible reason for this that comes to mind is if you've defined
6051     yytext as "extern char yytext[]" (which is what lex uses) instead of
6052     "extern char *yytext" (which is what flex uses).  If it's not that, then
6053     I'm afraid I don't know what the problem might be.
6054
6055     		Vern
6056
6057
6058File: flex.info,  Node: unnamed-faq-68,  Next: unnamed-faq-69,  Prev: unnamed-faq-67,  Up: FAQ
6059
6060unnamed-faq-68
6061==============
6062
6063     To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
6064     Subject: Re: flex 2.5: c++ scanners & start conditions
6065     In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
6066     Date: Tue, 06 Jan 1998 19:19:30 PST
6067     From: Vern Paxson <vern>
6068
6069     > The problem is that when I do this (using %option c++) start
6070     > conditions seem to not apply.
6071
6072     The BEGIN macro modifies the yy_start variable.  For C scanners, this
6073     is a static with scope visible through the whole file.  For C++ scanners,
6074     it's a member variable, so it only has visible scope within a member
6075     function.  Your lexbegin() routine is not a member function when you
6076     build a C++ scanner, so it's not modifying the correct yy_start.  The
6077     diagnostic that indicates this is that you found you needed to add
6078     a declaration of yy_start in order to get your scanner to compile when
6079     using C++; instead, the correct fix is to make lexbegin() a member
6080     function (by deriving from yyFlexLexer).
6081
6082     		Vern
6083
6084
6085File: flex.info,  Node: unnamed-faq-69,  Next: unnamed-faq-70,  Prev: unnamed-faq-68,  Up: FAQ
6086
6087unnamed-faq-69
6088==============
6089
6090     To: "Boris Zinin" <boris@ippe.rssi.ru>
6091     Subject: Re: current position in flex buffer
6092     In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
6093     Date: Mon, 12 Jan 1998 12:03:15 PST
6094     From: Vern Paxson <vern>
6095
6096     > The problem is how to determine the current position in flex active
6097     > buffer when a rule is matched....
6098
6099     You will need to keep track of this explicitly, such as by redefining
6100     YY_USER_ACTION to count the number of characters matched.
6101
6102     The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
6103
6104     		Vern
6105
6106
6107File: flex.info,  Node: unnamed-faq-70,  Next: unnamed-faq-71,  Prev: unnamed-faq-69,  Up: FAQ
6108
6109unnamed-faq-70
6110==============
6111
6112     To: Bik.Dhaliwal@bis.org
6113     Subject: Re: Flex question
6114     In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
6115     Date: Tue, 27 Jan 1998 22:41:52 PST
6116     From: Vern Paxson <vern>
6117
6118     > That requirement involves knowing
6119     > the character position at which a particular token was matched
6120     > in the lexer.
6121
6122     The way you have to do this is by explicitly keeping track of where
6123     you are in the file, by counting the number of characters scanned
6124     for each token (available in yyleng).  It may prove convenient to
6125     do this by redefining YY_USER_ACTION, as described in the manual.
6126
6127     		Vern
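
The bookkeeping described in this reply can be sketched in plain C.  The
struct and function names below are hypothetical; flex provides only yyleng
and the YY_USER_ACTION hook.  The sketch assumes every matched character is
consumed exactly once, so actions that use yyless(), yymore(), unput() or
REJECT would need extra adjustment.

```c
#include <stddef.h>

/* Hypothetical position record; flex does not define this type. */
struct position {
    size_t offset;       /* characters consumed up to and including the current token */
    size_t token_start;  /* offset at which the current token begins */
};

/* Call once per matched token, passing the token's length
 * (yyleng when called from a flex scanner). */
void position_advance(struct position *pos, size_t token_len)
{
    pos->token_start = pos->offset;  /* the token starts where the last one ended */
    pos->offset += token_len;        /* then step past the token itself */
}
```

In a .l file this could be hooked in from the definitions section with
something like "#define YY_USER_ACTION position_advance(&pos, yyleng);",
where pos is a file-scope struct position initialized to zero.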
6128
6129
6130File: flex.info,  Node: unnamed-faq-71,  Next: unnamed-faq-72,  Prev: unnamed-faq-70,  Up: FAQ
6131
6132unnamed-faq-71
6133==============
6134
6135     To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
6136     Subject: Re: flex: how to control start condition from parser?
6137     In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
6138     Date: Tue, 27 Jan 1998 22:45:37 PST
6139     From: Vern Paxson <vern>
6140
6141     > It seems useful for the parser to be able to tell the lexer about such
6142     > context dependencies, because then they don't have to be limited to
6143     > local or sequential context.
6144
6145     One way to do this is to have the parser call a stub routine that's
6146     included in the scanner's .l file, and consequently that has access to
6147     BEGIN.  The only ugliness is that the parser can't pass in the state
6148     it wants, because those aren't visible - but if you don't have many
6149     such states, then using a different set of names doesn't seem like
6150     too much of a burden.
6151
6152     While generating a .h file like you suggest is certainly cleaner,
6153     flex development has come to a virtual stand-still :-(, so a workaround
6154     like the above is much more pragmatic than waiting for a new feature.
6155
6156     		Vern
6157
6158
6159File: flex.info,  Node: unnamed-faq-72,  Next: unnamed-faq-73,  Prev: unnamed-faq-71,  Up: FAQ
6160
6161unnamed-faq-72
6162==============
6163
6164     To: Barbara Denny <denny@3com.com>
6165     Subject: Re: freebsd flex bug?
6166     In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
6167     Date: Fri, 30 Jan 1998 12:42:32 PST
6168     From: Vern Paxson <vern>
6169
6170     > lex.yy.c:1996: parse error before `='
6171
6172     This is the key, identifying this error.  (It may help to pinpoint
6173     it by using flex -L, so it doesn't generate #line directives in its
6174     output.)  I will bet you heavy money that you have a start condition
6175     name that is also a variable name, or something like that; flex spits
6176     out #define's for each start condition name, mapping them to a number,
6177     so you can wind up with:
6178
6179     	%x foo
6180     	%%
6181     		...
6182     	%%
6183     	void bar()
6184     		{
6185     		int foo = 3;
6186     		}
6187
6188     and the penultimate line will turn into "int 1 = 3" after C preprocessing,
6189     since flex will put "#define foo 1" in the generated scanner.
6190
6191     		Vern
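
A minimal sketch of the collision and one way around it.  The #define here is
a stand-in for the line flex would generate for "%x foo", using the value 1
from the text above; the function names and the renamed variable are
illustrative assumptions.

```c
/* Stand-in for what the generated scanner contains for "%x foo". */
#define foo 1

int bar(void)
{
    /* int foo = 3;      would be preprocessed into "int 1 = 3;" and rejected */
    int foo_count = 3;   /* renamed so it no longer collides with the macro */
    return foo_count;
}

/* The start condition name itself is just an integer constant. */
int start_condition_id(void)
{
    return foo;
}
```

Choosing start condition names that cannot clash with identifiers in the user
code, for example by using all capitals or a common prefix, avoids the
problem without touching the C code.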
6192
6193
6194File: flex.info,  Node: unnamed-faq-73,  Next: unnamed-faq-74,  Prev: unnamed-faq-72,  Up: FAQ
6195
6196unnamed-faq-73
6197==============
6198
6199     To: Maurice Petrie <mpetrie@infoscigroup.com>
6200     Subject: Re: Lost flex .l file
6201     In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
6202     Date: Mon, 02 Feb 1998 11:15:12 PST
6203     From: Vern Paxson <vern>
6204
6205     > I am curious as to
6206     > whether there is a simple way to backtrack from the generated source to
6207     > reproduce the lost list of tokens we are searching on.
6208
6209     In theory, it's straight-forward to go from the DFA representation
6210     back to a regular-expression representation - the two are isomorphic.
6211     In practice, a huge headache, because you have to unpack all the tables
6212     back into a single DFA representation, and then write a program to munch
6213     on that and translate it into an RE.
6214
6215     Sorry for the less-than-happy news ...
6216
6217     		Vern
6218
6219
6220File: flex.info,  Node: unnamed-faq-74,  Next: unnamed-faq-75,  Prev: unnamed-faq-73,  Up: FAQ
6221
6222unnamed-faq-74
6223==============
6224
6225     To: jimmey@lexis-nexis.com (Jimmey Todd)
6226     Subject: Re: Flex performance question
6227     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
6228     Date: Thu, 19 Feb 1998 08:48:51 PST
6229     From: Vern Paxson <vern>
6230
6231     > What I have found, is that the smaller the data chunk, the faster the
6232     > program executes. This is the opposite of what I expected. Should this be
6233     > happening this way?
6234
6235     This is exactly what will happen if your input file has embedded NULs.
6236     From the man page:
6237
6238     A final note: flex is slow when matching NUL's, particularly
6239     when  a  token  contains multiple NUL's.  It's best to write
6240     rules which match short amounts of text if it's  anticipated
6241     that the text will often include NUL's.
6242
6243     So that's the first thing to look for.
6244
6245     		Vern
6246
6247
6248File: flex.info,  Node: unnamed-faq-75,  Next: unnamed-faq-76,  Prev: unnamed-faq-74,  Up: FAQ
6249
6250unnamed-faq-75
6251==============
6252
6253     To: jimmey@lexis-nexis.com (Jimmey Todd)
6254     Subject: Re: Flex performance question
6255     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
6256     Date: Thu, 19 Feb 1998 15:42:25 PST
6257     From: Vern Paxson <vern>
6258
6259     So there are several problems.
6260
6261     First, to go fast, you want to match as much text as possible, which
6262     your scanners don't in the case that what they're scanning is *not*
6263     a <RN> tag.  So you want a rule like:
6264
6265     	[^<]+
6266
6267     Second, C++ scanners are particularly slow if they're interactive,
6268     which they are by default.  Using -B speeds it up by a factor of 3-4
6269     on my workstation.
6270
6271     Third, C++ scanners that use the istream interface are slow, because
6272     of how poorly implemented istream's are.  I built two versions of
6273     the following scanner:
6274
6275     	%%
6276     	.*\n
6277     	.*
6278     	%%
6279
6280     and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
6281     The C++ istream version, using -B, takes 3.8 seconds.
6282
6283     		Vern
6284
6285
6286File: flex.info,  Node: unnamed-faq-76,  Next: unnamed-faq-77,  Prev: unnamed-faq-75,  Up: FAQ
6287
6288unnamed-faq-76
6289==============
6290
6291     To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com>
6292     Subject: Re: FLEX 2.5 & THE YEAR 2000
6293     In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
6294     Date: Wed, 03 Jun 1998 10:22:26 PDT
6295     From: Vern Paxson <vern>
6296
6297     > I am researching the Y2K problem with General Electric R&D
6298     > and need to know if there are any known issues concerning
6299     > the above mentioned software and Y2K regardless of version.
6300
6301     There shouldn't be, all it ever does with the date is ask the system
6302     for it and then print it out.
6303
6304     		Vern
6305
6306
6307File: flex.info,  Node: unnamed-faq-77,  Next: unnamed-faq-78,  Prev: unnamed-faq-76,  Up: FAQ
6308
6309unnamed-faq-77
6310==============
6311
6312     To: "Hans Dermot Doran" <htd@ibhdoran.com>
6313     Subject: Re: flex problem
6314     In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT.
6315     Date: Tue, 21 Jul 1998 14:23:34 PDT
6316     From: Vern Paxson <vern>
6317
6318     > To overcome this, I gets() the stdin into a string and lex the string. The
6319     > string is lexed OK except that the end of string isn't lexed properly
6320     > (yy_scan_string()), that is the lexer doesn't recognise the end of string.
6321
6322     Flex doesn't contain mechanisms for recognizing buffer endpoints.  But if
6323     you use fgets instead (which you should anyway, to protect against buffer
6324     overflows), then the final \n will be preserved in the string, and you can
6325     scan that in order to find the end of the string.
6326
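
A short sketch of the fgets() approach.  The function name read_line() and
its return convention are assumptions made for illustration; only fgets() and
strlen() are standard.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into `buf` (capacity `size`).
 * Returns  1 if a complete line, including its '\n', was stored,
 *          0 if text was stored without a trailing '\n'
 *            (line longer than the buffer, or last line of the input),
 *         -1 on end of file or error with nothing read. */
int read_line(FILE *in, char *buf, size_t size)
{
    size_t len;

    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;
    len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}
```

When read_line() returns 1, the buffer can be handed to yy_scan_string() and
a rule matching \n can then serve as the end-of-string marker.  Unlike
gets(), fgets() never writes more than `size' bytes into the buffer.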
6327     		Vern
6328
6329
6330File: flex.info,  Node: unnamed-faq-78,  Next: unnamed-faq-79,  Prev: unnamed-faq-77,  Up: FAQ
6331
6332unnamed-faq-78
6333==============
6334
6335     To: soumen@almaden.ibm.com
6336     Subject: Re: Flex++ 2.5.3 instance member vs. static member
6337     In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT.
6338     Date: Tue, 28 Jul 1998 01:10:34 PDT
6339     From: Vern Paxson <vern>
6340
6341     > %{
6342     > int mylineno = 0;
6343     > %}
6344     > ws      [ \t]+
6345     > alpha   [A-Za-z]
6346     > dig     [0-9]
6347     > %%
6348     >
6349     > Now you'd expect mylineno to be a member of each instance of class
6350     > yyFlexLexer, but is this the case?  A look at the lex.yy.cc file seems to
6351     > indicate otherwise; unless I am missing something the declaration of
6352     > mylineno seems to be outside any class scope.
6353     >
6354     > How will this work if I want to run a multi-threaded application with each
6355     > thread creating a FlexLexer instance?
6356
6357     Derive your own subclass and make mylineno a member variable of it.
6358
6359     		Vern
6360
6361
6362File: flex.info,  Node: unnamed-faq-79,  Next: unnamed-faq-80,  Prev: unnamed-faq-78,  Up: FAQ
6363
6364unnamed-faq-79
6365==============
6366
6367     To: Adoram Rogel <adoram@hybridge.com>
6368     Subject: Re: More than 32K states change hangs
6369     In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT.
6370     Date: Tue, 04 Aug 1998 22:28:45 PDT
6371     From: Vern Paxson <vern>
6372
6373     > Vern Paxson,
6374     >
6375     > I followed your advice, posted on Usenet by you, and emailed to me
6376     > personally by you, on how to overcome the 32K states limit. I'm running
6377     > on Linux machines.
6378     > I took the full source of version 2.5.4 and did the following changes in
6379     > flexdef.h:
6380     > #define JAMSTATE -327660
6381     > #define MAXIMUM_MNS 319990
6382     > #define BAD_SUBSCRIPT -327670
6383     > #define MAX_SHORT 327000
6384     >
6385     > and compiled.
6386     > All looked fine, including check and bigcheck, so I installed.
6387
6388     Hmmm, you shouldn't increase MAX_SHORT, though looking through my email
6389     archives I see that I did indeed recommend doing so.  Try setting it back
6390     to 32700; that should suffice that you no longer need -Ca.  If it still
6391     hangs, then the interesting question is - where?
6392
6393     > Compiling the same hanged program with a out-of-the-box (RedHat 4.2
6394     > distribution of Linux)
6395     > flex 2.5.4 binary works.
6396
6397     Since Linux comes with source code, you should diff it against what
6398     you have to see what problems they missed.
6399
6400     > Should I always compile with the -Ca option now ? even short and simple
6401     > filters ?
6402
6403     No, definitely not.  It's meant to be for those situations where you
6404     absolutely must squeeze every last cycle out of your scanner.
6405
6406     		Vern
6407
6408
6409File: flex.info,  Node: unnamed-faq-80,  Next: unnamed-faq-81,  Prev: unnamed-faq-79,  Up: FAQ
6410
6411unnamed-faq-80
6412==============
6413
6414     To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
6415     Subject: Re: flex output for static code portion
6416     In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
6417     Date: Mon, 17 Aug 1998 23:57:42 PDT
6418     From: Vern Paxson <vern>
6419
6420     > I would like to use flex under the hood to generate a binary file
6421     > containing the data structures that control the parse.
6422
6423     This has been on the wish-list for a long time.  In principle it's
6424     straight-forward - you redirect mkdata() et al's I/O to another file,
6425     and modify the skeleton to have a start-up function that slurps these
6426     into dynamic arrays.  The concerns are (1) the scanner generation code
6427     is hairy and full of corner cases, so it's easy to get surprised when
6428     going down this path :-( ; and (2) being careful about buffering so
6429     that when the tables change you make sure the scanner starts in the
6430     correct state and reading at the right point in the input file.
6431
6432     > I was wondering if you know of anyone who has used flex in this way.
6433
6434     I don't - but it seems like a reasonable project to undertake (unlike
6435     numerous other flex tweaks :-).
6436
6437     		Vern
6438
6439
6440File: flex.info,  Node: unnamed-faq-81,  Next: unnamed-faq-82,  Prev: unnamed-faq-80,  Up: FAQ
6441
6442unnamed-faq-81
6443==============
6444
6445     Received: from 131.173.17.11 (131.173.17.11 [131.173.17.11])
6446     	by ee.lbl.gov (8.9.1/8.9.1) with ESMTP id AAA03838
6447     	for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 00:47:57 -0700 (PDT)
6448     Received: from hal.cl-ki.uni-osnabrueck.de (hal.cl-ki.Uni-Osnabrueck.DE [131.173.141.2])
6449     	by deimos.rz.uni-osnabrueck.de (8.8.7/8.8.8) with ESMTP id JAA34694
6450     	for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 09:47:55 +0200
6451     Received: (from georg@localhost) by hal.cl-ki.uni-osnabrueck.de (8.6.12/8.6.12) id JAA34834 for vern@ee.lbl.gov; Thu, 20 Aug 1998 09:47:54 +0200
6452     From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
6453     Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de>
6454     Subject: "flex scanner push-back overflow"
6455     To: vern@ee.lbl.gov
6456     Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
6457     Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
6458     X-NoJunk: Do NOT send commercial mail, spam or ads to this address!
6459     X-URL: http://www.cl-ki.uni-osnabrueck.de/~georg/
6460     X-Mailer: ELM [version 2.4ME+ PL28 (25)]
6461     MIME-Version: 1.0
6462     Content-Type: text/plain; charset=US-ASCII
6463     Content-Transfer-Encoding: 7bit
6464
6465     Hi Vern,
6466
6467     Yesterday, I encountered a strange problem: I use the macro processor m4
6468     to include some lengthy lists into a .l file. Following is a flex macro
6469     definition that causes some serious pain in my neck:
6470
6471     AUTHOR           ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])
6472
6473     The complete list contains about 10kB. When I try to "flex" this file
6474     (on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
6475     some of the predefined values in flexdef.h) I get the error:
6476
6477     myflex/flex -8  sentag.tmp.l
6478     flex scanner push-back overflow
6479
6480     When I remove the slashes in the macro definition everything works fine.
6481     As I understand it, the double quotes escape the slash-character so it
6482     really means "/" and not "trailing context". Furthermore, I tried to
6483     escape the slashes with backslashes, but with no use, the same error message
6484     appeared when flexing the code.
6485
6486     Do you have an idea what's going on here?
6487
6488     Greetings from Germany,
6489     	Georg
6490     --
6491     Georg Rehm                                     georg@cl-ki.uni-osnabrueck.de
6492     Institute for Semantic Information Processing, University of Osnabrueck, FRG
6493
6494
6495File: flex.info,  Node: unnamed-faq-82,  Next: unnamed-faq-83,  Prev: unnamed-faq-81,  Up: FAQ
6496
6497unnamed-faq-82
6498==============
6499
6500     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
6501     Subject: Re: "flex scanner push-back overflow"
6502     In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
6503     Date: Thu, 20 Aug 1998 07:05:35 PDT
6504     From: Vern Paxson <vern>
6505
6506     > myflex/flex -8  sentag.tmp.l
6507     > flex scanner push-back overflow
6508
6509     Flex itself uses a flex scanner.  That scanner is running out of buffer
6510     space when it tries to unput() the humongous macro you've defined.  When
6511     you remove the '/'s, you make it small enough so that it fits in the buffer;
6512     removing spaces would do the same thing.
6513
6514     The fix is to either rethink how come you're using such a big macro and
6515     perhaps there's another/better way to do it; or to rebuild flex's own
6516     scan.c with a larger value for
6517
6518     	#define YY_BUF_SIZE 16384
6519
6520     - Vern
6521
6522
6523File: flex.info,  Node: unnamed-faq-83,  Next: unnamed-faq-84,  Prev: unnamed-faq-82,  Up: FAQ
6524
6525unnamed-faq-83
6526==============
6527
6528     To: Jan Kort <jan@research.techforce.nl>
6529     Subject: Re: Flex
6530     In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
6531     Date: Sat, 05 Sep 1998 00:59:49 PDT
6532     From: Vern Paxson <vern>
6533
6534     > %%
6535     >
6536     > "TEST1\n"       { fprintf(stderr, "TEST1\n"); yyless(5); }
6537     > ^\n             { fprintf(stderr, "empty line\n"); }
6538     > .               { }
6539     > \n              { fprintf(stderr, "new line\n"); }
6540     >
6541     > %%
6542     > -- input ---------------------------------------
6543     > TEST1
6544     > -- output --------------------------------------
6545     > TEST1
6546     > empty line
6547     > ------------------------------------------------
6548
6549     IMHO, it's not clear whether or not this is in fact a bug.  It depends
6550     on whether you view yyless() as backing up in the input stream, or as
6551     pushing new characters onto the beginning of the input stream.  Flex
6552     interprets it as the latter (for implementation convenience, I'll admit),
6553     and so considers the newline as in fact matching at the beginning of a
6554     line, as after all the last token scanned an entire line and so the
6555     scanner is now at the beginning of a new line.
6556
6557     I agree that this is counter-intuitive for yyless(), given its
6558     functional description (it's less so for unput(), depending on whether
6559     you're unput()'ing new text or scanned text).  But I don't plan to
6560     change it any time soon, as it's a pain to do so.  Consequently,
6561     you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
6562     your scanner into the behavior you desire.
6563
6564     Sorry for the less-than-completely-satisfactory answer.
6565
6566     		Vern
6567
6568
6569File: flex.info,  Node: unnamed-faq-84,  Next: unnamed-faq-85,  Prev: unnamed-faq-83,  Up: FAQ
6570
6571unnamed-faq-84
6572==============
6573
6574     To: Patrick Krusenotto <krusenot@mac-info-link.de>
6575     Subject: Re: Problems with restarting flex-2.5.2-generated scanner
6576     In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
6577     Date: Thu, 24 Sep 1998 23:28:43 PDT
6578     From: Vern Paxson <vern>
6579
6580     > I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
6581     > trying to make my scanner restart with a new file after my parser stops
6582     > with a parse error. When my compiler restarts, the parser always
6583     > receives the token after the token (in the old file!) that caused the
6584     > parser error.
6585
6586     I suspect the problem is that your parser has read ahead in order
6587     to attempt to resolve an ambiguity, and when it's restarted it picks
6588     up with that token rather than reading a fresh one.  If you're using
6589     yacc, then the special "error" production can sometimes be used to
6590     consume tokens in an attempt to get the parser into a consistent state.
6591
6592     		Vern
6593
6594
6595File: flex.info,  Node: unnamed-faq-85,  Next: unnamed-faq-86,  Prev: unnamed-faq-84,  Up: FAQ
6596
6597unnamed-faq-85
6598==============
6599
6600     To: Henric Jungheim <junghelh@pe-nelson.com>
6601     Subject: Re: flex 2.5.4a
6602     In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
6603     Date: Tue, 27 Oct 1998 16:50:14 PST
6604     From: Vern Paxson <vern>
6605
6606     > This brings up a feature request:  How about a command line
6607     > option to specify the filename when reading from stdin?  That way one
6608     > doesn't need to create a temporary file in order to get the "#line"
6609     > directives to make sense.
6610
6611     Use -o combined with -t (per the man page description of -o).
6612
6613     > P.S., Is there any simple way to use non-blocking IO to parse multiple
6614     > streams?
6615
6616     Simple, no.
6617
6618     One approach might be to return a magic character on EWOULDBLOCK and
6619     have a rule
6620
6621     	.*<magic-character>	// put back .*, eat magic character
6622
6623     This is off the top of my head, not sure it'll work.
6624
6625     		Vern
6626
6627
6628File: flex.info,  Node: unnamed-faq-86,  Next: unnamed-faq-87,  Prev: unnamed-faq-85,  Up: FAQ
6629
6630unnamed-faq-86
6631==============
6632
6633     To: "Repko, Billy D" <billy.d.repko@intel.com>
6634     Subject: Re: Compiling scanners
6635     In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
6636     Date: Thu, 14 Jan 1999 00:25:30 PST
6637     From: Vern Paxson <vern>
6638
6639     > It appears that maybe it cannot find the lfl library.
6640
6641     The Makefile in the distribution builds it, so you should have it.
6642     It's exceedingly trivial, just a main() that calls yylex() and
6643     a yywrap() that always returns 1.
6644
6645     > %%
6646     >       \n      ++num_lines; ++num_chars;
6647     >       .       ++num_chars;
6648
6649     You can't indent your rules like this - that's where the errors are coming
6650     from.  Flex copies indented text to the output file, it's how you do things
6651     like
6652
6653     	int num_lines_seen = 0;
6654
6655     to declare local variables.
6656
6657     		Vern
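
If the library cannot be found, the part of it that most scanners need is
small enough to supply directly.  A minimal sketch, assuming the scanner
reads a single input and should simply stop at end of file:

```c
/* Called by the generated scanner when it reaches the end of its input.
 * Returning 1 tells the scanner that there is no further input to scan. */
int yywrap(void)
{
    return 1;
}
```

Placed in the user-code section of the .l file (after the second %%),
together with your own main() that calls yylex(), this removes the need to
link with -lfl.  Flex also accepts "%option noyywrap", which makes the
scanner behave as though yywrap() returned 1.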
6658
6659
6660File: flex.info,  Node: unnamed-faq-87,  Next: unnamed-faq-88,  Prev: unnamed-faq-86,  Up: FAQ
6661
6662unnamed-faq-87
6663==============
6664
6665     To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
6666     Subject: Re: flex input buffer
6667     In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
6668     Date: Tue, 09 Feb 1999 21:03:37 PST
6669     From: Vern Paxson <vern>
6670
6671     > In the flex.skl file the size of the default input buffers is set.  Can you
6672     > explain why this size is set and why it is such a high number.
6673
6674     It's large to optimize performance when scanning large files.  You can
6675     safely make it a lot lower if needed.
6676
6677     		Vern
6678
6679
6680File: flex.info,  Node: unnamed-faq-88,  Next: unnamed-faq-90,  Prev: unnamed-faq-87,  Up: FAQ
6681
6682unnamed-faq-88
6683==============
6684
6685     To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
6686     Subject: Re: Flex error message
6687     In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
6688     Date: Thu, 25 Feb 1999 00:11:31 PST
6689     From: Vern Paxson <vern>
6690
6691     > I'm extending a larger scanner written in Flex and I keep running into
6692     > problems. More specifically, I get the error message:
6693     > "flex: input rules are too complicated (>= 32000 NFA states)"
6694
6695     Increase the definitions in flexdef.h for:
6696
6697     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
6699     #define MAXIMUM_MNS 31999
6700     #define BAD_SUBSCRIPT -32767
6701
6702     recompile everything, and it should all work.
6703
6704     		Vern
6705
6706
6707File: flex.info,  Node: unnamed-faq-90,  Next: unnamed-faq-91,  Prev: unnamed-faq-88,  Up: FAQ
6708
6709unnamed-faq-90
6710==============
6711
6712     To: "Dmitriy Goldobin" <gold@ems.chel.su>
6713     Subject: Re: FLEX trouble
6714     In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
6715     Date: Tue, 01 Jun 1999 00:15:07 PDT
6716     From: Vern Paxson <vern>
6717
6718     >   I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
6719     > but rule "/*"(.|\n)*"*/" don't work ?
6720
6721     The second of these will have to scan the entire input stream (because
6722     "(.|\n)*" matches an arbitrary amount of any text) in order to see if
6723     it ends with "*/", terminating the comment.  That potentially will overflow
6724     the input buffer.
6725
6726     >   More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
6727     > 'unrecognized rule'.
6728
6729     You can't use the '/' operator inside parentheses.  It's not clear
6730     what "(a/b)*" actually means.
6731
6732     >   I now use workaround with state <comment>, but single-rule is
6733     > better, i think.
6734
6735     Single-rule is nice but will always have the problem of either setting
6736     restrictions on comments (like not allowing multi-line comments) and/or
6737     running the risk of consuming the entire input stream, as noted above.
6738
6739     		Vern
6740
6741
6742File: flex.info,  Node: unnamed-faq-91,  Next: unnamed-faq-92,  Prev: unnamed-faq-90,  Up: FAQ
6743
6744unnamed-faq-91
6745==============
6746
     To: vern@ee.lbl.gov
     Date: Tue, 15 Jun 1999 08:55:43 -0700
     From: "Aki Niimura" <neko@my-deja.com>
     Subject: A question on flex C++ scanner
6765
6766     Dear Dr. Paxon,
6767
6768     I have been using flex for years.
6769     It works very well on many projects.
6770     Most case, I used it to generate a scanner on C language.
6771     However, one project I needed to generate  a scanner
     on C++ language. Thanks to your enhancement, flex did
6773     the job.
6774
6775     Currently, I'm working on enhancing my previous project.
6776     I need to deal with multiple input streams (recursive
6777     inclusion) in this scanner (C++).
6778     I did similar thing for another scanner (C) as you
6779     explained in your documentation.
6780
6781     The generated scanner (C++) has necessary methods:
6782     - switch_to_buffer(struct yy_buffer_state *b)
6783     - yy_create_buffer(istream *is, int sz)
6784     - yy_delete_buffer(struct yy_buffer_state *b)
6785
6786     However, I couldn't figure out how to access current
6787     buffer (yy_current_buffer).
6788
6789     yy_current_buffer is a protected member of yyFlexLexer.
6790     I can't access it directly.
6791     Then, I thought yy_create_buffer() with is = 0 might
6792     return current stream buffer. But it seems not as far
6793     as I checked the source. (flex 2.5.4)
6794
6795     I went through the Web in addition to Flex documentation.
6796     However, it hasn't been successful, so far.
6797
6798     It is not my intention to bother you, but, can you
6799     comment about how to obtain the current stream buffer?
6800
6801     Your response would be highly appreciated.
6802
6803     Best regards,
6804     Aki Niimura
6808
6809
6810File: flex.info,  Node: unnamed-faq-92,  Next: unnamed-faq-93,  Prev: unnamed-faq-91,  Up: FAQ
6811
6812unnamed-faq-92
6813==============
6814
6815     To: neko@my-deja.com
6816     Subject: Re: A question on flex C++ scanner
6817     In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
6818     Date: Tue, 15 Jun 1999 09:04:24 PDT
6819     From: Vern Paxson <vern>
6820
6821     > However, I couldn't figure out how to access current
6822     > buffer (yy_current_buffer).
6823
6824     Derive your own subclass from yyFlexLexer.
6825
6826     		Vern
6827
6828
6829File: flex.info,  Node: unnamed-faq-93,  Next: unnamed-faq-94,  Prev: unnamed-faq-92,  Up: FAQ
6830
6831unnamed-faq-93
6832==============
6833
6834     To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
6835     Subject: Re: You're the man to see?
6836     In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
6837     Date: Wed, 23 Jun 1999 09:01:40 PDT
6838     From: Vern Paxson <vern>
6839
6840     > I hope you can help me.  I am using Flex and Bison to produce an interpreted
6841     > language.  However all goes well until I try to implement an IF statement or
6842     > a WHILE.  I cannot get this to work as the parser parses all the conditions
     > eg. the TRUE and FALSE conditions to check for a rule match.  So I cannot
6844     > make a decision!!
6845
     You need to use the parser to build a parse tree (= abstract syntax tree),
6847     and when that's all done you recursively evaluate the tree, binding variables
6848     to values at that time.
6849
6850     		Vern
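
The advice above can be sketched in C.  This is only an illustration
under our own assumptions: the node kinds `NUM', `LESS' and `IF' and the
helpers `mk', `eval' and `free_tree' are hypothetical names, not part of
flex or bison.  In a real interpreter the parser actions would call `mk'
to build the tree, and `eval' would run only after parsing has finished.

```c
#include <stdlib.h>

/* Hypothetical node kinds for a tiny interpreted language. */
enum kind { NUM, LESS, IF };

struct node {
    enum kind kind;
    int value;               /* used by NUM only */
    struct node *a, *b, *c;  /* IF uses a = condition, b = then, c = else */
};

/* Allocate one node.  Parser actions would call this while parsing. */
struct node *mk(enum kind k, int v,
                struct node *a, struct node *b, struct node *c)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = k;
    n->value = v;
    n->a = a;
    n->b = b;
    n->c = c;
    return n;
}

/* Evaluate the finished tree.  Only the selected branch of an IF is
   visited, which is what lets the interpreter "make a decision". */
int eval(const struct node *n)
{
    switch (n->kind) {
    case NUM:
        return n->value;
    case LESS:
        return eval(n->a) < eval(n->b);
    case IF:
        return eval(n->a) ? eval(n->b) : eval(n->c);
    }
    return 0;
}

/* Release a tree built with mk(). */
void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->a);
    free_tree(n->b);
    free_tree(n->c);
    free(n);
}
```

Because the parser only builds nodes, both branches of an `IF' can be
parsed safely; the choice between them is deferred to `eval'.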
6851
6852
6853File: flex.info,  Node: unnamed-faq-94,  Next: unnamed-faq-95,  Prev: unnamed-faq-93,  Up: FAQ
6854
6855unnamed-faq-94
6856==============
6857
6858     To: Petr Danecek <petr@ics.cas.cz>
6859     Subject: Re: flex - question
6860     In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
6861     Date: Fri, 02 Jul 1999 16:52:13 PDT
6862     From: Vern Paxson <vern>
6863
6864     > file, it takes an enormous amount of time. It is funny, because the
     > source code has only 12 rules!!! I think it looks like an exponential
6866     > growth.
6867
     Right, that's the problem - some patterns (those with a lot of
     ambiguity, which yours has because at any given time the scanner can
     be in the middle of all sorts of combinations of the different
     rules) blow up exponentially.

     For your rules, there is an easy fix.  Change the ".*" that comes after
6874     the directory name to "[^ ]*".  With that in place, the rules are no
6875     longer nearly so ambiguous, because then once one of the directories
6876     has been matched, no other can be matched (since they all require a
6877     leading blank).
6878
6879     If that's not an acceptable solution, then you can enter a start state
6880     to pick up the .*\n after each directory is matched.
6881
     Also note that for speed, you'll want to add a ".*" rule at the end,
     otherwise input that doesn't match any of the patterns will be matched
     very slowly, a character at a time.
6885
6886     		Vern
6887
6888
6889File: flex.info,  Node: unnamed-faq-95,  Next: unnamed-faq-96,  Prev: unnamed-faq-94,  Up: FAQ
6890
6891unnamed-faq-95
6892==============
6893
6894     To: Tielman Koekemoer <tielman@spi.co.za>
6895     Subject: Re: Please help.
6896     In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
6897     Date: Thu, 08 Jul 1999 08:20:39 PDT
6898     From: Vern Paxson <vern>
6899
6900     > I was hoping you could help me with my problem.
6901     >
6902     > I tried compiling (gnu)flex on a Solaris 2.4 machine
6903     > but when I ran make (after configure) I got an error.
6904     >
6905     > --------------------------------------------------------------
6906     > gcc -c -I. -I. -g -O parse.c
6907     > ./flex -t -p  ./scan.l >scan.c
6908     > sh: ./flex: not found
6909     > *** Error code 1
6910     > make: Fatal error: Command failed for target `scan.c'
6911     > -------------------------------------------------------------
6912     >
6913     > What's strange to me is that I'm only
6914     > trying to install flex now. I then edited the Makefile to
6915     > and changed where it says "FLEX = flex" to "FLEX = lex"
6916     > ( lex: the native Solaris one ) but then it complains about
6917     > the "-p" option. Is there any way I can compile flex without
6918     > using flex or lex?
6919     >
6920     > Thanks so much for your time.
6921
6922     You managed to step on the bootstrap sequence, which first copies
6923     initscan.c to scan.c in order to build flex.  Try fetching a fresh
6924     distribution from ftp.ee.lbl.gov.  (Or you can first try removing
6925     ".bootstrap" and doing a make again.)
6926
6927     		Vern
6928
6929
6930File: flex.info,  Node: unnamed-faq-96,  Next: unnamed-faq-97,  Prev: unnamed-faq-95,  Up: FAQ
6931
6932unnamed-faq-96
6933==============
6934
6935     To: Tielman Koekemoer <tielman@spi.co.za>
6936     Subject: Re: Please help.
6937     In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT.
6938     Date: Fri, 09 Jul 1999 00:27:20 PDT
6939     From: Vern Paxson <vern>
6940
6941     > First I removed .bootstrap (and ran make) - no luck. I downloaded the
6942     > software but I still have the same problem. Is there anything else I
6943     > could try.
6944
6945     Try:
6946
6947     	cp initscan.c scan.c
6948     	touch scan.c
6949     	make scan.o
6950
6951     If this last tries to first build scan.c from scan.l using ./flex, then
6952     your "make" is broken, in which case compile scan.c to scan.o by hand.
6953
6954     		Vern
6955
6956
6957File: flex.info,  Node: unnamed-faq-97,  Next: unnamed-faq-98,  Prev: unnamed-faq-96,  Up: FAQ
6958
6959unnamed-faq-97
6960==============
6961
6962     To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
6963     Subject: Re: Error
6964     In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
6965     Date: Tue, 20 Jul 1999 00:18:26 PDT
6966     From: Vern Paxson <vern>
6967
6968     > I am getting a compilation error. The error is given as "unknown symbol- yylex".
6969
     The parser relies on calling yylex(), but you're instead using the C++ scanning
     class, so you need to supply a yylex() "glue" function that calls an instance
     of the scanner (e.g., "scanner->yylex()").
6973
6974     		Vern
6975
6976
6977File: flex.info,  Node: unnamed-faq-98,  Next: unnamed-faq-99,  Prev: unnamed-faq-97,  Up: FAQ
6978
6979unnamed-faq-98
6980==============
6981
6982     To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
6983     Subject: Re: lex
6984     In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
6985     Date: Tue, 23 Nov 1999 15:54:30 PST
6986     From: Vern Paxson <vern>
6987
6988     Well, your problem is the
6989
6990     switch (yybgin-yysvec-1) {      /* witchcraft */
6991
6992     at the beginning of lex rules.  "witchcraft" == "non-portable".  It's
6993     assuming knowledge of the AT&T lex's internal variables.
6994
6995     For flex, you can probably do the equivalent using a switch on YYSTATE.
6996
6997     		Vern
6998
6999
7000File: flex.info,  Node: unnamed-faq-99,  Next: unnamed-faq-100,  Prev: unnamed-faq-98,  Up: FAQ
7001
7002unnamed-faq-99
7003==============
7004
7005     To: archow@hss.hns.com
7006     Subject: Re: Regarding distribution of flex and yacc based grammars
7007     In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
7008     Date: Wed, 22 Dec 1999 01:56:24 PST
7009     From: Vern Paxson <vern>
7010
7011     > When we provide the customer with an object code distribution, is it
7012     > necessary for us to provide source
7013     > for the generated C files from flex and bison since they are generated by
7014     > flex and bison ?
7015
7016     For flex, no.  I don't know what the current state of this is for bison.
7017
     > Also, is there any requirement for us to necessarily provide source for
7019     > the grammar files which are fed into flex and bison ?
7020
7021     Again, for flex, no.
7022
7023     See the file "COPYING" in the flex distribution for the legalese.
7024
7025     		Vern
7026
7027
7028File: flex.info,  Node: unnamed-faq-100,  Next: unnamed-faq-101,  Prev: unnamed-faq-99,  Up: FAQ
7029
7030unnamed-faq-100
7031===============
7032
7033     To: Martin Gallwey <gallweym@hyperion.moe.ul.ie>
7034     Subject: Re: Flex, and self referencing rules
7035     In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST.
7036     Date: Sat, 19 Feb 2000 18:33:16 PST
7037     From: Vern Paxson <vern>
7038
7039     > However, I do not use unput anywhere. I do use self-referencing
7040     > rules like this:
7041     >
7042     > UnaryExpr               ({UnionExpr})|("-"{UnaryExpr})
7043
7044     You can't do this - flex is *not* a parser like yacc (which does indeed
7045     allow recursion), it is a scanner that's confined to regular expressions.
7046
7047     		Vern
7048
7049
7050File: flex.info,  Node: unnamed-faq-101,  Next: What is the difference between YYLEX_PARAM and YY_DECL?,  Prev: unnamed-faq-100,  Up: FAQ
7051
7052unnamed-faq-101
7053===============
7054
7055     To: slg3@lehigh.edu (SAMUEL L. GULDEN)
7056     Subject: Re: Flex problem
7057     In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST.
7058     Date: Thu, 02 Mar 2000 23:00:46 PST
7059     From: Vern Paxson <vern>
7060
7061     If this is exactly your program:
7062
7063     > digit [0-9]
7064     > digits {digit}+
7065     > whitespace [ \t\n]+
7066     >
7067     > %%
7068     > "[" { printf("open_brac\n");}
7069     > "]" { printf("close_brac\n");}
7070     > "+" { printf("addop\n");}
7071     > "*" { printf("multop\n");}
7072     > {digits} { printf("NUMBER = %s\n", yytext);}
7073     > whitespace ;
7074
7075     then the problem is that the last rule needs to be "{whitespace}" !
7076
7077     		Vern
7078
7079
7080File: flex.info,  Node: What is the difference between YYLEX_PARAM and YY_DECL?,  Next: Why do I get "conflicting types for yylex" error?,  Prev: unnamed-faq-101,  Up: FAQ
7081
7082What is the difference between YYLEX_PARAM and YY_DECL?
7083=======================================================
7084
7085YYLEX_PARAM is not a flex symbol. It is for Bison. It tells Bison to
7086pass extra params when it calls yylex() from the parser.
7087
7088   YY_DECL is the Flex declaration of yylex. The default is similar to
7089this:
7090
     #define YY_DECL int yylex (void)
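
   As a rough sketch of how the macro is used, the following plain C
fragment redefines `YY_DECL' so that the scanner takes an extra
argument.  The parameter name `token_value', the token code `1' and the
hand-written function body are assumptions made for illustration; in a
real scanner the `#define' goes in section 1 of the `.l' file and `flex'
supplies the body.

```c
/* Assumed for illustration: a scanner entry point with one extra argument. */
#define YY_DECL int yylex(int *token_value)

/* The prototype the parser must see.  Reusing the macro guarantees that
   the prototype and the definition cannot drift apart. */
YY_DECL;

/* Hand-written stand-in for the body that flex would generate. */
YY_DECL
{
    *token_value = 42;
    return 1;   /* hypothetical token code */
}
```

   Declaring the prototype through the same macro is one way to keep
the parser's view of `yylex' identical to the scanner's.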
7092
7093
7094File: flex.info,  Node: Why do I get "conflicting types for yylex" error?,  Next: How do I access the values set in a Flex action from within a Bison action?,  Prev: What is the difference between YYLEX_PARAM and YY_DECL?,  Up: FAQ
7095
7096Why do I get "conflicting types for yylex" error?
7097=================================================
7098
7099This is a compiler error regarding a generated Bison parser, not a Flex
7100scanner.  It means you need a prototype of yylex() in the top of the
7101Bison file.  Be sure the prototype matches YY_DECL.
7102
7103
7104File: flex.info,  Node: How do I access the values set in a Flex action from within a Bison action?,  Prev: Why do I get "conflicting types for yylex" error?,  Up: FAQ
7105
7106How do I access the values set in a Flex action from within a Bison action?
7107===========================================================================
7108
7109With $1, $2, $3, etc. These are called "Semantic Values" in the Bison
7110manual.  See *note Top: (bison)Top.
7111
7112
7113File: flex.info,  Node: Appendices,  Next: Indices,  Prev: FAQ,  Up: Top
7114
7115Appendix A Appendices
7116*********************
7117
7118* Menu:
7119
7120* Makefiles and Flex::
7121* Bison Bridge::
7122* M4 Dependency::
7123* Common Patterns::
7124
7125
7126File: flex.info,  Node: Makefiles and Flex,  Next: Bison Bridge,  Prev: Appendices,  Up: Appendices
7127
7128A.1 Makefiles and Flex
7129======================
7130
7131In this appendix, we provide tips for writing Makefiles to build your
7132scanners.
7133
7134   In a traditional build environment, we say that the `.c' files are
7135the sources, and the `.o' files are the intermediate files. When using
7136`flex', however, the `.l' files are the sources, and the generated `.c'
7137files (along with the `.o' files) are the intermediate files.  This
7138requires you to carefully plan your Makefile.
7139
7140   Modern `make' programs understand that `foo.l' is intended to
7141generate `lex.yy.c' or `foo.c', and will behave accordingly(1)(2).  The
7142following Makefile does not explicitly instruct `make' how to build
`scan.c' from `scan.l'. Instead, it relies on the implicit rules of the
7144`make' program to build the intermediate file, `scan.c':
7145
7146         # Basic Makefile -- relies on implicit rules
7147         # Creates "myprogram" from "scan.l" and "myprogram.c"
7148         #
7149         LEX=flex
7150         myprogram: scan.o myprogram.o
7151         scan.o: scan.l
7152
7153   For simple cases, the above may be sufficient. For other cases, you
7154may have to explicitly instruct `make' how to build your scanner.  The
7155following is an example of a Makefile containing explicit rules:
7156
7157         # Basic Makefile -- provides explicit rules
7158         # Creates "myprogram" from "scan.l" and "myprogram.c"
7159         #
7160         LEX=flex
7161         myprogram: scan.o myprogram.o
7162                 $(CC) -o $@  $(LDFLAGS) $^
7163
7164         myprogram.o: myprogram.c
7165                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^
7166
7167         scan.o: scan.c
7168                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^
7169
7170         scan.c: scan.l
7171                 $(LEX) $(LFLAGS) -o $@ $^
7172
7173         clean:
7174                 $(RM) *.o scan.c
7175
7176   Notice in the above example that `scan.c' is in the `clean' target.
7177This is because we consider the file `scan.c' to be an intermediate
7178file.
7179
7180   Finally, we provide a realistic example of a `flex' scanner used
7181with a `bison' parser(3).  There is a tricky problem we have to deal
7182with. Since a `flex' scanner will typically include a header file
7183(e.g., `y.tab.h') generated by the parser, we need to be sure that the
7184header file is generated BEFORE the scanner is compiled. We handle this
7185case in the following example:
7186
7187         # Makefile example -- scanner and parser.
7188         # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c"
7189         #
7190         LEX     = flex
7191         YACC    = bison -y
7192         YFLAGS  = -d
7193         objects = scan.o parse.o myprogram.o
7194
7195         myprogram: $(objects)
7196         scan.o: scan.l parse.c
7197         parse.o: parse.y
7198         myprogram.o: myprogram.c
7199
   In the above example, notice the line

         scan.o: scan.l parse.c

which lists the file `parse.c' (the generated parser) as a dependency
of `scan.o'.  Because `YFLAGS' contains `-d', generating `parse.c'
also generates the header file, so this dependency ensures that the
parser, and with it the header, is created before the scanner is
compiled.  Implicit rules differ between implementations of `make',
so verify this behavior with the `make' you use.
7209
7210   For more details on writing Makefiles, see *note Top: (make)Top.
7211
7212   ---------- Footnotes ----------
7213
7214   (1) GNU `make' and GNU `automake' are two such programs that provide
7215implicit rules for flex-generated scanners.
7216
7217   (2) GNU `automake' may generate code to execute flex in
7218lex-compatible mode, or to stdout. If this is not what you want, then
7219you should provide an explicit rule in your Makefile.am
7220
7221   (3) This example also applies to yacc parsers.
7222
7223
7224File: flex.info,  Node: Bison Bridge,  Next: M4 Dependency,  Prev: Makefiles and Flex,  Up: Appendices
7225
7226A.2 C Scanners with Bison Parsers
7227=================================
7228
7229This section describes the `flex' features useful when integrating
7230`flex' with `GNU bison'(1).  Skip this section if you are not using
7231`bison' with your scanner.  Here we discuss only the `flex' half of the
7232`flex' and `bison' pair.  We do not discuss `bison' in any detail.  For
7233more information about generating `bison' parsers, see *note Top:
7234(bison)Top.
7235
7236   A compatible `bison' scanner is generated by declaring `%option
7237bison-bridge' or by supplying `--bison-bridge' when invoking `flex'
7238from the command line.  This instructs `flex' that the macro `yylval'
7239may be used. The data type for `yylval', `YYSTYPE', is typically
7240defined in a header file, included in section 1 of the `flex' input
7241file.  For a list of functions and macros available, *Note
7242bison-functions::.
7243
7244   The declaration of yylex becomes,
7245
7246           int yylex ( YYSTYPE * lvalp, yyscan_t scanner );
7247
7248   If `%option bison-locations' is specified, then the declaration
7249becomes,
7250
7251           int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner );
7252
7253   Note that the macros `yylval' and `yylloc' evaluate to pointers.
7254Support for `yylloc' is optional in `bison', so it is optional in
7255`flex' as well. The following is an example of a `flex' scanner that is
7256compatible with `bison'.
7257
         /* Scanner for "C" assignment statements... sort of. */
         %{
         #include <stdlib.h>  /* atoi */
         #include <string.h>  /* strdup */
         #include "y.tab.h"  /* Generated by bison. */
         %}

         %option bison-bridge bison-locations
         %%

         [[:digit:]]+  { yylval->num = atoi(yytext);   return NUMBER;}
         [[:alnum:]]+  { yylval->str = strdup(yytext); return STRING;}
         "="|";"       { return yytext[0];}
         .  {}
         %%
7271
7272   As you can see, there really is no magic here. We just use `yylval'
7273as we would any other variable. The data type of `yylval' is generated
7274by `bison', and included in the file `y.tab.h'. Here is the
7275corresponding `bison' parser:
7276
7277         /* Parser to convert "C" assignments to lisp. */
7278         %{
7279         /* Pass the argument to yyparse through to yylex. */
7280         #define YYPARSE_PARAM scanner
7281         #define YYLEX_PARAM   scanner
7282         %}
7283         %locations
7284         %pure_parser
7285         %union {
7286             int num;
7287             char* str;
7288         }
7289         %token <str> STRING
7290         %token <num> NUMBER
7291         %%
7292         assignment:
7293             STRING '=' NUMBER ';' {
7294                 printf( "(setf %s %d)", $1, $3 );
7295            }
7296         ;
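
   To make the calling convention concrete, here is a hand-written
stand-in for the two-argument form of `yylex' shown earlier.  It is a
sketch, not generated code: the `YYSTYPE' union mirrors the `%union'
above, the token codes are made up, and `yyscan_t' is treated as a
pointer to an input cursor, whereas in a real `flex' scanner it is an
opaque handle.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef union { int num; char *str; } YYSTYPE;  /* mirrors the %union */
typedef void *yyscan_t;          /* assumption: really a const char ** cursor */
enum { NUMBER = 258, STRING = 259 };            /* hypothetical token codes */

int yylex(YYSTYPE *lvalp, yyscan_t scanner)
{
    const char **cursor = (const char **)scanner;
    const char *p = *cursor;
    const char *start;
    size_t len;
    int all_digits = 1;

    for (;;) {
        if (*p == '\0') {                 /* end of input */
            *cursor = p;
            return 0;
        }
        if (isalnum((unsigned char)*p))
            break;
        if (*p == '=' || *p == ';') {     /* single-character tokens */
            *cursor = p + 1;
            return *p;
        }
        p++;                              /* any other character is ignored */
    }

    start = p;
    while (isalnum((unsigned char)*p)) {
        if (!isdigit((unsigned char)*p))
            all_digits = 0;
        p++;
    }
    len = (size_t)(p - start);
    *cursor = p;

    if (all_digits) {
        lvalp->num = atoi(start);         /* value travels through the pointer */
        return NUMBER;
    }
    lvalp->str = malloc(len + 1);         /* caller frees the string */
    if (lvalp->str == NULL)
        abort();
    memcpy(lvalp->str, start, len);
    lvalp->str[len] = '\0';
    return STRING;
}
```

   The point to notice is that the semantic value is written through
`lvalp', which is why scanner actions use `yylval->num' rather than
`yylval.num'.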
7297
7298   ---------- Footnotes ----------
7299
7300   (1) The features described here are purely optional, and are by no
7301means the only way to use flex with bison.  We merely provide some glue
7302to ease development of your parser-scanner pair.
7303
7304
7305File: flex.info,  Node: M4 Dependency,  Next: Common Patterns,  Prev: Bison Bridge,  Up: Appendices
7306
7307A.3 M4 Dependency
7308=================
7309
7310The macro processor `m4'(1) must be installed wherever flex is
7311installed.  `flex' invokes `m4', found by searching the directories in
7312the `PATH' environment variable. Any code you place in section 1 or in
7313the actions will be sent through m4. Please follow these rules to
7314protect your code from unwanted `m4' processing.
7315
   * Do not use symbols that begin with `m4_', such as `m4_define'
     or `m4_include', since those are reserved for `m4' macro names. If
     for some reason you need m4_ as a prefix, use a preprocessor
     #define to get your symbol past m4 unmangled.
7320
7321   * Do not use the strings `[[' or `]]' anywhere in your code. The
7322     former is not valid in C, except within comments and strings, but
7323     the latter is valid in code such as `x[y[z]]'. The solution is
7324     simple. To get the literal string `"]]"', use `"]""]"'. To get the
7325     array notation `x[y[z]]', use `x[y[z] ]'. Flex will attempt to
7326     detect these sequences in user code, and escape them. However,
7327     it's best to avoid this complexity where possible, by removing
7328     such sequences from your code.
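
   The two workarounds in the list above rely only on ordinary C
rules, which the following small check illustrates.  The function name
`bracket_spellings_agree' and the sample arrays are invented for this
sketch.

```c
/* Returns 1 when the m4-safe spellings behave like the usual ones. */
int bracket_spellings_agree(void)
{
    int z = 1;
    int y[2] = { 0, 1 };
    int x[2] = { 10, 20 };

    /* Adjacent string literals are concatenated by the compiler, so
       this is the two-character string made of two closing brackets. */
    const char *closing = "]" "]";

    /* White space between the brackets does not change the meaning. */
    int nested = x[y[z] ];

    return closing[0] == ']' && closing[1] == ']' && closing[2] == '\0'
        && nested == 20;
}
```

   Neither spelling places two closing brackets next to each other in
the source text, so there is nothing for `m4' to misinterpret.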
7329
7330
7331   `m4' is only required at the time you run `flex'. The generated
7332scanner is ordinary C or C++, and does _not_ require `m4'.
7333
7334   ---------- Footnotes ----------
7335
7336   (1) The use of m4 is subject to change in future revisions of flex.
7337It is not part of the public API of flex. Do not depend on it.
7338
7339
7340File: flex.info,  Node: Common Patterns,  Prev: M4 Dependency,  Up: Appendices
7341
7342A.4 Common Patterns
7343===================
7344
7345This appendix provides examples of common regular expressions you might
7346use in your scanner.
7347
7348* Menu:
7349
7350* Numbers::
7351* Identifiers::
7352* Quoted Constructs::
7353* Addresses::
7354
7355
7356File: flex.info,  Node: Numbers,  Next: Identifiers,  Up: Common Patterns
7357
7358A.4.1 Numbers
7359-------------
7360
7361C99 decimal constant
7362     `([[:digit:]]{-}[0])[[:digit:]]*'
7363
7364C99 hexadecimal constant
7365     `0[xX][[:xdigit:]]+'
7366
7367C99 octal constant
7368     `0[01234567]*'
7369
7370C99 floating point constant
7371      {dseq}      ([[:digit:]]+)
7372      {dseq_opt}  ([[:digit:]]*)
7373      {frac}      (({dseq_opt}"."{dseq})|{dseq}".")
7374      {exp}       ([eE][+-]?{dseq})
7375      {exp_opt}   ({exp}?)
7376      {fsuff}     [flFL]
7377      {fsuff_opt} ({fsuff}?)
7378      {hpref}     (0[xX])
7379      {hdseq}     ([[:xdigit:]]+)
7380      {hdseq_opt} ([[:xdigit:]]*)
7381      {hfrac}     (({hdseq_opt}"."{hdseq})|({hdseq}"."))
7382      {bexp}      ([pP][+-]?{dseq})
7383      {dfc}       (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
7384      {hfc}       (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))
7385
7386      {c99_floating_point_constant}  ({dfc}|{hfc})
7387
7388     See C99 section 6.4.4.2 for the gory details.
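
The three integer patterns above can be cross-checked against a
hand-written classifier.  The sketch below is an illustration only: the
names `int_kind' and `classify_int' are ours, the function tests a whole
string rather than the longest prefix as a scanner would, and, like the
patterns, it ignores integer suffixes such as `u' and `L'.

```c
#include <ctype.h>

enum int_kind { NOT_INT, DECIMAL, OCTAL, HEX };

/* Classify a complete string using the same rules as the patterns. */
enum int_kind classify_int(const char *s)
{
    const char *p = s;

    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        p += 2;
        if (!isxdigit((unsigned char)*p))
            return NOT_INT;               /* at least one hex digit */
        while (isxdigit((unsigned char)*p))
            p++;
        return *p == '\0' ? HEX : NOT_INT;
    }
    if (*p == '0') {                      /* a lone "0" is an octal constant */
        p++;
        while (*p >= '0' && *p <= '7')
            p++;
        return *p == '\0' ? OCTAL : NOT_INT;
    }
    if (*p >= '1' && *p <= '9') {         /* decimal: first digit is nonzero */
        while (isdigit((unsigned char)*p))
            p++;
        return *p == '\0' ? DECIMAL : NOT_INT;
    }
    return NOT_INT;
}
```

Note that `0' on its own is classified as octal, which is what the
patterns (and the C99 grammar) specify.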
7389
7390
7391
7392File: flex.info,  Node: Identifiers,  Next: Quoted Constructs,  Prev: Numbers,  Up: Common Patterns
7393
7394A.4.2 Identifiers
7395-----------------
7396
7397C99 Identifier
7398     ucn        ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
7399     nondigit    [_[:alpha:]]
7400     c99_id     ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*
7401
7402     Technically, the above pattern does not encompass all possible C99
7403     identifiers, since C99 allows for "implementation-defined"
7404     characters. In practice, C compilers follow the above pattern,
7405     with the addition of the `$' character.
7406
7407UTF-8 Encoded Unicode Code Point
7408     [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
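
The UTF-8 pattern above is easier to audit when each alternative is
written out as a separate test.  The following C function is a sketch
that follows the pattern alternative by alternative; the names
`utf8_match' and `is_cont' are ours.  Like the pattern, it accepts only
tab, newline, carriage return and the printable range among the
single-byte characters.

```c
#include <stddef.h>

/* A continuation byte, [\x80-\xBF] in the pattern. */
static int is_cont(unsigned char c)
{
    return c >= 0x80 && c <= 0xBF;
}

/* Return the length (1 to 4) of the code point starting at s, or 0 if
   the first bytes do not match the pattern.  n is the bytes available. */
size_t utf8_match(const unsigned char *s, size_t n)
{
    unsigned char c;

    if (n == 0)
        return 0;
    c = s[0];
    if (c == 0x09 || c == 0x0A || c == 0x0D || (c >= 0x20 && c <= 0x7E))
        return 1;
    if (c >= 0xC2 && c <= 0xDF)
        return (n >= 2 && is_cont(s[1])) ? 2 : 0;
    if (c == 0xE0)       /* second byte restricted: no overlong forms */
        return (n >= 3 && s[1] >= 0xA0 && s[1] <= 0xBF
                && is_cont(s[2])) ? 3 : 0;
    if ((c >= 0xE1 && c <= 0xEC) || c == 0xEE || c == 0xEF)
        return (n >= 3 && is_cont(s[1]) && is_cont(s[2])) ? 3 : 0;
    if (c == 0xED)       /* second byte restricted: no surrogates */
        return (n >= 3 && s[1] >= 0x80 && s[1] <= 0x9F
                && is_cont(s[2])) ? 3 : 0;
    if (c == 0xF0)       /* second byte restricted: no overlong forms */
        return (n >= 4 && s[1] >= 0x90 && s[1] <= 0xBF
                && is_cont(s[2]) && is_cont(s[3])) ? 4 : 0;
    if (c >= 0xF1 && c <= 0xF3)
        return (n >= 4 && is_cont(s[1]) && is_cont(s[2])
                && is_cont(s[3])) ? 4 : 0;
    if (c == 0xF4)       /* second byte restricted: nothing above U+10FFFF */
        return (n >= 4 && s[1] >= 0x80 && s[1] <= 0x8F
                && is_cont(s[2]) && is_cont(s[3])) ? 4 : 0;
    return 0;
}
```

The restricted second-byte ranges are what reject overlong encodings,
surrogates, and values beyond U+10FFFF.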
7409
7410
7411
7412File: flex.info,  Node: Quoted Constructs,  Next: Addresses,  Prev: Identifiers,  Up: Common Patterns
7413
7414A.4.3 Quoted Constructs
7415-----------------------
7416
7417C99 String Literal
     `L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([0-7]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"'
7419
7420C99 Comment
     `("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"[^\n]*)'
7422
7423     Note that in C99, a `//'-style comment may be split across lines,
7424     and, contrary to popular belief, does not include the trailing
7425     `\n' character.
7426
7427     A better way to scan `/* */' comments is by line, rather than
7428     matching possibly huge comments all at once. This will allow you
7429     to scan comments of unlimited length, as long as line breaks
7430     appear at sane intervals. This is also more efficient when used
7431     with automatic line number processing. *Note option-yylineno::.
7432
7433     <INITIAL>{
7434         "/*"      BEGIN(COMMENT);
7435     }
7436     <COMMENT>{
7437         "*/"      BEGIN(0);
7438         [^*\n]+   ;
         "*"       ;
7440         \n        ;
7441     }
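
For comparison, the same scan can be written by hand in C.  This is a
sketch under our own naming (`skip_comment' is not a flex function): it
consumes the comment a character at a time, treats the first
star-slash pair as the terminator, and counts newlines along the way,
much as the start-condition version does with the help of `yylineno'.

```c
#include <stddef.h>

/* If s begins with a block comment, return the number of bytes in the
   whole comment, delimiters included, and store the number of newlines
   it contains in *lines.  Return 0 if s does not begin with a comment
   or the comment is never closed. */
size_t skip_comment(const char *s, int *lines)
{
    const char *p = s;

    *lines = 0;
    if (p[0] != '/' || p[1] != '*')
        return 0;
    p += 2;                               /* now "inside" the comment */

    while (*p != '\0') {
        if (p[0] == '*' && p[1] == '/')   /* terminator found */
            return (size_t)(p + 2 - s);
        if (*p == '\n')
            ++*lines;
        p++;
    }
    return 0;                             /* unterminated comment */
}
```

Since nothing is buffered beyond the current character, the length of
the comment is not limited by any buffer size.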
7442
7443
7444
7445File: flex.info,  Node: Addresses,  Prev: Quoted Constructs,  Up: Common Patterns
7446
7447A.4.4 Addresses
7448---------------
7449
7450IPv4 Address
7451     dec-octet     [0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
7452     IPv4address   {dec-octet}\.{dec-octet}\.{dec-octet}\.{dec-octet}
7453
7454IPv6 Address
7455     h16           [0-9A-Fa-f]{1,4}
7456     ls32          {h16}:{h16}|{IPv4address}
7457     IPv6address   ({h16}:){6}{ls32}|
7458                   ::({h16}:){5}{ls32}|
7459                   ({h16})?::({h16}:){4}{ls32}|
7460                   (({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|
7461                   (({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|
7462                   (({h16}:){0,3}{h16})?::{h16}:{ls32}|
7463                   (({h16}:){0,4}{h16})?::{ls32}|
7464                   (({h16}:){0,5}{h16})?::{h16}|
7465                   (({h16}:){0,6}{h16})?::
7466
7467     See RFC 2373 (http://www.ietf.org/rfc/rfc2373.txt) for details.
7468     Note that you have to fold the definition of `IPv6address' into one
7469     line and that it also matches the "unspecified address" "::".
7470
7471URI
7472     `(([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
7473
7474     This pattern is nearly useless, since it allows just about any
7475     character to appear in a URI, including spaces and control
7476     characters.  See RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt)
7477     for details.
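
As a cross-check of the `IPv4address' pattern above, here is a small
hand-written validator in C.  It is a sketch with names of our own
choosing (`dec_octet', `is_ipv4'), and it tests a complete string rather
than a prefix of the input.  Like the `dec-octet' pattern, it accepts
values from 0 to 255 and rejects leading zeros.

```c
/* Parse one dec-octet at *pp.  On success advance *pp and return 1. */
static int dec_octet(const char **pp)
{
    const char *p = *pp;
    int value = 0;
    int digits = 0;

    while (*p >= '0' && *p <= '9' && digits < 3) {
        value = value * 10 + (*p - '0');
        p++;
        digits++;
    }
    if (digits == 0 || value > 255)
        return 0;
    if (digits > 1 && **pp == '0')        /* no leading zeros, as in the pattern */
        return 0;
    *pp = p;
    return 1;
}

/* Return 1 if s is exactly four dec-octets separated by dots. */
int is_ipv4(const char *s)
{
    int i;

    for (i = 0; i < 4; i++) {
        if (i > 0) {
            if (*s != '.')
                return 0;
            s++;
        }
        if (!dec_octet(&s))
            return 0;
    }
    return *s == '\0';
}
```

A scanner built from the pattern behaves differently on longer input,
because it returns the longest matching prefix instead of rejecting the
whole string.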
7478
7479
7480
7481File: flex.info,  Node: Indices,  Prev: Appendices,  Up: Top
7482
7483Indices
7484*******
7485
7486* Menu:
7487
7488* Concept Index::
7489* Index of Functions and Macros::
7490* Index of Variables::
7491* Index of Data Types::
7492* Index of Hooks::
7493* Index of Scanner Options::
7494
7495