NAME
    YAPE::Regex - Yet Another Parser/Extractor for Regular
    Expressions

SYNOPSIS
      use YAPE::Regex;
      use strict;

      my $regex = qr/reg(ular\s+)?exp?(ression)?/i;
      my $parser = YAPE::Regex->new($regex);

      # here is the tokenizing part
      while (my $chunk = $parser->next) {
        # ...
      }

`YAPE' MODULES
    The `YAPE' hierarchy of modules is an attempt at a unified means
    of parsing and extracting content. It attempts to maintain a
    generic interface, to promote simplicity and reusability. The
    API is powerful, yet simple. The modules do tokenization (which
    can be intercepted) and build trees, so that extraction of
    specific nodes is doable.

DESCRIPTION
    This module is yet another (?) parser and tree-builder for Perl
    regular expressions. It builds a tree out of a regex, but at the
    moment, the extent of the extraction tool for the tree is quite
    limited (see the section on "Extracting Sections"). However, the
    tree can be useful to extension modules.

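    As a rough illustration of the tree-building described above, the
    sketch below parses a pattern in one call and prints it back. It
    uses only the `new', `parse', `error' and `display' methods
    documented under USAGE. The pattern is an arbitrary example, and
    the exact string returned by `display' is assumed to depend on the
    module version and on how your Perl stringifies `qr//', so no
    output is shown.

```perl
use strict;
use warnings;
use YAPE::Regex;

# An arbitrary example pattern, chosen only for illustration.
my $parser = YAPE::Regex->new(qr/reg(ular\s+)?exp?(ression)?/i);

# Build the whole tree in one go instead of stepping with next().
$parser->parse;

# error() carries a message only if something went wrong.
if (my $err = $parser->error) {
    die "parse failed: $err\n";
}

# display() turns the parsed tree back into a string.
my $text = $parser->display;
print "reconstructed: $text\n";
```
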
USAGE
    In addition to the base class, `YAPE::Regex', there is the
    auxiliary class `YAPE::Regex::Element' (common to all `YAPE'
    base classes) that holds the individual nodes' classes. The node
    classes are described in that module's documentation.

  Methods for `YAPE::Regex'

    * `use YAPE::Regex;'
    * `use YAPE::Regex qw( MyExt::Mod );'
        If supplied no arguments, the module is loaded normally, and
        the node classes are given the proper inheritance (from
        `YAPE::Regex::Element'). If you supply a module (or list of
        modules), `import' will automatically include them (if
        needed) and set up *their* node classes with the proper
        inheritance -- that is, it will append `YAPE::Regex' to
        `@MyExt::Mod::ISA', and `YAPE::Regex::xxx' to each node
        class's `@ISA' (where `xxx' is the name of the specific node
        class).

          package MyExt::Mod;
          use YAPE::Regex 'MyExt::Mod';

          # does the work of:
          # @MyExt::Mod::ISA = 'YAPE::Regex'
          # @MyExt::Mod::text::ISA = 'YAPE::Regex::text'
          # ...

    * `my $p = YAPE::Regex->new($REx);'
        Creates a `YAPE::Regex' object, using the contents of `$REx'
        as a regular expression. The `new' method will *attempt* to
        convert `$REx' to a compiled regex (using `qr//') if `$REx'
        isn't already one. If the regex contains an error, the
        compilation fails, but the parser carries on as though it
        had succeeded. It then reports the bad token when it reaches
        it in the course of parsing.

    * `my $text = $p->chunk($len);'
        Returns the next `$len' characters in the input string;
        `$len' defaults to 30 characters. This is useful for
        figuring out why a parsing error occurs.

    * `my $done = $p->done;'
        Returns true if the parser is done with the input string,
        and false otherwise.

    * `my $errstr = $p->error;'
        Returns the parser error message.

    * `my $backref = $p->extract;'
        Returns a code reference; each call to it returns the next
        back-reference in the regex. For more information on
        enhancements in upcoming versions of this module, check the
        section on "Extracting Sections".

    * `my $str = $p->display(...);'
        Returns a string representation of the entire content. It
        calls the `parse' method in case there is more data that has
        not yet been parsed. This calls the `fullstring' method on
        the root nodes. Check the `YAPE::Regex::Element' docs on the
        arguments to `fullstring'.

    * `my $node = $p->next;'
        Returns the next token, or `undef' if there is no valid
        token. There will be an error message (accessible with the
        `error' method) if there was a problem in the parsing.

    * `my $node = $p->parse;'
        Calls `next' until all the data has been parsed.

    * `my $node = $p->root;'
        Returns the root node of the tree structure.

    * `my $state = $p->state;'
        Returns the current state of the parser. It is one of the
        following values: `alt', `anchor', `any', `backref',
        `capture(N)', `Cchar', `class', `close', `code', `comment',
        `cond(TYPE)', `ctrl', `cut', `done', `error', `flags',
        `group', `hex', `later', `lookahead(neg|pos)',
        `lookbehind(neg|pos)', `macro', `named', `oct', `slash',
        `text', and `utf8hex'.

        For `capture(N)', *N* will be the number the captured
        pattern represents.

        For `cond(TYPE)', *TYPE* will either be a number
        representing the back-reference that the conditional depends
        on, or the string `assert'.

        For `lookahead' and `lookbehind', one of `neg' and `pos'
        will be there, depending on the type of assertion.

    * `my $node = $p->top;'
        Synonymous with `root'.

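    The methods above can be combined into a token-by-token walk with
    basic diagnostics. The following is only a sketch: the pattern is
    an arbitrary example, and the report format is made up for
    illustration. It relies solely on `new', `next', `state', `error',
    `chunk' and `done' as described in the list above.

```perl
use strict;
use warnings;
use YAPE::Regex;

# A plain string is accepted; new() tries to compile it with qr//.
my $parser = YAPE::Regex->new('ab+(c|d)');  # arbitrary example pattern

# Step through the tokens one at a time, recording the parser state
# after each one.
my @states;
while (my $node = $parser->next) {
    push @states, $parser->state;
    printf "%-16s %s\n", $parser->state, ref $node;
}

# next() returned undef: either the input is exhausted or parsing failed.
if (my $err = $parser->error) {
    # chunk() shows the unparsed input at the point of failure.
    warn "error: $err; unparsed input begins: ", $parser->chunk(20), "\n";
}
elsif ($parser->done) {
    print "parsed ", scalar(@states), " tokens\n";
}
```

    Stepping with `next' rather than calling `parse' keeps the state
    at each token visible, which is the main reason to prefer it when
    investigating a pattern that fails to parse.
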
  Extracting Sections

    While extraction of nodes is the goal of the `YAPE' modules, the
    author is at a loss for words as to what needs to be extracted
    from a regex. At the current time, all the `extract' method does
    is allow you access to the regex's set of back-references:

      my $extor = $parser->extract;
      while (my $backref = $extor->()) {
        # ...
      }

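    A self-contained version of the loop above might look like the
    following sketch. The date-like pattern is an arbitrary example.
    Calling `fullstring' on each returned node is an assumption based
    on the `YAPE::Regex::Element' documentation mentioned under
    `display'; check that documentation before relying on it.

```perl
use strict;
use warnings;
use YAPE::Regex;

my $parser = YAPE::Regex->new(qr/(\d{4})-(\d\d)-(\d\d)/);  # example pattern

# Parse everything first so that all capturing groups are known.
$parser->parse;

# extract() returns an iterator (a code reference); each call yields
# the next back-reference, and a false value ends the loop.
my $next_backref = $parser->extract;
my @groups;
while (my $backref = $next_backref->()) {
    # Assumption: every node provides fullstring(), as described in
    # the YAPE::Regex::Element documentation.
    push @groups, $backref->fullstring;
}

printf "%d capturing group(s)\n", scalar @groups;
print "  $_\n" for @groups;
```
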
    `japhy' is very open to suggestions as to the approach to node
    extraction (in how the API should look, in addition to what
    should be proffered). Preliminary ideas include extraction
    keywords like the output of -Dr (or the `re' module's `debug'
    option).

EXTENSIONS
    * `YAPE::Regex::Explain' 3.00
        Presents an explanation of a regular expression, node by
        node.

    * `YAPE::Regex::Reverse' (Not released)
        Reverses the nodes of a regular expression.

TO DO
    This is a listing of things to add to future versions of this
    module.

  API

    * Create a robust `extract' method
        Open to suggestions.

BUGS
    The following is a list of known or reported bugs.

  Pending

    * `use charnames ':full''
        To understand `\N{...}' properly, you must be using 5.6.0 or
        higher. However, the parser only knows how to resolve full
        names (those made using `use charnames ':full''). There
        might be an option in the future to specify a class name.

SUPPORT
    Visit `YAPE''s web site at http://www.pobox.com/~japhy/YAPE/.

SEE ALSO
    The `YAPE::Regex::Element' documentation, for information on the
    node classes. Also, `Text::Balanced', Damian Conway's excellent
    module, used for the matching of `(?{ ... })' and `(??{ ... })'
    blocks.

AUTHOR
      Jeff "japhy" Pinyan
      CPAN ID: PINYAN
      japhy@pobox.com
      http://www.pobox.com/~japhy/
188
189