2009-10-09:

At the moment, nqp-rx is configured to build an executable called
"p6regex", which is a Perl 6 regular expression compiler for Parrot.
Yes, Parrot already has a Perl 6 regular expression compiler (PGE);
this one is different in that it will be self-hosting and based on
PAST/POST generation.

Building the system is similar to building Rakudo:

    $ perl Configure.pl --gen-parrot
    $ make

This builds a "p6regex" executable, which displays the intermediate
results of compiling a regular expression.  Like Rakudo, p6regex
accepts --target=parse, --target=past, and --target=pir to select
which stage's output to show.  For example,

    $ ./p6regex --target=parse
    > abcde*f

will display the parse tree for the regular expression "abcde*f".  Similarly,

    $ ./p6regex --target=pir
    > abcde*f

will display the PIR subroutine generated to match the regular
expression "abcde*f".
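
For readers new to regex compilation, the following Python sketch shows
roughly what any matcher for "abcde*f" has to do: match the literal
prefix, consume "e" characters greedily, check for the trailing "f",
and retry at later start positions if the attempt fails.  It is a
hand-written illustration under those assumptions, not the PIR that
p6regex emits; the functions match_at and search are hypothetical.

```python
from typing import Optional, Tuple


def match_at(target: str, pos: int) -> Optional[int]:
    """Hypothetical matcher: try abcde*f starting exactly at pos; return the end position or None."""
    # Literal prefix "abcd".
    if not target.startswith("abcd", pos):
        return None
    pos += 4
    # Greedy e*: consume every 'e' available.
    while pos < len(target) and target[pos] == "e":
        pos += 1
    # Trailing literal "f".  Backtracking into e* cannot help here: giving
    # back an 'e' would leave an 'e', not an 'f', at the current position.
    if target.startswith("f", pos):
        return pos + 1
    return None


def search(target: str) -> Optional[Tuple[int, int]]:
    """Scan start positions left to right, like an unanchored regex match."""
    for start in range(len(target) + 1):
        end = match_at(target, start)
        if end is not None:
            return (start, end)
    return None
```

For example, search("xxabcdeeef") returns (2, 10), and search("abcdeeg")
returns None.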

At the moment there is no easy command-line tool for matching strings
against a compiled regular expression; that should be coming soon as
nqp-rx gets a little farther along.

The test suite can be run via "make test".  Because the new regex
engine is incomplete, we expect quite a few failures, which should
diminish as we add new features to the project.

The key files for the p6regex compiler are:

    src/Regex/P6Regex/Grammar.pm     # regular expression parse grammar
    src/Regex/P6Regex/Actions.pm     # actions to create PAST from parse
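
The split between Grammar.pm and Actions.pm follows the same pattern as
Rakudo: the grammar recognizes the syntax, and action methods turn each
recognized construct into a tree node.  A minimal Python sketch of that
division of labor, covering only single-character atoms with an optional
*, + or ? quantifier, might look like this.  The classes Grammar,
Actions, Literal and Quant are hypothetical stand-ins; the real grammar
and actions are not written in Python and produce PAST nodes.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Literal:
    """Hypothetical AST node: one literal character."""
    text: str


@dataclass
class Quant:
    """Hypothetical AST node: an atom repeated between min_count and max_count times."""
    atom: "Node"
    min_count: int
    max_count: Optional[int]  # None means unbounded


Node = Union[Literal, Quant]


class Actions:
    """Hypothetical action methods: each builds an AST node for one parsed construct."""

    def atom(self, ch: str) -> Node:
        return Literal(ch)

    def quantified(self, atom: Node, quant: str) -> Node:
        bounds = {"*": (0, None), "+": (1, None), "?": (0, 1)}
        lo, hi = bounds[quant]
        return Quant(atom, lo, hi)


class Grammar:
    """Hypothetical grammar: recognizes the syntax and delegates tree building to Actions."""

    def __init__(self, actions: Actions):
        self.actions = actions

    def parse(self, src: str) -> List[Node]:
        nodes: List[Node] = []
        i = 0
        while i < len(src):
            node = self.actions.atom(src[i])
            i += 1
            # An optional quantifier applies to the atom just parsed.
            if i < len(src) and src[i] in "*+?":
                node = self.actions.quantified(node, src[i])
                i += 1
            nodes.append(node)
        return nodes
```

Here Grammar(Actions()).parse("abcde*f") yields five Literal nodes and
one Quant node for e*.  Keeping tree construction in a separate class
means the grammar can be reused with different actions.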
42
43
Things that work (2009-10-15, 06h16 UTC):

* bare literal strings
* quantifiers  *, +, ?, *:, +:, ?:, *?, +?, ??, *!, +!, ?!
* dot
* \d, \s, \w, \n, \D, \S, \W, \N
* brackets for grouping
* alternation (|| works, | cheats)
* anchors ^, ^^, $, $$, <<, >>
* backslash-quoted punctuation
* #-comments (mostly)
* obsolete backslash sequences \A \Z \z \Q
* \b, \B, \e, \E, \f, \F, \h, \H, \r, \R, \t, \T, \v, \V
* enumerated character lists <[ab0..9]>
* character class compositions <+foo-bar+[xyz]>
* quantified by numeric range
* quantified by separator
* capturing subrules
* capturing subpatterns
* capture aliases
* cut rule
* Match objects created lazily
* built-in methods <alpha> <digit> <xdigit> <ws> <wb> etc.
* :ignorecase
* :sigspace
* :ratchet
* single-quoted literals (without quotes)
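
The quantifier modifiers in the list above differ only in how they
backtrack: in Perl 6, *? is frugal (tries the fewest repetitions first),
*: ratchets (takes as many as it can and never gives any back), and *!
is explicitly greedy with backtracking, which is also what a plain *
does outside :ratchet.  The Python sketch below models those three
strategies for a single-character atom followed by a literal, anchored
at the start of the string.  The function match_star_then and its mode
names are hypothetical; this is not how nqp-rx implements them.

```python
from typing import Optional


def match_star_then(target: str, atom: str, follow: str, mode: str) -> Optional[int]:
    """Match atom* followed by the literal follow, anchored at position 0.

    mode is one of the hypothetical names "greedy" (like *!), "frugal"
    (like *?) or "ratchet" (like *:).  Returns the end position or None.
    """
    # How many consecutive copies of the single-character atom are available.
    available = 0
    while available < len(target) and target[available] == atom:
        available += 1

    if mode == "greedy":
        counts = range(available, -1, -1)  # longest first, then give back one at a time
    elif mode == "frugal":
        counts = range(0, available + 1)  # shortest first, then take one more at a time
    elif mode == "ratchet":
        counts = range(available, available + 1)  # longest only, never give back
    else:
        raise ValueError(f"unknown mode: {mode}")

    for count in counts:
        # With a single-character atom, the position after the repetitions equals count.
        if target.startswith(follow, count):
            return count + len(follow)
    return None
```

With target "aaa", atom "a" and follow "a", greedy returns 3, frugal
returns 1, and ratchet returns None, because the ratcheted a* has already
swallowed the "a" that the trailing literal needs.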
71