2009-10-09:

At the moment, nqp-rx is configured to build an executable called
"p6regex", which is a Perl 6 regular expression compiler for Parrot.
Yes, Parrot already has a Perl 6 regular expression compiler (PGE);
this one is different in that it will be self-hosting and based on
PAST/POST generation.

Building the system is similar to building Rakudo:

    $ perl Configure.pl --gen-parrot
    $ make

This builds a "p6regex" executable, which can be used to view
the results of compiling various regular expressions.  Like Rakudo,
p6regex accepts --target=parse, --target=past, and --target=pir to
select which stage of the compilation to display.  For example,

    $ ./p6regex --target=parse
    > abcde*f

will display the parse tree for the regular expression "abcde*f".  Similarly,

    $ ./p6regex --target=pir
    > abcde*f

will display the PIR subroutine generated to match the regular
expression "abcde*f".

At the moment there is no easy command-line tool for doing matches
against the compiled regular expression; that should be coming soon
as nqp-rx gets a little farther along.

The test suite can be run via "make test".  Because the new regex
engine is incomplete, we expect quite a few failures (which should
diminish as we add new features to the project).

The key files for the p6regex compiler are:

    src/Regex/P6Regex/Grammar.pm     # regular expression parse grammar
    src/Regex/P6Regex/Actions.pm     # actions to create PAST from parse


Things that work (2009-10-15, 06h16 UTC):

* bare literal strings
* quantifiers *, +, ?, *:, +:, ?:, *?, +?, ??, *!, +!, ?!
* dot
* \d, \s, \w, \n, \D, \S, \W, \N
* brackets for grouping
* alternation (|| works, | cheats)
* anchors ^, ^^, $, $$, <<, >>
* backslash-quoted punctuation
* #-comments (mostly)
* obsolete backslash sequences \A \Z \z \Q
* \b, \B, \e, \E, \f, \F, \h, \H, \r, \R, \t, \T, \v, \V
* enumerated character lists <[ab0..9]>
* character class compositions <+foo-bar+[xyz]>
* quantified by numeric range
* quantified by separator
* capturing subrules
* capturing subpatterns
* capture aliases
* cut rule
* Match objects created lazily
* built-in methods <alpha> <digit> <xdigit> <ws> <wb> etc.
* :ignorecase
* :sigspace
* :ratchet
* single-quoted literals (without quotes)
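
A note on the "alternation (|| works, | cheats)" entry above.  In Perl 6
regexes, "||" tries its alternatives in the order written, while "|" is
specified to prefer the longest matching alternative.  The list does not say
how "|" cheats; one plausible reading (an assumption, not confirmed here) is
that it currently behaves like "||".  The sketch below is only a model of the
two semantics for plain literal alternatives, written in Python for
illustration.  The function names are hypothetical and none of this is nqp-rx
code.

```python
# Illustrative model only: alternation over plain literal alternatives.
# This is NOT how nqp-rx implements either operator.

def match_sequential(alternatives, text):
    """Model of `||`: first alternative, in written order, that matches
    at the start of `text`; None if no alternative matches."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None


def match_longest(alternatives, text):
    """Model of `|`: longest alternative that matches at the start of
    `text`; None if no alternative matches."""
    candidates = [alt for alt in alternatives if text.startswith(alt)]
    return max(candidates, key=len) if candidates else None


if __name__ == "__main__":
    alts = ["ab", "abcd"]
    print(match_sequential(alts, "abcde"))  # first in written order
    print(match_longest(alts, "abcde"))     # longest match
```

With the alternatives "ab" and "abcd" against the input "abcde", the
sequential model stops at "ab", whereas the longest-match model selects
"abcd".  A pattern whose result depends on that difference is the kind that
would expose a "|" that cheats.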