| Name | Date | Size | #Lines | LOC |
| --- | --- | --- | --- | --- |
| .. | 03-May-2022 | - | | |
| syntax/ | 31-Mar-2017 | - | 5,381 | 4,082 |
| .gitignore | 31-Mar-2017 | 271 | 25 | 20 |
| .travis.yml | 31-Mar-2017 | 33 | 5 | 4 |
| ATTRIB | 31-Mar-2017 | 6.8 KiB | 134 | 101 |
| LICENSE | 31-Mar-2017 | 1 KiB | 22 | 17 |
| README.md | 31-Mar-2017 | 4.1 KiB | 68 | 52 |
| match.go | 31-Mar-2017 | 8.6 KiB | 348 | 227 |
| regexp.go | 31-Mar-2017 | 9.6 KiB | 358 | 239 |
| regexp_mono_test.go | 31-Mar-2017 | 76.8 KiB | 1,082 | 1,032 |
| regexp_options_test.go | 31-Mar-2017 | 1.1 KiB | 44 | 38 |
| regexp_pcre_test.go | 31-Mar-2017 | 9.5 KiB | 410 | 294 |
| regexp_performance_test.go | 31-Mar-2017 | 8.6 KiB | 302 | 259 |
| regexp_test.go | 31-Mar-2017 | 23 KiB | 789 | 637 |
| replace.go | 31-Mar-2017 | 4 KiB | 178 | 143 |
| replace_test.go | 31-Mar-2017 | 4.6 KiB | 173 | 156 |
| rtl_test.go | 31-Mar-2017 | 1.3 KiB | 53 | 44 |
| runner.go | 31-Mar-2017 | 34.3 KiB | 1,622 | 1,151 |
| testoutput1 | 31-Mar-2017 | 105.8 KiB | 7,062 | 6,228 |

README.md

# regexp2 - full featured regular expressions for Go
Regexp2 is a feature-rich RegExp engine for Go.  It doesn't have the linear-time guarantees of the built-in `regexp` package, but it allows backtracking and is compatible with Perl5 and .NET.  You'll likely be better off with the RE2 engine from the `regexp` package and should only use this if you need to write very complex patterns or require compatibility with .NET.

## Basis of the engine
The engine is ported from the .NET framework's System.Text.RegularExpressions.Regex engine.  That engine was open sourced in 2015 under the MIT license.  There are some fundamental differences between .NET strings and Go strings that required a bit of borrowing from the Go framework regex engine as well.  I cleaned up a couple of the dirtier bits during the port (regexcharclass.cs was terrible), but the parse tree, the emitted code, and therefore the patterns matched should be identical.

## Installing
This is a go-gettable library, so install is easy:

    go get github.com/dlclark/regexp2/...

## Usage
Usage is similar to the Go `regexp` package.  Just like in `regexp`, you start by converting a regex into a state machine via the `Compile` or `MustCompile` functions.  They ultimately do the same thing, but `MustCompile` will panic if the regex is invalid.  You can then use the provided `Regexp` struct to find matches repeatedly.  A `Regexp` struct is safe to use across goroutines.

```go
re := regexp2.MustCompile(`Your pattern`, 0)
if isMatch, _ := re.MatchString(`Something to match`); isMatch {
    //do something
}
```

The only error that the `*Match*` methods *should* return is a Timeout if you set the `re.MatchTimeout` field.  Any other error is a bug in the `regexp2` package.  If you need more details about capture groups in a match then use the `FindStringMatch` method, like so:

```go
if m, _ := re.FindStringMatch(`Something to match`); m != nil {
    // the whole match is always group 0
    fmt.Printf("Group 0: %v\n", m.String())

    // you can get all the groups too
    gps := m.Groups()

    // a group can be captured multiple times, so each cap is separately addressable
    // (this assumes the pattern has a group 1 that captured at least twice)
    fmt.Printf("Group 1, first capture: %v\n", gps[1].Captures[0].String())
    fmt.Printf("Group 1, second capture: %v\n", gps[1].Captures[1].String())
}
```

Group 0 is embedded in the Match.  Group 0 is an automatically-assigned group that encompasses the whole pattern.  This means that `m.String()` is the same as `m.Group.String()` and `m.Groups()[0].String()`.

The __last__ capture is embedded in each group, so `g.String()` will return the same thing as `g.Capture.String()` and `g.Captures[len(g.Captures)-1].String()`.
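
For contrast, here is a minimal sketch of how the standard library's `regexp` package behaves when a group is captured more than once: only the last capture is reported, and the earlier ones are not addressable.  It uses the standard library only; `lastCapture` is a hypothetical helper written for this illustration and is not part of `regexp` or `regexp2`.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastCapture returns what the standard library reports for group 1
// of pattern when matched against input, or "" if there is no match.
// Hypothetical helper for illustration; the pattern must define group 1.
func lastCapture(pattern, input string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(input)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Group 1 participates in the match three times ("one", "two",
	// "three"), but the standard library only keeps the final capture.
	fmt.Println(lastCapture(`(?:(\w+)\s?)+`, "one two three"))
}
```

With `regexp2`, the same situation is where `g.Captures` becomes useful, since each capture of the group is kept separately.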

## Compare `regexp` and `regexp2`
| Category | regexp | regexp2 |
| --- | --- | --- |
| Catastrophic backtracking possible | no, execution time is linear in the input size | yes, if your pattern is at risk you can use the `re.MatchTimeout` field |
| Python-style capture groups `(?P<name>re)` | yes | no |
| .NET-style capture groups `(?<name>re)` or `(?'name're)` | no | yes |
| comments `(?#comment)` | no | yes |
| branch numbering reset `(?\|a\|b)` | no | no |
| possessive match `(?>re)` | no | yes |
| positive lookahead `(?=re)` | no | yes |
| negative lookahead `(?!re)` | no | yes |
| positive lookbehind `(?<=re)` | no | yes |
| negative lookbehind `(?<!re)` | no | yes |
| back reference `\1` | no | yes |
| named back reference `\k'name'` | no | yes |
| named ascii character class `[[:foo:]]`| yes | no |
| conditionals `(?(expr)yes\|no)` | no | yes |
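
A few rows of the `regexp` column can be checked directly, because the standard library rejects unsupported syntax at compile time.  The sketch below uses the standard library only; `stdlibSupports` is a hypothetical helper and the sample patterns are illustrative assumptions, not taken from either package's test suite.

```go
package main

import (
	"fmt"
	"regexp"
)

// stdlibSupports reports whether the standard library's regexp package
// accepts the given pattern. Hypothetical helper for illustration only.
func stdlibSupports(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []struct{ name, pattern string }{
		{"Python-style capture group", `(?P<word>\w+)`},
		{"named ascii character class", `[[:alpha:]]+`},
		{"comment", `a(?#comment)b`},
		{"possessive match", `(?>a+)b`},
		{"positive lookahead", `a(?=b)`},
		{"negative lookbehind", `(?<!a)b`},
		{"back reference", `(a)\1`},
	}
	for _, p := range patterns {
		fmt.Printf("%-28s %-16s accepted by regexp: %v\n",
			p.name, p.pattern, stdlibSupports(p.pattern))
	}
}
```

If a pattern you need is rejected here, that is a reasonable signal that `regexp2` may be worth its backtracking cost.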

## Library features that I'm still working on
- Regex split

## Potential bugs
I've run a battery of tests against regexp2 from various sources and found the debug output matches the .NET engine, but .NET and Go handle strings very differently.  I've attempted to handle these differences, but most of my testing deals with basic ASCII with a little bit of multi-byte Unicode.  There's a chance that there are bugs in the string handling related to character sets with supplementary Unicode chars.  Right-to-Left support is coded, but not well tested either.

## Find a bug?
I'm open to new issues and pull requests with tests if you find something odd!