| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 21-May-2021 | - | | |
| .github/ | 21-May-2021 | - | 272 | 221 |
| build/Release/ | 21-May-2021 | - | | |
| lib/ | 21-May-2021 | - | 2,034 | 1,736 |
| scripts/ | 03-May-2022 | - | | |
| vendor/ | 21-May-2021 | - | 38,047 | 28,194 |
| LICENSE | 21-May-2021 | 1.9 KiB | 35 | 28 |
| README.md | 21-May-2021 | 16.4 KiB | 371 | 275 |
| binding.gyp | 21-May-2021 | 2.1 KiB | 88 | 87 |
| package.json | 21-May-2021 | 1.1 KiB | 41 | 40 |

README.md

# node-re2

This project provides bindings for [RE2](https://github.com/google/re2):
a fast, safe alternative to backtracking regular expression engines, written by [Russ Cox](http://swtch.com/~rsc/).
To learn more about RE2, start with the overview
[Regular Expression Matching in the Wild](http://swtch.com/~rsc/regexp/regexp3.html). More resources can be found
on his [Implementing Regular Expressions](http://swtch.com/~rsc/regexp/) page.

`RE2`'s regular expression language is almost a superset of what is provided by `RegExp`
(see [Syntax](https://github.com/google/re2/wiki/Syntax)),
but it lacks two features: backreferences and lookahead assertions. See below for more details.

The `RE2` object emulates the standard `RegExp`, making it a practical drop-in replacement in most cases.
`RE2` is extended to provide `String`-based regular expression methods as well. To help convert
`RegExp` objects to `RE2`, its constructor can take a `RegExp` directly, honoring all its properties.

It can work with [node.js buffers](http://nodejs.org/api/buffer.html) directly, reducing the overhead
of recoding and copying characters, and making processing and parsing of long files fast.

All documentation can be found in this README and in the [wiki](https://github.com/uhop/node-re2/wiki).

## Why use node-re2?

The built-in Node.js regular expression engine can run in exponential time when two things are combined:
 - A vulnerable regular expression
 - "Evil input"

This can lead to what is known as a [Regular Expression Denial of Service (ReDoS)](https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS).
To tell if your regular expressions are vulnerable, you might try one of these projects:
 - [rxxr2](http://www.cs.bham.ac.uk/~hxt/research/rxxr2/)
 - [safe-regex](https://github.com/substack/safe-regex)

However, neither project is perfect.

node-re2 can protect your Node.js application from ReDoS.
node-re2 makes vulnerable regular expression patterns safe by evaluating them in `RE2` instead of the built-in Node.js regex engine.

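To make the problem concrete, here is a minimal sketch that uses only the built-in `RegExp`. The pattern `/^(a+)+$/` is a textbook example of a vulnerable expression, and `timeTest` is a hypothetical helper written for this illustration, not part of node-re2. Actual timings depend on the machine and the Node.js version, so no numbers are quoted here.

```js
// A classic vulnerable pattern: nested quantifiers over the same character.
const vulnerable = /^(a+)+$/;

// Hypothetical helper: times a single test() call, in milliseconds.
function timeTest(re, input) {
  const start = process.hrtime.bigint();
  const matched = re.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

// "Evil input": a run of "a" followed by a character that forces a mismatch.
// With a backtracking engine, each extra "a" roughly doubles the work.
for (const n of [8, 12, 16, 20]) {
  const { matched, elapsedMs } = timeTest(vulnerable, "a".repeat(n) + "!");
  console.log(n, matched, elapsedMs.toFixed(2) + " ms");
}
```

Because `RE2` accepts this pattern, running the same loop against an `RE2` instance should show the time growing with the input length rather than doubling at each step.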
## Standard features

An `RE2` object can be created just like a `RegExp`:

* [`new RE2(pattern[, flags])`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)

Supported properties:

* [`re2.lastIndex`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex)
* [`re2.global`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/global)
* [`re2.ignoreCase`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/ignoreCase)
* [`re2.multiline`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/multiline)
* [`re2.unicode`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode)
  * The `RE2` engine always works in Unicode mode. See details below.
* [`re2.sticky`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky)
* [`re2.source`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source)
* [`re2.flags`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/flags)

Supported methods:

* [`re2.exec(str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec)
* [`re2.test(str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test)
* [`re2.toString()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/toString)

Starting with 1.6.0, the following well-known symbol-based methods are supported (see [Symbols](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol)):

* [`re2[Symbol.match](str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/match)
* [`re2[Symbol.search](str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/search)
* [`re2[Symbol.replace](str, newSubStr|function)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/replace)
* [`re2[Symbol.split](str[, limit])`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/split)

This allows `RE2` instances to be used on strings directly, just like `RegExp` instances:

```js
var re = new RE2("1");
"213".match(re);        // [ '1', index: 1, input: '213' ]
"213".search(re);       // 1
"213".replace(re, "+"); // 2+3
"213".split(re);        // [ '2', '3' ]
```

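The mechanism behind this is the standard ES6 protocol: `String.prototype.match`, `search`, `replace`, and `split` delegate to the corresponding well-known symbol method of their argument. The sketch below shows that delegation with a plain object. `firstDigit` is a hypothetical matcher invented for this illustration and is unrelated to how node-re2 is implemented internally.

```js
// Hypothetical matcher object: any object with these symbol-keyed methods
// can be passed to the String methods, which is what RE2 relies on.
const firstDigit = {
  [Symbol.match](str) {
    const index = str.search(/[0-9]/);
    return index < 0 ? null : [str[index]];
  },
  [Symbol.search](str) {
    return str.search(/[0-9]/);
  },
  [Symbol.replace](str, replacement) {
    return str.replace(/[0-9]/, replacement);
  },
  [Symbol.split](str, limit) {
    return str.split(/[0-9]/, limit);
  }
};

console.log("ab1cd".match(firstDigit));        // [ '1' ]
console.log("ab1cd".search(firstDigit));       // 2
console.log("ab1cd".replace(firstDigit, "+")); // ab+cd
console.log("ab1cd".split(firstDigit));        // [ 'ab', 'cd' ]
```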
Starting with 1.8.0, [named groups](https://tc39.github.io/proposal-regexp-named-groups/) are supported.

## Extensions

### Shortcut construction

An `RE2` object can be created from a regular expression:

```js
var re1 = new RE2(/ab*/ig); // from a RegExp object
var re2 = new RE2(re1);     // from another RE2 object
```

### `String` methods

The standard `String` defines four more methods that can use regular expressions. `RE2` provides them as methods
with the positions of the string and the regular expression swapped:

* `re2.match(str)`
  * See [`str.match(regexp)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match)
* `re2.replace(str, newSubStr|function)`
  * See [`str.replace(regexp, newSubStr|function)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace)
* `re2.search(str)`
  * See [`str.search(regexp)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/search)
* `re2.split(str[, limit])`
  * See [`str.split(regexp[, limit])`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split)

Starting with 1.6.0, these methods are also available as well-known symbol-based methods, so they can be used transparently with the ES6 string/regex machinery.

### `Buffer` support

To support `Buffer` directly, most methods can accept buffers instead of strings. This avoids the overhead of recoding and copying characters.
The following signatures are supported:

* `re2.exec(buf)`
* `re2.test(buf)`
* `re2.match(buf)`
* `re2.search(buf)`
* `re2.split(buf[, limit])`
* `re2.replace(buf, replacer)`

Differences from their string-based versions:

* All buffers are assumed to be encoded as [UTF-8](http://en.wikipedia.org/wiki/UTF-8)
  (ASCII is a proper subset of UTF-8).
* Instead of strings they return `Buffer` objects, even in composite objects. A buffer can be converted to a string with
  [`buf.toString()`](http://nodejs.org/api/buffer.html#buffer_buf_tostring_encoding_start_end).
* All offsets and lengths are in bytes, rather than characters (each UTF-8 character can occupy from 1 to 4 bytes).
  This way users can properly slice buffers without costly recalculations from characters to bytes.

When `re2.replace()` is used with a replacer function, the replacer can return a buffer or a string. By default, all its arguments
(except for the input object) are strings, and the offset is in characters. If you prefer to deal
with buffers and byte offsets in a replacer function, set the property `useBuffers` to `true` on the function:

```js
function strReplacer(match, offset, input) {
	// typeof match == "string"
	return "<= " + offset + " characters|";
}

RE2("б").replace("абв", strReplacer);
// "а<= 1 characters|в"

function bufReplacer(match, offset, input) {
	// match is a Buffer here, and offset is in bytes
	return "<= " + offset + " bytes|";
}
bufReplacer.useBuffers = true;

RE2("б").replace("абв", bufReplacer);
// "а<= 2 bytes|в"
```

This feature works for both string and buffer inputs. If a buffer is used as the input, the output is returned as
a buffer too; otherwise a string is returned.

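The difference between character offsets and byte offsets can be checked with Node.js built-ins alone. The following sketch does not call node-re2. It only shows why the two replacers above report `1` and `2` for the same match position.

```js
const text = "абв";                    // three Cyrillic letters
const buf = Buffer.from(text, "utf8"); // each letter takes 2 bytes in UTF-8

// Offset of "б" counted in characters (UTF-16 code units) vs. in bytes.
const charOffset = text.indexOf("б");
const byteOffset = buf.indexOf(Buffer.from("б", "utf8"));

console.log(text.length, buf.length); // 3 6
console.log(charOffset, byteOffset);  // 1 2

// A byte offset can be used to slice the buffer directly, with no recalculation.
console.log(buf.subarray(byteOffset).toString("utf8")); // бв
```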
### Calculate length

Two functions to calculate string sizes between
[UTF-8](http://en.wikipedia.org/wiki/UTF-8) and
[UTF-16](http://en.wikipedia.org/wiki/UTF-16) are exposed on `RE2`:

* `RE2.getUtf8Length(str)` &mdash; calculates a buffer size in bytes to encode a UTF-16 string as
  a UTF-8 buffer.
* `RE2.getUtf16Length(buf)` &mdash; calculates a string size in characters to encode a UTF-8 buffer as
  a UTF-16 string.

JavaScript supports UCS-2 strings with 16-bit characters, while node.js 0.11 supports full UTF-16 as
a default string.

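For orientation, the same two quantities can be computed with Node.js built-ins. This sketch assumes, based only on the descriptions above, that `RE2.getUtf8Length(str)` corresponds to `Buffer.byteLength(str, "utf8")` and that `RE2.getUtf16Length(buf)` corresponds to the `length` of the decoded string. It does not call node-re2 and may differ from it on malformed input.

```js
// One character from each UTF-8 width class: 1, 2, 3, and 4 bytes.
const str = "aб€😀";

// Bytes needed to encode the string as UTF-8.
const utf8Length = Buffer.byteLength(str, "utf8");

// UTF-16 code units needed to hold the decoded buffer.
const buf = Buffer.from(str, "utf8");
const utf16Length = buf.toString("utf8").length;

console.log(utf8Length);  // 10 (1 + 2 + 3 + 4)
console.log(utf16Length); // 5  (the emoji needs a surrogate pair)
```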
### Property: `internalSource`

Starting with 1.8.0, the `source` property emulates the same property of `RegExp`, meaning that it can be used to create an identical `RE2` or `RegExp` instance. Sometimes, for troubleshooting purposes, a user wants to inspect the translated source that `RE2` actually uses. It is available as a read-only property called `internalSource`.

### Unicode warning level

The `RE2` engine always works in Unicode mode. In most cases either there is no difference or Unicode mode is actually preferred. But sometimes a user wants tight control over their regular expressions. For those cases, there is a static string property `RE2.unicodeWarningLevel`.

Regular expressions in Unicode mode work as usual. But if a regular expression lacks the Unicode flag, it is always added silently.

```js
const x = /./;
x.flags; // ''
const y = new RE2(x);
y.flags; // 'u'
```

In the latter case `RE2` can take one of the following actions, depending on `RE2.unicodeWarningLevel`:

* `'nothing'` (the default): no warnings or notifications of any kind; the regular expression is created with the `'u'` flag.
* `'warnOnce'`: warns exactly once, the very first time; the regular expression is created with the `'u'` flag.
  * Assigning this value resets an internal flag, so `RE2` will warn once again.
* `'warn'`: warns every time; the regular expression is created with the `'u'` flag.
* `'throw'`: throws a `SyntaxError` every time.
* All other warning level values are silently ignored on assignment, leaving the previous value unchanged.

Warnings and exceptions help to audit an application for stray non-Unicode regular expressions.

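To see why a silently added `'u'` flag can matter, compare the two modes in the built-in `RegExp`. This is a sketch of the semantic difference only. It does not call node-re2.

```js
const emoji = "😀"; // U+1F600, stored as two UTF-16 code units

// Without the 'u' flag, "." matches a single code unit, so one "." cannot
// cover the whole character.
const withoutUnicode = /^.$/.test(emoji);

// With the 'u' flag, "." matches a whole code point.
const withUnicode = /^.$/u.test(emoji);

console.log(withoutUnicode); // false
console.log(withUnicode);    // true
```

A pattern written with the non-Unicode behavior in mind can therefore give different results once it runs in Unicode mode, which is the situation the warning levels are meant to surface.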
## How to install

Installation:

```
npm install --save re2
```

## How to use

It is used just like a `RegExp` object.

```js
var RE2 = require("re2");

// with default flags
var re = new RE2("a(b*)");
var result = re.exec("abbc");
console.log(result[0]); // "abb"
console.log(result[1]); // "bb"

result = re.exec("aBbC");
console.log(result[0]); // "a"
console.log(result[1]); // ""

// with explicit flags
re = new RE2("a(b*)", "i");
result = re.exec("aBbC");
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"

// from regular expression object
var regexp = new RegExp("a(b*)", "i");
re = new RE2(regexp);
result = re.exec("aBbC");
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"

// from regular expression literal
re = new RE2(/a(b*)/i);
result = re.exec("aBbC");
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"

// from another RE2 object
var rex = new RE2(re);
result = rex.exec("aBbC");
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"

// shortcut
result = new RE2("ab*").exec("abba");

// factory
result = RE2("ab*").exec("abba");
```

## Limitations (Things RE2 does not support)

`RE2` consciously avoids any regular expression features that require worst-case exponential time to evaluate.
These features are essentially those that describe a Context-Free Language (CFL) rather than a Regular Expression,
and are extensions to the traditional regular expression language because some people don't know when enough is enough.

The most noteworthy missing features are backreferences and lookahead assertions.
If your application uses these features, you should continue to use `RegExp`.
But since these features are fundamentally vulnerable to
[ReDoS](https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS),
you should strongly consider replacing them.

`RE2` will throw a `SyntaxError` if you try to declare a regular expression using these features.
If you are evaluating an externally provided regular expression, wrap your `RE2` declarations in a try-catch block. This lets you fall back to `RegExp` when `RE2` lacks a feature:

```js
var re = /(a)+(b)*/;
try {
  re = new RE2(re);
  // use RE2 as a drop-in replacement
} catch (e) {
  // suppress an error, and use
  // the original RegExp
}
var result = re.exec(sample);
```

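The same fallback can be wrapped in a small helper. In this sketch `compileWithFallback` and `NoBackrefEngine` are hypothetical names: the helper takes the engine constructor as a parameter (in an application you would pass `RE2`), and the stand-in engine exists only so the example runs without the native module. It rethrows anything that is not a `SyntaxError`, which is a design choice of this sketch rather than something node-re2 prescribes.

```js
// Hypothetical helper: Engine is any RegExp-compatible constructor, e.g. RE2.
function compileWithFallback(pattern, flags, Engine) {
  try {
    return new Engine(pattern, flags);
  } catch (e) {
    // Only unsupported-syntax errors trigger the fallback.
    if (!(e instanceof SyntaxError)) throw e;
    return new RegExp(pattern, flags);
  }
}

// Stand-in engine for demonstration only: it rejects backreferences.
class NoBackrefEngine extends RegExp {
  constructor(pattern, flags) {
    if (/\\[1-9]/.test(pattern)) {
      throw new SyntaxError("backreferences are not supported");
    }
    super(pattern, flags);
  }
}

const plain = compileWithFallback("a(b*)", "i", NoBackrefEngine);
const backref = compileWithFallback("(cat|dog)\\1", "", NoBackrefEngine);

console.log(plain instanceof NoBackrefEngine);   // true: the engine accepted it
console.log(backref instanceof NoBackrefEngine); // false: fell back to RegExp
console.log(backref.test("catcat"));             // true
```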
In addition to these missing features, `RE2` also behaves somewhat differently from the built-in regular expression engine in corner cases.

### Backreferences

`RE2` doesn't support backreferences, which are numbered references to previously
matched groups, like so: `\1`, `\2`, and so on. Example of backreferences:

```js
/(cat|dog)\1/.test("catcat"); // true
/(cat|dog)\1/.test("dogdog"); // true
/(cat|dog)\1/.test("catdog"); // false
/(cat|dog)\1/.test("dogcat"); // false
```

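When the referenced group is a small, fixed set of alternatives, one possible rewrite is to spell out each repeated alternative. The sketch below checks this rewrite against the original pattern using only the built-in `RegExp`. It does not cover backreferences to open-ended groups such as `(\w+)\1`, which have no equivalent of this kind.

```js
const withBackref = /(cat|dog)\1/;
// Rewritten without a backreference: each alternative is doubled by hand.
const withoutBackref = /catcat|dogdog/;

const inputs = ["catcat", "dogdog", "catdog", "dogcat"];
const agree = inputs.every(
  (input) => withBackref.test(input) === withoutBackref.test(input)
);

console.log(agree); // true: both patterns accept exactly the same inputs here
```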
### Lookahead assertions

`RE2` doesn't support lookahead assertions, which make a match depend on the content that follows it.

```js
/abc(?=def)/; // match abc only if it is followed by def
/abc(?!def)/; // match abc only if it is not followed by def
```

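Simple lookaheads can often be replaced by a capturing group or by a check in code. The sketch below uses only the built-in `RegExp`, and `findAbcNotBeforeDef` is a hypothetical helper. Note that the capturing-group version consumes `def` as part of the overall match, which matters when iterating with the global flag.

```js
const input = "xxabcdef";

// Positive lookahead, and an equivalent that captures "abc" in group 1.
const viaLookahead = /abc(?=def)/.exec(input);
const viaGroup = /(abc)def/.exec(input);
console.log(viaLookahead[0], viaLookahead.index); // abc 2
console.log(viaGroup[1], viaGroup.index);         // abc 2

// Negative lookahead replaced by a plain match plus a check in code.
// Returns the index of the first "abc" not followed by "def", or -1.
function findAbcNotBeforeDef(text) {
  const re = /abc/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    if (!text.startsWith("def", m.index + m[0].length)) return m.index;
  }
  return -1;
}

console.log(findAbcNotBeforeDef("abcdef abcxyz")); // 7
console.log(/abc(?!def)/.exec("abcdef abcxyz").index); // 7
```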
### Mismatched behavior

`RE2` and the built-in regex engine disagree in a few cases. Before you switch to `RE2`, verify that your regular expressions continue to work as expected. They should do so in the vast majority of cases.

Here is an example of a case where they may not:

```js
var RE2 = require("re2");

var pattern = '(?:(a)|(b)|(c))+';

var built_in = new RegExp(pattern);
var re2 = new RE2(pattern);

var input = 'abc';

var bi_res = built_in.exec(input);
var re2_res = re2.exec(input);

console.log('bi_res: ' + bi_res);    // prints: bi_res: abc,,,c
console.log('re2_res : ' + re2_res); // prints: re2_res : abc,a,b,c
```

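One way to run such a verification is to execute the same pattern with both engines over a set of sample inputs and list the disagreements. `compareEngines` below is a hypothetical helper, not part of node-re2. It takes both constructors as parameters, so in an application the call would be `compareEngines(pattern, inputs, RegExp, require("re2"))`. The demonstration passes `RegExp` twice so that it runs without the native module.

```js
// Hypothetical helper: returns the inputs on which two engines disagree.
function compareEngines(pattern, inputs, EngineA, EngineB) {
  const a = new EngineA(pattern);
  const b = new EngineB(pattern);
  const diffs = [];
  for (const input of inputs) {
    // String() flattens a match array (or null) into a comparable form.
    const resultA = String(a.exec(input));
    const resultB = String(b.exec(input));
    if (resultA !== resultB) diffs.push({ input, a: resultA, b: resultB });
  }
  return diffs;
}

// Comparing the built-in engine with itself yields no differences.
const diffs = compareEngines("(?:(a)|(b)|(c))+", ["abc", "xyz"], RegExp, RegExp);
console.log(diffs.length); // 0
```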
### Unicode

`RE2` always works in Unicode mode. See `RE2.unicodeWarningLevel` above for more details on how to control warnings about this feature.

## Release history

- 1.14.0 *New delivery mechanism for binary artifacts (thx, [Brandon Kobel](https://github.com/kobelb) for the idea and the research) + minor fix to eliminate warnings on Windows.*
- 1.13.1 *Fix for Windows builds.*
- 1.13.0 *Got rid of a single static variable to support multithreading.*
- 1.12.1 *Updated `re2` to the latest version.*
- 1.12.0 *Updated the way `RE2` objects are constructed.*
- 1.11.0 *Updated the way to initialize the extension (thx [BannerBomb](https://github.com/BannerBomb)).*
- 1.10.5 *Bugfix for optional groups (thx [Josh Yudaken](https://github.com/qix)), the latest version of `re2`.*
- 1.10.4 *Technical release: even better TypeScript types (thx [Louis Brann](https://github.com/louis-brann)).*
- 1.10.3 *Technical release: missing reference to TS types (thx [Jamie Magee](https://github.com/JamieMagee)).*
- 1.10.2 *Technical release: added TypeScript types (thx [Jamie Magee](https://github.com/JamieMagee)).*
- 1.10.1 *Updated `re2` to the latest version (thx [Jamie Magee](https://github.com/JamieMagee)), dropped Node 6.*
- 1.10.0 *Added back support for Node 6 and Node 8. Now Node 6-12 is supported.*
- 1.9.0 *Refreshed dependencies to support Node 12. Only versions 10-12 are supported now (`v8` restrictions). For older versions use `node-re2@1.8`.*
- 1.8.4 *Refreshed dependencies, removed `unistd.h` to compile on Windows.*
- 1.8.3 *Refreshed dependencies, removed suppression of some warnings.*
- 1.8.2 *Bugfix to support the null prototype for groups. Thx [Exter-N](https://github.com/Exter-N)!*
- 1.8.1 *Bugfix for better source escaping.*
- 1.8.0 *Clarified Unicode support, added `unicode` flag, added named groups &mdash; thx [Exter-N](https://github.com/Exter-N)! Bugfixes &mdash; thx [Barak Amar](https://github.com/nopcoder)!*
- 1.7.0 *Implemented `sticky` and `flags` + bug fixes + more tests. Thx [Exter-N](https://github.com/Exter-N)!*
- 1.6.2 *Bugfix for a prototype access. Thx [Exter-N](https://github.com/Exter-N)!*
- 1.6.1 *Returned support for node 4 LTS. Thx [Kannan Goundan](https://github.com/cakoose)!*
- 1.6.0 *Added well-known symbol-based methods of ES6. Refreshed NAN.*
- 1.5.0 *Bugfixes, error checks, better docs. Thx [Jamie Davis](https://github.com/davisjam), and [omg](https://github.com/omg)!*
- 1.4.1 *Minor corrections in README.*
- 1.4.0 *Use re2 as a git submodule. Thx [Ben James](https://github.com/benhjames)!*
- 1.3.3 *Refreshed dependencies.*
- 1.3.2 *Updated references in README (re2 was moved to github).*
- 1.3.1 *Refreshed dependencies, new Travis-CI config.*
- 1.3.0 *Upgraded NAN to 1.6.3, now we support node.js 0.10.36, 0.12.0, and io.js 1.3.0. Thx @reid!*
- 1.2.0 *Documented getUtfXLength() functions. Added support for `\c` and `\u` commands.*
- 1.1.1 *Minor corrections in README.*
- 1.1.0 *Buffer-based API is public. Unicode is fully supported.*
- 1.0.0 *Implemented all `RegExp` methods, and all relevant `String` methods.*
- 0.9.0 *The initial public release.*

## License

BSD