regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[![Build status](https://github.com/rust-lang/regex/workflows/ci/badge.svg)](https://github.com/rust-lang/regex/actions)
[![](https://meritbadge.herokuapp.com/regex)](https://crates.io/crates/regex)
[![Rust](https://img.shields.io/badge/rust-1.41.1%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
regex = "1.5"
```

Here's a simple example that matches a date in YYYY-MM-DD format and prints the
year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &'static str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```text
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust,ignore
use lazy_static::lazy_static;
use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        // Compiled on first use, then shared by every later call.
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.
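
To make the UTF-8 constraint concrete, here is a minimal sketch that uses only
the standard library and makes no calls into `regex`. The helper name
`as_searchable_str` is hypothetical and exists only for illustration: it shows
that a byte slice containing invalid UTF-8 has no `&str` view, so it could not
be handed to an API that takes `&str`.

```rust
use std::str;

/// Hypothetical helper (not part of `regex`): returns the input as `&str`
/// only if the bytes are valid UTF-8.
fn as_searchable_str(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    // Valid UTF-8 can be viewed as `&str` and searched with a `&str` API.
    assert_eq!(as_searchable_str(b"2010-03-14"), Some("2010-03-14"));

    // The byte 0xFF never appears in valid UTF-8, so no `&str` view exists.
    assert_eq!(as_searchable_str(b"foo\xFFbar"), None);
}
```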

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. By default, `.` will match any *byte* using
`regex::bytes::Regex`, while `.` will match any *UTF-8 encoded Unicode scalar
value* using the main API.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

let re = Regex::new(r"(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that the `[^\x00]+` will match any *byte* except for `NUL`. When
using the main API, `[^\x00]+` would instead match any valid UTF-8 sequence
except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate that provides a well-tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Crate features

This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or disable any of a variety of
optimizations performed by this crate.

When all of these features are disabled, runtime match performance may be much
worse, but if you're matching on short strings, or if high performance isn't
necessary, then such a configuration is perfectly serviceable. To disable
all such features, use the following `Cargo.toml` dependency configuration:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# regex currently requires the standard library, so you must re-enable it.
features = ["std"]
```

This will reduce the dependency tree of `regex` down to a single crate
(`regex-syntax`).

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex/*/#crate-features).


### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.41.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates.
For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.


### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   https://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).