Name | Date | Size | #Lines | LOC
---|---|---|---|---
benches/ | 03-May-2022 | - | 26 | 21
src/ | 03-May-2022 | - | 572 | 339
.cargo-checksum.json | 03-May-2022 | 89 | 1 | 1
.cargo_vcs_info.json | 01-Jan-1970 | 74 | 6 | 5
.gitignore | 09-Jul-2016 | 88 | 10 | 9
COPYING | 09-Jul-2016 | 126 | 4 | 2
Cargo.toml | 01-Jan-1970 | 1.2 KiB | 33 | 30
Cargo.toml.orig-cargo | 03-Aug-2019 | 682 | 20 | 17
LICENSE-MIT | 09-Jul-2016 | 1.1 KiB | 22 | 17
README.md | 03-Aug-2019 | 1.5 KiB | 58 | 40
UNLICENSE | 09-Jul-2016 | 1.2 KiB | 25 | 20

README.md

**DEPRECATED:** This crate has been folded into the
[`regex-syntax`](https://docs.rs/regex-syntax) crate and is now deprecated.

utf8-ranges
===========
This crate converts contiguous ranges of Unicode scalar values to UTF-8 byte
ranges. This is useful when constructing byte-based automata from Unicode.
Stated differently, this lets one embed UTF-8 decoding as part of one's
automaton.

[![Linux build status](https://api.travis-ci.org/BurntSushi/utf8-ranges.png)](https://travis-ci.org/BurntSushi/utf8-ranges)
[![](http://meritbadge.herokuapp.com/utf8-ranges)](https://crates.io/crates/utf8-ranges)

Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).


### Documentation

https://docs.rs/utf8-ranges


### Example

This shows how to convert a scalar value range (e.g., the basic multilingual
plane) to a sequence of byte-based character classes.


```rust
extern crate utf8_ranges;

use utf8_ranges::Utf8Sequences;

fn main() {
    for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') {
        println!("{:?}", range);
    }
}
```

The output:

```text
[0-7F]
[C2-DF][80-BF]
[E0][A0-BF][80-BF]
[E1-EC][80-BF][80-BF]
[ED][80-9F][80-BF]
[EE-EF][80-BF][80-BF]
```

These ranges can then be used to build an automaton. Namely:

1. Every arbitrary sequence of bytes matches exactly one of the sequences of
   ranges or none of them.
2. Every matching sequence of bytes is guaranteed to be valid UTF-8. (Erroneous
   encodings of surrogate codepoints in UTF-8 cannot match any of the byte
   ranges above.)
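
The two properties above can be checked with a small sketch that uses only the standard library. It assumes the six sequences from the output are copied by hand into a table; `BMP_SEQUENCES` and `count_matches` are hypothetical names for illustration and are not part of this crate's API. A linear scan stands in for a real automaton, which would more likely share common prefixes between sequences.

```rust
/// Inclusive byte range, one per position in an encoded scalar value.
type ByteRange = (u8, u8);

/// The six sequences for U+0000..=U+FFFF, copied by hand from the output
/// shown in the example (an assumption of this sketch, not generated code).
const BMP_SEQUENCES: &[&[ByteRange]] = &[
    &[(0x00, 0x7F)],
    &[(0xC2, 0xDF), (0x80, 0xBF)],
    &[(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    &[(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    &[(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    &[(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
];

/// Counts how many sequences `bytes` matches in full: same length, and
/// every byte inside the range at its position.
fn count_matches(bytes: &[u8]) -> usize {
    BMP_SEQUENCES
        .iter()
        .filter(|seq| {
            seq.len() == bytes.len()
                && seq
                    .iter()
                    .zip(bytes)
                    .all(|(&(lo, hi), &b)| lo <= b && b <= hi)
        })
        .count()
}

fn main() {
    // Valid encodings of single scalar values in the BMP.
    for s in &["a", "é", "€"] {
        println!("{:?} matches {} sequence(s)", s, count_matches(s.as_bytes()));
    }
    // U+D800 (a surrogate) erroneously encoded as ED A0 80.
    let surrogate = [0xED, 0xA0, 0x80];
    println!("surrogate matches {} sequence(s)", count_matches(&surrogate));
}
```

Because the sequences do not overlap, `count_matches` should only ever return 0 or 1, which is what lets an automaton built from them stay deterministic.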