README.md

**DEPRECATED:** This crate has been folded into the
[`regex-syntax`](https://docs.rs/regex-syntax) crate and is now deprecated.

utf8-ranges
===========
This crate converts contiguous ranges of Unicode scalar values to UTF-8 byte
ranges. This is useful when constructing byte-based automata from Unicode.
Stated differently, this lets one embed UTF-8 decoding as part of one's
automaton.

[![Linux build status](https://api.travis-ci.org/BurntSushi/utf8-ranges.png)](https://travis-ci.org/BurntSushi/utf8-ranges)
[![](http://meritbadge.herokuapp.com/utf8-ranges)](https://crates.io/crates/utf8-ranges)

Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).


### Documentation

https://docs.rs/utf8-ranges


### Example

This shows how to convert a scalar value range (e.g., the basic multilingual
plane) to a sequence of byte-based character classes.


```rust
extern crate utf8_ranges;

use utf8_ranges::Utf8Sequences;

fn main() {
    // Iterate over the byte range sequences covering U+0000 through U+FFFF.
    for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') {
        println!("{:?}", range);
    }
}
```

The output:

```text
[0-7F]
[C2-DF][80-BF]
[E0][A0-BF][80-BF]
[E1-EC][80-BF][80-BF]
[ED][80-9F][80-BF]
[EE-EF][80-BF][80-BF]
```

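
To see where the boundaries in that output come from, it can help to encode the first and last scalar value that each line covers. The following is a minimal sketch using only the standard library; `utf8_hex` is a hypothetical helper written for this illustration and is not part of this crate. The boundary values in the table are an assumption derived from the UTF-8 encoding rules, not taken from the crate's source.

```rust
/// Format the UTF-8 encoding of `c` as space-separated uppercase hex bytes.
/// (Hypothetical helper for illustration; not part of utf8-ranges.)
fn utf8_hex(c: char) -> String {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // First and last scalar value assumed to be covered by each output line.
    // The surrogate code points U+D800..=U+DFFF are not scalar values, so the
    // fifth range stops at U+D7FF and the sixth resumes at U+E000.
    let boundaries = [
        ('\u{0}', '\u{7F}'),
        ('\u{80}', '\u{7FF}'),
        ('\u{800}', '\u{FFF}'),
        ('\u{1000}', '\u{CFFF}'),
        ('\u{D000}', '\u{D7FF}'),
        ('\u{E000}', '\u{FFFF}'),
    ];
    for (lo, hi) in boundaries.iter() {
        println!(
            "U+{:04X}..=U+{:04X}: {} .. {}",
            *lo as u32,
            *hi as u32,
            utf8_hex(*lo),
            utf8_hex(*hi)
        );
    }
}
```

Each printed pair lines up with the lowest and highest bytes of the corresponding sequence of ranges, which is why one contiguous scalar range splits into several byte-level sequences.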
These ranges can then be used to build an automaton. Namely:

1. Every arbitrary sequence of bytes matches exactly one of the sequences of
   ranges or none of them.
2. Every matching sequence of bytes is guaranteed to be valid UTF-8. (Erroneous
   encodings of surrogate codepoints in UTF-8 cannot match any of the byte
   ranges above.)

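
The two properties can be spot-checked with a small sketch. To stay self-contained, it hard-codes the six sequences from the output above instead of calling this crate; `BMP_SEQUENCES` and `match_count` are hypothetical names introduced for this illustration. A real automaton would presumably compile the ranges into states rather than scan a table linearly.

```rust
/// An inclusive byte range, e.g. `(0x80, 0xBF)` for `[80-BF]`.
type ByteRange = (u8, u8);

/// The six sequences shown in the output for U+0000..=U+FFFF, hard-coded so
/// that this sketch needs no external crate (an assumption for illustration).
const BMP_SEQUENCES: &[&[ByteRange]] = &[
    &[(0x00, 0x7F)],
    &[(0xC2, 0xDF), (0x80, 0xBF)],
    &[(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    &[(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    &[(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    &[(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
];

/// Count how many sequences `bytes` matches in full: the lengths must be
/// equal and each byte must fall inside the corresponding range.
fn match_count(bytes: &[u8]) -> usize {
    BMP_SEQUENCES
        .iter()
        .filter(|seq| {
            seq.len() == bytes.len()
                && seq.iter().zip(bytes).all(|(&(lo, hi), &b)| lo <= b && b <= hi)
        })
        .count()
}

fn main() {
    // Property 1: the encoding of every scalar value in the range matches
    // exactly one sequence. `char::from_u32` returns `None` for surrogates.
    let mut buf = [0u8; 4];
    for cp in 0u32..=0xFFFF {
        if let Some(c) = char::from_u32(cp) {
            assert_eq!(match_count(c.encode_utf8(&mut buf).as_bytes()), 1);
        }
    }

    // Property 2: invalid UTF-8 matches nothing. ED A0 80 is an erroneous
    // encoding of the surrogate U+D800; C0 AF is an overlong encoding of '/'.
    assert_eq!(match_count(&[0xED, 0xA0, 0x80]), 0);
    assert_eq!(match_count(&[0xC0, 0xAF]), 0);

    println!("all checks passed");
}
```

Because the leading-byte ranges of the six sequences do not overlap, no input can match more than one of them, which is what makes the table usable as the transitions of a deterministic automaton.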