utf8-ranges
===========
This crate converts contiguous ranges of Unicode scalar values to UTF-8 byte
ranges. This is useful when constructing byte-based automata from Unicode.
Stated differently, this lets one embed UTF-8 decoding as part of one's
automaton.
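
One reason a single scalar range can turn into several byte-range sequences is that the length of a UTF-8 encoding changes at U+0080, U+0800 and U+10000. The short std-only sketch below does not use this crate; the helper `encode` is a hypothetical name introduced here. It prints the bytes on either side of each boundary using the standard library's `char::encode_utf8`:

```rust
/// Hypothetical helper: encode one scalar value as UTF-8 bytes using only std.
fn encode(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // Pairs of scalar values that straddle each UTF-8 length boundary.
    let boundaries = [
        ('\u{7F}', '\u{80}'),
        ('\u{7FF}', '\u{800}'),
        ('\u{FFFF}', '\u{10000}'),
    ];
    for &(below, above) in boundaries.iter() {
        // Print each value next to its encoded bytes in hex.
        println!(
            "U+{:04X} -> {:02X?}   U+{:04X} -> {:02X?}",
            below as u32,
            encode(below),
            above as u32,
            encode(above)
        );
    }
}
```

A range that crosses any of these boundaries mixes encodings of different lengths, so it cannot be described by one sequence of byte ranges.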

[![Linux build status](https://api.travis-ci.org/BurntSushi/utf8-ranges.png)](https://travis-ci.org/BurntSushi/utf8-ranges)
[![](http://meritbadge.herokuapp.com/utf8-ranges)](https://crates.io/crates/utf8-ranges)

Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).


### Documentation

https://docs.rs/utf8-ranges


### Example

This shows how to convert a scalar value range (e.g., the Basic Multilingual
Plane) to a sequence of byte-based character classes.

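```rust
extern crate utf8_ranges;

use utf8_ranges::Utf8Sequences;

fn main() {
    // Print every sequence of UTF-8 byte ranges needed to cover the scalar
    // values U+0000 through U+FFFF (surrogates are not scalar values).
    for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') {
        println!("{:?}", range);
    }
}
```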

The output:

```text
[0-7F]
[C2-DF][80-BF]
[E0][A0-BF][80-BF]
[E1-EC][80-BF][80-BF]
[ED][80-9F][80-BF]
[EE-EF][80-BF][80-BF]
```
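
To see roughly where these six sequences come from, here is a simplified, std-only sketch of one way to do the splitting. It is an illustration, not this crate's implementation: the functions `utf8_sequences`, `split` and `render` are hypothetical, and byte ranges are plain `(lo, hi)` pairs rather than the crate's own types. The sketch removes the surrogate block, splits wherever the encoded length changes, and then keeps splitting until each remaining range encodes to exactly the product of its per-byte ranges.

```rust
/// Hypothetical: split an inclusive scalar range into sequences of inclusive
/// (lo, hi) byte ranges. Each inner Vec is one sequence.
fn utf8_sequences(start: char, end: char) -> Vec<Vec<(u8, u8)>> {
    let mut out = Vec::new();
    split(start as u32, end as u32, &mut out);
    out
}

fn split(start: u32, end: u32, out: &mut Vec<Vec<(u8, u8)>>) {
    if start > end {
        return; // empty range
    }
    // 1. Carve out the surrogates U+D800..=U+DFFF, which have no UTF-8 encoding.
    if start < 0xE000 && end > 0xD7FF {
        split(start, end.min(0xD7FF), out);
        split(start.max(0xE000), end, out);
        return;
    }
    // 2. Split wherever the encoded length changes (1 -> 2 -> 3 -> 4 bytes).
    for &max in [0x7Fu32, 0x7FF, 0xFFFF].iter() {
        if start <= max && max < end {
            split(start, max, out);
            split(max + 1, end, out);
            return;
        }
    }
    // 3. An ASCII range is a single one-byte range.
    if end <= 0x7F {
        out.push(vec![(start as u8, end as u8)]);
        return;
    }
    // 4. If a more significant byte varies, the 6-bit blocks below it must be
    //    complete (start low bits all 0, end low bits all 1); otherwise split.
    for i in 1..4u32 {
        let m: u32 = (1u32 << (6 * i)) - 1;
        if (start & !m) != (end & !m) {
            if (start & m) != 0 {
                split(start, start | m, out);
                split((start | m) + 1, end, out);
                return;
            }
            if (end & m) != m {
                split(start, (end & !m) - 1, out);
                split(end & !m, end, out);
                return;
            }
        }
    }
    // 5. Both endpoints now have the same encoded length; pair their bytes up.
    let (mut a, mut b) = ([0u8; 4], [0u8; 4]);
    let len_lo = char::from_u32(start).unwrap().encode_utf8(&mut a).len();
    let len_hi = char::from_u32(end).unwrap().encode_utf8(&mut b).len();
    debug_assert_eq!(len_lo, len_hi);
    out.push(a[..len_lo].iter().zip(&b[..len_hi]).map(|(&x, &y)| (x, y)).collect());
}

/// Render a sequence in the README's style, e.g. "[E0][A0-BF][80-BF]".
fn render(seq: &[(u8, u8)]) -> String {
    seq.iter()
        .map(|&(lo, hi)| {
            if lo == hi {
                format!("[{:X}]", lo)
            } else {
                format!("[{:X}-{:X}]", lo, hi)
            }
        })
        .collect()
}

fn main() {
    for seq in utf8_sequences('\u{0}', '\u{FFFF}') {
        println!("{}", render(&seq));
    }
}
```

For `'\u{0}'` through `'\u{FFFF}'` this sketch prints the same six lines as the output above. The recursion emits the lower half of each split first, which keeps the sequences in ascending order.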

These ranges can then be used to build an automaton. They have two useful
properties:

1. Every sequence of bytes matches either exactly one of the sequences of
   ranges or none of them.
2. Every matching sequence of bytes is guaranteed to be valid UTF-8. (Erroneous
   encodings of surrogate code points in UTF-8 cannot match any of the byte
   ranges above.)
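
Both properties can be checked by brute force over the Basic Multilingual Plane. The sketch below hard-codes the six sequences from the output above as `(lo, hi)` byte pairs and does not call this crate; the names `SEQUENCES`, `seq_matches`, `count_matches` and `encode_surrogate` are hypothetical. It asserts that the encoding of every BMP scalar value matches exactly one sequence, while the three-byte pattern a surrogate would get from the generic UTF-8 bit layout, plus a couple of other invalid byte strings, match none.

```rust
/// The six sequences printed for U+0000..=U+FFFF, as inclusive (lo, hi) byte ranges.
const SEQUENCES: &[&[(u8, u8)]] = &[
    &[(0x00, 0x7F)],
    &[(0xC2, 0xDF), (0x80, 0xBF)],
    &[(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    &[(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    &[(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    &[(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
];

/// A byte string matches a sequence if the lengths agree and every byte
/// falls inside the range at its position.
fn seq_matches(seq: &[(u8, u8)], bytes: &[u8]) -> bool {
    seq.len() == bytes.len()
        && seq.iter().zip(bytes).all(|(&(lo, hi), &b)| lo <= b && b <= hi)
}

/// How many of the hard-coded sequences match `bytes`.
fn count_matches(bytes: &[u8]) -> usize {
    SEQUENCES.iter().filter(|&&seq| seq_matches(seq, bytes)).count()
}

/// Hypothetical helper: the bytes the generic three-byte layout would give a
/// surrogate code point. This is not valid UTF-8.
fn encode_surrogate(cp: u32) -> [u8; 3] {
    [
        0xE0 | (cp >> 12) as u8,
        0x80 | ((cp >> 6) & 0x3F) as u8,
        0x80 | (cp & 0x3F) as u8,
    ]
}

fn main() {
    let mut buf = [0u8; 4];
    for cp in 0..=0xFFFFu32 {
        match char::from_u32(cp) {
            // Scalar value: its UTF-8 encoding matches exactly one sequence.
            Some(c) => assert_eq!(count_matches(c.encode_utf8(&mut buf).as_bytes()), 1),
            // Surrogate: its would-be encoding matches no sequence.
            None => assert_eq!(count_matches(&encode_surrogate(cp)), 0),
        }
    }
    // An overlong encoding of U+0000 and a stray 0xFF byte also match nothing.
    assert_eq!(count_matches(&[0xC0, 0x80]), 0);
    assert_eq!(count_matches(&[0xFF]), 0);
    println!("all checks passed");
}
```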