• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

scripts/H03-May-2022-7540

src/H03-May-2022-8,3155,389

.cargo-checksum.jsonH A D03-May-202289 11

.cargo_vcs_info.jsonH A D01-Jan-197074 65

.gitignoreH A D29-Nov-197394 1110

.ignoreH A D29-Nov-19739 21

COPYINGH A D29-Nov-1973126 42

Cargo.tomlH A D01-Jan-19701.4 KiB6052

Cargo.toml.orig-cargoH A D29-Nov-19731.6 KiB5846

LICENSE-MITH A D29-Nov-19731.1 KiB2217

README.mdH A D29-Nov-19734.4 KiB10877

UNLICENSEH A D29-Nov-19731.2 KiB2520

build.rsH A D29-Nov-19732.2 KiB7546

rustfmt.tomlH A D29-Nov-197344 32

README.md

1memchr
2======
3This library provides heavily optimized routines for string search primitives.
4
5[![Build status](https://github.com/BurntSushi/memchr/workflows/ci/badge.svg)](https://github.com/BurntSushi/memchr/actions)
6[![](https://meritbadge.herokuapp.com/memchr)](https://crates.io/crates/memchr)
7
8Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
9
10
11### Documentation
12
13[https://docs.rs/memchr](https://docs.rs/memchr)
14
15
16### Overview
17
18* The top-level module provides routines for searching for 1, 2 or 3 bytes
19  in the forward or reverse direction. When searching for more than one byte,
20  positions are considered a match if the byte at that position matches any
21  of the bytes.
22* The `memmem` sub-module provides forward and reverse substring search
23  routines.
24
25In all such cases, routines operate on `&[u8]` without regard to encoding. This
26is exactly what you want when searching either UTF-8 or arbitrary bytes.
27
28### Compiling without the standard library
29
30memchr links to the standard library by default, but you can disable the
31`std` feature if you want to use it in a `#![no_std]` crate:
32
33```toml
34[dependencies]
35memchr = { version = "2", default-features = false }
36```
37
38On x86 platforms, when the `std` feature is disabled, the SSE2 accelerated
39implementations will be used. When `std` is enabled, AVX accelerated
40implementations will be used if the CPU is determined to support it at runtime.
41
42### Using libc
43
44`memchr` is a routine that is part of libc, although this crate does not use
45libc by default. Instead, it uses its own routines, which are either vectorized
46or generic fallback routines. In general, these should be competitive with
47what's in libc, although this has not been tested for all architectures. If
48using `memchr` from libc is desirable and a vectorized routine is not otherwise
49available in this crate, then enabling the `libc` feature will use libc's
50version of `memchr`.
51
52The rest of the functions in this crate, e.g., `memchr2` or `memrchr3` and the
53substring search routines, will always use the implementations in this crate.
54One exception to this is `memrchr`, which is an extension in `libc` found on
55Linux. On Linux, `memrchr` is used in precisely the same scenario as `memchr`,
56as described above.
57
58
59### Minimum Rust version policy
60
61This crate's minimum supported `rustc` version is `1.41.1`.
62
63The current policy is that the minimum Rust version required to use this crate
64can be increased in minor version updates. For example, if `crate 1.0` requires
65Rust 1.20.0, then `crate 1.0.z` for all values of `z` will also require Rust
661.20.0 or newer. However, `crate 1.y` for `y > 0` may require a newer minimum
67version of Rust.
68
69In general, this crate will be conservative with respect to the minimum
70supported version of Rust.
71
72
73### Testing strategy
74
75Given the complexity of the code in this crate, along with the pervasive use
76of `unsafe`, this crate has an extensive testing strategy. It combines multiple
77approaches:
78
79* Hand-written tests.
80* Exhaustive-style testing meant to exercise all possible branching and offset
81  calculations.
82* Property based testing through [`quickcheck`](https://github.com/BurntSushi/quickcheck).
83* Fuzz testing through [`cargo fuzz`](https://github.com/rust-fuzz/cargo-fuzz).
84* A huge suite of benchmarks that are also run as tests. Benchmarks always
85  confirm that the expected result occurs.
86
87Improvements to the testing infrastructure are very welcome.
88
89
90### Algorithms used
91
92At time of writing, this crate's implementation of substring search actually
93has a few different algorithms to choose from depending on the situation.
94
95* For very small haystacks,
96  [Rabin-Karp](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
97  is used to reduce latency. Rabin-Karp has very small overhead and can often
98  complete before other searchers have even been constructed.
99* For small needles, a variant of the
100  ["Generic SIMD"](http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd)
101  algorithm is used. Instead of using the first and last bytes, a heuristic is
102  used to select bytes based on a background distribution of byte frequencies.
103* In all other cases,
104  [Two-Way](https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm)
105  is used. If possible, a prefilter based on the "Generic SIMD" algorithm
106  linked above is used to find candidates quickly. A dynamic heuristic is used
107  to detect if the prefilter is ineffective, and if so, disables it.
108