# content_inspector

[![Crates.io](https://img.shields.io/crates/v/content_inspector.svg)](https://crates.io/crates/content_inspector)
[![Documentation](https://docs.rs/content_inspector/badge.svg)](https://docs.rs/content_inspector)

A simple library for *fast* inspection of binary buffers to guess the type of content.

This is mainly intended to quickly determine whether a given buffer contains "binary"
or "text" data. Programs like `grep` or `git diff` use similar mechanisms to decide whether
to treat some files as "binary data" or not.

The analysis is based on a very simple heuristic: Searching for NULL bytes
(indicating "binary" content) and the detection of special [byte order
marks](https://en.wikipedia.org/wiki/Byte_order_mark) (indicating a particular kind of textual
encoding). Note that **this analysis can fail**. For example, even if unlikely, UTF-8-encoded
text can legally contain NULL bytes. Conversely, some particular binary formats (like binary
[PGM](https://en.wikipedia.org/wiki/Netpbm_format)) may not contain NULL bytes. Also, for
performance reasons, only the first 1024 bytes are checked for the NULL-byte (if no BOM was
detected).

If this library reports a certain type of encoding (say `UTF_16LE`), there is **no guarantee** that
the binary buffer can actually be decoded as UTF-16LE.

## Usage

```rust
use content_inspector::{ContentType, inspect};

assert_eq!(ContentType::UTF_8, inspect(b"Hello"));
assert_eq!(ContentType::BINARY, inspect(b"\xFF\xE0\x00\x10\x4A\x46\x49\x46\x00"));

assert!(inspect(b"Hello").is_text());
```

## CLI example

This crate also comes with a small example command-line program (see [`examples/inspect.rs`](examples/inspect.rs)) that demonstrates the usage:

```bash
> inspect
USAGE: inspect FILE [FILE...]

> inspect testdata/*
testdata/create_text_files.py: UTF-8
testdata/file_sources.md: UTF-8
testdata/test.jpg: binary
testdata/test.pdf: binary
testdata/test.png: binary
testdata/text_UTF-16BE-BOM.txt: UTF-16BE
testdata/text_UTF-16LE-BOM.txt: UTF-16LE
testdata/text_UTF-32BE-BOM.txt: UTF-32BE
testdata/text_UTF-32LE-BOM.txt: UTF-32LE
testdata/text_UTF-8-BOM.txt: UTF-8-BOM
testdata/text_UTF-8.txt: UTF-8
```

If you only want to detect whether something is a binary or text file, this is about a factor of 250 faster than `file --mime ...`.

## License

Licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.
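
## Sketch of the heuristic

To make the heuristic from the introduction concrete, here is a minimal, self-contained sketch of a BOM check followed by a NULL-byte scan over the first 1024 bytes. It is an illustration written for this README, not the crate's actual source: the `Guess` enum, the `guess_content` function, and the `MAX_SCAN` constant are hypothetical names, and the order in which the BOMs are tested is an assumption (the longer UTF-32LE mark has to be tried before the UTF-16LE mark it begins with).

```rust
/// Hypothetical result type for this sketch (not the crate's `ContentType`).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Guess {
    Utf8,
    Utf8Bom,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
    Binary,
}

/// Number of leading bytes scanned for NULL, as stated in the introduction.
const MAX_SCAN: usize = 1024;

/// Guess the content type of `buffer` from its BOM or from NULL bytes.
fn guess_content(buffer: &[u8]) -> Guess {
    // Step 1: look for a byte order mark at the very start of the buffer.
    // UTF-32LE (FF FE 00 00) is tested before UTF-16LE (FF FE), because
    // the shorter mark is a prefix of the longer one.
    if buffer.starts_with(&[0xEF, 0xBB, 0xBF]) {
        return Guess::Utf8Bom;
    }
    if buffer.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        return Guess::Utf32Le;
    }
    if buffer.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        return Guess::Utf32Be;
    }
    if buffer.starts_with(&[0xFF, 0xFE]) {
        return Guess::Utf16Le;
    }
    if buffer.starts_with(&[0xFE, 0xFF]) {
        return Guess::Utf16Be;
    }

    // Step 2: no BOM found, so scan at most the first MAX_SCAN bytes
    // for a NULL byte and treat its presence as a sign of binary data.
    let head = &buffer[..buffer.len().min(MAX_SCAN)];
    if head.contains(&0) {
        Guess::Binary
    } else {
        Guess::Utf8
    }
}

fn main() {
    // Plain ASCII without NULL bytes is guessed to be UTF-8 text.
    println!("{:?}", guess_content(b"Hello"));
    // The JPEG-like header from the usage example contains NULL bytes.
    println!("{:?}", guess_content(b"\xFF\xE0\x00\x10\x4A\x46\x49\x46\x00"));
    // A UTF-16LE BOM decides the result before any NULL-byte scan runs.
    println!("{:?}", guess_content(b"\xFF\xFEh\x00i\x00"));
}
```

The sketch also shows both failure modes mentioned above: a text buffer with a NULL byte in its first 1024 bytes would be reported as binary, and a binary buffer whose first NULL byte appears later would be reported as UTF-8.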