# encoding_rs

[![Build Status](https://travis-ci.org/hsivonen/encoding_rs.svg?branch=master)](https://travis-ci.org/hsivonen/encoding_rs)
[![crates.io](https://img.shields.io/crates/v/encoding_rs.svg)](https://crates.io/crates/encoding_rs)
[![docs.rs](https://docs.rs/encoding_rs/badge.svg)](https://docs.rs/encoding_rs/)

encoding_rs is an implementation of the (non-JavaScript parts of the)
[Encoding Standard](https://encoding.spec.whatwg.org/) written in Rust.

The Encoding Standard defines the Web-compatible set of character encodings,
which means this crate can be used to decode Web content. encoding_rs is
used in Gecko starting with Firefox 56. Due to the notable overlap between
the legacy encodings on the Web and the legacy encodings used on Windows,
this crate may be of use for non-Web-related situations as well; see below
for links to adjacent crates.

Additionally, the `mem` module provides various operations for dealing with
in-RAM text (as opposed to data that's coming from or going to an IO boundary).
The `mem` module is a module instead of a separate crate because sharing
internal implementation details with the rest of the crate is more efficient.

## Functionality

Due to the Gecko use case, encoding_rs supports decoding to and encoding from
UTF-16 in addition to supporting the usual Rust use case of decoding to and
encoding from UTF-8. Additionally, the API has been designed to be FFI-friendly
to accommodate the C++ side of Gecko.

Specifically, encoding_rs does the following:

* Decodes a stream of bytes in an Encoding Standard-defined character encoding
  into valid aligned native-endian in-RAM UTF-16 (units of `u16` / `char16_t`).
* Encodes a stream of potentially-invalid aligned native-endian in-RAM UTF-16
  (units of `u16` / `char16_t`) into a sequence of bytes in an Encoding
  Standard-defined character encoding as if the lone surrogates had been
  replaced with the REPLACEMENT CHARACTER before performing the encode.
  (Gecko's UTF-16 is potentially invalid.)
* Decodes a stream of bytes in an Encoding Standard-defined character
  encoding into valid UTF-8.
* Encodes a stream of valid UTF-8 into a sequence of bytes in an Encoding
  Standard-defined character encoding. (Rust's UTF-8 is guaranteed-valid.)
* Does the above in streaming (input and output split across multiple
  buffers) and non-streaming (whole input in a single buffer and whole
  output in a single buffer) variants.
* Avoids copying (borrows) when possible in the non-streaming cases when
  decoding to or encoding from UTF-8.
* Resolves textual labels that identify character encodings in
  protocol text into type-safe objects representing those encodings
  conceptually.
* Maps the type-safe encoding objects onto strings suitable for
  returning from `document.characterSet`.
* Validates UTF-8 (in common instruction set scenarios a bit faster for Web
  workloads than the standard library; hopefully will get upstreamed some
  day) and ASCII.
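
To make the label-resolution items in the list above concrete, here is a minimal standard-library-only sketch. It is not the crate's implementation: `ToyEncoding` and its two methods are hypothetical names, and only a handful of the labels defined by the Encoding Standard are covered. A label is stripped of leading and trailing ASCII whitespace, matched ASCII-case-insensitively, and mapped to a type-safe value whose `name()` is the string suitable for `document.characterSet`. For example, `ToyEncoding::for_label(" Latin1\n")` yields the windows-1252 value.

```rust
/// Hypothetical stand-in for a type-safe encoding object (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyEncoding {
    Utf8,
    Windows1252,
}

impl ToyEncoding {
    /// Resolves a label from protocol text to an encoding, if it is known.
    /// Only a small subset of the standard's labels is listed here.
    pub fn for_label(label: &str) -> Option<ToyEncoding> {
        // Strip leading and trailing ASCII whitespace (TAB, LF, FF, CR, SPACE).
        let trimmed = label.trim_matches(|c: char| c.is_ascii_whitespace());
        // Labels are matched ASCII-case-insensitively.
        match trimmed.to_ascii_lowercase().as_str() {
            "utf-8" | "utf8" | "unicode-1-1-utf-8" => Some(ToyEncoding::Utf8),
            "windows-1252" | "cp1252" | "latin1" | "iso-8859-1" | "ascii" | "us-ascii" => {
                Some(ToyEncoding::Windows1252)
            }
            _ => None,
        }
    }

    /// Returns the canonical name, as exposed by `document.characterSet`.
    pub fn name(self) -> &'static str {
        match self {
            ToyEncoding::Utf8 => "UTF-8",
            ToyEncoding::Windows1252 => "windows-1252",
        }
    }
}
```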

Additionally, `encoding_rs::mem` does the following:

* Checks if a byte buffer contains only ASCII.
* Checks if a potentially-invalid UTF-16 buffer contains only Basic Latin (ASCII).
* Checks if a valid UTF-8, potentially-invalid UTF-8 or potentially-invalid UTF-16
  buffer contains only Latin1 code points (below U+0100).
* Checks if a valid UTF-8, potentially-invalid UTF-8 or potentially-invalid UTF-16
  buffer or a code point or a UTF-16 code unit can trigger right-to-left behavior
  (suitable for checking if the Unicode Bidirectional Algorithm can be optimized
  out).
* Combined versions of the above two checks.
* Converts valid UTF-8, potentially-invalid UTF-8 and Latin1 to UTF-16.
* Converts potentially-invalid UTF-16 and Latin1 to UTF-8.
* Converts UTF-8 and UTF-16 to Latin1 (if in range).
* Finds the first invalid code unit in a buffer of potentially-invalid UTF-16.
* Makes a mutable buffer of potentially-invalid UTF-16 contain valid UTF-16.
* Copies ASCII from one buffer to another up to the first non-ASCII byte.
* Converts ASCII to UTF-16 up to the first non-ASCII byte.
* Converts UTF-16 to ASCII up to the first non-Basic Latin code unit.
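
Two of the UTF-16 operations in the list above can be illustrated with the standard library alone. This is a sketch of the semantics, not the crate's code: the function names `utf16_valid_prefix_len` and `make_utf16_valid` are hypothetical, and replacing each unpaired surrogate with U+FFFD is an assumption about what "contain valid UTF-16" means, chosen to mirror the lone-surrogate behavior described for the encoders.

```rust
/// Returns the index of the first unpaired surrogate in `buf`, or
/// `buf.len()` if the whole buffer is valid UTF-16. (Hypothetical helper.)
pub fn utf16_valid_prefix_len(buf: &[u16]) -> usize {
    let mut i = 0;
    while i < buf.len() {
        match buf[i] {
            // High surrogate: valid only if a low surrogate follows.
            0xD800..=0xDBFF => match buf.get(i + 1).copied() {
                Some(0xDC00..=0xDFFF) => i += 2,
                _ => return i,
            },
            // Low surrogate without a preceding high surrogate.
            0xDC00..=0xDFFF => return i,
            _ => i += 1,
        }
    }
    buf.len()
}

/// Overwrites every unpaired surrogate with U+FFFD so that the buffer
/// becomes valid UTF-16. (Hypothetical helper.)
pub fn make_utf16_valid(buf: &mut [u16]) {
    let mut start = 0;
    loop {
        let bad = start + utf16_valid_prefix_len(&buf[start..]);
        if bad == buf.len() {
            break;
        }
        buf[bad] = 0xFFFD;
        start = bad + 1;
    }
}
```

Calling `make_utf16_valid` on a buffer and then `utf16_valid_prefix_len` on the result returns the buffer length, since no unpaired surrogates remain.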

## Integration with `std::io`

Notably, the above feature list doesn't include the capability to wrap
a `std::io::Read`, decode it into UTF-8 and present the result via
`std::io::Read`. The [`encoding_rs_io`](https://crates.io/crates/encoding_rs_io)
crate provides that capability.
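
The shape of such a wrapper can be sketched with the standard library alone. The adapter below is hypothetical and is not how `encoding_rs_io` works: it handles only Latin1 (where each byte maps directly to the code point with the same value), and the name `Latin1ToUtf8` is made up for this illustration. It shows the main wrinkle of presenting decoded text through `Read`: an output sequence may not fit in the caller's buffer and has to be carried over to the next call.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: reads Latin1 bytes from `inner` and yields the
/// same text as UTF-8. Illustration only.
pub struct Latin1ToUtf8<R: Read> {
    inner: R,
    /// Trail byte of a two-byte UTF-8 sequence that did not fit in the
    /// caller's buffer on the previous call.
    pending: Option<u8>,
}

impl<R: Read> Latin1ToUtf8<R> {
    pub fn new(inner: R) -> Self {
        Latin1ToUtf8 { inner, pending: None }
    }
}

impl<R: Read> Read for Latin1ToUtf8<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        // Flush a leftover trail byte first; a short read is allowed.
        if let Some(trail) = self.pending.take() {
            out[0] = trail;
            return Ok(1);
        }
        // Each Latin1 byte becomes at most two UTF-8 bytes, so read at most
        // half as many input bytes as there is output space (but at least one).
        let mut input = [0u8; 512];
        let want = (out.len() / 2).max(1).min(input.len());
        let got = self.inner.read(&mut input[..want])?;
        let mut written = 0;
        for &byte in &input[..got] {
            if byte < 0x80 {
                out[written] = byte;
                written += 1;
            } else {
                out[written] = 0xC0 | (byte >> 6);
                written += 1;
                let trail = 0x80 | (byte & 0x3F);
                if written < out.len() {
                    out[written] = trail;
                    written += 1;
                } else {
                    // Only reachable when `out` holds a single byte.
                    self.pending = Some(trail);
                }
            }
        }
        Ok(written)
    }
}
```

Wrapping a byte slice, for instance `Latin1ToUtf8::new(&b"caf\xE9"[..])`, and calling `read_to_string` on it produces the text as a Rust `String`.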

## `no_std` Environment

The crate works in a `no_std` environment. By default, the `alloc` feature,
which assumes that an allocator is present, is enabled. For a no-allocator
environment, the default features (i.e. `alloc`) can be turned off. This
makes the part of the API that returns `Vec`/`String`/`Cow` unavailable.

## Decoding Email

For decoding character encodings that occur in email, use the
[`charset`](https://crates.io/crates/charset) crate instead of using this
one directly. (It wraps this crate and adds UTF-7 decoding.)

## Windows Code Page Identifier Mappings

For mappings to and from Windows code page identifiers, use the
[`codepage`](https://crates.io/crates/codepage) crate.

## DOS Encodings

This crate does not support single-byte DOS encodings that aren't required by
the Web Platform, but the [`oem_cp`](https://crates.io/crates/oem_cp) crate does.

## Preparing Text for the Encoders

Normalizing text into Unicode Normalization Form C prior to encoding text into
a legacy encoding minimizes unmappable characters. Text can be normalized to
Unicode Normalization Form C using the
[`unic-normal`](https://crates.io/crates/unic-normal) crate.

The exception is windows-1258, which after normalizing to Unicode Normalization
Form C requires tone marks to be decomposed in order to minimize unmappable
characters. Vietnamese tone marks can be decomposed using the
[`detone`](https://crates.io/crates/detone) crate.

## Licensing

TL;DR: ((Apache-2.0 OR MIT) AND BSD-3-Clause) for the code and data combination,
but [crates.io doesn't support
parentheses](https://github.com/rust-lang/crates.io/issues/2595), so the crate
metadata points to a custom file.

Please see the file named
[COPYRIGHT](https://github.com/hsivonen/encoding_rs/blob/master/COPYRIGHT).

The non-test code that isn't generated from the WHATWG data in this crate is
under Apache-2.0 OR MIT. Test code is under CC0.

This crate contains code/data generated from WHATWG-supplied data. The WHATWG
upstream changed its license for portions of specs incorporated into source code
from CC0 to BSD-3-Clause between the initial release of this crate and the present
version of this crate. The in-source licensing legends have been updated for the
parts of the generated code that have changed since the upstream license change.

## Documentation

Generated [API documentation](https://docs.rs/encoding_rs/) is available
online.

There is a [long-form write-up](https://hsivonen.fi/encoding_rs/) about the
design and internals of the crate.

## C and C++ bindings

An FFI layer for encoding_rs is available as a
[separate crate](https://github.com/hsivonen/encoding_c). The crate comes
with a [demo C++ wrapper](https://github.com/hsivonen/encoding_c/blob/master/include/encoding_rs_cpp.h)
using the C++ standard library and [GSL](https://github.com/Microsoft/GSL/) types.

The bindings for the `mem` module are in the
[encoding_c_mem crate](https://github.com/hsivonen/encoding_c_mem).

For the Gecko context, there's a
[C++ wrapper using the MFBT/XPCOM types](https://searchfox.org/mozilla-central/source/intl/Encoding.h#100).

There's a [write-up](https://hsivonen.fi/modern-cpp-in-rust/) about the C++
wrappers.

## Sample programs

* [Rust](https://github.com/hsivonen/recode_rs)
* [C](https://github.com/hsivonen/recode_c)
* [C++](https://github.com/hsivonen/recode_cpp)

## Optional features

There are currently these optional cargo features:

### `simd-accel`

Enables SIMD acceleration using the nightly-dependent `packed_simd_2` crate.

This is an opt-in feature, because enabling this feature _opts out_ of Rust's
guarantees of future compilers compiling old code (aka. "stability story").

Currently, this has not been tested to be an improvement except for these
targets:

* x86_64
* i686
* aarch64
* thumbv7neon

If you use nightly Rust, you use targets whose first component is one of the
above, and you are prepared _to have to revise your configuration when updating
Rust_, you should enable this feature. Otherwise, please _do not_ enable this
feature.

_Note!_ If you are compiling for a target that does not have 128-bit SIMD
enabled as part of the target definition and you are enabling 128-bit SIMD
using `-C target_feature`, you need to enable the `core_arch` Cargo feature
for `packed_simd_2` to compile a crates.io snapshot of `core_arch` instead of
using the standard-library copy of `core::arch`, because the `core::arch`
module of the pre-compiled standard library has been compiled with the
assumption that the CPU doesn't have 128-bit SIMD. At present this applies
mainly to 32-bit ARM targets whose first component does not include the
substring `neon`.

The encoding_rs side of things has not been properly set up for POWER,
PowerPC, MIPS, etc., SIMD at this time, so even if you were to follow
the advice from the previous paragraph, you probably shouldn't use
the `simd-accel` option on the less mainstream architectures at this
time.

Used by Firefox.

### `serde`

Enables support for serializing and deserializing `&'static Encoding`-typed
struct fields using [Serde][1].

[1]: https://serde.rs/

Not used by Firefox.

### `fast-legacy-encode`

A catch-all option for enabling the fastest legacy encode options. _Does not
affect decode speed or UTF-8 encode speed._

At present, this option is equivalent to enabling the following options:
 * `fast-hangul-encode`
 * `fast-hanja-encode`
 * `fast-kanji-encode`
 * `fast-gb-hanzi-encode`
 * `fast-big5-hanzi-encode`

Adds 176 KB to the binary size.

Not used by Firefox.

### `fast-hangul-encode`

Changes encoding of precomposed Hangul syllables into EUC-KR from binary
search over the decode-optimized tables to lookup by index, making Korean
plain-text encode about 4 times as fast as without this option.

Adds 20 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-hanja-encode`

Changes encoding of Hanja into EUC-KR from linear search over the
decode-optimized table to lookup by index. Since Hanja is practically absent
in modern Korean text, this option doesn't affect performance in the common
case and mainly makes sense if you want to make your application resilient
against denial of service by someone intentionally feeding it a lot of Hanja
to encode into EUC-KR.

Adds 40 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-kanji-encode`

Changes encoding of Kanji into Shift_JIS, EUC-JP and ISO-2022-JP from linear
search over the decode-optimized tables to lookup by index, making Japanese
plain-text encode to legacy encodings 30 to 50 times as fast as without this
option (about 2 times as fast as with `less-slow-kanji-encode`).

Takes precedence over `less-slow-kanji-encode`.

Adds 36 KB to the binary size (24 KB compared to `less-slow-kanji-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-kanji-encode`

Makes JIS X 0208 Level 1 Kanji (the most common Kanji in Shift_JIS, EUC-JP and
ISO-2022-JP) encode less slow (binary search instead of linear search), making
Japanese plain-text encode to legacy encodings 14 to 23 times as fast as
without this option.

Adds 12 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-gb-hanzi-encode`

Changes encoding of Hanzi in the CJK Unified Ideographs block into GBK and
gb18030 from linear search over a part of the decode-optimized tables followed
by a binary search over another part of the decode-optimized tables to lookup
by index, making Simplified Chinese plain-text encode to the legacy encodings
100 to 110 times as fast as without this option (about 2.5 times as fast as
with `less-slow-gb-hanzi-encode`).

Takes precedence over `less-slow-gb-hanzi-encode`.

Adds 36 KB to the binary size (24 KB compared to `less-slow-gb-hanzi-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-gb-hanzi-encode`

Makes GB2312 Level 1 Hanzi (the most common Hanzi in gb18030 and GBK) encode
less slow (binary search instead of linear search), making Simplified Chinese
plain-text encode to the legacy encodings about 40 times as fast as without
this option.

Adds 12 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-big5-hanzi-encode`

Changes encoding of Hanzi in the CJK Unified Ideographs block into Big5 from
linear search over a part of the decode-optimized tables to lookup by index,
making Traditional Chinese plain-text encode to Big5 105 to 125 times as fast
as without this option (about 3 times as fast as with
`less-slow-big5-hanzi-encode`).

Takes precedence over `less-slow-big5-hanzi-encode`.

Adds 40 KB to the binary size (20 KB compared to `less-slow-big5-hanzi-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-big5-hanzi-encode`

Makes Big5 Level 1 Hanzi (the most common Hanzi in Big5) encode less slow
(binary search instead of linear search), making Traditional Chinese
plain-text encode to Big5 about 36 times as fast as without this option.

Adds 20 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

## Performance goals

For decoding to UTF-16, the goal is to perform at least as well as Gecko's old
uconv. For decoding to UTF-8, the goal is to perform at least as well as
rust-encoding. These goals have been achieved.

Encoding to UTF-8 should be fast. (UTF-8 to UTF-8 encode should be equivalent
to `memcpy` and UTF-16 to UTF-8 should be fast.)

Speed is a non-goal when encoding to legacy encodings. By default, encoding to
legacy encodings should not be optimized for speed at the expense of code size
as long as form submission and URL parsing in Gecko don't become noticeably
too slow in real-world use.

In the interest of binary size, by default, encoding_rs does not have
encode-specific data tables beyond 32 bits of encode-specific data for each
single-byte encoding. Therefore, encoders search the decode-optimized data
tables. This is a linear search in most cases. As a result, by default, encode
to legacy encodings varies from slow to extremely slow relative to other
libraries. Still, with realistic workloads, this seemed fast enough not to be
user-visibly slow on Raspberry Pi 3 (which stood in for a phone for testing)
in the Web-exposed encoder use cases.
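
The size-versus-speed trade-off described above can be sketched with toy data. Everything in this example is an assumption made for illustration: the six-entry table stands in for a real index, and the actual layout of the crate's tables is not described in this README. The sketch contrasts three strategies: a linear search over the decode-optimized table (no extra data), a binary search over a second table sorted by code point (some extra data), and a lookup by index into a table that spans the whole U+4E00..=U+9FFF block (most extra data, constant time). All three return the same index for a mapped code point such as 0x963F and `None` for an unmapped one.

```rust
/// Toy decode-optimized table: the array position is the index ("pointer")
/// and the value is the code point. Hypothetical data for illustration.
const DECODE_TABLE: [u16; 6] = [0x4E9C, 0x5516, 0x5A03, 0x963F, 0x54C0, 0x611B];

/// Strategy 1: linear search over the decode table. Needs no extra data.
pub fn encode_linear(code_point: u16) -> Option<usize> {
    DECODE_TABLE.iter().position(|&c| c == code_point)
}

/// Extra table for strategy 2: (code point, index) pairs sorted by code point.
const SORTED_BY_CODE_POINT: [(u16, u8); 6] = [
    (0x4E9C, 0),
    (0x54C0, 4),
    (0x5516, 1),
    (0x5A03, 2),
    (0x611B, 5),
    (0x963F, 3),
];

/// Strategy 2: binary search over the sorted pairs.
pub fn encode_binary(code_point: u16) -> Option<usize> {
    SORTED_BY_CODE_POINT
        .binary_search_by_key(&code_point, |&(c, _)| c)
        .ok()
        .map(|i| SORTED_BY_CODE_POINT[i].1 as usize)
}

const BLOCK_START: u16 = 0x4E00;
const BLOCK_LEN: usize = 0x5200; // U+4E00..=U+9FFF
const UNMAPPED: u16 = u16::MAX;

/// Builds the table for strategy 3: one slot per code point in the block.
pub fn build_encode_index() -> Vec<u16> {
    let mut index = vec![UNMAPPED; BLOCK_LEN];
    for (pointer, &code_point) in DECODE_TABLE.iter().enumerate() {
        index[(code_point - BLOCK_START) as usize] = pointer as u16;
    }
    index
}

/// Strategy 3: lookup by index. Constant time, but the table is large.
pub fn encode_indexed(index: &[u16], code_point: u16) -> Option<usize> {
    let offset = code_point.checked_sub(BLOCK_START)? as usize;
    match index.get(offset) {
        Some(&pointer) if pointer != UNMAPPED => Some(pointer as usize),
        _ => None,
    }
}
```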

See the cargo features above for optionally making CJK legacy encode fast.

A framework for measuring performance is [available separately][2].

[2]: https://github.com/hsivonen/encoding_bench/

## Rust Version Compatibility

It is a goal to support the latest stable Rust, the latest nightly Rust and
the version of Rust that's used for Firefox Nightly.

At this time, there is no firm commitment to support a version older than
what's required by Firefox, and there is no commitment to treat MSRV changes
as semver-breaking, because this crate depends on `cfg-if`, which doesn't
appear to treat MSRV changes as semver-breaking, so it would be useless for
this crate to treat MSRV changes as semver-breaking.

As of 2021-02-04, MSRV appears to be Rust 1.36.0 for using the crate and
1.42.0 for doc tests to pass without errors about the global allocator.

## Compatibility with rust-encoding

A compatibility layer that implements the rust-encoding API on top of
encoding_rs is
[provided as a separate crate](https://github.com/hsivonen/encoding_rs_compat)
(cannot be uploaded to crates.io). The compatibility layer was originally
written with the assumption that Firefox would need it, but it is not currently
used in Firefox.

## Regenerating Generated Code

To regenerate the generated code:

 * Have Python 2 installed.
 * Clone [`https://github.com/hsivonen/encoding_c`](https://github.com/hsivonen/encoding_c)
   next to the `encoding_rs` directory.
 * Clone [`https://github.com/hsivonen/codepage`](https://github.com/hsivonen/codepage)
   next to the `encoding_rs` directory.
 * Clone [`https://github.com/whatwg/encoding`](https://github.com/whatwg/encoding)
   next to the `encoding_rs` directory.
 * Checkout revision `be3337450e7df1c49dca7872153c4c4670dd8256` of the `encoding` repo.
   (Note: `f381389` was the revision of `encoding` used from before the `encoding` repo
   license change. So far, only output changed since then has been updated to
   the new license legend.)
 * With the `encoding_rs` directory as the working directory, run
   `python generate-encoding-data.py`.

## Roadmap

- [x] Design the low-level API.
- [x] Provide Rust-only convenience features.
- [x] Provide an stl/gsl-flavored C++ API.
- [x] Implement all decoders and encoders.
- [x] Add unit tests for all decoders and encoders.
- [x] Finish BOM sniffing variants in Rust-only convenience features.
- [x] Document the API.
- [x] Publish the crate on crates.io.
- [x] Create a solution for measuring performance.
- [x] Accelerate ASCII conversions using SSE2 on x86.
- [x] Accelerate ASCII conversions using ALU register-sized operations on
      non-x86 architectures (process an `usize` instead of `u8` at a time).
- [x] Split FFI into a separate crate so that the FFI doesn't interfere with
      LTO in pure-Rust usage.
- [x] Compress CJK indices by making use of sequential code points as well
      as Unicode-ordered parts of indices.
- [x] Make lookups by label or name use binary search that searches from the
      end of the label/name to the start.
- [x] Make labels with non-ASCII bytes fail fast.
- [ ] ~Parallelize UTF-8 validation using [Rayon](https://github.com/nikomatsakis/rayon).~
      (This turned out to be a pessimization in the ASCII case due to memory bandwidth reasons.)
- [x] Provide an XPCOM/MFBT-flavored C++ API.
- [x] Investigate accelerating single-byte encode with a single fast-tracked
      range per encoding.
- [x] Replace uconv with encoding_rs in Gecko.
- [x] Implement the rust-encoding API in terms of encoding_rs.
- [x] Add SIMD acceleration for Aarch64.
- [x] Investigate the use of NEON on 32-bit ARM.
- [ ] ~Investigate Björn Höhrmann's lookup table acceleration for UTF-8 as
      adapted to Rust in rust-encoding.~
- [x] Add actually fast CJK encode options.
- [ ] ~Investigate [Bob Steagall's lookup table acceleration for UTF-8](https://github.com/BobSteagall/CppNow2018/blob/master/FastConversionFromUTF-8/Fast%20Conversion%20From%20UTF-8%20with%20C%2B%2B%2C%20DFAs%2C%20and%20SSE%20Intrinsics%20-%20Bob%20Steagall%20-%20C%2B%2BNow%202018.pdf).~
- [ ] Provide a build mode that works without `alloc` (with lesser API surface).
- [ ] Migrate to `std::simd` once it is stable and declare 1.0.
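
The roadmap item about processing a `usize` instead of a `u8` at a time can be illustrated as follows. This is a minimal sketch of the general technique using only the standard library, not the crate's code: the function name `is_ascii_wordwise` is hypothetical, and alignment handling and loop unrolling are left out entirely. A byte is non-ASCII exactly when its high bit is set, so one mask test covers a whole machine word.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// The byte 0x80 repeated in every byte position of a `usize`.
const HIGH_BITS: usize = usize::MAX / 0xFF * 0x80;

/// Checks whether `bytes` is all ASCII, testing one machine word at a time.
/// (Hypothetical helper for illustration.)
pub fn is_ascii_wordwise(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(size_of::<usize>());
    for chunk in &mut chunks {
        // The chunk length equals the word size, so the conversion cannot fail.
        let word = usize::from_ne_bytes(chunk.try_into().unwrap());
        if (word & HIGH_BITS) != 0 {
            return false;
        }
    }
    // Handle the bytes left over after the last full word one by one.
    chunks.remainder().iter().all(|&b| b < 0x80)
}
```

The result does not depend on byte order, because the mask has the same value in every byte position.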

## Release Notes

### 0.8.30

* Update the licensing information to take into account the WHATWG data license change.

### 0.8.29

* Make the parts that use an allocator optional.

### 0.8.28

* Fix error in Serde support introduced as part of `no_std` support.

### 0.8.27

* Make the crate work in a `no_std` environment (with `alloc`).

### 0.8.26

* Fix oversights in edition 2018 migration that broke the `simd-accel` feature.

### 0.8.25

* Do pointer alignment checks in a way where intermediate steps aren't defined to be Undefined Behavior.
* Update the `packed_simd` dependency to `packed_simd_2`.
* Update the `cfg-if` dependency to 1.0.
* Address warnings that have been introduced by newer Rust versions along the way.
* Update to edition 2018, since even prior to 1.0 `cfg-if` updated to edition 2018 without a semver break.

### 0.8.24

* Avoid computing an intermediate (not dereferenced) pointer value in a manner designated as Undefined Behavior when computing pointer alignment.

### 0.8.23

* Remove year from copyright notices. (No features or bug fixes.)

### 0.8.22

* Formatting fix and new unit test. (No features or bug fixes.)

### 0.8.21

* Fixed a panic with invalid UTF-16[BE|LE] input at the end of the stream.

### 0.8.20

* Make `Decoder::latin1_byte_compatible_up_to` return `None` in more
  cases to make the method actually useful. While this could be argued
  to be a breaking change due to the bug fix changing semantics, it does
  not break callers that had to handle the `None` case in a reasonable
  way anyway.

### 0.8.19

* Removed a bunch of bound checks in `convert_str_to_utf16`.
* Added `mem::convert_utf8_to_utf16_without_replacement`.

### 0.8.18

* Added `mem::utf8_latin1_up_to` and `mem::str_latin1_up_to`.
* Added `Decoder::latin1_byte_compatible_up_to`.

### 0.8.17

* Update `bincode` (dev dependency) version requirement to 1.0.

### 0.8.16

* Switch from the `simd` crate to `packed_simd`.

### 0.8.15

* Adjust documentation for `simd-accel` (README-only release).

### 0.8.14

* Made UTF-16 to UTF-8 encode conversion fill the output buffer as
  closely as possible.

### 0.8.13

* Made the UTF-8 to UTF-16 decoder compare the number of code units written
  with the length of the right slice (the output slice) to fix a panic
  introduced in 0.8.11.

### 0.8.12

* Removed the `clippy::` prefix from clippy lint names.

### 0.8.11

* Changed minimum Rust requirement to 1.29.0 (for the ability to refer
  to the interior of a `static` when defining another `static`).
* Explicitly aligned the lookup tables for single-byte encodings and
  UTF-8 to cache lines in the hope of freeing up one cache line for
  other data. (Perhaps the tables were already aligned and this is
  placebo.)
* Added 32 bits of encode-oriented data for each single-byte encoding.
  The change was performance-neutral for non-Latin1-ish Latin legacy
  encodings, improved Latin1-ish and Arabic legacy encode speed
  somewhat (new speed is 2.4x the old speed for German, 2.3x for
  Arabic, 1.7x for Portuguese and 1.4x for French) and improved
  non-Latin1, non-Arabic legacy single-byte encode a lot (7.2x for
  Thai, 6x for Greek, 5x for Russian, 4x for Hebrew).
* Added compile-time options for fast CJK legacy encode options (at
  the cost of binary size (up to 176 KB) and run-time memory usage).
  These options still retain the overall code structure instead of
  rewriting the CJK encoders totally, so the speed isn't as good as
  what could be achieved by using even more memory / making the
  binary even larger.
* Made UTF-8 decode and validation faster.
* Added method `is_single_byte()` on `Encoding`.
* Added `mem::decode_latin1()` and `mem::encode_latin1_lossy()`.

### 0.8.10

* Disabled a unit test that tests a panic condition when the assertion
  being tested is disabled.

### 0.8.9

* Made `--features simd-accel` work with stable-channel compiler to
  simplify the Firefox build system.

### 0.8.8

* Made the `is_foo_bidi()` functions not treat U+FEFF (ZERO WIDTH NO-BREAK
  SPACE aka. BYTE ORDER MARK) as right-to-left.
* Made the `is_foo_bidi()` functions report `true` if the input contains
  Hebrew presentation forms (which are right-to-left but not in a
  right-to-left-roadmapped block).

### 0.8.7

* Fixed a panic in the UTF-16LE/UTF-16BE decoder when decoding to UTF-8.

### 0.8.6

* Temporarily removed the debug assertion added in version 0.8.5 from
  `convert_utf16_to_latin1_lossy`.

### 0.8.5

* If debug assertions are enabled but fuzzing isn't enabled, lossy conversions
  to Latin1 in the `mem` module assert that the input is in the range
  U+0000...U+00FF (inclusive).
* In the `mem` module provide conversions from Latin1 and UTF-16 to UTF-8
  that can deal with insufficient output space. The idea is to use them
  first with an allocation rounded up to jemalloc bucket size and do the
  worst-case allocation only if the jemalloc rounding up was insufficient
  as the first guess.

### 0.8.4

* Fix SSE2-specific, `simd-accel`-specific memory corruption introduced in
  version 0.8.1 in conversions between UTF-16 and Latin1 in the `mem` module.

### 0.8.3

* Removed an `#[inline(never)]` annotation that was not meant for release.

### 0.8.2

* Made non-ASCII UTF-16 to UTF-8 encode faster by manually omitting bound
  checks and manually adding branch prediction annotations.

### 0.8.1

* Tweaked loop unrolling and memory alignment for SSE2 conversions between
  UTF-16 and Latin1 in the `mem` module to increase the performance when
  converting long buffers.

### 0.8.0

* Changed the minimum supported version of Rust to 1.21.0 (semver breaking
  change).
* Flipped around the defaults vs. optional features for controlling the size
  vs. speed trade-off for Kanji and Hanzi legacy encode (semver breaking
  change).
* Added NEON support on ARMv7.
* SIMD-accelerated x-user-defined to UTF-16 decode.
* Made UTF-16LE and UTF-16BE decode a lot faster (including SIMD
  acceleration).
640### 0.7.2
641
642* Add the `mem` module.
643* Refactor SIMD code which can affect performance outside the `mem`
644  module.
645
646### 0.7.1
647
648* When encoding from invalid UTF-16, correctly handle U+DC00 followed by
649  another low surrogate.
650
651### 0.7.0
652
653* [Make `replacement` a label of the replacement
654  encoding.](https://github.com/whatwg/encoding/issues/70) (Spec change.)
655* Remove `Encoding::for_name()`. (`Encoding::for_label(foo).unwrap()` is
656  now close enough after the above label change.)
657* Remove the `parallel-utf8` cargo feature.
658* Add optional Serde support for `&'static Encoding`.
659* Performance tweaks for ASCII handling.
660* Performance tweaks for UTF-8 validation.
661* SIMD support on aarch64.
662
663### 0.6.11
664
665* Make `Encoder::has_pending_state()` public.
666* Update the `simd` crate dependency to 0.2.0.
667
668### 0.6.10
669
670* Reserve enough space for NCRs when encoding to ISO-2022-JP.
671* Correct max length calculations for multibyte decoders.
672* Correct max length calculations before BOM sniffing has been
673  performed.
674* Correctly calculate max length when encoding from UTF-16 to GBK.
675
676### 0.6.9
677
678* [Don't prepend anything when gb18030 range decode
679  fails](https://github.com/whatwg/encoding/issues/110). (Spec change.)
680
681### 0.6.8
682
683* Correcly handle the case where the first buffer contains potentially
684  partial BOM and the next buffer is the last buffer.
685* Decode byte `7F` correctly in ISO-2022-JP.
686* Make UTF-16 to UTF-8 encode write closer to the end of the buffer.
687* Implement `Hash` for `Encoding`.
688
689### 0.6.7
690
691* [Map half-width katakana to full-width katana in ISO-2022-JP
692  encoder](https://github.com/whatwg/encoding/issues/105). (Spec change.)
693* Give `InputEmpty` correct precedence over `OutputFull` when encoding
694  with replacement and the output buffer passed in is too short or the
695  remaining space in the output buffer is too small after a replacement.
696
697### 0.6.6
698
699* Correct max length calculation when a partial BOM prefix is part of
700  the decoder's state.
701
702### 0.6.5
703
704* Correct max length calculation in various encoders.
705* Correct max length calculation in the UTF-16 decoder.
706* Derive `PartialEq` and `Eq` for the `CoderResult`, `DecoderResult`
707  and `EncoderResult` types.
708
709### 0.6.4
710
711* Avoid panic when encoding with replacement and the destination buffer is
712  too short to hold one numeric character reference.
713
714### 0.6.3
715
716* Add support for 32-bit big-endian hosts. (For real this time.)
717
718### 0.6.2
719
720* Fix a panic from subslicing with bad indices in
721  `Encoder::encode_from_utf16`. (Due to an oversight, it lacked the fix that
722  `Encoder::encode_from_utf8` already had.)
723* Micro-optimize error status accumulation in non-streaming case.
724
725### 0.6.1
726
727* Avoid panic near integer overflow in a case that's unlikely to actually
728  happen.
729* Address Clippy lints.
730
731### 0.6.0
732
733* Make the methods for computing worst-case buffer size requirements check
734  for integer overflow.
735* Upgrade rayon to 0.7.0.
736
737### 0.5.1
738
739* Reorder methods for better documentation readability.
740* Add support for big-endian hosts. (Only 64-bit case actually tested.)
741* Optimize the ALU (non-SIMD) case for 32-bit ARM instead of x86_64.
742
743### 0.5.0
744
745* Avoid allocating an excessively long buffers in non-streaming decode.
746* Fix the behavior of ISO-2022-JP and replacement decoders near the end of the
747  output buffer.
748* Annotate the result structs with `#[must_use]`.
749
750### 0.4.0
751
752* Split FFI into a separate crate.
753* Performance tweaks.
754* CJK binary size and encoding performance changes.
755* Parallelize UTF-8 validation in the case of long buffers (with optional
756  feature `parallel-utf8`).
757* Borrow even with ISO-2022-JP when possible.
758
759### 0.3.2
760
761* Fix moving pointers to alignment in ALU-based ASCII acceleration.
762* Fix errors in documentation and improve documentation.
763
764### 0.3.1
765
766* Fix UTF-8 to UTF-16 decode for byte sequences beginning with 0xEE.
767* Make UTF-8 to UTF-8 decode SSE2-accelerated when feature `simd-accel` is used.
768* When decoding and encoding ASCII-only input from or to an ASCII-compatible
769  encoding using the non-streaming API, return a borrow of the input.
770* Make encode from UTF-16 to UTF-8 faster.
771
772### 0.3
773
774* Change the references to the instances of `Encoding` from `const` to `static`
775  to make the referents unique across crates that use the refernces.
776* Introduce non-reference-typed `FOO_INIT` instances of `Encoding` to allow
777  foreign crates to initialize `static` arrays with references to `Encoding`
778  instances even under Rust's constraints that prohibit the initialization of
779  `&'static Encoding`-typed array items with `&'static Encoding`-typed
780  `statics`.
781* Document that the above two points will be reverted if Rust changes `const`
782  to work so that cross-crate usage keeps the referents unique.
783* Return `Cow`s from Rust-only non-streaming methods for encode and decode.
784* `Encoding::for_bom()` returns the length of the BOM.
785* ASCII-accelerated conversions for encodings other than UTF-16LE, UTF-16BE,
786  ISO-2022-JP and x-user-defined.
787* Add SSE2 acceleration behind the `simd-accel` feature flag. (Requires
788  nightly Rust.)
789* Fix panic with long bogus labels.
790* Map [0xCA to U+05BA in windows-1255](https://github.com/whatwg/encoding/issues/73).
791  (Spec change.)
792* Correct the [end of the Shift_JIS EUDC range](https://github.com/whatwg/encoding/issues/53).
793  (Spec change.)
794
795### 0.2.4
796
797* Polish FFI documentation.
798
799### 0.2.3
800
801* Fix UTF-16 to UTF-8 encode.
802
803### 0.2.2
804
805* Add `Encoder.encode_from_utf8_to_vec_without_replacement()`.
806
807### 0.2.1
808
809* Add `Encoding.is_ascii_compatible()`.
810
811* Add `Encoding::for_bom()`.
812
813* Make `==` for `Encoding` use name comparison instead of pointer comparison,
814  because uses of the encoding constants in different crates result in
815  different addresses and the constant cannot be turned into statics without
816  breaking other things.
817
818### 0.2.0
819
820The initial release.
821