# encoding_rs

[![Build Status](https://travis-ci.org/hsivonen/encoding_rs.svg?branch=master)](https://travis-ci.org/hsivonen/encoding_rs)
[![crates.io](https://meritbadge.herokuapp.com/encoding_rs)](https://crates.io/crates/encoding_rs)
[![docs.rs](https://docs.rs/encoding_rs/badge.svg)](https://docs.rs/encoding_rs/)
[![Apache 2 / MIT dual-licensed](https://img.shields.io/badge/license-Apache%202%20%2F%20MIT-blue.svg)](https://github.com/hsivonen/encoding_rs/blob/master/COPYRIGHT)

encoding_rs is an implementation of (the non-JavaScript parts of) the
[Encoding Standard](https://encoding.spec.whatwg.org/) written in Rust and
used in Gecko (starting with Firefox 56).

Additionally, the `mem` module provides various operations for dealing with
in-RAM text (as opposed to data that's coming from or going to an IO boundary).
The `mem` module is a module instead of a separate crate due to internal
implementation detail efficiencies.

## Functionality

Due to the Gecko use case, encoding_rs supports decoding to and encoding from
UTF-16 in addition to supporting the usual Rust use case of decoding to and
encoding from UTF-8. Additionally, the API has been designed to be FFI-friendly
to accommodate the C++ side of Gecko.

Specifically, encoding_rs does the following:

* Decodes a stream of bytes in an Encoding Standard-defined character encoding
  into valid aligned native-endian in-RAM UTF-16 (units of `u16` / `char16_t`).
* Encodes a stream of potentially-invalid aligned native-endian in-RAM UTF-16
  (units of `u16` / `char16_t`) into a sequence of bytes in an Encoding
  Standard-defined character encoding as if the lone surrogates had been
  replaced with the REPLACEMENT CHARACTER before performing the encode.
  (Gecko's UTF-16 is potentially invalid.)
* Decodes a stream of bytes in an Encoding Standard-defined character
  encoding into valid UTF-8.
* Encodes a stream of valid UTF-8 into a sequence of bytes in an Encoding
  Standard-defined character encoding. (Rust's UTF-8 is guaranteed-valid.)
* Does the above in streaming (input and output split across multiple
  buffers) and non-streaming (whole input in a single buffer and whole
  output in a single buffer) variants.
* Avoids copying (borrows) when possible in the non-streaming cases when
  decoding to or encoding from UTF-8.
* Resolves textual labels that identify character encodings in
  protocol text into type-safe objects representing those encodings
  conceptually.
* Maps the type-safe encoding objects onto strings suitable for
  returning from `document.characterSet`.
* Validates UTF-8 (in common instruction set scenarios a bit faster for Web
  workloads than the standard library; hopefully will get upstreamed some
  day) and ASCII.

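Two of the behaviors listed above can be illustrated with the Rust standard
library alone. The following is a minimal sketch, not the crate's
implementation: the function names `sketch_decode_latin1` and
`sketch_utf16_to_utf8` are hypothetical, and Latin1 merely stands in for an
ASCII-compatible legacy encoding. The first function borrows the input when
no conversion is needed; the second treats lone surrogates as if they had
been replaced with the REPLACEMENT CHARACTER.

```rust
use std::borrow::Cow;
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Hypothetical helper: decodes Latin1 (each byte is the code point of the
/// same value) into UTF-8. ASCII-only input is already valid UTF-8, so it
/// is borrowed instead of copied.
fn sketch_decode_latin1(bytes: &[u8]) -> Cow<'_, str> {
    if bytes.is_ascii() {
        Cow::Borrowed(std::str::from_utf8(bytes).expect("ASCII is valid UTF-8"))
    } else {
        // `u8 as char` maps 0x00..=0xFF to U+0000..=U+00FF.
        Cow::Owned(bytes.iter().map(|&b| b as char).collect())
    }
}

/// Hypothetical helper: converts potentially-invalid UTF-16 into UTF-8 as if
/// every lone surrogate had first been replaced with U+FFFD.
fn sketch_utf16_to_utf8(units: &[u16]) -> String {
    decode_utf16(units.iter().copied())
        .map(|result| result.unwrap_or(REPLACEMENT_CHARACTER))
        .collect()
}
```

Returning `Cow` lets the caller treat the borrowed and the owned case
uniformly while the common ASCII-only case avoids an allocation.
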
Additionally, `encoding_rs::mem` does the following:

* Checks if a byte buffer contains only ASCII.
* Checks if a potentially-invalid UTF-16 buffer contains only Basic Latin (ASCII).
* Checks if a valid UTF-8, potentially-invalid UTF-8 or potentially-invalid UTF-16
  buffer contains only Latin1 code points (below U+0100).
* Checks if a valid UTF-8, potentially-invalid UTF-8 or potentially-invalid UTF-16
  buffer or a code point or a UTF-16 code unit can trigger right-to-left behavior
  (suitable for checking if the Unicode Bidirectional Algorithm can be optimized
  out).
* Combined versions of the above two checks.
* Converts valid UTF-8, potentially-invalid UTF-8 and Latin1 to UTF-16.
* Converts potentially-invalid UTF-16 and Latin1 to UTF-8.
* Converts UTF-8 and UTF-16 to Latin1 (if in range).
* Finds the first invalid code unit in a buffer of potentially-invalid UTF-16.
* Makes a mutable buffer of potentially-invalid UTF-16 contain valid UTF-16.
* Copies ASCII from one buffer to another up to the first non-ASCII byte.
* Converts ASCII to UTF-16 up to the first non-ASCII byte.
* Converts UTF-16 to ASCII up to the first non-Basic Latin code unit.

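To make the two UTF-16 validity operations in the list above concrete, here
is a minimal standard-library-only sketch. The names
`sketch_utf16_valid_up_to` and `sketch_ensure_utf16_validity` are
hypothetical, and the code shows only the expected semantics (assuming
invalid units are replaced with U+FFFD), not how the `mem` module is
actually implemented.

```rust
/// Hypothetical helper: returns the index of the first unpaired surrogate,
/// or the buffer length if the whole buffer is valid UTF-16.
fn sketch_utf16_valid_up_to(buffer: &[u16]) -> usize {
    let mut i = 0;
    while i < buffer.len() {
        let unit = buffer[i];
        if (0xD800..=0xDBFF).contains(&unit) {
            // A high surrogate must be followed by a low surrogate.
            match buffer.get(i + 1) {
                Some(next) if (0xDC00..=0xDFFF).contains(next) => i += 2,
                _ => return i,
            }
        } else if (0xDC00..=0xDFFF).contains(&unit) {
            // A low surrogate without a preceding high surrogate.
            return i;
        } else {
            i += 1;
        }
    }
    buffer.len()
}

/// Hypothetical helper: overwrites every unpaired surrogate with U+FFFD so
/// that the buffer contains valid UTF-16 afterwards.
fn sketch_ensure_utf16_validity(buffer: &mut [u16]) {
    let mut offset = 0;
    loop {
        offset += sketch_utf16_valid_up_to(&buffer[offset..]);
        if offset == buffer.len() {
            return;
        }
        buffer[offset] = 0xFFFD;
        offset += 1;
    }
}
```
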
## Integration with `std::io`

Notably, the above feature list doesn't include the capability to wrap
a `std::io::Read`, decode it into UTF-8 and present the result via
`std::io::Read`. The [`encoding_rs_io`](https://crates.io/crates/encoding_rs_io)
crate provides that capability.

## Decoding Email

For decoding character encodings that occur in email, use the
[`charset`](https://crates.io/crates/charset) crate instead of using this
one directly. (It wraps this crate and adds UTF-7 decoding.)

## Windows Code Page Identifier Mappings

For mappings to and from Windows code page identifiers, use the
[`codepage`](https://crates.io/crates/codepage) crate.

## Preparing Text for the Encoders

Normalizing text into Unicode Normalization Form C prior to encoding text into
a legacy encoding minimizes unmappable characters. Text can be normalized to
Unicode Normalization Form C using the
[`unic-normal`](https://crates.io/crates/unic-normal) crate.

The exception is windows-1258, which after normalizing to Unicode Normalization
Form C requires tone marks to be decomposed in order to minimize unmappable
characters. Vietnamese tone marks can be decomposed using the
[`detone`](https://crates.io/crates/detone) crate.

## Licensing

Please see the file named
[COPYRIGHT](https://github.com/hsivonen/encoding_rs/blob/master/COPYRIGHT).

## Documentation

Generated [API documentation](https://docs.rs/encoding_rs/) is available
online.

There is a [long-form write-up](https://hsivonen.fi/encoding_rs/) about the
design and internals of the crate.

## C and C++ bindings

An FFI layer for encoding_rs is available as a
[separate crate](https://github.com/hsivonen/encoding_c). The crate comes
with a [demo C++ wrapper](https://github.com/hsivonen/encoding_c/blob/master/include/encoding_rs_cpp.h)
using the C++ standard library and [GSL](https://github.com/Microsoft/GSL/) types.

The bindings for the `mem` module are in the
[encoding_c_mem crate](https://github.com/hsivonen/encoding_c_mem).

For the Gecko context, there's a
[C++ wrapper using the MFBT/XPCOM types](https://searchfox.org/mozilla-central/source/intl/Encoding.h#100).

There's a [write-up](https://hsivonen.fi/modern-cpp-in-rust/) about the C++
wrappers.

## Sample programs

* [Rust](https://github.com/hsivonen/recode_rs)
* [C](https://github.com/hsivonen/recode_c)
* [C++](https://github.com/hsivonen/recode_cpp)

## Optional features

There are currently these optional cargo features:

### `simd-accel`

Enables SIMD acceleration using the nightly-dependent `packed_simd` crate.

This is an opt-in feature, because enabling this feature _opts out_ of Rust's
guarantees of future compilers compiling old code (a.k.a. the "stability story").

Currently, this has not been tested to be an improvement except for these
targets:

* x86_64
* i686
* aarch64
* thumbv7neon

If you use nightly Rust, you use targets whose first component is one of the
above, and you are prepared _to have to revise your configuration when updating
Rust_, you should enable this feature. Otherwise, please _do not_ enable this
feature.

_Note!_ If you are compiling for a target that does not have 128-bit SIMD
enabled as part of the target definition and you are enabling 128-bit SIMD
using `-C target_feature`, you need to enable the `core_arch` Cargo feature
for `packed_simd` to compile a crates.io snapshot of `core_arch` instead of
using the standard-library copy of `core::arch`, because the `core::arch`
module of the pre-compiled standard library has been compiled with the
assumption that the CPU doesn't have 128-bit SIMD. At present this applies
mainly to 32-bit ARM targets whose first component does not include the
substring `neon`.

The encoding_rs side of things has not been properly set up for POWER,
PowerPC, MIPS, etc., SIMD at this time, so even if you were to follow
the advice from the previous paragraph, you probably shouldn't use
the `simd-accel` option on the less mainstream architectures at this
time.

Used by Firefox.

### `serde`

Enables support for serializing and deserializing `&'static Encoding`-typed
struct fields using [Serde][1].

[1]: https://serde.rs/

Not used by Firefox.

### `fast-legacy-encode`

A catch-all option for enabling the fastest legacy encode options. _Does not
affect decode speed or UTF-8 encode speed._

At present, this option is equivalent to enabling the following options:
 * `fast-hangul-encode`
 * `fast-hanja-encode`
 * `fast-kanji-encode`
 * `fast-gb-hanzi-encode`
 * `fast-big5-hanzi-encode`

Adds 176 KB to the binary size.

Not used by Firefox.

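As a minimal sketch, assuming a dependency on the 0.8 release series covered
by the release notes in this README, a `Cargo.toml` entry that opts into the
catch-all feature could look like the following. The same `features` list is
where any of the other optional features named in this section would go.

```toml
[dependencies]
encoding_rs = { version = "0.8", features = ["fast-legacy-encode"] }
```
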
### `fast-hangul-encode`

Changes encoding precomposed Hangul syllables into EUC-KR from binary
search over the decode-optimized tables to lookup by index making Korean
plain-text encode about 4 times as fast as without this option.

Adds 20 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-hanja-encode`

Changes encoding of Hanja into EUC-KR from linear search over the
decode-optimized table to lookup by index. Since Hanja is practically absent
in modern Korean text, this option doesn't affect performance in the common
case and mainly makes sense if you want to make your application resilient
against denial of service by someone intentionally feeding it a lot of Hanja
to encode into EUC-KR.

Adds 40 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-kanji-encode`

Changes encoding of Kanji into Shift_JIS, EUC-JP and ISO-2022-JP from linear
search over the decode-optimized tables to lookup by index making Japanese
plain-text encode to legacy encodings 30 to 50 times as fast as without this
option (about 2 times as fast as with `less-slow-kanji-encode`).

Takes precedence over `less-slow-kanji-encode`.

Adds 36 KB to the binary size (24 KB compared to `less-slow-kanji-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-kanji-encode`

Makes JIS X 0208 Level 1 Kanji (the most common Kanji in Shift_JIS, EUC-JP and
ISO-2022-JP) encode less slow (binary search instead of linear search) making
Japanese plain-text encode to legacy encodings 14 to 23 times as fast as
without this option.

Adds 12 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-gb-hanzi-encode`

Changes encoding of Hanzi in the CJK Unified Ideographs block into GBK and
gb18030 from linear search over a part of the decode-optimized tables followed
by a binary search over another part of the decode-optimized tables to lookup
by index making Simplified Chinese plain-text encode to the legacy encodings
100 to 110 times as fast as without this option (about 2.5 times as fast as
with `less-slow-gb-hanzi-encode`).

Takes precedence over `less-slow-gb-hanzi-encode`.

Adds 36 KB to the binary size (24 KB compared to `less-slow-gb-hanzi-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-gb-hanzi-encode`

Makes GB2312 Level 1 Hanzi (the most common Hanzi in gb18030 and GBK) encode
less slow (binary search instead of linear search) making Simplified Chinese
plain-text encode to the legacy encodings about 40 times as fast as without
this option.

Adds 12 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-big5-hanzi-encode`

Changes encoding of Hanzi in the CJK Unified Ideographs block into Big5 from
linear search over a part of the decode-optimized tables to lookup by index
making Traditional Chinese plain-text encode to Big5 105 to 125 times as fast
as without this option (about 3 times as fast as with
`less-slow-big5-hanzi-encode`).

Takes precedence over `less-slow-big5-hanzi-encode`.

Adds 40 KB to the binary size (20 KB compared to `less-slow-big5-hanzi-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-big5-hanzi-encode`

Makes Big5 Level 1 Hanzi (the most common Hanzi in Big5) encode less slow
(binary search instead of linear search) making Traditional Chinese
plain-text encode to Big5 about 36 times as fast as without this option.

Adds 20 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

## Performance goals

For decoding to UTF-16, the goal is to perform at least as well as Gecko's old
uconv. For decoding to UTF-8, the goal is to perform at least as well as
rust-encoding. These goals have been achieved.

Encoding to UTF-8 should be fast. (UTF-8 to UTF-8 encode should be equivalent
to `memcpy` and UTF-16 to UTF-8 should be fast.)

Speed is a non-goal when encoding to legacy encodings. By default, encoding to
legacy encodings should not be optimized for speed at the expense of code size
as long as form submission and URL parsing in Gecko don't become noticeably
too slow in real-world use.

In the interest of binary size, by default, encoding_rs does not have
encode-specific data tables beyond 32 bits of encode-specific data for each
single-byte encoding. Therefore, encoders search the decode-optimized data
tables. This is a linear search in most cases. As a result, by default, encode
to legacy encodings varies from slow to extremely slow relative to other
libraries. Still, with realistic workloads, this seemed fast enough not to be
user-visibly slow on Raspberry Pi 3 (which stood in for a phone for testing)
in the Web-exposed encoder use cases.

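The size-versus-speed trade-off described above can be sketched with a toy
table. Everything in the following block is an assumption made for
illustration: `TOY_DECODE_TABLE` is not real encoding data, and the function
names are hypothetical. The sketch contrasts a linear search over a
decode-optimized table, which needs no extra data, with a binary search over
a separate encode-specific table sorted by code point, which costs additional
memory.

```rust
/// Toy decode-optimized table (NOT real encoding data): the position is the
/// "pointer" within a legacy encoding and the value is the code point.
const TOY_DECODE_TABLE: [char; 5] =
    ['\u{4E09}', '\u{4E00}', '\u{56DB}', '\u{4E8C}', '\u{4E94}'];

/// No encode-specific data: linear search over the decode table.
fn encode_linear(c: char) -> Option<usize> {
    TOY_DECODE_TABLE.iter().position(|&entry| entry == c)
}

/// Encode-specific data: (code point, pointer) pairs sorted by code point.
fn build_encode_table() -> Vec<(char, usize)> {
    let mut pairs: Vec<(char, usize)> = TOY_DECODE_TABLE
        .iter()
        .enumerate()
        .map(|(pointer, &c)| (c, pointer))
        .collect();
    pairs.sort();
    pairs
}

/// Binary search over the encode-specific table.
fn encode_binary(table: &[(char, usize)], c: char) -> Option<usize> {
    table
        .binary_search_by_key(&c, |&(entry, _)| entry)
        .ok()
        .map(|i| table[i].1)
}
```

Both functions return the same pointer for every mappable character; they
differ only in lookup cost and in the memory spent on the extra table.
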
See the cargo features above for optionally making CJK legacy encode fast.

A framework for measuring performance is [available separately][2].

[2]: https://github.com/hsivonen/encoding_bench/

## Rust Version Compatibility

It is a goal to support the latest stable Rust, the latest nightly Rust and
the version of Rust that's used for Firefox Nightly (currently 1.29.0).
These are tested on Travis.

Additionally, beta and the oldest known to work Rust version (currently
1.29.0) are tested on Travis. The oldest Rust known to work is tested as
a canary so that when the oldest known to work no longer works, the change
can be documented here. At this time, there is no firm commitment to support
a version older than what's required by Firefox. The oldest supported Rust
is expected to move forward rapidly when `packed_simd` can replace the `simd`
crate without performance regression.

## Compatibility with rust-encoding

A compatibility layer that implements the rust-encoding API on top of
encoding_rs is
[provided as a separate crate](https://github.com/hsivonen/encoding_rs_compat)
(cannot be uploaded to crates.io). The compatibility layer was originally
written with the assumption that Firefox would need it, but it is not currently
used in Firefox.

## Regenerating Generated Code

To regenerate the generated code:

 * Have Python 2 installed.
 * Clone [`https://github.com/hsivonen/encoding_c`](https://github.com/hsivonen/encoding_c)
   next to the `encoding_rs` directory.
 * Clone [`https://github.com/hsivonen/codepage`](https://github.com/hsivonen/codepage)
   next to the `encoding_rs` directory.
 * Clone [`https://github.com/whatwg/encoding`](https://github.com/whatwg/encoding)
   next to the `encoding_rs` directory.
 * Checkout revision `f381389` of the `encoding` repo.
 * With the `encoding_rs` directory as the working directory, run
   `python generate-encoding-data.py`.

## Roadmap

- [x] Design the low-level API.
- [x] Provide Rust-only convenience features.
- [x] Provide an stl/gsl-flavored C++ API.
- [x] Implement all decoders and encoders.
- [x] Add unit tests for all decoders and encoders.
- [x] Finish BOM sniffing variants in Rust-only convenience features.
- [x] Document the API.
- [x] Publish the crate on crates.io.
- [x] Create a solution for measuring performance.
- [x] Accelerate ASCII conversions using SSE2 on x86.
- [x] Accelerate ASCII conversions using ALU register-sized operations on
      non-x86 architectures (process an `usize` instead of `u8` at a time).
- [x] Split FFI into a separate crate so that the FFI doesn't interfere with
      LTO in pure-Rust usage.
- [x] Compress CJK indices by making use of sequential code points as well
      as Unicode-ordered parts of indices.
- [x] Make lookups by label or name use binary search that searches from the
      end of the label/name to the start.
- [x] Make labels with non-ASCII bytes fail fast.
- [ ] ~Parallelize UTF-8 validation using [Rayon](https://github.com/nikomatsakis/rayon).~
      (This turned out to be a pessimization in the ASCII case due to memory bandwidth reasons.)
- [x] Provide an XPCOM/MFBT-flavored C++ API.
- [x] Investigate accelerating single-byte encode with a single fast-tracked
      range per encoding.
- [x] Replace uconv with encoding_rs in Gecko.
- [x] Implement the rust-encoding API in terms of encoding_rs.
- [x] Add SIMD acceleration for Aarch64.
- [x] Investigate the use of NEON on 32-bit ARM.
- [ ] ~Investigate Björn Höhrmann's lookup table acceleration for UTF-8 as
      adapted to Rust in rust-encoding.~
- [x] Add actually fast CJK encode options.
- [ ] ~Investigate [Bob Steagall's lookup table acceleration for UTF-8](https://github.com/BobSteagall/CppNow2018/blob/master/FastConversionFromUTF-8/Fast%20Conversion%20From%20UTF-8%20with%20C%2B%2B%2C%20DFAs%2C%20and%20SSE%20Intrinsics%20-%20Bob%20Steagall%20-%20C%2B%2BNow%202018.pdf).~

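The roadmap item about processing a `usize` instead of a `u8` at a time can
be illustrated as follows. This is a minimal safe-Rust sketch under stated
assumptions: the names `WORD_SIZE`, `HIGH_BITS` and `sketch_is_ascii` are
hypothetical, and the code does not attempt the pointer alignment handling
that a tuned implementation would need.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Number of bytes in one ALU word on the target.
const WORD_SIZE: usize = size_of::<usize>();

/// The byte 0x80 repeated in every byte of a word (0x8080...80).
const HIGH_BITS: usize = usize::MAX / 0xFF * 0x80;

/// Hypothetical helper: checks a buffer for ASCII one word at a time.
/// A byte is non-ASCII exactly when its high bit is set, so a single AND
/// tests all bytes of a word at once.
fn sketch_is_ascii(bytes: &[u8]) -> bool {
    let mut words = bytes.chunks_exact(WORD_SIZE);
    for chunk in &mut words {
        let word = usize::from_ne_bytes(chunk.try_into().expect("chunk is one word long"));
        if word & HIGH_BITS != 0 {
            return false;
        }
    }
    // Handle the tail that does not fill a whole word byte by byte.
    words.remainder().iter().all(|&b| b < 0x80)
}
```
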
## Release Notes

### 0.8.22

* Formatting fix and new unit test. (No features or bug fixes.)

### 0.8.21

* Fixed a panic with invalid UTF-16[BE|LE] input at the end of the stream.

### 0.8.20

* Make `Decoder::latin1_byte_compatible_up_to` return `None` in more
  cases to make the method actually useful. While this could be argued
  to be a breaking change due to the bug fix changing semantics, it does
  not break callers that had to handle the `None` case in a reasonable
  way anyway.

### 0.8.19

* Removed a bunch of bound checks in `convert_str_to_utf16`.
* Added `mem::convert_utf8_to_utf16_without_replacement`.

### 0.8.18

* Added `mem::utf8_latin1_up_to` and `mem::str_latin1_up_to`.
* Added `Decoder::latin1_byte_compatible_up_to`.

### 0.8.17

* Update `bincode` (dev dependency) version requirement to 1.0.

### 0.8.16

* Switch from the `simd` crate to `packed_simd`.

### 0.8.15

* Adjust documentation for `simd-accel` (README-only release).

### 0.8.14

* Made UTF-16 to UTF-8 encode conversion fill the output buffer as
  closely as possible.

### 0.8.13

* Made the UTF-8 to UTF-16 decoder compare the number of code units written
  with the length of the right slice (the output slice) to fix a panic
  introduced in 0.8.11.

### 0.8.12

* Removed the `clippy::` prefix from clippy lint names.

### 0.8.11

* Changed minimum Rust requirement to 1.29.0 (for the ability to refer
  to the interior of a `static` when defining another `static`).
* Explicitly aligned the lookup tables for single-byte encodings and
  UTF-8 to cache lines in the hope of freeing up one cache line for
  other data. (Perhaps the tables were already aligned and this is
  placebo.)
* Added 32 bits of encode-oriented data for each single-byte encoding.
  The change was performance-neutral for non-Latin1-ish Latin legacy
  encodings, improved Latin1-ish and Arabic legacy encode speed
  somewhat (new speed is 2.4x the old speed for German, 2.3x for
  Arabic, 1.7x for Portuguese and 1.4x for French) and improved
  non-Latin1, non-Arabic legacy single-byte encode a lot (7.2x for
  Thai, 6x for Greek, 5x for Russian, 4x for Hebrew).
* Added compile-time options for fast CJK legacy encode options (at
  the cost of binary size (up to 176 KB) and run-time memory usage).
  These options still retain the overall code structure instead of
  rewriting the CJK encoders totally, so the speed isn't as good as
  what could be achieved by using even more memory / making the
  binary even larger.
* Made UTF-8 decode and validation faster.
* Added method `is_single_byte()` on `Encoding`.
* Added `mem::decode_latin1()` and `mem::encode_latin1_lossy()`.

### 0.8.10

* Disabled a unit test that tests a panic condition when the assertion
  being tested is disabled.

### 0.8.9

* Made `--features simd-accel` work with stable-channel compiler to
  simplify the Firefox build system.

### 0.8.8

* Made the `is_foo_bidi()` functions not treat U+FEFF (ZERO WIDTH NO-BREAK
  SPACE a.k.a. BYTE ORDER MARK) as right-to-left.
* Made the `is_foo_bidi()` functions report `true` if the input contains
  Hebrew presentation forms (which are right-to-left but not in a
  right-to-left-roadmapped block).

### 0.8.7

* Fixed a panic in the UTF-16LE/UTF-16BE decoder when decoding to UTF-8.

### 0.8.6

* Temporarily removed the debug assertion added in version 0.8.5 from
  `convert_utf16_to_latin1_lossy`.

### 0.8.5

* If debug assertions are enabled but fuzzing isn't enabled, lossy conversions
  to Latin1 in the `mem` module assert that the input is in the range
  U+0000...U+00FF (inclusive).
* In the `mem` module provide conversions from Latin1 and UTF-16 to UTF-8
  that can deal with insufficient output space. The idea is to use them
  first with an allocation rounded up to jemalloc bucket size and do the
  worst-case allocation only if the jemalloc rounding up was insufficient
  as the first guess.

### 0.8.4

* Fix SSE2-specific, `simd-accel`-specific memory corruption introduced in
  version 0.8.1 in conversions between UTF-16 and Latin1 in the `mem` module.

### 0.8.3

* Removed an `#[inline(never)]` annotation that was not meant for release.

### 0.8.2

* Made non-ASCII UTF-16 to UTF-8 encode faster by manually omitting bound
  checks and manually adding branch prediction annotations.

### 0.8.1

* Tweaked loop unrolling and memory alignment for SSE2 conversions between
  UTF-16 and Latin1 in the `mem` module to increase the performance when
  converting long buffers.

### 0.8.0

* Changed the minimum supported version of Rust to 1.21.0 (semver breaking
  change).
* Flipped around the defaults vs. optional features for controlling the size
  vs. speed trade-off for Kanji and Hanzi legacy encode (semver breaking
  change).
* Added NEON support on ARMv7.
* SIMD-accelerated x-user-defined to UTF-16 decode.
* Made UTF-16LE and UTF-16BE decode a lot faster (including SIMD
  acceleration).

### 0.7.2

* Add the `mem` module.
* Refactor SIMD code which can affect performance outside the `mem`
  module.

### 0.7.1

* When encoding from invalid UTF-16, correctly handle U+DC00 followed by
  another low surrogate.

### 0.7.0

* [Make `replacement` a label of the replacement
  encoding.](https://github.com/whatwg/encoding/issues/70) (Spec change.)
* Remove `Encoding::for_name()`. (`Encoding::for_label(foo).unwrap()` is
  now close enough after the above label change.)
* Remove the `parallel-utf8` cargo feature.
* Add optional Serde support for `&'static Encoding`.
* Performance tweaks for ASCII handling.
* Performance tweaks for UTF-8 validation.
* SIMD support on aarch64.

### 0.6.11

* Make `Encoder::has_pending_state()` public.
* Update the `simd` crate dependency to 0.2.0.

### 0.6.10

* Reserve enough space for NCRs when encoding to ISO-2022-JP.
* Correct max length calculations for multibyte decoders.
* Correct max length calculations before BOM sniffing has been
  performed.
* Correctly calculate max length when encoding from UTF-16 to GBK.

### 0.6.9

* [Don't prepend anything when gb18030 range decode
  fails](https://github.com/whatwg/encoding/issues/110). (Spec change.)

### 0.6.8

* Correctly handle the case where the first buffer contains potentially
  partial BOM and the next buffer is the last buffer.
* Decode byte `7F` correctly in ISO-2022-JP.
* Make UTF-16 to UTF-8 encode write closer to the end of the buffer.
* Implement `Hash` for `Encoding`.

### 0.6.7

* [Map half-width katakana to full-width katakana in ISO-2022-JP
  encoder](https://github.com/whatwg/encoding/issues/105). (Spec change.)
* Give `InputEmpty` correct precedence over `OutputFull` when encoding
  with replacement and the output buffer passed in is too short or the
  remaining space in the output buffer is too small after a replacement.

### 0.6.6

* Correct max length calculation when a partial BOM prefix is part of
  the decoder's state.

### 0.6.5

* Correct max length calculation in various encoders.
* Correct max length calculation in the UTF-16 decoder.
* Derive `PartialEq` and `Eq` for the `CoderResult`, `DecoderResult`
  and `EncoderResult` types.

### 0.6.4

* Avoid panic when encoding with replacement and the destination buffer is
  too short to hold one numeric character reference.

### 0.6.3

* Add support for 32-bit big-endian hosts. (For real this time.)

### 0.6.2

* Fix a panic from subslicing with bad indices in
  `Encoder::encode_from_utf16`. (Due to an oversight, it lacked the fix that
  `Encoder::encode_from_utf8` already had.)
* Micro-optimize error status accumulation in non-streaming case.

### 0.6.1

* Avoid panic near integer overflow in a case that's unlikely to actually
  happen.
* Address Clippy lints.

### 0.6.0

* Make the methods for computing worst-case buffer size requirements check
  for integer overflow.
* Upgrade rayon to 0.7.0.

### 0.5.1

* Reorder methods for better documentation readability.
* Add support for big-endian hosts. (Only 64-bit case actually tested.)
* Optimize the ALU (non-SIMD) case for 32-bit ARM instead of x86_64.

### 0.5.0

* Avoid allocating excessively long buffers in non-streaming decode.
* Fix the behavior of ISO-2022-JP and replacement decoders near the end of the
  output buffer.
* Annotate the result structs with `#[must_use]`.

### 0.4.0

* Split FFI into a separate crate.
* Performance tweaks.
* CJK binary size and encoding performance changes.
* Parallelize UTF-8 validation in the case of long buffers (with optional
  feature `parallel-utf8`).
* Borrow even with ISO-2022-JP when possible.

### 0.3.2

* Fix moving pointers to alignment in ALU-based ASCII acceleration.
* Fix errors in documentation and improve documentation.

### 0.3.1

* Fix UTF-8 to UTF-16 decode for byte sequences beginning with 0xEE.
* Make UTF-8 to UTF-8 decode SSE2-accelerated when feature `simd-accel` is used.
* When decoding and encoding ASCII-only input from or to an ASCII-compatible
  encoding using the non-streaming API, return a borrow of the input.
* Make encode from UTF-16 to UTF-8 faster.

### 0.3

* Change the references to the instances of `Encoding` from `const` to `static`
  to make the referents unique across crates that use the references.
* Introduce non-reference-typed `FOO_INIT` instances of `Encoding` to allow
  foreign crates to initialize `static` arrays with references to `Encoding`
  instances even under Rust's constraints that prohibit the initialization of
  `&'static Encoding`-typed array items with `&'static Encoding`-typed
  `statics`.
* Document that the above two points will be reverted if Rust changes `const`
  to work so that cross-crate usage keeps the referents unique.
* Return `Cow`s from Rust-only non-streaming methods for encode and decode.
* `Encoding::for_bom()` returns the length of the BOM.
* ASCII-accelerated conversions for encodings other than UTF-16LE, UTF-16BE,
  ISO-2022-JP and x-user-defined.
* Add SSE2 acceleration behind the `simd-accel` feature flag. (Requires
  nightly Rust.)
* Fix panic with long bogus labels.
* Map [0xCA to U+05BA in windows-1255](https://github.com/whatwg/encoding/issues/73).
  (Spec change.)
* Correct the [end of the Shift_JIS EUDC range](https://github.com/whatwg/encoding/issues/53).
  (Spec change.)

### 0.2.4

* Polish FFI documentation.

### 0.2.3

* Fix UTF-16 to UTF-8 encode.

### 0.2.2

* Add `Encoder.encode_from_utf8_to_vec_without_replacement()`.

### 0.2.1

* Add `Encoding.is_ascii_compatible()`.

* Add `Encoding::for_bom()`.

* Make `==` for `Encoding` use name comparison instead of pointer comparison,
  because uses of the encoding constants in different crates result in
  different addresses and the constant cannot be turned into statics without
  breaking other things.

### 0.2.0

The initial release.