1# encoding_rs
2
3[![Build Status](https://travis-ci.org/hsivonen/encoding_rs.svg?branch=master)](https://travis-ci.org/hsivonen/encoding_rs)
4[![crates.io](https://meritbadge.herokuapp.com/encoding_rs)](https://crates.io/crates/encoding_rs)
5[![docs.rs](https://docs.rs/encoding_rs/badge.svg)](https://docs.rs/encoding_rs/)
6[![Apache 2 / MIT dual-licensed](https://img.shields.io/badge/license-Apache%202%20%2F%20MIT-blue.svg)](https://github.com/hsivonen/encoding_rs/blob/master/COPYRIGHT)
7
encoding_rs is an implementation of (the non-JavaScript parts of) the
[Encoding Standard](https://encoding.spec.whatwg.org/) written in Rust and
used in Gecko (starting with Firefox 56).
11
12Additionally, the `mem` module provides various operations for dealing with
13in-RAM text (as opposed to data that's coming from or going to an IO boundary).
The `mem` module is a module instead of a separate crate because sharing
internal implementation details with the rest of the crate is more efficient.
16
17## Functionality
18
19Due to the Gecko use case, encoding_rs supports decoding to and encoding from
20UTF-16 in addition to supporting the usual Rust use case of decoding to and
21encoding from UTF-8. Additionally, the API has been designed to be FFI-friendly
22to accommodate the C++ side of Gecko.
23
24Specifically, encoding_rs does the following:
25
26* Decodes a stream of bytes in an Encoding Standard-defined character encoding
27  into valid aligned native-endian in-RAM UTF-16 (units of `u16` / `char16_t`).
28* Encodes a stream of potentially-invalid aligned native-endian in-RAM UTF-16
29  (units of `u16` / `char16_t`) into a sequence of bytes in an Encoding
30  Standard-defined character encoding as if the lone surrogates had been
31  replaced with the REPLACEMENT CHARACTER before performing the encode.
32  (Gecko's UTF-16 is potentially invalid.)
33* Decodes a stream of bytes in an Encoding Standard-defined character
34  encoding into valid UTF-8.
35* Encodes a stream of valid UTF-8 into a sequence of bytes in an Encoding
36  Standard-defined character encoding. (Rust's UTF-8 is guaranteed-valid.)
37* Does the above in streaming (input and output split across multiple
38  buffers) and non-streaming (whole input in a single buffer and whole
39  output in a single buffer) variants.
40* Avoids copying (borrows) when possible in the non-streaming cases when
41  decoding to or encoding from UTF-8.
* Resolves textual labels that identify character encodings in
  protocol text into type-safe objects representing those encodings
  conceptually.
45* Maps the type-safe encoding objects onto strings suitable for
46  returning from `document.characterSet`.
47* Validates UTF-8 (in common instruction set scenarios a bit faster for Web
48  workloads than the standard library; hopefully will get upstreamed some
49  day) and ASCII.
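
To make the label-resolution item above concrete, here is a minimal sketch that uses only the standard library. It is not the crate's implementation: the function name `encoding_name_for_label` and the `LABELS` table are hypothetical, and the table is a tiny illustrative subset of the labels defined by the Encoding Standard. The sketch assumes the Encoding Standard's rule of stripping leading and trailing ASCII whitespace and matching ASCII case-insensitively.

```rust
/// A tiny, illustrative subset of the Encoding Standard's label table.
/// Each entry maps a lowercase label to the name that would be returned
/// from `document.characterSet`. (Hypothetical table for this sketch only.)
const LABELS: &[(&str, &str)] = &[
    ("utf8", "UTF-8"),
    ("utf-8", "UTF-8"),
    ("latin1", "windows-1252"),
    ("iso-8859-1", "windows-1252"),
    ("ascii", "windows-1252"),
    ("sjis", "Shift_JIS"),
    ("shift_jis", "Shift_JIS"),
];

/// Resolves a label from protocol text to an encoding name.
/// Leading and trailing ASCII whitespace (TAB, LF, FF, CR, SPACE) is
/// ignored and the comparison is ASCII case-insensitive.
fn encoding_name_for_label(label: &str) -> Option<&'static str> {
    let trimmed =
        label.trim_matches(|c: char| matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' '));
    LABELS
        .iter()
        .find(|(candidate, _)| candidate.eq_ignore_ascii_case(trimmed))
        .map(|&(_, name)| name)
}
```

With this sketch, `encoding_name_for_label(" Latin1\n")` yields `Some("windows-1252")`, which shows why a label and the name exposed to the Web can differ. The crate itself returns a type-safe encoding object rather than a string; see the API documentation for the actual interface.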
50
51Additionally, `encoding_rs::mem` does the following:
52
53* Checks if a byte buffer contains only ASCII.
54* Checks if a potentially-invalid UTF-16 buffer contains only Basic Latin (ASCII).
55* Checks if a valid UTF-8, potentially-invalid UTF-8 or potentially-invalid UTF-16
56  buffer contains only Latin1 code points (below U+0100).
57* Checks if a valid UTF-8, potentially-invalid UTF-8 or potentially-invalid UTF-16
58  buffer or a code point or a UTF-16 code unit can trigger right-to-left behavior
59  (suitable for checking if the Unicode Bidirectional Algorithm can be optimized
60  out).
61* Combined versions of the above two checks.
62* Converts valid UTF-8, potentially-invalid UTF-8 and Latin1 to UTF-16.
63* Converts potentially-invalid UTF-16 and Latin1 to UTF-8.
64* Converts UTF-8 and UTF-16 to Latin1 (if in range).
65* Finds the first invalid code unit in a buffer of potentially-invalid UTF-16.
* Makes a mutable buffer of potentially-invalid UTF-16 contain valid UTF-16.
67* Copies ASCII from one buffer to another up to the first non-ASCII byte.
68* Converts ASCII to UTF-16 up to the first non-ASCII byte.
69* Converts UTF-16 to ASCII up to the first non-Basic Latin code unit.
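
As an illustration of what "potentially-invalid UTF-16" means in the list above, the following standard-library-only sketch finds the first lone surrogate in a buffer and repairs a buffer in place by overwriting lone surrogates with U+FFFD. The helper names `first_invalid_utf16` and `make_valid_utf16` are hypothetical and the code is a plain scalar loop, not the crate's optimized implementation.

```rust
/// Returns the index of the first lone surrogate in `buffer`, or
/// `buffer.len()` if the whole buffer is valid UTF-16.
fn first_invalid_utf16(buffer: &[u16]) -> usize {
    let mut i = 0;
    while i < buffer.len() {
        match buffer[i] {
            // A high surrogate is valid only when a low surrogate follows.
            0xD800..=0xDBFF => {
                if i + 1 < buffer.len() && (0xDC00..=0xDFFF).contains(&buffer[i + 1]) {
                    i += 2;
                } else {
                    return i;
                }
            }
            // A low surrogate that was not consumed as part of a pair is lone.
            0xDC00..=0xDFFF => return i,
            _ => i += 1,
        }
    }
    buffer.len()
}

/// Overwrites every lone surrogate in `buffer` with U+FFFD
/// (REPLACEMENT CHARACTER), leaving valid pairs untouched.
fn make_valid_utf16(buffer: &mut [u16]) {
    let mut offset = 0;
    loop {
        offset += first_invalid_utf16(&buffer[offset..]);
        if offset == buffer.len() {
            return;
        }
        buffer[offset] = 0xFFFD;
        offset += 1;
    }
}
```

For example, given the code units `[0x61, 0xD800, 0x62]`, `first_invalid_utf16` reports index 1, and after `make_valid_utf16` the buffer holds `[0x61, 0xFFFD, 0x62]`. This is the same replacement behavior that the encoders are described as applying to lone surrogates.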
70
71## Integration with `std::io`
72
Notably, the above feature list doesn't include the capability to wrap
a `std::io::Read`, decode it into UTF-8 and present the result via
`std::io::Read`. The [`encoding_rs_io`](https://crates.io/crates/encoding_rs_io)
crate provides that capability.
77
78## `no_std` Environment
79
The crate works in a `no_std` environment assuming that `alloc` is present.
The `alloc`-using parts are on the outer edge of the crate, so if there is
interest in using the crate in environments without `alloc`, it would be
feasible to add a way to turn off those parts of the API of this crate that
use `Vec`/`String`/`Cow`.
85
86## Decoding Email
87
88For decoding character encodings that occur in email, use the
89[`charset`](https://crates.io/crates/charset) crate instead of using this
90one directly. (It wraps this crate and adds UTF-7 decoding.)
91
92## Windows Code Page Identifier Mappings
93
94For mappings to and from Windows code page identifiers, use the
95[`codepage`](https://crates.io/crates/codepage) crate.
96
97## DOS Encodings
98
99This crate does not support single-byte DOS encodings that aren't required by
100the Web Platform, but the [`oem_cp`](https://crates.io/crates/oem_cp) crate does.
101
102## Preparing Text for the Encoders
103
104Normalizing text into Unicode Normalization Form C prior to encoding text into
105a legacy encoding minimizes unmappable characters. Text can be normalized to
106Unicode Normalization Form C using the
107[`unic-normal`](https://crates.io/crates/unic-normal) crate.
108
109The exception is windows-1258, which after normalizing to Unicode Normalization
110Form C requires tone marks to be decomposed in order to minimize unmappable
111characters. Vietnamese tone marks can be decomposed using the
112[`detone`](https://crates.io/crates/detone) crate.
113
114## Licensing
115
116Please see the file named
117[COPYRIGHT](https://github.com/hsivonen/encoding_rs/blob/master/COPYRIGHT).
118
119## Documentation
120
121Generated [API documentation](https://docs.rs/encoding_rs/) is available
122online.
123
124There is a [long-form write-up](https://hsivonen.fi/encoding_rs/) about the
125design and internals of the crate.
126
127## C and C++ bindings
128
129An FFI layer for encoding_rs is available as a
130[separate crate](https://github.com/hsivonen/encoding_c). The crate comes
131with a [demo C++ wrapper](https://github.com/hsivonen/encoding_c/blob/master/include/encoding_rs_cpp.h)
132using the C++ standard library and [GSL](https://github.com/Microsoft/GSL/) types.
133
134The bindings for the `mem` module are in the
135[encoding_c_mem crate](https://github.com/hsivonen/encoding_c_mem).
136
137For the Gecko context, there's a
138[C++ wrapper using the MFBT/XPCOM types](https://searchfox.org/mozilla-central/source/intl/Encoding.h#100).
139
140There's a [write-up](https://hsivonen.fi/modern-cpp-in-rust/) about the C++
141wrappers.
142
143## Sample programs
144
145* [Rust](https://github.com/hsivonen/recode_rs)
146* [C](https://github.com/hsivonen/recode_c)
147* [C++](https://github.com/hsivonen/recode_cpp)
148
149## Optional features
150
151There are currently these optional cargo features:
152
153### `simd-accel`
154
155Enables SIMD acceleration using the nightly-dependent `packed_simd_2` crate.
156
157This is an opt-in feature, because enabling this feature _opts out_ of Rust's
158guarantees of future compilers compiling old code (aka. "stability story").
159
160Currently, this has not been tested to be an improvement except for these
161targets:
162
163* x86_64
164* i686
165* aarch64
166* thumbv7neon
167
168If you use nightly Rust, you use targets whose first component is one of the
169above, and you are prepared _to have to revise your configuration when updating
170Rust_, you should enable this feature. Otherwise, please _do not_ enable this
171feature.
172
173_Note!_ If you are compiling for a target that does not have 128-bit SIMD
174enabled as part of the target definition and you are enabling 128-bit SIMD
175using `-C target_feature`, you need to enable the `core_arch` Cargo feature
176for `packed_simd_2` to compile a crates.io snapshot of `core_arch` instead of
177using the standard-library copy of `core::arch`, because the `core::arch`
178module of the pre-compiled standard library has been compiled with the
179assumption that the CPU doesn't have 128-bit SIMD. At present this applies
180mainly to 32-bit ARM targets whose first component does not include the
181substring `neon`.
182
183The encoding_rs side of things has not been properly set up for POWER,
184PowerPC, MIPS, etc., SIMD at this time, so even if you were to follow
185the advice from the previous paragraph, you probably shouldn't use
186the `simd-accel` option on the less mainstream architectures at this
187time.
188
189Used by Firefox.
190
191### `serde`
192
193Enables support for serializing and deserializing `&'static Encoding`-typed
194struct fields using [Serde][1].
195
196[1]: https://serde.rs/
197
198Not used by Firefox.
199
200### `fast-legacy-encode`
201
202A catch-all option for enabling the fastest legacy encode options. _Does not
203affect decode speed or UTF-8 encode speed._
204
205At present, this option is equivalent to enabling the following options:
206 * `fast-hangul-encode`
207 * `fast-hanja-encode`
208 * `fast-kanji-encode`
209 * `fast-gb-hanzi-encode`
210 * `fast-big5-hanzi-encode`
211
212Adds 176 KB to the binary size.
213
214Not used by Firefox.
215
216### `fast-hangul-encode`
217
218Changes encoding precomposed Hangul syllables into EUC-KR from binary
219search over the decode-optimized tables to lookup by index making Korean
220plain-text encode about 4 times as fast as without this option.
221
222Adds 20 KB to the binary size.
223
224Does _not_ affect decode speed.
225
226Not used by Firefox.
227
228### `fast-hanja-encode`
229
Changes encoding of Hanja into EUC-KR from linear search over the
decode-optimized table to lookup by index. Since Hanja is practically absent
in modern Korean text, this option doesn't affect performance in the common
case and mainly makes sense if you want to make your application resilient
against denial of service by someone intentionally feeding it a lot of Hanja
to encode into EUC-KR.
236
237Adds 40 KB to the binary size.
238
239Does _not_ affect decode speed.
240
241Not used by Firefox.
242
243### `fast-kanji-encode`
244
245Changes encoding of Kanji into Shift_JIS, EUC-JP and ISO-2022-JP from linear
246search over the decode-optimized tables to lookup by index making Japanese
247plain-text encode to legacy encodings 30 to 50 times as fast as without this
248option (about 2 times as fast as with `less-slow-kanji-encode`).
249
250Takes precedence over `less-slow-kanji-encode`.
251
252Adds 36 KB to the binary size (24 KB compared to `less-slow-kanji-encode`).
253
254Does _not_ affect decode speed.
255
256Not used by Firefox.
257
258### `less-slow-kanji-encode`
259
260Makes JIS X 0208 Level 1 Kanji (the most common Kanji in Shift_JIS, EUC-JP and
261ISO-2022-JP) encode less slow (binary search instead of linear search) making
262Japanese plain-text encode to legacy encodings 14 to 23 times as fast as
263without this option.
264
265Adds 12 KB to the binary size.
266
267Does _not_ affect decode speed.
268
269Not used by Firefox.
270
271### `fast-gb-hanzi-encode`
272
Changes encoding of Hanzi in the CJK Unified Ideographs block into GBK and
gb18030 from linear search over a part of the decode-optimized tables followed
by a binary search over another part of the decode-optimized tables to lookup
by index making Simplified Chinese plain-text encode to the legacy encodings
100 to 110 times as fast as without this option (about 2.5 times as fast as
with `less-slow-gb-hanzi-encode`).
279
280Takes precedence over `less-slow-gb-hanzi-encode`.
281
282Adds 36 KB to the binary size (24 KB compared to `less-slow-gb-hanzi-encode`).
283
284Does _not_ affect decode speed.
285
286Not used by Firefox.
287
288### `less-slow-gb-hanzi-encode`
289
290Makes GB2312 Level 1 Hanzi (the most common Hanzi in gb18030 and GBK) encode
291less slow (binary search instead of linear search) making Simplified Chinese
292plain-text encode to the legacy encodings about 40 times as fast as without
293this option.
294
295Adds 12 KB to the binary size.
296
297Does _not_ affect decode speed.
298
299Not used by Firefox.
300
301### `fast-big5-hanzi-encode`
302
Changes encoding of Hanzi in the CJK Unified Ideographs block into Big5 from
linear search over a part of the decode-optimized tables to lookup by index
making Traditional Chinese plain-text encode to Big5 105 to 125 times as fast
as without this option (about 3 times as fast as with
`less-slow-big5-hanzi-encode`).
308
309Takes precedence over `less-slow-big5-hanzi-encode`.
310
311Adds 40 KB to the binary size (20 KB compared to `less-slow-big5-hanzi-encode`).
312
313Does _not_ affect decode speed.
314
315Not used by Firefox.
316
317### `less-slow-big5-hanzi-encode`
318
319Makes Big5 Level 1 Hanzi (the most common Hanzi in Big5) encode less slow
320(binary search instead of linear search) making Traditional Chinese
321plain-text encode to Big5 about 36 times as fast as without this option.
322
323Adds 20 KB to the binary size.
324
325Does _not_ affect decode speed.
326
327Not used by Firefox.
328
329## Performance goals
330
331For decoding to UTF-16, the goal is to perform at least as well as Gecko's old
332uconv. For decoding to UTF-8, the goal is to perform at least as well as
333rust-encoding. These goals have been achieved.
334
335Encoding to UTF-8 should be fast. (UTF-8 to UTF-8 encode should be equivalent
336to `memcpy` and UTF-16 to UTF-8 should be fast.)
337
338Speed is a non-goal when encoding to legacy encodings. By default, encoding to
339legacy encodings should not be optimized for speed at the expense of code size
340as long as form submission and URL parsing in Gecko don't become noticeably
341too slow in real-world use.
342
343In the interest of binary size, by default, encoding_rs does not have
344encode-specific data tables beyond 32 bits of encode-specific data for each
345single-byte encoding. Therefore, encoders search the decode-optimized data
346tables. This is a linear search in most cases. As a result, by default, encode
347to legacy encodings varies from slow to extremely slow relative to other
libraries. Still, with realistic workloads, this seemed fast enough not to be
349user-visibly slow on Raspberry Pi 3 (which stood in for a phone for testing)
350in the Web-exposed encoder use cases.
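
The trade-off described above can be sketched with a toy table. The following is an illustration only, assuming a made-up six-entry decode table; the names `DECODE_TABLE`, `encode_linear`, `build_encode_table` and `encode_binary` are hypothetical and the crate's real tables and lookup code are laid out differently. The point is that a decode-optimized table is indexed by pointer, so encoding has to search it by value, whereas a separate encode-oriented table costs extra memory but allows a faster search.

```rust
/// Toy decode-optimized table: the index is the pointer within the legacy
/// encoding and the value is the code point. (Illustrative values only.)
const DECODE_TABLE: [u16; 6] = [0x4E9C, 0x5516, 0x5A03, 0x963F, 0x54C0, 0x611B];

/// Encode without any encode-specific data: linear search over the
/// decode-optimized table. No extra memory, O(n) per character.
fn encode_linear(code_point: u16) -> Option<usize> {
    DECODE_TABLE.iter().position(|&c| c == code_point)
}

/// Builds an encode-oriented table of (code point, pointer) pairs sorted by
/// code point. This duplicates data, which is the binary-size cost.
fn build_encode_table() -> Vec<(u16, usize)> {
    let mut table: Vec<(u16, usize)> = DECODE_TABLE
        .iter()
        .enumerate()
        .map(|(pointer, &code_point)| (code_point, pointer))
        .collect();
    table.sort_unstable();
    table
}

/// Encode using the encode-oriented table: binary search, O(log n).
fn encode_binary(table: &[(u16, usize)], code_point: u16) -> Option<usize> {
    table
        .binary_search_by_key(&code_point, |&(c, _)| c)
        .ok()
        .map(|i| table[i].1)
}
```

Both functions return the same pointer for the same code point; only the cost differs. In this sketch the sorted table is built at run time for brevity, whereas a real library would ship such a table as static data, which is where the additional binary size comes from.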
351
352See the cargo features above for optionally making CJK legacy encode fast.
353
354A framework for measuring performance is [available separately][2].
355
356[2]: https://github.com/hsivonen/encoding_bench/
357
358## Rust Version Compatibility
359
360It is a goal to support the latest stable Rust, the latest nightly Rust and
361the version of Rust that's used for Firefox Nightly.
362
363At this time, there is no firm commitment to support a version older than
364what's required by Firefox, and there is no commitment to treat MSRV changes
365as semver-breaking, because this crate depends on `cfg-if`, which doesn't
366appear to treat MSRV changes as semver-breaking, so it would be useless for
367this crate to treat MSRV changes as semver-breaking.
368
369As of 2021-02-04, MSRV appears to be Rust 1.36.0 for using the crate and
3701.42.0 for doc tests to pass without errors about the global allocator.
371
372## Compatibility with rust-encoding
373
374A compatibility layer that implements the rust-encoding API on top of
375encoding_rs is
376[provided as a separate crate](https://github.com/hsivonen/encoding_rs_compat)
(cannot be uploaded to crates.io). The compatibility layer was originally
written with the assumption that Firefox would need it, but it is not currently
used in Firefox.
380
381## Regenerating Generated Code
382
383To regenerate the generated code:
384
385 * Have Python 2 installed.
386 * Clone [`https://github.com/hsivonen/encoding_c`](https://github.com/hsivonen/encoding_c)
387   next to the `encoding_rs` directory.
388 * Clone [`https://github.com/hsivonen/codepage`](https://github.com/hsivonen/codepage)
389   next to the `encoding_rs` directory.
390 * Clone [`https://github.com/whatwg/encoding`](https://github.com/whatwg/encoding)
391   next to the `encoding_rs` directory.
 * Check out revision `f381389` of the `encoding` repo.
393 * With the `encoding_rs` directory as the working directory, run
394   `python generate-encoding-data.py`.
395
396## Roadmap
397
398- [x] Design the low-level API.
399- [x] Provide Rust-only convenience features.
400- [x] Provide an stl/gsl-flavored C++ API.
401- [x] Implement all decoders and encoders.
402- [x] Add unit tests for all decoders and encoders.
403- [x] Finish BOM sniffing variants in Rust-only convenience features.
404- [x] Document the API.
405- [x] Publish the crate on crates.io.
406- [x] Create a solution for measuring performance.
407- [x] Accelerate ASCII conversions using SSE2 on x86.
408- [x] Accelerate ASCII conversions using ALU register-sized operations on
409      non-x86 architectures (process an `usize` instead of `u8` at a time).
410- [x] Split FFI into a separate crate so that the FFI doesn't interfere with
411      LTO in pure-Rust usage.
412- [x] Compress CJK indices by making use of sequential code points as well
413      as Unicode-ordered parts of indices.
414- [x] Make lookups by label or name use binary search that searches from the
415      end of the label/name to the start.
416- [x] Make labels with non-ASCII bytes fail fast.
417- [ ] ~Parallelize UTF-8 validation using [Rayon](https://github.com/nikomatsakis/rayon).~
418      (This turned out to be a pessimization in the ASCII case due to memory bandwidth reasons.)
419- [x] Provide an XPCOM/MFBT-flavored C++ API.
420- [x] Investigate accelerating single-byte encode with a single fast-tracked
421      range per encoding.
422- [x] Replace uconv with encoding_rs in Gecko.
423- [x] Implement the rust-encoding API in terms of encoding_rs.
424- [x] Add SIMD acceleration for Aarch64.
425- [x] Investigate the use of NEON on 32-bit ARM.
426- [ ] ~Investigate Björn Höhrmann's lookup table acceleration for UTF-8 as
427      adapted to Rust in rust-encoding.~
428- [x] Add actually fast CJK encode options.
429- [ ] ~Investigate [Bob Steagall's lookup table acceleration for UTF-8](https://github.com/BobSteagall/CppNow2018/blob/master/FastConversionFromUTF-8/Fast%20Conversion%20From%20UTF-8%20with%20C%2B%2B%2C%20DFAs%2C%20and%20SSE%20Intrinsics%20-%20Bob%20Steagall%20-%20C%2B%2BNow%202018.pdf).~
430- [ ] Provide a build mode that works without `alloc` (with lesser API surface).
431- [ ] Migrate to `std::simd` once it is stable and declare 1.0.
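
The roadmap item about ALU register-sized operations refers to examining a whole `usize` per step instead of one `u8`. A minimal sketch of that idea for an ASCII check follows. It is written for clarity using only the standard library, the name `is_ascii_wordwise` is hypothetical, and it does not reflect the crate's actual code, which also deals with alignment and SIMD.

```rust
/// Checks whether `bytes` is all ASCII by testing one machine word at a
/// time: a byte is non-ASCII exactly when its most significant bit is set,
/// so a word contains a non-ASCII byte when it has any bit in common with
/// the mask 0x80 repeated in every byte position.
fn is_ascii_wordwise(bytes: &[u8]) -> bool {
    const WORD: usize = std::mem::size_of::<usize>();
    // 0x0101...01 multiplied by 0x80 gives 0x8080...80 for any word size.
    const HIGH_BITS: usize = usize::MAX / 0xFF * 0x80;

    let mut chunks = bytes.chunks_exact(WORD);
    for chunk in &mut chunks {
        let mut word = [0u8; WORD];
        word.copy_from_slice(chunk);
        // Byte order does not matter because the mask is the same in
        // every byte position.
        if usize::from_ne_bytes(word) & HIGH_BITS != 0 {
            return false;
        }
    }
    // Handle the tail that is shorter than one word byte by byte.
    chunks.remainder().iter().all(|b| b.is_ascii())
}
```

The result is the same as checking each byte individually; the word-sized loop simply performs fewer iterations on long inputs.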
432
433## Release Notes
434
435### 0.8.28
436
437* Fix error in Serde support introduced as part of `no_std` support.
438
439### 0.8.27
440
* Make the crate work in a `no_std` environment (with `alloc`).
442
443### 0.8.26
444
445* Fix oversights in edition 2018 migration that broke the `simd-accel` feature.
446
447### 0.8.25
448
449* Do pointer alignment checks in a way where intermediate steps aren't defined to be Undefined Behavior.
450* Update the `packed_simd` dependency to `packed_simd_2`.
451* Update the `cfg-if` dependency to 1.0.
452* Address warnings that have been introduced by newer Rust versions along the way.
453* Update to edition 2018, since even prior to 1.0 `cfg-if` updated to edition 2018 without a semver break.
454
455### 0.8.24
456
457* Avoid computing an intermediate (not dereferenced) pointer value in a manner designated as Undefined Behavior when computing pointer alignment.
458
459### 0.8.23
460
461* Remove year from copyright notices. (No features or bug fixes.)
462
463### 0.8.22
464
465* Formatting fix and new unit test. (No features or bug fixes.)
466
467### 0.8.21
468
469* Fixed a panic with invalid UTF-16[BE|LE] input at the end of the stream.
470
471### 0.8.20
472
473* Make `Decoder::latin1_byte_compatible_up_to` return `None` in more
474  cases to make the method actually useful. While this could be argued
475  to be a breaking change due to the bug fix changing semantics, it does
476  not break callers that had to handle the `None` case in a reasonable
477  way anyway.
478
479### 0.8.19
480
481* Removed a bunch of bound checks in `convert_str_to_utf16`.
482* Added `mem::convert_utf8_to_utf16_without_replacement`.
483
484### 0.8.18
485
486* Added `mem::utf8_latin1_up_to` and `mem::str_latin1_up_to`.
487* Added `Decoder::latin1_byte_compatible_up_to`.
488
489### 0.8.17
490
491* Update `bincode` (dev dependency) version requirement to 1.0.
492
493### 0.8.16
494
495* Switch from the `simd` crate to `packed_simd`.
496
497### 0.8.15
498
499* Adjust documentation for `simd-accel` (README-only release).
500
501### 0.8.14
502
503* Made UTF-16 to UTF-8 encode conversion fill the output buffer as
504  closely as possible.
505
506### 0.8.13
507
508* Made the UTF-8 to UTF-16 decoder compare the number of code units written
509  with the length of the right slice (the output slice) to fix a panic
510  introduced in 0.8.11.
511
512### 0.8.12
513
514* Removed the `clippy::` prefix from clippy lint names.
515
516### 0.8.11
517
518* Changed minimum Rust requirement to 1.29.0 (for the ability to refer
519  to the interior of a `static` when defining another `static`).
520* Explicitly aligned the lookup tables for single-byte encodings and
521  UTF-8 to cache lines in the hope of freeing up one cache line for
522  other data. (Perhaps the tables were already aligned and this is
523  placebo.)
524* Added 32 bits of encode-oriented data for each single-byte encoding.
525  The change was performance-neutral for non-Latin1-ish Latin legacy
526  encodings, improved Latin1-ish and Arabic legacy encode speed
527  somewhat (new speed is 2.4x the old speed for German, 2.3x for
528  Arabic, 1.7x for Portuguese and 1.4x for French) and improved
529  non-Latin1, non-Arabic legacy single-byte encode a lot (7.2x for
530  Thai, 6x for Greek, 5x for Russian, 4x for Hebrew).
531* Added compile-time options for fast CJK legacy encode options (at
532  the cost of binary size (up to 176 KB) and run-time memory usage).
533  These options still retain the overall code structure instead of
534  rewriting the CJK encoders totally, so the speed isn't as good as
  what could be achieved by using even more memory / making the
  binary even larger.
537* Made UTF-8 decode and validation faster.
538* Added method `is_single_byte()` on `Encoding`.
539* Added `mem::decode_latin1()` and `mem::encode_latin1_lossy()`.
540
541### 0.8.10
542
543* Disabled a unit test that tests a panic condition when the assertion
544  being tested is disabled.
545
546### 0.8.9
547
548* Made `--features simd-accel` work with stable-channel compiler to
549  simplify the Firefox build system.
550
551### 0.8.8
552
* Made the `is_foo_bidi()` functions not treat U+FEFF (ZERO WIDTH NO-BREAK SPACE
  aka. BYTE ORDER MARK) as right-to-left.
* Made the `is_foo_bidi()` functions report `true` if the input contains
  Hebrew presentation forms (which are right-to-left but not in a
  right-to-left-roadmapped block).
558
559### 0.8.7
560
561* Fixed a panic in the UTF-16LE/UTF-16BE decoder when decoding to UTF-8.
562
563### 0.8.6
564
565* Temporarily removed the debug assertion added in version 0.8.5 from
566  `convert_utf16_to_latin1_lossy`.
567
568### 0.8.5
569
570* If debug assertions are enabled but fuzzing isn't enabled, lossy conversions
571  to Latin1 in the `mem` module assert that the input is in the range
572  U+0000...U+00FF (inclusive).
573* In the `mem` module provide conversions from Latin1 and UTF-16 to UTF-8
574  that can deal with insufficient output space. The idea is to use them
575  first with an allocation rounded up to jemalloc bucket size and do the
576  worst-case allocation only if the jemalloc rounding up was insufficient
577  as the first guess.
578
579### 0.8.4
580
581* Fix SSE2-specific, `simd-accel`-specific memory corruption introduced in
582  version 0.8.1 in conversions between UTF-16 and Latin1 in the `mem` module.
583
584### 0.8.3
585
586* Removed an `#[inline(never)]` annotation that was not meant for release.
587
588### 0.8.2
589
590* Made non-ASCII UTF-16 to UTF-8 encode faster by manually omitting bound
591  checks and manually adding branch prediction annotations.
592
593### 0.8.1
594
595* Tweaked loop unrolling and memory alignment for SSE2 conversions between
596  UTF-16 and Latin1 in the `mem` module to increase the performance when
597  converting long buffers.
598
599### 0.8.0
600
601* Changed the minimum supported version of Rust to 1.21.0 (semver breaking
602  change).
603* Flipped around the defaults vs. optional features for controlling the size
604  vs. speed trade-off for Kanji and Hanzi legacy encode (semver breaking
605  change).
606* Added NEON support on ARMv7.
607* SIMD-accelerated x-user-defined to UTF-16 decode.
608* Made UTF-16LE and UTF-16BE decode a lot faster (including SIMD
609  acceleration).
610
611### 0.7.2
612
613* Add the `mem` module.
614* Refactor SIMD code which can affect performance outside the `mem`
615  module.
616
617### 0.7.1
618
619* When encoding from invalid UTF-16, correctly handle U+DC00 followed by
620  another low surrogate.
621
622### 0.7.0
623
624* [Make `replacement` a label of the replacement
625  encoding.](https://github.com/whatwg/encoding/issues/70) (Spec change.)
626* Remove `Encoding::for_name()`. (`Encoding::for_label(foo).unwrap()` is
627  now close enough after the above label change.)
628* Remove the `parallel-utf8` cargo feature.
629* Add optional Serde support for `&'static Encoding`.
630* Performance tweaks for ASCII handling.
631* Performance tweaks for UTF-8 validation.
632* SIMD support on aarch64.
633
634### 0.6.11
635
636* Make `Encoder::has_pending_state()` public.
637* Update the `simd` crate dependency to 0.2.0.
638
639### 0.6.10
640
641* Reserve enough space for NCRs when encoding to ISO-2022-JP.
642* Correct max length calculations for multibyte decoders.
643* Correct max length calculations before BOM sniffing has been
644  performed.
645* Correctly calculate max length when encoding from UTF-16 to GBK.
646
647### 0.6.9
648
649* [Don't prepend anything when gb18030 range decode
650  fails](https://github.com/whatwg/encoding/issues/110). (Spec change.)
651
652### 0.6.8
653
* Correctly handle the case where the first buffer contains a potentially
  partial BOM and the next buffer is the last buffer.
656* Decode byte `7F` correctly in ISO-2022-JP.
657* Make UTF-16 to UTF-8 encode write closer to the end of the buffer.
658* Implement `Hash` for `Encoding`.
659
660### 0.6.7
661
* [Map half-width katakana to full-width katakana in ISO-2022-JP
  encoder](https://github.com/whatwg/encoding/issues/105). (Spec change.)
664* Give `InputEmpty` correct precedence over `OutputFull` when encoding
665  with replacement and the output buffer passed in is too short or the
666  remaining space in the output buffer is too small after a replacement.
667
668### 0.6.6
669
670* Correct max length calculation when a partial BOM prefix is part of
671  the decoder's state.
672
673### 0.6.5
674
675* Correct max length calculation in various encoders.
676* Correct max length calculation in the UTF-16 decoder.
677* Derive `PartialEq` and `Eq` for the `CoderResult`, `DecoderResult`
678  and `EncoderResult` types.
679
680### 0.6.4
681
682* Avoid panic when encoding with replacement and the destination buffer is
683  too short to hold one numeric character reference.
684
685### 0.6.3
686
687* Add support for 32-bit big-endian hosts. (For real this time.)
688
689### 0.6.2
690
691* Fix a panic from subslicing with bad indices in
692  `Encoder::encode_from_utf16`. (Due to an oversight, it lacked the fix that
693  `Encoder::encode_from_utf8` already had.)
694* Micro-optimize error status accumulation in non-streaming case.
695
696### 0.6.1
697
698* Avoid panic near integer overflow in a case that's unlikely to actually
699  happen.
700* Address Clippy lints.
701
702### 0.6.0
703
704* Make the methods for computing worst-case buffer size requirements check
705  for integer overflow.
706* Upgrade rayon to 0.7.0.
707
708### 0.5.1
709
710* Reorder methods for better documentation readability.
711* Add support for big-endian hosts. (Only 64-bit case actually tested.)
712* Optimize the ALU (non-SIMD) case for 32-bit ARM instead of x86_64.
713
714### 0.5.0
715
* Avoid allocating excessively long buffers in non-streaming decode.
717* Fix the behavior of ISO-2022-JP and replacement decoders near the end of the
718  output buffer.
719* Annotate the result structs with `#[must_use]`.
720
721### 0.4.0
722
723* Split FFI into a separate crate.
724* Performance tweaks.
725* CJK binary size and encoding performance changes.
726* Parallelize UTF-8 validation in the case of long buffers (with optional
727  feature `parallel-utf8`).
728* Borrow even with ISO-2022-JP when possible.
729
730### 0.3.2
731
732* Fix moving pointers to alignment in ALU-based ASCII acceleration.
733* Fix errors in documentation and improve documentation.
734
735### 0.3.1
736
737* Fix UTF-8 to UTF-16 decode for byte sequences beginning with 0xEE.
738* Make UTF-8 to UTF-8 decode SSE2-accelerated when feature `simd-accel` is used.
739* When decoding and encoding ASCII-only input from or to an ASCII-compatible
740  encoding using the non-streaming API, return a borrow of the input.
741* Make encode from UTF-16 to UTF-8 faster.
742
743### 0.3
744
* Change the references to the instances of `Encoding` from `const` to `static`
  to make the referents unique across crates that use the references.
747* Introduce non-reference-typed `FOO_INIT` instances of `Encoding` to allow
748  foreign crates to initialize `static` arrays with references to `Encoding`
749  instances even under Rust's constraints that prohibit the initialization of
750  `&'static Encoding`-typed array items with `&'static Encoding`-typed
751  `statics`.
752* Document that the above two points will be reverted if Rust changes `const`
753  to work so that cross-crate usage keeps the referents unique.
754* Return `Cow`s from Rust-only non-streaming methods for encode and decode.
755* `Encoding::for_bom()` returns the length of the BOM.
756* ASCII-accelerated conversions for encodings other than UTF-16LE, UTF-16BE,
757  ISO-2022-JP and x-user-defined.
758* Add SSE2 acceleration behind the `simd-accel` feature flag. (Requires
759  nightly Rust.)
760* Fix panic with long bogus labels.
761* Map [0xCA to U+05BA in windows-1255](https://github.com/whatwg/encoding/issues/73).
762  (Spec change.)
763* Correct the [end of the Shift_JIS EUDC range](https://github.com/whatwg/encoding/issues/53).
764  (Spec change.)
765
766### 0.2.4
767
768* Polish FFI documentation.
769
770### 0.2.3
771
772* Fix UTF-16 to UTF-8 encode.
773
774### 0.2.2
775
776* Add `Encoder.encode_from_utf8_to_vec_without_replacement()`.
777
778### 0.2.1
779
780* Add `Encoding.is_ascii_compatible()`.
781
782* Add `Encoding::for_bom()`.
783
* Make `==` for `Encoding` use name comparison instead of pointer comparison,
  because uses of the encoding constants in different crates result in
  different addresses and the constants cannot be turned into statics without
  breaking other things.
788
789### 0.2.0
790
791The initial release.
792