# encoding_rs

[![Build Status](https://travis-ci.org/hsivonen/encoding_rs.svg?branch=master)](https://travis-ci.org/hsivonen/encoding_rs)
[![crates.io](https://img.shields.io/crates/v/encoding_rs.svg)](https://crates.io/crates/encoding_rs)
[![docs.rs](https://docs.rs/encoding_rs/badge.svg)](https://docs.rs/encoding_rs/)
[![Apache 2 / MIT dual-licensed](https://img.shields.io/badge/license-Apache%202%20%2F%20MIT-blue.svg)](https://github.com/hsivonen/encoding_rs/blob/master/COPYRIGHT)

encoding_rs is an implementation of (the non-JavaScript parts of) the
[Encoding Standard](https://encoding.spec.whatwg.org/) written in Rust and
used in Gecko (starting with Firefox 56).

Additionally, the `mem` module provides various operations for dealing with
in-RAM text (as opposed to data that's coming from or going to an IO boundary).
The `mem` module is a module instead of a separate crate due to internal
implementation detail efficiencies.

## Functionality

Due to the Gecko use case, encoding_rs supports decoding to and encoding from
UTF-16 in addition to supporting the usual Rust use case of decoding to and
encoding from UTF-8. Additionally, the API has been designed to be FFI-friendly
to accommodate the C++ side of Gecko.

Specifically, encoding_rs does the following:

* Decodes a stream of bytes in an Encoding Standard-defined character encoding
  into valid aligned native-endian in-RAM UTF-16 (units of `u16` / `char16_t`).
* Encodes a stream of potentially-invalid aligned native-endian in-RAM UTF-16
  (units of `u16` / `char16_t`) into a sequence of bytes in an Encoding
  Standard-defined character encoding as if the lone surrogates had been
  replaced with the REPLACEMENT CHARACTER before performing the encode.
  (Gecko's UTF-16 is potentially invalid.)
* Decodes a stream of bytes in an Encoding Standard-defined character
  encoding into valid UTF-8.
* Encodes a stream of valid UTF-8 into a sequence of bytes in an Encoding
  Standard-defined character encoding. (Rust's UTF-8 is guaranteed-valid.)
* Does the above in streaming (input and output split across multiple
  buffers) and non-streaming (whole input in a single buffer and whole
  output in a single buffer) variants.
* Avoids copying (borrows) when possible in the non-streaming cases when
  decoding to or encoding from UTF-8.
* Resolves textual labels that identify character encodings in
  protocol text into type-safe objects representing those encodings
  conceptually.
* Maps the type-safe encoding objects onto strings suitable for
  returning from `document.characterSet`.
* Validates UTF-8 (in common instruction set scenarios a bit faster for Web
  workloads than the standard library; hopefully will get upstreamed some
  day) and ASCII.

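The lone-surrogate rule in the list above can be pictured with a short sketch that uses only the standard library. It is not how encoding_rs implements the conversion, and `utf16_to_utf8_lossy` is a hypothetical name; it only illustrates what "as if the lone surrogates had been replaced with the REPLACEMENT CHARACTER" means when the target is UTF-8.

```rust
/// Converts potentially-invalid UTF-16 to UTF-8.
/// Hypothetical helper for illustration; not part of encoding_rs.
fn utf16_to_utf8_lossy(input: &[u16]) -> String {
    char::decode_utf16(input.iter().copied())
        // A lone surrogate surfaces as `Err`; substitute U+FFFD for it.
        .map(|unit| unit.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

For this particular target, the standard library's `String::from_utf16_lossy` produces the same result.
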
Additionally, `encoding_rs::mem` does the following:

* Checks if a byte buffer contains only ASCII.
* Checks if a potentially-invalid UTF-16 buffer contains only Basic Latin (ASCII).
* Checks if a valid UTF-8, potentially-invalid UTF-8 or potentially-invalid UTF-16
  buffer contains only Latin1 code points (below U+0100).
* Checks if a valid UTF-8, potentially-invalid UTF-8 or potentially-invalid UTF-16
  buffer or a code point or a UTF-16 code unit can trigger right-to-left behavior
  (suitable for checking if the Unicode Bidirectional Algorithm can be optimized
  out).
* Combined versions of the above two checks.
* Converts valid UTF-8, potentially-invalid UTF-8 and Latin1 to UTF-16.
* Converts potentially-invalid UTF-16 and Latin1 to UTF-8.
* Converts UTF-8 and UTF-16 to Latin1 (if in range).
* Finds the first invalid code unit in a buffer of potentially-invalid UTF-16.
* Makes a mutable buffer of potentially-invalid UTF-16 contain valid UTF-16.
* Copies ASCII from one buffer to another up to the first non-ASCII byte.
* Converts ASCII to UTF-16 up to the first non-ASCII byte.
* Converts UTF-16 to ASCII up to the first non-Basic Latin code unit.

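To make a few of the checks above concrete, here is a scalar sketch of what they compute. The function names are hypothetical stand-ins rather than the `mem` module's API, and the real implementations are assumed to be considerably more optimized.

```rust
/// Length of the longest all-ASCII prefix of `bytes`.
fn ascii_prefix_len(bytes: &[u8]) -> usize {
    bytes.iter().position(|&b| b >= 0x80).unwrap_or(bytes.len())
}

/// Copies ASCII from `src` to `dst` up to the first non-ASCII byte
/// (or until `dst` is full) and returns the number of bytes copied.
fn copy_ascii_prefix(src: &[u8], dst: &mut [u8]) -> usize {
    let limit = src.len().min(dst.len());
    let n = ascii_prefix_len(&src[..limit]);
    dst[..n].copy_from_slice(&src[..n]);
    n
}

/// True if every code unit of potentially-invalid UTF-16 is below U+0100.
/// Surrogates are above that limit, so invalid input simply yields `false`.
fn utf16_is_latin1(buffer: &[u16]) -> bool {
    buffer.iter().all(|&unit| unit < 0x100)
}
```
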
## Integration with `std::io`

Notably, the above feature list doesn't include the capability to wrap
a `std::io::Read`, decode it into UTF-8 and present the result via
`std::io::Read`. The [`encoding_rs_io`](https://crates.io/crates/encoding_rs_io)
crate provides that capability.

## `no_std` Environment

The crate works in a `no_std` environment. By default, the `alloc` feature,
which assumes that an allocator is present, is enabled. For a no-allocator
environment, the default features (i.e. `alloc`) can be turned off. This
makes the part of the API that returns `Vec`/`String`/`Cow` unavailable.

## Decoding Email

For decoding character encodings that occur in email, use the
[`charset`](https://crates.io/crates/charset) crate instead of using this
one directly. (It wraps this crate and adds UTF-7 decoding.)

## Windows Code Page Identifier Mappings

For mappings to and from Windows code page identifiers, use the
[`codepage`](https://crates.io/crates/codepage) crate.

## DOS Encodings

This crate does not support single-byte DOS encodings that aren't required by
the Web Platform, but the [`oem_cp`](https://crates.io/crates/oem_cp) crate does.

## Preparing Text for the Encoders

Normalizing text into Unicode Normalization Form C prior to encoding text into
a legacy encoding minimizes unmappable characters. Text can be normalized to
Unicode Normalization Form C using the
[`unic-normal`](https://crates.io/crates/unic-normal) crate.

The exception is windows-1258, which after normalizing to Unicode Normalization
Form C requires tone marks to be decomposed in order to minimize unmappable
characters. Vietnamese tone marks can be decomposed using the
[`detone`](https://crates.io/crates/detone) crate.

## Licensing

Please see the file named
[COPYRIGHT](https://github.com/hsivonen/encoding_rs/blob/master/COPYRIGHT).

## Documentation

Generated [API documentation](https://docs.rs/encoding_rs/) is available
online.

There is a [long-form write-up](https://hsivonen.fi/encoding_rs/) about the
design and internals of the crate.

## C and C++ bindings

An FFI layer for encoding_rs is available as a
[separate crate](https://github.com/hsivonen/encoding_c). The crate comes
with a [demo C++ wrapper](https://github.com/hsivonen/encoding_c/blob/master/include/encoding_rs_cpp.h)
using the C++ standard library and [GSL](https://github.com/Microsoft/GSL/) types.

The bindings for the `mem` module are in the
[encoding_c_mem crate](https://github.com/hsivonen/encoding_c_mem).

For the Gecko context, there's a
[C++ wrapper using the MFBT/XPCOM types](https://searchfox.org/mozilla-central/source/intl/Encoding.h#100).

There's a [write-up](https://hsivonen.fi/modern-cpp-in-rust/) about the C++
wrappers.

## Sample programs

* [Rust](https://github.com/hsivonen/recode_rs)
* [C](https://github.com/hsivonen/recode_c)
* [C++](https://github.com/hsivonen/recode_cpp)

## Optional features

There are currently these optional cargo features:

### `simd-accel`

Enables SIMD acceleration using the nightly-dependent `packed_simd_2` crate.

This is an opt-in feature, because enabling this feature _opts out_ of Rust's
guarantees of future compilers compiling old code (aka. "stability story").

Currently, this has not been tested to be an improvement except for these
targets:

* x86_64
* i686
* aarch64
* thumbv7neon

If you use nightly Rust, you use targets whose first component is one of the
above, and you are prepared _to have to revise your configuration when updating
Rust_, you should enable this feature. Otherwise, please _do not_ enable this
feature.

_Note!_ If you are compiling for a target that does not have 128-bit SIMD
enabled as part of the target definition and you are enabling 128-bit SIMD
using `-C target_feature`, you need to enable the `core_arch` Cargo feature
for `packed_simd_2` to compile a crates.io snapshot of `core_arch` instead of
using the standard-library copy of `core::arch`, because the `core::arch`
module of the pre-compiled standard library has been compiled with the
assumption that the CPU doesn't have 128-bit SIMD. At present this applies
mainly to 32-bit ARM targets whose first component does not include the
substring `neon`.

The encoding_rs side of things has not been properly set up for POWER,
PowerPC, MIPS, etc., SIMD at this time, so even if you were to follow
the advice from the previous paragraph, you probably shouldn't use
the `simd-accel` option on the less mainstream architectures at this
time.

Used by Firefox.

### `serde`

Enables support for serializing and deserializing `&'static Encoding`-typed
struct fields using [Serde][1].

[1]: https://serde.rs/

Not used by Firefox.

### `fast-legacy-encode`

A catch-all option for enabling the fastest legacy encode options. _Does not
affect decode speed or UTF-8 encode speed._

At present, this option is equivalent to enabling the following options:
 * `fast-hangul-encode`
 * `fast-hanja-encode`
 * `fast-kanji-encode`
 * `fast-gb-hanzi-encode`
 * `fast-big5-hanzi-encode`

Adds 176 KB to the binary size.

Not used by Firefox.

### `fast-hangul-encode`

Changes encoding of precomposed Hangul syllables into EUC-KR from binary
search over the decode-optimized tables to lookup by index, making Korean
plain-text encode about 4 times as fast as without this option.

Adds 20 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-hanja-encode`

Changes encoding of Hanja into EUC-KR from linear search over the
decode-optimized table to lookup by index. Since Hanja is practically absent
in modern Korean text, this option doesn't affect performance in the common
case and mainly makes sense if you want to make your application resilient
against denial of service by someone intentionally feeding it a lot of Hanja
to encode into EUC-KR.

Adds 40 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-kanji-encode`

Changes encoding of Kanji into Shift_JIS, EUC-JP and ISO-2022-JP from linear
search over the decode-optimized tables to lookup by index, making Japanese
plain-text encode to legacy encodings 30 to 50 times as fast as without this
option (about 2 times as fast as with `less-slow-kanji-encode`).

Takes precedence over `less-slow-kanji-encode`.

Adds 36 KB to the binary size (24 KB compared to `less-slow-kanji-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-kanji-encode`

Makes JIS X 0208 Level 1 Kanji (the most common Kanji in Shift_JIS, EUC-JP and
ISO-2022-JP) encode less slow (binary search instead of linear search), making
Japanese plain-text encode to legacy encodings 14 to 23 times as fast as
without this option.

Adds 12 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-gb-hanzi-encode`

Changes encoding of Hanzi in the CJK Unified Ideographs block into GBK and
gb18030 from linear search over a part of the decode-optimized tables followed
by a binary search over another part of the decode-optimized tables to lookup
by index, making Simplified Chinese plain-text encode to the legacy encodings
100 to 110 times as fast as without this option (about 2.5 times as fast as
with `less-slow-gb-hanzi-encode`).

Takes precedence over `less-slow-gb-hanzi-encode`.

Adds 36 KB to the binary size (24 KB compared to `less-slow-gb-hanzi-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-gb-hanzi-encode`

Makes GB2312 Level 1 Hanzi (the most common Hanzi in gb18030 and GBK) encode
less slow (binary search instead of linear search), making Simplified Chinese
plain-text encode to the legacy encodings about 40 times as fast as without
this option.

Adds 12 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

### `fast-big5-hanzi-encode`

Changes encoding of Hanzi in the CJK Unified Ideographs block into Big5 from
linear search over a part of the decode-optimized tables to lookup by index,
making Traditional Chinese plain-text encode to Big5 105 to 125 times as fast
as without this option (about 3 times as fast as with
`less-slow-big5-hanzi-encode`).

Takes precedence over `less-slow-big5-hanzi-encode`.

Adds 40 KB to the binary size (20 KB compared to `less-slow-big5-hanzi-encode`).

Does _not_ affect decode speed.

Not used by Firefox.

### `less-slow-big5-hanzi-encode`

Makes Big5 Level 1 Hanzi (the most common Hanzi in Big5) encode less slow
(binary search instead of linear search), making Traditional Chinese
plain-text encode to Big5 about 36 times as fast as without this option.

Adds 20 KB to the binary size.

Does _not_ affect decode speed.

Not used by Firefox.

## Performance goals

For decoding to UTF-16, the goal is to perform at least as well as Gecko's old
uconv. For decoding to UTF-8, the goal is to perform at least as well as
rust-encoding. These goals have been achieved.

Encoding to UTF-8 should be fast. (UTF-8 to UTF-8 encode should be equivalent
to `memcpy` and UTF-16 to UTF-8 should be fast.)

Speed is a non-goal when encoding to legacy encodings. By default, encoding to
legacy encodings should not be optimized for speed at the expense of code size
as long as form submission and URL parsing in Gecko don't become noticeably
too slow in real-world use.

In the interest of binary size, by default, encoding_rs does not have
encode-specific data tables beyond 32 bits of encode-specific data for each
single-byte encoding. Therefore, encoders search the decode-optimized data
tables. This is a linear search in most cases. As a result, by default, encode
to legacy encodings varies from slow to extremely slow relative to other
libraries. Still, with realistic workloads, this seemed fast enough not to be
user-visibly slow on Raspberry Pi 3 (which stood in for a phone for testing)
in the Web-exposed encoder use cases.

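The trade-off described above can be sketched with a toy table. Everything here is an assumption made for illustration: the six-entry `DECODE_TABLE`, the `BASE` constant and the function names are hypothetical, and the crate's real tables and encoders are laid out differently. The sketch only contrasts a linear search over a decode-optimized table (no extra data) with a lookup by index (fast, but it needs a second, encode-specific table).

```rust
/// Toy decode-optimized table: the index is the pointer within the
/// encoding and the value is the code point. Illustrative values only.
const DECODE_TABLE: [u16; 6] = [0x4E9C, 0x5516, 0x5A03, 0x963F, 0x54C0, 0x611B];

/// First code point covered by the toy encode-specific index.
const BASE: u16 = 0x4E00;

/// Default-style encode: linear search over the decode table.
fn encode_linear(code_point: u16) -> Option<u16> {
    DECODE_TABLE
        .iter()
        .position(|&c| c == code_point)
        .map(|pointer| pointer as u16)
}

/// Builds an encode-specific table indexed by `code_point - BASE`.
/// This is the extra data that costs binary size in a real build.
fn build_encode_index() -> Vec<Option<u16>> {
    let max = DECODE_TABLE.iter().copied().max().unwrap_or(BASE);
    let mut index = vec![None; (max - BASE) as usize + 1];
    for (pointer, &code_point) in DECODE_TABLE.iter().enumerate() {
        index[(code_point - BASE) as usize] = Some(pointer as u16);
    }
    index
}

/// Fast-style encode: a single bounds-checked lookup by index.
fn encode_indexed(index: &[Option<u16>], code_point: u16) -> Option<u16> {
    let offset = code_point.checked_sub(BASE)?;
    index.get(offset as usize).copied().flatten()
}
```
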
See the cargo features above for optionally making CJK legacy encode fast.

A framework for measuring performance is [available separately][2].

[2]: https://github.com/hsivonen/encoding_bench/

## Rust Version Compatibility

It is a goal to support the latest stable Rust, the latest nightly Rust and
the version of Rust that's used for Firefox Nightly.

At this time, there is no firm commitment to support a version older than
what's required by Firefox, and there is no commitment to treat MSRV changes
as semver-breaking, because this crate depends on `cfg-if`, which doesn't
appear to treat MSRV changes as semver-breaking, so it would be useless for
this crate to treat MSRV changes as semver-breaking.

As of 2021-02-04, MSRV appears to be Rust 1.36.0 for using the crate and
1.42.0 for doc tests to pass without errors about the global allocator.

## Compatibility with rust-encoding

A compatibility layer that implements the rust-encoding API on top of
encoding_rs is
[provided as a separate crate](https://github.com/hsivonen/encoding_rs_compat)
(cannot be uploaded to crates.io). The compatibility layer was originally
written with the assumption that Firefox would need it, but it is not currently
used in Firefox.

## Regenerating Generated Code

To regenerate the generated code:

 * Have Python 2 installed.
 * Clone [`https://github.com/hsivonen/encoding_c`](https://github.com/hsivonen/encoding_c)
   next to the `encoding_rs` directory.
 * Clone [`https://github.com/hsivonen/codepage`](https://github.com/hsivonen/codepage)
   next to the `encoding_rs` directory.
 * Clone [`https://github.com/whatwg/encoding`](https://github.com/whatwg/encoding)
   next to the `encoding_rs` directory.
 * Check out revision `f381389` of the `encoding` repo.
 * With the `encoding_rs` directory as the working directory, run
   `python generate-encoding-data.py`.

## Roadmap

- [x] Design the low-level API.
- [x] Provide Rust-only convenience features.
- [x] Provide an stl/gsl-flavored C++ API.
- [x] Implement all decoders and encoders.
- [x] Add unit tests for all decoders and encoders.
- [x] Finish BOM sniffing variants in Rust-only convenience features.
- [x] Document the API.
- [x] Publish the crate on crates.io.
- [x] Create a solution for measuring performance.
- [x] Accelerate ASCII conversions using SSE2 on x86.
- [x] Accelerate ASCII conversions using ALU register-sized operations on
      non-x86 architectures (process a `usize` instead of `u8` at a time).
- [x] Split FFI into a separate crate so that the FFI doesn't interfere with
      LTO in pure-Rust usage.
- [x] Compress CJK indices by making use of sequential code points as well
      as Unicode-ordered parts of indices.
- [x] Make lookups by label or name use binary search that searches from the
      end of the label/name to the start.
- [x] Make labels with non-ASCII bytes fail fast.
- [ ] ~Parallelize UTF-8 validation using [Rayon](https://github.com/nikomatsakis/rayon).~
      (This turned out to be a pessimization in the ASCII case due to memory bandwidth reasons.)
- [x] Provide an XPCOM/MFBT-flavored C++ API.
- [x] Investigate accelerating single-byte encode with a single fast-tracked
      range per encoding.
- [x] Replace uconv with encoding_rs in Gecko.
- [x] Implement the rust-encoding API in terms of encoding_rs.
- [x] Add SIMD acceleration for Aarch64.
- [x] Investigate the use of NEON on 32-bit ARM.
- [ ] ~Investigate Björn Höhrmann's lookup table acceleration for UTF-8 as
      adapted to Rust in rust-encoding.~
- [x] Add actually fast CJK encode options.
- [ ] ~Investigate [Bob Steagall's lookup table acceleration for UTF-8](https://github.com/BobSteagall/CppNow2018/blob/master/FastConversionFromUTF-8/Fast%20Conversion%20From%20UTF-8%20with%20C%2B%2B%2C%20DFAs%2C%20and%20SSE%20Intrinsics%20-%20Bob%20Steagall%20-%20C%2B%2BNow%202018.pdf).~
- [ ] Provide a build mode that works without `alloc` (with lesser API surface).
- [ ] Migrate to `std::simd` once it is stable and declare 1.0.

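The roadmap item about processing a `usize` instead of a `u8` at a time can be illustrated with a safe-Rust sketch of an ASCII check. This is a simplified stand-in, not the crate's code: `is_ascii_by_word` and `HIGH_BITS` are hypothetical names, and the sketch copies each chunk into a word rather than relying on aligned reads.

```rust
/// A word with the high bit of every byte set (0x80 repeated).
const HIGH_BITS: usize = usize::MAX / 0xFF * 0x80;

/// Checks for ASCII one machine word at a time, then byte by byte
/// for the tail that does not fill a whole word.
fn is_ascii_by_word(bytes: &[u8]) -> bool {
    const WORD: usize = std::mem::size_of::<usize>();
    let mut chunks = bytes.chunks_exact(WORD);
    for chunk in &mut chunks {
        let mut word_bytes = [0u8; WORD];
        word_bytes.copy_from_slice(chunk);
        // Any byte >= 0x80 leaves its high bit set after masking.
        if (usize::from_ne_bytes(word_bytes) & HIGH_BITS) != 0 {
            return false;
        }
    }
    chunks.remainder().iter().all(|&b| b < 0x80)
}
```

Only the presence of a high bit matters here, so byte order within the word is irrelevant and native-endian loading is sufficient.
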
## Release Notes

### 0.8.29

* Make the parts that use an allocator optional.

### 0.8.28

* Fix error in Serde support introduced as part of `no_std` support.

### 0.8.27

* Make the crate work in a `no_std` environment (with `alloc`).

### 0.8.26

* Fix oversights in edition 2018 migration that broke the `simd-accel` feature.

### 0.8.25

* Do pointer alignment checks in a way where intermediate steps aren't defined to be Undefined Behavior.
* Update the `packed_simd` dependency to `packed_simd_2`.
* Update the `cfg-if` dependency to 1.0.
* Address warnings that have been introduced by newer Rust versions along the way.
* Update to edition 2018, since even prior to 1.0 `cfg-if` updated to edition 2018 without a semver break.

### 0.8.24

* Avoid computing an intermediate (not dereferenced) pointer value in a manner designated as Undefined Behavior when computing pointer alignment.

### 0.8.23

* Remove year from copyright notices. (No features or bug fixes.)

### 0.8.22

* Formatting fix and new unit test. (No features or bug fixes.)

### 0.8.21

* Fixed a panic with invalid UTF-16[BE|LE] input at the end of the stream.

### 0.8.20

* Make `Decoder::latin1_byte_compatible_up_to` return `None` in more
  cases to make the method actually useful. While this could be argued
  to be a breaking change due to the bug fix changing semantics, it does
  not break callers that had to handle the `None` case in a reasonable
  way anyway.

### 0.8.19

* Removed a bunch of bound checks in `convert_str_to_utf16`.
* Added `mem::convert_utf8_to_utf16_without_replacement`.

### 0.8.18

* Added `mem::utf8_latin1_up_to` and `mem::str_latin1_up_to`.
* Added `Decoder::latin1_byte_compatible_up_to`.

### 0.8.17

* Update `bincode` (dev dependency) version requirement to 1.0.

### 0.8.16

* Switch from the `simd` crate to `packed_simd`.

### 0.8.15

* Adjust documentation for `simd-accel` (README-only release).

### 0.8.14

* Made UTF-16 to UTF-8 encode conversion fill the output buffer as
  closely as possible.

### 0.8.13

* Made the UTF-8 to UTF-16 decoder compare the number of code units written
  with the length of the right slice (the output slice) to fix a panic
  introduced in 0.8.11.

### 0.8.12

* Removed the `clippy::` prefix from clippy lint names.

### 0.8.11

* Changed minimum Rust requirement to 1.29.0 (for the ability to refer
  to the interior of a `static` when defining another `static`).
* Explicitly aligned the lookup tables for single-byte encodings and
  UTF-8 to cache lines in the hope of freeing up one cache line for
  other data. (Perhaps the tables were already aligned and this is
  placebo.)
* Added 32 bits of encode-oriented data for each single-byte encoding.
  The change was performance-neutral for non-Latin1-ish Latin legacy
  encodings, improved Latin1-ish and Arabic legacy encode speed
  somewhat (new speed is 2.4x the old speed for German, 2.3x for
  Arabic, 1.7x for Portuguese and 1.4x for French) and improved
  non-Latin1, non-Arabic legacy single-byte encode a lot (7.2x for
  Thai, 6x for Greek, 5x for Russian, 4x for Hebrew).
* Added compile-time options for fast CJK legacy encode options (at
  the cost of binary size (up to 176 KB) and run-time memory usage).
  These options still retain the overall code structure instead of
  rewriting the CJK encoders totally, so the speed isn't as good as
  what could be achieved by using even more memory / making the
  binary even larger.
* Made UTF-8 decode and validation faster.
* Added method `is_single_byte()` on `Encoding`.
* Added `mem::decode_latin1()` and `mem::encode_latin1_lossy()`.

### 0.8.10

* Disabled a unit test that tests a panic condition when the assertion
  being tested is disabled.

### 0.8.9

* Made `--features simd-accel` work with stable-channel compiler to
  simplify the Firefox build system.

### 0.8.8

* Made the `is_foo_bidi()` functions not treat U+FEFF (ZERO WIDTH NO-BREAK
  SPACE aka. BYTE ORDER MARK) as right-to-left.
* Made the `is_foo_bidi()` functions report `true` if the input contains
  Hebrew presentation forms (which are right-to-left but not in a
  right-to-left-roadmapped block).

### 0.8.7

* Fixed a panic in the UTF-16LE/UTF-16BE decoder when decoding to UTF-8.

### 0.8.6

* Temporarily removed the debug assertion added in version 0.8.5 from
  `convert_utf16_to_latin1_lossy`.

### 0.8.5

* If debug assertions are enabled but fuzzing isn't enabled, lossy conversions
  to Latin1 in the `mem` module assert that the input is in the range
  U+0000...U+00FF (inclusive).
* In the `mem` module provide conversions from Latin1 and UTF-16 to UTF-8
  that can deal with insufficient output space. The idea is to use them
  first with an allocation rounded up to jemalloc bucket size and do the
  worst-case allocation only if the jemalloc rounding up was insufficient
  as the first guess.

### 0.8.4

* Fix SSE2-specific, `simd-accel`-specific memory corruption introduced in
  version 0.8.1 in conversions between UTF-16 and Latin1 in the `mem` module.

### 0.8.3

* Removed an `#[inline(never)]` annotation that was not meant for release.

### 0.8.2

* Made non-ASCII UTF-16 to UTF-8 encode faster by manually omitting bound
  checks and manually adding branch prediction annotations.

### 0.8.1

* Tweaked loop unrolling and memory alignment for SSE2 conversions between
  UTF-16 and Latin1 in the `mem` module to increase the performance when
  converting long buffers.

### 0.8.0

* Changed the minimum supported version of Rust to 1.21.0 (semver breaking
  change).
* Flipped around the defaults vs. optional features for controlling the size
  vs. speed trade-off for Kanji and Hanzi legacy encode (semver breaking
  change).
* Added NEON support on ARMv7.
* SIMD-accelerated x-user-defined to UTF-16 decode.
* Made UTF-16LE and UTF-16BE decode a lot faster (including SIMD
  acceleration).

### 0.7.2

* Add the `mem` module.
* Refactor SIMD code which can affect performance outside the `mem`
  module.

### 0.7.1

* When encoding from invalid UTF-16, correctly handle U+DC00 followed by
  another low surrogate.

### 0.7.0

* [Make `replacement` a label of the replacement
  encoding.](https://github.com/whatwg/encoding/issues/70) (Spec change.)
* Remove `Encoding::for_name()`. (`Encoding::for_label(foo).unwrap()` is
  now close enough after the above label change.)
* Remove the `parallel-utf8` cargo feature.
* Add optional Serde support for `&'static Encoding`.
* Performance tweaks for ASCII handling.
* Performance tweaks for UTF-8 validation.
* SIMD support on aarch64.

### 0.6.11

* Make `Encoder::has_pending_state()` public.
* Update the `simd` crate dependency to 0.2.0.

### 0.6.10

* Reserve enough space for NCRs when encoding to ISO-2022-JP.
* Correct max length calculations for multibyte decoders.
* Correct max length calculations before BOM sniffing has been
  performed.
* Correctly calculate max length when encoding from UTF-16 to GBK.

### 0.6.9

* [Don't prepend anything when gb18030 range decode
  fails](https://github.com/whatwg/encoding/issues/110). (Spec change.)

### 0.6.8

* Correctly handle the case where the first buffer contains potentially
  partial BOM and the next buffer is the last buffer.
* Decode byte `7F` correctly in ISO-2022-JP.
* Make UTF-16 to UTF-8 encode write closer to the end of the buffer.
* Implement `Hash` for `Encoding`.

### 0.6.7

* [Map half-width katakana to full-width katakana in ISO-2022-JP
  encoder](https://github.com/whatwg/encoding/issues/105). (Spec change.)
* Give `InputEmpty` correct precedence over `OutputFull` when encoding
  with replacement and the output buffer passed in is too short or the
  remaining space in the output buffer is too small after a replacement.

### 0.6.6

* Correct max length calculation when a partial BOM prefix is part of
  the decoder's state.

### 0.6.5

* Correct max length calculation in various encoders.
* Correct max length calculation in the UTF-16 decoder.
* Derive `PartialEq` and `Eq` for the `CoderResult`, `DecoderResult`
  and `EncoderResult` types.

### 0.6.4

* Avoid panic when encoding with replacement and the destination buffer is
  too short to hold one numeric character reference.

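Several of the notes above concern encoding "with replacement", where an unmappable character is written as a decimal numeric character reference (NCR). The following sketch shows the idea against a toy target in which only code points below U+0100 are mappable. That target and the name `encode_latin1_with_ncr` are assumptions for illustration, and the sketch sidesteps the buffer-size problem the fixes address by growing a `Vec`.

```rust
/// Encodes `input` into a toy single-byte target where code points
/// below U+0100 map to the byte of the same value. Anything else is
/// written as a decimal numeric character reference such as `&#8364;`.
fn encode_latin1_with_ncr(input: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    for c in input.chars() {
        let code_point = c as u32;
        if code_point < 0x100 {
            out.push(code_point as u8);
        } else {
            // The longest possible reference, `&#1114111;`, is 10 bytes.
            out.extend_from_slice(format!("&#{};", code_point).as_bytes());
        }
    }
    out
}
```

A fixed-size output buffer, by contrast, has to have room for a whole reference before the encoder can write any of it, which is the situation the 0.6.4 fix is about.
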
### 0.6.3

* Add support for 32-bit big-endian hosts. (For real this time.)

### 0.6.2

* Fix a panic from subslicing with bad indices in
  `Encoder::encode_from_utf16`. (Due to an oversight, it lacked the fix that
  `Encoder::encode_from_utf8` already had.)
* Micro-optimize error status accumulation in non-streaming case.

### 0.6.1

* Avoid panic near integer overflow in a case that's unlikely to actually
  happen.
* Address Clippy lints.

### 0.6.0

* Make the methods for computing worst-case buffer size requirements check
  for integer overflow.
* Upgrade rayon to 0.7.0.

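As a rough illustration of what an overflow-checked worst-case size computation looks like, the sketch below sizes a UTF-8 buffer for UTF-16 input. The function names are hypothetical and the bound is the generic one (a UTF-16 code unit never needs more than three UTF-8 bytes, because a surrogate pair is two units for four bytes and a replaced lone surrogate is three bytes); the crate's own methods may use different formulas.

```rust
/// Worst-case UTF-8 length for `utf16_len` UTF-16 code units, or `None`
/// if the multiplication would overflow `usize`.
fn max_utf8_len_from_utf16(utf16_len: usize) -> Option<usize> {
    utf16_len.checked_mul(3)
}

/// Converts potentially-invalid UTF-16 to UTF-8 bytes, allocating the
/// worst case up front. Returns `None` instead of overflowing.
fn utf16_to_utf8_vec(input: &[u16]) -> Option<Vec<u8>> {
    let capacity = max_utf8_len_from_utf16(input.len())?;
    let mut out = Vec::with_capacity(capacity);
    for unit in char::decode_utf16(input.iter().copied()) {
        let c = unit.unwrap_or(char::REPLACEMENT_CHARACTER);
        let mut buf = [0u8; 4];
        out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
    }
    debug_assert!(out.len() <= capacity);
    Some(out)
}
```

Returning `Option` pushes the overflow case to the caller, which can then refuse the input rather than allocate a buffer that is too small.
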
### 0.5.1

* Reorder methods for better documentation readability.
* Add support for big-endian hosts. (Only 64-bit case actually tested.)
* Optimize the ALU (non-SIMD) case for 32-bit ARM instead of x86_64.

### 0.5.0

* Avoid allocating excessively long buffers in non-streaming decode.
* Fix the behavior of ISO-2022-JP and replacement decoders near the end of the
  output buffer.
* Annotate the result structs with `#[must_use]`.

### 0.4.0

* Split FFI into a separate crate.
* Performance tweaks.
* CJK binary size and encoding performance changes.
* Parallelize UTF-8 validation in the case of long buffers (with optional
  feature `parallel-utf8`).
* Borrow even with ISO-2022-JP when possible.

### 0.3.2

* Fix moving pointers to alignment in ALU-based ASCII acceleration.
* Fix errors in documentation and improve documentation.

### 0.3.1

* Fix UTF-8 to UTF-16 decode for byte sequences beginning with 0xEE.
* Make UTF-8 to UTF-8 decode SSE2-accelerated when feature `simd-accel` is used.
* When decoding and encoding ASCII-only input from or to an ASCII-compatible
  encoding using the non-streaming API, return a borrow of the input.
* Make encode from UTF-16 to UTF-8 faster.

### 0.3

* Change the references to the instances of `Encoding` from `const` to `static`
  to make the referents unique across crates that use the references.
* Introduce non-reference-typed `FOO_INIT` instances of `Encoding` to allow
  foreign crates to initialize `static` arrays with references to `Encoding`
  instances even under Rust's constraints that prohibit the initialization of
  `&'static Encoding`-typed array items with `&'static Encoding`-typed
  `static`s.
* Document that the above two points will be reverted if Rust changes `const`
  to work so that cross-crate usage keeps the referents unique.
* Return `Cow`s from Rust-only non-streaming methods for encode and decode.
* `Encoding::for_bom()` returns the length of the BOM.
* ASCII-accelerated conversions for encodings other than UTF-16LE, UTF-16BE,
  ISO-2022-JP and x-user-defined.
* Add SSE2 acceleration behind the `simd-accel` feature flag. (Requires
  nightly Rust.)
* Fix panic with long bogus labels.
* Map [0xCA to U+05BA in windows-1255](https://github.com/whatwg/encoding/issues/73).
  (Spec change.)
* Correct the [end of the Shift_JIS EUDC range](https://github.com/whatwg/encoding/issues/53).
  (Spec change.)

### 0.2.4

* Polish FFI documentation.

### 0.2.3

* Fix UTF-16 to UTF-8 encode.

### 0.2.2

* Add `Encoder.encode_from_utf8_to_vec_without_replacement()`.

### 0.2.1

* Add `Encoding.is_ascii_compatible()`.

* Add `Encoding::for_bom()`.

* Make `==` for `Encoding` use name comparison instead of pointer comparison,
  because uses of the encoding constants in different crates result in
  different addresses and the constant cannot be turned into statics without
  breaking other things.

### 0.2.0

The initial release.
