	Changes made between 1.0.3 (2018/06/13) and 1.1.x (TBD).

Merged yescrypt-opt.c and yescrypt-simd.c into one source file, which is
a closer match to -simd but is called -opt (and -simd is now gone).
With this change, performance of SIMD builds should be almost unchanged,
while scalar builds should be faster than before on register-rich 64-bit
architectures but may be slower than before on register-starved 32-bit
architectures (this shortcoming may be addressed later).  This also
happens to make SSE prefetch available even in otherwise-scalar builds
and it paves the way for adding SIMD support on big-endian architectures
(previously, -simd assumed little-endian).


	Changes made between 1.0.2 (2018/06/06) and 1.0.3 (2018/06/13).

In SMix1, optimized out the indexing of V for the sequential writes.
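
As a generic illustration of this kind of optimization (not the actual
yescrypt code), the sketch below fills an array of blocks sequentially,
first by recomputing the address from the index and then by advancing a
running pointer.  The names fill_indexed(), fill_sequential(), and
mix_block() are hypothetical, and mix_block() is only a stand-in for
the real per-iteration block update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-iteration block update. */
static void mix_block(uint64_t *X, size_t s)
{
    size_t k;

    for (k = 0; k < s; k++)
        X[k] = X[k] * 6364136223846793005ULL + k + 1;
}

/* Indexed form: the address of block i is recomputed from i each time. */
void fill_indexed(uint64_t *V, uint64_t *X, size_t N, size_t s)
{
    size_t i;

    for (i = 0; i < N; i++) {
        memcpy(&V[i * s], X, s * sizeof(*X));
        mix_block(X, s);
    }
}

/* Pointer form: the destination simply advances by one block. */
void fill_sequential(uint64_t *V, uint64_t *X, size_t N, size_t s)
{
    uint64_t *dst = V;
    size_t i;

    for (i = 0; i < N; i++) {
        memcpy(dst, X, s * sizeof(*X));
        dst += s;
        mix_block(X, s);
    }
}

/* Usage sketch: both forms must leave V and X in the same state. */
void fill_demo(void)
{
    uint64_t Xa[4] = {1, 2, 3, 4}, Xb[4] = {1, 2, 3, 4};
    uint64_t Va[32], Vb[32];

    fill_indexed(Va, Xa, 8, 4);
    fill_sequential(Vb, Xb, 8, 4);
    assert(memcmp(Va, Vb, sizeof(Va)) == 0);
    assert(memcmp(Xa, Xb, sizeof(Xa)) == 0);
}
```

A compiler may perform this transformation on its own in simple cases,
so any gain from doing it by hand depends on the surrounding code.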


	Changes made between 1.0.1 (2018/04/22) and 1.0.2 (2018/06/06).

Don't use MAP_POPULATE anymore because new multi-threaded benchmarks on
RHEL6'ish and RHEL7'ish systems revealed that it sometimes has an
adverse effect far in excess of its occasional positive effect.
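
A minimal sketch of an anonymous mapping requested without MAP_POPULATE
follows, assuming a Linux or similar system where mmap() supports
MAP_ANONYMOUS.  The helper names region_alloc_sketch() and
region_free_sketch() are hypothetical and are not the functions used in
the yescrypt source.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical helper: returns NULL on failure. */
void *region_alloc_sketch(size_t size)
{
    int flags = MAP_ANONYMOUS | MAP_PRIVATE;
    void *p;

    /* No MAP_POPULATE here: pages are faulted in on first access. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: returns 0 on success, like munmap(). */
int region_free_sketch(void *p, size_t size)
{
    return p ? munmap(p, size) : 0;
}

/* Usage sketch: the mapping reads as zeroes and is writable. */
void region_demo(void)
{
    size_t size = (size_t)1 << 20;
    unsigned char *p = region_alloc_sketch(size);

    assert(p != NULL);
    assert(p[0] == 0 && p[size - 1] == 0);
    p[size - 1] = 1;
    assert(region_free_sketch(p, size) == 0);
}
```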

In the SIMD code, we now reuse the same buffer for BlockMix_pwxform's
input and output in SMix2.  This might slightly improve cache hit rate
and thus performance.

Also in the SIMD code, a compiler memory barrier has been added between
sub-blocks to ensure that none of the writes into what was S2 during
processing of the previous sub-block are postponed until after a read
from S0 or S1 in the inline asm code for the current sub-block.  This
potential problem has never been observed so far due to other
constraints that we have, but strictly speaking those constraints were
insufficient to guarantee it couldn't occur.
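
The barrier placement can be sketched as follows, assuming GCC or Clang
(the empty asm statement with a "memory" clobber is their usual idiom
for a compiler-only barrier and emits no instruction).  The function
sub_blocks_sketch(), the table size, and the arithmetic are hypothetical
simplifications.  In this sketch the reads are plain C, so the compiler
already sees the dependency; the barrier matters when the reads are
done by inline asm whose memory accesses the compiler cannot see.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang idiom), not a CPU fence. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

#define SBOX_WORDS 16 /* hypothetical toy size, a power of 2 */

uint64_t sub_blocks_sketch(uint64_t *S0, uint64_t *S1, uint64_t *S2,
    uint64_t x, unsigned int count)
{
    size_t w = 0;

    while (count--) {
        uint64_t *tmp;

        /* Reads from S0 and S1 (inline asm in the real code). */
        x = (x >> 32) * (uint32_t)x + S0[x & (SBOX_WORDS - 1)];
        x ^= S1[(x >> 32) & (SBOX_WORDS - 1)];

        /* Write into the table currently playing the role of S2. */
        S2[w] = x;
        w = (w + 1) & (SBOX_WORDS - 1);

        /* Rotate roles: the table just written becomes readable. */
        tmp = S2;
        S2 = S1;
        S1 = S0;
        S0 = tmp;

        /* Keep the write above from being moved past the next reads. */
        COMPILER_BARRIER();
    }
    return x;
}

/* Usage sketch: two runs on identical tables must agree. */
void barrier_demo(void)
{
    uint64_t A[3][SBOX_WORDS], B[3][SBOX_WORDS];
    size_t i, j;

    for (i = 0; i < 3; i++)
        for (j = 0; j < SBOX_WORDS; j++)
            A[i][j] = B[i][j] = (i * SBOX_WORDS + j + 1) *
                0x9E3779B97F4A7C15ULL;
    assert(sub_blocks_sketch(A[0], A[1], A[2], 0x0123456789ABCDEFULL, 12)
        == sub_blocks_sketch(B[0], B[1], B[2], 0x0123456789ABCDEFULL, 12));
}
```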


	Changes made between 1.0.0 (2018/03/09) and 1.0.1 (2018/04/22).

The included documentation has been improved, most notably adding new
text files PARAMETERS (guidelines on parameter selection, and currently
recommended parameter sets by use case) and COMPARISON (comparison to
scrypt and Argon2).

Code cleanups have been made, including removal of AVX2 support, which
was deliberately temporarily preserved for the 1.0.0 release, but which
almost always hurt performance with currently recommended low-level
yescrypt parameters on Intel & AMD CPUs tested so far.  (The low-level
parameters are chosen with consideration for relative performance of
defensive vs. offensive implementations on different hardware, and not
only for seemingly best performance on CPUs.  It is possible to change
them such that AVX2 would be worthwhile, and this might happen in the
future, but currently this wouldn't be obviously beneficial overall.)


	Changes made between 0.8.1 (2015/10/25) and 1.0.0 (2018/03/09).

Hash string encoding has been finalized under the "$y$" prefix for both
native yescrypt and classic scrypt hashes, using a new variable-length
and extremely compact encoding of (ye)scrypt's many parameters.  (Also
still recognized under the "$7$" prefix is the previously used encoding
for classic scrypt hashes, which is fixed-length and not so compact.)

Optional format-preserving salt and hash (re-)encryption has been added,
using the Luby-Rackoff construction with SHA-256 as the PRF.
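
The Luby-Rackoff construction is a balanced Feistel network built from
a keyed pseudorandom function, and it is format-preserving in the sense
that the output has the same length as the input.  The sketch below
illustrates only the structure on a 64-bit block.  It is not the
yescrypt code: toy_prf() is an insecure stand-in used in place of
SHA-256 to keep the example short, and the round count, block size, and
all names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FEISTEL_ROUNDS 4 /* assumption for this sketch only */

/* Toy keyed round function.  NOT SHA-256 and NOT secure. */
static uint32_t toy_prf(uint32_t half, uint64_t key, unsigned int round)
{
    uint64_t v = half + key + round * 0x9E3779B97F4A7C15ULL;

    v ^= v >> 33;
    v *= 0xFF51AFD7ED558CCDULL;
    v ^= v >> 33;
    return (uint32_t)v;
}

/* Each round maps (L, R) to (R, L xor F(R)). */
uint64_t feistel_encrypt(uint64_t block, uint64_t key)
{
    uint32_t L = (uint32_t)(block >> 32), R = (uint32_t)block;
    unsigned int r;

    for (r = 0; r < FEISTEL_ROUNDS; r++) {
        uint32_t t = L ^ toy_prf(R, key, r);
        L = R;
        R = t;
    }
    return ((uint64_t)L << 32) | R;
}

/* Decryption undoes the rounds in reverse order with the same F. */
uint64_t feistel_decrypt(uint64_t block, uint64_t key)
{
    uint32_t L = (uint32_t)(block >> 32), R = (uint32_t)block;
    unsigned int r;

    for (r = FEISTEL_ROUNDS; r-- > 0; ) {
        uint32_t t = R ^ toy_prf(L, key, r);
        R = L;
        L = t;
    }
    return ((uint64_t)L << 32) | R;
}

/* Usage sketch: decryption with the same key restores the input. */
void feistel_demo(void)
{
    uint64_t key = 0x0F1E2D3C4B5A6978ULL;
    uint64_t pt = 0x0123456789ABCDEFULL;

    assert(feistel_decrypt(feistel_encrypt(pt, key), key) == pt);
}
```

The round function never needs to be inverted, which is why a hash
function can serve as the PRF in this construction.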

Support for hash upgrades has been temporarily excluded to allow for its
finalization at a later time and based on actual needs (e.g., will 3x
ROM size upgrades be in demand now that Intel went from 4 to 6 memory
channels in their server CPUs, bringing a factor of 3 into RAM sizes?)

ROM initialization has been sped up through a new simplified algorithm.

ROM tags (magic constant values) and digests (values that depend on the
entire computation of the ROM contents) have been added to the last
block of ROM.  (The placement of these tags/digests is such that nested
ROMs are possible, to allow for ROM size upgrades later.)

The last block of ROM is now checked for the tag and is always used for
hash computation before a secret-dependent memory access is first made.
This ensures that hashes won't be computed with a partially initialized
ROM or with one initialized using different machine word endianness, and
that they will be consistently miscomputed if the ROM digest is other
than what the caller expected.  This in turn helps early detection of
problems with ROM initialization even if the calling application fails
to check for them.  This also helps mitigate cache-timing attacks when
the attacker doesn't know the contents of the last block of ROM.

Many implementation changes have been made, such as for performance,
portability, security (intentional reuse and thus rewrite of memory
where practical and optional zeroization elsewhere), and coding style.
This includes addition of optional SSE2 inline assembly code (a macro
with 8 instructions) to yescrypt-simd.c, which tends to slightly
outperform compiler-generated code, including AVX(2)-enabled code, for
yescrypt's currently recommended settings.  This is no surprise since
yescrypt was designed to fit the 64-bit mode extended SSE2 instruction
set perfectly (including SSE2's lack of 3-register instructions), so for
its optimal implementation AVX would merely result in extra instruction
prefixes and not provide any benefit (except for the uses of Salsa20
inherited from scrypt, but those are infrequent).

The auxiliary files inherited from scrypt have been sync'ed with scrypt
1.2.1, and the implementation of PBKDF2 has been further optimized,
especially for its use in (ye)scrypt where the "iteration count" is 1
but the output size is relatively large.  (The speedup is measurable at
realistically low settings for yescrypt, such as at 2 MiB of memory.)

The included tests have been revised and test vectors regenerated to
account for the ROM initialization/use updates and hash (re-)encryption.

The PHC test vectors have been compacted into a single SHA-256 hash of
the expected output of phc.c, but have otherwise remained unchanged as
none of the incompatible changes have affected the subset of yescrypt
exposed via the PHS() interface for the Password Hashing Competition.

The specification document and extra programs that were included with
the PHC submission and its updates are now excluded from this release.

The rest of the documentation files have been updated for the 1.0.0
release.


	Changes made between 0.7.1 (2015/01/31) and 0.8.1 (2015/10/25).

pwxform became stateful, through writes to its S-boxes.  This further
discourages TMTO attacks on yescrypt as a whole, as well as on pwxform
S-boxes separately.  It also increases the total size of the S-boxes by
a factor of 1.5 (8 KiB to 12 KiB by default) and it puts the previously
mostly idle L1 cache write ports on CPUs to use.
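
The size arithmetic above can be reproduced with a short sketch.  It
assumes the default compile-time settings are an S-box index width of
8 bits and 2 64-bit words per entry, and that the factor of 1.5 comes
from going from 2 S-boxes to 3.  The macro and function names below are
hypothetical, chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SWIDTH 8    /* assumed default S-box index width, in bits */
#define PWXSIMPLE 2 /* assumed default 64-bit words per S-box entry */

/* One S-box: 2^SWIDTH entries of PWXSIMPLE 64-bit words each. */
size_t sbox_bytes(void)
{
    return ((size_t)1 << SWIDTH) * PWXSIMPLE * 8;
}

/* Total size for a given number of S-boxes. */
size_t sboxes_total_bytes(unsigned int count)
{
    return count * sbox_bytes();
}

/* Usage sketch: 2 S-boxes take 8 KiB, 3 S-boxes take 12 KiB. */
void sbox_size_demo(void)
{
    assert(sboxes_total_bytes(2) == 8 * 1024);
    assert(sboxes_total_bytes(3) == 12 * 1024);
}
```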

Salsa20/8 in BlockMix_pwxform has been replaced with Salsa20/2.

An extra HMAC-SHA256 update of the password buffer (which is eventually
passed into the final PBKDF2 invocation) is now performed right after
the pwxform S-boxes initialization.

Nloop_rw rounding has been adjusted to be the same as Nloop_all's.
This avoids an unnecessary invocation of SMix2 with Nloop = 2, which
would otherwise have occurred in some cases.

t is now halved per hash upgrade (rather than reset to 0 right away on
the very first upgrade, like it was in 0.7.1).
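
The difference between the two schedules for t can be sketched as
follows.  The function names are hypothetical, and treating "halved" as
an integer right shift is an assumption of this sketch rather than
something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Behavior described above: t is halved on each hash upgrade. */
uint32_t t_after_upgrades(uint32_t t, unsigned int upgrades)
{
    while (upgrades-- > 0 && t != 0)
        t >>= 1; /* integer halving is an assumption of this sketch */
    return t;
}

/* 0.7.1 behavior: t is reset to 0 on the very first upgrade. */
uint32_t t_after_upgrades_0_7_1(uint32_t t, unsigned int upgrades)
{
    return upgrades == 0 ? t : 0;
}

/* Usage sketch: starting from t = 5, compare the two schedules. */
void t_schedule_demo(void)
{
    assert(t_after_upgrades(5, 0) == 5);
    assert(t_after_upgrades(5, 1) == 2);
    assert(t_after_upgrades(5, 2) == 1);
    assert(t_after_upgrades(5, 3) == 0);
    assert(t_after_upgrades_0_7_1(5, 1) == 0);
}
```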

Minor corrections and improvements to the specification and the code
have been made.


	Changes made between 0.6.4 (2015/01/30) and 0.7.1 (2015/01/31).

The YESCRYPT_PARALLEL_SMIX and YESCRYPT_PWXFORM flags have been removed,
with the corresponding functionality enabled along with the YESCRYPT_RW
flag.  This change has simplified the SIMD implementation a little bit
(eliminating specialized code for some flag combinations that are no
longer possible), and it should help simplify documentation, analysis,
testing, and benchmarking (fewer combinations of settings to test).

Adjustments to pre- and post-hashing have been made to address subtle
issues and non-intuitive behavior, as well as in some cases to reduce
the impact of garbage collector attacks.

Support for hash upgrades has been added (the g parameter).

Extra tests have been written and test vectors re-generated.


	Changes made between 0.5.2 (2014/03/31) and 0.6.4 (2015/01/30).

Dropped support for ROM access frequency mask since it made little sense
when supporting only one ROM at a time.  (It'd make sense with two ROMs,
for simultaneous use of a ROM-in-RAM and a ROM-on-SSD.  With just one
ROM, the mask could still be used for a ROM-on-SSD, but only in lieu of
a ROM-in-RAM, which would arguably be unreasonable.)

Simplified the API by having it accept NULL for the "shared" parameter
to indicate no ROM in use.  (Previously, a dummy "shared" structure had
to be created.)

Completed the specification of pwxform, BlockMix_pwxform, Salsa20 SIMD
shuffling, and potential endianness conversion.  (No change to these has
been made - they have just been specified in the included document more
completely.)

Provided rationale for the default compile-time settings for pwxform.

Revised the reference and optimized implementations' source code to more
closely match the current specification document in terms of identifier
names, compile-time constant expressions, source code comments, and in
some cases the ordering of source code lines.  None of these changes
affect the computed hash values, hence the test vectors have remained
the same.