	Changes made between 1.0.3 (2018/06/13) and 1.1.x (TBD).

Merged yescrypt-opt.c and yescrypt-simd.c into one source file, which is
a closer match to -simd but is called -opt (and -simd is now gone).
With this change, performance of SIMD builds should be almost unchanged,
while scalar builds should be faster than before on register-rich 64-bit
architectures but may be slower than before on register-starved 32-bit
architectures (this shortcoming may be addressed later). This also
happens to make SSE prefetch available even in otherwise-scalar builds
and it paves the way for adding SIMD support on big-endian architectures
(previously, -simd assumed little-endian).


	Changes made between 1.0.2 (2018/06/06) and 1.0.3 (2018/06/13).

In SMix1, optimized out the indexing of V for the sequential writes.


	Changes made between 1.0.1 (2018/04/22) and 1.0.2 (2018/06/06).

Don't use MAP_POPULATE anymore because new multi-threaded benchmarks on
RHEL6'ish and RHEL7'ish systems revealed that it sometimes has adverse
effect far in excess of its occasional positive effect.

In the SIMD code, we now reuse the same buffer for BlockMix_pwxform's
input and output in SMix2. This might slightly improve cache hit rate
and thus performance.

Also in the SIMD code, a compiler memory barrier has been added between
sub-blocks to ensure that none of the writes into what was S2 during
processing of the previous sub-block are postponed until after a read
from S0 or S1 in the inline asm code for the current sub-block. This
potential problem was never observed so far due to other constraints
that we have, but strictly speaking those constraints were insufficient
to guarantee it couldn't occur.


	Changes made between 1.0.0 (2018/03/09) and 1.0.1 (2018/04/22).

The included documentation has been improved, most notably adding new
text files PARAMETERS (guidelines on parameter selection, and currently
recommended parameter sets by use case) and COMPARISON (comparison to
scrypt and Argon2).

Code cleanups have been made, including removal of AVX2 support, which
was deliberately temporarily preserved for the 1.0.0 release, but which
almost always hurt performance with currently recommended low-level
yescrypt parameters on Intel & AMD CPUs tested so far. (The low-level
parameters are chosen with consideration for relative performance of
defensive vs. offensive implementations on different hardware, and not
only for seemingly best performance on CPUs. It is possible to change
them such that AVX2 would be worthwhile, and this might happen in the
future, but currently this wouldn't be obviously beneficial overall.)


	Changes made between 0.8.1 (2015/10/25) and 1.0.0 (2018/03/09).

Hash string encoding has been finalized under the "$y$" prefix for both
native yescrypt and classic scrypt hashes, using a new variable-length
and extremely compact encoding of (ye)scrypt's many parameters. (Also
still recognized under the "$7$" prefix is the previously used encoding
for classic scrypt hashes, which is fixed-length and not so compact.)

Optional format-preserving salt and hash (re-)encryption has been added,
using the Luby-Rackoff construction with SHA-256 as the PRF.

Support for hash upgrades has been temporarily excluded to allow for its
finalization at a later time and based on actual needs (e.g., will 3x
ROM size upgrades be in demand now that Intel went from 4 to 6 memory
channels in their server CPUs, bringing a factor of 3 into RAM sizes?)

ROM initialization has been sped up through a new simplified algorithm.

ROM tags (magic constant values) and digests (values that depend on the
entire computation of the ROM contents) have been added to the last
block of ROM. (The placement of these tags/digests is such that nested
ROMs are possible, to allow for ROM size upgrades later.)

The last block of ROM is now checked for the tag and is always used for
hash computation before a secret-dependent memory access is first made.
This ensures that hashes won't be computed with a partially initialized
ROM or with one initialized using different machine word endianness, and
that they will be consistently miscomputed if the ROM digest is other
than what the caller expected. This in turn helps early detection of
problems with ROM initialization even if the calling application fails
to check for them. This also helps mitigate cache-timing attacks when
the attacker doesn't know the contents of the last block of ROM.

Many implementation changes have been made, such as for performance,
portability, security (intentional reuse and thus rewrite of memory
where practical and optional zeroization elsewhere), and coding style.
This includes addition of optional SSE2 inline assembly code (a macro
with 8 instructions) to yescrypt-simd.c, which tends to slightly
outperform compiler-generated code, including AVX(2)-enabled code, for
yescrypt's currently recommended settings. This is no surprise since
yescrypt was designed to fit the 64-bit mode extended SSE2 instruction
set perfectly (including SSE2's lack of 3-register instructions), so for
its optimal implementation AVX would merely result in extra instruction
prefixes and not provide any benefit (except for the uses of Salsa20
inherited from scrypt, but those are infrequent).

The auxiliary files inherited from scrypt have been sync'ed with scrypt
1.2.1, and the implementation of PBKDF2 has been further optimized,
especially for its use in (ye)scrypt where the "iteration count" is 1
but the output size is relatively large. (The speedup is measurable at
realistically low settings for yescrypt, such as at 2 MiB of memory.)

The included tests have been revised and test vectors regenerated to
account for the ROM initialization/use updates and hash (re-)encryption.

The PHC test vectors have been compacted into a single SHA-256 hash of
the expected output of phc.c, but have otherwise remained unchanged as
none of the incompatible changes have affected the subset of yescrypt
exposed via the PHS() interface for the Password Hashing Competition.

The specification document and extra programs that were included with
the PHC submission and its updates are now excluded from this release.

The rest of the documentation files have been updated for the 1.0.0
release.


	Changes made between 0.7.1 (2015/01/31) and 0.8.1 (2015/10/25).

pwxform became stateful, through writes to its S-boxes. This further
discourages TMTO attacks on yescrypt as a whole, as well as on pwxform
S-boxes separately. It also increases the total size of the S-boxes by
a factor of 1.5 (8 KiB to 12 KiB by default) and it puts the previously
mostly idle L1 cache write ports on CPUs to use.

Salsa20/8 in BlockMix_pwxform has been replaced with Salsa20/2.

An extra HMAC-SHA256 update of the password buffer (which is eventually
passed into the final PBKDF2 invocation) is now performed right after
the pwxform S-boxes initialization.

Nloop_rw rounding has been adjusted to be the same as Nloop_all's.
This avoids an unnecessary invocation of SMix2 with Nloop = 2, which
would otherwise have occurred in some cases.
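
As an illustration of the Nloop_rw rounding change described above, here
is a minimal sketch of how mismatched rounding can leave a trailing
Nloop of 2. It assumes that the shared rounding rule is "round up to
even" and that the earlier Nloop_rw rounding was "round down to even".
Neither rule is stated in this file, so both are assumptions, and all
function names below are hypothetical rather than taken from the
yescrypt sources.

```c
#include <stdint.h>

/* Hypothetical helper: round n up to the nearest even number.
 * (Wraps to 0 for UINT64_MAX; ignored in this sketch.) */
static uint64_t round_up_to_even(uint64_t n)
{
	return (n + 1) & ~(uint64_t)1;
}

/* Hypothetical helper: round n down to the nearest even number. */
static uint64_t round_down_to_even(uint64_t n)
{
	return n & ~(uint64_t)1;
}

/*
 * Iterations left over for a trailing SMix2 call after the read-write
 * portion, given the pre-rounding counts.  Nloop_all is always rounded
 * up here.  same_rounding selects whether Nloop_rw is rounded the same
 * way (up) or differently (down, assumed here only for contrast).
 */
static uint64_t trailing_nloop(uint64_t nloop_all, uint64_t nloop_rw,
    int same_rounding)
{
	uint64_t all = round_up_to_even(nloop_all);
	uint64_t rw = same_rounding ?
	    round_up_to_even(nloop_rw) : round_down_to_even(nloop_rw);

	return all - rw;
}
```

Under these assumptions, with both pre-rounding counts equal to 5,
trailing_nloop(5, 5, 0) yields 2 (a tiny extra invocation), whereas
trailing_nloop(5, 5, 1) yields 0 (no extra invocation).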

t is now halved per hash upgrade (rather than reset to 0 right away on
the very first upgrade, like it was in 0.7.1).

Minor corrections and improvements to the specification and the code
have been made.


	Changes made between 0.6.4 (2015/01/30) and 0.7.1 (2015/01/31).

The YESCRYPT_PARALLEL_SMIX and YESCRYPT_PWXFORM flags have been removed,
with the corresponding functionality enabled along with the YESCRYPT_RW
flag. This change has simplified the SIMD implementation a little bit
(eliminating specialized code for some flag combinations that are no
longer possible), and it should help simplify documentation, analysis,
testing, and benchmarking (fewer combinations of settings to test).

Adjustments to pre- and post-hashing have been made to address subtle
issues and non-intuitive behavior, as well as in some cases to reduce
impact of garbage collector attacks.

Support for hash upgrades has been added (the g parameter).

Extra tests have been written and test vectors re-generated.


	Changes made between 0.5.2 (2014/03/31) and 0.6.4 (2015/01/30).

Dropped support for ROM access frequency mask since it made little sense
when supporting only one ROM at a time. (It'd make sense with two ROMs,
for simultaneous use of a ROM-in-RAM and a ROM-on-SSD. With just one
ROM, the mask could still be used for a ROM-on-SSD, but only in lieu of
a ROM-in-RAM, which would arguably be unreasonable.)

Simplified the API by having it accept NULL for the "shared" parameter
to indicate no ROM in use. (Previously, a dummy "shared" structure had
to be created.)

Completed the specification of pwxform, BlockMix_pwxform, Salsa20 SIMD
shuffling, and potential endianness conversion. (No change to these has
been made - they have just been specified in the included document more
completely.)

Provided rationale for the default compile-time settings for pwxform.

Revised the reference and optimized implementations' source code to more
closely match the current specification document in terms of identifier
names, compile-time constant expressions, source code comments, and in
some cases the ordering of source code lines. None of these changes
affect the computed hash values, hence the test vectors have remained
the same.