1Side Channels 2========================= 3 4Many cryptographic systems can be easily broken by side channels. This document 5notes side channel protections which are currently implemented, as well as areas 6of the code which are known to be vulnerable to side channels. The latter are 7obviously all open for future improvement. 8 9The following text assumes the reader is already familiar with cryptographic 10implementations, side channel attacks, and common countermeasures. 11 12Modular Exponentiation 13------------------------ 14 15Modular exponentiation uses a fixed window algorithm with Montgomery 16representation. A side channel silent table lookup is used to access the 17precomputed powers. The caller provides the maximum possible bit length of the 18exponent, and the exponent is zero-padded as required. For example, in a DSA 19signature with 256-bit q, the caller will specify a maximum length of exponent 20of 256 bits, even if the k that was generated was 250 bits. This avoids leaking 21the length of the exponent through the number of loop iterations. 22See monty_exp.cpp and monty.cpp 23 24Karatsuba multiplication algorithm avoids any conditional branches; in 25cases where different operations must be performed it instead uses masked 26operations. See mp_karat.cpp for details. 27 28The Montgomery reduction is written to run in constant time. 29The final reduction is handled with a masked subtraction. See mp_monty.cpp. 30 31Barrett Reduction 32-------------------- 33 34The Barrett reduction code is written to avoid input dependent branches. The 35Barrett algorithm only works for inputs up to a certain size, and larger values 36fall back on a different (slower) division algorithm. This secondary algorithm 37is also const time, but the branch allows detecting when a value larger than 382^{2k} was reduced, where k is the word length of the modulus. This leaks only 39the size of the two values, and not anything else about their value. 40 41RSA 42---------------------- 43 44Blinding is always used to protect private key operations (there is no way to 45turn it off). Both base blinding and exponent blinding are used. 46 47For base blinding, as an optimization, instead of choosing a new random mask and 48inverse with each decryption, both the mask and its inverse are simply squared 49to choose the next blinding factor. This is much faster than computing a fresh 50value each time, and the additional relation is thought to provide only minimal 51useful information for an attacker. Every BOTAN_BLINDING_REINIT_INTERVAL 52(default 64) operations, a new starting point is chosen. 53 54Exponent blinding uses new values for each signature, with 64 bit masks. 55 56RSA signing uses the CRT optimization, which is much faster but vulnerable to 57trivial fault attacks [RsaFault] which can result in the key being entirely 58compromised. To protect against this (or any other computational error which 59would have the same effect as a fault attack in this case), after every private 60key operation the result is checked for consistency with the public key. This 61introduces only slight additional overhead and blocks most fault attacks; it is 62possible to use a second fault attack to bypass this verification, but such a 63double fault attack requires significantly more control on the part of an 64attacker than a BellCore style attack, which is possible if any error at all 65occurs during either modular exponentiation involved in the RSA signature 66operation. 67 68See blinding.cpp and rsa.cpp. 69 70If the OpenSSL provider is enabled, then no explicit blinding is done; we assume 71OpenSSL handles this. See openssl_rsa.cpp. 72 73Decryption of PKCS #1 v1.5 Ciphertexts 74---------------------------------------- 75 76This padding scheme is used with RSA, and is very vulnerable to errors. In a 77scenario where an attacker can repeatedly present RSA ciphertexts, and a 78legitimate key holder will attempt to decrypt each ciphertext and simply 79indicates to the attacker if the PKCS padding was valid or not (without 80revealing any additional information), the attacker can use this behavior as an 81oracle to perform iterative decryption of arbitrary RSA ciphertexts encrypted 82under that key. This is the famous million message attack [MillionMsg]. A side 83channel such as a difference in time taken to handle valid and invalid RSA 84ciphertexts is enough to mount the attack [MillionMsgTiming]. 85 86As a first step, the PKCS v1.5 decoding operation runs without any 87conditional jumps or indexes, with the only variance in runtime being 88based on the length of the public modulus, which is public information. 89 90Preventing the attack in full requires some application level changes. In 91protocols which know the expected length of the encrypted key, PK_Decryptor 92provides the function `decrypt_or_random` which first generates a random fake 93key, then decrypts the presented ciphertext, then in constant time either copies 94out the random key or the decrypted plaintext depending on if the ciphertext was 95valid or not (valid padding and expected plaintext length). Then in the case of 96an attack, the protocol will carry on with a randomly chosen key, which will 97presumably cause total failure in a way that does not allow an attacker to 98distinguish (via any timing or other side channel, nor any error messages 99specific to the one situation vs the other) if the RSA padding was valid or 100invalid. 101 102One very important user of PKCS #1 v1.5 encryption is the TLS protocol. In TLS, 103some extra versioning information is embedded in the plaintext message, along 104with the key. It turns out that this version information must be treated in an 105identical (constant-time) way with the PKCS padding, or again the system is 106broken. [VersionOracle]. This is supported by a special version of 107PK_Decryptor::decrypt_or_random that additionally allows verifying one or more 108content bytes, in addition to the PKCS padding. 109 110See eme_pkcs.cpp and pubkey.cpp. 111 112Verification of PKCS #1 v1.5 Signatures 113---------------------------------------- 114 115One way of verifying PKCS #1 v1.5 signature padding is to decode it with an 116ASN.1 BER parser. However such a design commonly leads to accepting signatures 117besides the (single) valid RSA PKCS #1 v1.5 signature for any given message, 118because often the BER parser accepts variations of the encoding which are 119actually invalid. It also needlessly exposes the BER parser to untrusted inputs. 120 121It is safer and simpler to instead re-encode the hash value we are expecting 122using the PKCS #1 v1.5 encoding rules, and const time compare our expected 123encoding with the output of the RSA operation. So that is what Botan does. 124 125See emsa_pkcs.cpp. 126 127OAEP 128---------------------- 129 130RSA OAEP is (PKCS#1 v2) is the recommended version of RSA encoding standard, 131because it is not directly vulnerable to Bleichenbacher attack. However, if 132implemented incorrectly, a side channel can be presented to an attacker and 133create an oracle for decrypting RSA ciphertexts [OaepTiming]. 134 135This attack is avoided in Botan by making the OAEP decoding operation run 136without any conditional jumps or indexes, with the only variance in runtime 137coming from the length of the RSA key (which is public information). 138 139See eme_oaep.cpp. 140 141ECC point decoding 142---------------------- 143 144The API function OS2ECP, which is used to convert byte strings to ECC points, 145verifies that all points satisfy the ECC curve equation. Points that do not 146satisfy the equation are invalid, and can sometimes be used to break 147protocols ([InvalidCurve] [InvalidCurveTLS]). See point_gfp.cpp. 148 149ECC scalar multiply 150---------------------- 151 152There are several different implementations of ECC scalar multiplications which 153depend on the API invoked. This include ``PointGFp::operator*``, 154``EC_Group::blinded_base_point_multiply`` and 155``EC_Group::blinded_var_point_multiply``. 156 157The ``PointGFp::operator*`` implementation uses the Montgomery ladder, which is 158fairly resistant to side channels. However it leaks the size of the scalar, 159because the loop iterations are bounded by the scalar size. It should not be 160used in cases when the scalar is a secret. 161 162Both ``blinded_base_point_multiply`` and ``blinded_var_point_multiply`` apply 163side channel countermeasures. The scalar is masked by a multiple of the group 164order (this is commonly called Coron's first countermeasure [CoronDpa]), 165currently the mask is an 80 bit random value. 166 167Botan stores all ECC points in Jacobian representation. This form allows faster 168computation by representing points (x,y) as (X,Y,Z) where x=X/Z^2 and 169y=Y/Z^3. As the representation is redundant, for any randomly chosen non-zero r, 170(X*r^2,Y*r^3,Z*r) is an equivalent point. Changing the point values prevents an 171attacker from mounting attacks based on the input point remaining unchanged over 172multiple executions. This is commonly called Coron's third countermeasure, see 173again [CoronDpa]. 174 175The base point multiplication algorithm is a comb-like technique which 176precomputes ``P^i,(2*P)^i,(3*P)^i`` for all ``i`` in the range of valid scalars. 177This means the scalar multiplication involves only point additions and no 178doublings, which may help against attacks which rely on distinguishing between 179point doublings and point additions. The elements of the table are accessed by 180masked lookups, so as not to leak information about bits of the scalar via a 181cache side channel. However, whenever 3 sequential bits of the (masked) scalar 182are all 0, no operation is performed in that iteration of the loop. This exposes 183the scalar multiply to a cache-based side channel attack; scalar blinding is 184necessary to prevent this attack from leaking information about the scalar. 185 186The variable point multiplication algorithm uses a fixed-window algorithm. Since 187this is normally invoked using untrusted points (eg during ECDH key exchange) it 188randomizes all inputs to prevent attacks which are based on chosen input 189points. The table of precomputed multiples is accessed using a masked lookup 190which should not leak information about the secret scalar to an attacker who can 191mount a cache-based side channel attack. 192 193See point_gfp.cpp and point_mul.cpp 194 195ECDH 196---------------------- 197 198ECDH verifies (through its use of OS2ECP) that all input points received from 199the other party satisfy the curve equation. This prevents twist attacks. The 200same check is performed on the output point, which helps prevent fault attacks. 201 202ECDSA 203---------------------- 204 205Inversion of the ECDSA nonce k must be done in constant time, as any leak of 206even a single bit of the nonce can be sufficient to allow recovering the private 207key. In Botan all inverses modulo an odd number are performed using a constant 208time algorithm due to Niels Möller. 209 210x25519 211---------------------- 212 213The x25519 code is independent of the main Weierstrass form ECC code, instead 214based on curve25519-donna-c64.c by Adam Langley. The code seems immune to cache 215based side channels. It does make use of integer multiplications; on some old 216CPUs these multiplications take variable time and might allow a side channel 217attack. This is not considered a problem on modern processors. 218 219TLS CBC ciphersuites 220---------------------- 221 222The original TLS v1.0 CBC Mac-then-Encrypt mode is vulnerable to an oracle 223attack. If an attacker can distinguish padding errors through different error 224messages [TlsCbcOracle] or via a side channel attack like [Lucky13], they can 225abuse the server as a decryption oracle. 226 227The side channel protection for Lucky13 follows the approach proposed in the 228Lucky13 paper. It is not perfectly constant time, but does hide the padding 229oracle in practice. Tools to test TLS CBC decoding are included in the timing 230tests. See https://github.com/randombit/botan/pull/675 for more information. 231 232The Encrypt-then-MAC extension, which completely avoids the side channel, is 233implemented and used by default for CBC ciphersuites. 234 235CBC mode padding 236---------------------- 237 238In theory, any good protocol protects CBC ciphertexts with a MAC. But in 239practice, some protocols are not good and cannot be fixed immediately. To avoid 240making a bad problem worse, the code to handle decoding CBC ciphertext padding 241bytes runs in constant time, depending only on the block size of the cipher. 242 243AES 244---------------------- 245 246Some x86, ARMv8 and POWER processors support AES instructions which 247are fast and are thought to be side channel silent. These instructions 248are used when available. 249 250On CPUs which do not have hardware AES instructions but do support SIMD vectors 251with a byte shuffle (including x86's SSSE3, ARM's NEON and PowerPC AltiVec), a 252version of AES is implemented which is side channel silent. This implementation 253is based on code by Mike Hamburg [VectorAes], see aes_vperm.cpp. 254 255On all other processors, a constant time bitsliced implementation is used. This 256is typically slower than the vector permute implementation, and additionally for 257best performance multiple blocks must be processed in parellel. So modes such 258as CTR, GCM or XTS are relatively fast, but others such as CBC encryption 259suffer. 260 261GCM 262--------------------- 263 264On platforms that support a carryless multiply instruction (ARMv8 and recent x86), 265GCM is fast and constant time. 266 267On all other platforms, GCM uses an algorithm based on precomputing all powers 268of H from 1 to 128. Then for every bit of the input a mask is formed which 269allows conditionally adding that power without leaking information via a cache 270side channel. There is also an SSSE3 variant of this algorithm which is somewhat 271faster on processors which have SSSE3 but no AES-NI instructions. 272 273OCB 274----------------------- 275 276It is straightforward to implement OCB mode in a efficient way that does not 277depend on any secret branches or lookups. See ocb.cpp for the implementation. 278 279Poly1305 280---------------------- 281 282The Poly1305 implementation does not have any secret lookups or conditionals. 283The code is based on the public domain version by Andrew Moon. 284 285DES/3DES 286---------------------- 287 288The DES implementation uses table lookups, and is likely vulnerable to side 289channel attacks. DES or 3DES should be avoided in new systems. The proper fix 290would be a scalar bitsliced implementation, this is not seen as worth the 291engineering investment given these algorithms end of life status. 292 293Twofish 294------------------------ 295 296This algorithm uses table lookups with secret sboxes. No cache-based side 297channel attack on Twofish has ever been published, but it is possible nobody 298sufficiently skilled has ever tried. 299 300ChaCha20, Serpent, Threefish, ... 301----------------------------------- 302 303Some algorithms including ChaCha, Salsa, Serpent and Threefish are 'naturally' 304silent to cache and timing side channels on all recent processors. 305 306IDEA 307--------------- 308 309IDEA encryption, decryption, and key schedule are implemented to take constant 310time regardless of their inputs. 311 312Hash Functions 313------------------------- 314 315Most hash functions included in Botan such as MD5, SHA-1, SHA-2, SHA-3, Skein, 316and BLAKE2 do not require any input-dependent memory lookups, and so seem to not be 317affected by common CPU side channels. However the implementations of Whirlpool 318and Streebog use table lookups and probably can be attacked by side channels. 319 320Memory comparisons 321---------------------- 322 323The function same_mem in header mem_ops.h provides a constant-time comparison 324function. It is used when comparing MACs or other secret values. It is also 325exposed for application use. 326 327Memory zeroizing 328---------------------- 329 330There is no way in portable C/C++ to zero out an array before freeing it, in 331such a way that it is guaranteed that the compiler will not elide the 332'additional' (seemingly unnecessary) writes to zero out the memory. 333 334The function secure_scrub_memory (in mem_ops.cpp) uses some system specific 335trick to zero out an array. If possible an OS provided routine (such as 336``RtlSecureZeroMemory`` or ``explicit_bzero``) is used. 337 338On other platforms, by default the trick of referencing memset through a 339volatile function pointer is used. This approach is not guaranteed to work on 340all platforms, and currently there is no systematic check of the resulting 341binary function that it is compiled as expected. But, it is the best approach 342currently known and has been verified to work as expected on common platforms. 343 344If BOTAN_USE_VOLATILE_MEMSET_FOR_ZERO is set to 0 in build.h (not the default) a 345byte at a time loop through a volatile pointer is used to overwrite the array. 346 347Memory allocation 348---------------------- 349 350Botan's secure_vector type is a std::vector with a custom allocator. The 351allocator calls secure_scrub_memory before freeing memory. 352 353Some operating systems support an API call to lock a range of pages 354into memory, such that they will never be swapped out (``mlock`` on POSIX, 355``VirtualLock`` on Windows). On many POSIX systems ``mlock`` is only usable by 356root, but on Linux, FreeBSD and possibly other systems a small amount 357of memory can be locked by processes without extra credentials. 358 359If available, Botan uses such a region for storing key material. A page-aligned 360block of memory is allocated and locked, then the memory is scrubbed before 361freeing. This memory pool is used by secure_vector when available. It can be 362disabled at runtime setting the environment variable BOTAN_MLOCK_POOL_SIZE to 0. 363 364Automated Analysis 365--------------------- 366 367Currently the main tool used by the Botan developers for testing for side 368channels at runtime is valgrind; valgrind's runtime API is used to taint memory 369values, and any jumps or indexes using data derived from these values will cause 370a valgrind warning. This technique was first used by Adam Langley in ctgrind. 371See header ct_utils.h. 372 373To check, install valgrind, configure the build with --with-valgrind, and run 374the tests. 375 376.. highlight:: shell 377 378There is also a test utility built into the command line util, `timing_test`, 379which runs an operation on several different inputs many times in order to 380detect simple timing differences. The output can be processed using the 381Mona timing report library (https://github.com/seecurity/mona-timing-report). 382To run a timing report (here for example pow_mod):: 383 384 $ ./botan timing_test pow_mod > pow_mod.raw 385 386This must be run from a checkout of the source, or otherwise ``--test-data-dir=`` 387must be used to point to the expected input files. 388 389Build and run the Mona report as:: 390 391 $ git clone https://github.com/seecurity/mona-timing-report.git 392 $ cd mona-timing-report 393 $ ant 394 $ java -jar ReportingTool.jar --lowerBound=0.4 --upperBound=0.5 --inputFile=pow_mod.raw --name=PowMod 395 396This will produce plots and an HTML file in subdirectory starting with 397``reports_`` followed by a representation of the current date and time. 398 399References 400--------------- 401 402[Aes256Sc] Neve, Tiri "On the complexity of side-channel attacks on AES-256" 403(https://eprint.iacr.org/2007/318.pdf) 404 405[AesCacheColl] Bonneau, Mironov "Cache-Collision Timing Attacks Against AES" 406(http://www.jbonneau.com/doc/BM06-CHES-aes_cache_timing.pdf) 407 408[CoronDpa] Coron, 409"Resistance against Differential Power Analysis for Elliptic Curve Cryptosystems" 410(https://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.5695) 411 412[InvalidCurve] Biehl, Meyer, Müller: Differential fault attacks on 413elliptic curve cryptosystems 414(https://www.iacr.org/archive/crypto2000/18800131/18800131.pdf) 415 416[InvalidCurveTLS] Jager, Schwenk, Somorovsky: Practical Invalid Curve 417Attacks on TLS-ECDH 418(https://www.nds.rub.de/research/publications/ESORICS15/) 419 420[SafeCurves] Bernstein, Lange: SafeCurves: choosing safe curves for 421elliptic-curve cryptography. (https://safecurves.cr.yp.to) 422 423[Lucky13] AlFardan, Paterson "Lucky Thirteen: Breaking the TLS and DTLS Record Protocols" 424(http://www.isg.rhul.ac.uk/tls/TLStiming.pdf) 425 426[MillionMsg] Bleichenbacher "Chosen Ciphertext Attacks Against Protocols Based 427on the RSA Encryption Standard PKCS1" 428(https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8543) 429 430[MillionMsgTiming] Meyer, Somorovsky, Weiss, Schwenk, Schinzel, Tews: Revisiting 431SSL/TLS Implementations: New Bleichenbacher Side Channels and Attacks 432(https://www.nds.rub.de/research/publications/mswsst2014-bleichenbacher-usenix14/) 433 434[OaepTiming] Manger, "A Chosen Ciphertext Attack on RSA Optimal Asymmetric 435Encryption Padding (OAEP) as Standardized in PKCS #1 v2.0" 436(http://archiv.infsec.ethz.ch/education/fs08/secsem/Manger01.pdf) 437 438[RsaFault] Boneh, Demillo, Lipton 439"On the importance of checking cryptographic protocols for faults" 440(https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.9764) 441 442[RandomMonty] Le, Tan, Tunstall "Randomizing the Montgomery Powering Ladder" 443(https://eprint.iacr.org/2015/657) 444 445[VectorAes] Hamburg, "Accelerating AES with Vector Permute Instructions" 446https://shiftleft.org/papers/vector_aes/vector_aes.pdf 447 448[VersionOracle] Klíma, Pokorný, Rosa "Attacking RSA-based Sessions in SSL/TLS" 449(https://eprint.iacr.org/2003/052) 450