1Side Channels
2=========================
3
4Many cryptographic systems can be easily broken by side channels. This document
5notes side channel protections which are currently implemented, as well as areas
6of the code which are known to be vulnerable to side channels. The latter are
7obviously all open for future improvement.
8
9The following text assumes the reader is already familiar with cryptographic
10implementations, side channel attacks, and common countermeasures.
11
12Modular Exponentiation
13------------------------
14
15Modular exponentiation uses a fixed window algorithm with Montgomery
16representation. A side channel silent table lookup is used to access the
17precomputed powers. The caller provides the maximum possible bit length of the
18exponent, and the exponent is zero-padded as required. For example, in a DSA
19signature with 256-bit q, the caller will specify a maximum length of exponent
20of 256 bits, even if the k that was generated was 250 bits. This avoids leaking
21the length of the exponent through the number of loop iterations.
22See monty_exp.cpp and monty.cpp
23
24Karatsuba multiplication algorithm avoids any conditional branches; in
25cases where different operations must be performed it instead uses masked
26operations. See mp_karat.cpp for details.
27
28The Montgomery reduction is written to run in constant time.
29The final reduction is handled with a masked subtraction. See mp_monty.cpp.
30
31Barrett Reduction
32--------------------
33
34The Barrett reduction code is written to avoid input dependent branches. The
35Barrett algorithm only works for inputs up to a certain size, and larger values
36fall back on a different (slower) division algorithm. This secondary algorithm
37is also const time, but the branch allows detecting when a value larger than
382^{2k} was reduced, where k is the word length of the modulus. This leaks only
39the size of the two values, and not anything else about their value.
40
41RSA
42----------------------
43
44Blinding is always used to protect private key operations (there is no way to
45turn it off). Both base blinding and exponent blinding are used.
46
47For base blinding, as an optimization, instead of choosing a new random mask and
48inverse with each decryption, both the mask and its inverse are simply squared
49to choose the next blinding factor. This is much faster than computing a fresh
50value each time, and the additional relation is thought to provide only minimal
51useful information for an attacker. Every BOTAN_BLINDING_REINIT_INTERVAL
52(default 64) operations, a new starting point is chosen.
53
54Exponent blinding uses new values for each signature, with 64 bit masks.
55
56RSA signing uses the CRT optimization, which is much faster but vulnerable to
57trivial fault attacks [RsaFault] which can result in the key being entirely
58compromised. To protect against this (or any other computational error which
59would have the same effect as a fault attack in this case), after every private
60key operation the result is checked for consistency with the public key. This
61introduces only slight additional overhead and blocks most fault attacks; it is
62possible to use a second fault attack to bypass this verification, but such a
63double fault attack requires significantly more control on the part of an
64attacker than a BellCore style attack, which is possible if any error at all
65occurs during either modular exponentiation involved in the RSA signature
66operation.
67
68See blinding.cpp and rsa.cpp.
69
70If the OpenSSL provider is enabled, then no explicit blinding is done; we assume
71OpenSSL handles this. See openssl_rsa.cpp.
72
73Decryption of PKCS #1 v1.5 Ciphertexts
74----------------------------------------
75
76This padding scheme is used with RSA, and is very vulnerable to errors. In a
77scenario where an attacker can repeatedly present RSA ciphertexts, and a
78legitimate key holder will attempt to decrypt each ciphertext and simply
79indicates to the attacker if the PKCS padding was valid or not (without
80revealing any additional information), the attacker can use this behavior as an
81oracle to perform iterative decryption of arbitrary RSA ciphertexts encrypted
82under that key. This is the famous million message attack [MillionMsg].  A side
83channel such as a difference in time taken to handle valid and invalid RSA
84ciphertexts is enough to mount the attack [MillionMsgTiming].
85
86As a first step, the PKCS v1.5 decoding operation runs without any
87conditional jumps or indexes, with the only variance in runtime being
88based on the length of the public modulus, which is public information.
89
90Preventing the attack in full requires some application level changes. In
91protocols which know the expected length of the encrypted key, PK_Decryptor
92provides the function `decrypt_or_random` which first generates a random fake
93key, then decrypts the presented ciphertext, then in constant time either copies
94out the random key or the decrypted plaintext depending on if the ciphertext was
95valid or not (valid padding and expected plaintext length). Then in the case of
96an attack, the protocol will carry on with a randomly chosen key, which will
97presumably cause total failure in a way that does not allow an attacker to
98distinguish (via any timing or other side channel, nor any error messages
99specific to the one situation vs the other) if the RSA padding was valid or
100invalid.
101
102One very important user of PKCS #1 v1.5 encryption is the TLS protocol. In TLS,
103some extra versioning information is embedded in the plaintext message, along
104with the key. It turns out that this version information must be treated in an
105identical (constant-time) way with the PKCS padding, or again the system is
106broken. [VersionOracle]. This is supported by a special version of
107PK_Decryptor::decrypt_or_random that additionally allows verifying one or more
108content bytes, in addition to the PKCS padding.
109
110See eme_pkcs.cpp and pubkey.cpp.
111
112Verification of PKCS #1 v1.5 Signatures
113----------------------------------------
114
115One way of verifying PKCS #1 v1.5 signature padding is to decode it with an
116ASN.1 BER parser. However such a design commonly leads to accepting signatures
117besides the (single) valid RSA PKCS #1 v1.5 signature for any given message,
118because often the BER parser accepts variations of the encoding which are
119actually invalid. It also needlessly exposes the BER parser to untrusted inputs.
120
121It is safer and simpler to instead re-encode the hash value we are expecting
122using the PKCS #1 v1.5 encoding rules, and const time compare our expected
123encoding with the output of the RSA operation. So that is what Botan does.
124
125See emsa_pkcs.cpp.
126
127OAEP
128----------------------
129
130RSA OAEP is (PKCS#1 v2) is the recommended version of RSA encoding standard,
131because it is not directly vulnerable to Bleichenbacher attack. However, if
132implemented incorrectly, a side channel can be presented to an attacker and
133create an oracle for decrypting RSA ciphertexts [OaepTiming].
134
135This attack is avoided in Botan by making the OAEP decoding operation run
136without any conditional jumps or indexes, with the only variance in runtime
137coming from the length of the RSA key (which is public information).
138
139See eme_oaep.cpp.
140
141ECC point decoding
142----------------------
143
144The API function OS2ECP, which is used to convert byte strings to ECC points,
145verifies that all points satisfy the ECC curve equation. Points that do not
146satisfy the equation are invalid, and can sometimes be used to break
147protocols ([InvalidCurve] [InvalidCurveTLS]). See point_gfp.cpp.
148
149ECC scalar multiply
150----------------------
151
152There are several different implementations of ECC scalar multiplications which
153depend on the API invoked. This include ``PointGFp::operator*``,
154``EC_Group::blinded_base_point_multiply`` and
155``EC_Group::blinded_var_point_multiply``.
156
157The ``PointGFp::operator*`` implementation uses the Montgomery ladder, which is
158fairly resistant to side channels. However it leaks the size of the scalar,
159because the loop iterations are bounded by the scalar size. It should not be
160used in cases when the scalar is a secret.
161
162Both ``blinded_base_point_multiply`` and ``blinded_var_point_multiply`` apply
163side channel countermeasures. The scalar is masked by a multiple of the group
164order (this is commonly called Coron's first countermeasure [CoronDpa]),
165currently the mask is an 80 bit random value.
166
167Botan stores all ECC points in Jacobian representation. This form allows faster
168computation by representing points (x,y) as (X,Y,Z) where x=X/Z^2 and
169y=Y/Z^3. As the representation is redundant, for any randomly chosen non-zero r,
170(X*r^2,Y*r^3,Z*r) is an equivalent point. Changing the point values prevents an
171attacker from mounting attacks based on the input point remaining unchanged over
172multiple executions. This is commonly called Coron's third countermeasure, see
173again [CoronDpa].
174
175The base point multiplication algorithm is a comb-like technique which
176precomputes ``P^i,(2*P)^i,(3*P)^i`` for all ``i`` in the range of valid scalars.
177This means the scalar multiplication involves only point additions and no
178doublings, which may help against attacks which rely on distinguishing between
179point doublings and point additions. The elements of the table are accessed by
180masked lookups, so as not to leak information about bits of the scalar via a
181cache side channel. However, whenever 3 sequential bits of the (masked) scalar
182are all 0, no operation is performed in that iteration of the loop. This exposes
183the scalar multiply to a cache-based side channel attack; scalar blinding is
184necessary to prevent this attack from leaking information about the scalar.
185
186The variable point multiplication algorithm uses a fixed-window algorithm. Since
187this is normally invoked using untrusted points (eg during ECDH key exchange) it
188randomizes all inputs to prevent attacks which are based on chosen input
189points. The table of precomputed multiples is accessed using a masked lookup
190which should not leak information about the secret scalar to an attacker who can
191mount a cache-based side channel attack.
192
193See point_gfp.cpp and point_mul.cpp
194
195ECDH
196----------------------
197
198ECDH verifies (through its use of OS2ECP) that all input points received from
199the other party satisfy the curve equation. This prevents twist attacks. The
200same check is performed on the output point, which helps prevent fault attacks.
201
202ECDSA
203----------------------
204
205Inversion of the ECDSA nonce k must be done in constant time, as any leak of
206even a single bit of the nonce can be sufficient to allow recovering the private
207key. In Botan all inverses modulo an odd number are performed using a constant
208time algorithm due to Niels Möller.
209
210x25519
211----------------------
212
213The x25519 code is independent of the main Weierstrass form ECC code, instead
214based on curve25519-donna-c64.c by Adam Langley. The code seems immune to cache
215based side channels. It does make use of integer multiplications; on some old
216CPUs these multiplications take variable time and might allow a side channel
217attack. This is not considered a problem on modern processors.
218
219TLS CBC ciphersuites
220----------------------
221
222The original TLS v1.0 CBC Mac-then-Encrypt mode is vulnerable to an oracle
223attack. If an attacker can distinguish padding errors through different error
224messages [TlsCbcOracle] or via a side channel attack like [Lucky13], they can
225abuse the server as a decryption oracle.
226
227The side channel protection for Lucky13 follows the approach proposed in the
228Lucky13 paper. It is not perfectly constant time, but does hide the padding
229oracle in practice. Tools to test TLS CBC decoding are included in the timing
230tests. See https://github.com/randombit/botan/pull/675 for more information.
231
232The Encrypt-then-MAC extension, which completely avoids the side channel, is
233implemented and used by default for CBC ciphersuites.
234
235CBC mode padding
236----------------------
237
238In theory, any good protocol protects CBC ciphertexts with a MAC. But in
239practice, some protocols are not good and cannot be fixed immediately. To avoid
240making a bad problem worse, the code to handle decoding CBC ciphertext padding
241bytes runs in constant time, depending only on the block size of the cipher.
242
243AES
244----------------------
245
246Some x86, ARMv8 and POWER processors support AES instructions which
247are fast and are thought to be side channel silent. These instructions
248are used when available.
249
250On CPUs which do not have hardware AES instructions but do support SIMD vectors
251with a byte shuffle (including x86's SSSE3, ARM's NEON and PowerPC AltiVec), a
252version of AES is implemented which is side channel silent. This implementation
253is based on code by Mike Hamburg [VectorAes], see aes_vperm.cpp.
254
255On all other processors, a constant time bitsliced implementation is used. This
256is typically slower than the vector permute implementation, and additionally for
257best performance multiple blocks must be processed in parellel.  So modes such
258as CTR, GCM or XTS are relatively fast, but others such as CBC encryption
259suffer.
260
261GCM
262---------------------
263
264On platforms that support a carryless multiply instruction (ARMv8 and recent x86),
265GCM is fast and constant time.
266
267On all other platforms, GCM uses an algorithm based on precomputing all powers
268of H from 1 to 128. Then for every bit of the input a mask is formed which
269allows conditionally adding that power without leaking information via a cache
270side channel. There is also an SSSE3 variant of this algorithm which is somewhat
271faster on processors which have SSSE3 but no AES-NI instructions.
272
273OCB
274-----------------------
275
276It is straightforward to implement OCB mode in a efficient way that does not
277depend on any secret branches or lookups. See ocb.cpp for the implementation.
278
279Poly1305
280----------------------
281
282The Poly1305 implementation does not have any secret lookups or conditionals.
283The code is based on the public domain version by Andrew Moon.
284
285DES/3DES
286----------------------
287
288The DES implementation uses table lookups, and is likely vulnerable to side
289channel attacks. DES or 3DES should be avoided in new systems. The proper fix
290would be a scalar bitsliced implementation, this is not seen as worth the
291engineering investment given these algorithms end of life status.
292
293Twofish
294------------------------
295
296This algorithm uses table lookups with secret sboxes. No cache-based side
297channel attack on Twofish has ever been published, but it is possible nobody
298sufficiently skilled has ever tried.
299
300ChaCha20, Serpent, Threefish, ...
301-----------------------------------
302
303Some algorithms including ChaCha, Salsa, Serpent and Threefish are 'naturally'
304silent to cache and timing side channels on all recent processors.
305
306IDEA
307---------------
308
309IDEA encryption, decryption, and key schedule are implemented to take constant
310time regardless of their inputs.
311
312Hash Functions
313-------------------------
314
315Most hash functions included in Botan such as MD5, SHA-1, SHA-2, SHA-3, Skein,
316and BLAKE2 do not require any input-dependent memory lookups, and so seem to not be
317affected by common CPU side channels. However the implementations of Whirlpool
318and Streebog use table lookups and probably can be attacked by side channels.
319
320Memory comparisons
321----------------------
322
323The function same_mem in header mem_ops.h provides a constant-time comparison
324function. It is used when comparing MACs or other secret values. It is also
325exposed for application use.
326
327Memory zeroizing
328----------------------
329
330There is no way in portable C/C++ to zero out an array before freeing it, in
331such a way that it is guaranteed that the compiler will not elide the
332'additional' (seemingly unnecessary) writes to zero out the memory.
333
334The function secure_scrub_memory (in mem_ops.cpp) uses some system specific
335trick to zero out an array. If possible an OS provided routine (such as
336``RtlSecureZeroMemory`` or ``explicit_bzero``) is used.
337
338On other platforms, by default the trick of referencing memset through a
339volatile function pointer is used. This approach is not guaranteed to work on
340all platforms, and currently there is no systematic check of the resulting
341binary function that it is compiled as expected. But, it is the best approach
342currently known and has been verified to work as expected on common platforms.
343
344If BOTAN_USE_VOLATILE_MEMSET_FOR_ZERO is set to 0 in build.h (not the default) a
345byte at a time loop through a volatile pointer is used to overwrite the array.
346
347Memory allocation
348----------------------
349
350Botan's secure_vector type is a std::vector with a custom allocator. The
351allocator calls secure_scrub_memory before freeing memory.
352
353Some operating systems support an API call to lock a range of pages
354into memory, such that they will never be swapped out (``mlock`` on POSIX,
355``VirtualLock`` on Windows). On many POSIX systems ``mlock`` is only usable by
356root, but on Linux, FreeBSD and possibly other systems a small amount
357of memory can be locked by processes without extra credentials.
358
359If available, Botan uses such a region for storing key material. A page-aligned
360block of memory is allocated and locked, then the memory is scrubbed before
361freeing. This memory pool is used by secure_vector when available. It can be
362disabled at runtime setting the environment variable BOTAN_MLOCK_POOL_SIZE to 0.
363
364Automated Analysis
365---------------------
366
367Currently the main tool used by the Botan developers for testing for side
368channels at runtime is valgrind; valgrind's runtime API is used to taint memory
369values, and any jumps or indexes using data derived from these values will cause
370a valgrind warning. This technique was first used by Adam Langley in ctgrind.
371See header ct_utils.h.
372
373To check, install valgrind, configure the build with --with-valgrind, and run
374the tests.
375
376.. highlight:: shell
377
378There is also a test utility built into the command line util, `timing_test`,
379which runs an operation on several different inputs many times in order to
380detect simple timing differences. The output can be processed using the
381Mona timing report library (https://github.com/seecurity/mona-timing-report).
382To run a timing report (here for example pow_mod)::
383
384  $ ./botan timing_test pow_mod > pow_mod.raw
385
386This must be run from a checkout of the source, or otherwise ``--test-data-dir=``
387must be used to point to the expected input files.
388
389Build and run the Mona report as::
390
391  $ git clone https://github.com/seecurity/mona-timing-report.git
392  $ cd mona-timing-report
393  $ ant
394  $ java -jar ReportingTool.jar --lowerBound=0.4 --upperBound=0.5 --inputFile=pow_mod.raw --name=PowMod
395
396This will produce plots and an HTML file in subdirectory starting with
397``reports_`` followed by a representation of the current date and time.
398
399References
400---------------
401
402[Aes256Sc] Neve, Tiri "On the complexity of side-channel attacks on AES-256"
403(https://eprint.iacr.org/2007/318.pdf)
404
405[AesCacheColl] Bonneau, Mironov "Cache-Collision Timing Attacks Against AES"
406(http://www.jbonneau.com/doc/BM06-CHES-aes_cache_timing.pdf)
407
408[CoronDpa] Coron,
409"Resistance against Differential Power Analysis for Elliptic Curve Cryptosystems"
410(https://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.5695)
411
412[InvalidCurve] Biehl, Meyer, Müller: Differential fault attacks on
413elliptic curve cryptosystems
414(https://www.iacr.org/archive/crypto2000/18800131/18800131.pdf)
415
416[InvalidCurveTLS] Jager, Schwenk, Somorovsky: Practical Invalid Curve
417Attacks on TLS-ECDH
418(https://www.nds.rub.de/research/publications/ESORICS15/)
419
420[SafeCurves] Bernstein, Lange: SafeCurves: choosing safe curves for
421elliptic-curve cryptography. (https://safecurves.cr.yp.to)
422
423[Lucky13] AlFardan, Paterson "Lucky Thirteen: Breaking the TLS and DTLS Record Protocols"
424(http://www.isg.rhul.ac.uk/tls/TLStiming.pdf)
425
426[MillionMsg] Bleichenbacher "Chosen Ciphertext Attacks Against Protocols Based
427on the RSA Encryption Standard PKCS1"
428(https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8543)
429
430[MillionMsgTiming] Meyer, Somorovsky, Weiss, Schwenk, Schinzel, Tews: Revisiting
431SSL/TLS Implementations: New Bleichenbacher Side Channels and Attacks
432(https://www.nds.rub.de/research/publications/mswsst2014-bleichenbacher-usenix14/)
433
434[OaepTiming] Manger, "A Chosen Ciphertext Attack on RSA Optimal Asymmetric
435Encryption Padding (OAEP) as Standardized in PKCS #1 v2.0"
436(http://archiv.infsec.ethz.ch/education/fs08/secsem/Manger01.pdf)
437
438[RsaFault] Boneh, Demillo, Lipton
439"On the importance of checking cryptographic protocols for faults"
440(https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.9764)
441
442[RandomMonty] Le, Tan, Tunstall "Randomizing the Montgomery Powering Ladder"
443(https://eprint.iacr.org/2015/657)
444
445[VectorAes] Hamburg, "Accelerating AES with Vector Permute Instructions"
446https://shiftleft.org/papers/vector_aes/vector_aes.pdf
447
448[VersionOracle] Klíma, Pokorný, Rosa "Attacking RSA-based Sessions in SSL/TLS"
449(https://eprint.iacr.org/2003/052)
450