6d85a058 | 20-May-2024 | Tony Luck <tony.luck@intel.com>

crypto: x86/aes-xts - switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20240520224620.9480-2-tony.luck@intel.com
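
For reference, the change swaps the old family-6 model macros for the new vendor-family-model (VFM) defines in CPU match tables. A minimal sketch of what an entry looks like before and after (table name illustrative, not the literal hunk):

    #include <asm/cpu_device_id.h>
    #include <asm/intel-family.h>

    static const struct x86_cpu_id example_models[] = {
        /* old: vendor Intel and family 6 were implied by the macro name */
        /* X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, 0), */

        /* new: vendor, family and model are all encoded in one define */
        X86_MATCH_VFM(INTEL_SKYLAKE_X, 0),
        {}
    };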

ed265f7f | 20-Apr-2024 | Eric Biggers <ebiggers@google.com>

crypto: x86/aes-gcm - simplify GCM hash subkey derivation
Remove a redundant expansion of the AES key, and use rodata for zeroes. Also rename rfc4106_set_hash_subkey() to aes_gcm_derive_hash_subkey() because it's used for both versions of AES-GCM, not just RFC4106.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
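
For background, the GHASH hash subkey H is just the encryption of the all-zero block under the AES key, and the same value serves both plain AES-GCM and RFC4106. A conceptual sketch using the generic AES library helpers (the driver itself does this with AES-NI assembly; the helper name here is a stand-in):

    #include <crypto/aes.h>
    #include <linux/string.h>

    /* H = AES-ENC_K(0^128) */
    static int derive_hash_subkey(const u8 *key, unsigned int keylen,
                                  u8 hash_subkey[AES_BLOCK_SIZE])
    {
        static const u8 zeroes[AES_BLOCK_SIZE];     /* all-zero block, rodata */
        struct crypto_aes_ctx ctx;
        int err;

        err = aes_expandkey(&ctx, key, keylen);     /* one expansion is enough */
        if (err)
            return err;

        aes_encrypt(&ctx, hash_subkey, zeroes);
        memzero_explicit(&ctx, sizeof(ctx));
        return 0;
    }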

6a805864 | 20-Apr-2024 | Eric Biggers <ebiggers@google.com>

crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath()
Since the total length processed by the loop in xts_crypt_slowpath() is a multiple of AES_BLOCK_SIZE, just round the length down to AES_BLOCK_SIZE even on the last step. This doesn't change behavior, as the last step will process a multiple of AES_BLOCK_SIZE regardless.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
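
In other words, the per-step length computation no longer needs to special-case the final iteration; the loop can unconditionally do something like the following (an illustrative shape with a made-up function name, not the exact driver code):

    #include <crypto/aes.h>
    #include <crypto/internal/skcipher.h>
    #include <linux/math.h>

    static int xts_crypt_slowpath_sketch(struct skcipher_walk *walk)
    {
        int err = 0;

        while (walk->nbytes) {
            /*
             * The total length is a multiple of AES_BLOCK_SIZE, so rounding
             * down is harmless even on the last step.
             */
            unsigned int nbytes = round_down(walk->nbytes, AES_BLOCK_SIZE);

            /* en/decrypt nbytes bytes here */

            err = skcipher_walk_done(walk, walk->nbytes - nbytes);
        }
        return err;
    }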

543ea178 | 13-Apr-2024 | Eric Biggers <ebiggers@google.com>

crypto: x86/aes-xts - optimize size of instructions operating on lengths
x86_64 has the "interesting" property that the instruction size is generally a bit shorter for instructions that operate on the 32-bit (or less) part of registers, or registers that are in the original set of 8.
This patch adjusts the AES-XTS code to take advantage of that property by changing the LEN parameter from size_t to unsigned int (which is all that's needed and is what the non-AVX implementation uses) and using the %eax register for KEYLEN.
This decreases the size of aes-xts-avx-x86_64.o by 1.2%.
Note that changing the kmovq to kmovd was going to be needed anyway to make the AVX10/256 code really work on CPUs that don't support 512-bit vectors (since the AVX10 spec says that 64-bit opmask instructions will only be supported on processors that support 512-bit vectors).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
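
Concretely, most of the savings come from the C/assembly boundary: the length only needs 32 bits, so the prototypes can take unsigned int and the assembly can use the shorter 32-bit register forms (and %eax for KEYLEN). A hedged sketch of the prototype shape, with an illustrative function name:

    /* before: size_t forced 64-bit (REX-prefixed) arithmetic on the length
     *
     *     asmlinkage void aes_xts_encrypt_example(const struct crypto_aes_ctx *key,
     *                                             const u8 *src, u8 *dst,
     *                                             size_t len, u8 tweak[16]);
     */

    /* after: unsigned int is all that is needed, matching the non-AVX code */
    asmlinkage void aes_xts_encrypt_example(const struct crypto_aes_ctx *key,
                                            const u8 *src, u8 *dst,
                                            unsigned int len, u8 tweak[16]);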

751fb252 | 07-Apr-2024 | Eric Biggers <ebiggers@google.com>

crypto: x86/aes-xts - make non-AVX implementation use new glue code
Make the non-AVX implementation of AES-XTS (xts-aes-aesni) use the new glue code that was introduced for the AVX implementations of AES-XTS. This reduces code size, and it improves the performance of xts-aes-aesni due to the optimization for messages that don't span page boundaries.
This required moving the new glue functions higher up in the file and allowing the IV encryption function to be specified by the caller.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
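
The "IV encryption function to be specified by the caller" part means the shared glue routine takes the per-implementation helpers as parameters; roughly, with typedef and function names assumed for illustration:

    typedef void (*xts_encrypt_iv_func)(const struct crypto_aes_ctx *tweak_key,
                                        u8 iv[AES_BLOCK_SIZE]);
    typedef void (*xts_crypt_func)(const struct crypto_aes_ctx *key,
                                   const u8 *src, u8 *dst, unsigned int len,
                                   u8 tweak[AES_BLOCK_SIZE]);

    /* one shared glue routine; xts-aes-aesni and the AVX variants each pass
     * their own IV-encryption and bulk-crypt helpers */
    static int xts_crypt(struct skcipher_request *req,
                         xts_encrypt_iv_func encrypt_iv,
                         xts_crypt_func crypt_func);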

aa2197f5 | 29-Mar-2024 | Eric Biggers <ebiggers@google.com>

crypto: x86/aes-xts - wire up VAES + AVX10/512 implementation
Add an AES-XTS implementation "xts-aes-vaes-avx10_512" for x86_64 CPUs with the VAES, VPCLMULQDQ, and either AVX10/512 or AVX512BW + AVX512VL extensions. This implementation uses zmm registers to operate on four AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations.
To avoid downclocking on older Intel CPU models, an exclusion list is used to prevent this 512-bit implementation from being used by default on some CPU models. They will use xts-aes-vaes-avx10_256 instead. For now, this exclusion list is simply coded into aesni-intel_glue.c. It may make sense to eventually move it into a more central location.
xts-aes-vaes-avx10_512 is slightly faster than xts-aes-vaes-avx10_256 on some current CPUs. E.g., on AMD Zen 4, AES-256-XTS decryption throughput increases by 13% with 4096-byte inputs, or 14% with 512-byte inputs. On Intel Sapphire Rapids, AES-256-XTS decryption throughput increases by 2% with 4096-byte inputs, or 3% with 512-byte inputs.
Future CPUs may provide stronger 512-bit support, in which case a larger benefit should be seen.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
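
Mechanically, the exclusion list is an x86_cpu_id table consulted once at init time; a simplified sketch (list contents elided, flag and function names illustrative):

    #include <asm/cpu_device_id.h>

    static const struct x86_cpu_id zmm_exclusion_list[] = {
        /* Intel models known to downclock badly with 512-bit vectors */
        {}
    };

    static bool prefer_256bit;

    static int __init register_avx_algs_sketch(void)
    {
        /* on listed models, keep avx10_512 available but not the default */
        if (x86_match_cpu(zmm_exclusion_list))
            prefer_256bit = true;
        return 0;
    }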

ee63fea0 | 29-Mar-2024 | Eric Biggers <ebiggers@google.com>

crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation
Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL extensions. This implementation avoids using zmm registers, instead using ymm registers to operate on two AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations.
This is the optimal implementation on CPUs that support VAES and AVX512 but where the zmm registers should not be used due to downclocking effects, for example Intel's Ice Lake. It should also be the optimal implementation on future CPUs that support AVX10/256 but not AVX10/512.
The performance is slightly better than that of xts-aes-vaes-avx2, which uses the same 256-bit vector length, due to factors such as being able to use ymm16-ymm31 to cache the AES round keys, and being able to use the vpternlogd instruction to do XORs more efficiently. For example, on Ice Lake, the throughput of decrypting 4096-byte messages with AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with xts-aes-vaes-avx2. While this is a small improvement, it is straightforward to provide this implementation (xts-aes-vaes-avx10_256) as long as we are providing xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512 anyway, due to the way the _aes_xts_crypt macro is structured.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e787060b | 29-Mar-2024 | Eric Biggers <ebiggers@google.com>

crypto: x86/aes-xts - wire up VAES + AVX2 implementation
Add an AES-XTS implementation "xts-aes-vaes-avx2" for x86_64 CPUs with the VAES, VPCLMULQDQ, and AVX2 extensions, but not AVX512 or AVX10. This implementation uses ymm registers to operate on two AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations.
This is the optimal implementation on AMD Zen 3. It should also be the optimal implementation on Intel Alder Lake, which similarly supports VAES but not AVX512. Comparing to xts-aes-aesni-avx on Zen 3, xts-aes-vaes-avx2 provides 70% higher AES-256-XTS decryption throughput with 4096-byte messages, or 23% higher with 512-byte messages.
A large improvement is also seen with CPUs that do support AVX512 (e.g., 98% higher AES-256-XTS decryption throughput on Ice Lake with 4096-byte messages), though the following patches add AVX512 optimized implementations to get a bit more performance on those CPUs.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

996f4dcb | 29-Mar-2024 | Eric Biggers <ebiggers@google.com>

crypto: x86/aes-xts - wire up AESNI + AVX implementation
Add an AES-XTS implementation "xts-aes-aesni-avx" for x86_64 CPUs that have the AES-NI and AVX extensions but not VAES. It's similar to the existing xts-aes-aesni in that uses xmm registers to operate on one AES block at a time. It differs from xts-aes-aesni in the following ways:
- It uses the VEX-coded (non-destructive) instructions from AVX. This improves performance slightly.
- It incorporates some additional optimizations such as interleaving the tweak computation with AES en/decryption, handling single-page messages more efficiently, and caching the first round key.
- It supports only 64-bit (x86_64).
- It's generated by an assembly macro that will also be used to generate VAES-based implementations.
The performance improvement over xts-aes-aesni varies from small to large, depending on the CPU and other factors such as the size of the messages en/decrypted. For example, the following increases in AES-256-XTS decryption throughput are seen on the following CPUs:
                    | 4096-byte messages | 512-byte messages |
--------------------+--------------------+-------------------+
Intel Skylake       |         6%         |        31%        |
Intel Cascade Lake  |         4%         |        26%        |
AMD Zen 1           |        61%         |        73%        |
AMD Zen 2           |        36%         |        59%        |
(The above CPUs don't support VAES, so they can't use VAES instead.)
While this isn't as large an improvement as what VAES provides, this still seems worthwhile. This implementation is fairly easy to provide based on the assembly macro that's needed for VAES anyway, and it will be the best implementation on a large number of CPUs (very roughly, the CPUs launched by Intel and AMD from 2011 to 2018).
This makes the existing xts-aes-aesni *mostly* obsolete. For now, leave it in place to support 32-bit kernels and also CPUs like Intel Westmere that support AES-NI but not AVX. (We could potentially remove it anyway and just rely on the indirect acceleration via ecb-aes-aesni in those cases, but that change will need to be considered separately.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e3299a4c | 22-Mar-2024 | Chang S. Bae <chang.seok.bae@intel.com>

crypto: x86/aesni - Update aesni_set_key() to return void
The aesni_set_key() implementation has no error case, yet its prototype specifies to return an error code.
Modify the function prototype to return void and adjust the related code.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
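
A sketch of the prototype change (callers simply drop the dead error handling):

    #include <crypto/aes.h>
    #include <linux/linkage.h>

    /* before:
     *
     *     asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx,
     *                                  const u8 *in_key, unsigned int key_len);
     */

    /* after: key expansion cannot fail, so the prototype now says so */
    asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx,
                                  const u8 *in_key, unsigned int key_len);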

d50b35f0 | 22-Mar-2024 | Chang S. Bae <chang.seok.bae@intel.com>

crypto: x86/aesni - Rearrange AES key size check
aes_expandkey() already includes an AES key size check. If AES-NI is unusable, invoke the function without the size check.
Also, use aes_check_keylen() instead of open-coding the length check.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
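
After this series, the common setkey path ends up with roughly this shape (a simplified sketch, not the full function):

    #include <crypto/aes.h>
    #include <crypto/internal/simd.h>
    #include <asm/fpu/api.h>

    static int aes_set_key_common(struct crypto_aes_ctx *ctx,
                                  const u8 *in_key, unsigned int key_len)
    {
        int err;

        if (!crypto_simd_usable())
            return aes_expandkey(ctx, in_key, key_len); /* checks key_len itself */

        err = aes_check_keylen(key_len);    /* instead of an open-coded check */
        if (err)
            return err;

        kernel_fpu_begin();
        aesni_set_key(ctx, in_key, key_len);
        kernel_fpu_end();
        return 0;
    }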

e12a68b3 | 28-Sep-2023 | Chang S. Bae <chang.seok.bae@intel.com>

crypto: x86/aesni - Perform address alignment early for XTS mode
Currently, the alignment of each field in struct aesni_xts_ctx occurs right before every access. However, it's possible to perform this alignment ahead of time.
Introduce a helper function that converts struct crypto_skcipher *tfm to struct aesni_xts_ctx *ctx and returns an aligned address. Utilize this helper function at the beginning of each XTS function and then eliminate redundant alignment code.

Suggested-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/all/ZFWQ4sZEVu%2FLHq+Q@gmail.com/
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
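
The helper amounts to doing the tfm-to-context conversion and the alignment once, at the top of each XTS entry point; approximately (caller name illustrative, types as in the driver):

    static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
    {
        return aes_align_addr(crypto_skcipher_ctx(tfm));
    }

    static int xts_setkey_sketch(struct crypto_skcipher *tfm, const u8 *key,
                                 unsigned int keylen)
    {
        struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);   /* aligned once, up front */

        /* ... expand the tweak and crypt keys into ctx ... */
        return 0;
    }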

d148736f | 28-Sep-2023 | Chang S. Bae <chang.seok.bae@intel.com>

crypto: x86/aesni - Correct the data type in struct aesni_xts_ctx
Currently, every field in struct aesni_xts_ctx is defined as a byte array of the same size as struct crypto_aes_ctx. This data type is obscure and the choice lacks justification.
To rectify this, update the field type in struct aesni_xts_ctx to match its actual structure.

Suggested-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/all/ZFWQ4sZEVu%2FLHq+Q@gmail.com/
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
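
The before/after of the structure, per the description above (field names as used by the driver, give or take):

    /* before: opaque byte arrays merely sized like an AES context
     *
     *     struct aesni_xts_ctx {
     *         u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
     *         u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
     *     };
     */

    /* after: the field types reflect the actual structure */
    struct aesni_xts_ctx {
        struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
        struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
    };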

62496a2d | 28-Sep-2023 | Chang S. Bae <chang.seok.bae@intel.com>

crypto: x86/aesni - Refactor the common address alignment code
The address alignment code has been duplicated for each mode. Instead of duplicating the same code, refactor the alignment code and simplify the alignment helpers.

Suggested-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/all/20230526065414.GB875@sol.localdomain/
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
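
The consolidated helper is essentially one PTR_ALIGN, skipped when the crypto API already guarantees sufficient alignment; a sketch:

    #include <crypto/algapi.h>
    #include <linux/align.h>

    #define AESNI_ALIGN 16

    static inline void *aes_align_addr(void *addr)
    {
        if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
            return addr;
        return PTR_ALIGN(addr, AESNI_ALIGN);
    }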

28b77609 | 15-Jul-2023 | Eric Biggers <ebiggers@google.com>

crypto: x86/aesni - remove unused parameter to aes_set_key_common()
The 'tfm' parameter to aes_set_key_common() is never used, so remove it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

74c6df41 | 21-Jun-2023 | Chang S. Bae <chang.seok.bae@intel.com>

crypto: x86/aesni - Align the address before aes_set_key_common()
aes_set_key_common() performs runtime alignment to the void *raw_ctx pointer. This facilitates consistent access to the 16-byte-aligned address during key extension.
However, the alignment is already handled in the GCM-related setkey functions before invoking the common function. Consequently, the alignment in the common function is unnecessary for those functions.
To establish a consistent approach throughout the glue code, remove the aes_ctx() call from its current location. Instead, place it at each call site where the runtime alignment is currently absent.

Link: https://lore.kernel.org/lkml/20230605024623.GA4653@quark.localdomain/
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

fd94fcf0 | 20-May-2022 | Nathan Huckleberry <nhuck@google.com>

crypto: x86/aesni-xctr - Add accelerated implementation of XCTR
Add hardware accelerated version of XCTR for x86-64 CPUs with AESNI support.
More information on XCTR can be found in the HCTR2 paper: "Length-preserving encryption with HCTR2": https://eprint.iacr.org/2021/1441.pdf

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
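
For reference, XCTR differs from CTR in that a little-endian block counter (starting at 1) is XORed into the nonce rather than stored in it, i.e. keystream block i is E_K(IV xor le(i)). A conceptual sketch of one keystream block using the generic AES library (the driver does this in AES-NI assembly; the kernel's generic implementation uses a 32-bit counter):

    #include <crypto/aes.h>
    #include <crypto/algapi.h>      /* crypto_xor() */
    #include <linux/string.h>

    static void xctr_keystream_block(const struct crypto_aes_ctx *ctx,
                                     const u8 iv[AES_BLOCK_SIZE], u32 blocknum,
                                     u8 out[AES_BLOCK_SIZE])
    {
        u8 block[AES_BLOCK_SIZE];
        __le32 ctr = cpu_to_le32(blocknum);     /* blocknum counts from 1 */

        memcpy(block, iv, AES_BLOCK_SIZE);
        crypto_xor(block, (const u8 *)&ctr, sizeof(ctr));
        aes_encrypt(ctx, out, block);
    }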

d480a26b | 21-Dec-2021 | Jakub Kicinski <kuba@kernel.org>

crypto: x86/aesni - don't require alignment of data
x86 AES-NI routines can deal with unaligned data. The crypto context (key, IV, etc.) has to be aligned, but we take care of that separately by copying it onto the stack. We were feeding unaligned data into the crypto routines up until commit 83c83e658863 ("crypto: aesni - refactor scatterlist processing") switched to using the full skcipher API, which uses cra_alignmask to decide data alignment.
This fixes 21% performance regression in kTLS.
Tested by booting with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y (and running thru various kTLS packets).

CC: stable@vger.kernel.org # 5.15+
Fixes: 83c83e658863 ("crypto: aesni - refactor scatterlist processing")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a2d3cbc8 | 11-Sep-2021 | Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>

crypto: aesni - check walk.nbytes instead of err
In the code for xts_crypt(), we check for the err value returned by skcipher_walk_virt() and return from the function if it is non-zero. However, skcipher_walk_virt() can set walk.nbytes to 0, which would cause us to call kernel_fpu_begin(), and then skip the kernel_fpu_end() call.
This patch checks for the walk.nbytes value instead, and returns if walk.nbytes is 0. This prevents us from calling kernel_fpu_begin() in the first place and also covers the case of having a non-zero err value returned from skcipher_walk_virt().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
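
The resulting entry sequence looks roughly like this (a simplified sketch of the shape, not the full function):

    static int xts_crypt_sketch(struct skcipher_request *req, bool encrypt)
    {
        struct skcipher_walk walk;
        int err;

        err = skcipher_walk_virt(&walk, req, false);
        /*
         * Do not key off err alone: skcipher_walk_virt() can succeed yet
         * leave walk.nbytes == 0, and entering the FPU section in that case
         * would skip the matching kernel_fpu_end().
         */
        if (!walk.nbytes)
            return err;

        kernel_fpu_begin();
        /* ... per-step processing (FPU dropped around any sleeping calls) ... */
        kernel_fpu_end();
        return err;
    }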

72ff2bf0 | 22-Aug-2021 | Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>

crypto: aesni - xts_crypt() return if walk.nbytes is 0
xts_crypt() code doesn't call kernel_fpu_end() after calling kernel_fpu_begin() if walk.nbytes is 0. The correct behavior should be not calling kernel_fpu_begin() if walk.nbytes is 0.

Reported-by: syzbot+20191dc583eff8602d2d@syzkaller.appspotmail.com
Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

821720b9 | 16-Jul-2021 | Ard Biesheuvel <ardb@kernel.org>

crypto: x86/aes-ni - add missing error checks in XTS code
The updated XTS code fails to check the return code of skcipher_walk_virt, which may lead to skcipher_walk_abort() or skcipher_walk_done() being called while the walk argument is in an inconsistent state.
So check the return value after each such call, and bail on errors.

Fixes: 2481104fe98d ("crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper")
Reported-by: Dave Hansen <dave.hansen@intel.com>
Reported-by: syzbot <syzbot+5d1bad8042a8f0e8117a@syzkaller.appspotmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

65d1e3c4 | 16-Jan-2021 | Ard Biesheuvel <ardb@kernel.org>

crypto: aesni - release FPU during skcipher walk API calls
Taking ownership of the FPU in kernel mode disables preemption, and this may result in excessive scheduling blackouts if the size of the data being processed on the FPU is unbounded.
Given that taking and releasing the FPU is cheap these days on x86, we can limit the impact of this issue easily for skcipher implementations, by moving the FPU begin/end calls inside the skcipher walk processing loop. Considering that skcipher walks operate on at most one page at a time, doing so fully mitigates this issue.
This also permits the skcipher walk logic to use non-atomic kmalloc() calls etc so we can change the 'atomic' bool argument in the calls to skcipher_walk_virt() to false as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
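
In practice this means the begin/end pair moves from around the whole request to around each walk step; for example, the ECB path ends up looking roughly like this (simplified sketch; helper names as in the driver):

    static int ecb_encrypt(struct skcipher_request *req)
    {
        struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
        struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm));
        struct skcipher_walk walk;
        unsigned int nbytes;
        int err;

        /* atomic=false: sleeping between steps is fine now */
        err = skcipher_walk_virt(&walk, req, false);

        while ((nbytes = walk.nbytes)) {
            kernel_fpu_begin();     /* FPU held for at most one page of data */
            aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
                          nbytes & AES_BLOCK_MASK);
            kernel_fpu_end();
            nbytes &= AES_BLOCK_SIZE - 1;
            err = skcipher_walk_done(&walk, nbytes);
        }

        return err;
    }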

64a49b85 | 16-Jan-2021 | Ard Biesheuvel <ardb@kernel.org>

crypto: aesni - replace CTR function pointer with static call
Indirect calls are very expensive on x86, so use a static call to set the system-wide AES-NI/CTR asm helper.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
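
The mechanism, roughly (a simplified sketch; the _tfm names follow the commit, the init/helper names here are illustrative):

    #include <linux/static_call.h>
    #include <crypto/aes.h>
    #include <asm/cpufeature.h>

    asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
                                  const u8 *in, unsigned int len, u8 *iv);
    void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
                               const u8 *in, unsigned int len, u8 *iv);

    /* default target: the plain AES-NI CTR helper */
    DEFINE_STATIC_CALL(aesni_ctr_enc_tfm, aesni_ctr_enc);

    static int __init aesni_init_sketch(void)
    {
        /* retarget the call once, at module load, if AVX is available */
        if (boot_cpu_has(X86_FEATURE_AVX))
            static_call_update(aesni_ctr_enc_tfm, aesni_ctr_enc_avx_tfm);
        return 0;
    }

    static void ctr_step(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src,
                         unsigned int len, u8 *iv)
    {
        /* a patchable direct call: no indirect branch, no retpoline */
        static_call(aesni_ctr_enc_tfm)(ctx, dst, src, len, iv);
    }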

d6cbf4ea | 04-Jan-2021 | Ard Biesheuvel <ardb@kernel.org>

crypto: aesni - replace function pointers with static branches
Replace the function pointers in the GCM implementation with static branches, which are based on code patching, which occurs only at module load time. This avoids the severe performance penalty caused by the use of retpolines.
In order to retain the ability to switch between different versions of the implementation based on the input size on cores that support AVX and AVX2, use static branches instead of static calls.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
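
Static keys give the same effect for conditionals: the branch is patched once at module load, so the hot path contains neither a function pointer nor a retpoline. A simplified sketch with the key names from the commit (init function name illustrative):

    #include <linux/jump_label.h>
    #include <asm/cpufeature.h>

    static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx);
    static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);

    static int __init gcm_init_sketch(void)
    {
        if (boot_cpu_has(X86_FEATURE_AVX))
            static_branch_enable(&gcm_use_avx);
        if (boot_cpu_has(X86_FEATURE_AVX2))
            static_branch_enable(&gcm_use_avx2);
        return 0;
    }

The GCM en/decrypt paths then test static_branch_likely(&gcm_use_avx2) or static_branch_likely(&gcm_use_avx), together with the request size, to pick between the SSE, AVX, and AVX2 assembly routines.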

83c83e65 | 04-Jan-2021 | Ard Biesheuvel <ardb@kernel.org>

crypto: aesni - refactor scatterlist processing
Currently, the gcm(aes-ni) driver open codes the scatterlist handling that is encapsulated by the skcipher walk API. So let's switch to that instead.
Also, move the handling at the end of gcmaes_crypt_by_sg() that is dependent on whether we are encrypting or decrypting into the callers, which always do one or the other.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>