#
fdb022f6 |
| 25-Mar-2024 |
Baoquan He <bhe@redhat.com> |
x86: remove unneeded memblock_find_dma_reserve()
Patch series "mm/mm_init.c: refactor free_area_init_core()".
In function free_area_init_core(), the code calculating zone->managed_pages and the sub
x86: remove unneeded memblock_find_dma_reserve()
Patch series "mm/mm_init.c: refactor free_area_init_core()".
In function free_area_init_core(), the code calculating zone->managed_pages and the subtracting dma_reserve from DMA zone looks very confusing.
From git history, the code calculating zone->managed_pages was for zone->present_pages originally. The early rough assignment is for optimize zone's pcp and water mark setting. Later, managed_pages was introduced into zone to represent the number of managed pages by buddy. Now, zone->managed_pages is zeroed out and reset in mem_init() when calling memblock_free_all(). zone's pcp and wmark setting relying on actual zone->managed_pages are done later than mem_init() invocation. So we don't need rush to early calculate and set zone->managed_pages, just set it as zone->present_pages, will adjust it in mem_init().
And also add a new function calc_nr_kernel_pages() to count up free but not reserved pages in memblock, then assign it to nr_all_pages and nr_kernel_pages after memmap pages are allocated.
This patch (of 6):
Variable dma_reserve and its usage was introduced in commit 0e0b864e069c ("[PATCH] Account for memmap and optionally the kernel image as holes"). Its original purpose was to accounting for the reserved pages in DMA zone to make DMA zone's watermarks calculation more accurate on x86.
However, currently there's zone->managed_pages to account for all available pages for buddy, zone->present_pages to account for all present physical pages in zone. What is more important, on x86, calculating and setting the zone->managed_pages is a temporary move, all zone's managed_pages will be zeroed out and reset to the actual value according to how many pages are added to buddy allocator in mem_init(). Before mem_init(), no buddy alloction is requested. And zone's pcp and watermark setting are all done after mem_init(). So, no need to worry about the DMA zone's setting accuracy during free_area_init().
Hence, remove memblock_find_dma_reserve() to stop calculating and setting dma_reserve.
Link: https://lkml.kernel.org/r/20240325145646.1044760-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20240325145646.1044760-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
99485c4c |
| 26-Mar-2024 |
Jason A. Donenfeld <Jason@zx2c4.com> |
x86/coco: Require seeding RNG with RDRAND on CoCo systems
There are few uses of CoCo that don't rely on working cryptography and hence a working RNG. Unfortunately, the CoCo threat model means that
x86/coco: Require seeding RNG with RDRAND on CoCo systems
There are few uses of CoCo that don't rely on working cryptography and hence a working RNG. Unfortunately, the CoCo threat model means that the VM host cannot be trusted and may actively work against guests to extract secrets or manipulate computation. Since a malicious host can modify or observe nearly all inputs to guests, the only remaining source of entropy for CoCo guests is RDRAND.
If RDRAND is broken -- due to CPU hardware fault -- the RNG as a whole is meant to gracefully continue on gathering entropy from other sources, but since there aren't other sources on CoCo, this is catastrophic. This is mostly a concern at boot time when initially seeding the RNG, as after that the consequences of a broken RDRAND are much more theoretical.
So, try at boot to seed the RNG using 256 bits of RDRAND output. If this fails, panic(). This will also trigger if the system is booted without RDRAND, as RDRAND is essential for a safe CoCo boot.
Add this deliberately to be "just a CoCo x86 driver feature" and not part of the RNG itself. Many device drivers and platforms have some desire to contribute something to the RNG, and add_device_randomness() is specifically meant for this purpose.
Any driver can call it with seed data of any quality, or even garbage quality, and it can only possibly make the quality of the RNG better or have no effect, but can never make it worse.
Rather than trying to build something into the core of the RNG, consider the particular CoCo issue just a CoCo issue, and therefore separate it all out into driver (well, arch/platform) code.
[ bp: Massage commit message. ]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Elena Reshetova <elena.reshetova@intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240326160735.73531-1-Jason@zx2c4.com
show more ...
|
#
4faa0e5d |
| 28-Mar-2024 |
Julian Stecklina <julian.stecklina@cyberus-technology.de> |
x86/boot: Move kernel cmdline setup earlier in the boot process (again)
When split_lock_detect=off (or similar) is specified in CONFIG_CMDLINE, its effect is lost. The flow is currently this:
setu
x86/boot: Move kernel cmdline setup earlier in the boot process (again)
When split_lock_detect=off (or similar) is specified in CONFIG_CMDLINE, its effect is lost. The flow is currently this:
setup_arch(): -> early_cpu_init() -> early_identify_cpu() -> sld_setup() -> sld_state_setup() -> Looks for split_lock_detect in boot_command_line
-> e820__memory_setup()
-> Assemble final command line: boot_command_line = builtin_cmdline + boot_cmdline
-> parse_early_param()
There were earlier attempts at fixing this in:
8d48bf8206f7 ("x86/boot: Pull up cmdline preparation and early param parsing")
later reverted in:
fbe618399854 ("Revert "x86/boot: Pull up cmdline preparation and early param parsing"")
... because parse_early_param() can't be called before e820__memory_setup().
In this patch, we just move the command line concatenation to the beginning of early_cpu_init(). This should fix sld_state_setup(), while not running in the same issues as the earlier attempt.
The order is now:
setup_arch(): -> Assemble final command line: boot_command_line = builtin_cmdline + boot_cmdline
-> early_cpu_init() -> early_identify_cpu() -> sld_setup() -> sld_state_setup() -> Looks for split_lock_detect in boot_command_line
-> e820__memory_setup()
-> parse_early_param()
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: linux-kernel@vger.kernel.org
show more ...
|
#
0f4a1e80 |
| 13-Mar-2024 |
Kevin Loughlin <kevinloughlin@google.com> |
x86/sev: Skip ROM range scans and validation for SEV-SNP guests
SEV-SNP requires encrypted memory to be validated before access. Because the ROM memory range is not part of the e820 table, it is not
x86/sev: Skip ROM range scans and validation for SEV-SNP guests
SEV-SNP requires encrypted memory to be validated before access. Because the ROM memory range is not part of the e820 table, it is not pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes to access this range, the guest must first validate the range.
The current SEV-SNP code does indeed scan the ROM range during early boot and thus attempts to validate the ROM range in probe_roms(). However, this behavior is neither sufficient nor necessary for the following reasons:
* With regards to sufficiency, if EFI_CONFIG_TABLES are not enabled and CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which falls in the ROM range) prior to validation.
For example, Project Oak Stage 0 provides a minimal guest firmware that currently meets these configuration conditions, meaning guests booting atop Oak Stage 0 firmware encounter a problematic call chain during dmi_setup() -> dmi_scan_machine() that results in a crash during boot if SEV-SNP is enabled.
* With regards to necessity, SEV-SNP guests generally read garbage (which changes across boots) from the ROM range, meaning these scans are unnecessary. The guest reads garbage because the legacy ROM range is unencrypted data but is accessed via an encrypted PMD during early boot (where the PMD is marked as encrypted due to potentially mapping actually-encrypted data in other PMD-contained ranges).
In one exceptional case, EISA probing treats the ROM range as unencrypted data, which is inconsistent with other probing.
Continuing to allow SEV-SNP guests to use garbage and to inconsistently classify ROM range encryption status can trigger undesirable behavior. For instance, if garbage bytes appear to be a valid signature, memory may be unnecessarily reserved for the ROM range. Future code or other use cases may result in more problematic (arbitrary) behavior that should be avoided.
While one solution would be to overhaul the early PMD mapping to always treat the ROM region of the PMD as unencrypted, SEV-SNP guests do not currently rely on data from the ROM region during early boot (and even if they did, they would be mostly relying on garbage data anyways).
As a simpler solution, skip the ROM range scans (and the otherwise- necessary range validation) during SEV-SNP guest early boot. The potential SEV-SNP guest crash due to lack of ROM range validation is thus avoided by simply not accessing the ROM range.
In most cases, skip the scans by overriding problematic x86_init functions during sme_early_init() to SNP-safe variants, which can be likened to x86_init overrides done for other platforms (ex: Xen); such overrides also avoid the spread of cc_platform_has() checks throughout the tree.
In the exceptional EISA case, still use cc_platform_has() for the simplest change, given (1) checks for guest type (ex: Xen domain status) are already performed here, and (2) these checks occur in a subsys initcall instead of an x86_init function.
[ bp: Massage commit message, remove "we"s. ]
Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active") Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20240313121546.2964854-1-kevinloughlin@google.com
show more ...
|
#
c90399fb |
| 22-Mar-2024 |
Thomas Gleixner <tglx@linutronix.de> |
x86/cpu: Ensure that CPU info updates are propagated on UP
The boot sequence evaluates CPUID information twice:
1) During early boot
2) When finalizing the early setup right before mitiga
x86/cpu: Ensure that CPU info updates are propagated on UP
The boot sequence evaluates CPUID information twice:
1) During early boot
2) When finalizing the early setup right before mitigations are selected and alternatives are patched.
In both cases the evaluation is stored in boot_cpu_data, but on UP the copying of boot_cpu_data to the per CPU info of the boot CPU happens between #1 and #2. So any update which happens in #2 is never propagated to the per CPU info instance.
Consolidate the whole logic and copy boot_cpu_data right before applying alternatives as that's the point where boot_cpu_data is in it's final state and not supposed to change anymore.
This also removes the voodoo mb() from smp_prepare_cpus_common() which had absolutely no purpose.
Fixes: 71eb4893cfaf ("x86/percpu: Cure per CPU madness on UP") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
show more ...
|
#
edf66a3c |
| 18-Mar-2024 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
x86/cpu: Move leftover contents of topology.c to setup.c
The only useful piece of arch/x86/kernel/topology.c is the definition of arch_cpu_is_hotpluggable() that can be moved elsewhere (other archit
x86/cpu: Move leftover contents of topology.c to setup.c
The only useful piece of arch/x86/kernel/topology.c is the definition of arch_cpu_is_hotpluggable() that can be moved elsewhere (other architectures tend to put it into setup.c), so do that and delete the rest of the file.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/12422874.O9o76ZdvQC@kreacher
show more ...
|
#
71eb4893 |
| 04-Mar-2024 |
Thomas Gleixner <tglx@linutronix.de> |
x86/percpu: Cure per CPU madness on UP
On UP builds Sparse complains rightfully about accesses to cpu_info with per CPU accessors:
cacheinfo.c:282:30: sparse: warning: incorrect type in initializ
x86/percpu: Cure per CPU madness on UP
On UP builds Sparse complains rightfully about accesses to cpu_info with per CPU accessors:
cacheinfo.c:282:30: sparse: warning: incorrect type in initializer (different address spaces) cacheinfo.c:282:30: sparse: expected void const [noderef] __percpu *__vpp_verify cacheinfo.c:282:30: sparse: got unsigned int *
The reason is that on UP builds cpu_info which is a per CPU variable on SMP is mapped to boot_cpu_info which is a regular variable. There is a hideous accessor cpu_data() which tries to hide this, but it's not sufficient as some places require raw accessors and generates worse code than the regular per CPU accessors.
Waste sizeof(struct x86_cpuinfo) memory on UP and provide the per CPU cpu_info unconditionally. This requires to update the CPU info on the boot CPU as SMP does. (Ab)use the weakly defined smp_prepare_boot_cpu() function and implement exactly that.
This allows to use regular per CPU accessors uncoditionally and paves the way to remove the cpu_data() hackery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240304005104.622511517@linutronix.de
show more ...
|
#
a4eeb217 |
| 24-Jan-2024 |
Baoquan He <bhe@redhat.com> |
x86, crash: wrap crash dumping code into crash related ifdefs
Now crash codes under kernel/ folder has been split out from kexec code, crash dumping can be separated from kexec reboot in config item
x86, crash: wrap crash dumping code into crash related ifdefs
Now crash codes under kernel/ folder has been split out from kexec code, crash dumping can be separated from kexec reboot in config items on x86 with some adjustments.
Here, also change some ifdefs or IS_ENABLED() check to more appropriate ones, e,g - #ifdef CONFIG_KEXEC_CORE -> #ifdef CONFIG_CRASH_DUMP - (!IS_ENABLED(CONFIG_KEXEC_CORE)) - > (!IS_ENABLED(CONFIG_CRASH_RESERVE))
[bhe@redhat.com: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CODE ifdef scope] Link: https://lore.kernel.org/all/SN6PR02MB4157931105FA68D72E3D3DB8D47B2@SN6PR02MB4157.namprd02.prod.outlook.com/T/#u Link: https://lkml.kernel.org/r/20240124051254.67105-7-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Michael Kelley <mhklinux@outlook.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
7c0edad3 |
| 13-Feb-2024 |
Thomas Gleixner <tglx@linutronix.de> |
x86/cpu/topology: Rework possible CPU management
Managing possible CPUs is an unreadable and uncomprehensible maze. Aside of that it's backwards because it applies command line limits after register
x86/cpu/topology: Rework possible CPU management
Managing possible CPUs is an unreadable and uncomprehensible maze. Aside of that it's backwards because it applies command line limits after registering all APICs.
Rewrite it so that it:
- Applies the command line limits upfront so that only the allowed amount of APIC IDs can be registered.
- Applies eventual late restrictions in an understandable way
- Uses simple min_t() calculations which are trivial to follow.
- Provides a separate function for resetting to UP mode late in the bringup process.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240213210252.290098853@linutronix.de
show more ...
|
#
de6aec24 |
| 13-Feb-2024 |
Thomas Gleixner <tglx@linutronix.de> |
x86/mm/numa: Move early mptable evaluation into common code
There is no reason to have the early mptable evaluation conditionally invoked only from the AMD numa topology code.
Make it explicit and
x86/mm/numa: Move early mptable evaluation into common code
There is no reason to have the early mptable evaluation conditionally invoked only from the AMD numa topology code.
Make it explicit and invoke it from setup_arch() right after the corresponding ACPI init call. Remove the pointless wrapper and invoke x86_init::mpparse::early_parse_smp_config() directly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240212154639.931761608@linutronix.de
show more ...
|
#
dcb76008 |
| 13-Feb-2024 |
Thomas Gleixner <tglx@linutronix.de> |
x86/mpparse: Switch to new init callbacks
Now that all platforms have the new split SMP configuration callbacks set up, flip the switch and remove the old callback pointer and mop up the platform co
x86/mpparse: Switch to new init callbacks
Now that all platforms have the new split SMP configuration callbacks set up, flip the switch and remove the old callback pointer and mop up the platform code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240212154639.870883080@linutronix.de
show more ...
|
#
5faf8ec7 |
| 13-Feb-2024 |
Thomas Gleixner <tglx@linutronix.de> |
x86/dtb: Rename x86_dtb_init()
x86_dtb_init() is a misnomer and it really should be used as a SMP configuration parser which is selected by the platform via x86_init::mpparse:parse_smp_config().
Re
x86/dtb: Rename x86_dtb_init()
x86_dtb_init() is a misnomer and it really should be used as a SMP configuration parser which is selected by the platform via x86_init::mpparse:parse_smp_config().
Rename it to x86_dtb_parse_smp_config() in preparation for that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240212154639.495992801@linutronix.de
show more ...
|
#
e061c7ae |
| 13-Feb-2024 |
Thomas Gleixner <tglx@linutronix.de> |
x86/mpparse: Rename default_find_smp_config()
MPTABLE is no longer the default SMP configuration mechanism. Rename it to mpparse_find_mptable() because that's what it does.
Signed-off-by: Thomas G
x86/mpparse: Rename default_find_smp_config()
MPTABLE is no longer the default SMP configuration mechanism. Rename it to mpparse_find_mptable() because that's what it does.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240212154639.306287711@linutronix.de
show more ...
|
#
abe8dbab |
| 08-Dec-2023 |
Kai Huang <kai.huang@intel.com> |
x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
Start to transit out the "multi-steps" to initialize the TDX module.
TDX provides increased levels of memory confident
x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
Start to transit out the "multi-steps" to initialize the TDX module.
TDX provides increased levels of memory confidentiality and integrity. This requires special hardware support for features like memory encryption and storage of memory integrity checksums. Not all memory satisfies these requirements.
As a result, TDX introduced the concept of a "Convertible Memory Region" (CMR). During boot, the firmware builds a list of all of the memory ranges which can provide the TDX security guarantees. The list of these ranges is available to the kernel by querying the TDX module.
CMRs tell the kernel which memory is TDX compatible. The kernel needs to build a list of memory regions (out of CMRs) as "TDX-usable" memory and pass them to the TDX module. Once this is done, those "TDX-usable" memory regions are fixed during module's lifetime.
To keep things simple, assume that all TDX-protected memory will come from the page allocator. Make sure all pages in the page allocator *are* TDX-usable memory.
As TDX-usable memory is a fixed configuration, take a snapshot of the memory configuration from memblocks at the time of module initialization (memblocks are modified on memory hotplug). This snapshot is used to enable TDX support for *this* memory configuration only. Use a memory hotplug notifier to ensure that no other RAM can be added outside of this configuration.
This approach requires all memblock memory regions at the time of module initialization to be TDX convertible memory to work, otherwise module initialization will fail in a later SEAMCALL when passing those regions to the module. This approach works when all boot-time "system RAM" is TDX convertible memory and no non-TDX-convertible memory is hot-added to the core-mm before module initialization.
For instance, on the first generation of TDX machines, both CXL memory and NVDIMM are not TDX convertible memory. Using kmem driver to hot-add any CXL memory or NVDIMM to the core-mm before module initialization will result in failure to initialize the module. The SEAMCALL error code will be available in the dmesg to help user to understand the failure.
Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/all/20231208170740.53979-7-dave.hansen%40intel.com
show more ...
|
#
f7a25cf1 |
| 13-Nov-2023 |
Yuntao Wang <ytcoode@gmail.com> |
x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
After
0b62f6cb0773 ("x86/microcode/32: Move early loading after paging enable"),
the global variable relocated_ramdisk is
x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
After
0b62f6cb0773 ("x86/microcode/32: Move early loading after paging enable"),
the global variable relocated_ramdisk is no longer used anywhere except for the relocate_initrd() function. Make it a local variable of that function.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20231113034026.130679-1-ytcoode@gmail.com
show more ...
|
#
acfc7882 |
| 09-Oct-2023 |
Arnd Bergmann <arnd@arndb.de> |
vgacon: remove screen_info dependency
The vga console driver is fairly self-contained, and only used by architectures that explicitly initialize the screen_info settings.
Chance every instance that
vgacon: remove screen_info dependency
The vga console driver is fairly self-contained, and only used by architectures that explicitly initialize the screen_info settings.
Chance every instance that picks the vga console by setting conswitchp to call a function instead, and pass a reference to the screen_info there.
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Khalid Azzi <khalid@gonehiking.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231009211845.3136536-6-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
6e74b125 |
| 10-Oct-2023 |
Alexander Shishkin <alexander.shishkin@linux.intel.com> |
x86/sev: Move sev_setup_arch() to mem_encrypt.c
Since commit:
4d96f9109109b ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()")
... the SWIOTLB bounce buffer size adjustment
x86/sev: Move sev_setup_arch() to mem_encrypt.c
Since commit:
4d96f9109109b ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()")
... the SWIOTLB bounce buffer size adjustment and restricted virtio memory setting also inadvertently apply to TDX: the code is using cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) as a gatekeeping condition, which is also true for TDX, and this is also what we want.
To reflect this, move the corresponding code to generic mem_encrypt.c.
No functional changes intended.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20231010145220.3960055-2-alexander.shishkin@linux.intel.com
show more ...
|
#
9c08a2a1 |
| 14-Sep-2023 |
Baoquan He <bhe@redhat.com> |
x86: kdump: use generic interface to simplify crashkernel reservation code
With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservat
x86: kdump: use generic interface to simplify crashkernel reservation code
With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservation can be simplified by steps:
1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN, CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>;
2) Add arch_reserve_crashkernel() to call parse_crashkernel() and reserve_crashkernel_generic(), and do the ARCH specific work if needed.
3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in arch/x86/Kconfig.
When adding DEFAULT_CRASH_KERNEL_LOW_SIZE, add crash_low_size_default() to calculate crashkernel low memory because x86_64 has special requirement.
The old reserve_crashkernel_low() and reserve_crashkernel() can be removed.
[bhe@redhat.com: move crash_low_size_default() code into <asm/crash_core.h>] Link: https://lkml.kernel.org/r/ZQpeAjOmuMJBFw1/@MiWiFi-R3L-srv Link: https://lkml.kernel.org/r/20230914033142.676708-7-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jiahao <chenjiahao16@huawei.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
a9e1a3d8 |
| 14-Sep-2023 |
Baoquan He <bhe@redhat.com> |
crash_core: change the prototype of function parse_crashkernel()
Add two parameters 'low_size' and 'high' to function parse_crashkernel(), later crashkernel=,high|low parsing will be added. Make ad
crash_core: change the prototype of function parse_crashkernel()
Add two parameters 'low_size' and 'high' to function parse_crashkernel(), later crashkernel=,high|low parsing will be added. Make adjustments in all call sites of parse_crashkernel() in arch.
Link: https://lkml.kernel.org/r/20230914033142.676708-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jiahao <chenjiahao16@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
0d294c8c |
| 25-Aug-2023 |
Saurabh Sengar <ssengar@linux.microsoft.com> |
x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()
Fetching the device tree configuration before initmem_init() is necessary to allow the parsing of NUMA node information. However
x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()
Fetching the device tree configuration before initmem_init() is necessary to allow the parsing of NUMA node information. However moving the entire x86_dtb_init() call before initmem_init() is not correct as APIC/IO-APIC enumeration has to be after initmem_init().
Thus, move the x86_flattree_get_config() call out of x86_dtb_init(), into setup_arch(), to call it before initmem_init(), and leave the ACPI/IOAPIC registration sequence as-is.
[ mingo: Updated the changelog for clarity. ]
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/1692949657-16446-1-git-send-email-ssengar@linux.microsoft.com
show more ...
|
#
34cf99c2 |
| 17-Aug-2023 |
Rik van Riel <riel@surriel.com> |
x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
The code calling ima_free_kexec_buffer() runs long after the memblock allocator has already been torn down, potentially resu
x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
The code calling ima_free_kexec_buffer() runs long after the memblock allocator has already been torn down, potentially resulting in a use after free in memblock_isolate_range().
With KASAN or KFENCE, this use after free will result in a BUG from the idle task, and a subsequent kernel panic.
Switch ima_free_kexec_buffer() over to memblock_free_late() to avoid that bug.
Fixes: fee3ff99bc67 ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c") Suggested-by: Mike Rappoport <rppt@kernel.org> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230817135558.67274c83@imladris.surriel.com
show more ...
|
#
bef4f379 |
| 08-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic: Provide apic_update_callback()
There are already two variants of update mechanism for particular callbacks and virtualization just writes into the data structure.
Provide an interface and
x86/apic: Provide apic_update_callback()
There are already two variants of update mechanism for particular callbacks and virtualization just writes into the data structure.
Provide an interface and use a shadow data structure to preserve callbacks so they can be reapplied when the APIC driver is replaced.
The extra data structure is intentional as any new callback needs to be also updated in the core code. This also prepares for static calls.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|
#
9d87f5b6 |
| 08-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic: Mop up *setup_apic_routing()
default_setup_apic_routing() is a complete misnomer. On 64bit it does the actual APIC probing and on 32bit it is used to force select the bigsmp APIC and to em
x86/apic: Mop up *setup_apic_routing()
default_setup_apic_routing() is a complete misnomer. On 64bit it does the actual APIC probing and on 32bit it is used to force select the bigsmp APIC and to emit a redundant message in the apic::setup_apic_routing() callback.
Rename the 64bit and 32bit function so they reflect what they are doing and remove the useless APIC callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|
#
79c9a17c |
| 08-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic/32: Decrapify the def_bigsmp mechanism
If the system has more than 8 CPUs then XAPIC and the bigsmp APIC driver is required. This is ensured via:
1) Enumerating all possible CPUs up to N
x86/apic/32: Decrapify the def_bigsmp mechanism
If the system has more than 8 CPUs then XAPIC and the bigsmp APIC driver is required. This is ensured via:
1) Enumerating all possible CPUs up to NR_CPUS
2) Checking at boot CPU APIC setup time whether the system has more than 8 CPUs and has an XAPIC.
If that's the case then it's attempted to install the bigsmp APIC driver and a magic variable 'def_to_bigsmp' is set to one.
3) If that magic variable is set and CONFIG_X86_BIGSMP=n and the system has more than 8 CPUs smp_sanity_check() removes all CPUs >= #8 from the present and possible mask in the most convoluted way.
This logic is completely broken for the case where the bigsmp driver is enabled, but not selected due to a command line option specifying the default APIC. In that case the system boots with default APIC in logical destination mode and fails to reduce the number of CPUs.
That aside the above which is sprinkled over 3 different places is yet another piece of art.
It would have been too obvious to check the requirements upfront and limit nr_cpu_ids _before_ enumerating tons of CPUs and then removing them again.
Implement exactly this. Check the bigsmp requirement when the boot APIC is registered which happens _before_ ACPI/MPTABLE parsing and limit the number of CPUs to 8 if it can't be used. Switch it over when the boot CPU apic is set up if necessary.
[ dhansen: fix nr_cpu_ids off-by-one in default_setup_apic_routing() ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|
#
49062454 |
| 08-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic: Rename disable_apic
It reflects a state and not a command. Make it bool while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.inte
x86/apic: Rename disable_apic
It reflects a state and not a command. Make it bool while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|