238f4380 | 24-Mar-2023 |
Richard Henderson <richard.henderson@linaro.org> |
tcg: Widen CPUTLBEntry comparators to 64-bits
This makes CPUTLBEntry agnostic to the address size of the guest. When 32-bit addresses are in effect, we can simply read the low 32 bits of the 64-bit
tcg: Widen CPUTLBEntry comparators to 64-bits
This makes CPUTLBEntry agnostic to the address size of the guest. When 32-bit addresses are in effect, we can simply read the low 32 bits of the 64-bit field. Similarly when we need to update the field for setting TLB_NOTDIRTY.
For TCG backends that could in theory be big-endian, but in practice are not (arm, loongarch, riscv), use QEMU_BUILD_BUG_ON to document and ensure this is not accidentally missed.
For s390x, which is always big-endian, use HOST_BIG_ENDIAN anyway, to document the reason for the adjustment.
For sparc64 and ppc64, always perform a 64-bit load, and rely on the following 32-bit comparison to ignore the high bits.
Rearrange mips and ppc if ladders for clarity.
Reviewed-by: Anton Johansson <anjo@rev.ng> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
show more ...
|
367189ef | 26-May-2023 |
Alex Bennée <alex.bennee@linaro.org> |
accel/tcg: include cs_base in our hash calculations
We weren't using cs_base in the hash calculations before. Since the arm front end moved a chunk of flags in a378206a20 (target/arm: Move mode spec
accel/tcg: include cs_base in our hash calculations
We weren't using cs_base in the hash calculations before. Since the arm front end moved a chunk of flags in a378206a20 (target/arm: Move mode specific TB flags to tb->cs_base) they comprise of an important part of the execution state.
Widen the tb_hash_func to include cs_base and expand to qemu_xxhash8() to accommodate it.
My initial benchmark shows very little difference in the runtime.
Before:
armhf
➜ hyperfine -w 2 -m 20 "./arm-softmmu/qemu-system-arm -cpu cortex-a15 -machine type=virt,highmem=off -display none -m 2048 -serial mon:stdio -netdev user,id=unet,hostfwd=tcp::2222-:22 -device virtio-net-pci,netdev=unet -device virtio-scsi-pci -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-bullseye-armhf -device scsi-hd,drive=hd -smp 4 -kernel /home/alex/lsrc/linux.git/builds/arm/arch/arm/boot/zImage -append 'console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark.service' -snapshot" Benchmark 1: ./arm-softmmu/qemu-system-arm -cpu cortex-a15 -machine type=virt,highmem=off -display none -m 2048 -serial mon:stdio -netdev user,id=unet,hostfwd=tcp::2222-:22 -device virtio-net-pci,netdev=unet -device virtio-scsi-pci -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-bullseye-armhf -device scsi-hd,drive=hd -smp 4 -kernel /home/alex/lsrc/linux.git/builds/arm/arch/arm/boot/zImage -append 'console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark.service' -snapshot Time (mean ± σ): 24.627 s ± 2.708 s [User: 34.309 s, System: 1.797 s] Range (min … max): 22.345 s … 29.864 s 20 runs
arm64
➜ hyperfine -w 2 -n 20 "./qemu-system-aarch64 -cpu max,pauth-impdef=on -machine type=virt,virtualization=on,gic-version=3 -display none -serial mon:stdio -netdev user,id=unet,hostfwd=tcp::2222-:22,hostfwd=tcp::1234-:1234 -device virtio-net-pci,netdev=unet -device virtio-scsi-pci -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-bullseye-arm64 -device scsi-hd,drive=hd -smp 4 -kernel ~/lsrc/linux.git/builds/arm64/arch/arm64/boot/Image.gz -append 'console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark-pigz.service' -snapshot" Benchmark 1: 20 Time (mean ± σ): 62.559 s ± 2.917 s [User: 189.115 s, System: 4.089 s] Range (min … max): 59.997 s … 70.153 s 10 runs
After:
armhf
Benchmark 1: ./arm-softmmu/qemu-system-arm -cpu cortex-a15 -machine type=virt,highmem=off -display none -m 2048 -serial mon:stdio -netdev user,id=unet,hostfwd=tcp::2222-:22 -device virtio-net-pci,netdev=unet -device virtio-scsi-pci -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-bullseye-armhf -device scsi-hd,drive=hd -smp 4 -kernel /home/alex/lsrc/linux.git/builds/arm/arch/arm/boot/zImage -append 'console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark.service' -snapshot Time (mean ± σ): 24.223 s ± 2.151 s [User: 34.284 s, System: 1.906 s] Range (min … max): 22.000 s … 28.476 s 20 runs
arm64
hyperfine -w 2 -n 20 "./qemu-system-aarch64 -cpu max,pauth-impdef=on -machine type=virt,virtualization=on,gic-version=3 -display none -serial mon:stdio -netdev user,id=unet,hostfwd=tcp::2222-:22,hostfwd=tcp::1234-:1234 -device virtio-net-pci,netdev=unet -device virtio-scsi-pci -blockdev driver=raw,node-name=hd,discard=unmap,file.driver=host_device,file.filename=/dev/zen-disk/debian-bullseye-arm64 -device scsi-hd,drive=hd -smp 4 -kernel ~/lsrc/linux.git/builds/arm64/arch/arm64/boot/Image.gz -append 'console=ttyAMA0 root=/dev/sda2 systemd.unit=benchmark-pigz.service' -snapshot" Benchmark 1: 20 Time (mean ± σ): 62.769 s ± 1.978 s [User: 188.431 s, System: 5.269 s] Range (min … max): 60.285 s … 66.868 s 10 runs
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230526165401.574474-12-alex.bennee@linaro.org Message-Id: <20230524133952.3971948-11-alex.bennee@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
show more ...
|
856e23a0 | 09-May-2023 |
Gavin Shan <gshan@redhat.com> |
kvm: Enable dirty ring for arm64
arm64 has different capability from x86 to enable the dirty ring, which is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Besides, arm64 also needs the backup bitmap extension (KVM
kvm: Enable dirty ring for arm64
arm64 has different capability from x86 to enable the dirty ring, which is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Besides, arm64 also needs the backup bitmap extension (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) when 'kvm-arm-gicv3' or 'arm-its-kvm' device is enabled. Here the extension is always enabled and the unnecessary overhead to do the last stage of dirty log synchronization when those two devices aren't used is introduced, but the overhead should be very small and acceptable. The benefit is cover future cases where those two devices are used without modifying the code.
Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20230509022122.20888-5-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|