History log of /qemu/net/net.c (Results 1 – 25 of 235)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v8.2.2, v7.2.10, v8.2.1, v8.1.5, v7.2.9, v8.1.4, v7.2.8, v8.2.0, v8.2.0-rc4, v8.2.0-rc3, v8.2.0-rc2, v8.2.0-rc1, v7.2.7, v8.1.3, v8.2.0-rc0
# e8c5c452 23-Oct-2023 David Woodhouse <dwmw@amazon.co.uk>

net: make nb_nics and nd_table[] static in net/net.c

Also remove the stale declaration of host_net_devices; the actual
definition was removed long ago in commit 7cc28cb06104 ("net: Remove
the deprec

net: make nb_nics and nd_table[] static in net/net.c

Also remove the stale declaration of host_net_devices; the actual
definition was removed long ago in commit 7cc28cb06104 ("net: Remove
the deprecated 'host_net_add' and 'host_net_remove' HMP commands")

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Huth <thuth@redhat.com>

show more ...


# 481434f9 23-Oct-2023 David Woodhouse <dwmw@amazon.co.uk>

net: remove qemu_show_nic_models(), qemu_find_nic_model()

These old functions can be removed now too. Let net_param_nic() print
the full set of network devices directly, and also make it note that a

net: remove qemu_show_nic_models(), qemu_find_nic_model()

These old functions can be removed now too. Let net_param_nic() print
the full set of network devices directly, and also make it note that a
list more specific to this platform/config will be available by using
'-nic model=help' instead.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Huth <thuth@redhat.com>

show more ...


# 09c292c9 23-Oct-2023 David Woodhouse <dwmw@amazon.co.uk>

net: remove qemu_check_nic_model()

There are no callers of this function any more, as they have all been
converted to qemu_{create,configure}_nic_device().

Signed-off-by: David Woodhouse <dwmw@amaz

net: remove qemu_check_nic_model()

There are no callers of this function any more, as they have all been
converted to qemu_{create,configure}_nic_device().

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Huth <thuth@redhat.com>

show more ...


# 93125e4b 22-Oct-2023 David Woodhouse <dwmw@amazon.co.uk>

net: add qemu_create_nic_bus_devices()

This will instantiate any NICs which live on a given bus type. Each bus
is allowed *one* substitution (for PCI it's virtio → virtio-net-pci, for
Xen it's xen →

net: add qemu_create_nic_bus_devices()

This will instantiate any NICs which live on a given bus type. Each bus
is allowed *one* substitution (for PCI it's virtio → virtio-net-pci, for
Xen it's xen → xen-net-device; no point in overengineering it unless we
actually want more).

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>

show more ...


# 2cdeca04 21-Oct-2023 David Woodhouse <dwmw@amazon.co.uk>

net: report list of available models according to platform

By noting the models for which a configuration was requested, we can give
the user an accurate list of which NIC models were actually avail

net: report list of available models according to platform

By noting the models for which a configuration was requested, we can give
the user an accurate list of which NIC models were actually available on
the platform/configuration that was otherwise chosen.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>

show more ...


# 93e9d730 19-Oct-2023 David Woodhouse <dwmw@amazon.co.uk>

net: add qemu_{configure,create}_nic_device(), qemu_find_nic_info()

Most code which directly accesses nd_table[] and nb_nics uses them for
one of two things. Either "I have created a NIC device and

net: add qemu_{configure,create}_nic_device(), qemu_find_nic_info()

Most code which directly accesses nd_table[] and nb_nics uses them for
one of two things. Either "I have created a NIC device and I'd like a
configuration for it", or "I will create a NIC device *if* there is a
configuration for it". With some variants on the theme around whether
they actually *check* if the model specified in the configuration is
the right one.

Provide functions which perform both of those, allowing platforms to
be a little more consistent and as a step towards making nd_table[]
and nb_nics private to the net code.

One might argue that platforms ought to be consistent about whether
they create the unconfigured devices or not, but making significant
user-visible changes is explicitly *not* the intent right now.

The new functions leave the 'model' field of the NICInfo as NULL after
using it for the default NIC model, unlike the qemu_check_nic_model()
function which does set nd->model to match default_model explicitly.
This is acceptable because there is no code which consumes nd->model
except this NIC-matching code in net/net.c, and no reasonable excuse
for any code wanting to use nd->model in future.

Also export the qemu_find_nic_info() helper, as some platforms have
special cases they need to handle.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>

show more ...


# 84f85eb9 15-Nov-2023 David Woodhouse <dwmw@amazon.co.uk>

net: do not delete nics in net_cleanup()

In net_cleanup() we only need to delete the netdevs, as those may have
state which outlives Qemu when it exits, and thus may actually need to
be cleaned up o

net: do not delete nics in net_cleanup()

In net_cleanup() we only need to delete the netdevs, as those may have
state which outlives Qemu when it exits, and thus may actually need to
be cleaned up on exit.

The nics, on the other hand, are owned by the device which created them.
Most devices don't bother to clean up on exit because they don't have
any state which will outlive Qemu... but XenBus devices do need to clean
up their nodes in XenStore, and do have an exit handler to delete them.

When the XenBus exit handler destroys the xen-net-device, it attempts
to delete its nic after net_cleanup() had already done so. And crashes.

Fix this by only deleting netdevs as we walk the list. As the comment
notes, we can't use QTAILQ_FOREACH_SAFE() as each deletion may remove
*multiple* entries, including the "safely" saved 'next' pointer. But
we can store the *previous* entry, since nics are safe.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


Revision tags: v8.1.2, v8.1.1, v7.2.6, v8.0.5, v8.1.0, v8.1.0-rc4, v8.1.0-rc3, v7.2.5, v8.0.4, v8.1.0-rc2, v8.1.0-rc1, v8.1.0-rc0, v8.0.3, v7.2.4
# 9050f976 01-Jun-2023 Akihiko Odaki <akihiko.odaki@daynix.com>

net: Update MemReentrancyGuard for NIC

Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to update it
when deliverin

net: Update MemReentrancyGuard for NIC

Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to update it
when delivering a packet to a device.

This implementation follows what bottom half does, but it does not add
a tracepoint for the case that the network device backend started
delivering a packet to a device which is already engaging in I/O. This
is because such reentrancy frequently happens for
qemu_flush_queued_packets() and is insignificant.

Fixes: CVE-2023-3019
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


# 7d0fefdf 01-Jun-2023 Akihiko Odaki <akihiko.odaki@daynix.com>

net: Provide MemReentrancyGuard * to qemu_new_nic()

Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to update it
w

net: Provide MemReentrancyGuard * to qemu_new_nic()

Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to update it
when delivering a packet to a device.

In preparation for such a change, add MemReentrancyGuard * as a
parameter of qemu_new_nic().

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


# 0a4a1512 31-Oct-2023 Markus Armbruster <armbru@redhat.com>

net: Fix a misleading error message

The error message

$ qemu-system-x86_64 -netdev user,id=net0,ipv6-net=fec0::0/
qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: Parameter 'ipv6

net: Fix a misleading error message

The error message

$ qemu-system-x86_64 -netdev user,id=net0,ipv6-net=fec0::0/
qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: Parameter 'ipv6-prefixlen' expects a number

points to ipv6-prefixlen instead of ipv6-net. Fix:

qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: parameter 'ipv6-net' expects a number after '/'

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231031111059.3407803-6-armbru@redhat.com>

show more ...


# 0a4a1512 31-Oct-2023 Markus Armbruster <armbru@redhat.com>

net: Fix a misleading error message

The error message

$ qemu-system-x86_64 -netdev user,id=net0,ipv6-net=fec0::0/
qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: Parameter 'ipv6

net: Fix a misleading error message

The error message

$ qemu-system-x86_64 -netdev user,id=net0,ipv6-net=fec0::0/
qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: Parameter 'ipv6-prefixlen' expects a number

points to ipv6-prefixlen instead of ipv6-net. Fix:

qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: parameter 'ipv6-net' expects a number after '/'

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231031111059.3407803-6-armbru@redhat.com>

show more ...


# 73071f19 04-Oct-2023 Philippe Mathieu-Daudé <philmd@linaro.org>

net/net: Clean up global variable shadowing

Fix:

net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
bool netdev_is_modern(const char *optarg)

net/net: Clean up global variable shadowing

Fix:

net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
bool netdev_is_modern(const char *optarg)
^
net/net.c:1714:38: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
void netdev_parse_modern(const char *optarg)
^
net/net.c:1728:60: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
void net_client_parse(QemuOptsList *opts_list, const char *optarg)
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/getopt.h:77:14: note: previous declaration is here
extern char *optarg; /* getopt(3) external variables */
^

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004120019.93101-4-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

show more ...


# 73071f19 04-Oct-2023 Philippe Mathieu-Daudé <philmd@linaro.org>

net/net: Clean up global variable shadowing

Fix:

net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
bool netdev_is_modern(const char *optarg)

net/net: Clean up global variable shadowing

Fix:

net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
bool netdev_is_modern(const char *optarg)
^
net/net.c:1714:38: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
void netdev_parse_modern(const char *optarg)
^
net/net.c:1728:60: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
void net_client_parse(QemuOptsList *opts_list, const char *optarg)
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/getopt.h:77:14: note: previous declaration is here
extern char *optarg; /* getopt(3) external variables */
^

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004120019.93101-4-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

show more ...


# 73071f19 04-Oct-2023 Philippe Mathieu-Daudé <philmd@linaro.org>

net/net: Clean up global variable shadowing

Fix:

net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
bool netdev_is_modern(const char *optarg)

net/net: Clean up global variable shadowing

Fix:

net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
bool netdev_is_modern(const char *optarg)
^
net/net.c:1714:38: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
void netdev_parse_modern(const char *optarg)
^
net/net.c:1728:60: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
void net_client_parse(QemuOptsList *opts_list, const char *optarg)
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/getopt.h:77:14: note: previous declaration is here
extern char *optarg; /* getopt(3) external variables */
^

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004120019.93101-4-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

show more ...


# cb039ef3 13-Sep-2023 Ilya Maximets <i.maximets@ovn.org>

net: add initial support for AF_XDP network backend

AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the ke

net: add initial support for AF_XDP network backend

AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the kernel networking stack. In the essence, the technology is
pretty similar to netmap. But, unlike netmap, AF_XDP is Linux-native
and works with any network interfaces without driver modifications.
Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
require access to character devices or unix sockets. Only access to
the network interface itself is necessary.

This patch implements a network backend that communicates with the
kernel by creating an AF_XDP socket. A chunk of userspace memory
is shared between QEMU and the host kernel. 4 ring buffers (Tx, Rx,
Fill and Completion) are placed in that memory along with a pool of
memory buffers for the packet data. Data transmission is done by
allocating one of the buffers, copying packet data into it and
placing the pointer into Tx ring. After transmission, device will
return the buffer via Completion ring. On Rx, device will take
a buffer form a pre-populated Fill ring, write the packet data into
it and place the buffer into Rx ring.

AF_XDP network backend takes on the communication with the host
kernel and the network interface and forwards packets to/from the
peer device in QEMU.

Usage example:

-device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
-netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1

XDP program bridges the socket with a network interface. It can be
attached to the interface in 2 different modes:

1. skb - this mode should work for any interface and doesn't require
driver support. With a caveat of lower performance.

2. native - this does require support from the driver and allows to
bypass skb allocation in the kernel and potentially use
zero-copy while getting packets in/out userspace.

By default, QEMU will try to use native mode and fall back to skb.
Mode can be forced via 'mode' option. To force 'copy' even in native
mode, use 'force-copy=on' option. This might be useful if there is
some issue with the driver.

Option 'queues=N' allows to specify how many device queues should
be open. Note that all the queues that are not open are still
functional and can receive traffic, but it will not be delivered to
QEMU. So, the number of device queues should generally match the
QEMU configuration, unless the device is shared with something
else and the traffic re-direction to appropriate queues is correctly
configured on a device level (e.g. with ethtool -N).
'start-queue=M' option can be used to specify from which queue id
QEMU should start configuring 'N' queues. It might also be necessary
to use this option with certain NICs, e.g. MLX5 NICs. See the docs
for examples.

In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
or CAP_BPF capabilities in order to load default XSK/XDP programs to
the network interface and configure BPF maps. It is possible, however,
to run with no capabilities. For that to work, an external process
with enough capabilities will need to pre-load default XSK program,
create AF_XDP sockets and pass their file descriptors to QEMU process
on startup via 'sock-fds' option. Network backend will need to be
configured with 'inhibit=on' to avoid loading of the program.
QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
or CAP_IPC_LOCK.

There are few performance challenges with the current network backends.

First is that they do not support IO threads. This means that data
path is handled by the main thread in QEMU and may slow down other
work or may be slowed down by some other work. This also means that
taking advantage of multi-queue is generally not possible today.

Another thing is that data path is going through the device emulation
code, which is not really optimized for performance. The fastest
"frontend" device is virtio-net. But it's not optimized for heavy
traffic either, because it expects such use-cases to be handled via
some implementation of vhost (user, kernel, vdpa). In practice, we
have virtio notifications and rcu lock/unlock on a per-packet basis
and not very efficient accesses to the guest memory. Communication
channels between backend and frontend devices do not allow passing
more than one packet at a time as well.

Some of these challenges can be avoided in the future by adding better
batching into device emulation or by implementing vhost-af-xdp variant.

There are also a few kernel limitations. AF_XDP sockets do not
support any kinds of checksum or segmentation offloading. Buffers
are limited to a page size (4K), i.e. MTU is limited. Multi-buffer
support implementation for AF_XDP is in progress, but not ready yet.
Also, transmission in all non-zero-copy modes is synchronous, i.e.
done in a syscall. That doesn't allow high packet rates on virtual
interfaces.

However, keeping in mind all of these challenges, current implementation
of the AF_XDP backend shows a decent performance while running on top
of a physical NIC with zero-copy support.

Test setup:

2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
Network backend is configured to open the NIC directly in native mode.
The driver supports zero-copy. NIC is configured to use 1 queue.

Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
for PPS testing.

iperf3 result:
TCP stream : 19.1 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
Tx only : 3.4 Mpps
Rx only : 2.0 Mpps
L2 FWD Loopback : 1.5 Mpps

In skb mode the same setup shows much lower performance, similar to
the setup where pair of physical NICs is replaced with veth pair:

iperf3 result:
TCP stream : 9 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
Tx only : 1.2 Mpps
Rx only : 1.0 Mpps
L2 FWD Loopback : 0.7 Mpps

Results in skb mode or over the veth are close to results of a tap
backend with vhost=on and disabled segmentation offloading bridged
with a NIC.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


# f03e0cf6 31-Jul-2023 Yuri Benditovich <yuri.benditovich@daynix.com>

tap: Add check for USO features

Tap indicates support for USO features according to
capabilities of current kernel module.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-b

tap: Add check for USO features

Tap indicates support for USO features according to
capabilities of current kernel module.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychecnko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


# 2ab0ec31 31-Jul-2023 Andrew Melnychenko <andrew@daynix.com>

tap: Add USO support to tap device.

Passing additional parameters (USOv4 and USOv6 offloads) when
setting TAP offloads

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: A

tap: Add USO support to tap device.

Passing additional parameters (USOv4 and USOv6 offloads) when
setting TAP offloads

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


# cb039ef3 13-Sep-2023 Ilya Maximets <i.maximets@ovn.org>

net: add initial support for AF_XDP network backend

AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the ke

net: add initial support for AF_XDP network backend

AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the kernel networking stack. In the essence, the technology is
pretty similar to netmap. But, unlike netmap, AF_XDP is Linux-native
and works with any network interfaces without driver modifications.
Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
require access to character devices or unix sockets. Only access to
the network interface itself is necessary.

This patch implements a network backend that communicates with the
kernel by creating an AF_XDP socket. A chunk of userspace memory
is shared between QEMU and the host kernel. 4 ring buffers (Tx, Rx,
Fill and Completion) are placed in that memory along with a pool of
memory buffers for the packet data. Data transmission is done by
allocating one of the buffers, copying packet data into it and
placing the pointer into Tx ring. After transmission, device will
return the buffer via Completion ring. On Rx, device will take
a buffer form a pre-populated Fill ring, write the packet data into
it and place the buffer into Rx ring.

AF_XDP network backend takes on the communication with the host
kernel and the network interface and forwards packets to/from the
peer device in QEMU.

Usage example:

-device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
-netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1

XDP program bridges the socket with a network interface. It can be
attached to the interface in 2 different modes:

1. skb - this mode should work for any interface and doesn't require
driver support. With a caveat of lower performance.

2. native - this does require support from the driver and allows to
bypass skb allocation in the kernel and potentially use
zero-copy while getting packets in/out userspace.

By default, QEMU will try to use native mode and fall back to skb.
Mode can be forced via 'mode' option. To force 'copy' even in native
mode, use 'force-copy=on' option. This might be useful if there is
some issue with the driver.

Option 'queues=N' allows to specify how many device queues should
be open. Note that all the queues that are not open are still
functional and can receive traffic, but it will not be delivered to
QEMU. So, the number of device queues should generally match the
QEMU configuration, unless the device is shared with something
else and the traffic re-direction to appropriate queues is correctly
configured on a device level (e.g. with ethtool -N).
'start-queue=M' option can be used to specify from which queue id
QEMU should start configuring 'N' queues. It might also be necessary
to use this option with certain NICs, e.g. MLX5 NICs. See the docs
for examples.

In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
or CAP_BPF capabilities in order to load default XSK/XDP programs to
the network interface and configure BPF maps. It is possible, however,
to run with no capabilities. For that to work, an external process
with enough capabilities will need to pre-load default XSK program,
create AF_XDP sockets and pass their file descriptors to QEMU process
on startup via 'sock-fds' option. Network backend will need to be
configured with 'inhibit=on' to avoid loading of the program.
QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
or CAP_IPC_LOCK.

There are few performance challenges with the current network backends.

First is that they do not support IO threads. This means that data
path is handled by the main thread in QEMU and may slow down other
work or may be slowed down by some other work. This also means that
taking advantage of multi-queue is generally not possible today.

Another thing is that data path is going through the device emulation
code, which is not really optimized for performance. The fastest
"frontend" device is virtio-net. But it's not optimized for heavy
traffic either, because it expects such use-cases to be handled via
some implementation of vhost (user, kernel, vdpa). In practice, we
have virtio notifications and rcu lock/unlock on a per-packet basis
and not very efficient accesses to the guest memory. Communication
channels between backend and frontend devices do not allow passing
more than one packet at a time as well.

Some of these challenges can be avoided in the future by adding better
batching into device emulation or by implementing vhost-af-xdp variant.

There are also a few kernel limitations. AF_XDP sockets do not
support any kinds of checksum or segmentation offloading. Buffers
are limited to a page size (4K), i.e. MTU is limited. Multi-buffer
support implementation for AF_XDP is in progress, but not ready yet.
Also, transmission in all non-zero-copy modes is synchronous, i.e.
done in a syscall. That doesn't allow high packet rates on virtual
interfaces.

However, keeping in mind all of these challenges, current implementation
of the AF_XDP backend shows a decent performance while running on top
of a physical NIC with zero-copy support.

Test setup:

2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
Network backend is configured to open the NIC directly in native mode.
The driver supports zero-copy. NIC is configured to use 1 queue.

Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
for PPS testing.

iperf3 result:
TCP stream : 19.1 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
Tx only : 3.4 Mpps
Rx only : 2.0 Mpps
L2 FWD Loopback : 1.5 Mpps

In skb mode the same setup shows much lower performance, similar to
the setup where pair of physical NICs is replaced with veth pair:

iperf3 result:
TCP stream : 9 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
Tx only : 1.2 Mpps
Rx only : 1.0 Mpps
L2 FWD Loopback : 0.7 Mpps

Results in skb mode or over the veth are close to results of a tap
backend with vhost=on and disabled segmentation offloading bridged
with a NIC.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


# f03e0cf6 31-Jul-2023 Yuri Benditovich <yuri.benditovich@daynix.com>

tap: Add check for USO features

Tap indicates support for USO features according to
capabilities of current kernel module.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-b

tap: Add check for USO features

Tap indicates support for USO features according to
capabilities of current kernel module.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychecnko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


# 2ab0ec31 31-Jul-2023 Andrew Melnychenko <andrew@daynix.com>

tap: Add USO support to tap device.

Passing additional parameters (USOv4 and USOv6 offloads) when
setting TAP offloads

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: A

tap: Add USO support to tap device.

Passing additional parameters (USOv4 and USOv6 offloads) when
setting TAP offloads

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


Revision tags: v8.0.2, v8.0.1, v7.2.3, v7.2.2, v8.0.0, v8.0.0-rc4, v8.0.0-rc3, v7.2.1, v8.0.0-rc2, v8.0.0-rc1, v8.0.0-rc0
# 481c5232 23-Feb-2023 Akihiko Odaki <akihiko.odaki@daynix.com>

net: Strip virtio-net header when dumping

filter-dump specifiees Ethernet as PCAP LinkType, which does not expect
virtio-net header. Having virtio-net header in such PCAP file breaks
PCAP unconsumab

net: Strip virtio-net header when dumping

filter-dump specifiees Ethernet as PCAP LinkType, which does not expect
virtio-net header. Having virtio-net header in such PCAP file breaks
PCAP unconsumable. Unfortunately currently there is no LinkType for
virtio-net so for now strip virtio-net header to convert the output to
Ethernet.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


Revision tags: v8.0.2, v8.0.1, v7.2.3, v7.2.2, v8.0.0, v8.0.0-rc4, v8.0.0-rc3, v7.2.1, v8.0.0-rc2, v8.0.0-rc1, v8.0.0-rc0
# 481c5232 23-Feb-2023 Akihiko Odaki <akihiko.odaki@daynix.com>

net: Strip virtio-net header when dumping

filter-dump specifiees Ethernet as PCAP LinkType, which does not expect
virtio-net header. Having virtio-net header in such PCAP file breaks
PCAP unconsumab

net: Strip virtio-net header when dumping

filter-dump specifiees Ethernet as PCAP LinkType, which does not expect
virtio-net header. Having virtio-net header in such PCAP file breaks
PCAP unconsumable. Unfortunately currently there is no LinkType for
virtio-net so for now strip virtio-net header to convert the output to
Ethernet.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


Revision tags: v8.0.2, v8.0.1, v7.2.3, v7.2.2, v8.0.0, v8.0.0-rc4, v8.0.0-rc3, v7.2.1, v8.0.0-rc2, v8.0.0-rc1, v8.0.0-rc0
# 481c5232 23-Feb-2023 Akihiko Odaki <akihiko.odaki@daynix.com>

net: Strip virtio-net header when dumping

filter-dump specifiees Ethernet as PCAP LinkType, which does not expect
virtio-net header. Having virtio-net header in such PCAP file breaks
PCAP unconsumab

net: Strip virtio-net header when dumping

filter-dump specifiees Ethernet as PCAP LinkType, which does not expect
virtio-net header. Having virtio-net header in such PCAP file breaks
PCAP unconsumable. Unfortunately currently there is no LinkType for
virtio-net so for now strip virtio-net header to convert the output to
Ethernet.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


Revision tags: v7.2.0, v7.2.0-rc4, v7.2.0-rc3, v7.2.0-rc2, v7.2.0-rc1
# 3b0cca8e 10-Nov-2022 Thomas Huth <thuth@redhat.com>

net: Replace "Supported NIC models" with "Available NIC models"

Just because a NIC model is compiled into the QEMU binary does not
necessary mean that it can be used with each and every machine.
So

net: Replace "Supported NIC models" with "Available NIC models"

Just because a NIC model is compiled into the QEMU binary does not
necessary mean that it can be used with each and every machine.
So let's rather talk about "available" models instead of "supported"
models, just to avoid confusion.

Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


# 27c81924 10-Nov-2022 Thomas Huth <thuth@redhat.com>

net: Restore printing of the help text with "-nic help"

Running QEMU with "-nic help" used to work in QEMU 5.2 and earlier versions
(it showed the available netdev backends), but this feature got br

net: Restore printing of the help text with "-nic help"

Running QEMU with "-nic help" used to work in QEMU 5.2 and earlier versions
(it showed the available netdev backends), but this feature got broken during
some refactoring in version 6.0. Let's restore the old behavior, and while
we're at it, let's also print the available NIC models here now since this
option can be used to configure both, netdev backend and model in one go.

Fixes: ad6f932fe8 ("net: do not exit on "netdev_add help" monitor command")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...


12345678910