Copyright (c) 2014-2017 Red Hat Inc.

This work is licensed under the terms of the GNU GPL, version 2 or later.  See
the COPYING file in the top-level directory.


This document explains the IOThread feature and how to write code that runs
outside the BQL (Big QEMU Lock).

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop.  The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.
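
The pattern described above can be sketched with plain POSIX poll().  This is
only an illustration of "monitor file descriptors, then invoke a callback";
it is not QEMU's main loop, and the names Source, ReadHandler,
event_loop_iterate() and count_byte() are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_SOURCES 16

/* Hypothetical handler type: called when its file descriptor is readable. */
typedef void (*ReadHandler)(int fd, void *opaque);

/* Hypothetical event source: a file descriptor plus its callback. */
typedef struct {
    int fd;
    ReadHandler on_readable;
    void *opaque;
} Source;

/*
 * Run one iteration of the loop: wait up to timeout_ms for any source to
 * become readable, then invoke the callback of each readable source.
 * Returns the number of callbacks invoked, or -1 if poll() failed.
 */
int event_loop_iterate(Source *sources, size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_SOURCES];
    int dispatched = 0;

    assert(n <= MAX_SOURCES);
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = sources[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, n, timeout_ms) < 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            sources[i].on_readable(sources[i].fd, sources[i].opaque);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example callback: consume one byte and count it in *opaque. */
void count_byte(int fd, void *opaque)
{
    char c;

    if (read(fd, &c, 1) == 1) {
        (*(int *)opaque)++;
    }
}
```

A caller would register one Source per connection (for example the read end
of a pipe or a socket) and call event_loop_iterate() repeatedly.  Because all
callbacks run from the same loop, they never run concurrently with each other.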

The default event loop is called the main loop (see main-loop.c).  It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThread are both event loops but their code is
not shared completely.  Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work.  The main loop is a
scalability bottleneck on hosts with many CPUs.  Work can be spread across
several IOThreads instead of just one main loop.  When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the BQL, which is a
scalability bottleneck in itself.  vCPU threads and the main loop use the BQL
to serialize execution of QEMU code.  This mutex is necessary because a lot of
QEMU's code historically was not thread-safe.

Two facts explain why it is desirable to place work into IOThreads: all I/O
processing is done in a single main loop, and the BQL is contended by all
vCPU threads and the main loop.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h).  Code that only works in the main loop
implicitly uses the main loop's AioContext.  Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
 * File descriptor monitoring (read/write/error on POSIX hosts)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
 * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
 * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
 * LEGACY timer_new_ms() - create a timer
 * LEGACY qemu_bh_new() - create a BH
 * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
 * LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread.  They might cause a crash or deadlock if called from an
IOThread since the BQL is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
 * aio_set_fd_handler() - monitor a file descriptor
 * aio_set_event_notifier() - monitor an event notifier
 * aio_timer_new() - create a timer
 * aio_bh_new() - create a BH
 * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
 * aio_poll() - run an event loop iteration

The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
argument, which is used to check for and prevent re-entrancy problems. For
BHs associated with devices, the re-entrancy guard is contained in the
corresponding DeviceState and named "mem_reentrancy_guard".
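
The general idea of a re-entrancy guard can be sketched as a flag that is set
while a device's callback runs.  This is a simplified model, not QEMU's
implementation: SketchGuard, SketchDevice and sketch_guarded_call() are
hypothetical names, and the policy of refusing the nested call is an
assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard: true while the owning device is inside a callback. */
typedef struct {
    bool engaged;
} SketchGuard;

typedef void SketchCallback(void *opaque);

/*
 * Run cb(opaque) unless the guard shows that the device is already inside
 * a guarded callback.  Returns false if the call was refused as re-entrant.
 */
bool sketch_guarded_call(SketchGuard *guard, SketchCallback *cb, void *opaque)
{
    assert(guard != NULL && cb != NULL);
    if (guard->engaged) {
        return false;           /* re-entrancy detected: do not recurse */
    }
    guard->engaged = true;
    cb(opaque);
    guard->engaged = false;
    return true;
}

/* Hypothetical device whose callback tries to re-enter itself once. */
typedef struct {
    SketchGuard guard;
    int runs;
    int refused;
} SketchDevice;

void sketch_device_cb(void *opaque)
{
    SketchDevice *dev = opaque;

    dev->runs++;
    /* A buggy or malicious access pattern could lead back into the device. */
    if (!sketch_guarded_call(&dev->guard, sketch_device_cb, dev)) {
        dev->refused++;
    }
}
```

Keeping the guard inside the per-device state, as the text describes for
DeviceState, means that a nested call into the same device is detected while
unrelated devices are unaffected.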

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works in both IOThreads and the main
loop, depending on which AioContext instance the caller passes in.

How to synchronize with an IOThread
-----------------------------------
Variables that can be accessed by multiple threads require some form of
synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.

AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
activity in an IOThread.

Side note: the best way to schedule a function call across threads is to call
aio_bh_schedule_oneshot().
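
The idea behind one-shot scheduling across threads can be modelled as a
mutex-protected queue of callbacks: any thread may add an entry, and only the
thread that owns the loop runs the entries.  The sketch below is a
self-contained model built on POSIX mutexes, not QEMU code.  SketchLoop,
sketch_schedule_oneshot() and sketch_run_pending() are hypothetical
stand-ins, and the fixed-size queue is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 8

typedef void SketchCallback(void *opaque);

/* Hypothetical stand-in for an event loop; this is not QEMU's AioContext. */
typedef struct {
    pthread_mutex_t lock;                    /* protects the fields below */
    SketchCallback *cb[SKETCH_MAX_PENDING];
    void *opaque[SKETCH_MAX_PENDING];
    size_t pending;
} SketchLoop;

void sketch_loop_init(SketchLoop *loop)
{
    pthread_mutex_init(&loop->lock, NULL);
    loop->pending = 0;
}

/* May be called from any thread: queue cb(opaque) to run once in loop. */
void sketch_schedule_oneshot(SketchLoop *loop, SketchCallback *cb,
                             void *opaque)
{
    pthread_mutex_lock(&loop->lock);
    assert(loop->pending < SKETCH_MAX_PENDING);
    loop->cb[loop->pending] = cb;
    loop->opaque[loop->pending] = opaque;
    loop->pending++;
    pthread_mutex_unlock(&loop->lock);
}

/*
 * Called only by the thread that owns loop.  Takes a snapshot of the queue
 * under the lock, then runs the callbacks without holding it so that a
 * callback may schedule further work.  Returns how many callbacks ran.
 */
size_t sketch_run_pending(SketchLoop *loop)
{
    SketchCallback *cb[SKETCH_MAX_PENDING];
    void *opaque[SKETCH_MAX_PENDING];
    size_t n;

    pthread_mutex_lock(&loop->lock);
    n = loop->pending;
    for (size_t i = 0; i < n; i++) {
        cb[i] = loop->cb[i];
        opaque[i] = loop->opaque[i];
    }
    loop->pending = 0;
    pthread_mutex_unlock(&loop->lock);

    for (size_t i = 0; i < n; i++) {
        cb[i](opaque[i]);
    }
    return n;
}

/* Example callback: runs in the loop's thread and updates its state. */
void sketch_increment(void *opaque)
{
    (*(int *)opaque)++;
}
```

Because the callback always runs in the loop's own thread, state that is
touched only by callbacks of that loop needs no extra locking.  A real event
loop would also wake the owning thread when work is queued, for example with
an event notifier; the sketch leaves that out.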

The main loop thread can wait synchronously for a condition using
AIO_WAIT_WHILE().
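
A "wait while" construct of this kind can be modelled as a loop that keeps
running event loop iterations for as long as a condition holds, so that the
callbacks which eventually clear the condition get a chance to run.  The
sketch below assumes that model.  SketchOp, sketch_loop_iterate() and
SKETCH_WAIT_WHILE() are hypothetical, and the macro does not have the same
arguments as AIO_WAIT_WHILE().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pending operation that completes after some iterations. */
typedef struct {
    int steps_left;   /* loop iterations until the operation completes */
    bool done;
} SketchOp;

void sketch_op_start(SketchOp *op, int steps)
{
    assert(steps > 0);
    op->steps_left = steps;
    op->done = false;
}

/* Hypothetical stand-in for one event loop iteration making progress. */
void sketch_loop_iterate(SketchOp *op)
{
    if (op->steps_left > 0 && --op->steps_left == 0) {
        op->done = true;
    }
}

/*
 * Run loop iterations while cond is true.  cond is re-evaluated before
 * every iteration; iterations receives the number of iterations run.
 */
#define SKETCH_WAIT_WHILE(op, cond, iterations) \
    do {                                        \
        (iterations) = 0;                       \
        while (cond) {                          \
            sketch_loop_iterate(op);            \
            (iterations)++;                     \
        }                                       \
    } while (0)
```

For example, after sketch_op_start(&op, 3), the statement
SKETCH_WAIT_WHILE(&op, !op.done, n) returns once op.done has become true.
The condition must be one that running the loop can eventually change,
otherwise the wait never ends.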

AioContext and the block layer
------------------------------
The AioContext originates from the QEMU block layer, even though nowadays
AioContext is a generic event loop that can be used by any QEMU subsystem.

The block layer has support for AioContext integrated.  Each BlockDriverState
is associated with an AioContext using bdrv_try_change_aio_context() and
bdrv_get_aio_context().  This allows block layer code to process I/O inside the
right AioContext.  Other subsystems may wish to follow a similar approach.

Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop.  See the "How to program for
IOThreads" section above for information on how to do that.

Code running in the monitor typically needs to ensure that past
requests from the guest are completed.  When a block device is running
in an IOThread, the IOThread can also process requests from the guest
(via ioeventfd).  To achieve both objectives, wrap the code between
bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
section".

Long-running jobs (usually in the form of coroutines) are often scheduled in
the BlockDriverState's AioContext.  The functions
bdrv_add/remove_aio_context_notifier, or alternatively
blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
get a notification whenever bdrv_try_change_aio_context() moves a
BlockDriverState to a different AioContext.