The QEMU throttling infrastructure
==================================
Copyright (C) 2016 Igalia, S.L.
Author: Alberto Garcia <berto@igalia.com>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
QEMU includes a throttling module that can be used to set limits to
I/O operations. The code itself is generic and independent of the I/O
units, but it is currently used to limit the number of bytes per second
and operations per second (IOPS) when performing disk I/O.

This document explains how to use the throttling code in QEMU, and how
it works internally. The implementation is in throttle.c.


Using throttling to limit disk I/O
----------------------------------
Two aspects of the disk I/O can be limited: the number of bytes per
second and the number of operations per second (IOPS). For each one of
them the user can set a global limit or separate limits for read and
write operations. This gives us a total of six different parameters.

I/O limits can be set using the throttling.* parameters of -drive, or
using the QMP 'block_set_io_throttle' command.
These are the names of the parameters for both cases:

|-----------------------+-----------------------|
| -drive                | block_set_io_throttle |
|-----------------------+-----------------------|
| throttling.iops-total | iops                  |
| throttling.iops-read  | iops_rd               |
| throttling.iops-write | iops_wr               |
| throttling.bps-total  | bps                   |
| throttling.bps-read   | bps_rd                |
| throttling.bps-write  | bps_wr                |
|-----------------------+-----------------------|

It is possible to set limits for both IOPS and bps at the same time,
and for each case we can decide whether to have separate read and
write limits or not, but note that if iops-total is set then neither
iops-read nor iops-write can be set. The same applies to bps-total and
bps-read/write.

The default value of these parameters is 0, and it means 'unlimited'.

In its most basic usage, the user can add a drive to QEMU with a limit
of 100 IOPS with the following -drive line:

   -drive file=hd0.qcow2,throttling.iops-total=100

We can do the same using QMP.
In this case all these parameters are mandatory, so we must set to 0
the ones that we don't want to limit:

   { "execute": "block_set_io_throttle",
     "arguments": {
        "device": "virtio0",
        "iops": 100,
        "iops_rd": 0,
        "iops_wr": 0,
        "bps": 0,
        "bps_rd": 0,
        "bps_wr": 0
     }
   }


I/O bursts
----------
In addition to the basic limits we have just seen, QEMU allows the
user to do bursts of I/O for a configurable amount of time. A burst is
an amount of I/O that can exceed the basic limit. Bursts are useful to
allow better performance when there are peaks of activity (the OS
boots, a service needs to be restarted) while keeping the average
limits lower the rest of the time.

Two parameters control bursts: their length and the maximum amount of
I/O they allow. These two can be configured separately for each one of
the six basic parameters described in the previous section, but in
this section we'll use 'iops-total' as an example.

The I/O limit during bursts is set using 'iops-total-max', and the
maximum length (in seconds) is set with 'iops-total-max-length'.
So if we want to configure a drive with a basic limit of 100 IOPS and
allow bursts of 2000 IOPS for 60 seconds, we would do it like this
(the line is split for clarity):

   -drive file=hd0.qcow2,
          throttling.iops-total=100,
          throttling.iops-total-max=2000,
          throttling.iops-total-max-length=60

Or, with QMP:

   { "execute": "block_set_io_throttle",
     "arguments": {
        "device": "virtio0",
        "iops": 100,
        "iops_rd": 0,
        "iops_wr": 0,
        "bps": 0,
        "bps_rd": 0,
        "bps_wr": 0,
        "iops_max": 2000,
        "iops_max_length": 60
     }
   }

With this, the user can perform I/O on hd0.qcow2 at a rate of 2000
IOPS for 1 minute before it's throttled down to 100 IOPS.

The user will be able to do bursts again if there's a sufficiently
long period of time with unused I/O (see below for details).

The default value for 'iops-total-max' is 0 and it means that bursts
are not allowed. 'iops-total-max-length' can only be set if
'iops-total-max' is set as well, and its default value is 1 second.
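
As an illustration of the rules described so far, the following Python
sketch builds a 'block_set_io_throttle' command with the six mandatory
limits filled in. It is only a sketch: the helper name
build_throttle_command is hypothetical and not part of QEMU, and the
checks simply restate the constraints given in this document (a total
limit excludes read/write limits, and a burst length requires a burst
limit). Only the argument names are taken from the tables here.

```python
import json

# The six basic limits that QMP requires in every command.
BASIC_LIMITS = ("iops", "iops_rd", "iops_wr", "bps", "bps_rd", "bps_wr")


def build_throttle_command(device, **limits):
    """Build a block_set_io_throttle command as a Python dict.

    Hypothetical helper for illustration; it is not part of QEMU.
    """
    # Every basic limit is mandatory; 0 means 'unlimited'.
    args = {"device": device}
    args.update({name: 0 for name in BASIC_LIMITS})
    args.update(limits)

    for kind in ("iops", "bps"):
        # A total limit cannot be combined with read/write limits.
        if args[kind] and (args[kind + "_rd"] or args[kind + "_wr"]):
            raise ValueError(
                f"{kind} cannot be combined with {kind}_rd/{kind}_wr")

    for name in BASIC_LIMITS:
        # A burst length can only be set together with a burst limit.
        if args.get(name + "_max_length") and not args.get(name + "_max"):
            raise ValueError(f"{name}_max_length requires {name}_max")

    return {"execute": "block_set_io_throttle", "arguments": args}


# Same settings as the burst example: 100 IOPS, bursts of 2000 for 60 s.
cmd = build_throttle_command("virtio0", iops=100,
                             iops_max=2000, iops_max_length=60)
print(json.dumps(cmd, indent=2))
```

Serializing with json.dumps guarantees syntactically valid JSON, which
is easy to get wrong when the command is written by hand.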

Here's the complete list of parameters for configuring bursts:

|----------------------------------+-----------------------|
| -drive                           | block_set_io_throttle |
|----------------------------------+-----------------------|
| throttling.iops-total-max        | iops_max              |
| throttling.iops-total-max-length | iops_max_length       |
| throttling.iops-read-max         | iops_rd_max           |
| throttling.iops-read-max-length  | iops_rd_max_length    |
| throttling.iops-write-max        | iops_wr_max           |
| throttling.iops-write-max-length | iops_wr_max_length    |
| throttling.bps-total-max         | bps_max               |
| throttling.bps-total-max-length  | bps_max_length        |
| throttling.bps-read-max          | bps_rd_max            |
| throttling.bps-read-max-length   | bps_rd_max_length     |
| throttling.bps-write-max         | bps_wr_max            |
| throttling.bps-write-max-length  | bps_wr_max_length     |
|----------------------------------+-----------------------|


Controlling the size of I/O operations
--------------------------------------
When applying IOPS limits all I/O operations are treated equally
regardless of their size. This means that the user can take advantage
of this in order to circumvent the limits and submit one huge I/O
request instead of several smaller ones.

QEMU provides a setting called throttling.iops-size to prevent this
from happening. This setting specifies the size (in bytes) of an I/O
request for accounting purposes. Larger requests will be counted
proportionally to this size.

For example, if iops-size is set to 4096 then an 8KB request will be
counted as two, and a 6KB request will be counted as one and a
half. This only applies to requests larger than iops-size: smaller
requests will always be counted as one, no matter their size.

The default value of iops-size is 0 and it means that the size of the
requests is never taken into account when applying IOPS limits.


Applying I/O limits to groups of disks
--------------------------------------
In all the examples so far we have seen how to apply limits to the I/O
performed on individual drives, but QEMU allows grouping drives so
they all share the same limits.

The way it works is that each drive with I/O limits is assigned to a
group named using the throttling.group parameter. If this parameter is
not specified, then the device name (e.g. 'virtio0', 'ide0-hd0') will
be used as the group name.

Limits set using the throttling.* parameters discussed earlier in this
document apply to the combined I/O of all members of a group.

Consider this example:

   -drive file=hd1.qcow2,throttling.iops-total=6000,throttling.group=foo
   -drive file=hd2.qcow2,throttling.iops-total=6000,throttling.group=foo
   -drive file=hd3.qcow2,throttling.iops-total=3000,throttling.group=bar
   -drive file=hd4.qcow2,throttling.iops-total=6000,throttling.group=foo
   -drive file=hd5.qcow2,throttling.iops-total=3000,throttling.group=bar
   -drive file=hd6.qcow2,throttling.iops-total=5000

Here hd1, hd2 and hd4 are all members of a group named 'foo' with a
combined IOPS limit of 6000, and hd3 and hd5 are members of 'bar'. hd6
is left alone (technically it is part of a 1-member group).

Limits are applied in a round-robin fashion so if there are concurrent
I/O requests on several drives of the same group they will be
distributed evenly.

When I/O limits are applied to an existing drive using the QMP command
'block_set_io_throttle', the following things need to be taken into
account:

   - I/O limits are shared within the same group, so new values will
     affect all members and overwrite the previous settings. In other
     words: if different limits are applied to members of the same
     group, the last one wins.

   - If 'group' is unset it is assumed to be the current group of that
     drive.
     If the drive is not in a group yet, it will be added to a
     group named after the device name.

   - If 'group' is set then the drive will be moved to that group if
     it was a member of a different one. In this case the limits
     specified in the parameters will be applied to the new group
     only.

   - I/O limits can be disabled by setting all of them to 0. In this
     case the device will be removed from its group and the rest of
     its members will not be affected. The 'group' parameter is
     ignored.


The Leaky Bucket algorithm
--------------------------
I/O limits in QEMU are implemented using the leaky bucket algorithm
(specifically the "Leaky bucket as a meter" variant).

This algorithm uses the analogy of a bucket that leaks water
constantly. The water that gets into the bucket represents the I/O
that has been performed, and no more I/O is allowed once the bucket is
full.

To see the way this corresponds to the throttling parameters in QEMU,
consider the following values:

   iops-total=100
   iops-total-max=2000
   iops-total-max-length=60

   - Water leaks from the bucket at a rate of 100 IOPS.
   - Water can be added to the bucket at a rate of 2000 IOPS.
   - The size of the bucket is 2000 x 60 = 120000.
   - If 'iops-total-max-length' is unset then it defaults to 1 second
     and the size of the bucket is 2000 x 1 = 2000.
   - If 'iops-total-max' is unset then bursts are not allowed and the
     bucket size is 100.

The bucket is initially empty, therefore water can be added until it's
full at a rate of 2000 IOPS (the burst rate). Once the bucket is full
we can only add as much water as it leaks, therefore the I/O rate is
reduced to 100 IOPS. If we add less water than it leaks then the
bucket will start to empty, allowing for bursts again.

Note that since water is leaking from the bucket even during bursts,
it will take a bit more than 60 seconds at 2000 IOPS to fill it
up. After those 60 seconds the bucket will have leaked 60 x 100 =
6000, allowing for 3 more seconds of I/O at 2000 IOPS.

Also, due to the way the algorithm works, longer bursts can be done at
a lower I/O rate, e.g. 1000 IOPS for 120 seconds.
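
The bucket arithmetic above can be checked with a small simulation.
The following Python sketch is only a simplified model of the
description in this section and does not reproduce the code in
throttle.c; the function name seconds_until_throttled and its 'tick'
parameter are made up for illustration.

```python
def seconds_until_throttled(avg, burst_max, burst_length, rate, tick=0.01):
    """Return how long I/O at 'rate' can run before the bucket is full.

    avg          -- leak rate of the bucket (e.g. iops-total)
    burst_max    -- maximum fill rate (e.g. iops-total-max)
    burst_length -- burst length in seconds (e.g. iops-total-max-length)
    rate         -- I/O rate requested by the user
    tick         -- simulation step in seconds (assumption of this model)

    Returns None if the bucket never fills up.
    """
    capacity = burst_max * burst_length
    # Water cannot be added faster than the burst rate.
    rate = min(rate, burst_max)
    if rate <= avg:
        # The bucket leaks at least as fast as it is filled.
        return None

    level = 0.0
    ticks = 0
    while level < capacity:
        # Each tick adds 'rate' worth of water and leaks 'avg' worth.
        level += (rate - avg) * tick
        ticks += 1
    return ticks * tick


# Values from the example: 100 IOPS, bursts of 2000 IOPS for 60 seconds.
# The result is a bit more than 63 seconds, i.e. the 60 seconds of the
# burst plus the 3 extra seconds gained from the leak.
print(seconds_until_throttled(100, 2000, 60, 2000))

# A lower rate fills the same bucket more slowly, so the burst is longer.
print(seconds_until_throttled(100, 2000, 60, 1000))
```

In this model the burst at 1000 IOPS lasts somewhat longer than the
120 seconds mentioned above, for the same reason as before: the bucket
keeps leaking while it is being filled.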