xref: /qemu/docs/qcow2-cache.txt (revision 30afc120)
qcow2 L2/refcount cache configuration
=====================================
Copyright (C) 2015, 2018-2020 Igalia, S.L.
Author: Alberto Garcia <berto@igalia.com>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
The QEMU qcow2 driver has two caches that can improve the I/O
performance significantly. However, setting the right cache sizes is
not a straightforward operation.

This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.

Please refer to the docs/interop/qcow2.txt file for an in-depth
technical description of the qcow2 file format.


Clusters
--------
A qcow2 file is organized in units of constant size called clusters.

The cluster size is configurable, but it must be a power of two and
at least 512 bytes. QEMU currently defaults to 64 KB clusters and
does not support sizes larger than 2 MB.

The 'qemu-img create' command lets you set the cluster size with the
cluster_size option:

   qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G
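The cluster size rules above (power of two, at least 512 bytes, at
most 2 MB) can be checked before calling 'qemu-img create'. The
following is only a sketch: is_valid_cluster_size is a hypothetical
helper name, not part of QEMU, and it expects the size in bytes.

```shell
# Hypothetical helper: succeed if the argument (in bytes) satisfies the
# cluster size rules described in this document.
is_valid_cluster_size() {
    size=$1
    # Must lie between 512 bytes and 2 MB (2097152 bytes).
    [ "$size" -ge 512 ] && [ "$size" -le 2097152 ] || return 1
    # A power of two has a single bit set, so size & (size - 1) is 0.
    [ $(( size & (size - 1) )) -eq 0 ]
}

is_valid_cluster_size 131072 && echo "131072: valid cluster size"
is_valid_cluster_size 100000 || echo "100000: rejected"
```

The bit test is a common way to detect powers of two; QEMU itself may
perform the check differently.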


The L2 tables
-------------
The qcow2 format uses a two-level structure to map the virtual disk as
seen by the guest to the disk image in the host. These structures are
called the L1 and L2 tables.

There is a single L1 table per disk image. The table is small and is
always kept in memory.

There can be many L2 tables, depending on how much space has been
allocated in the image. Each table is one cluster in size. In order to
read data from or write data to the virtual disk, QEMU needs to read
the corresponding L2 table to find out where that data is located.
Since reading the table for each I/O operation can be expensive, QEMU
keeps an L2 cache in memory to speed up disk access.

The size of the L2 cache can be configured, and setting the right
value can improve the I/O performance significantly.
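To get a feeling for how much L2 metadata an image can have, the
sketch below computes how much of the virtual disk a single L2 table
maps and how many tables a fully allocated image needs. It assumes
standard 64-bit (8-byte) L2 entries, each mapping one cluster; the
function names are made up for this example.

```shell
# Bytes of virtual disk mapped by one L2 table, assuming 8-byte entries
# and one data cluster per entry. args: cluster_size (bytes)
l2_table_coverage() {
    cluster_size=$1
    entries=$(( cluster_size / 8 ))      # entries in a one-cluster table
    echo $(( entries * cluster_size ))
}

# Number of L2 tables needed for a fully allocated disk, rounded up.
# args: disk_size cluster_size (both in bytes)
l2_tables_needed() {
    disk_size=$1
    coverage=$(l2_table_coverage "$2")
    echo $(( (disk_size + coverage - 1) / coverage ))
}

l2_table_coverage 65536              # 536870912 (512 MB per table)
l2_tables_needed 4294967296 65536    # 8 tables for a 4 GB disk
```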


The refcount blocks
-------------------
The qcow2 format also maintains a reference count for each cluster.
Reference counts are used for cluster allocation and internal
snapshots. The data is stored in a two-level structure similar to the
L1/L2 tables described above.

The second-level structures are called refcount blocks. Each block is
also one cluster in size, and the number of blocks likewise depends on
the amount of allocated space.

Each block contains a number of refcount entries. Their size (in bits)
is a power of two and must not be higher than 64. It defaults to 16
bits, but a different value can be set using the refcount_bits option:

   qemu-img create -f qcow2 -o refcount_bits=8 hd.qcow2 4G

QEMU keeps a refcount cache to speed up I/O much like the
aforementioned L2 cache, and its size can also be configured.
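As an illustration of the refcount_bits rules, the sketch below checks
a candidate value and computes how many refcount entries fit in one
block. Both helper names are hypothetical; the entry count simply
follows from a block being one cluster (cluster_size * 8 bits) in size.

```shell
# Hypothetical helper: succeed if the value is a power of two and not
# higher than 64.
is_valid_refcount_bits() {
    bits=$1
    [ "$bits" -ge 1 ] && [ "$bits" -le 64 ] || return 1
    [ $(( bits & (bits - 1) )) -eq 0 ]
}

# Refcount entries stored in one block. args: cluster_size refcount_bits
refcount_entries_per_block() {
    echo $(( $1 * 8 / $2 ))
}

is_valid_refcount_bits 8 && echo "refcount_bits=8 is acceptable"
refcount_entries_per_block 65536 16    # 32768
refcount_entries_per_block 65536 8     # 65536
```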


Choosing the right cache sizes
------------------------------
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.

The part of the virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:

   disk_size = l2_cache_size * cluster_size / 8
   disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
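The two formulas can be evaluated directly with shell arithmetic. The
function names below are invented for this sketch, and all sizes are
in bytes.

```shell
# Virtual disk bytes covered by an L2 cache.
# args: l2_cache_size cluster_size
l2_cache_coverage() {
    echo $(( $1 * $2 / 8 ))
}

# Virtual disk bytes covered by a refcount cache.
# args: refcount_cache_size cluster_size refcount_bits
refcount_cache_coverage() {
    echo $(( $1 * $2 * 8 / $3 ))
}

# 1 MB of L2 cache, 64 KB clusters:
l2_cache_coverage 1048576 65536             # 8589934592 (8 GB)
# 256 KB of refcount cache, 64 KB clusters, 16-bit refcounts:
refcount_cache_coverage 262144 65536 16     # 8589934592 (8 GB)
```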

With the default values for cluster_size (64 KB) and refcount_bits
(16), this becomes:

   disk_size = l2_cache_size * 8192
   disk_size = refcount_cache_size * 32768

So in order to cover disk_size_GB gigabytes of disk space with the
default values we need (in bytes):

   l2_cache_size = disk_size_GB * 131072
   refcount_cache_size = disk_size_GB * 32768
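The inverse calculation, from a disk size in GB to the cache sizes in
bytes, might be scripted as follows. This is a sketch with made-up
function names; the optional arguments fall back to the defaults used
in this document (64 KB clusters, 16-bit refcounts).

```shell
# L2 cache bytes needed to cover a disk.
# args: disk_size_GB [cluster_size]
l2_cache_size_for() {
    disk_size=$(( $1 * 1073741824 ))          # GB to bytes
    cluster_size=${2:-65536}
    echo $(( disk_size * 8 / cluster_size ))
}

# Refcount cache bytes needed to cover a disk.
# args: disk_size_GB [cluster_size] [refcount_bits]
refcount_cache_size_for() {
    disk_size=$(( $1 * 1073741824 ))
    cluster_size=${2:-65536}
    refcount_bits=${3:-16}
    echo $(( disk_size * refcount_bits / (8 * cluster_size) ))
}

l2_cache_size_for 1           # 131072
refcount_cache_size_for 1     # 32768
l2_cache_size_for 8           # 1048576 (1 MB)
```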

For example, 1 MB of L2 cache is needed to cover every 8 GB of the
virtual image size (given that the default cluster size is used):

   8 GB / 8192 = 1 MB

The refcount cache is 4 times the cluster size by default. With the
default cluster size of 64 KB, it is 256 KB (262144 bytes). This is
sufficient for 8 GB of image size:

   262144 * 32768 = 8 GB


How to configure the cache sizes
--------------------------------
Cache sizes can be configured using the -drive option in the
command-line, or the 'blockdev-add' QMP command.

There are three options available, and all of them take a size in
bytes:

"l2-cache-size":         maximum size of the L2 table cache
"refcount-cache-size":   maximum size of the refcount block cache
"cache-size":            maximum size of both caches combined

There are a few things that need to be taken into account:

 - Both caches must have a size that is a multiple of the cluster size
   (or the cache entry size: see "Using smaller cache entries" below).

 - The maximum L2 cache size is 32 MB by default on Linux platforms
   (enough for full coverage of 256 GB images, with the default
   cluster size). This value can be modified using the "l2-cache-size"
   option. QEMU will not use more memory than needed to hold all of
   the image's L2 tables, regardless of this maximum.
   On non-Linux platforms the default maximum is smaller (8 MB). The
   difference stems from the fact that on Linux the cache can be
   cleared periodically if needed, using the "cache-clean-interval"
   option (see below).
   The minimum L2 cache size is 2 clusters (or 2 cache entries, see
   below).

 - The default (and minimum) refcount cache size is 4 clusters.

 - If only "cache-size" is specified then QEMU will assign as much
   memory as possible to the L2 cache before increasing the refcount
   cache size.

 - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
   can be set simultaneously.
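The multiple-of-cluster-size and minimum-size rules from the list above
can be applied when building the option string. The sketch below rounds
a requested L2 cache size up to a whole number of clusters, enforces
the 2-cluster minimum, and prints the resulting -drive argument. The
helper names are hypothetical, and the script only prints text; it does
not start QEMU.

```shell
# Round a size up to the next multiple of a unit. args: size unit
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Print a -drive argument with an explicit L2 cache size.
# args: image_file wanted_l2_bytes cluster_size
l2_drive_option() {
    min=$(( 2 * $3 ))                  # minimum L2 cache: 2 clusters
    l2=$(round_up "$2" "$3")
    if [ "$l2" -lt "$min" ]; then
        l2=$min
    fi
    echo "file=$1,l2-cache-size=$l2"
}

l2_drive_option hd.qcow2 3000000 65536
# prints: file=hd.qcow2,l2-cache-size=3014656
```

The printed string can then be passed as the argument of -drive.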

Unlike L2 tables, refcount blocks are not used during normal I/O but
only during allocations and internal snapshots. In most cases they are
accessed sequentially (even during random guest I/O) so increasing the
refcount cache size won't have any measurable effect on performance
(this can change if you are using internal snapshots, so you may want
to think about increasing the cache size if you use them heavily).

Before QEMU 2.12 the refcount cache had a default size of 1/4 of the
L2 cache size. This resulted in unnecessarily large caches, so now the
refcount cache is as small as possible unless overridden by the user.


Using smaller cache entries
---------------------------
The qcow2 L2 cache can store complete tables. This means that if QEMU
needs an entry from an L2 table then the whole table is read from disk
and is kept in the cache. If the cache is full then a complete table
needs to be evicted first.

This can be inefficient with large cluster sizes since it results in
more disk I/O and wastes more cache memory.

Since QEMU 2.12 you can change the size of the L2 cache entry and make
it smaller than the cluster size. This can be configured using the
"l2-cache-entry-size" parameter:

   -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096

Since QEMU 4.0 the value of l2-cache-entry-size defaults to 4 KB (or
the cluster size if it's smaller).
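The default described above (4 KB, or the cluster size if that is
smaller) and the number of entries the cache then holds can be worked
out as in this sketch. The function names are invented for the example.

```shell
# Default l2-cache-entry-size since QEMU 4.0, as described above:
# 4096 bytes, or the cluster size if it is smaller. args: cluster_size
default_l2_cache_entry_size() {
    if [ "$1" -lt 4096 ]; then
        echo "$1"
    else
        echo 4096
    fi
}

# Number of entries an L2 cache holds. args: l2_cache_size entry_size
l2_cache_entries() {
    echo $(( $1 / $2 ))
}

default_l2_cache_entry_size 65536     # 4096
default_l2_cache_entry_size 512       # 512
l2_cache_entries 2097152 4096         # 512 entries, as in the example above
```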

Some things to take into account:

 - The L2 cache entry size has the same restrictions as the cluster
   size (power of two, at least 512 bytes).

 - Smaller entry sizes generally improve the cache efficiency and make
   disk I/O faster. This is particularly true with solid state drives
   so it's a good idea to reduce the entry size in those cases. With
   rotating hard drives the situation is a bit more complicated so you
   should test it first and stay with the default size if unsure.

 - Try different entry sizes to see which one gives the best
   performance in your case. The block size of the host filesystem is
   generally a good default (usually 4096 bytes in the case of ext4,
   hence the default).

 - Only the L2 cache can be configured this way. The refcount cache
   always uses the cluster size as the entry size.

 - If the L2 cache is big enough to hold all of the image's L2 tables
   (as explained in the "Choosing the right cache sizes" and "How to
   configure the cache sizes" sections in this document) then none of
   this is necessary and you can omit the "l2-cache-entry-size"
   parameter altogether. In this case QEMU makes the entry size
   equal to the cluster size by default.
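The last point in the list can be turned into a simple decision: pass
"l2-cache-entry-size" only when the L2 cache cannot hold all of the
image's L2 tables. The sketch below assumes standard 8-byte L2 entries
and a fully allocated image; both helper names are hypothetical.

```shell
# Succeed if the L2 cache can hold every L2 table of the image.
# args: disk_size l2_cache_size cluster_size (all in bytes)
l2_cache_covers_image() {
    needed=$(( $1 * 8 / $3 ))
    [ "$2" -ge "$needed" ]
}

# Print the -drive suffix to append, if any.
# args: disk_size l2_cache_size cluster_size entry_size
entry_size_option() {
    if l2_cache_covers_image "$1" "$2" "$3"; then
        echo ""                          # full coverage: omit the option
    else
        echo ",l2-cache-entry-size=$4"
    fi
}

entry_size_option 8589934592 1048576 65536 4096     # prints an empty line
entry_size_option 17179869184 1048576 65536 4096    # ,l2-cache-entry-size=4096
```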


Reducing the memory usage
-------------------------
It is possible to clean unused cache entries in order to reduce the
memory usage during periods of low I/O activity.

The parameter "cache-clean-interval" defines an interval (in seconds),
after which all the cache entries that haven't been accessed during the
interval are removed from memory. Setting this parameter to 0 disables this
feature.

The following example removes all unused cache entries every 15 minutes:

   -drive file=hd.qcow2,cache-clean-interval=900
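Since the parameter takes seconds, a small wrapper can do the
conversion from minutes. This is only a convenience sketch with a
made-up function name; it prints the option string and does not run
QEMU.

```shell
# Print a -drive argument with cache-clean-interval given in minutes.
# A value of 0 minutes yields 0, which disables the feature.
# args: image_file minutes
clean_interval_option() {
    echo "file=$1,cache-clean-interval=$(( $2 * 60 ))"
}

clean_interval_option hd.qcow2 15
# prints: file=hd.qcow2,cache-clean-interval=900
```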

If unset, the default value for this parameter is 600 on platforms which
support this functionality, and is 0 (disabled) on other platforms.

This functionality currently relies on the MADV_DONTNEED argument for
madvise() to actually free the memory. This is a Linux-specific feature,
so cache-clean-interval is not supported on other systems.


Extended L2 Entries
-------------------
All numbers shown in this document are valid for qcow2 images with normal
64-bit L2 entries.

Images with extended L2 entries need twice as much L2 metadata, so the L2
cache size must be twice as large for the same disk space.

   disk_size = l2_cache_size * cluster_size / 16

i.e.

   l2_cache_size = disk_size * 16 / cluster_size

Refcount blocks are not affected by this.
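As a worked instance of the formula for extended L2 entries, the sketch
below computes the L2 cache size for a given disk size and cluster
size. The function name is invented for this example; sizes are in
bytes.

```shell
# L2 cache bytes needed with extended (16-byte) L2 entries.
# args: disk_size cluster_size
l2_cache_size_extended() {
    echo $(( $1 * 16 / $2 ))
}

# 8 GB disk, 64 KB clusters: twice the 1 MB needed with normal entries.
l2_cache_size_extended 8589934592 65536     # 2097152 (2 MB)
```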