qcow2 L2/refcount cache configuration
=====================================
Copyright (C) 2015 Igalia, S.L.
Author: Alberto Garcia <berto@igalia.com>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
The QEMU qcow2 driver has two caches that can improve the I/O
performance significantly. However, setting the right cache sizes is
not a straightforward operation.

This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.

Please refer to the docs/specs/qcow2.txt file for an in-depth
technical description of the qcow2 file format.


Clusters
--------
A qcow2 file is organized in units of constant size called clusters.

The cluster size is configurable, but it must be a power of two and
at least 512 bytes. QEMU currently defaults to 64 KB clusters, and it
does not support sizes larger than 2 MB.

The 'qemu-img create' command supports specifying the cluster size
using the cluster_size option:

   qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G

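The cluster size constraints above can be checked before creating an
image. The following is a minimal Python sketch, not part of QEMU; the
function name is_valid_cluster_size is hypothetical, and the limits are
the ones stated in this document.

```python
def is_valid_cluster_size(size_bytes):
    """Return True if size_bytes satisfies the constraints described above."""
    min_size = 512                # lower bound stated in this document
    max_size = 2 * 1024 * 1024    # 2 MB upper bound stated in this document
    # A positive integer is a power of two if it has exactly one bit set.
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    return is_power_of_two and min_size <= size_bytes <= max_size


for candidate in (512, 65536, 131072, 100000, 4 * 1024 * 1024):
    print(candidate, is_valid_cluster_size(candidate))
```

Here 100000 is rejected because it is not a power of two, and 4 MB is
rejected because it exceeds the 2 MB limit.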

The L2 tables
-------------
The qcow2 format uses a two-level structure to map the virtual disk as
seen by the guest to the disk image in the host. These structures are
called the L1 and L2 tables.

There is one single L1 table per disk image. The table is small and is
always kept in memory.

There can be many L2 tables, depending on how much space has been
allocated in the image. Each table is one cluster in size. In order to
read or write data from the virtual disk, QEMU needs to read its
corresponding L2 table to find out where that data is located. Since
reading the table for each I/O operation can be expensive, QEMU keeps
an L2 cache in memory to speed up disk access.

The size of the L2 cache can be configured, and setting the right
value can improve the I/O performance significantly.


The refcount blocks
-------------------
The qcow2 format also maintains a reference count for each cluster.
Reference counts are used for cluster allocation and internal
snapshots. The data is stored in a two-level structure similar to the
L1/L2 tables described above.

The second-level structures are called refcount blocks. Like the L2
tables, each one is one cluster in size, and their number is variable
and depends on the amount of allocated space.

Each block contains a number of refcount entries. Their size (in bits)
is a power of two and must not be higher than 64. It defaults to 16
bits, but a different value can be set using the refcount_bits option:

   qemu-img create -f qcow2 -o refcount_bits=8 hd.qcow2 4G

QEMU keeps a refcount cache to speed up I/O much like the
aforementioned L2 cache, and its size can also be configured.


Choosing the right cache sizes
------------------------------
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.

The amount of virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:

   disk_size = l2_cache_size * cluster_size / 8
   disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits

With the default values for cluster_size (64 KB) and refcount_bits
(16), that is:

   disk_size = l2_cache_size * 8192
   disk_size = refcount_cache_size * 32768

So in order to cover disk_size_GB gigabytes of disk space with the
default values we need:

   l2_cache_size = disk_size_GB * 131072
   refcount_cache_size = disk_size_GB * 32768

QEMU has a default L2 cache of 1 MB (1048576 bytes) and a refcount
cache of 256 KB (262144 bytes), so using the formulas we've just seen
we have:

   1048576 / 131072 = 8 GB of virtual disk covered by that cache
    262144 /  32768 = 8 GB

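The formulas in this section can be turned into a small calculator for
cluster sizes and refcount_bits values other than the defaults. The
Python sketch below is illustrative only and is not QEMU code: all
function names are hypothetical, "GB" is taken to mean 2**30 bytes (as
the arithmetic above implies), and the required sizes are rounded up to
a multiple of the cluster size.

```python
GIB = 1024 ** 3       # the document's "GB", taken here as 2**30 bytes
L2_ENTRY_BYTES = 8    # 8 bytes per L2 entry: the "/ 8" in the L2 formula


def l2_coverage(l2_cache_size, cluster_size=65536):
    """Bytes of virtual disk mapped by an L2 cache of the given size."""
    return l2_cache_size * cluster_size // L2_ENTRY_BYTES


def refcount_coverage(refcount_cache_size, cluster_size=65536, refcount_bits=16):
    """Bytes of disk covered by a refcount cache of the given size."""
    return refcount_cache_size * cluster_size * 8 // refcount_bits


def round_up(value, multiple):
    """Round value up to the next multiple of 'multiple'."""
    return (value + multiple - 1) // multiple * multiple


def required_l2_cache_size(disk_size, cluster_size=65536):
    """Smallest cluster-aligned L2 cache size that maps disk_size bytes."""
    raw = (disk_size * L2_ENTRY_BYTES + cluster_size - 1) // cluster_size
    return round_up(raw, cluster_size)


def required_refcount_cache_size(disk_size, cluster_size=65536, refcount_bits=16):
    """Smallest cluster-aligned refcount cache size covering disk_size bytes."""
    bits_per_cluster = cluster_size * 8
    raw = (disk_size * refcount_bits + bits_per_cluster - 1) // bits_per_cluster
    return round_up(raw, cluster_size)


print(l2_coverage(1048576) // GIB)            # default L2 cache, in GB
print(refcount_coverage(262144) // GIB)       # default refcount cache, in GB
print(required_l2_cache_size(8 * GIB))
print(required_refcount_cache_size(8 * GIB))
```

Because of the rounding step, small disks can yield a value larger than
the plain formula gives: with the defaults, 1 GB needs 32768 bytes of
refcount cache by the formula, which this sketch rounds up to one full
64 KB cluster.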

How to configure the cache sizes
--------------------------------
Cache sizes can be configured using the -drive option on the
command line, or the 'blockdev-add' QMP command.

There are three options available, and all of them take a size in
bytes:

"l2-cache-size":         maximum size of the L2 table cache
"refcount-cache-size":   maximum size of the refcount block cache
"cache-size":            maximum size of both caches combined

There are two things that need to be taken into account:

 - Both caches must have a size that is a multiple of the cluster
   size.

 - If you only set one of the options above, QEMU will automatically
   adjust the others so that the L2 cache is 4 times the size of the
   refcount cache.

This means that these options are equivalent:

   -drive file=hd.qcow2,l2-cache-size=2097152
   -drive file=hd.qcow2,refcount-cache-size=524288
   -drive file=hd.qcow2,cache-size=2621440

The reason for this 1/4 ratio is to ensure that both caches cover the
same amount of disk space. Note however that this is only valid with
the default value of refcount_bits (16). If you are using a different
value you might want to calculate both cache sizes yourself since QEMU
will always use the same 1/4 ratio.

It's also worth mentioning that there's no strict need for both caches
to cover the same amount of disk space. The refcount cache is used
much less often than the L2 cache, so it's perfectly reasonable to
keep it small.

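The 4:1 rule described in this section, and the adjustment needed for
a non-default refcount_bits, can be sketched in Python as follows. This
mirrors the rule as stated in this document rather than QEMU's actual
implementation; the function names are hypothetical, and the rounding
to a multiple of the cluster size is deliberately left out.

```python
def derive_cache_sizes(l2_cache_size=None, refcount_cache_size=None,
                       cache_size=None):
    """Given exactly one option, fill in the others using the 4:1 rule.

    Returns (l2_cache_size, refcount_cache_size, cache_size) in bytes.
    """
    given = [v is not None
             for v in (l2_cache_size, refcount_cache_size, cache_size)]
    if sum(given) != 1:
        raise ValueError("set exactly one of the three options")

    if l2_cache_size is not None:
        refcount_cache_size = l2_cache_size // 4
    elif refcount_cache_size is not None:
        l2_cache_size = refcount_cache_size * 4
    else:
        # The combined size is split 4:1, so the refcount cache gets 1/5.
        refcount_cache_size = cache_size // 5
        l2_cache_size = cache_size - refcount_cache_size
    return l2_cache_size, refcount_cache_size, l2_cache_size + refcount_cache_size


def matching_refcount_cache_size(l2_cache_size, refcount_bits=16):
    """Refcount cache size covering the same disk space as the L2 cache.

    Equating the two disk_size formulas gives
    refcount_cache_size = l2_cache_size * refcount_bits / 64.
    """
    return l2_cache_size * refcount_bits // 64


print(derive_cache_sizes(l2_cache_size=2097152))
print(derive_cache_sizes(refcount_cache_size=524288))
print(derive_cache_sizes(cache_size=2621440))
print(matching_refcount_cache_size(2097152, refcount_bits=8))
```

The first three calls return the same triple, matching the three
equivalent -drive lines above. The last call shows that with
refcount_bits=8 a refcount cache one eighth the size of the L2 cache
already covers the same amount of disk space.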

Reducing the memory usage
-------------------------
It is possible to clean unused cache entries in order to reduce the
memory usage during periods of low I/O activity.

The parameter "cache-clean-interval" defines an interval (in seconds).
All cache entries that haven't been accessed during that interval are
removed from memory.

This example removes all unused cache entries every 15 minutes:

   -drive file=hd.qcow2,cache-clean-interval=900

If unset, this parameter defaults to 0, which disables the feature.

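When the sizes are computed by a script, the -drive argument can be
assembled from them. The helper below is a hypothetical Python sketch:
it only joins the option names used in this document with commas, and
it does not escape special characters in file names.

```python
def drive_option(filename, **options):
    """Build the value of a -drive argument from keyword options.

    Underscores in keyword names are turned into dashes, so
    cache_clean_interval becomes cache-clean-interval.
    """
    parts = ["file=" + filename]
    for name, value in options.items():
        parts.append("{}={}".format(name.replace("_", "-"), value))
    return ",".join(parts)


print("-drive", drive_option("hd.qcow2",
                             l2_cache_size=2097152,
                             cache_clean_interval=900))
```
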
Note that this functionality currently relies on the MADV_DONTNEED
argument for madvise() to actually free the memory. This is a
Linux-specific feature, so cache-clean-interval is not supported on
other systems.