Use multiple thread (de)compression in live migration
=====================================================
Copyright (C) 2015 Intel Corporation
Author: Liang Li <liang.z.li@intel.com>

This work is licensed under the terms of the GNU GPLv2 or later. See
the COPYING file in the top-level directory.

Contents:
=========
* Introduction
* When to use
* Performance
* Usage
* TODO

Introduction
============
Instead of sending guest memory directly, this feature compresses each
RAM page before sending it and decompresses it on the destination after
it is received. In a typical case, compression reduces the amount of
data transferred by about 60%, which is very useful when bandwidth is
limited; it can also reduce the total migration time by about 70% and
the VM downtime by about 50%. The actual benefit depends on how
compressible the data in the VM is.

Compression consumes additional CPU cycles, which tends to increase
the migration time. On the other hand, it decreases the amount of data
transferred, which tends to reduce the migration time. If compression
is fast enough, the net effect is a shorter migration, and multiple
compression threads can be used to speed up the compression step.
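
The trade-off above can be illustrated with a deliberately simplified
model. This is only a sketch: it assumes a single pass over guest RAM,
ignores pages dirtied during migration, and assumes that compression
and sending overlap perfectly. The function name and all numbers
(link speed, compressed fraction, per-thread throughput) are
assumptions chosen for illustration, not measurements.

```python
def estimate_seconds(ram_bytes, link_bytes_per_s, compressed_fraction=1.0,
                     thread_bytes_per_s=0.0, threads=0):
    """Estimate the time to send ram_bytes once over the link.

    With threads == 0 the data is sent uncompressed. Otherwise the
    slower of the two overlapping stages (compressing, sending)
    determines the total time.
    """
    if threads == 0:
        return ram_bytes / link_bytes_per_s
    compress_time = ram_bytes / (thread_bytes_per_s * threads)
    send_time = ram_bytes * compressed_fraction / link_bytes_per_s
    return max(compress_time, send_time)


RAM = 4 * 1024**3        # assumed: 4 GiB of guest RAM
LINK = 125_000_000       # assumed: 1 Gbit/s link, in bytes per second
FRACTION = 0.4           # assumed: data shrinks to 40% of its size
PER_THREAD = 50_000_000  # assumed: 50 MB/s compressed by each thread

plain = estimate_seconds(RAM, LINK)
one_thread = estimate_seconds(RAM, LINK, FRACTION, PER_THREAD, threads=1)
eight_threads = estimate_seconds(RAM, LINK, FRACTION, PER_THREAD, threads=8)
print(f"plain: {plain:.1f}s  1 thread: {one_thread:.1f}s  "
      f"8 threads: {eight_threads:.1f}s")
```

Under these assumed numbers a single compression thread is the
bottleneck and makes the transfer slower than sending uncompressed
data, while eight threads make the link the bottleneck again and the
transfer faster.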

zlib decompression is at least 4 times as fast as compression. If the
source and destination CPUs have equal speed, keeping the compression
thread count at 4 times the decompression thread count avoids wasting
resources.
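
As a small worked example of the 4:1 rule, the sketch below derives a
decompression thread count from a compression thread count. The helper
name is hypothetical, and rounding up to at least one thread is an
assumption made here, not something QEMU does automatically.

```python
import math

SPEED_RATIO = 4  # decompression is assumed to be 4 times as fast


def decompress_threads_for(compress_threads, speed_ratio=SPEED_RATIO):
    """Return a decompression thread count matching the 4:1 rule."""
    return max(1, math.ceil(compress_threads / speed_ratio))


for count in (8, 12):
    print(f"compress-threads {count} -> "
          f"decompress-threads {decompress_threads_for(count)}")
```

For 8 and 12 compression threads this gives 2 and 3 decompression
threads, respectively.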

The compression level controls the trade-off between compression speed
and compression ratio; a higher compression ratio takes more time.
Level 0 means no compression, level 1 gives the best compression
speed, and level 9 gives the best compression ratio. Users can select
a level between 0 and 9.
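
The effect of the level can be tried outside QEMU with Python's zlib
module, which uses the same 0 to 9 scale. This is only an illustration:
the 4096-byte page size and the repetitive page content are
assumptions, and real guest pages will compress differently.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size
# Assumed content: a highly repetitive page, which compresses very well.
page = (b"QEMU guest page " * (PAGE_SIZE // 16))[:PAGE_SIZE]

# Compress the same page at level 0 (none), 1 (fastest) and 9 (best ratio).
sizes = {level: len(zlib.compress(page, level)) for level in (0, 1, 9)}
for level, size in sizes.items():
    print(f"level {level}: {PAGE_SIZE} -> {size} bytes")

# Whatever the level, decompression restores the original page.
restored = zlib.decompress(zlib.compress(page, 1))
```

At level 0 the output is slightly larger than the input, because zlib
stores the data unchanged and adds a small header and checksum.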


When to use the multiple thread compression in live migration
=============================================================
Compressing data consumes extra CPU cycles, so avoid this feature on a
system whose CPUs are already heavily loaded. When network bandwidth
is very limited and CPU resources are adequate, multiple thread
compression is very helpful. If both CPU resources and network
bandwidth are adequate, multiple thread compression can still help to
reduce the migration time.

Performance
===========
Test environment:

CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Socket Count: 2
RAM: 128G
NIC: Intel I350 (10/100/1000Mbps)
Host OS: CentOS 7 64-bit
Guest OS: RHEL 6.5 64-bit
Parameter: qemu-system-x86_64 -accel kvm -smp 4 -m 4096
 /share/ia32e_rhel6u5.qcow -monitor stdio

No additional application is running on the guest during the test.


Speed limit: 1000Gb/s
---------------------------------------------------------------
                    | original  | compress thread: 8
                    |   way     | decompress thread: 2
                    |           | compression level: 1
---------------------------------------------------------------
total time(msec):   |   3333    |  1833
---------------------------------------------------------------
downtime(msec):     |    100    |   27
---------------------------------------------------------------
transferred ram(kB):|  363536   | 107819
---------------------------------------------------------------
throughput(mbps):   |  893.73   | 482.22
---------------------------------------------------------------
total ram(kB):      |  4211524  | 4211524
---------------------------------------------------------------

An application is running on the guest that periodically writes random
numbers to RAM block areas.

Speed limit: 1000Gb/s
---------------------------------------------------------------
                    | original  | compress thread: 8
                    |   way     | decompress thread: 2
                    |           | compression level: 1
---------------------------------------------------------------
total time(msec):   |   37369   | 15989
---------------------------------------------------------------
downtime(msec):     |    337    |  173
---------------------------------------------------------------
transferred ram(kB):|  4274143  | 1699824
---------------------------------------------------------------
throughput(mbps):   |  936.99   | 870.95
---------------------------------------------------------------
total ram(kB):      |  4211524  | 4211524
---------------------------------------------------------------
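
The relative savings in the two tables can be worked out with a few
lines of Python. The numbers are copied from the tables; the variable
and function names are only illustrative.

```python
# (original way, with compression) pairs copied from the two tables.
idle_guest = {
    "total time (msec)": (3333, 1833),
    "downtime (msec)": (100, 27),
    "transferred ram (kB)": (363536, 107819),
}
busy_guest = {
    "total time (msec)": (37369, 15989),
    "downtime (msec)": (337, 173),
    "transferred ram (kB)": (4274143, 1699824),
}


def reduction_percent(original, compressed):
    """Return how much smaller compressed is than original, in percent."""
    return 100.0 * (original - compressed) / original


for name, results in (("idle guest", idle_guest), ("busy guest", busy_guest)):
    for metric, (original, compressed) in results.items():
        saved = reduction_percent(original, compressed)
        print(f"{name}: {metric} reduced by {saved:.0f}%")
```

For the idle guest this gives reductions of about 45% (total time),
73% (downtime) and 70% (transferred RAM); for the guest writing random
numbers, about 57%, 49% and 60%.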

Usage
=====
1. Verify that both the source and destination QEMU support multiple
thread compression migration:
    {qemu} info migrate_capabilities
    {qemu} ... compress: off ...

2. Activate compression on the source:
    {qemu} migrate_set_capability compress on

3. Set the compression thread count on the source:
    {qemu} migrate_set_parameter compress-threads 12

4. Set the compression level on the source:
    {qemu} migrate_set_parameter compress-level 1

5. Set the decompression thread count on the destination:
    {qemu} migrate_set_parameter decompress-threads 3

6. Start outgoing migration:
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    Capabilities: ... compress: on
    ...

The following are the default settings:
    compress: off
    compress-threads: 8
    decompress-threads: 2
    compress-level: 1 (which means best speed)

So, only the first two steps are required to use multiple thread
compression in migration. Adjust the other parameters if the default
settings are not appropriate.
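
When the steps above are scripted, the monitor commands can be
generated from the chosen settings. The sketch below only builds the
command strings listed in this section; the helper name is
hypothetical, and how the strings are delivered to each QEMU monitor
is left to the caller.

```python
def hmp_commands(dest_uri, compress_threads=8, decompress_threads=2,
                 compress_level=1):
    """Return (source_commands, destination_commands) as lists of strings.

    The defaults mirror the default settings listed in this document.
    """
    if not 0 <= compress_level <= 9:
        raise ValueError("compress-level must be between 0 and 9")
    source = [
        "migrate_set_capability compress on",
        f"migrate_set_parameter compress-threads {compress_threads}",
        f"migrate_set_parameter compress-level {compress_level}",
        f"migrate -d {dest_uri}",
    ]
    destination = [
        f"migrate_set_parameter decompress-threads {decompress_threads}",
    ]
    return source, destination


src, dst = hmp_commands("tcp:destination.host:4444",
                        compress_threads=12, decompress_threads=3)
print("\n".join(src))
print("\n".join(dst))
```

The destination command has to be issued on the destination monitor
before the source starts the outgoing migration.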

TODO
====
Faster (de)compression methods such as LZ4 and QuickLZ can help to
reduce the CPU consumption of (de)compression. With these faster
methods, fewer (de)compression threads are needed during migration.