Use multiple thread (de)compression in live migration
=====================================================
Copyright (C) 2015 Intel Corporation
Author: Liang Li <liang.z.li@intel.com>

This work is licensed under the terms of the GNU GPLv2 or later. See
the COPYING file in the top-level directory.

Contents:
=========
* Introduction
* When to use
* Performance
* Usage
* TODO

Introduction
============
Instead of sending guest memory directly, this feature compresses each
RAM page before sending it; the destination decompresses the data
after receiving it. Using compression in live migration can reduce the
amount of data transferred by about 60%, which is very useful when
bandwidth is limited. In a typical case the total migration time can
also be reduced by about 70% and the VM downtime by about 50%. The
actual benefit depends on how compressible the data in the VM is.

Compression consumes additional CPU cycles, and those extra cycles
tend to increase the migration time. On the other hand, the amount of
data transferred decreases, which tends to reduce the total migration
time. If compression is quick enough, the net effect is a shorter
total migration time, and multiple compression threads can be used to
speed up the compression process.

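The trade-off described above can be sketched with a rough model. This
is only an illustration, not how QEMU schedules its work: it assumes a
single pass over RAM, compression that overlaps with sending, and
perfect scaling across threads. All figures and names in it are
hypothetical.

```python
def transfer_seconds(ram_bytes, link_bytes_per_s):
    """Time to send RAM uncompressed over the link."""
    return ram_bytes / link_bytes_per_s


def compressed_transfer_seconds(ram_bytes, link_bytes_per_s,
                                compress_bytes_per_s_per_thread,
                                threads, compressed_fraction):
    """Time to send RAM when compressing and sending overlap.

    The slower of the two stages (compressing or sending) sets the pace.
    """
    compress_time = ram_bytes / (compress_bytes_per_s_per_thread * threads)
    send_time = ram_bytes * compressed_fraction / link_bytes_per_s
    return max(compress_time, send_time)


# Hypothetical figures, chosen only for illustration.
RAM = 4 * 1024**3            # 4 GiB guest
LINK = 125_000_000           # 1 Gbit/s link, in bytes per second
COMPRESS_RATE = 50_000_000   # bytes per second that one thread can compress
FRACTION = 0.4               # compressed size divided by original size

plain = transfer_seconds(RAM, LINK)
one_thread = compressed_transfer_seconds(RAM, LINK, COMPRESS_RATE, 1, FRACTION)
eight_threads = compressed_transfer_seconds(RAM, LINK, COMPRESS_RATE, 8, FRACTION)

print(f"no compression:        {plain:.1f} s")
print(f"1 compression thread:  {one_thread:.1f} s")
print(f"8 compression threads: {eight_threads:.1f} s")
```

With these made-up figures, one compression thread is slower than
sending the data uncompressed, while eight threads are faster.
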
Zlib decompresses at least 4 times as fast as it compresses. If the
source and destination CPUs have equal speed, keeping the compression
thread count at 4 times the decompression thread count avoids wasting
resources.

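As a small worked example of the 4:1 rule of thumb, the helper below
derives a decompression thread count from a compression thread count.
The function name and the rounding choice (round up, never below one
thread) are assumptions made for this sketch.

```python
import math


def matching_decompress_threads(compress_threads, speed_ratio=4):
    """Decompression threads needed to keep up with the compression threads.

    speed_ratio is how many times faster decompression is than compression.
    """
    return max(1, math.ceil(compress_threads / speed_ratio))


for count in (4, 8, 12):
    print(count, "compression threads ->",
          matching_decompress_threads(count), "decompression threads")
```
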
The compression level controls the trade-off between compression speed
and compression ratio; a higher compression ratio takes more time.
Level 0 means no compression, level 1 gives the best compression
speed, and level 9 gives the best compression ratio. Users can select
any level from 0 to 9.

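The effect of the level can be tried outside QEMU. The sketch below
uses Python's standard zlib module, which exposes the same 0 to 9
scale, on a made-up 4096-byte buffer standing in for a guest page. The
buffer contents are an assumption; real guest pages may compress very
differently.

```python
import zlib

# Hypothetical, highly repetitive 4096-byte "page".
page = (b"guest memory " * 400)[:4096]


def compressed_sizes(data, levels=(0, 1, 9)):
    """Map each zlib level to the size of the compressed output in bytes."""
    return {level: len(zlib.compress(data, level)) for level in levels}


sizes = compressed_sizes(page)
for level, size in sizes.items():
    print(f"level {level}: {len(page)} -> {size} bytes")

# Whatever the level, decompression restores the original page.
assert zlib.decompress(zlib.compress(page, 1)) == page
```

Level 0 stores the data without compressing it, so its output is
slightly larger than the input because of the zlib framing.
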
When to use the multiple thread compression in live migration
=============================================================
Compressing data consumes extra CPU cycles, so avoid this feature on a
system whose CPU is already heavily loaded. When network bandwidth is
very limited and CPU resources are adequate, multiple thread
compression is very helpful. If both CPU and network bandwidth are
adequate, multiple thread compression can still help to reduce the
migration time.

Performance
===========
Test environment:

CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Socket Count: 2
RAM: 128G
NIC: Intel I350 (10/100/1000Mbps)
Host OS: CentOS 7 64-bit
Guest OS: RHEL 6.5 64-bit
Parameter: qemu-system-x86_64 -accel kvm -smp 4 -m 4096
 /share/ia32e_rhel6u5.qcow -monitor stdio

No additional application is running on the guest during this test.

Speed limit: 1000Gb/s
---------------------------------------------------------------
                    | without     | compress threads: 8
                    | compression | decompress threads: 2
                    |             | compression level: 1
---------------------------------------------------------------
total time(msec):   |    3333     |  1833
---------------------------------------------------------------
downtime(msec):     |     100     |   27
---------------------------------------------------------------
transferred ram(kB):|   363536    | 107819
---------------------------------------------------------------
throughput(mbps):   |   893.73    | 482.22
---------------------------------------------------------------
total ram(kB):      |   4211524   | 4211524
---------------------------------------------------------------

An application running on the guest periodically writes random numbers
to RAM block areas.

Speed limit: 1000Gb/s
---------------------------------------------------------------
                    | without     | compress threads: 8
                    | compression | decompress threads: 2
                    |             | compression level: 1
---------------------------------------------------------------
total time(msec):   |    37369    | 15989
---------------------------------------------------------------
downtime(msec):     |     337     |  173
---------------------------------------------------------------
transferred ram(kB):|   4274143   | 1699824
---------------------------------------------------------------
throughput(mbps):   |   936.99    | 870.95
---------------------------------------------------------------
total ram(kB):      |   4211524   | 4211524
---------------------------------------------------------------

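To relate the two tables to the percentages quoted in the
introduction, the relative reductions can be computed directly from
the measured values. The snippet below copies the numbers from the
tables above; the dictionary and function names are only for this
sketch.

```python
def reduction_percent(before, after):
    """Percentage by which 'after' is lower than 'before'."""
    return (1 - after / before) * 100


# (without compression, with compression), copied from the tables above.
idle_guest = {
    "total time (msec)": (3333, 1833),
    "downtime (msec)": (100, 27),
    "transferred ram (kB)": (363536, 107819),
}
busy_guest = {
    "total time (msec)": (37369, 15989),
    "downtime (msec)": (337, 173),
    "transferred ram (kB)": (4274143, 1699824),
}

for name, results in (("idle guest", idle_guest), ("busy guest", busy_guest)):
    print(name)
    for metric, (before, after) in results.items():
        print(f"  {metric}: {reduction_percent(before, after):.1f}% lower")
```

For the guest that keeps writing to RAM, this gives reductions of
roughly 60% in transferred data and roughly 50% in downtime.
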
Usage
=====
1. Verify that both the source and destination QEMU support
multiple thread compression migration:
    {qemu} info migrate_capabilities
    {qemu} ... compress: off ...

2. Activate compression on the source:
    {qemu} migrate_set_capability compress on

3. Set the compression thread count on the source:
    {qemu} migrate_set_parameter compress-threads 12

4. Set the compression level on the source:
    {qemu} migrate_set_parameter compress-level 1

5. Set the decompression thread count on the destination:
    {qemu} migrate_set_parameter decompress-threads 3

6. Start the outgoing migration:
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    Capabilities: ... compress: on
    ...

The following are the default settings:
    compress: off
    compress-threads: 8
    decompress-threads: 2
    compress-level: 1 (which means best speed)

So, only the first two steps are required to use the multiple
thread compression in migration. You can do more if the default
settings are not appropriate.

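The steps above use the human monitor. For scripted setups, the same
settings can presumably be sent over QMP. The sketch below only builds
the JSON messages and does not connect to QEMU. The QMP command names
migrate-set-capabilities, migrate-set-parameters and migrate are taken
to be the counterparts of the monitor commands above; that mapping and
the helper names are assumptions of this sketch and should be checked
against the QMP reference for the QEMU version in use.

```python
import json


def source_messages(uri, compress_threads=8, compress_level=1):
    """QMP messages for the source side: enable, tune, then migrate."""
    return [
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "compress", "state": True}]}},
        {"execute": "migrate-set-parameters",
         "arguments": {"compress-threads": compress_threads,
                       "compress-level": compress_level}},
        {"execute": "migrate", "arguments": {"uri": uri}},
    ]


def destination_messages(decompress_threads=2):
    """QMP message for the destination side."""
    return [
        {"execute": "migrate-set-parameters",
         "arguments": {"decompress-threads": decompress_threads}},
    ]


# Same values as in the usage steps above.
for message in source_messages("tcp:destination.host:4444", 12, 1):
    print(json.dumps(message))
for message in destination_messages(3):
    print(json.dumps(message))
```
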
TODO
====
Faster (de)compression methods such as LZ4 and QuickLZ can help to
reduce CPU consumption during (de)compression. With these faster
methods, fewer (de)compression threads are needed when doing the
migration.