XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
of VM downtime and the total live-migration time of virtual machines.
It is particularly useful for virtual machines running memory write intensive
workloads that are typical of large enterprise applications such as SAP ERP
Systems, and generally speaking for any application that uses a sparse memory
update pattern.

Instead of sending the changed guest memory page this solution will send a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to be able to calculate the update, the previous memory pages need to
be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache size the better the chances are that the page has already
been stored in the cache.
A small cache size will result in a high cache miss rate.
Cache size can be changed before and during migration.

Format
=======

The compression format performs a XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non zero runs.
A zero run is represented by its length (in bytes).
A non zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)

There can be more than one valid encoding; the sender may send a longer encoding
for the benefit of reducing computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.

This work was originally based on research results published at
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the delta encoder XBRLE was improved further by using XBZRLE
instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live-migration.
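
The grammar above can be sketched in C as follows. This is a minimal
byte-at-a-time illustration under stated assumptions, not QEMU's actual
implementation: the function names (uleb128_put, uleb128_get,
xbzrle_encode_sketch, xbzrle_decode_sketch) are hypothetical, a trailing run of
unchanged bytes is assumed not to be emitted, and an unchanged page is assumed
to encode to zero bytes. Both assumptions are consistent with the 24-byte
result in the Example section of this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write n as ULEB128 into out; returns the number of bytes written. */
static size_t uleb128_put(uint8_t *out, uint32_t n)
{
    size_t len = 0;
    do {
        uint8_t b = n & 0x7f;
        n >>= 7;
        if (n != 0) {
            b |= 0x80;              /* more bytes follow */
        }
        out[len++] = b;
    } while (n != 0);
    return len;
}

/* Read a ULEB128 value; returns bytes consumed, or 0 on malformed input. */
static size_t uleb128_get(const uint8_t *in, size_t avail, uint32_t *n)
{
    uint32_t val = 0;
    unsigned shift = 0;
    size_t len = 0;
    while (len < avail && shift < 32) {
        uint8_t b = in[len++];
        val |= (uint32_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) {
            *n = val;
            return len;
        }
        shift += 7;
    }
    return 0;
}

/*
 * Encode new_buf relative to old_buf (both len bytes) into dst.
 * Returns the encoded length, 0 if the page is unchanged, or -1 if the
 * encoding does not fit into dst_cap bytes (an "overflow").
 */
static long xbzrle_encode_sketch(const uint8_t *old_buf, const uint8_t *new_buf,
                                 size_t len, uint8_t *dst, size_t dst_cap)
{
    size_t i = 0, d = 0;

    assert(len <= UINT32_MAX);      /* run lengths are stored as uint32_t */
    while (i < len) {
        size_t start = i;
        while (i < len && old_buf[i] == new_buf[i]) {
            i++;                    /* zero run: old XOR new == 0 */
        }
        if (i == len) {
            break;                  /* trailing zero run is not sent */
        }
        size_t zrun = i - start;

        start = i;
        while (i < len && old_buf[i] != new_buf[i]) {
            i++;                    /* non zero run: bytes that changed */
        }
        size_t nzrun = i - start;

        uint8_t hdr[10];            /* two ULEB128 values, 5 bytes max each */
        size_t h = uleb128_put(hdr, (uint32_t)zrun);
        h += uleb128_put(hdr + h, (uint32_t)nzrun);
        if (d + h + nzrun > dst_cap) {
            return -1;              /* delta does not fit */
        }
        memcpy(dst + d, hdr, h);
        d += h;
        memcpy(dst + d, new_buf + start, nzrun);
        d += nzrun;
    }
    return (long)d;
}

/* Apply an encoded delta in place; page holds the old content on entry. */
static int xbzrle_decode_sketch(const uint8_t *src, size_t src_len,
                                uint8_t *page, size_t page_len)
{
    size_t s = 0, p = 0;

    while (s < src_len) {
        uint32_t n;
        size_t used = uleb128_get(src + s, src_len - s, &n);
        if (used == 0 || n > page_len - p) {
            return -1;
        }
        s += used;
        p += n;                     /* zrun: leave these bytes unchanged */

        used = uleb128_get(src + s, src_len - s, &n);
        if (used == 0 || n > page_len - p || n > src_len - s - used) {
            return -1;
        }
        s += used;
        memcpy(page + p, src + s, n);   /* nzrun: copy the new data */
        s += n;
        p += n;
    }
    return 0;
}
```

Because the receiver already holds the old page, the decoder only needs to skip
over zero runs and overwrite the bytes carried in non zero runs; no XOR is
required on the destination in this sketch.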

Example
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69

Cache update strategy
=====================
Keeping the hot pages in the cache is effective for decreasing cache
misses. XBZRLE uses a counter as the age of each page. The counter will
increase after each ram dirty bitmap sync. When a cache conflict is
detected, XBZRLE will only evict pages in the cache that are older than
a threshold.

Usage
======================
1. Verify the destination QEMU version is able to decode the new format.
    {qemu} info migrate_capabilities
    {qemu} xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size - the cache size is in MBytes and should be a
power of 2. The cache default value is 64MBytes.
(on source only)
    {qemu} migrate_set_parameter xbzrle-cache-size 256m

4. Start outgoing migration
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K pages
    xbzrle cache miss rate: L
    xbzrle encoding rate: M
    xbzrle overflow: N

xbzrle cache miss: the number of cache misses to date - a high cache-miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows during encoding, where the delta
could not be compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).

Testing: Testing indicated that live migration with XBZRLE completed in 110
seconds, whereas without it the migration was not able to complete.

A simple synthetic memory r/w load generator:
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        /* 4096 pages of 4096 bytes each (16 MiB), zero-filled */
        char *buf = (char *) calloc(4096, 4096);

        if (buf == NULL) {
            return 1;
        }
        while (1) {
            int i;
            /* touch one byte in every 1 KiB, i.e. four bytes per page */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
    }
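
The policy described under "Cache update strategy" can be illustrated with a
small sketch. It assumes a direct-mapped cache in which each guest page address
maps to exactly one slot, so a conflict arises when two addresses share a slot.
The slot count, the lifetime threshold of 2 sync rounds and all names
(sketch_cache, sketch_insert, and so on) are hypothetical values chosen for
this example, and storage of the page data itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SLOTS     8   /* hypothetical; must be a power of two */
#define SKETCH_PAGE_BITS 12  /* 4 KiB pages */
#define SKETCH_LIFETIME  2   /* hypothetical age threshold, in sync rounds */

struct sketch_slot {
    bool     used;
    uint64_t addr;           /* guest page address held by this slot */
    uint64_t age;            /* sync round in which the slot was written */
};

struct sketch_cache {
    struct sketch_slot slot[SKETCH_SLOTS];
};

/* Map a page address to its single candidate slot. */
static struct sketch_slot *sketch_lookup(struct sketch_cache *c, uint64_t addr)
{
    assert((SKETCH_SLOTS & (SKETCH_SLOTS - 1)) == 0);
    return &c->slot[(addr >> SKETCH_PAGE_BITS) & (SKETCH_SLOTS - 1)];
}

/* Returns true if the page at addr is currently cached. */
static bool sketch_is_cached(struct sketch_cache *c, uint64_t addr)
{
    struct sketch_slot *s = sketch_lookup(c, addr);
    return s->used && s->addr == addr;
}

/*
 * Try to cache the page at addr during sync round `now`.
 * Returns false when the slot holds a different page that is still
 * younger than the threshold, in which case that page is kept.
 */
static bool sketch_insert(struct sketch_cache *c, uint64_t addr, uint64_t now)
{
    struct sketch_slot *s = sketch_lookup(c, addr);

    if (s->used && s->addr != addr && s->age + SKETCH_LIFETIME > now) {
        return false;        /* conflict with a fresh page: do not evict */
    }
    s->used = true;
    s->addr = addr;
    s->age = now;
    return true;
}
```

In this sketch a page that was inserted in round 1 survives a conflicting
insert in round 2 and is only replaced from round 3 onwards, which keeps
recently dirtied pages from being pushed out by a single conflicting access.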