Super Carrier
=============

A super carrier is a large memory area, allocated at VM start, which can
be used during runtime to allocate normal carriers from.

The super carrier feature was introduced in OTP R16B03. It is
enabled with the command line option `+MMscs <size in MB>`
and can be configured with other options.

Problem
-------

The initial motivation for this feature was customers asking for a way
to pre-allocate physical memory at VM start for it to use.

Other problems were various limitations experienced with the OS
implementation of mmap:

* Increasingly bad performance of mmap/munmap as the number of mmap'ed areas grows.
* Fragmentation between mmap'ed areas.

A third problem was management of low memory in the halfword
emulator. The implementation used a naive linear search structure to
hold free segments, which led to poor performance when
fragmentation increased.


Solution
--------

Allocate one large contiguous area of address space at VM start and
then use that area to satisfy our dynamic memory needs during
runtime. In other words: implement our own mmap.

### Use cases ###

If command line option +MMscrpm (Reserve Physical Memory) is set to
false, only virtual space is allocated for the super carrier from
start. The super carrier then acts as an "alternative mmap" implementation
without changing the consumption of physical memory pages. Physical
pages will be reserved on demand when an allocation is done from the super
carrier and be unreserved when the memory is released back to the
super carrier.

If +MMscrpm is set to true, which is the default, the initial allocation
will reserve physical memory for the entire super carrier. This can be
used by users who want to ensure a certain *minimum* amount of
physical memory for the VM.
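
The "virtual space only" mode can be illustrated with a small C sketch.
It assumes Linux and uses plain `mmap`, `mprotect` and `madvise`; the
helper names are hypothetical and this is not the code in erl\_mmap.c,
only one common way to separate reserving address space from backing it
with pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helpers for illustration; not taken from erl_mmap.c. */

/* Reserve `size` bytes of address space with no accessible pages. */
void *reserve_address_space(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make [addr, addr+len) usable; the kernel backs pages on first touch. */
int commit_pages(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* Give the pages back to the OS but keep the address range reserved. */
int decommit_pages(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_DONTNEED) != 0)
        return -1;
    return mprotect(addr, len, PROT_NONE);
}
```

A caller would reserve the whole range once at start, then commit and
decommit page-aligned pieces as carriers come and go. With +MMscrpm set
to true, the super carrier instead asks for physical memory for the
whole range up front.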

However, what reservation of physical memory actually means depends
heavily on the operating system and how it is configured. For
example, different memory overcommit settings on Linux drastically
change the behaviour.

A third feature is to have the super carrier limit the *maximum*
amount of memory used by the VM. If +MMsco (Super Carrier Only) is set
to true, which is the default, allocations will only be done from the
super carrier. When the super carrier gets full, the VM will fail due
to being out of memory.
If +MMsco is false, allocations will use mmap directly if the super
carrier is full.



### Implementation ###

The entire super carrier implementation is kept in erl\_mmap.c. The
name suggests that it can be viewed as our own mmap implementation.

A super carrier needs to satisfy two slightly different kinds of
allocation requests: multi block carriers (MBC) and single block
carriers (SBC). They are both rather large blocks of contiguous
memory, but MBCs and SBCs have different demands on alignment and
size.

SBCs can have arbitrary size and only need a minimum of 8-byte
alignment.

MBCs are more restricted. They can only have a number of fixed
sizes that are powers of 2. The start address needs to have a very
large alignment (currently 256 KB, called "super alignment"). This is a
design choice that allows very low overhead per allocated block in the
MBC.

To reduce fragmentation within the super carrier, it is good to keep SBCs
and MBCs apart. MBCs with their uniform alignment and sizes can be
packed very efficiently together. SBCs without demand for alignment can
also be allocated quite efficiently together. But mixing them can lead
to a lot of memory being wasted when we need to create large holes of
padding up to the next alignment boundary.

The super carrier thus contains two areas: one area for MBCs growing from
the bottom up, and one area for SBCs growing from the top
down.
This is similar to a process with a heap and a stack growing towards each
other.


### Data structures ###

The MBC area is called *sa* as in super aligned and the SBC area is
called *sua* as in super un-aligned.

Note that the "super" in super alignment and the "super" in super
carrier have nothing to do with each other. We could have chosen
other names to avoid confusion, such as "meta" carrier or "giant"
alignment.

       +-------+ <---- sua.top
       |  sua  |
       |       |
       |-------| <---- sua.bot
       |       |
       |       |
       |       |
       |-------| <---- sa.top
       |       |
       |  sa   |
       |       |
       +-------+ <---- sa.bot


When a carrier is deallocated, a free memory segment will be created
inside the corresponding area, unless the carrier was at the very top
(in `sa`) or bottom (in `sua`), in which case the area will just shrink
down or up.

We need to keep track of all the free segments in order to reuse them
for new carrier allocations. One initial idea was to use the same
mechanism that is used to keep track of free blocks within MBCs
(alloc\_util and the different strategies). However, that would not be
as straightforward as one might think, and it could also waste quite a lot of
memory as it uses prepended block headers. The granularity of the
super carrier is one memory page (usually 4 KB). We want to allocate
and free entire pages and we don't want to waste an entire page just
to hold the block header of the following pages.

Instead we store the meta information about all the free segments in a
dedicated area apart from the `sa` and `sua` areas. Every free segment is
represented by a descriptor struct (`ErtsFreeSegDesc`).

    typedef struct {
        RBTNode snode;      /* node in 'stree' */
        RBTNode anode;      /* node in 'atree' */
        char* start;
        char* end;
    }ErtsFreeSegDesc;

To find the smallest free segment that will satisfy a carrier allocation
(best fit), the free segments are organized in a tree sorted by
size (`stree`). We search in this tree at allocation. If no free segment of
sufficient size is found, the area (`sa` or `sua`) is instead expanded.
If two or more free segments with equal size exist, the one at the lowest
address is chosen for `sa` and the one at the highest address for `sua`.

At carrier deallocation, we want to coalesce with any adjacent free
segments, to form one large free segment. To do that, all free
segments are also organized in a tree sorted in address order (`atree`).

So, in total we keep four trees of free descriptors for the super
carrier: two for `sa` and two for `sua`. They all use the same
red-black tree implementation, which supports the different sorting
orders used.

When allocating a new MBC we first search for a free segment in `sa`,
then try to raise `sa.top`, and then as a fallback try to search for a
free segment in `sua`. When an MBC is allocated in `sua`, a larger segment
is allocated which is then trimmed to obtain the right
alignment. Allocation search for an SBC is done in reverse order. When
an SBC is allocated in `sa`, the size is aligned up to super aligned
size.

### The free descriptor area ###

As mentioned above, the descriptors for the free segments are
allocated in a separate area. This area has a constant configurable
size (+MMscrfsd) that defaults to 65536 descriptors. This should be
more than enough in most cases.
If the descriptor area should fill up,
new descriptor areas will be allocated first directly from the OS,
then from `sua` and `sa` in the super carrier, and lastly from the memory
segment itself which is being deallocated. Allocating free descriptor
areas from the super carrier is only a last resort, and should be
avoided, as it creates fragmentation.

### Halfword emulator ###

The halfword emulator uses the super carrier implementation to manage
its low memory mappings that are needed for all term storage. In this
case the super carrier cannot be configured by command line options. One
could imagine a second configurable instance of the super carrier used
by high memory allocation, but that has not been implemented.
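
Returning to the free segment bookkeeping described under "Data
structures": the best-fit rule with its address tie-break, and the
coalescing done at deallocation, can be sketched in a few lines of C.
This is only an illustration under simplifying assumptions. It replaces
the two red-black trees with a linear scan over a small array, and the
names `FreeSeg`, `FreeList`, `best_fit` and `free_seg` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for ErtsFreeSegDesc, without the tree nodes. */
typedef struct {
    char *start;
    char *end;
} FreeSeg;

enum { MAX_SEGS = 16 };      /* hypothetical, tiny descriptor area */

typedef struct {
    FreeSeg segs[MAX_SEGS];
    int count;
    int prefer_low;          /* 1 for `sa` (lowest address), 0 for `sua` */
} FreeList;

/* Best fit: index of the smallest segment that is large enough.
 * Equal sizes are resolved by address according to prefer_low.
 * Returns -1 if nothing fits, meaning the area must be expanded. */
int best_fit(const FreeList *fl, size_t want)
{
    int best = -1;
    for (int i = 0; i < fl->count; i++) {
        size_t sz = (size_t)(fl->segs[i].end - fl->segs[i].start);
        if (sz < want)
            continue;
        if (best < 0) {
            best = i;
            continue;
        }
        size_t bsz = (size_t)(fl->segs[best].end - fl->segs[best].start);
        int lower = fl->segs[i].start < fl->segs[best].start;
        if (sz < bsz || (sz == bsz && lower == fl->prefer_low))
            best = i;
    }
    return best;
}

/* Return [start, end) to the free list, merging with adjacent segments. */
void free_seg(FreeList *fl, char *start, char *end)
{
    assert(start < end);
    for (int i = 0; i < fl->count; ) {
        FreeSeg *s = &fl->segs[i];
        if (s->end == start) {
            start = s->start;              /* absorb lower neighbour */
            *s = fl->segs[--fl->count];    /* drop its descriptor */
        } else if (s->start == end) {
            end = s->end;                  /* absorb upper neighbour */
            *s = fl->segs[--fl->count];
        } else {
            i++;
        }
    }
    assert(fl->count < MAX_SEGS);
    fl->segs[fl->count].start = start;
    fl->segs[fl->count].end = end;
    fl->count++;
}
```

The linear scans are exactly what the size-sorted and address-sorted
trees avoid: with `stree` the best fit is a tree lookup, and with
`atree` the neighbours of a freed segment are found without visiting
every descriptor.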