Super Carrier
=============

A super carrier is a large memory area, allocated at VM start, from
which normal carriers can be allocated during runtime.

The super carrier feature was introduced in OTP R16B03. It is
enabled with the command line option `+MMscs <size in MB>`
and can be configured with other options.

Problem
-------

The initial motivation for this feature was customers asking for a way
to pre-allocate physical memory at VM start for the VM to use.

Other problems were limitations experienced with the OS
implementation of mmap:

* Increasingly bad performance of mmap/munmap as the number of mmap'ed areas grows.
* Fragmentation problems between mmap'ed areas.

A third problem was management of low memory in the halfword
emulator. The implementation used a naive linear search structure to
hold free segments, which led to poor performance when
fragmentation increased.

Solution
--------

Allocate one large contiguous area of address space at VM start and
then use that area to satisfy our dynamic memory needs during
runtime. In other words: implement our own mmap.

### Use cases ###

If the command line option `+MMscrpm` (Reserve Physical Memory) is set to
false, only virtual address space is allocated for the super carrier from
the start. The super carrier then acts as an "alternative mmap" implementation
without changing the consumption of physical memory pages. Physical
pages are reserved on demand when an allocation is done from the super
carrier and are unreserved when the memory is released back to the
super carrier.
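
The reserve-then-commit pattern described above can be sketched with plain POSIX calls. This is a minimal illustration and not code taken from erl\_mmap.c: the names `sc_page_size`, `sc_reserve`, `sc_commit` and `sc_decommit` are hypothetical, and the use of `MAP_NORESERVE` and `MADV_DONTNEED` assumes a Linux-like system. Other platforms may need different calls.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS and madvise on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Page size of the running system, the granularity for commit/decommit. */
static size_t sc_page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Reserve 'size' bytes of address space without usable backing memory.
 * Returns NULL on failure. */
static void *sc_reserve(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make a page-aligned sub-range readable and writable.
 * Returns 0 on success, -1 on failure. */
static int sc_commit(void *addr, size_t len)
{
    assert(((uintptr_t)addr % sc_page_size()) == 0);
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* Hand the pages back to the OS but keep the address range reserved.
 * Returns 0 on success, -1 on failure. */
static int sc_decommit(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_DONTNEED) != 0)
        return -1;
    return mprotect(addr, len, PROT_NONE);
}
```

In this sketch the address range obtained from `sc_reserve` plays the role of the super carrier. A carrier allocation would call `sc_commit` on the pages it hands out, and a carrier deallocation would call `sc_decommit` on the pages it takes back.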

If `+MMscrpm` is set to true, which is the default, the initial allocation
reserves physical memory for the entire super carrier. This can be
used by users who want to ensure a certain *minimum* amount of
physical memory for the VM.

However, what reservation of physical memory actually means depends
heavily on the operating system and how it is configured. For
example, different memory overcommit settings on Linux drastically
change the behaviour.

A third feature is to have the super carrier limit the *maximum*
amount of memory used by the VM. If `+MMsco` (Super Carrier Only) is set
to true, which is the default, allocations are only done from the
super carrier. When the super carrier gets full, the VM fails due
to out of memory.
If `+MMsco` is false, allocations use mmap directly when the super
carrier is full.

### Implementation ###

The entire super carrier implementation is kept in erl\_mmap.c. The
name suggests that it can be viewed as our own mmap implementation.

A super carrier needs to satisfy two slightly different kinds of
allocation requests: multi block carriers (MBC) and single block
carriers (SBC). They are both rather large blocks of contiguous
memory, but MBCs and SBCs have different demands on alignment and
size.

SBCs can have arbitrary size and only need a minimum of 8-byte
alignment.

MBCs are more restricted. They can only have a number of fixed
sizes that are powers of 2. The start address needs to have a very
large alignment (currently 256 kB, called "super alignment"). This is a
design choice that allows very low overhead per allocated block in the
MBC.

To reduce fragmentation within the super carrier, it is good to keep SBCs
and MBCs apart. MBCs, with their uniform alignment and sizes, can be
packed together very efficiently. SBCs, with no demand for alignment, can
also be allocated quite efficiently together. But mixing them can
waste a lot of memory when we need to create large holes of
padding up to the next alignment limit.
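
To make the cost of such padding concrete, the following sketch computes how many bytes are lost when an MBC has to start at or after a given address. It only assumes the 256 kB super alignment mentioned above. The macro and function names are hypothetical and are not the ones used in the ERTS sources.

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_ALIGN_BITS 18   /* 2^18 bytes = 256 kB, the value given in the text */
#define SUPER_ALIGN_SIZE ((uintptr_t)1 << SUPER_ALIGN_BITS)
#define SUPER_ALIGN_MASK (SUPER_ALIGN_SIZE - 1)

/* Round an address up to the next super-aligned boundary. */
static uintptr_t super_align_ceil(uintptr_t addr)
{
    assert(addr <= UINTPTR_MAX - SUPER_ALIGN_MASK);   /* no wrap-around */
    return (addr + SUPER_ALIGN_MASK) & ~SUPER_ALIGN_MASK;
}

/* Bytes of padding needed if an MBC must start at or after 'addr'. */
static uintptr_t mbc_padding(uintptr_t addr)
{
    return super_align_ceil(addr) - addr;
}
```

If an MBC had to be placed directly after an SBC that ends only 8 bytes past a super-aligned boundary, `mbc_padding` reports that almost a full 256 kB would be skipped. Placing MBCs next to each other keeps the padding at zero.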

The super carrier thus contains two areas: one area for MBCs growing from
the bottom and up, and one area for SBCs growing from the top and
down. This is like a process with a heap and a stack growing towards each
other.

### Data structures ###

The MBC area is called *sa* as in super aligned and the SBC area is
called *sua* as in super un-aligned.

Note that the "super" in super alignment and the "super" in super
carrier have nothing to do with each other. We could have chosen
other names to avoid confusion, such as "meta" carrier or "giant"
alignment.

	+-------+ <---- sua.top
	|  sua  |
	|       |
	|-------| <---- sua.bot
	|       |
	|       |
	|       |
	|-------| <---- sa.top
	|       |
	|  sa   |
	|       |
	+-------+ <---- sa.bot

When a carrier is deallocated, a free memory segment is created
inside the corresponding area, unless the carrier was at the very top
(in `sa`) or bottom (in `sua`), in which case the area just shrinks
down or up.
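
The two areas and their edge behaviour can be modelled in a few lines. This is a simplified sketch under stated assumptions: the `Areas` type and all function names are hypothetical, alignment is ignored, and an interior release is only reported to the caller instead of being recorded as a free segment.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *sa_bot,  *sa_top;    /* MBC area [sa_bot, sa_top), grows upward      */
    char *sua_bot, *sua_top;   /* SBC area [sua_bot, sua_top), grows downward  */
} Areas;

/* Start with both areas empty at opposite ends of the region. */
static void areas_init(Areas *a, char *start, size_t size)
{
    a->sa_bot  = a->sa_top  = start;
    a->sua_bot = a->sua_top = start + size;
}

/* Bytes still unclaimed between the two areas. */
static size_t areas_gap(const Areas *a)
{
    return (size_t)(a->sua_bot - a->sa_top);
}

/* Allocate by raising sa.top. Returns NULL if the gap is too small. */
static char *sa_grow(Areas *a, size_t size)
{
    char *p;
    if (areas_gap(a) < size)
        return NULL;
    p = a->sa_top;
    a->sa_top += size;
    return p;
}

/* Allocate by lowering sua.bot. Returns NULL if the gap is too small. */
static char *sua_grow(Areas *a, size_t size)
{
    if (areas_gap(a) < size)
        return NULL;
    a->sua_bot -= size;
    return a->sua_bot;
}

/* Returns 1 if the area shrank, 0 if the range is interior and the
 * caller would have to record it as a free segment (not modelled). */
static int sa_release(Areas *a, char *start, size_t size)
{
    assert(a->sa_bot <= start && start + size <= a->sa_top);
    if (start + size != a->sa_top)
        return 0;
    a->sa_top = start;
    return 1;
}

/* Same as sa_release, but for the area that grows downward. */
static int sua_release(Areas *a, char *start, size_t size)
{
    assert(a->sua_bot <= start && start + size <= a->sua_top);
    if (start != a->sua_bot)
        return 0;
    a->sua_bot = start + size;
    return 1;
}
```

A fuller model would also check whether the segment next to the released edge is already free, so that the area can shrink past it as well. That step is left out here to keep the sketch short.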

We need to keep track of all the free segments in order to reuse them
for new carrier allocations. One initial idea was to use the same
mechanism that is used to keep track of free blocks within MBCs
(alloc\_util and the different strategies). However, that would not be
as straightforward as one might think, and it could also waste quite a lot of
memory as it uses prepended block headers. The granularity of the
super carrier is one memory page (usually 4 kB). We want to allocate
and free entire pages, and we don't want to waste an entire page just
to hold the block header of the following pages.

Instead we store the meta information about all the free segments in a
dedicated area apart from the `sa` and `sua` areas. Every free segment is
represented by a descriptor struct (`ErtsFreeSegDesc`).

    typedef struct {
        RBTNode snode;      /* node in 'stree' */
        RBTNode anode;      /* node in 'atree' */
        char* start;
        char* end;
    } ErtsFreeSegDesc;

To find the smallest free segment that will satisfy a carrier allocation
(best fit), the free segments are organized in a tree sorted by
size (`stree`). We search this tree at allocation. If no free segment of
sufficient size is found, the area (`sa` or `sua`) is expanded instead.
If two or more free segments of equal size exist, the one at the lowest
address is chosen for `sa` and the one at the highest address for `sua`.
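
The selection rule above (best fit, with ties broken by address) can be illustrated without a tree. The sketch below deliberately swaps the red-black tree for a linear scan over an array, so it shows the ordering rule only and not the lookup cost. `FreeSeg`, `TieBreak` and `best_fit` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { char *start; char *end; } FreeSeg;   /* [start, end) */

typedef enum { PREFER_LOW_ADDR, PREFER_HIGH_ADDR } TieBreak;

static size_t seg_size(const FreeSeg *s)
{
    return (size_t)(s->end - s->start);
}

/* Linear-scan stand-in for the size-ordered tree lookup.
 * Returns the index of the chosen segment, or -1 if none is large enough. */
static int best_fit(const FreeSeg *segs, int n, size_t want, TieBreak tb)
{
    int best = -1;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        size_t sz = seg_size(&segs[i]);
        if (sz < want)
            continue;                          /* too small */
        if (best < 0) {
            best = i;                          /* first candidate */
            continue;
        }
        size_t bsz = seg_size(&segs[best]);
        if (sz < bsz) {
            best = i;                          /* tighter fit */
        } else if (sz == bsz) {
            int lower = segs[i].start < segs[best].start;
            if ((tb == PREFER_LOW_ADDR) == lower)
                best = i;                      /* same size, better address */
        }
    }
    return best;
}
```

With this rule, `PREFER_LOW_ADDR` corresponds to the choice described for `sa` and `PREFER_HIGH_ADDR` to the choice described for `sua`. A return value of -1 is the case where the area would be expanded instead.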

At carrier deallocation, we want to coalesce with any adjacent free
segments, to form one large free segment. To do that, all free
segments are also organized in a tree sorted in address order (`atree`).
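
Coalescing relies only on being able to find the neighbours of a freed range in address order. The following sketch uses a small address-sorted array in place of the address-ordered tree. It is an assumption-laden illustration: `Seg`, `SegList`, `SEG_MAX` and `seglist_free` are hypothetical, and the fixed capacity stands in for the descriptor area.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEG_MAX 16

typedef struct { char *start; char *end; } Seg;    /* [start, end) */

typedef struct {
    Seg seg[SEG_MAX];   /* sorted by address, never adjacent or overlapping */
    int n;
} SegList;

/* Record [start, end) as free, merging with adjacent free segments.
 * Returns 0 on success, -1 if a new entry is needed but the list is full. */
static int seglist_free(SegList *l, char *start, char *end)
{
    int i = 0;
    assert(start < end);

    /* Find the first free segment located after the freed range. */
    while (i < l->n && l->seg[i].start < start)
        i++;

    int merge_prev = i > 0 && l->seg[i - 1].end == start;
    int merge_next = i < l->n && l->seg[i].start == end;

    if (merge_prev && merge_next) {
        /* The freed range bridges two segments: keep one, drop the other. */
        l->seg[i - 1].end = l->seg[i].end;
        memmove(&l->seg[i], &l->seg[i + 1],
                (size_t)(l->n - i - 1) * sizeof(Seg));
        l->n--;
    } else if (merge_prev) {
        l->seg[i - 1].end = end;
    } else if (merge_next) {
        l->seg[i].start = start;
    } else {
        if (l->n == SEG_MAX)
            return -1;
        memmove(&l->seg[i + 1], &l->seg[i],
                (size_t)(l->n - i) * sizeof(Seg));
        l->seg[i].start = start;
        l->seg[i].end = end;
        l->n++;
    }
    return 0;
}
```

Note that only the case with no free neighbour consumes a new entry. When both neighbours are free, the number of entries goes down by one.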

So, in total we keep four trees of free descriptors for the super
carrier: two for `sa` and two for `sua`. They all use the same
red-black tree implementation, which supports the different sorting
orders used.

When allocating a new MBC, we first search for a free segment in `sa`,
then try to raise `sa.top`, and then, as a fallback, search for a
free segment in `sua`. When an MBC is allocated in `sua`, a larger segment
is allocated, which is then trimmed to obtain the right
alignment. The allocation search for an SBC is done in reverse order. When
an SBC is allocated in `sa`, the size is aligned up to a super aligned
size.
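
The trimming step for an MBC placed in `sua` can be sketched as carving an aligned range out of a larger, unaligned free segment. This is only an illustration of the idea: `Range` and `carve_aligned` are hypothetical names, and the sketch picks the lowest aligned address in the segment, which is an assumption and may differ from what erl\_mmap.c does.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uintptr_t start, end; } Range;   /* [start, end) */

/* Carve 'size' bytes starting at an 'align'-aligned address out of 'seg'.
 * On success returns 1 and fills in the carved range plus the leftover
 * ranges before (head) and after (tail) it; either leftover may be empty.
 * Returns 0 if the segment cannot hold an aligned range of that size. */
static int carve_aligned(Range seg, uintptr_t size, uintptr_t align,
                         Range *mbc, Range *head, Range *tail)
{
    uintptr_t s;
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    assert(seg.start <= seg.end);

    if (seg.start > UINTPTR_MAX - (align - 1))
        return 0;                                       /* rounding would wrap */
    s = (seg.start + align - 1) & ~(align - 1);         /* round start up */
    if (s > seg.end || seg.end - s < size)
        return 0;

    head->start = seg.start;  head->end = s;
    mbc->start  = s;          mbc->end  = s + size;
    tail->start = s + size;   tail->end = seg.end;
    return 1;
}
```

The sketch also shows why a larger segment has to be requested: an unaligned segment of exactly the MBC size never contains an aligned range of that size, so the call fails for it.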

### The free descriptor area ###

As mentioned above, the descriptors for the free segments are
allocated in a separate area. This area has a constant, configurable
size (`+MMscrfsd`) that defaults to 65536 descriptors. This should be
more than enough in most cases. If the descriptor area fills up,
new descriptor areas are allocated, first directly from the OS,
then from `sua` and `sa` in the super carrier, and lastly from the memory
segment itself that is being deallocated. Allocating free descriptor
areas from the super carrier is only a last resort and should be
avoided, as it creates fragmentation.

### Halfword emulator ###

The halfword emulator uses the super carrier implementation to manage
its low memory mappings that are needed for all term storage. Here, the
super carrier cannot be configured by command line options. One
could imagine a second configurable instance of the super carrier used
by high memory allocation, but that has not been implemented.