1## Custom memory allocator
2
3LibSass comes with a custom memory allocator to improve performance.
4First included in LibSass 3.6 and currently disabled by default.
5Needs to be enabled by defining `SASS_CUSTOM_ALLOCATOR`.
6
7### Overview
8
9The allocator is a pool/arena allocator with a free-list on top. The
10memory usage pattern of LibSass fits this implementation very well.
11Every compilation tends to accumulate memory and only releasing some
12items from time to time, but the overall memory consumption will mostly
13grow until the compilation is finished. This helps us to keep the
14implementation as simple as possible, since we don't need to release
15much memory back to the system and can re-use it instead.
16
17### Arenas
18
19Each arena is allocated in a (compile time) fixed size. Every allocation
20request is served from the current arena. We basically slice up the
21arena into different sized chunks. Arenas are never returned to the
22system until the whole compilation is finished.
23
24### Slices
25
26A memory slice is a part of an arena. Once the system requests a sized
27memory chunk we check the current arena if there is enough space to
28hold it. If not a new arena is allocated. Then we return a pointer
29into that arena and mark the space as being used. Each slice also
30has a header which is invisible to the requester as it lies before
31the pointer address that we returned. This is used for book-keeping.
32
33### Free-lists (or buckets)
34
35Once a memory slice is returned to the allocator it will not be released.
36It will instead be put on the free list. We keep a fixed number of free lists,
37one for every possible chunk size. Since chunk sizes are memory aligned, we
38can get the free-list index (aka `bucket`) very quickly (`size/alignment`).
39For further readings see https://en.wikipedia.org/wiki/Free_list.
40
41### Chunk-sizes
42
43Since arenas are of fixed size we need to make sure that only small
44enough chunks get served from it. This also helps to keep implementation
45simple, as we can statically declare some structures for book-keeping.
46Allocations that are too big to be tracked on a free-list will be patched
47directly to malloc and free. This is the case when the bucket index would
48be bigger than `SassAllocatorBuckets`.
49
50### Thread-safety
51
52This implementation is not thread-safe by design. Making it thread-safe
53would certainly be possible, but it would come at a (performance) price.
54Also it is not needed given the memory usage pattern of LibSass. Instead
55we should make sure that memory pools are local to each thread.
56
57### Implementation obstacles
58
59Since memory allocation is a core part of C++ itself, we get into various
60difficult territories. This has specially proven true in regard of static
61variable initialization and destruction order. E.g when we have a static
62string with custom allocator. It might be that it is initialized before
63the thread local memory pool. On the other hand it's also possible that
64the memory pool is destroyed before another static string wants to give
65back its memory to the pool. I tried hard to work around those issues.
66Mainly by only using thead local POD (plain old data) objects.
67
68See https://isocpp.org/wiki/faq/ctors#static-init-order
69
70### Performance gains
71
72My tests indicate that the custom allocator brings around 15% performance
73enhancement for complex cases (I used the bolt-bench to measure it). Once
74more get optimized, the custom allocator can bring up to 30% improvement.
75This comes at a small cost of a few percent of overall memory usage. This
76can be tweaked, but the sweet spot for now seems to be:
77
78```c
79// How many buckets should we have for the free-list
80// Determines when allocations go directly to malloc/free
81// For maximum size of managed items multiply by alignment
82#define SassAllocatorBuckets 512
83
84// The size of the memory pool arenas in bytes.
85#define SassAllocatorArenaSize (1024 * 256)
86```
87
88These can be found in `settings.hpp`.
89
90### Memory overhead
91
92Both settings `SassAllocatorBuckets` and `SassAllocatorArenaSize` need
93to be set in relation to each other. Assuming the memory alignment on
94the platform is 8 bytes, the maximum chunk size that can be handled
95is 4KB (512*8B). If the arena size is too close to this value, you
96may leave a lot of RAM unused. Once an arena can't fullfil the current
97request, it is put away and a new one is allocated. We don't keep track
98of unused space in previous arenas, as it bloats the code and costs
99precious cpu time. By setting the values carefully we can avoid the cost
100and still provide reasonable memory overhead. In the worst scenario we
101loose around 1.5% for the default settings (4K of 256K).
102
103### Further improvements
104
105It is worth to check if we can re-use the space of old arenas somehow without
106scarifying to much performance. Additionally we could check free-lists of
107bigger chunks sizes to satisfy an allocation request. But both would need
108to be checked for performance impact and their actual gain.