1Sparse Arrays
2=============
3
4The `sparse_array.c` file contains an implementation of a sparse array that
5attempts to be both space and time efficient.
6
7The sparse array is represented using a tree structure.  Each node in the
8tree contains a block of pointers to either the user supplied leaf values or
9to another node.
10
11There are a number of parameters used to define the block size:
12
13    OPENSSL_SA_BLOCK_BITS   Specifies the number of bits covered by each block
14    SA_BLOCK_MAX            Specifies the number of pointers in each block
15    SA_BLOCK_MASK           Specifies a bit mask to perform modulo block size
16    SA_BLOCK_MAX_LEVELS     Indicates the maximum possible height of the tree
17
18These constants are inter-related:
19
20    SA_BLOCK_MAX        = 2 ^ OPENSSL_SA_BLOCK_BITS
21    SA_BLOCK_MASK       = SA_BLOCK_MAX - 1
22    SA_BLOCK_MAX_LEVELS = number of bits in size_t divided by
23                          OPENSSL_SA_BLOCK_BITS rounded up to the next multiple
24                          of OPENSSL_SA_BLOCK_BITS
25
26`OPENSSL_SA_BLOCK_BITS` can be defined at compile time and this overrides the
27built in setting.
28
29As a space and performance optimisation, the height of the tree is usually
30less than the maximum possible height.  Only sufficient height is allocated to
31accommodate the largest index added to the data structure.
32
33The largest index used to add a value to the array determines the tree height:
34
35        +----------------------+---------------------+
36        | Largest Added Index  |   Height of Tree    |
37        +----------------------+---------------------+
38        | SA_BLOCK_MAX     - 1 |          1          |
39        | SA_BLOCK_MAX ^ 2 - 1 |          2          |
40        | SA_BLOCK_MAX ^ 3 - 1 |          3          |
41        | ...                  |          ...        |
42        | size_t max           | SA_BLOCK_MAX_LEVELS |
43        +----------------------+---------------------+
44
45The tree height is dynamically increased as needed based on additions.
46
47An empty tree is represented by a NULL root pointer.  Inserting a value at
48index 0 results in the allocation of a top level node full of null pointers
49except for the single pointer to the user's data (N = SA_BLOCK_MAX for
50brevity):
51
52        +----+
53        |Root|
54        |Node|
55        +-+--+
56          |
57          |
58          |
59          v
60        +-+-+---+---+---+---+
61        | 0 | 1 | 2 |...|N-1|
62        |   |nil|nil|...|nil|
63        +-+-+---+---+---+---+
64          |
65          |
66          |
67          v
68        +-+--+
69        |User|
70        |Data|
71        +----+
72    Index 0
73
74Inserting at element 2N+1 creates a new root node and pushes down the old root
75node.  It then creates a second second level node to hold the pointer to the
76user's new data:
77
78        +----+
79        |Root|
80        |Node|
81        +-+--+
82          |
83          |
84          |
85          v
86        +-+-+---+---+---+---+
87        | 0 | 1 | 2 |...|N-1|
88        |   |nil|   |...|nil|
89        +-+-+---+-+-+---+---+
90          |       |
91          |       +------------------+
92          |                          |
93          v                          v
94        +-+-+---+---+---+---+      +-+-+---+---+---+---+
95        | 0 | 1 | 2 |...|N-1|      | 0 | 1 | 2 |...|N-1|
96        |nil|   |nil|...|nil|      |nil|   |nil|...|nil|
97        +-+-+---+---+---+---+      +---+-+-+---+---+---+
98          |                              |
99          |                              |
100          |                              |
101          v                              v
102        +-+--+                         +-+--+
103        |User|                         |User|
104        |Data|                         |Data|
105        +----+                         +----+
106    Index 0                       Index 2N+1
107
108The nodes themselves are allocated in a sparse manner.  Only nodes which exist
109along a path from the root of the tree to an added leaf will be allocated.
110The complexity is hidden and nodes are allocated on an as needed basis.
111Because the data is expected to be sparse this doesn't result in a large waste
112of space.
113
114Values can be removed from the sparse array by setting their index position to
115NULL.  The data structure does not attempt to reclaim nodes or reduce the
116height of the tree on removal.  For example, now setting index 0 to NULL would
117result in:
118
119        +----+
120        |Root|
121        |Node|
122        +-+--+
123          |
124          |
125          |
126          v
127        +-+-+---+---+---+---+
128        | 0 | 1 | 2 |...|N-1|
129        |   |nil|   |...|nil|
130        +-+-+---+-+-+---+---+
131          |       |
132          |       +------------------+
133          |                          |
134          v                          v
135        +-+-+---+---+---+---+      +-+-+---+---+---+---+
136        | 0 | 1 | 2 |...|N-1|      | 0 | 1 | 2 |...|N-1|
137        |nil|nil|nil|...|nil|      |nil|   |nil|...|nil|
138        +---+---+---+---+---+      +---+-+-+---+---+---+
139                                         |
140                                         |
141                                         |
142                                         v
143                                       +-+--+
144                                       |User|
145                                       |Data|
146                                       +----+
147                                  Index 2N+1
148
149Accesses to elements in the sparse array take O(log n) time where n is the
150largest element.  The base of the logarithm is `SA_BLOCK_MAX`, so for moderately
151small indices (e.g. NIDs), single level (constant time) access is achievable.
152Space usage is O(minimum(m, n log(n)) where m is the number of elements in the
153array.
154
155Note: sparse arrays only include pointers to types.
156Thus, `SPARSE_ARRAY_OF(char)` can be used to store a string.
157