1.. _rfc-26:
2
3================================================================================
4RFC 26: GDAL Block Cache Improvements
5================================================================================
6
7Authors: Tamas Szekeres, Even Rouault
8
9Contact: szekerest@gmail.com, even.rouault at spatialys.com
10
11Status: Adopted, implemented
12
13Implementation version: GDAL 2.1
14
15Summary and rationale
16---------------------
17
18GDAL maintains an in-memory cache for the raster blocks fetched from the
19drivers and ensures that the second attempt to access the same block
20will be served from the cache instead of the driver. This cache is
21maintained in a per-band fashion and an array is allocated for the
22pointers for each blocks (or sub-blocks). This approach is not
23sufficient with large raster dimensions (or large virtual rasters ie.
24with the WMS/TMS driver), which may cause out of memory errors in
25GDALRasterBand::InitBlockInfo, as raised in #3224
26
27For example, a band of a dataset at level 21 with a GoogleMaps tiling
28requires 2097152x2097152 tiles of 256x256 pixels. This means that GDAL
29will try to allocate an array of 32768x32768 = 1 billion elements (32768
30= 2097152 / 64). The size of this array is 4 GB on a 32-bit build, so it
31cannot be allocated at all. And it is 8 GB on a 64-bit build (even if
32this is generally only virtual memory reservation but not actually
33allocation of physical pages of memory, due to over-commit mechanism of
34the operating system). At dataset closing, this means that those 1
35billion cells will have to be explored to discover remaining cached
36blocks. In reality, all above figures must be multiplied by 3 for a RGB
37(or 4 for a RGBA) dataset.
38
39In the hash set implementation, memory allocation depends directly on
40the number of cached blocks. Typically with the default GDAL_CACHEMAX
41size of 40 MB, only 640 blocks of 256x256 pixels can be simultaneously
42cached (for all datasets).
43
44Main concepts
45-------------
46
47Awareness of thread-safety issues is crucial in the design of block
48caching. In gdalrasterblock.cpp, a static linked list is maintained so
49as to track the access order of the blocks and keep the size of the
50cache within a desired limit by dropping the oldest blocks out of the
51list. This linked list is shared among all the datasets and bands in
52GDAL (protected by a hRBMutex) and a thread on each band, when reading a
53new block, may also trigger a GDALRasterBand::UnreferenceBlock call on
54another band within the scope of this mutex. GDALRasterBand::FlushBlock
55will also access the data structure of the band level cache by removing
56the corresponding tile from the array or the hashtable.
57
58In GDAL 2.0, some issues related to thread-safety (#3225, #3226) have
59been fixed and this RFC still preserves those scenarios as safe.
60
61The changes of this RFC consist in moving away from the GDALRasterBand
62class the logic to access to a cached block, to add or remove it. This
63is done with the new GDALAbstractBandBlockCache class. The current array
64based logic is moved into the new GDALArrayBandBlockCache class, and the
65new hashset based logic in GDALHashsetBandBlockCache.
66
67For the array based implementation, due to the "static" nature of the
68hosting structure (an array), no special care is needed when reading or
69writing a cell from concurrent threads. The only special care that must
70be taken is to prevent a given cell (block) to be accessed concurrently.
71For example we want to avoid TryGetLockedBlockRef() to return a block
72that is being freed by another thread from
73GDALRasterBlock::FlushCacheBlock() or Internalize(). For that, the
74nRefCount member of GDALRasterBlock is now accessed and modified only
75through atomic functions to increase, decrease or compare-and-swap its
76value.
77
78For the hash set based implementation, the base implementation of hash
79set data structure done in in cpl_hash_set.h / cpl_hash_set.cpp is not
80thread safe by default. So GDALHashsetBandBlockCache has a dedicated
81mutex to protect all reads, additions and removals from the hash set. No
82dead-lock with the hRBMutex can occurs since no operations done under
83the hashset mutex involves calling any method from GDALRasterBlock.
84
85We could potentially have reused the hRBMutex to protect the hash set,
86but this would have increased the contention of the hRBMutex
87unnecessarily.
88
89By default, the selection between the array based and the hashtable
90based approaches is based on the following rule: if the dataset has more
91than 1 million blocks, the hashset based implementation is used,
92otherwise the array based implementation is used. The new
93GDAL_OF_ARRAY_BLOCK_ACCESS and GDAL_OF_HASHSET_BLOCK_ACCESS open flags
94can also be passed to GDALOpenEx() to override this choice. The
95GDAL_BAND_BLOCK_CACHE configuration option can also be set to ARRAY or
96HASHSET.
97
98The hashset based implementation could potentially be the default
99implementation in all cases (performance comparisons done with the
100autotest/cpp/testblockcache utility with 4 or 8 cores show non
101measurable differences), but in theory the array based implementation
102offers less contention of the hRBMutex, so should be more scalable when
103using lots of cores. And as work has been done during GDAL 2.0 to
104improve the scalability, it might be prudent for now to remain on the
105array based implementation on rasters of modest size.
106
107Not completely linked with this RFC, a few changes have been done to
108limit the number of allocation/deallocation of objects (GDALRasterBlock
109instances, as well as an internal element of CPLHashSet), which has an
110effect on scalability since memory allocation routines involve
111synchronization between threads.
112
113Implementation
114--------------
115
116To implement the addition the following changes is made in the GDAL
117codebase:
118
119-  port/cpl_hash_set.cpp / port/cpl_hash_set.h: CPLHashSetClear()
120   function added to remove all the elements in one operation.
121
122-  port/cpl_hash_set.cpp / port/cpl_hash_set.h:
123   CPLHashSetRemoveDeferRehash() function added to remove one element
124   quickly. That is to say the potential resizing of the array used
125   internally is deferred to a later operation
126
127-  port/cpl_hash_set.cpp / port/cpl_hash_set.h: improvements to
128   "recycle" links from the linked lists and avoid useless
129   malloc()/free().
130
131-  port/cpl_atomic_ops.cpp: addition of CPLAtomicCompareAndExchange()
132
133-  gcore/gdal.h: additions of GDAL_OF_DEFAULT_BLOCK_ACCESS,
134   GDAL_OF_ARRAY_BLOCK_ACCESS and GDAL_OF_HASHSET_BLOCK_ACCESS values.
135
136-  gcore/gdal_priv.h: definition of GDALAbstractBandBlockCache class,
137   and GDALArrayBandBlockCacheCreate() and
138   GDALHashSetBandBlockCacheCreate() functions. Modifications of
139   GDALRasterBand, GDALDataset and GDALRasterBlock definitions.
140
141-  gcore/gdalrasterband.cpp: InitBlockInfo() instantiates the
142   appropriate band block cache implementation.
143
144-  gcore/gdalrasterband.cpp: the AdoptBlock(), UnreferenceBlock(),
145   FlushBlock() and TryGetLockedBlockRef() methods delegate to the
146   actual band block cache implementation.
147
148-  gcore/gdalrasterband.cpp: AddBlockToFreeList() is added and delegate
149   to GDALAbstractBandBlockCache
150
151-  gcore/gdalrasterblock.cpp: SafeLockBlock() is replaced by TakeLock()
152
153-  gcore/gdalrasterblock.cpp: RecycleFor() method added to recycle an
154   existing block object to save a few new/delete calls (used by
155   GDALAbstractBandBlockCache::CreateBlock())
156
157-  gcore/gdalrasterblock.cpp: Internalize() or FlushCacheBlock() no
158   longer directly free a block (they still free or recycle its pData
159   member), but provide it to GDALRasterBand::AddBlockToFreeList() for
160   layer reuse.
161
162-  gcore/gdalrasterblock.cpp: DropLockForRemovalFromStorage() is added
163   to avoid racing destruction of blocks between
164   GDALRasterBand::FlushCache() or FlushBlock() with
165   GDALRasterBlock::Internalize() or FlushCacheBlock().
166
167-  gcore/gdalabstractbandblockcache.cpp: added. Contains logic to keep
168   instantiated GDALRasterBlock that were discarded by the global block
169   manager for their later reuse. Saves a few new/delete calls.
170
171-  gcore/gdalarraybandblockcache.cpp: the GDALArrayBandBlockCache class
172   implementation with mostly the existing code
173
174-  gcore/gdalhashsetbandblockcache.cpp: the new
175   GDALHashsetBandBlockCache class implementation
176
177Backward Compatibility
178----------------------
179
180This implementation retains the backward compatibility with the existing
181API. The C++ ABI of GDALRasterBand, GDALDataset and GDALRasterBlock is
182modified.
183
184Performance impacts
185-------------------
186
187The array based implementation after this RFC should still show the same
188performance than the current implementation (potentially very slightly
189improved with the recycling of blocks). Confirmed by tests with
190autotest/cpp/testblockcache.
191
192Documentation
193-------------
194
195This change doesn't affect the existing user documentation.
196
197Testing
198-------
199
200The autotest/cpp/testblockcache utility is now run by the "quick_test"
201target of autotest/cpp/Makefile with GDAL_BAND_BLOCK_CACHE=HASHSET in
202additions to the array based implementation.
203
204A new autotest/cpp/testblockcachelimits utility has been developed to
205test a few racing situations. As races are hard to trigger, the code of
206GDALRasterBlock has been instrumented to allow sleeping in particular
207places, enabling races to be reliably simulated.
208
209Implementation
210--------------
211
212Tamas Szekeres had provided an initial version of this RFC. It has been
213restructured and ported on GDAL 2.0 by Even Rouault (sponsored by `LINZ
214(Land Information New Zealand) <http://www.linz.govt.nz/>`__)
215
216References
217----------
218
219The proposed implementation lies in the "rfc26_bandblockcache" branch of
220the
221`https://github.com/rouault/gdal2/tree/rfc26_bandblockcache <https://github.com/rouault/gdal2/tree/rfc26_bandblockcache>`__
222repository.
223
224The list of changes:
225`https://github.com/rouault/gdal2/compare/rfc26_bandblockcache <https://github.com/rouault/gdal2/compare/rfc26_bandblockcache>`__
226
227Related bugs: #3264, #3224.
228
229Voting History
230--------------
231
232+1 from EvenR, DanielM, TamasS. +0 from JukkaR
233