.. _rfc-45:

=======================================================================================
RFC 45: GDAL datasets and raster bands as virtual memory mappings
=======================================================================================

Authors: Even Rouault

Contact: even dot rouault at spatialys.com

Status: Adopted, implemented

Summary
-------

This document proposes additions to GDAL so that image data of GDAL
datasets and raster bands can be seen as virtual memory mappings, in the
hope of simplifying usage.

Rationale
---------

When one wants to read or write image data from/into a GDAL dataset or
raster band, one must use the RasterIO() interface for the regions of
interest that are read or written. For small images, the most convenient
solution is usually to read/write the whole image in a single request
where the region of interest is the full raster extent. For larger
images, particularly when they do not fit entirely in RAM, this is not
possible, and if one wants to operate on the whole image, one must use a
windowing strategy to avoid memory issues: typically by proceeding
scanline by scanline (or by groups of scanlines), or by blocks for tiled
images. This can make the writing of algorithms more complicated when
they need to access a neighbourhood of pixels around each pixel of
interest, since the size of this extra window must be taken into
account, leading to overlapping regions of interest. None of this is
unsolvable, but it requires additional thought that distracts from the
main purpose.

The proposed addition of this RFC is to make the image data appear as a
single array accessed with a pointer, without being limited by the size
of RAM with respect to the size of the dataset (except for limitations
imposed by the CPU architecture and the operating system).

Technical solution
~~~~~~~~~~~~~~~~~~

Low-level machinery : cpl_virtualmem.h
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The low-level machinery to support this new capability is a
CPLVirtualMem object that represents an area of virtual memory (on
Linux, an area of virtual memory allocated by the mmap() function).
This virtual memory area is initially just reserved in terms of virtual
memory space, but has no actual allocation in physical memory. This
reserved virtual memory space is protected with an access permission
that causes any attempt to access it to result in an exception - a page
fault, which on POSIX systems triggers a SIGSEGV signal (segmentation
fault). Fortunately, segmentation faults can be caught by the software
with a signal handler. When such a segmentation fault occurs, our
specialized signal handler will check if it occurs in a virtual memory
region under its responsibility and, if so, it will proceed to fill the
part (a "page") of the virtual memory area that has been accessed with
sensible values (thanks to a user provided callback). It will then set
appropriate permissions on the page (read-only or read-write), before
attempting again the instruction that triggered the segmentation fault.
From the point of view of the user code that accesses the memory
mapping, this is completely transparent, and is equivalent to the whole
virtual memory area having been filled from the start.

For very large mappings that are larger than RAM, this would still cause
disk swapping to occur at a certain point.
To avoid that, the
segmentation fault handler will evict the least recently used pages
once a threshold defined at the creation of the CPLVirtualMem object has
been reached.

For write support, another callback can be passed. It will be called
before a page is evicted so that user code has a chance to flush its
content to a more persistent storage.

We also offer an alternative way of creating a CPLVirtualMem object, by
using memory file mapping mechanisms. This may be used by "raw" datasets
(EHdr driver for example) where the organization of data on disk
directly matches the organization of an in-memory array.

High-level usage
^^^^^^^^^^^^^^^^

Five new API functions are introduced (detailed in a later section):

-  GDALDatasetGetVirtualMem(): takes almost the same arguments as
   GDALDatasetRasterIO(), with the notable exception of a pData buffer.
   It returns a CPLVirtualMem\* object, from which the base address of
   the virtual memory mapping can be obtained with
   CPLVirtualMemGetAddr().

.. image:: ../../../images/rfc45/rfc_2d_array.png

-  GDALRasterBandGetVirtualMem(): equivalent of
   GDALDatasetGetVirtualMem() that operates on a raster band object
   rather than a dataset object.

-  GDALDatasetGetTiledVirtualMem(): this is a rather original API.
   Instead of presenting a 2D view of the image data (i.e. organized
   row by row), the mapping exposes it as an array of tiles, which is
   more suitable, performance-wise, when the dataset is itself tiled.

.. image:: ../../../images/rfc45/rfc_tiled.png

When there are several bands, 3 different organizations of band
components are possible. To the best of our knowledge, there is no
standard way of naming those organizations, which are consequently
best illustrated by the following schemas:

-  TIP / Tile Interleaved by Pixel

.. image:: ../../../images/rfc45/rfc_TIP.png
   :alt: TIP / Tile Interleaved by Pixel

-  BIT / Band Interleaved by Tile

.. image:: ../../../images/rfc45/rfc_BIT.png
   :alt: BIT / Band Interleaved by Tile

-  BSQ / Band SeQuential organization

.. image:: ../../../images/rfc45/rfc_BSQ.png
   :alt: BSQ / Band SeQuential organization

-  GDALRasterBandGetTiledVirtualMem(): equivalent of
   GDALDatasetGetTiledVirtualMem() that operates on a raster band object
   rather than a dataset object.

-  GDALGetVirtualMemAuto(): simplified version of
   GDALRasterBandGetVirtualMem() where the user only specifies the
   access mode. The pixel spacing and line spacing are returned by the
   function. This is implemented as a virtual method at the
   GDALRasterBand level, so that drivers have a chance of overriding the
   base implementation. The base implementation just uses
   GDALRasterBandGetVirtualMem(). Overriding implementations may use the
   memory file mapping mechanism instead. Such implementations will be
   done in the RawRasterBand object and in the GeoTIFF driver.

Details of new API
------------------

.. _implemented-by-cpl_virtualmemcpp:

Implemented by cpl_virtualmem.cpp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    /**
     * \file cpl_virtualmem.h
     *
     * Virtual memory management.
     *
     * This file provides mechanisms to define virtual memory mappings, whose content
     * is allocated transparently and filled on-the-fly. Those virtual memory mappings
     * can be much larger than the available RAM, but only parts of the virtual
     * memory mapping, within the limit of the allowed cache size, will actually be
     * physically allocated.
     *
     * This exploits low-level mechanisms of the operating system (virtual memory
     * allocation, page protection and handler of virtual memory exceptions).
     *
     * It is also possible to create a virtual memory mapping from a file or part
     * of a file.
     *
     * The current implementation is Linux only.
     */

    /** Opaque type that represents a virtual memory mapping. */
    typedef struct CPLVirtualMem CPLVirtualMem;

    /** Callback triggered when a still unmapped page of virtual memory is accessed.
      * The callback has the responsibility of filling the page with relevant values
      *
      * @param ctxt virtual memory handle.
      * @param nOffset offset of the page in the memory mapping.
      * @param pPageToFill address of the page to fill. Note that the address might
      *                    be a temporary location, and not at CPLVirtualMemGetAddr() + nOffset.
      * @param nToFill number of bytes of the page.
      * @param pUserData user data that was passed to CPLVirtualMemNew().
      */
    typedef void (*CPLVirtualMemCachePageCbk)(CPLVirtualMem* ctxt,
                                              size_t nOffset,
                                              void* pPageToFill,
                                              size_t nToFill,
                                              void* pUserData);

    /** Callback triggered when a dirty mapped page is going to be freed.
      * (saturation of cache, or termination of the virtual memory mapping).
      *
      * @param ctxt virtual memory handle.
      * @param nOffset offset of the page in the memory mapping.
      * @param pPageToBeEvicted address of the page that will be flushed. Note that the address might
      *                    be a temporary location, and not at CPLVirtualMemGetAddr() + nOffset.
      * @param nToBeEvicted number of bytes of the page.
      * @param pUserData user data that was passed to CPLVirtualMemNew().
      */
    typedef void (*CPLVirtualMemUnCachePageCbk)(CPLVirtualMem* ctxt,
                                                size_t nOffset,
                                                const void* pPageToBeEvicted,
                                                size_t nToBeEvicted,
                                                void* pUserData);

    /** Callback triggered when a virtual memory mapping is destroyed.
      * @param pUserData user data that was passed to CPLVirtualMemNew().
     */
    typedef void (*CPLVirtualMemFreeUserData)(void* pUserData);

    /** Access mode of a virtual memory mapping. */
    typedef enum
    {
        /*! The mapping is meant to be read-only, but writes will not be prevented.
            Note that any content written will be lost. */
        VIRTUALMEM_READONLY,
        /*! The mapping is meant to be read-only, and this will be enforced
            through the operating system page protection mechanism. */
        VIRTUALMEM_READONLY_ENFORCED,
        /*! The mapping is meant to be read-write, and modified pages can be saved
            thanks to the pfnUnCachePage callback */
        VIRTUALMEM_READWRITE
    } CPLVirtualMemAccessMode;


    /** Return the size of a page of virtual memory.
     *
     * @return the page size.
     *
     * @since GDAL 1.11
     */
    size_t CPL_DLL CPLGetPageSize(void);

    /** Create a new virtual memory mapping.
     *
     * This will reserve an area of virtual memory of size nSize, whose size
     * might be potentially much larger than the physical memory available. Initially,
     * no physical memory will be allocated. As soon as memory pages are accessed,
     * they will be allocated transparently and filled with the pfnCachePage callback.
     * When the allowed cache size is reached, the least recently used pages will
     * be unallocated.
     *
     * On Linux AMD64 platforms, the maximum value for nSize is 128 TB.
     * On Linux x86 platforms, the maximum value for nSize is 2 GB.
     *
     * Only supported on Linux for now.
     *
     * Note that on Linux, this function will install a SIGSEGV handler. The
     * original handler will be restored by CPLVirtualMemManagerTerminate().
     *
     * @param nSize size in bytes of the virtual memory mapping.
     * @param nCacheSize   size in bytes of the maximum memory that will be really
     *                     allocated (must ideally fit into RAM).
     * @param nPageSizeHint hint for the page size. Must be a multiple of the
     *                      system page size, returned by CPLGetPageSize().
     *                      Minimum value is generally 4096. Might be set to 0 to
     *                      let the function determine a default page size.
     * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
     *                           that will access the virtual memory mapping. This can
     *                           optimize performance a bit.
     * @param eAccessMode permission to use for the virtual memory mapping.
     * @param pfnCachePage callback triggered when a still unmapped page of virtual
     *                     memory is accessed. The callback has the responsibility
     *                     of filling the page with relevant values.
     * @param pfnUnCachePage callback triggered when a dirty mapped page is going to
     *                       be freed (saturation of cache, or termination of the
     *                       virtual memory mapping). Might be NULL.
     * @param pfnFreeUserData callback that can be used to free pCbkUserData. Might be
     *                        NULL
     * @param pCbkUserData user data passed to pfnCachePage and pfnUnCachePage.
     *
     * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     *         or NULL in case of failure.
     *
     * @since GDAL 1.11
     */

    CPLVirtualMem CPL_DLL *CPLVirtualMemNew(size_t nSize,
                                            size_t nCacheSize,
                                            size_t nPageSizeHint,
                                            int bSingleThreadUsage,
                                            CPLVirtualMemAccessMode eAccessMode,
                                            CPLVirtualMemCachePageCbk pfnCachePage,
                                            CPLVirtualMemUnCachePageCbk pfnUnCachePage,
                                            CPLVirtualMemFreeUserData pfnFreeUserData,
                                            void *pCbkUserData);


    /** Return whether virtual memory mapping of a file is available.
     *
     * @return TRUE if virtual memory mapping of a file is available.
     * @since GDAL 1.11
     */
    int CPL_DLL CPLIsVirtualMemFileMapAvailable(void);

    /** Create a new virtual memory mapping from a file.
     *
     * The file must be a "real" file recognized by the operating system, and not
     * a VSI extended virtual file.
     *
     * In VIRTUALMEM_READWRITE mode, updates to the memory mapping will be written
     * in the file.
     *
     * On Linux AMD64 platforms, the maximum value for nLength is 128 TB.
     * On Linux x86 platforms, the maximum value for nLength is 2 GB.
     *
     * Only supported on Linux for now.
     *
     * @param  fp       Virtual file handle.
     * @param  nOffset  Offset in the file to start the mapping from.
     * @param  nLength  Length of the portion of the file to map into memory.
     * @param eAccessMode Permission to use for the virtual memory mapping. This must
     *                    be consistent with how the file has been opened.
     * @param pfnFreeUserData callback that is called when the object is destroyed.
     * @param pCbkUserData user data passed to pfnFreeUserData.
     * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     *         or NULL in case of failure.
     *
     * @since GDAL 1.11
     */
    CPLVirtualMem CPL_DLL *CPLVirtualMemFileMapNew( VSILFILE* fp,
                                                    vsi_l_offset nOffset,
                                                    vsi_l_offset nLength,
                                                    CPLVirtualMemAccessMode eAccessMode,
                                                    CPLVirtualMemFreeUserData pfnFreeUserData,
                                                    void *pCbkUserData );

    /** Create a new virtual memory mapping derived from another virtual memory
     *  mapping.
     *
     * This may be useful when creating a mapping for pixel interleaved data.
     *
     * The new mapping takes a reference on the base mapping.
     *
     * @param pVMemBase Base virtual memory mapping
     * @param nOffset   Offset in the base virtual memory mapping from which to start
     *                  the new mapping.
     * @param nSize     Size of the base virtual memory mapping to expose in the
     *                  new mapping.
     * @param pfnFreeUserData callback that is called when the object is destroyed.
     * @param pCbkUserData user data passed to pfnFreeUserData.
     * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     *         or NULL in case of failure.
     *
     * @since GDAL 1.11
     */
    CPLVirtualMem CPL_DLL *CPLVirtualMemDerivedNew(CPLVirtualMem* pVMemBase,
                                                   vsi_l_offset nOffset,
                                                   vsi_l_offset nSize,
                                                   CPLVirtualMemFreeUserData pfnFreeUserData,
                                                   void *pCbkUserData);

    /** Free a virtual memory mapping.
     *
     * The pointer returned by CPLVirtualMemGetAddr() will no longer be valid.
     * If the virtual memory mapping was created with read/write permissions and
     * there are dirty (i.e. modified) pages, they will be flushed through the
     * pfnUnCachePage callback before being freed.
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     *
     * @since GDAL 1.11
     */
    void CPL_DLL CPLVirtualMemFree(CPLVirtualMem* ctxt);

    /** Return the pointer to the start of a virtual memory mapping.
     *
     * The bytes in the range [p:p+CPLVirtualMemGetSize()-1] where p is the pointer
     * returned by this function will be valid, until CPLVirtualMemFree() is called.
     *
     * Note that if a range of bytes used as an argument of a system call
     * (such as read() or write()) contains pages that have not been "realized", the
     * system call will fail with EFAULT. CPLVirtualMemPin() can be used to work
     * around this issue.
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     * @return the pointer to the start of a virtual memory mapping.
     *
     * @since GDAL 1.11
     */
    void CPL_DLL *CPLVirtualMemGetAddr(CPLVirtualMem* ctxt);

    /** Return the size of the virtual memory mapping.
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     * @return the size of the virtual memory mapping.
     *
     * @since GDAL 1.11
     */
    size_t CPL_DLL CPLVirtualMemGetSize(CPLVirtualMem* ctxt);

    /** Return whether the virtual memory mapping is a direct file mapping.
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     * @return TRUE if the virtual memory mapping is a direct file mapping.
     *
     * @since GDAL 1.11
     */
    int CPL_DLL CPLVirtualMemIsFileMapping(CPLVirtualMem* ctxt);

    /** Return the access mode of the virtual memory mapping.
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     * @return the access mode of the virtual memory mapping.
     *
     * @since GDAL 1.11
     */
    CPLVirtualMemAccessMode CPL_DLL CPLVirtualMemGetAccessMode(CPLVirtualMem* ctxt);

    /** Return the page size associated to a virtual memory mapping.
     *
     * The value returned will be at least CPLGetPageSize(), but potentially
     * larger.
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     * @return the page size
     *
     * @since GDAL 1.11
     */
    size_t CPL_DLL CPLVirtualMemGetPageSize(CPLVirtualMem* ctxt);

    /** Return TRUE if this memory mapping can be accessed safely from concurrent
     *  threads.
     *
     * The situation that can cause problems is when several threads try to access
     * a page of the mapping that is not yet mapped.
     *
     * The return value of this function depends on whether bSingleThreadUsage has
     * been set or not in CPLVirtualMemNew() and/or the implementation.
     *
     * On Linux, this will always return TRUE if bSingleThreadUsage = FALSE.
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     * @return TRUE if this memory mapping can be accessed safely from concurrent
     *         threads.
     *
     * @since GDAL 1.11
     */
    int CPL_DLL CPLVirtualMemIsAccessThreadSafe(CPLVirtualMem* ctxt);

    /** Declare that a thread will access a virtual memory mapping.
     *
     * This function must be called by a thread that wants to access the
     * content of a virtual memory mapping, except if the virtual memory mapping has
     * been created with bSingleThreadUsage = TRUE.
     *
     * This function must be paired with CPLVirtualMemUnDeclareThread().
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     *
     * @since GDAL 1.11
     */
    void CPL_DLL CPLVirtualMemDeclareThread(CPLVirtualMem* ctxt);

    /** Declare that a thread will stop accessing a virtual memory mapping.
     *
     * This function must be called by a thread that will no longer access the
     * content of a virtual memory mapping, except if the virtual memory mapping has
     * been created with bSingleThreadUsage = TRUE.
     *
     * This function must be paired with CPLVirtualMemDeclareThread().
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     *
     * @since GDAL 1.11
     */
    void CPL_DLL CPLVirtualMemUnDeclareThread(CPLVirtualMem* ctxt);

    /** Make sure that a region of virtual memory will be realized.
     *
     * Calling this function is not required, but might be useful when debugging
     * a process with tools like gdb or valgrind that do not naturally like
     * segmentation fault signals.
     *
     * It is also needed when wanting to provide part of virtual memory mapping
     * to a system call such as read() or write(). If read() or write() is called
     * on a memory region not yet realized, the call will fail with EFAULT.
     *
     * @param ctxt context returned by CPLVirtualMemNew().
     * @param pAddr the memory region to pin.
     * @param nSize the size of the memory region.
     * @param bWriteOp set to TRUE if the memory area will be accessed in write mode.
     *
     * @since GDAL 1.11
     */
    void CPL_DLL CPLVirtualMemPin(CPLVirtualMem* ctxt,
                                  void* pAddr, size_t nSize, int bWriteOp);

    /** Cleanup any resource and handlers related to virtual memory.
     *
     * This function must be called after the last CPLVirtualMem object has
     * been freed.
     *
     * @since GDAL 1.11
     */
    void CPL_DLL CPLVirtualMemManagerTerminate(void);

.. _implemented-by-gdalvirtualmemcpp:

Implemented by gdalvirtualmem.cpp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::


    /** Create a CPLVirtualMem object from a GDAL dataset object.
     *
     * Only supported on Linux for now.
     *
     * This method allows creating a virtual memory object for a region of one
     * or more GDALRasterBands from this dataset. The content of the virtual
     * memory object is automatically filled from dataset content when a virtual
     * memory page is first accessed, and it is released (or flushed in case of a
     * "dirty" page) when the cache size limit has been reached.
     *
     * The pointer to access the virtual memory object is obtained with
     * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     * CPLVirtualMemFree() must be called before the dataset object is destroyed.
     *
     * If p is such a pointer and base_type the C type matching eBufType, for default
     * values of spacing parameters, the element of image coordinates (x, y)
     * (relative to xOff, yOff) for band b can be accessed with
     * ((base_type*)p)[x + y * nBufXSize + (b-1)*nBufXSize*nBufYSize].
     *
     * Note that the mechanism used to transparently fill memory pages when they are
     * accessed is the same (but in a controlled way) as what occurs when a memory
     * error occurs in a program. Debugging software will generally interrupt program
     * execution when that happens. If needed, CPLVirtualMemPin() can be used to avoid
     * that by ensuring memory pages are allocated before being accessed.
     *
     * The size of the region that can be mapped as a virtual memory object depends
     * on hardware and operating system limitations.
     * On Linux AMD64 platforms, the maximum value is 128 TB.
     * On Linux x86 platforms, the maximum value is 2 GB.
     *
     * Data type translation is automatically done if the data type
     * (eBufType) of the buffer is different from
     * that of the GDALRasterBand.
     *
     * Image decimation / replication is currently not supported, i.e. if the
     * size of the region being accessed (nXSize x nYSize) is different from the
     * buffer size (nBufXSize x nBufYSize).
     *
     * The nPixelSpace, nLineSpace and nBandSpace parameters allow reading into or
     * writing from various organizations of buffers. Arbitrary values for the spacing
     * parameters are not supported. Those values must be multiples of the size of the
     * buffer data type, and must describe either a band sequential organization (typically
     * nPixelSpace = GDALGetDataTypeSize(eBufType) / 8, nLineSpace = nPixelSpace * nBufXSize,
     * nBandSpace = nLineSpace * nBufYSize), or a pixel-interleaved organization
     * (typically nPixelSpace = nBandSpace * nBandCount, nLineSpace = nPixelSpace * nBufXSize,
     * nBandSpace = GDALGetDataTypeSize(eBufType) / 8)
     *
     * @param hDS Dataset object
     *
     * @param eRWFlag Either GF_Read to read a region of data, or GF_Write to
     * write a region of data.
     *
     * @param nXOff The pixel offset to the top left corner of the region
     * of the band to be accessed. This would be zero to start from the left side.
     *
     * @param nYOff The line offset to the top left corner of the region
     * of the band to be accessed. This would be zero to start from the top.
     *
     * @param nXSize The width of the region of the band to be accessed in pixels.
     *
     * @param nYSize The height of the region of the band to be accessed in lines.
     *
     * @param nBufXSize the width of the buffer image into which the desired region
     * is to be read, or from which it is to be written.
     *
     * @param nBufYSize the height of the buffer image into which the desired
     * region is to be read, or from which it is to be written.
     *
     * @param eBufType the type of the pixel values in the data buffer. The
     * pixel values will automatically be translated to/from the GDALRasterBand
     * data type as needed.
     *
     * @param nBandCount the number of bands being read or written.
     *
     * @param panBandMap the list of nBandCount band numbers being read/written.
     * Note band numbers are 1 based. This may be NULL to select the first
     * nBandCount bands.
     *
     * @param nPixelSpace The byte offset from the start of one pixel value in
     * the buffer to the start of the next pixel value within a scanline. If defaulted
     * (0) the size of the datatype eBufType is used.
     *
     * @param nLineSpace The byte offset from the start of one scanline in
     * the buffer to the start of the next. If defaulted (0) the size of the datatype
     * eBufType * nBufXSize is used.
     *
     * @param nBandSpace the byte offset from the start of one bands data to the
     * start of the next. If defaulted (0) the value will be
     * nLineSpace * nBufYSize implying band sequential organization
     * of the data buffer.
     *
     * @param nCacheSize   size in bytes of the maximum memory that will be really
     *                     allocated (must ideally fit into RAM)
     *
     * @param nPageSizeHint hint for the page size. Must be a multiple of the
     *                      system page size, returned by CPLGetPageSize().
     *                      Minimum value is generally 4096. Might be set to 0 to
     *                      let the function determine a default page size.
     *
     * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
     *                           that will access the virtual memory mapping. This can
     *                           optimize performance a bit. If set to FALSE,
     *                           CPLVirtualMemDeclareThread() must be called.
     *
     * @param papszOptions NULL terminated list of options. Unused for now.
     *
     * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     *         or NULL in case of failure.
     *
     * @since GDAL 1.11
     */

    CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMem( GDALDatasetH hDS,
                                                     GDALRWFlag eRWFlag,
                                                     int nXOff, int nYOff,
                                                     int nXSize, int nYSize,
                                                     int nBufXSize, int nBufYSize,
                                                     GDALDataType eBufType,
                                                     int nBandCount, int* panBandMap,
                                                     int nPixelSpace,
                                                     GIntBig nLineSpace,
                                                     GIntBig nBandSpace,
                                                     size_t nCacheSize,
                                                     size_t nPageSizeHint,
                                                     int bSingleThreadUsage,
                                                     char **papszOptions );

    /** Create a CPLVirtualMem object from a GDAL raster band object.
     *
     * Only supported on Linux for now.
     *
     * This method allows creating a virtual memory object for a region of a
     * GDALRasterBand. The content of the virtual
     * memory object is automatically filled from dataset content when a virtual
     * memory page is first accessed, and it is released (or flushed in case of a
     * "dirty" page) when the cache size limit has been reached.
     *
     * The pointer to access the virtual memory object is obtained with
     * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     * CPLVirtualMemFree() must be called before the raster band object is destroyed.
     *
     * If p is such a pointer and base_type the C type matching eBufType, for default
     * values of spacing parameters, the element of image coordinates (x, y)
     * (relative to xOff, yOff) can be accessed with
     * ((base_type*)p)[x + y * nBufXSize].
     *
     * Note that the mechanism used to transparently fill memory pages when they are
     * accessed is the same (but in a controlled way) as what occurs when a memory
     * error occurs in a program. Debugging software will generally interrupt program
     * execution when that happens. If needed, CPLVirtualMemPin() can be used to avoid
     * that by ensuring memory pages are allocated before being accessed.
     *
     * The size of the region that can be mapped as a virtual memory object depends
     * on hardware and operating system limitations.
     * On Linux AMD64 platforms, the maximum value is 128 TB.
     * On Linux x86 platforms, the maximum value is 2 GB.
     *
     * Data type translation is automatically done if the data type
     * (eBufType) of the buffer is different from
     * that of the GDALRasterBand.
     *
     * Image decimation / replication is currently not supported, i.e. if the
     * size of the region being accessed (nXSize x nYSize) is different from the
     * buffer size (nBufXSize x nBufYSize).
     *
     * The nPixelSpace and nLineSpace parameters allow reading into or
     * writing from various organizations of buffers. Arbitrary values for the spacing
     * parameters are not supported. Those values must be multiples of the size of the
     * buffer data type and must be such that nLineSpace >= nPixelSpace * nBufXSize.
     *
     * @param hBand Rasterband object
     *
     * @param eRWFlag Either GF_Read to read a region of data, or GF_Write to
     * write a region of data.
     *
     * @param nXOff The pixel offset to the top left corner of the region
     * of the band to be accessed. This would be zero to start from the left side.
     *
     * @param nYOff The line offset to the top left corner of the region
     * of the band to be accessed. This would be zero to start from the top.
     *
     * @param nXSize The width of the region of the band to be accessed in pixels.
     *
     * @param nYSize The height of the region of the band to be accessed in lines.
     *
     * @param nBufXSize the width of the buffer image into which the desired region
     * is to be read, or from which it is to be written.
     *
     * @param nBufYSize the height of the buffer image into which the desired
     * region is to be read, or from which it is to be written.
     *
     * @param eBufType the type of the pixel values in the data buffer. The
     * pixel values will automatically be translated to/from the GDALRasterBand
     * data type as needed.
702 * 703 * @param nPixelSpace The byte offset from the start of one pixel value in 704 * the buffer to the start of the next pixel value within a scanline. If defaulted 705 * (0) the size of the datatype eBufType is used. 706 * 707 * @param nLineSpace The byte offset from the start of one scanline in 708 * the buffer to the start of the next. If defaulted (0) the size of the datatype 709 * eBufType * nBufXSize is used. 710 * 711 * @param nCacheSize size in bytes of the maximum memory that will be really 712 * allocated (must ideally fit into RAM) 713 * 714 * @param nPageSizeHint hint for the page size. Must be a multiple of the 715 * system page size, returned by CPLGetPageSize(). 716 * Minimum value is generally 4096. Might be set to 0 to 717 * let the function determine a default page size. 718 * 719 * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads 720 * that will access the virtual memory mapping. This can 721 * optimize performance a bit. If set to FALSE, 722 * CPLVirtualMemDeclareThread() must be called. 723 * 724 * @param papszOptions NULL terminated list of options. Unused for now. 725 * 726 * @return a virtual memory object that must be freed by CPLVirtualMemFree(), 727 * or NULL in case of failure. 728 * 729 * @since GDAL 1.11 730 */ 731 732 CPLVirtualMem CPL_DLL* GDALRasterBandGetVirtualMem( GDALRasterBandH hBand, 733 GDALRWFlag eRWFlag, 734 int nXOff, int nYOff, 735 int nXSize, int nYSize, 736 int nBufXSize, int nBufYSize, 737 GDALDataType eBufType, 738 int nPixelSpace, 739 GIntBig nLineSpace, 740 size_t nCacheSize, 741 size_t nPageSizeHint, 742 int bSingleThreadUsage, 743 char **papszOptions ); 744 745 typedef enum 746 { 747 /*! Tile Interleaved by Pixel: tile (0,0) with internal band interleaved 748 by pixel organization, tile (1, 0), ... */ 749 GTO_TIP, 750 /*! Band Interleaved by Tile : tile (0,0) of first band, tile (0,0) of second 751 band, ... tile (1,0) of first band, tile (1,0) of second band, ... 
            */
        GTO_BIT,
        /*! Band SeQuential : all the tiles of first band, all the tiles of following band... */
        GTO_BSQ
    } GDALTileOrganization;

    /** Create a CPLVirtualMem object from a GDAL dataset object, with tiling
     * organization
     *
     * Only supported on Linux for now.
     *
     * This method allows creating a virtual memory object for a region of one
     * or more GDALRasterBands from this dataset. The content of the virtual
     * memory object is automatically filled from dataset content when a virtual
     * memory page is first accessed, and it is released (or flushed in case of a
     * "dirty" page) when the cache size limit has been reached.
     *
     * Contrary to GDALDatasetGetVirtualMem(), pixels will be organized by tiles
     * instead of scanlines. Different ways of organizing pixels within/across tiles
     * can be selected with the eTileOrganization parameter.
     *
     * If nXSize is not a multiple of nTileXSize or nYSize is not a multiple of
     * nTileYSize, partial tiles will exist at the right and/or bottom of the region
     * of interest. Those partial tiles will also have nTileXSize * nTileYSize dimension,
     * with padding pixels.
     *
     * The pointer to access the virtual memory object is obtained with
     * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     * CPLVirtualMemFree() must be called before the dataset object is destroyed.
     *
     * If p is such a pointer and base_type the C type matching eBufType, for default
     * values of spacing parameters, the element of image coordinates (x, y)
     * (relative to xOff, yOff) for band b can be accessed with :
     *  - for eTileOrganization = GTO_TIP, ((base_type*)p)[tile_number(x,y)*nBandCount*tile_size + offset_in_tile(x,y)*nBandCount + (b-1)].
     *  - for eTileOrganization = GTO_BIT, ((base_type*)p)[(tile_number(x,y)*nBandCount + (b-1)) * tile_size + offset_in_tile(x,y)].
     *  - for eTileOrganization = GTO_BSQ, ((base_type*)p)[(tile_number(x,y) + (b-1)*nTilesCount) * tile_size + offset_in_tile(x,y)].
     *
     * where nTilesPerRow = ceil(nXSize / nTileXSize)
     *       nTilesPerCol = ceil(nYSize / nTileYSize)
     *       nTilesCount = nTilesPerRow * nTilesPerCol
     *       tile_number(x,y) = (y / nTileYSize) * nTilesPerRow + (x / nTileXSize)
     *       offset_in_tile(x,y) = (y % nTileYSize) * nTileXSize + (x % nTileXSize)
     *       tile_size = nTileXSize * nTileYSize
     *
     * Note that for a single band request, all tile organizations are equivalent.
     *
     * Note that the mechanism used to transparently fill memory pages when they are
     * accessed is the same (but in a controlled way) as what occurs when a memory
     * error occurs in a program. Debugging software will generally interrupt program
     * execution when that happens. If needed, CPLVirtualMemPin() can be used to avoid
     * that by ensuring memory pages are allocated before being accessed.
     *
     * The size of the region that can be mapped as a virtual memory object depends
     * on hardware and operating system limitations.
     * On Linux AMD64 platforms, the maximum value is 128 TB.
     * On Linux x86 platforms, the maximum value is 2 GB.
     *
     * Data type translation is automatically done if the data type
     * (eBufType) of the buffer is different from
     * that of the GDALRasterBand.
     *
     * @param hDS Dataset object
     *
     * @param eRWFlag Either GF_Read to read a region of data, or GF_Write to
     * write a region of data.
     *
     * @param nXOff The pixel offset to the top left corner of the region
     * of the band to be accessed. This would be zero to start from the left side.
     *
     * @param nYOff The line offset to the top left corner of the region
     * of the band to be accessed. This would be zero to start from the top.
     *
     * @param nXSize The width of the region of the band to be accessed in pixels.
     *
     * @param nYSize The height of the region of the band to be accessed in lines.
     *
     * @param nTileXSize the width of the tiles.
     *
     * @param nTileYSize the height of the tiles.
     *
     * @param eBufType the type of the pixel values in the data buffer. The
     * pixel values will automatically be translated to/from the GDALRasterBand
     * data type as needed.
     *
     * @param nBandCount the number of bands being read or written.
     *
     * @param panBandMap the list of nBandCount band numbers being read/written.
     * Note band numbers are 1 based. This may be NULL to select the first
     * nBandCount bands.
     *
     * @param eTileOrganization tile organization.
     *
     * @param nCacheSize size in bytes of the maximum memory that will be really
     * allocated (must ideally fit into RAM)
     *
     * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
     *                           that will access the virtual memory mapping. This can
     *                           optimize performance a bit. If set to FALSE,
     *                           CPLVirtualMemDeclareThread() must be called.
     *
     * @param papszOptions NULL terminated list of options. Unused for now.
     *
     * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     *         or NULL in case of failure.
     *
     * @since GDAL 1.11
     */

    CPLVirtualMem CPL_DLL* GDALDatasetGetTiledVirtualMem( GDALDatasetH hDS,
                                                     GDALRWFlag eRWFlag,
                                                     int nXOff, int nYOff,
                                                     int nXSize, int nYSize,
                                                     int nTileXSize, int nTileYSize,
                                                     GDALDataType eBufType,
                                                     int nBandCount, int* panBandMap,
                                                     GDALTileOrganization eTileOrganization,
                                                     size_t nCacheSize,
                                                     int bSingleThreadUsage,
                                                     char **papszOptions );

    /** Create a CPLVirtualMem object from a GDAL rasterband object, with tiling
     * organization
     *
     * Only supported on Linux for now.
     *
     * This method allows creating a virtual memory object for a region of one
     * GDALRasterBand.
     * The content of the virtual
     * memory object is automatically filled from dataset content when a virtual
     * memory page is first accessed, and it is released (or flushed in case of a
     * "dirty" page) when the cache size limit has been reached.
     *
     * Contrary to GDALDatasetGetVirtualMem(), pixels will be organized by tiles
     * instead of scanlines.
     *
     * If nXSize is not a multiple of nTileXSize or nYSize is not a multiple of
     * nTileYSize, partial tiles will exist at the right and/or bottom of the region
     * of interest. Those partial tiles will also have nTileXSize * nTileYSize dimension,
     * with padding pixels.
     *
     * The pointer to access the virtual memory object is obtained with
     * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     * CPLVirtualMemFree() must be called before the raster band object is destroyed.
     *
     * If p is such a pointer and base_type the C type matching eBufType, for default
     * values of spacing parameters, the element of image coordinates (x, y)
     * (relative to xOff, yOff) can be accessed with :
     *  ((base_type*)p)[tile_number(x,y)*tile_size + offset_in_tile(x,y)].
     *
     * where nTilesPerRow = ceil(nXSize / nTileXSize)
     *       nTilesPerCol = ceil(nYSize / nTileYSize)
     *       nTilesCount = nTilesPerRow * nTilesPerCol
     *       tile_number(x,y) = (y / nTileYSize) * nTilesPerRow + (x / nTileXSize)
     *       offset_in_tile(x,y) = (y % nTileYSize) * nTileXSize + (x % nTileXSize)
     *       tile_size = nTileXSize * nTileYSize
     *
     * Note that the mechanism used to transparently fill memory pages when they are
     * accessed is the same (but in a controlled way) as what occurs when a memory
     * error occurs in a program. Debugging software will generally interrupt program
     * execution when that happens. If needed, CPLVirtualMemPin() can be used to avoid
     * that by ensuring memory pages are allocated before being accessed.
     *
     * The size of the region that can be mapped as a virtual memory object depends
     * on hardware and operating system limitations.
     * On Linux AMD64 platforms, the maximum value is 128 TB.
     * On Linux x86 platforms, the maximum value is 2 GB.
     *
     * Data type translation is automatically done if the data type
     * (eBufType) of the buffer is different than
     * that of the GDALRasterBand.
     *
     * @param hBand Rasterband object
     *
     * @param eRWFlag Either GF_Read to read a region of data, or GF_Write to
     * write a region of data.
     *
     * @param nXOff The pixel offset to the top left corner of the region
     * of the band to be accessed. This would be zero to start from the left side.
     *
     * @param nYOff The line offset to the top left corner of the region
     * of the band to be accessed. This would be zero to start from the top.
     *
     * @param nXSize The width of the region of the band to be accessed in pixels.
     *
     * @param nYSize The height of the region of the band to be accessed in lines.
     *
     * @param nTileXSize the width of the tiles.
     *
     * @param nTileYSize the height of the tiles.
     *
     * @param eBufType the type of the pixel values in the data buffer. The
     * pixel values will automatically be translated to/from the GDALRasterBand
     * data type as needed.
     *
     * @param nCacheSize size in bytes of the maximum memory that will be really
     * allocated (must ideally fit into RAM)
     *
     * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
     *                           that will access the virtual memory mapping. This can
     *                           optimize performance a bit. If set to FALSE,
     *                           CPLVirtualMemDeclareThread() must be called.
     *
     * @param papszOptions NULL terminated list of options. Unused for now.
     *
     * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     *         or NULL in case of failure.
     *
     * @since GDAL 1.11
     */

    CPLVirtualMem CPL_DLL* GDALRasterBandGetTiledVirtualMem( GDALRasterBandH hBand,
                                                     GDALRWFlag eRWFlag,
                                                     int nXOff, int nYOff,
                                                     int nXSize, int nYSize,
                                                     int nTileXSize, int nTileYSize,
                                                     GDALDataType eBufType,
                                                     size_t nCacheSize,
                                                     int bSingleThreadUsage,
                                                     char **papszOptions );

.. _implemented-by-gdalrasterbandcpp:

Implemented by gdalrasterband.cpp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::


    /** \brief Create a CPLVirtualMem object from a GDAL raster band object.
     *
     * Only supported on Linux for now.
     *
     * This method allows creating a virtual memory object for a GDALRasterBand,
     * that exposes the whole image data as a virtual array.
     *
     * The default implementation relies on GDALRasterBandGetVirtualMem(), but specialized
     * implementations, such as for raw files, may also directly use mechanisms of the
     * operating system to create a view of the underlying file into virtual memory
     * ( CPLVirtualMemFileMapNew() )
     *
     * At the time of writing, the GeoTIFF driver and "raw" drivers (EHdr, ...) offer
     * a specialized implementation with direct file mapping, provided that some
     * requirements are met :
     *   - for all drivers, the dataset must be backed by a "real" file in the file
     *     system, and the byte ordering of multi-byte datatypes (Int16, etc.)
     *     must match the native ordering of the CPU.
     *   - in addition, for the GeoTIFF driver, the GeoTIFF file must be uncompressed and
     *     scanline oriented (i.e. not tiled). Strips must be organized in the file in sequential
     *     order, and be equally spaced (which is generally the case). Only power-of-two
     *     bit depths are supported (8 for GDT_Byte, 16 for GDT_Int16/GDT_UInt16,
     *     32 for GDT_Float32 and 64 for GDT_Float64)
     *
     * The pointer returned remains valid until CPLVirtualMemFree() is called.
     * CPLVirtualMemFree() must be called before the raster band object is destroyed.
     *
     * If p is such a pointer and base_type the type matching GDALGetRasterDataType(),
     * the element of image coordinates (x, y) can be accessed with
     * *(base_type*) ((GByte*)p + x * *pnPixelSpace + y * *pnLineSpace)
     *
     * This method is the same as the C GDALGetVirtualMemAuto() function.
     *
     * @param eRWFlag Either GF_Read to read the band, or GF_Write to
     * read/write the band.
     *
     * @param pnPixelSpace Output parameter giving the byte offset from the start of one pixel value in
     * the buffer to the start of the next pixel value within a scanline.
     *
     * @param pnLineSpace Output parameter giving the byte offset from the start of one scanline in
     * the buffer to the start of the next.
     *
     * @param papszOptions NULL terminated list of options.
     *                     If a specialized implementation exists, defining USE_DEFAULT_IMPLEMENTATION=YES
     *                     will cause the default implementation to be used.
     *                     When requiring or falling back to the default implementation, the following
     *                     options are available : CACHE_SIZE (in bytes, defaults to 40 MB),
     *                     PAGE_SIZE_HINT (in bytes),
     *                     SINGLE_THREAD ("FALSE" / "TRUE", defaults to FALSE)
     *
     * @return a virtual memory object that must be unreferenced by CPLVirtualMemFree(),
     *         or NULL in case of failure.
     *
     * @since GDAL 1.11
     */

    CPLVirtualMem *GDALRasterBand::GetVirtualMemAuto( GDALRWFlag eRWFlag,
                                                      int *pnPixelSpace,
                                                      GIntBig *pnLineSpace,
                                                      char **papszOptions );

    CPLVirtualMem CPL_DLL* GDALGetVirtualMemAuto( GDALRasterBandH hBand,
                                                  GDALRWFlag eRWFlag,
                                                  int *pnPixelSpace,
                                                  GIntBig *pnLineSpace,
                                                  char **papszOptions );

Portability
-----------

The CPLVirtualMem low-level machinery is currently only implemented for Linux.
It assumes that returning from a SIGSEGV handler is possible, which is a
blatant violation of POSIX, but in practice it seems that most POSIX
(and non POSIX such as Windows) systems should be able to resume
execution after a segmentation fault.

Porting to other POSIX operating systems such as MacOSX should be doable
with moderate effort. Windows has APIs that offer capabilities similar to
the POSIX ones, with VirtualAlloc(), VirtualProtect() and
SetUnhandledExceptionFilter(), although the porting would undoubtedly
require more effort.

The existence of `libsigsegv <http://www.gnu.org/software/libsigsegv>`__,
which runs on various operating systems, is evidence that this approach
can be ported to other platforms.

The trickiest part is ensuring that things will work reliably when two
concurrent threads try to access the same initially unmapped page.
Without special care, one thread could manage to access the page that is
being filled by the other thread, before it is completely filled. On
Linux this can be easily avoided with the mremap() call. When a page is
filled, we don't actually pass the target page to the user callback, but
a temporary page. When the callback has finished its job, this temporary
page is mremap()'ed to its target location, which is an atomic
operation. An alternative implementation for POSIX systems that don't
have this mremap() call has been tested : any declared threads that can
access the memory mapping are paused before the temporary page is
memcpy'ed to its target location, and are resumed afterwards. This
requires threads to declare beforehand their "interest" in a memory
mapping with CPLVirtualMemDeclareThread(). Pausing a thread is
interestingly non-obvious : the solution found to do so is to send it a
SIGUSR1 signal and make it wait in a signal handler for this SIGUSR1
signal...
It has not been investigated if/how this could be done on
Windows. CPLVirtualMemIsAccessThreadSafe() has been introduced so that
callers can check whether a mapping can be safely accessed by
concurrent threads.

As far as CPLVirtualMemFileMapNew() is concerned, memory file mapping on
POSIX systems with mmap() should be portable. Windows has the
CreateFileMapping() and MapViewOfFile() APIs, which offer capabilities
similar to mmap().

Performance
-----------

No miraculous performance gain should be expected from this new
capability, when compared to code that carefully uses GDALRasterIO().
Handling segmentation faults has a cost (the operating system catches a
hardware exception, then calls the user program segmentation fault
handler, which does the normal GDAL I/O operations, and plays with page
mappings and permissions which invalidate some CPU caches, etc...).
However, when a page has been realized, access to it should be really
fast, so with appropriate access patterns and cache size, good
performance should be expected.

It should also be noted that in the current implementation, the
realization of pages is done in a serialized way, that is to say if 2
threads which use 2 different memory mappings cause a segmentation fault
at the same time, the faults will not be handled by 2 different threads,
but one after the other.

The overhead of virtual memory objects returned by GetVirtualMemAuto(),
when using the memory file mapping, should be lower than that of the
manual management of page faults. However, GDAL has no control over the
strategy used by the operating system to cache pages.

Limitations
-----------

The maximum size of the virtual memory space (and thus a virtual memory
mapping) depends on the CPU architecture and OS limitations :

- On Linux AMD64, 128 TB.
- On Linux x86, 2 GB.
- On Windows AMD64 (unsupported by the current implementation), 8 TB.
- On Windows x86 (unsupported by the current implementation), 2 GB.

Clearly, the main interest of this new functionality is for AMD64
platforms.

On a Linux AMD64 machine with 4 GB RAM, the Python binding of
GDALDatasetGetTiledVirtualMem() has been successfully used to access
random points on the new `Europe 3'' DEM
dataset <http://www.eea.europa.eu/data-and-maps/data/eu-dem/#tab-original-data>`__,
which is a 20 GB compressed GeoTIFF (and 288000 \* 180000 \* 4 = 193 GB
uncompressed).

Related thoughts
----------------

Some issues with system calls such as read() or write(), or easier
multi-threading could potentially be solved by making a FUSE (File
system in USEr space) driver that would expose a GDAL dataset as a file,
and then mmap()'ing the file itself. However FUSE drivers are only
available on POSIX OS, and need root privilege to be mounted (a FUSE
filesystem does not need root privilege to run, but the mounting
operation does).

Open questions
--------------

Due to the fact that it currently only works on Linux, should we mark
the API as experimental for now ?

Backward compatibility issues
-----------------------------

C/C++ API --> compatible (new API). C ABI --> compatible (new API). C++
ABI --> incompatibility because GDALRasterBand has a new virtual method.

Updated drivers
---------------

The RawRasterBand object and GeoTIFF drivers will be updated to
implement GetVirtualMemAuto() and offer memory file mapping when
possible (see above documented restrictions on when this is possible).

In future steps, other drivers such as the VRT driver (for
VRTRawRasterBand) could also offer a specialized implementation of
GetVirtualMemAuto().

SWIG bindings
-------------

The high-level API (dataset and raster band) is available in the Python
bindings.
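
As an illustration of how these bindings are meant to be used, here is a
minimal sketch rather than a reference implementation. It assumes a Linux
build of GDAL whose Python bindings were built with NumPy support. The
helper name ``band_mean()`` and the path passed to it are hypothetical;
``Band.GetVirtualMemArray()`` is the method listed further down in this
section.

```python
def band_mean(path, band_number=1):
    """Return the mean pixel value of one band, read through a virtual
    memory mapping instead of explicit RasterIO() windows.

    Hypothetical helper, for illustration only.
    """
    # Imported lazily so that defining this helper does not require GDAL.
    from osgeo import gdal

    ds = gdal.Open(path)
    if ds is None:
        raise RuntimeError("cannot open %s" % path)
    band = ds.GetRasterBand(band_number)

    # Read-only mapping of the whole band, with the default cache size.
    # Pages are filled from the dataset the first time they are accessed.
    array = band.GetVirtualMemArray()
    try:
        return float(array.mean())
    finally:
        # The array reference must be dropped before the dataset reference.
        del array
        band = None
        ds = None
```

The explicit clean-up order follows the constraint stated in the docstrings
of this section: any reference to the array must be dropped before the last
reference to the related dataset.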

GDALDatasetGetVirtualMem() is mapped as Dataset.GetVirtualMemArray(), which
returns a NumPy array.

::

    def GetVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
                           xsize=None, ysize=None, bufxsize=None, bufysize=None,
                           datatype = None, band_list = None, band_sequential = True,
                           cache_size = 10 * 1024 * 1024, page_size_hint = 0, options = None):
        """Return a NumPy array for the dataset, seen as a virtual memory mapping.
           If there are several bands and band_sequential = True, an element is
           accessed with array[band][y][x].
           If there are several bands and band_sequential = False, an element is
           accessed with array[y][x][band].
           If there is only one band, an element is accessed with array[y][x].
           Any reference to the array must be dropped before the last reference to the
           related dataset is also dropped.
        """

Similarly for GDALDatasetGetTiledVirtualMem() :

::

    def GetTiledVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
                               xsize=None, ysize=None, tilexsize=256, tileysize=256,
                               datatype = None, band_list = None, tile_organization = gdalconst.GTO_BSQ,
                               cache_size = 10 * 1024 * 1024, options = None):
        """Return a NumPy array for the dataset, seen as a virtual memory mapping with
           a tile organization.
           If there are several bands and tile_organization = gdal.GTO_TIP, an element is
           accessed with array[tiley][tilex][y][x][band].
           If there are several bands and tile_organization = gdal.GTO_BIT, an element is
           accessed with array[tiley][tilex][band][y][x].
           If there are several bands and tile_organization = gdal.GTO_BSQ, an element is
           accessed with array[band][tiley][tilex][y][x].
           If there is only one band, an element is accessed with array[tiley][tilex][y][x].
           Any reference to the array must be dropped before the last reference to the
           related dataset is also dropped.
        """

And the Band object has the following 3 methods :

::

    def GetVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
                           xsize=None, ysize=None, bufxsize=None, bufysize=None,
                           datatype = None,
                           cache_size = 10 * 1024 * 1024, page_size_hint = 0, options = None):
        """Return a NumPy array for the band, seen as a virtual memory mapping.
           An element is accessed with array[y][x].
           Any reference to the array must be dropped before the last reference to the
           related dataset is also dropped.
        """

    def GetVirtualMemAutoArray(self, eAccess = gdalconst.GF_Read, options = None):
        """Return a NumPy array for the band, seen as a virtual memory mapping.
           An element is accessed with array[y][x].
           Any reference to the array must be dropped before the last reference to the
           related dataset is also dropped.
        """

    def GetTiledVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
                               xsize=None, ysize=None, tilexsize=256, tileysize=256,
                               datatype = None,
                               cache_size = 10 * 1024 * 1024, options = None):
        """Return a NumPy array for the band, seen as a virtual memory mapping with
           a tile organization.
           An element is accessed with array[tiley][tilex][y][x].
           Any reference to the array must be dropped before the last reference to the
           related dataset is also dropped.
        """

Note: Dataset/Band.GetVirtualMem()/GetTiledVirtualMem() methods are also
available. They return a VirtualMem Python object that has a GetAddr()
method that returns a Python memoryview object (Python 2.7 or later
required). However, using such an object does not seem practical for
non-Byte data types.

Test Suite
----------

The autotest suite will be extended to test the Python API of this RFC.
It will also test the specialized implementations of GetVirtualMemAuto()
in RawRasterBand and the GeoTIFF drivers. In autotest/cpp, a
test_virtualmem.cpp file tests concurrent access to the same pages by 2
threads.
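
Independently of GDAL, the tile addressing formulas documented for
GDALDatasetGetTiledVirtualMem() can be cross-checked with a few lines of
pure Python. The sketch below is only an illustration: the function name
``tiled_index()`` and the use of plain strings for the tile organization
are hypothetical, and the ``ceil()`` of the formulas is assumed to be a
ceiling division on integers.

```python
def tiled_index(x, y, b, xsize, ysize, tile_xsize, tile_ysize,
                band_count, organization):
    """Return the element index (not the byte offset) of pixel (x, y) of
    band b (1-based) in a tiled virtual memory mapping.

    Hypothetical helper that mirrors the formulas of the C documentation.
    """
    # Ceiling division on integers, so that partial tiles are counted.
    tiles_per_row = -(-xsize // tile_xsize)
    tiles_per_col = -(-ysize // tile_ysize)
    tiles_count = tiles_per_row * tiles_per_col
    tile_size = tile_xsize * tile_ysize

    tile_number = (y // tile_ysize) * tiles_per_row + (x // tile_xsize)
    offset_in_tile = (y % tile_ysize) * tile_xsize + (x % tile_xsize)

    if organization == "GTO_TIP":
        # Tile Interleaved by Pixel: bands are interleaved inside each tile.
        return (tile_number * band_count * tile_size
                + offset_in_tile * band_count + (b - 1))
    if organization == "GTO_BIT":
        # Band Interleaved by Tile: one full tile per band, tile after tile.
        return (tile_number * band_count + (b - 1)) * tile_size + offset_in_tile
    if organization == "GTO_BSQ":
        # Band SeQuential: all the tiles of a band, then the next band.
        return (tile_number + (b - 1) * tiles_count) * tile_size + offset_in_tile
    raise ValueError("unknown tile organization: %r" % (organization,))


# A 5 x 3 region cut into 2 x 2 tiles (3 tiles per row, 2 per column).
for org in ("GTO_TIP", "GTO_BIT", "GTO_BSQ"):
    print(org, tiled_index(3, 2, 2, 5, 3, 2, 2, 2, org))
```

With a single band (``band_count = 1`` and ``b = 1``), the three branches
reduce to the same expression, which matches the remark that all tile
organizations are equivalent for a single band request.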

Implementation
--------------

Implementation will be done by Even Rouault in GDAL/OGR trunk. The
proposed implementation is attached as a
`patch <http://trac.osgeo.org/gdal/attachment/wiki/rfc45_virtualmem/virtualmem.patch>`__.

Voting history
--------------

+1 from EvenR, FrankW, DanielM and JukkaR