.. _rfc-66:

=======================================================================================
RFC 66 : OGR random layer read/write capabilities
=======================================================================================

Author: Even Rouault

Contact: even.rouault at spatialys.com

Status: Implemented

Implementing version: 2.2

Summary
-------

This RFC introduces a new API to iterate over vector features
at the dataset level, in addition to the existing capability of doing so at
the layer level. The existing capability of writing features to layers
in random order, which is supported by most drivers with output
capabilities, is formalized with a new dataset capability flag.

Rationale
---------

Some vector formats mix features that belong to different layers in an
interleaved way, which makes the current per-layer feature iteration
rather inefficient (reading each layer requires reading the whole
file). One example of such drivers is the OSM driver. For this driver, a
hack had been developed in the past to be able to use the
OGRLayer::GetNextFeature() method, but with rather particular
semantics. See the "Interleaved reading" paragraph of :ref:`vector.osm` for more
details. A similar need arises with the development of a new driver,
GMLAS (for GML Application Schemas), that reads GML files with arbitrary
element nesting, and thus can return features in an apparently random order,
because it works in a streaming way. For example, let's consider the
following simplified XML content :

::

    <A>
        ...
        <B>
            ...
        </B>
        ...
    </A>

The driver will first be able to complete the building of feature B
before emitting feature A. So when reading sequences of this pattern,
the driver will emit features in the order B,A,B,A,...
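
The emission order described above can be illustrated with a small
standalone sketch. It does not use GDAL or the GMLAS driver code: the
``TagEvent`` type and the ``EmissionOrder()`` function are hypothetical
names introduced only for this illustration, and the sketch assumes that an
element is emitted as soon as its end tag is seen.

.. code-block:: c++

    #include <string>
    #include <vector>

    // Hypothetical event type: one start or end tag seen by a streaming parser.
    struct TagEvent {
        bool isStart;      // true for <X>, false for </X>
        std::string name;  // element name, e.g. "A" or "B"
    };

    // Returns element names in the order a streaming reader could emit them.
    // Assumption: an element is complete, and thus emitted, only when its
    // end tag is encountered.
    std::vector<std::string> EmissionOrder(const std::vector<TagEvent>& events)
    {
        std::vector<std::string> openElements;  // elements still being built
        std::vector<std::string> emitted;
        for (const TagEvent& ev : events) {
            if (ev.isStart) {
                openElements.push_back(ev.name);
            } else if (!openElements.empty() && openElements.back() == ev.name) {
                emitted.push_back(ev.name);     // nested element closes first
                openElements.pop_back();
            }
        }
        return emitted;
    }

Feeding this function the tag sequence of two consecutive ``<A>`` elements,
each containing one ``<B>``, yields B, A, B, A, which is why a per-layer
reader would have to scan the whole file once per layer.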

Changes
-------

C++ API
~~~~~~~

Two new methods are added at the GDALDataset level :

GetNextFeature():

::

    /**
     \brief Fetch the next available feature from this dataset.

     The returned feature becomes the responsibility of the caller to
     delete with OGRFeature::DestroyFeature().

     Depending on the driver, this method may return features from layers in a
     non sequential way. This is what may happen when the
     ODsCRandomLayerRead capability is declared (for example for the
     OSM and GMLAS drivers). When datasets declare this capability, it is strongly
     advised to use GDALDataset::GetNextFeature() instead of
     OGRLayer::GetNextFeature(), as the latter might have a slow, incomplete or stub
     implementation.

     The default implementation, used by most drivers, will
     however iterate over each layer, and then over each feature within this
     layer.

     This method takes into account spatial and attribute filters set on layers that
     will be iterated upon.

     The ResetReading() method can be used to start at the beginning again.

     Depending on drivers, this may also have the side effect of calling
     OGRLayer::GetNextFeature() on the layers of this dataset.

     This method is the same as the C function GDALDatasetGetNextFeature().

     @param ppoBelongingLayer a pointer to a OGRLayer* variable to receive the
                              layer to which the object belongs, or NULL.
                              It is possible for the output *ppoBelongingLayer
                              to be NULL despite the feature not being NULL.
     @param pdfProgressPct    a pointer to a double variable to receive the
                              percentage progress (in [0,1] range), or NULL.
                              On return, the pointed value might be negative if
                              determining the progress is not possible.
     @param pfnProgress       a progress callback to report progress (for
                              GetNextFeature() calls that might have a long duration)
                              and offer cancellation possibility, or NULL
     @param pProgressData     user data provided to pfnProgress, or NULL
     @return a feature, or NULL if no more features are available.
     @since GDAL 2.2
    */

    OGRFeature* GDALDataset::GetNextFeature( OGRLayer** ppoBelongingLayer,
                                             double* pdfProgressPct,
                                             GDALProgressFunc pfnProgress,
                                             void* pProgressData )

and ResetReading():

::

    /**
     \brief Reset feature reading to start on the first feature.

     This affects GetNextFeature().

     Depending on drivers, this may also have the side effect of calling
     OGRLayer::ResetReading() on the layers of this dataset.

     This method is the same as the C function GDALDatasetResetReading().

     @since GDAL 2.2
    */
    void GDALDataset::ResetReading();

New capabilities
~~~~~~~~~~~~~~~~

The following 2 new dataset capabilities are added :

::

    #define ODsCRandomLayerRead     "RandomLayerRead"   /**< Dataset capability for GetNextFeature() returning features from random layers */
    #define ODsCRandomLayerWrite    "RandomLayerWrite " /**< Dataset capability for supporting CreateFeature on layer in random order */

C API
~~~~~

The above 2 new methods are available in the C API with :

::

    OGRFeatureH CPL_DLL GDALDatasetGetNextFeature( GDALDatasetH hDS,
                                                   OGRLayerH* phBelongingLayer,
                                                   double* pdfProgressPct,
                                                   GDALProgressFunc pfnProgress,
                                                   void* pProgressData );

    void CPL_DLL GDALDatasetResetReading( GDALDatasetH hDS );

Discussion about a few design choices of the new API
----------------------------------------------------

Compared to OGRLayer::GetNextFeature(), GDALDataset::GetNextFeature()
has a few differences :

- it returns the layer which the feature belongs to. Indeed, there's no
  easy way from a feature to know which layer it belongs to (since in
  the data model, features can exist outside of any layer). One
  possibility would be to correlate the OGRFeatureDefn\* object of the
  feature with the one of the layer, but that is a bit inconvenient to
  do (and theoretically, one could imagine several layers sharing the
  same feature definition object, although this probably never happens
  in any in-tree driver).
- even if the feature returned is not NULL, the returned layer might be
  NULL. This is just a provision for now, since that cannot currently
  happen. This could be useful to address schema-less datasources
  where basically each feature could have a different schema (GeoJSON
  for example) without really belonging to a clearly identified layer.
- it returns a progress percentage. When using the OGRLayer API, one has to
  compare the number of features returned so far with the total number
  returned by GetFeatureCount(). For the use cases we want to address,
  quickly knowing the total number of features of the dataset is not
  feasible. But knowing the position of the file pointer relative to the
  total size of the file is easy. Hence the decision to make GetNextFeature()
  return the progress percentage. Regarding the choice of the range
  [0,1], this is to be consistent with the range accepted by GDAL
  progress functions.
- it accepts a progress and cancellation callback. One could wonder why
  this is needed given that GetNextFeature() is an "elementary" method
  and that it can already return the progress percentage. However, in
  some circumstances, it might take a rather long time to complete a
  GetNextFeature() call. For example in the case of the OSM driver, as
  an optimization you can ask the driver to return features of a subset
  of layers, for example all layers except nodes. But generally the
  nodes are at the beginning of the file, so before you get the first
  feature, you typically have to process 70% of the whole file. In the
  GMLAS driver, the first GetNextFeature() call is also the opportunity
  to do a preliminary quick scan of the file to determine the SRS of
  geometry columns, hence having progress feedback is welcome.

The progress percentage output is redundant with the progress callback
mechanism, and the latter could be used to obtain the former, but doing so
is a bit convoluted. It would require something like:

::

    /* Progress callback that only stores the reported ratio in user_data */
    int MyProgress(double pct, const char* msg, void* user_data)
    {
        *(double*)user_data = pct;
        return TRUE;
    }

    /* pdfProgressPct is left to NULL: pct is filled by the callback */
    myDS->GetNextFeature(&poLayer, NULL, MyProgress, &pct);

SWIG bindings (Python / Java / C# / Perl) changes
-------------------------------------------------

GDALDatasetGetNextFeature is mapped as gdal::Dataset::GetNextFeature()
and GDALDatasetResetReading as gdal::Dataset::ResetReading().

Regarding gdal::Dataset::GetNextFeature(), currently only Python has
been modified to return both the feature and its belonging layer. Other
bindings just return the feature for now (they would need specialized
typemaps).

Drivers
-------

The OSM and GMLAS drivers are updated to implement the new API.

Existing drivers that support ODsCRandomLayerWrite are updated to
advertise it (that is most drivers that have layer creation
capabilities, with the exceptions of KML, JML and GeoJSON).

Utilities
---------

ogr2ogr / GDALVectorTranslate() is changed internally to remove the hack
that was used for the OSM driver, and to use the new API when
ODsCRandomLayerRead is advertised. It checks whether the output driver
advertises ODsCRandomLayerWrite and, if it does not, emits a warning, but
still proceeds with the conversion using random layer
reading/writing.
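
The decision described in the previous paragraph can be summarized with the
following sketch. It is not the actual ogr2ogr code: ``TranslatePlan`` and
``PlanTranslation()`` are hypothetical names, and the two booleans are
assumed to hold the result of testing ODsCRandomLayerRead on the source
dataset and ODsCRandomLayerWrite on the output.

.. code-block:: c++

    #include <string>

    // Hypothetical summary of what the tool decides before copying features.
    struct TranslatePlan {
        bool useDatasetIteration;  // read with dataset-level GetNextFeature()
        std::string warning;       // empty when there is nothing to report
    };

    // srcRandomRead:  the source advertises ODsCRandomLayerRead
    // dstRandomWrite: the output advertises ODsCRandomLayerWrite
    TranslatePlan PlanTranslation(bool srcRandomRead, bool dstRandomWrite)
    {
        TranslatePlan plan{false, ""};
        if (!srcRandomRead)
            return plan;  // classic layer-by-layer iteration
        plan.useDatasetIteration = true;
        if (!dstRandomWrite)
            plan.warning = "output driver does not advertise RandomLayerWrite";
        return plan;      // the conversion proceeds in both cases
    }

The point of the sketch is that a missing write capability only produces a
warning: it never makes the tool fall back to per-layer reading.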

ogrinfo is extended to accept a -rl (for random layer) flag that
instructs it to use the GDALDataset::GetNextFeature() API. Using it
automatically when ODsCRandomLayerRead is advertised was considered,
but the output can be quite... random and thus not very
practical for the user.

Documentation
-------------

All new methods/functions are documented.

Test Suite
----------

The specialized GetNextFeature() implementations of the OSM and GMLAS
drivers are tested in their respective tests. The default implementation
of GDALDataset::GetNextFeature() is tested in the MEM driver tests.

Compatibility Issues
--------------------

None for existing users of the C/C++ API.

Since there is a default implementation, the new functions/methods can
be safely used on drivers that don't have a specialized implementation.

The addition of the new virtual methods GDALDataset::ResetReading() and
GDALDataset::GetNextFeature() may cause issues for out-of-tree drivers
that already use such method names internally, but with different
semantics or signatures. We have encountered such issues with a few
in-tree drivers, and fixed them.

Implementation
--------------

The implementation will be done by Even Rouault, and is mostly triggered
by the needs of the new GMLAS driver (initial development funded by the
European Earth observation programme Copernicus).

The proposed implementation is in
`https://github.com/rouault/gdal2/tree/gmlas_randomreadwrite <https://github.com/rouault/gdal2/tree/gmlas_randomreadwrite>`__
(commit:
`https://github.com/rouault/gdal2/commit/8447606d68b9fac571aa4d381181ecfffed6d72c <https://github.com/rouault/gdal2/commit/8447606d68b9fac571aa4d381181ecfffed6d72c>`__)

Voting history
--------------

+1 from TamasS, HowardB, JukkaR, DanielM and EvenR.