.. _rfc-66:

=======================================================================================
RFC 66 : OGR random layer read/write capabilities
=======================================================================================

Author: Even Rouault

Contact: even.rouault at spatialys.com

Status: Implemented

Implementing version: 2.2

Summary
-------

This RFC introduces a new API to iterate over vector features at the
dataset level, in addition to the existing capability of doing so at the
layer level. The existing capability of writing features to layers in
random order, which is supported by most drivers with output
capabilities, is formalized with a new dataset capability flag.

Rationale
---------

Some vector formats mix features that belong to different layers in an
interleaved way, which makes the current per-layer feature iteration
rather inefficient (each layer requires reading the whole file). One
example is the OSM driver. For this driver, a hack was developed in the
past to make it possible to use the OGRLayer::GetNextFeature() method,
but with very particular semantics. See the "Interleaved reading"
paragraph of :ref:`vector.osm` for more details. A similar need arises
with the development of a new driver, GMLAS (for GML Application
Schemas), that reads GML files with arbitrary element nesting and,
because it works in a streaming way, can return features in an
apparently random order. For example, consider the following simplified
XML content:

::

   <A>
       ...
       <B>
           ...
       </B>
       ...
   </A>

The driver completes feature B before it can emit feature A, because the
closing tag of B is reached first. So when reading sequences of this
pattern, the driver will emit features in the order B,A,B,A,...
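
The emission order described above can be reproduced with a small
self-contained model. The sketch below does not use GDAL or the GMLAS
driver: ``TagEvent`` and ``EmissionOrder()`` are hypothetical names
introduced for illustration, and the only assumption is that a streaming
reader can hand out a feature once its closing tag has been read.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type for this sketch: an element name plus a flag that
// is true for an opening tag and false for a closing tag.
using TagEvent = std::pair<std::string, bool>;

// A streaming reader can only hand out a feature once its closing tag has
// been seen, so nested features are emitted before their parents.
std::vector<std::string> EmissionOrder(const std::vector<TagEvent>& events)
{
    std::vector<std::string> openElements;  // stack of elements being built
    std::vector<std::string> emitted;       // feature names in emission order
    for (const auto& ev : events)
    {
        if (ev.second)
        {
            // Opening tag: start building a new (possibly nested) feature.
            openElements.push_back(ev.first);
        }
        else if (!openElements.empty() && openElements.back() == ev.first)
        {
            // Closing tag of the innermost open element: it is now complete.
            emitted.push_back(ev.first);
            openElements.pop_back();
        }
    }
    return emitted;
}
```

Feeding it the open/close events of two consecutive A elements, each
containing one B element, yields the names B, A, B, A in that order.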

Changes
-------

C++ API
~~~~~~~

Two new methods are added at the GDALDataset level:

GetNextFeature():

::

   /**
    \brief Fetch the next available feature from this dataset.

    The returned feature becomes the responsibility of the caller to
    delete with OGRFeature::DestroyFeature().

    Depending on the driver, this method may return features from layers in a
    non-sequential way. This is what may happen when the
    ODsCRandomLayerRead capability is declared (for example for the
    OSM and GMLAS drivers). When datasets declare this capability, it is strongly
    advised to use GDALDataset::GetNextFeature() instead of
    OGRLayer::GetNextFeature(), as the latter might have a slow, incomplete or stub
    implementation.

    The default implementation, used by most drivers, will
    however iterate over each layer, and then over each feature within this
    layer.

    This method takes into account spatial and attribute filters set on layers that
    will be iterated upon.

    The ResetReading() method can be used to start at the beginning again.

    Depending on drivers, this may also have the side effect of calling
    OGRLayer::GetNextFeature() on the layers of this dataset.

    This method is the same as the C function GDALDatasetGetNextFeature().

    @param ppoBelongingLayer a pointer to an OGRLayer* variable to receive the
                             layer to which the object belongs, or NULL.
                             It is possible for the output *ppoBelongingLayer
                             to be NULL despite the feature not being NULL.
    @param pdfProgressPct    a pointer to a double variable to receive the
                             percentage progress (in [0,1] range), or NULL.
                             On return, the pointed value might be negative if
                             determining the progress is not possible.
    @param pfnProgress       a progress callback to report progress (for
                             GetNextFeature() calls that might have a long duration)
                             and offer cancellation possibility, or NULL
    @param pProgressData     user data provided to pfnProgress, or NULL
    @return a feature, or NULL if no more features are available.
    @since GDAL 2.2
   */

   OGRFeature* GDALDataset::GetNextFeature( OGRLayer** ppoBelongingLayer,
                                            double* pdfProgressPct,
                                            GDALProgressFunc pfnProgress,
                                            void* pProgressData )

and ResetReading():

::

   /**
    \brief Reset feature reading to start on the first feature.

    This affects GetNextFeature().

    Depending on drivers, this may also have the side effect of calling
    OGRLayer::ResetReading() on the layers of this dataset.

    This method is the same as the C function GDALDatasetResetReading().

    @since GDAL 2.2
   */
   void        GDALDataset::ResetReading();
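
The default behaviour documented above (iterate over each layer, then
over each feature of that layer, with ResetReading() starting over) can
be pictured with the following toy model. It is only a sketch under
simplifying assumptions: ``ToyLayer`` and ``ToyDataset`` are hypothetical
stand-ins rather than GDAL classes, features are plain integers, and
filters, progress reporting and ownership transfer to the caller are
left out.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for this sketch only; these are not GDAL classes.
struct ToyLayer
{
    std::string name;
    std::vector<int> featureIds;
};

class ToyDataset
{
  public:
    explicit ToyDataset(std::vector<ToyLayer> layers)
        : m_layers(std::move(layers)) {}

    // Returns a pointer to the next feature id, or nullptr when all layers
    // are exhausted. If belongingLayer is not null, it receives the layer
    // the feature comes from (or nullptr at the end).
    const int* GetNextFeature(const ToyLayer** belongingLayer)
    {
        while (m_layerIdx < m_layers.size())
        {
            const ToyLayer& layer = m_layers[m_layerIdx];
            if (m_featureIdx < layer.featureIds.size())
            {
                if (belongingLayer)
                    *belongingLayer = &layer;
                return &layer.featureIds[m_featureIdx++];
            }
            // Current layer exhausted: move on to the next one.
            ++m_layerIdx;
            m_featureIdx = 0;
        }
        if (belongingLayer)
            *belongingLayer = nullptr;
        return nullptr;
    }

    // Start again from the first feature of the first layer.
    void ResetReading()
    {
        m_layerIdx = 0;
        m_featureIdx = 0;
    }

  private:
    std::vector<ToyLayer> m_layers;
    std::size_t m_layerIdx = 0;
    std::size_t m_featureIdx = 0;
};
```

In this model, features always come out grouped by layer, which is the
sequential behaviour a caller can expect from drivers that do not declare
ODsCRandomLayerRead.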

New capabilities
~~~~~~~~~~~~~~~~

The following 2 new dataset capabilities are added:

::

   #define ODsCRandomLayerRead     "RandomLayerRead"   /**< Dataset capability for GetNextFeature() returning features from random layers */
   #define ODsCRandomLayerWrite    "RandomLayerWrite " /**< Dataset capability for supporting CreateFeature on layer in random order */

C API
~~~~~

The above 2 new methods are available in the C API with:

::

   OGRFeatureH CPL_DLL GDALDatasetGetNextFeature( GDALDatasetH hDS,
                                                  OGRLayerH* phBelongingLayer,
                                                  double* pdfProgressPct,
                                                  GDALProgressFunc pfnProgress,
                                                  void* pProgressData );

   void CPL_DLL GDALDatasetResetReading( GDALDatasetH hDS );

Discussion about a few design choices of the new API
----------------------------------------------------

Compared to OGRLayer::GetNextFeature(), GDALDataset::GetNextFeature()
has a few differences:

-  it returns the layer which the feature belongs to. Indeed, there is no
   easy way to know from a feature which layer it belongs to (since in
   the data model, features can exist outside of any layer). One
   possibility would be to correlate the OGRFeatureDefn\* object of the
   feature with the one of the layer, but that is a bit inconvenient to
   do (and theoretically, one could imagine several layers sharing the
   same feature definition object, although this probably never happens
   in any in-tree driver).
-  even if the feature returned is not NULL, the returned layer might be
   NULL. This is just a provision for now, since that cannot currently
   happen. This could be interesting to address schema-less datasources
   where basically each feature could have a different schema (GeoJSON
   for example) without really belonging to a clearly identified layer.
-  it returns a progress percentage. When using the OGRLayer API, one has
   to compare the number of features returned so far with the total
   number returned by GetFeatureCount(). For the use cases we want to
   address, quickly determining the total number of features of the
   dataset is not doable, but knowing the position of the file pointer
   relative to the total size of the file is easy. Hence the decision to
   make GetNextFeature() return the progress percentage. The range [0,1]
   was chosen to be consistent with the range accepted by GDAL progress
   functions.
-  it accepts a progress and cancellation callback. One could wonder why
   this is needed given that GetNextFeature() is an "elementary" method
   and that it can already return the progress percentage. However, in
   some circumstances, it might take a rather long time to complete a
   GetNextFeature() call. For example, in the case of the OSM driver, as
   an optimization you can ask the driver to return features of only a
   subset of layers, for example all layers except nodes. But the nodes
   are generally at the beginning of the file, so before you get the
   first feature, you typically have to process 70% of the whole file.
   In the GMLAS driver, the first GetNextFeature() call is also the
   opportunity to do a preliminary quick scan of the file to determine
   the SRS of geometry columns, hence having progress feedback is
   welcome.
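
The idea of deriving the progress from the file pointer position can be
sketched as follows. This is an illustration only, not the code of any
driver: ``ProgressFromOffset()`` is a hypothetical helper, and treating a
total size of zero as "unknown" is an assumption made here to show the
negative return value mentioned in the method documentation.

```cpp
#include <cstdint>

// Hypothetical helper: derive a progress ratio from the current file offset.
// Returns a value in the [0,1] range, or -1.0 when the total size is unknown
// (represented here by a total size of zero).
double ProgressFromOffset(std::uint64_t offset, std::uint64_t totalSize)
{
    if (totalSize == 0)
        return -1.0;  // progress cannot be determined
    if (offset >= totalSize)
        return 1.0;   // clamp so the result never exceeds 1
    return static_cast<double>(offset) / static_cast<double>(totalSize);
}
```

With these assumptions, an offset of 50 in a 200-byte file gives 0.25,
and no feature count is needed to compute it.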

The progress percentage output is redundant with the progress callback
mechanism, and the latter could be used to get the former, but doing so
would be a bit convoluted. It would require something like:

::

   /* Progress callback that copies the reported ratio into user_data. */
   int MyProgress(double pct, const char* msg, void* user_data)
   {
       *(double*)user_data = pct;
       return TRUE;  /* returning FALSE would request cancellation */
   }

   double pct = 0.0;
   /* NULL is passed for pdfProgressPct: pct is filled in by MyProgress. */
   myDS->GetNextFeature(&poLayer, NULL, MyProgress, &pct);

SWIG bindings (Python / Java / C# / Perl) changes
-------------------------------------------------

GDALDatasetGetNextFeature is mapped as gdal::Dataset::GetNextFeature()
and GDALDatasetResetReading as gdal::Dataset::ResetReading().

Regarding gdal::Dataset::GetNextFeature(), currently only the Python
bindings have been modified to return both the feature and its belonging
layer. Other bindings just return the feature for now (returning the
layer as well would need specialized typemaps).

Drivers
-------

The OSM and GMLAS drivers are updated to implement the new API.

Existing drivers that support ODsCRandomLayerWrite are updated to
advertise it (that is, most drivers that have layer creation
capabilities, with the exceptions of KML, JML and GeoJSON).

Utilities
---------

ogr2ogr / GDALVectorTranslate() is changed internally to replace the
hack that was used for the OSM driver with the new API, which is used
when ODsCRandomLayerRead is advertised. It checks whether the output
driver advertises ODsCRandomLayerWrite and, if it does not, emits a
warning but still proceeds with the conversion using random layer
reading/writing.
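
The decision described in this paragraph can be summarized with a small
self-contained sketch. It is not the ogr2ogr source code:
``TranslatePlan`` and ``PlanTranslate()`` are hypothetical names, the two
booleans stand for the result of testing ODsCRandomLayerRead on the
source dataset and ODsCRandomLayerWrite on the output driver, and the
warning text is made up for illustration.

```cpp
#include <string>

// Hypothetical result type for this sketch; not part of GDAL.
struct TranslatePlan
{
    bool useDatasetIteration;  // iterate with the dataset-level API
    std::string warning;       // empty when there is nothing to report
};

// srcRandomLayerRead:  source dataset advertises ODsCRandomLayerRead
// dstRandomLayerWrite: output driver advertises ODsCRandomLayerWrite
TranslatePlan PlanTranslate(bool srcRandomLayerRead, bool dstRandomLayerWrite)
{
    TranslatePlan plan{false, ""};
    if (!srcRandomLayerRead)
        return plan;  // classic layer-by-layer iteration, no warning

    // Random layer reading is used even if the output side did not
    // advertise random layer writing; a warning is recorded in that case.
    plan.useDatasetIteration = true;
    if (!dstRandomLayerWrite)
        plan.warning = "output driver does not advertise RandomLayerWrite";
    return plan;
}
```

The point of the sketch is that a missing ODsCRandomLayerWrite capability
only produces a warning; it does not switch the conversion back to
layer-by-layer reading.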

ogrinfo is extended to accept a -rl (for random layer) flag that
instructs it to use the GDALDataset::GetNextFeature() API. Using it
automatically when ODsCRandomLayerRead is advertised was considered,
but the output can be quite... random and thus not very practical for
the user.

Documentation
-------------

All new methods/functions are documented.

Test Suite
----------

The specialized GetNextFeature() implementations of the OSM and GMLAS
drivers are tested in their respective tests. The default implementation
of GDALDataset::GetNextFeature() is tested in the MEM driver tests.

Compatibility Issues
--------------------

None for existing users of the C/C++ API.

Since there is a default implementation, the new functions/methods can
be safely used on drivers that don't have a specialized implementation.

The addition of the new virtual methods GDALDataset::ResetReading() and
GDALDataset::GetNextFeature() may cause issues for out-of-tree drivers
that already use such method names internally, but with different
semantics or signatures. We have encountered such issues with a few
in-tree drivers, and fixed them.

Implementation
--------------

The implementation will be done by Even Rouault, and is mostly triggered
by the needs of the new GMLAS driver (initial development funded by the
European Earth observation programme Copernicus).

The proposed implementation is in
`https://github.com/rouault/gdal2/tree/gmlas_randomreadwrite <https://github.com/rouault/gdal2/tree/gmlas_randomreadwrite>`__
(commit:
`https://github.com/rouault/gdal2/commit/8447606d68b9fac571aa4d381181ecfffed6d72c <https://github.com/rouault/gdal2/commit/8447606d68b9fac571aa4d381181ecfffed6d72c>`__)

Voting history
--------------

+1 from TamasS, HowardB, JukkaR, DanielM and EvenR.