1.. _vector.gmlas:
2
3GMLAS - Geography Markup Language (GML) driven by application schemas
4=====================================================================
5
6.. versionadded:: 2.2
7
8.. shortname:: GMLAS
9
10.. build_dependencies:: Xerces
11
This driver can read and write XML files of arbitrary structure,
including those containing so-called Complex Features, provided that they
are accompanied by one or several XML schemas that describe the
structure of their content. While this driver is generic to any XML
schema, its main target is reading and writing documents that
reference, directly or indirectly, the GML namespace.
18
19The driver requires Xerces-C >= 3.1.
20
The driver can handle files of arbitrary size with very modest RAM
usage, because it works in streaming mode.
23
24Driver capabilities
25-------------------
26
27.. supports_georeferencing::
28
29.. supports_virtualio::
30
31Opening syntax
32--------------
33
The connection string is GMLAS:/path/to/the.gml. Note the GMLAS: prefix.
If this prefix is omitted, then the GML driver is likely to be used.

It is also possible to use only "GMLAS:" as the connection string, but
in that case the schemas must be explicitly provided with the XSD open
option.
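
As a rough illustration with the GDAL Python bindings, the two forms of
connection string could be opened as sketched below. The file name my.gml
and the helper names gmlas_connection_string and open_gmlas are
hypothetical and only serve the example.

```python
def gmlas_connection_string(path=None):
    """Return 'GMLAS:<path>', or the bare 'GMLAS:' prefix when no path is given."""
    return "GMLAS:" + (path or "")


def open_gmlas(path=None, xsd=None):
    """Open a document with the GMLAS driver (requires the GDAL Python bindings).

    xsd is an optional list of schema file names or URLs.
    """
    from osgeo import gdal  # imported lazily so the helper above works without GDAL

    if path is None and not xsd:
        raise ValueError("a bare 'GMLAS:' connection string needs the XSD open option")
    open_options = []
    if xsd:
        # Several schemas are passed as one comma-separated value.
        open_options.append("XSD=" + ",".join(xsd))
    return gdal.OpenEx(
        gmlas_connection_string(path),
        gdal.OF_VECTOR,
        open_options=open_options,
    )


print(gmlas_connection_string("my.gml"))
print(gmlas_connection_string())
```

Calling open_gmlas(xsd=["my_schema.xsd"]) would correspond to the bare
"GMLAS:" form, with the schema supplied through the XSD open option.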
40
41Mapping of XML structure to OGR layers and fields
42-------------------------------------------------
43
44The driver scans the XML schemas referenced by the XML/GML to build the
45OGR layers and fields. It is strictly required that the schemas,
46directly or indirectly used, are fully valid. The content of the XML/GML
47file itself is marginally used, mostly to determine the SRS of geometry
48columns.
49
50XML elements declared at the top level of a schema will generally be
51exposed as OGR layers. Their attributes and sub-elements of simple XML
52types (string, integer, real, ...) will be exposed as OGR fields. For
53sub-elements of complex type, different cases can happen. If the
54cardinality of the sub-element is at most one and it is not referenced
55by other elements, then it is "flattened" into its enclosing element.
Otherwise it will be exposed as an OGR layer, with either a link to its
57"parent" layer if the sub-element is specific to its parent element, or
58through a junction table if the sub-element is shared by several
59parents.
60
By default the driver is robust to documents that do not strictly
conform to the schemas. Unexpected content in the document is silently
ignored, as is content required by the schema but absent from the
document.
65
66Consult the :ref:`GMLAS mapping examples <gmlas_mapping_examples>`
67page for more details.
68
By default in the configuration, swe:DataRecord and swe:DataArray
elements from the Sensor Web Enablement (SWE) Common Data Model
namespace receive special processing, so that they are mapped more
naturally to OGR concepts. The swe:field elements are mapped as OGR
fields, and the swe:values element of a swe:DataArray is parsed
into OGR features in a dedicated layer for each swe:DataArray. Note that
this convenience mapping is read-only. When using the
write side of the driver, only the content of the general mapping
mechanisms will be used.
78
79Metadata layers
80---------------
81
Four special layers "_ogr_fields_metadata", "_ogr_layers_metadata",
"_ogr_layer_relationships" and "_ogr_other_metadata" add extra
information to the basic information you can get from the OGR data model
on OGR layers and fields.
86
Those layers are exposed if the EXPOSE_METADATA_LAYERS open option is
set to YES (or if enabled in the configuration). They can also be
individually retrieved by specifying their name in calls to
GetLayerByName(), or by passing them as layer names to the ogrinfo and
ogr2ogr utilities.
92
93Consult the :ref:`GMLAS metadata layers <gmlas_metadata_layers>`
94page for more details.
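
The following sketch shows how one of these layers might be fetched by name
with the GDAL Python bindings, without setting EXPOSE_METADATA_LAYERS. The
helper name dump_metadata_layer and its default argument are assumptions
made for the example.

```python
METADATA_LAYERS = (
    "_ogr_fields_metadata",
    "_ogr_layers_metadata",
    "_ogr_layer_relationships",
    "_ogr_other_metadata",
)


def dump_metadata_layer(path, layer_name="_ogr_layers_metadata"):
    """Return the rows of one metadata layer as a list of dictionaries."""
    from osgeo import gdal  # lazy import: the constant above needs no GDAL

    if layer_name not in METADATA_LAYERS:
        raise ValueError("not a GMLAS metadata layer: " + layer_name)
    ds = gdal.OpenEx("GMLAS:" + path, gdal.OF_VECTOR)
    if ds is None:
        raise RuntimeError("could not open " + path)
    # Metadata layers can be requested by name even when they are not listed.
    layer = ds.GetLayerByName(layer_name)
    if layer is None:
        raise RuntimeError("layer not found: " + layer_name)
    return [feature.items() for feature in layer]
```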
95
96Configuration file
97------------------
98
A default configuration file,
`gmlasconf.xml <http://github.com/OSGeo/gdal/blob/master/gdal/data/gmlasconf.xml>`__,
is provided in the data directory of the GDAL installation. Its
structure and content are documented in the
`gmlasconf.xsd <http://github.com/OSGeo/gdal/blob/master/gdal/data/gmlasconf.xsd>`__
schema.
105
106This configuration file enables the user to modify the following
107settings:
108
109-  whether remote schemas should be downloaded. Enabled by default.
110-  whether the local cache of schemas is enabled. Enabled by default.
111-  the path of the local cache. By default, $HOME/.gdal/gmlas_xsd_cache
112-  whether validation of the document against the schemas should be
113   enabled. Disabled by default.
-  whether a validation error should cause dataset opening to fail.
   Disabled by default.
-  whether the metadata layers should be exposed by default. Disabled by
   default.
-  whether an 'ogr_pkid' field should always be generated. Disabled by
   default. Turning it on can be useful for layers that have an ID
   attribute whose uniqueness is not guaranteed across documents,
   which could cause issues when appending several documents into a
   target database table.
-  whether layers and fields that are not used in the XML document
   should be removed. Disabled by default.
-  whether OGR array data types can be used. Enabled by default.
-  whether the XML definition of the GML geometry should be reported as
   an OGR string field. Disabled by default.
-  whether only XML elements that derive from gml:_Feature or
   gml:AbstractFeature should be considered in the initial pass of the
   schema building, when at least one element in the schemas derives from
   them. Enabled by default.
132-  several rules to configure if and how xlink:href should be resolved.
133-  a definition of XPaths of elements and attributes that must be
134   ignored, so as to lighten the number of OGR layers and fields.
135
This file can be adapted, and modified versions can be provided to the
driver with the CONFIG_FILE open option. None of the elements of the
configuration file are required. When they are absent, the default value
indicated in the schema documentation is used.

Configuration can also be provided through other open options. Note that
some open options have identical names to settings present in the
configuration file. When such an open option is provided, its value
overrides that of the configuration file (either the default one,
or the one provided through the CONFIG_FILE open option).
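
A minimal sketch of combining a custom configuration file with an overriding
open option follows. The file name my_gmlasconf.xml and the helper names are
hypothetical. Only the CONFIG_FILE, EXPOSE_METADATA_LAYERS and VALIDATE option
names come from this page.

```python
def gmlas_open_options(config_file=None, **overrides):
    """Build an open_options list from a config file and per-option overrides."""
    options = []
    if config_file:
        options.append("CONFIG_FILE=" + config_file)
    for name, value in sorted(overrides.items()):
        if isinstance(value, bool):
            value = "YES" if value else "NO"
        options.append("{}={}".format(name.upper(), value))
    return options


def open_with_config(path, config_file=None, **overrides):
    """Open a document with those options (requires the GDAL Python bindings)."""
    from osgeo import gdal  # lazy import: the helper above works without GDAL

    return gdal.OpenEx(
        "GMLAS:" + path,
        gdal.OF_VECTOR,
        open_options=gmlas_open_options(config_file, **overrides),
    )


print(gmlas_open_options("my_gmlasconf.xml", expose_metadata_layers=True))
```

With these options, the metadata layers would be exposed even if the
configuration file leaves that setting disabled, since the open option takes
precedence.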
146
147Geometry support
148----------------
149
XML schemas only indicate the geometry type but do not constrain the
spatial reference system (SRS), so it is theoretically possible to have
object instances of the same class with different SRS for the same
geometry field. This is not practical to deal with, so when geometry
fields are detected, an initial scan of the document is done to find the
first geometry of each geometry field that has an explicit srsName set.
That SRS is used for the whole geometry field. If other
geometries of the same field have a different SRS, they are
reprojected.
159
By default, only the OGR geometry built from the GML geometry is exposed
in the OGR feature. It is possible to change the IncludeGeometryXML
setting of the configuration file to true so as to expose an OGR string
field with the XML definition of the GML geometry.
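
One possible way to enable this setting without editing a file is an inline
configuration passed through the CONFIG_FILE open option, sketched below.
The nesting of IncludeGeometryXML under LayerBuildingRules/GML is an
assumption about the layout of gmlasconf.xsd and should be checked against
that schema. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def inline_geometry_xml_config(include_geometry_xml=True):
    """Serialize a minimal inline configuration as a single XML string."""
    root = ET.Element("Configuration")
    # Assumed element nesting; verify against gmlasconf.xsd.
    rules = ET.SubElement(root, "LayerBuildingRules")
    gml = ET.SubElement(rules, "GML")
    flag = ET.SubElement(gml, "IncludeGeometryXML")
    flag.text = "true" if include_geometry_xml else "false"
    # encoding="unicode" returns a str without an XML declaration, so the
    # value starts with "<Configuration" as inline configuration requires.
    return ET.tostring(root, encoding="unicode")


config = inline_geometry_xml_config()
print("CONFIG_FILE=" + config)
```

The printed string could then be passed as one entry of the open_options list
given to gdal.OpenEx().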
164
Performance issues with large multi-layer GML files
---------------------------------------------------
167
Traditionally, to read an OGR datasource, one iterates over layers with
GDALDataset::GetLayer(), and for each layer iterates over features
with OGRLayer::GetNextFeature(). While this approach still works for the
GMLAS driver, it may result in very poor performance on big documents or
documents using complex schemas that are translated into many OGR layers.
173
174It is thus recommended to use GDALDataset::GetNextFeature() to iterate
175over features as soon as they appear in the .gml/.xml file. This may
176return features from non-sequential layers, when the features include
177nested elements.
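
The dataset-level iteration described above might look as follows with the
GDAL Python bindings, where Dataset.GetNextFeature() returns a feature
together with the layer it belongs to. This is a sketch. The helper names
tally_by_layer and count_gmlas_features are hypothetical.

```python
from collections import Counter


def tally_by_layer(next_feature):
    """Count features per layer name.

    next_feature is any callable returning a (feature, layer) pair, with
    feature set to None once the dataset is exhausted.
    """
    counts = Counter()
    while True:
        feature, layer = next_feature()
        if feature is None:
            break
        # Consecutive features may come from different layers.
        counts[layer.GetName()] += 1
    return counts


def count_gmlas_features(path):
    """Stream through a document once (requires the GDAL Python bindings)."""
    from osgeo import gdal  # lazy import: tally_by_layer() works without GDAL

    ds = gdal.OpenEx("GMLAS:" + path, gdal.OF_VECTOR)
    if ds is None:
        raise RuntimeError("could not open " + path)
    return tally_by_layer(ds.GetNextFeature)
```

Keeping the loop separate from the dataset opening makes it possible to
exercise the iteration logic with a stand-in callable.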
178
179Open options
180------------
181
182-  **XSD**\ =filename(s): to specify an explicit XSD application schema
183   to use (or a list of filenames, provided they are comma separated).
184   "http://" or "https://" URLs can be used. This option is not required
185   when the XML/GML document has a schemaLocation attribute with valid
186   links in its root element.
-  **CONFIG_FILE**\ =filename or inline XML definition: filename of an
   XML configuration file conforming to the
   `gmlasconf.xsd <https://github.com/OSGeo/gdal/blob/master/gdal/data/gmlasconf.xsd>`__
   schema. It is also possible to provide the XML content directly
   inlined provided that the very first characters are <Configuration.
-  **EXPOSE_METADATA_LAYERS**\ =YES/NO: whether the metadata layers
   "_ogr_fields_metadata", "_ogr_layers_metadata",
   "_ogr_layer_relationships" and "_ogr_other_metadata" should be
   reported by default. Default is NO.
196-  **VALIDATE**\ =YES/NO: whether the document should be validated
197   against the schemas. Validation is done at dataset opening. Default
198   is NO.
199-  **FAIL_IF_VALIDATION_ERROR**\ =YES/NO: Whether a validation error
200   should cause dataset opening to fail. (only used if VALIDATE=YES)
201   Default is NO.
202-  **REFRESH_CACHE**\ =YES/NO: Whether remote schemas and documents
203   pointed by xlink:href links should be downloaded from the server even
204   if already present in the local cache. If the cache is enabled, it
205   will be refreshed with the newly downloaded resources. Default is NO.
-  **SWAP_COORDINATES**\ =AUTO/YES/NO: Whether the order of the x/y or
   long/lat coordinates should be swapped. In AUTO mode, the driver will
   determine if swapping must be done from the srsName. If the srsName
   is urn:ogc:def:crs:EPSG::XXXX and the order of coordinates in
   the EPSG database for this SRS is lat,long or northing,easting, then
   the driver will swap them to the GIS friendly order (long,lat or
   easting,northing). For other forms of SRS (such as EPSG:XXXX), GIS
   friendly order is assumed and thus no swapping is done. When
   SWAP_COORDINATES is set to YES, coordinates are always swapped
   relative to the order in which they appear in the GML, and when it is
   set to NO, they are kept in the same order. The default is AUTO.
-  **REMOVE_UNUSED_LAYERS**\ =YES/NO: Whether unused layers should be
   removed from the reported layers. Defaults to NO.
-  **REMOVE_UNUSED_FIELDS**\ =YES/NO: Whether unused fields should be
   removed from the reported layers. Defaults to NO.
-  **HANDLE_MULTIPLE_IMPORTS**\ =YES/NO: Whether multiple imports with
   the same namespace but different schema are allowed. Defaults to NO.
-  **SCHEMA_FULL_CHECKING**\ =YES/NO: Whether to be pedantic with XSD
   checking or to be forgiving e.g. if the invalid part of the schema is
   not referenced in the main document. Defaults to NO.
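
The SWAP_COORDINATES rules listed above can be paraphrased as a small decision
function. This is only a restatement of the text, not the driver's actual
code. The function name should_swap is hypothetical, and the caller is assumed
to know whether the EPSG database lists the SRS as lat,long or
northing,easting.

```python
EPSG_URN_PREFIX = "urn:ogc:def:crs:EPSG::"


def should_swap(mode, srs_name, epsg_order_is_lat_long_or_northing_easting):
    """Return True when coordinates would be swapped under the stated rules."""
    mode = mode.upper()
    if mode == "YES":
        return True   # always swap, whatever the srsName
    if mode == "NO":
        return False  # keep the order found in the GML
    if mode != "AUTO":
        raise ValueError("SWAP_COORDINATES must be AUTO, YES or NO")
    # AUTO: only the URN form follows the EPSG axis order.
    if srs_name.startswith(EPSG_URN_PREFIX):
        return bool(epsg_order_is_lat_long_or_northing_easting)
    # Other forms such as EPSG:XXXX are assumed to be in GIS friendly order.
    return False


print(should_swap("AUTO", "urn:ogc:def:crs:EPSG::4326", True))
print(should_swap("AUTO", "EPSG:4326", True))
```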
226
227Creation support
228----------------
229
The GMLAS driver can write XML documents in a schema-driven way by
converting a source dataset (contrary to most other drivers with
write support, which implement the CreateLayer() and CreateFeature()
interfaces). The typical workflow is to use the read side of the GMLAS
driver to produce a SQLite/Spatialite/PostGIS database, potentially
modify the imported features, and re-export this database as a new XML
document.

The driver will identify "top-level" layers in the source dataset, and
in those layers will find which features are not referenced by other
top-level layers. As the creation of the output XML is schema-driven,
the schemas need to be available. There are two possible ways:

-  either the result of the processing of the schemas was stored as the
   4 \_ogr_\* metadata tables in the source dataset (by using the
   EXPOSE_METADATA_LAYERS=YES open option when converting the source
   .xml),
-  or the schemas can be specified at creation time with the INPUT_XSD
   creation option.
249
250By default, the driver will "wrap" the features inside a WFS 2.0
251wfs:FeatureCollection / wfs:member element. It is also possible to ask
252the driver to create instead a custom wrapping .xsd file that declares
253the ogr_gmlas:FeatureCollection / ogr_gmlas:featureMember XML elements.
254
Note that while the file resulting from the export should be well-formed
XML, there is no strong guarantee that it will validate against the
additional constraints expressed in the XML schema(s). This will depend on
the content of the features (for example, if converting from a GML file
that does not conform to the schemas, the output of the driver will
generally not validate).
261
If the input layers have geometries stored as GML content in a \_xml
suffixed field, then the driver will compare the OGR geometry built from
that XML content with the OGR geometry stored in the dedicated geometry
field of the feature. If both match, then the GML content stored in the
\_xml suffixed field will be used, so as to preserve particularities
of the initial GML content. Otherwise GML will be exported from the OGR
geometry.
269
270To increase export performance on very large databases, creating
271attribute indexes on the fields pointed by the 'layer_pkid_name'
272attribute in '_ogr_layers_metadata' might help.
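
For a SQLite intermediate database, such indexes could be created along the
lines of the sketch below, which uses Python's standard sqlite3 module on a
mock in-memory database. The 'layer_name' column, the table my_layer and the
index naming scheme are assumptions made for the example. In a real database
the table names may differ from the values stored in the metadata (for
instance after name laundering), so the statements may need adapting.

```python
import sqlite3


def create_pkid_indexes(conn):
    """Create one index per layer on the field named by layer_pkid_name."""
    rows = conn.execute(
        'SELECT layer_name, layer_pkid_name FROM "_ogr_layers_metadata" '
        "WHERE layer_pkid_name IS NOT NULL"
    ).fetchall()
    created = []
    for layer_name, pkid_name in rows:
        index_name = "idx_{}_{}".format(layer_name, pkid_name)
        conn.execute(
            'CREATE INDEX IF NOT EXISTS "{}" ON "{}" ("{}")'.format(
                index_name, layer_name, pkid_name
            )
        )
        created.append(index_name)
    conn.commit()
    return created


# Mock database standing in for the output of the read side of the driver.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "_ogr_layers_metadata" (layer_name TEXT, layer_pkid_name TEXT)'
)
conn.execute('CREATE TABLE "my_layer" (ogr_pkid TEXT, name TEXT)')
conn.execute(
    'INSERT INTO "_ogr_layers_metadata" VALUES (?, ?)', ("my_layer", "ogr_pkid")
)
print(create_pkid_indexes(conn))
```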
273
274ogr2ogr behavior
275~~~~~~~~~~~~~~~~~
276
277When using ogr2ogr / GDALVectorTranslate() to convert to XML/GML from a
278source database, there are restrictions to the options that can be used.
279Only the following options of ogr2ogr are supported:
280
281-  dataset creation options (see below)
282-  layer names
-  spatial filter through -spat option
284-  attribute filter through -where option
285
The effect of spatial and attribute filtering will only apply to
top-level layers. Sub-features selected through joins will not be
affected by those filters.
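
As an illustration, the sketch below assembles an ogr2ogr command line that
uses only the supported options listed above. The helper name
ogr2ogr_gmlas_args, the file names and the filter expression are
hypothetical.

```python
import shlex


def ogr2ogr_gmlas_args(dst, src, layers=(), where=None, spat=None, dsco=()):
    """Build an ogr2ogr argument list for an export to GMLAS."""
    args = ["ogr2ogr", "-f", "GMLAS"]
    for option in dsco:
        args += ["-dsco", option]      # dataset creation options
    if where:
        args += ["-where", where]      # attribute filter
    if spat:
        # Spatial filter given as (xmin, ymin, xmax, ymax).
        args += ["-spat"] + [str(value) for value in spat]
    args += [dst, src]
    args += list(layers)               # optional layer names come last
    return args


command = ogr2ogr_gmlas_args(
    "out.gml", "tmp.sqlite", layers=["my_layer"], where="id = 1"
)
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be handed to subprocess.run() on a system where
ogr2ogr is installed.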
289
290Dataset creation options
291~~~~~~~~~~~~~~~~~~~~~~~~
292
293The supported dataset creation options are:
294
295-  **INPUT_XSD**\ =filename(s): to specify an explicit XSD application
296   schema to use (or a list of filenames, provided they are comma
297   separated). "http://" or "https://" URLs can be used. This option is
298   not required when the source dataset has a \_ogr_other_metadata with
299   schemas and locations filled.
-  **CONFIG_FILE**\ =filename or inline XML definition: filename of an
   XML configuration file conforming to the
   `gmlasconf.xsd <https://github.com/OSGeo/gdal/blob/master/gdal/data/gmlasconf.xsd>`__
   schema. It is also possible to provide the XML content directly
   inlined provided that the very first characters are <Configuration.
305-  **LAYERS**\ =layers. Comma separated list of layers to export as
306   top-level features. The special value "{SPATIAL_LAYERS}" can also be
307   used to specify all layers that have geometries. When LAYERS is not
308   specified, the driver will identify in the source dataset "top-level"
309   layers, and in those layers will find which features are not
310   referenced by other top-level layers.
-  **SRSNAME_FORMAT**\ =SHORT/OGC_URN/OGC_URL (Only valid for GML 3
   output). Defaults to OGC_URL. If SHORT, then srsName will be in the
   form AUTHORITY_NAME:AUTHORITY_CODE. If OGC_URN, then srsName will be
   in the form urn:ogc:def:crs:AUTHORITY_NAME::AUTHORITY_CODE. If
   OGC_URL, then srsName will be in the form
   http://www.opengis.net/def/crs/AUTHORITY_NAME/0/AUTHORITY_CODE. For
   OGC_URN and OGC_URL, if the SRS has no explicit
   AXIS order, but the same SRS authority code imported with
   ImportFromEPSGA() would be treated as lat/long or northing/easting,
   then the driver will take care of coordinate order swapping.
321-  **INDENT_SIZE**\ =[0-8]. Number of spaces for each indentation level.
322   Default is 2.
-  **COMMENT**\ =string. Comment to add at top of generated XML file as
   an XML comment.
325-  **LINEFORMAT**\ =CRLF/LF. End-of-line sequence to use. Defaults to
326   CRLF on Windows and LF on other platforms.
327-  **WRAPPING**\ =WFS2_FEATURECOLLECTION/GMLAS_FEATURECOLLECTION.
328   Whether to wrap features in a wfs:FeatureCollection or in a
329   ogr_gmlas:FeatureCollection. Defaults to WFS2_FEATURECOLLECTION.
330-  **TIMESTAMP**\ =XML date time. User-specified XML dateTime value for
331   timestamp to use in wfs:FeatureCollection attribute. If not
332   specified, current date time is used. Only valid for
333   WRAPPING=WFS2_FEATURECOLLECTION.
334-  **WFS20_SCHEMALOCATION**\ =Path or URL to wfs.xsd. Only valid for
335   WRAPPING=WFS2_FEATURECOLLECTION. Default is
336   "http://schemas.opengis.net/wfs/2.0/wfs.xsd"
-  **GENERATE_XSD**\ =YES/NO. Whether to generate a .xsd file that has
   the structure of the wrapping ogr_gmlas:FeatureCollection /
   ogr_gmlas:featureMember elements. Only valid for
   WRAPPING=GMLAS_FEATURECOLLECTION. Defaults to YES.
-  **OUTPUT_XSD_FILENAME**\ =string. Wrapping .xsd filename. If not
   specified, same basename as output file with .xsd extension. Note
   that it is possible to use this option even if GENERATE_XSD=NO, so
   that the wrapping .xsd appears in the schemaLocation attribute of the
   .xml file. Only valid for WRAPPING=GMLAS_FEATURECOLLECTION.
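
To make the three SRSNAME_FORMAT forms listed above concrete, the sketch below
formats an authority name and code in each style. It only mirrors the patterns
given in the option description and does not call the driver. The function
name format_srs_name is hypothetical.

```python
def format_srs_name(authority, code, srsname_format="OGC_URL"):
    """Return the srsName string for one of the documented formats."""
    if srsname_format == "SHORT":
        return "{}:{}".format(authority, code)
    if srsname_format == "OGC_URN":
        return "urn:ogc:def:crs:{}::{}".format(authority, code)
    if srsname_format == "OGC_URL":
        return "http://www.opengis.net/def/crs/{}/0/{}".format(authority, code)
    raise ValueError("SRSNAME_FORMAT must be SHORT, OGC_URN or OGC_URL")


for fmt in ("SHORT", "OGC_URN", "OGC_URL"):
    print(fmt, format_srs_name("EPSG", 4326, fmt))
```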
346
347Examples
348--------
349
350Listing content of a data file:
351
352::
353
354   ogrinfo -ro GMLAS:my.gml
355
356Converting to PostGIS:
357
358::
359
360   ogr2ogr -f PostgreSQL PG:'host=myserver dbname=warmerda' GMLAS:my.gml -nlt CONVERT_TO_LINEAR
361
Converting to Spatialite and back to GML:
363
364::
365
   ogr2ogr -f SQLite tmp.sqlite GMLAS:in.gml -dsco SPATIALITE=YES -nlt CONVERT_TO_LINEAR -oo EXPOSE_METADATA_LAYERS=YES
367   ogr2ogr -f GMLAS out.gml tmp.sqlite
368
369See Also
370--------
371
372-  :ref:`GML <vector.gml>`: general purpose driver not requiring the
373   presence of schemas, but with limited support for complex features
374-  :ref:`NAS/ALKIS <vector.nas>`: specialized GML driver for cadastral
375   data in Germany
376
377Credits
378-------
379
380Initial implementation has been funded by the European Union's Earth
381observation programme Copernicus, as part of the tasks delegated to the
382European Environment Agency.
383
384Development of special processing of some Sensor Web Enablement (SWE)
385Common Data Model swe:DataRecord and swe:DataArray constructs has been
386funded by Bureau des Recherches Géologiques et Minières (BRGM).
387
388.. toctree::
389   :maxdepth: 1
390   :hidden:
391
392   gmlas_mapping_examples
393   gmlas_metadata_layers
394
395