.. _rfc-67:

=======================================================================================
RFC 67 : Null values in OGR
=======================================================================================

Author: Even Rouault

Contact: even.rouault at spatialys.com

Status: Adopted, implemented

Implementation version: 2.2

Summary
-------

This RFC implements the concept of a null value for the field of a
feature, in addition to the existing unset status.

Rationale
---------

Currently, OGR supports a single concept to indicate that a field
value is missing: the concept of unset field.

So assuming a JSON feature collection with 2 features whose properties
are { "foo": "bar" } and { "foo": "bar", "other_field": null }, OGR
currently reports other_field as unset in both cases.

What is proposed here is that in the first case, where the "other_field"
keyword is totally absent, we use the current unset field concept. And
for the other case, we add a new concept of null field.

This distinction between both concepts applies to all GeoJSON based
formats and protocols: GeoJSON, ElasticSearch, MongoDB, CouchDB and
Cloudant.

This also applies to GML, where a missing element would be mapped to an
unset field and an element with a xsi:nil="true" attribute would be
mapped to a null field.

Changes
-------

OGRField
~~~~~~~~

The Set structure in the "raw field" union is modified to add a third
marker:

::

       struct {
           int     nMarker1;
           int     nMarker2;
           int     nMarker3;
       } Set;

This is not strictly related to this work, but the third marker
decreases the likelihood of a genuine value being misinterpreted as
unset / null. This does not increase the size of the structure, which is
already at least 12 bytes large.

The current special value of OGRUnsetMarker = -21121 will be set in the
3 markers for an unset field (currently set in the first 2 markers).

Similarly, for the new Null state, the new value OGRNullMarker = -21122
will be set in the 3 markers.

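As an illustration of the marker scheme described above, the following
sketch models the raw field union with a reduced, hypothetical
SketchField type (the real OGRField has more members) and shows one way
the three markers could be tested and set. All sketch_* names are
invented for this example; only the two marker values are taken from
this RFC.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>

   /* Marker values quoted in this RFC. */
   #define SKETCH_UNSET_MARKER (-21121)
   #define SKETCH_NULL_MARKER  (-21122)

   /* Hypothetical, reduced stand-in for the raw field union: just enough
    * members to show that the markers overlay genuine values. */
   typedef union {
       int    Integer;
       double Real;
       struct {
           int nMarker1;
           int nMarker2;
           int nMarker3;
       } Set;
   } SketchField;

   /* True only when all three markers hold the given value. */
   static bool sketch_has_markers(const SketchField *f, int marker)
   {
       assert(f != NULL);
       return f->Set.nMarker1 == marker &&
              f->Set.nMarker2 == marker &&
              f->Set.nMarker3 == marker;
   }

   /* Writes the given value into all three markers. */
   static void sketch_put_markers(SketchField *f, int marker)
   {
       assert(f != NULL);
       f->Set.nMarker1 = marker;
       f->Set.nMarker2 = marker;
       f->Set.nMarker3 = marker;
   }

   bool sketch_is_unset(const SketchField *f)
   {
       return sketch_has_markers(f, SKETCH_UNSET_MARKER);
   }

   bool sketch_is_null(const SketchField *f)
   {
       return sketch_has_markers(f, SKETCH_NULL_MARKER);
   }

   void sketch_set_unset(SketchField *f)
   {
       sketch_put_markers(f, SKETCH_UNSET_MARKER);
   }

   void sketch_set_null(SketchField *f)
   {
       sketch_put_markers(f, SKETCH_NULL_MARKER);
   }

In this model, content whose first two integers happen to equal the
unset marker is still treated as a genuine value as long as the third
integer differs, which is the benefit the third marker is meant to
bring.
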
OGRFeature
~~~~~~~~~~

The methods int IsFieldNull( int nFieldIdx ) and void SetFieldNull( int
nFieldIdx ) are added.

The accessors GetFieldXXXX() are modified to take into account the null
case, in the same way as if they were called on an unset field, so
returning 0 for numeric field types, empty string for string fields,
FALSE for date time fields and NULL for list-based types.

A convenience method OGRFeature::IsFieldSetAndNotNull() is added to ease
the porting of existing code that previously used IsFieldSet() and
doesn't need to distinguish between the unset and null states.

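The relationship between the three predicates can be summarized with a
small model. This is a sketch, not the OGRFeature implementation:
SketchIntField and the sketch_* functions are hypothetical, and the
model assumes that IsFieldSet() keeps reporting a null field as set,
which is what the porting advice in this RFC implies.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>

   /* The three states a field can be in once this RFC is applied. */
   typedef enum {
       SKETCH_UNSET,
       SKETCH_NULL,
       SKETCH_VALUE
   } SketchState;

   /* Hypothetical integer field: value is meaningful only when
    * state == SKETCH_VALUE. */
   typedef struct {
       SketchState state;
       int         value;
   } SketchIntField;

   /* Models IsFieldSet(): assumed true for everything except unset,
    * so a null field counts as set. */
   bool sketch_is_field_set(const SketchIntField *f)
   {
       assert(f != NULL);
       return f->state != SKETCH_UNSET;
   }

   /* Models IsFieldNull(). */
   bool sketch_is_field_null(const SketchIntField *f)
   {
       assert(f != NULL);
       return f->state == SKETCH_NULL;
   }

   /* Models IsFieldSetAndNotNull(): true only for a genuine value. */
   bool sketch_is_field_set_and_not_null(const SketchIntField *f)
   {
       return sketch_is_field_set(f) && !sketch_is_field_null(f);
   }

   /* Models a numeric accessor: unset and null both read as 0. */
   int sketch_get_field_as_integer(const SketchIntField *f)
   {
       return sketch_is_field_set_and_not_null(f) ? f->value : 0;
   }

Under these assumptions, code that only guards an accessor call with the
IsFieldSet() equivalent would read 0 from a null field, whereas the
combined predicate filters out both the unset and the null case.
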
C API
-----

The following functions will be added:

::

   int    CPL_DLL OGR_F_IsFieldNull( OGRFeatureH, int );
   void   CPL_DLL OGR_F_SetFieldNull( OGRFeatureH, int );

   int    CPL_DLL OGR_F_IsFieldSetAndNotNull( OGRFeatureH, int );

Lower-level functions will be added to manipulate the raw field union
(for use mostly in core and a few drivers), instead of directly
testing/setting the markers:

::

   int    CPL_DLL OGR_RawField_IsUnset( OGRField* );
   int    CPL_DLL OGR_RawField_IsNull( OGRField* );
   void   CPL_DLL OGR_RawField_SetUnset( OGRField* );
   void   CPL_DLL OGR_RawField_SetNull( OGRField* );

SWIG bindings (Python / Java / C# / Perl) changes
-------------------------------------------------

The new methods will be mapped to SWIG.

Drivers
-------

The following drivers will be modified to take into account the unset
and NULL state as distinct states: GeoJSON, ElasticSearch, MongoDB,
CouchDB, Cloudant, GML, GMLAS, WFS.

Note: regarding the GMLAS driver, the previous behavior of having both
xxxx and xxxx_nil fields when xxxx is an optional nillable XML element
is preserved by default (can be changed through a configuration setting
in the gmlasconf.xml file). The rationale is that the GMLAS driver is
mostly used to convert to SQL capable formats that cannot distinguish
between the unset and null states, hence the need for the 2 dedicated
fields.

The CSV driver will be modified so that when the EMPTY_STRING_AS_NULL
open option is specified, the new Null state is used.

All drivers that, in their writing part, test if the source feature has
a field unset will also test if the field is null.

For SQL based drivers (PG, PGDump, Carto, MySQL, OCI, SQLite, GPKG), on
reading, a SQL NULL value will be mapped to the new Null state. On
writing, an unset field will not be mentioned in the corresponding
INSERT or UPDATE statement, whereas a Null field will be mentioned and
set to NULL. On insertion, there will generally be no difference in
behavior, unless a default value is defined on the field, in which case
it will be used by the database engine to set the value in the unset
case. On update, an unset field will not see its content updated by the
database, whereas a field set to NULL will be updated to NULL.

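The write-side rule for SQL based drivers can be sketched as follows.
This is only an illustration under simplifying assumptions: SketchColumn
and sketch_build_insert() are hypothetical, only integer columns are
handled, identifiers are not quoted, and the DEFAULT VALUES fallback for
a feature with no set field is a choice made for this example rather
than something this RFC specifies.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>
   #include <string.h>

   typedef enum { SKETCH_UNSET, SKETCH_NULL, SKETCH_VALUE } SketchState;

   /* Hypothetical integer column of a feature about to be written. */
   typedef struct {
       const char *name;
       SketchState state;
       int         value;   /* used only when state == SKETCH_VALUE */
   } SketchColumn;

   /* Appends sep followed by text to the NUL-terminated buffer buf.
    * Returns 0 on success, -1 if the result would not fit in cap. */
   static int sketch_append(char *buf, size_t cap,
                            const char *sep, const char *text)
   {
       size_t used = strlen(buf);
       int n = snprintf(buf + used, cap - used, "%s%s", sep, text);
       return (n < 0 || (size_t)n >= cap - used) ? -1 : 0;
   }

   /* Builds an INSERT statement into out. Unset columns are left out so
    * that the database can apply its default; null columns are listed
    * with an explicit NULL. Returns 0 on success, -1 on overflow. */
   int sketch_build_insert(const char *table, const SketchColumn *cols,
                           size_t n, char *out, size_t cap)
   {
       char names[256] = "";
       char values[256] = "";
       size_t i;
       int len;

       assert(table != NULL && out != NULL && cap > 0);
       for (i = 0; i < n; i++) {
           char val[32];
           if (cols[i].state == SKETCH_UNSET)
               continue;                     /* not mentioned at all */
           if (cols[i].state == SKETCH_NULL)
               snprintf(val, sizeof val, "NULL");
           else
               snprintf(val, sizeof val, "%d", cols[i].value);
           if (sketch_append(names, sizeof names,
                             names[0] ? ", " : "", cols[i].name) != 0 ||
               sketch_append(values, sizeof values,
                             values[0] ? ", " : "", val) != 0)
               return -1;
       }
       if (names[0] == '\0')
           len = snprintf(out, cap, "INSERT INTO %s DEFAULT VALUES", table);
       else
           len = snprintf(out, cap, "INSERT INTO %s (%s) VALUES (%s)",
                          table, names, values);
       return (len < 0 || (size_t)len >= cap) ? -1 : 0;
   }

An UPDATE statement would follow the same pattern: an unset column is
absent from the SET clause, while a null column appears as an assignment
to NULL.
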
Utilities
---------

No direct changes, but as the OGRFeature::DumpReadable() method is
modified so that unset fields of features are no longer displayed, the
output of ogrinfo will be affected.

Documentation
-------------

All new methods/functions are documented.

Test Suite
----------

Core changes and updated drivers will be tested.

Compatibility Issues
--------------------

All code, in GDAL source code and in calling external code, that
currently uses OGRFeature::IsFieldSet() / OGR_F_IsFieldSet() should also
be updated to use IsFieldNull() / OGR_F_IsFieldNull(), either to act
exactly as in the unset case, or to add a new appropriate behavior. A
convenience method and function OGRFeature::IsFieldSetAndNotNull() /
OGR_F_IsFieldSetAndNotNull() is added to ease the porting of existing
code.

If this is not done, existing code will see, for a null field, 0 for
numeric field types, an empty string for string fields, FALSE for date
time fields and NULL for list-based types.

On the write side, for the GeoJSON driver, in GDAL 2.1 or before, an
unset field was written as field_name: null. Starting with GDAL 2.2,
only fields explicitly set as null with OGR_F_SetFieldNull() will be
written with a null value. Unset fields of a feature will not be present
in the corresponding JSON feature element.

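The change on the GeoJSON write side can be contrasted with a small
sketch. SketchProp, sketch_write_properties() and its legacy flag are
hypothetical and invented for this comparison; the sketch handles string
values only and performs no JSON escaping.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdio.h>
   #include <string.h>

   typedef enum { SKETCH_UNSET, SKETCH_NULL, SKETCH_VALUE } SketchState;

   /* Hypothetical string property of a feature. */
   typedef struct {
       const char *name;
       SketchState state;
       const char *value;   /* used only when state == SKETCH_VALUE */
   } SketchProp;

   /* Bounded append to a NUL-terminated buffer; -1 if text does not fit. */
   static int sketch_cat(char *out, size_t cap, const char *text)
   {
       size_t used = strlen(out);
       size_t len = strlen(text);
       if (len >= cap - used)
           return -1;
       memcpy(out + used, text, len + 1);
       return 0;
   }

   /* Writes the "properties" object of a feature into out.
    * legacy == true  : unset fields are written as null (2.1 behavior).
    * legacy == false : unset fields are omitted (2.2 behavior).
    * Returns 0 on success, -1 on overflow. */
   int sketch_write_properties(const SketchProp *props, size_t n,
                               bool legacy, char *out, size_t cap)
   {
       bool first = true;
       size_t i;

       assert(out != NULL && cap > 0);
       out[0] = '\0';
       if (sketch_cat(out, cap, "{") != 0)
           return -1;
       for (i = 0; i < n; i++) {
           char member[128];
           int len;
           if (props[i].state == SKETCH_UNSET && !legacy)
               continue;                 /* absent from the output */
           if (props[i].state == SKETCH_VALUE)
               len = snprintf(member, sizeof member, "%s \"%s\": \"%s\"",
                              first ? "" : ",", props[i].name,
                              props[i].value);
           else
               len = snprintf(member, sizeof member, "%s \"%s\": null",
                              first ? "" : ",", props[i].name);
           if (len < 0 || (size_t)len >= sizeof member ||
               sketch_cat(out, cap, member) != 0)
               return -1;
           first = false;
       }
       return sketch_cat(out, cap, first ? "}" : " }");
   }

With a feature holding foo = "bar" and an unset other_field, the legacy
mode of this sketch produces { "foo": "bar", "other_field": null }, while
the new mode produces { "foo": "bar" }. The null member reappears in the
new mode only once other_field is explicitly set to null.
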
MIGRATION_GUIDE.TXT is updated to discuss those compatibility issues.

Related ticket
--------------

None

Implementation
--------------

The implementation will be done by Even Rouault (Spatialys) and be
sponsored by Safe Software.

The proposed implementation is available in
`https://github.com/rouault/gdal2/tree/rfc67 <https://github.com/rouault/gdal2/tree/rfc67>`__

Voting history
--------------

+1 from JukkaR, DanielM, HowardB and EvenR