.. _rfc-67:

=======================================================================================
RFC 67 : Null values in OGR
=======================================================================================

Author: Even Rouault

Contact: even.rouault at spatialys.com

Status: Adopted, implemented

Implementation version: 2.2

Summary
-------

This RFC implements the concept of a null value for the field of a feature,
in addition to the existing unset status.

Rationale
---------

Currently, OGR supports a single concept to indicate that a field
value is missing: the concept of an unset field.

So, assuming a JSON feature collection with 2 features whose properties
are { "foo": "bar" } and { "foo": "bar", "other_field": null }, OGR
currently reports other_field as unset in both cases.

What is proposed here is that in the first case, where the "other_field"
keyword is totally absent, we use the current unset field concept. And
for the other case, we add a new concept of null field.

This distinction between both concepts applies to all GeoJSON-based
formats and protocols, so GeoJSON, ElasticSearch, MongoDB, CouchDB,
Cloudant.

This also applies to GML, where a missing element would
be mapped to an unset field and an element with a xsi:nil="true" attribute
would be mapped to a null field.

Changes
-------

OGRField
~~~~~~~~

The Set structure in the "raw field" union is modified to add a third
marker:

::

       struct {
           int     nMarker1;
           int     nMarker2;
           int     nMarker3;
       } Set;

This is not strictly related to this work, but the 3rd marker decreases
the likelihood of a genuine value being misinterpreted as unset / null.
This does not increase the size of the structure, which is already at
least 12 bytes large.
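
As a rough illustration of how the three markers could be set and tested,
here is a minimal, self-contained C sketch. It does not use the real
OGRField definition: the SketchField union and the sketch_* helpers are
hypothetical names introduced only for this example, and the two constants
simply reuse the OGRUnsetMarker and OGRNullMarker values given in this
section.

::

   /* Hypothetical constants reusing the RFC marker values. */
   #define SKETCH_UNSET_MARKER (-21121)  /* same value as OGRUnsetMarker */
   #define SKETCH_NULL_MARKER  (-21122)  /* same value as OGRNullMarker */

   /* Simplified stand-in for the "raw field" union (assumption, not the
    * real OGRField, which has more members). */
   typedef union {
       int    Integer;
       double Real;
       struct {
           int nMarker1;
           int nMarker2;
           int nMarker3;
       } Set;
   } SketchField;

   /* Write the same marker value into all three slots. */
   static void sketch_set_markers(SketchField *f, int marker)
   {
       f->Set.nMarker1 = marker;
       f->Set.nMarker2 = marker;
       f->Set.nMarker3 = marker;
   }

   /* A state is recognized only when all three slots hold the marker. */
   static int sketch_has_markers(const SketchField *f, int marker)
   {
       return f->Set.nMarker1 == marker &&
              f->Set.nMarker2 == marker &&
              f->Set.nMarker3 == marker;
   }

   static void sketch_set_unset(SketchField *f)
   {
       sketch_set_markers(f, SKETCH_UNSET_MARKER);
   }

   static void sketch_set_null(SketchField *f)
   {
       sketch_set_markers(f, SKETCH_NULL_MARKER);
   }

   static int sketch_is_unset(const SketchField *f)
   {
       return sketch_has_markers(f, SKETCH_UNSET_MARKER);
   }

   static int sketch_is_null(const SketchField *f)
   {
       return sketch_has_markers(f, SKETCH_NULL_MARKER);
   }

In this sketch, a field whose first slot happens to hold -21121 while the
other two slots hold different values is reported as neither unset nor
null, which is the reduced risk of misinterpretation that the third marker
is meant to provide.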

The current special value of OGRUnsetMarker = -21121 will be set in the
3 markers for an unset field (it is currently set in the first 2 markers).

Similarly, for the new Null state, the new value OGRNullMarker = -21122
will be set in the 3 markers.

OGRFeature
~~~~~~~~~~

The methods int IsFieldNull( int nFieldIdx ) and void SetFieldNull( int
nFieldIdx ) are added.

The accessors GetFieldXXXX() are modified to take into account the null
case, in the same way as if they were called on an unset field, so
returning 0 for numeric field types, empty string for string fields,
FALSE for date time fields and NULL for list-based types.

A convenience method OGRFeature::IsFieldSetAndNotNull() is added to ease
the porting of existing code that previously used IsFieldSet() and
doesn't need to distinguish between the unset and null states.

C API
-----

The following functions will be added:

::

   int    CPL_DLL OGR_F_IsFieldNull( OGRFeatureH, int );
   void   CPL_DLL OGR_F_SetFieldNull( OGRFeatureH, int );

   int    CPL_DLL OGR_F_IsFieldSetAndNotNull( OGRFeatureH, int );

Lower-level functions will be added to manipulate the raw field
union directly (for use mostly in core and a few drivers), instead of
testing / setting the markers by hand:

::

   int    CPL_DLL OGR_RawField_IsUnset( OGRField* );
   int    CPL_DLL OGR_RawField_IsNull( OGRField* );
   void   CPL_DLL OGR_RawField_SetUnset( OGRField* );
   void   CPL_DLL OGR_RawField_SetNull( OGRField* );

SWIG bindings (Python / Java / C# / Perl) changes
-------------------------------------------------

The new methods will be mapped to SWIG.

Drivers
-------

The following drivers will be modified to take into account the unset
and NULL states as distinct states: GeoJSON, ElasticSearch, MongoDB,
CouchDB, Cloudant, GML, GMLAS, WFS.
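
To illustrate what treating the two states as distinct can mean on the
write side of a JSON-based driver, here is a minimal C sketch that
serializes a list of attributes into a "properties" object: unset
attributes are left out and null attributes are written as null. This is
not the GeoJSON driver code. The SketchState, SketchAttr and
sketch_write_properties names are hypothetical, only string values are
handled, and no JSON escaping is performed.

::

   #include <stdio.h>

   /* Hypothetical tri-state for an attribute value. */
   typedef enum { SKETCH_UNSET, SKETCH_NULL, SKETCH_SET } SketchState;

   typedef struct {
       const char *name;
       SketchState state;
       const char *value;   /* used only when state == SKETCH_SET */
   } SketchAttr;

   /* Write a JSON object into out. Unset attributes are skipped and null
    * attributes are written as JSON null. Values are not escaped.
    * Returns 0 on success, -1 if the buffer is too small. */
   static int sketch_write_properties(const SketchAttr *attrs, int count,
                                      char *out, size_t outSize)
   {
       size_t used = 0;
       int written = 0;
       int n = snprintf(out, outSize, "{");
       if (n < 0 || (size_t)n >= outSize)
           return -1;
       used = (size_t)n;

       for (int i = 0; i < count; i++) {
           if (attrs[i].state == SKETCH_UNSET)
               continue;   /* absent from the output, like a missing key */

           const char *sep = written ? ", " : " ";
           if (attrs[i].state == SKETCH_NULL)
               n = snprintf(out + used, outSize - used,
                            "%s\"%s\": null", sep, attrs[i].name);
           else
               n = snprintf(out + used, outSize - used,
                            "%s\"%s\": \"%s\"", sep, attrs[i].name,
                            attrs[i].value);
           if (n < 0 || (size_t)n >= outSize - used)
               return -1;
           used += (size_t)n;
           written++;
       }

       /* Close the object, with a space only if something was written. */
       n = snprintf(out + used, outSize - used, "%s}", written ? " " : "");
       if (n < 0 || (size_t)n >= outSize - used)
           return -1;
       return 0;
   }

With an attribute list where foo is set to "bar" and other_field is unset,
this sketch writes { "foo": "bar" }. Switching other_field to the null
state gives { "foo": "bar", "other_field": null }, which mirrors the two
features used as an example in the Rationale.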

Note: regarding the GMLAS driver, the previous behavior of having both
xxxx and xxxx_nil fields when xxxx is an optional nillable XML element
is preserved by default (this can be changed through a configuration setting
in the gmlasconf.xml file). The rationale is that the GMLAS driver is
mostly used to convert to SQL-capable formats that cannot distinguish
between the unset and null states, hence the need for the 2 dedicated
fields.

The CSV driver will be modified so that when the EMPTY_STRING_AS_NULL open
option is specified, the new Null state is used.

All drivers that, in their writing part, test whether a field of the source
feature is unset will also test whether the field is null.

For SQL-based drivers (PG, PGDump, Carto, MySQL, OCI, SQLite, GPKG), on
reading, a SQL NULL value will be mapped to the new Null state. On
writing, an unset field will not be mentioned in the corresponding
INSERT or UPDATE statement, whereas a Null field will be mentioned and
set to NULL. On insertion, there will generally be no difference in
behavior, unless a default value is defined on the field, in which case
it will be used by the database engine to set the value in the unset
case. On update, an unset field will not see its content updated by the
database, whereas a field set to NULL will be updated to NULL.

Utilities
---------

No direct changes, but as the OGRFeature::DumpReadable() method is
modified so that unset fields of features are no longer displayed, the
output of ogrinfo will be affected.

Documentation
-------------

All new methods/functions are documented.

Test Suite
----------

Core changes and updated drivers will be tested.

Compatibility Issues
--------------------

All code, in GDAL source code and in calling external code, that
currently uses OGRFeature::IsFieldSet() / OGR_F_IsFieldSet() should also
be updated to use IsFieldNull() / OGR_F_IsFieldNull(), either to act
exactly as in the unset case, or to add a new appropriate behavior. A
convenience method and function OGRFeature::IsFieldSetAndNotNull() /
OGR_F_IsFieldSetAndNotNull() is added to ease the porting of existing
code.

If this is not done, existing code that reads a null field will see 0 for
numeric field types, empty string for string fields, FALSE for date time
fields and NULL for list-based types.

On the write side, for the GeoJSON driver, in GDAL 2.1 or before, an
unset field was written as field_name: null. Starting with GDAL 2.2,
only fields explicitly set as null with OGR_F_SetFieldNull() will be
written with a null value. Unset fields of a feature will not be present
in the corresponding JSON feature element.

MIGRATION_GUIDE.TXT is updated to discuss those compatibility issues.

Related ticket
--------------

None

Implementation
--------------

The implementation will be done by Even Rouault (Spatialys) and be
sponsored by Safe Software.

The proposed implementation is available in
`https://github.com/rouault/gdal2/tree/rfc67 <https://github.com/rouault/gdal2/tree/rfc67>`__

Voting history
--------------

+1 from JukkaR, DanielM, HowardB and EvenR