.. _rfc-31:

================================================================================
RFC 31: OGR 64bit Integer Fields and FIDs
================================================================================

Authors: Frank Warmerdam, Even Rouault

Contact: warmerdam@pobox.com, even dot rouault at spatialys.com

Status: Adopted, implemented in GDAL 2.0

Summary
-------

This RFC addresses steps to upgrade OGR to support 64bit integer fields
and feature ids. Many feature data formats support wide integers, and
the inability to transform these through OGR causes increasing numbers
of problems.

.. _64bit-fid-feature-index-and-feature-count:

64bit FID, feature index and feature count
------------------------------------------

Feature ids will be handled as type "GIntBig" instead of "long"
internally. This will include the nFID field of the OGRFeature. The
existing GetFID() and SetFID() methods on the OGRFeature use type long
and are changed to return (respectively accept) GIntBig instead. The
change of return type for GetFID() will require application code to
adapt carefully to avoid potential issues (for example if GetFID() is
used in a printf-like expression). The SetFID() change should be mostly
transparent. So the changes in the OGRFeature class are:

::

   GIntBig GetFID();
   OGRErr SetFID( GIntBig nFID );

At the C API level:

::

   GIntBig CPL_DLL OGR_F_GetFID( OGRFeatureH );
   OGRErr CPL_DLL OGR_F_SetFID( OGRFeatureH, GIntBig );

Note that the old interfaces using "long" are already 64bit on 64bit
operating systems (excluding Windows target compilers where long is
32bit even on 64bit builds), so there is little harm to applications
continuing to use these interfaces on 64bit operating systems.
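
The caution about using GetFID() in printf-like expressions can be
illustrated with a minimal, self-contained sketch. GIntBig and
CPL_FRMT_GIB are normally provided by cpl_port.h; the stand-in
definitions below are assumptions made only so the fragment compiles
without GDAL headers. FormatFID() and FitsIn32Bit() are hypothetical
helpers, not part of the OGR API.

::

   #include <cstdio>
   #include <string>

   // Stand-ins for what cpl_port.h normally provides (assumption:
   // GIntBig is a 64bit signed integer formatted with "%lld").
   typedef long long GIntBig;
   #define CPL_FRMT_GIB "%lld"

   // Hypothetical helper: format a FID with the 64bit-aware format
   // macro. Passing a GIntBig to a "%d" or "%ld" conversion is
   // undefined behaviour where int/long is 32bit.
   std::string FormatFID( GIntBig nFID )
   {
       char szBuf[32];
       std::snprintf( szBuf, sizeof(szBuf), CPL_FRMT_GIB, nFID );
       return szBuf;
   }

   // Hypothetical helper: tell whether a FID still fits the range of a
   // 32bit signed integer, i.e. whether an explicit downcast is safe.
   bool FitsIn32Bit( GIntBig nFID )
   {
       return nFID >= -2147483647LL - 1 && nFID <= 2147483647LL;
   }

Application code that must keep 32bit variables could use a check such
as FitsIn32Bit() before downcasting explicitly, and CPL_FRMT_GIB
everywhere a FID or feature count is formatted.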

A layer that can discover in a relatively cheap way that it holds
features with 64bit FID should set the OLMD_FID64 metadata item to
"YES", so ogr2ogr can pass the FID64 creation option to drivers that
support it.

The OGRLayer class allows several operations based on the FID. The
signature of these will be *altered* to accept GIntBig instead of long.
In theory this should not require any changes to application code since
long can be converted to GIntBig losslessly. However, all existing OGR
drivers require changes, including private drivers. This will also
result in a backwards incompatible change in the C ABI. While we are at
it, we want GetFeatureCount() to be able to return more than 2 billion
records (it currently returns a 32 bit integer), and thus it will return
GIntBig. Similarly to GetFID(), this change of return type will require
caution in application code.

So at the OGRLayer C++ class level:

::

   virtual OGRFeature *GetFeature( GIntBig nFID );
   virtual OGRErr      DeleteFeature( GIntBig nFID );
   virtual OGRErr      SetNextByIndex( GIntBig nIndex );
   virtual GIntBig     GetFeatureCount( int bForce = TRUE );

At the C API level:

::

   OGRFeatureH CPL_DLL OGR_L_GetFeature( OGRLayerH, GIntBig );
   OGRErr CPL_DLL OGR_L_DeleteFeature( OGRLayerH, GIntBig );
   OGRErr CPL_DLL OGR_L_SetNextByIndex( OGRLayerH, GIntBig );
   GIntBig CPL_DLL OGR_L_GetFeatureCount( OGRLayerH, int );

.. _64bit-fields:

64bit Fields
------------

New field types will be introduced for 64bit integers:

::

   OFTInteger64 = 12
   OFTInteger64List = 13

The OGRField union will be extended to include:

::

   GIntBig     Integer64;
   struct {
       int     nCount;
       GIntBig *paList;
   } Integer64List;

The OGRFeature class will be extended with these new methods:

::

   GIntBig        GetFieldAsInteger64( int i );
   GIntBig        GetFieldAsInteger64( const char *pszFName );
   const GIntBig *GetFieldAsInteger64List( const char *pszFName,
                                           int *pnCount );
   const GIntBig *GetFieldAsInteger64List( int i, int *pnCount );

   void SetField( int i, GIntBig nValue );
   void SetField( int i, int nCount, const GIntBig * panValues );
   void SetField( const char *pszFName, GIntBig nValue );
   void SetField( const char *pszFName, int nCount,
                  const GIntBig * panValues );

At the C level, the following functions are added:

::

   GIntBig CPL_DLL OGR_F_GetFieldAsInteger64( OGRFeatureH, int );
   const GIntBig CPL_DLL *OGR_F_GetFieldAsInteger64List( OGRFeatureH, int, int * );
   void CPL_DLL OGR_F_SetFieldInteger64( OGRFeatureH, int, GIntBig );
   void CPL_DLL OGR_F_SetFieldInteger64List( OGRFeatureH, int, int, const GIntBig * );

Furthermore, the new interfaces will internally support setting/getting
integer fields, and the integer field methods will support
getting/setting 64bit integer fields so that one case can be used for
both field types where convenient (except GetFieldAsInteger64List(),
which can only operate on Integer64List fields).

A GDAL_DMD_CREATIONFIELDDATATYPES = "DMD_CREATIONFIELDDATATYPES" driver
metadata item is added so that drivers can declare the field types they
support on creation. For example "Integer Integer64 Real
String Date DateTime Time IntegerList Integer64List RealList StringList
Binary". Commonly used drivers will be updated to declare it.
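
As a minimal sketch of how a caller might consult the
space-separated list declared through GDAL_DMD_CREATIONFIELDDATATYPES,
the following fragment tests whether a given type name is present.
DeclaresFieldType() is a hypothetical helper, not part of GDAL, and the
sketch assumes the metadata string has already been fetched from the
driver (for instance with GetMetadataItem()).

::

   #include <sstream>
   #include <string>

   // Hypothetical helper: return true if osType appears as a whole
   // token in the space-separated list of field types declared by a
   // driver. Whole-token comparison avoids "Integer" matching inside
   // "Integer64" or "IntegerList".
   bool DeclaresFieldType( const std::string &osDeclared,
                           const std::string &osType )
   {
       std::istringstream oStream( osDeclared );
       std::string osToken;
       while( oStream >> osToken )
       {
           if( osToken == osType )
               return true;
       }
       return false;
   }

A plain substring search would be simpler but would report Integer as
supported for a driver that only declares Integer64, which is why the
sketch splits the list into tokens first.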

OGR SQL
-------

A SWQ_INTEGER64 internal type is added so as to be able to map to/from
OFTInteger64 fields. The int_value member of the swq_expr_node class is
extended from int to GIntBig (so both SWQ_INTEGER and SWQ_INTEGER64
refer to that member).

.. _python--java--c--perl-changes:

Python / Java / C# / perl Changes
---------------------------------

The following changes have been done:

- GetFID(), GetFeatureCount() have been changed to return a 64 bit
  integer
- SetFID(), GetFeature(), DeleteFeature(), SetNextByIndex() have been
  changed to accept a 64 bit integer as argument
- GetFieldAsInteger64() and SetFieldInteger64() have been added
- In Python, GetField(), SetField() can accept/return 64 bit values
- GetFieldAsInteger64List() and SetFieldInteger64List() have been added
  (Python only, due to lack of relevant typemaps for other languages,
  but could potentially be done)

The change in return type of GetFID() and GetFeatureCount() might cause
warnings at compilation time in some languages (Java YES, Python not
relevant, Perl/C# ?). All changes to existing methods are an ABI
change for Java bytecode.

Utilities
---------

ogr2ogr and ogrinfo are updated to support the new 64bit interfaces.

A new option, -mapFieldType, is added to ogr2ogr. For example,
-mapFieldType Integer64=Integer,Date=String means that Integer64 fields
in the source layer should be created as Integer, and Date fields as
String. ogr2ogr will also warn if attempting to create a field in an
output driver that advertises a GDAL_DMD_CREATIONFIELDDATATYPES metadata
item that does not mention the required field type. For Integer64
fields, if the type is not advertised in the
GDAL_DMD_CREATIONFIELDDATATYPES metadata item, or if
GDAL_DMD_CREATIONFIELDDATATYPES is missing, conversion to Real is done
by default with a warning.
ogr2ogr will also query the source layer to
check if the OLMD_FID64 metadata item is declared and if the output
driver has the FID64 layer creation option, in which case it will set
it.

Documentation
-------------

New/modified API are documented. Updates in drivers with new
options/behaviours are documented. MIGRATION_GUIDE.TXT extended with a
section related to this RFC. OGR API updated.

File Formats
------------

As appropriate, existing OGR drivers have been updated to support the
new/updated interfaces. In particular an effort has been made to update
a few database drivers to support 64bit integer columns for use as
feature id, though they don't always create FID columns as 64bit by
default when creating new layers as this may cause problems for other
applications.

Apart from the mechanical changes due to interface changes, the detailed
list of changes is:

- Shapefile: OFTInteger fields are created by default with a width of 9
  characters, so as to be unambiguously read as OFTInteger (if an
  integer that requires 10 or 11 characters is written, the field is
  dynamically extended, as has been the case for a few versions).
  OFTInteger64 fields are created by default with a width of 18 digits,
  so as to be unambiguously read as OFTInteger64, and extended to 19 or
  20 if needed. Integer fields of width between 10 and 18 will be read
  as OFTInteger64. Above they will be treated as OFTReal. In previous
  GDAL versions, Integer fields were created with a default width of
  10, and thus will now be read as OFTInteger64.
  An open option,
  DETECT_TYPE=YES, can be specified so that OGR does a full scan of the
  DBF file to see if integer fields of size 10 or 11 hold 32 bit or 64
  bit values and adjusts the type accordingly (and the same for integer
  fields of size 19 or 20: in case of overflow of a 64 bit integer,
  OFTReal is chosen).
- PG: updated to read and create OFTInteger64 as INT8 and
  OFTInteger64List as bigint[]. 64 bit FIDs are supported. By default,
  on layer creation, the FID field is created as a SERIAL (32 bit
  integer) to avoid compatibility issues. The FID64=YES creation option
  can be passed to create it as a BIGSERIAL instead. If needed, the
  driver will dynamically alter the schema to extend a 32 bit integer
  FID field to 64 bit. GetFeatureCount() modified to return 64 bit
  values. OLMD_FID64 = "YES" advertised as soon as the FID column is 64
  bit.
- PGDump: Integer64, Integer64List and 64 bit FID supported in
  read/write. FID64=YES creation option available.
- GeoJSON: Integer64, Integer64List and 64 bit FID supported in
  read/write. The 64 bit variants are reported only if needed,
  otherwise OFTInteger/OFTIntegerList is used. OLMD_FID64 = "YES"
  advertised if needed.
- CSV: Integer64 supported in read/write, including the autodetection
  feature of field types.
- GPKG: Integer64 and 64 bit FID supported in read/write. Conforming
  with the GeoPackage spec, "INT" or "INTEGER" columns are considered
  64 bits, whereas "MEDIUMINT" is considered 32 bit. OLMD_FID64 = "YES"
  advertised as soon as MAX(fid_column) is 64 bit. GetFeatureCount()
  modified to return 64 bit values.
- SQLite: Integer64 and 64 bit FID supported in read/write. On write,
  Integer64 fields are created as "BIGINT" and on read BIGINT or INT8
  are considered as Integer64.
  However it might be possible that databases
  produced by other tools are created with "INTEGER" and hold 64 bit
  values, in which case OGR will not be able to detect it. The
  OGR_PROMOTE_TO_INTEGER64=YES configuration option can then be passed
  to work around that issue. OLMD_FID64 = "YES" advertised as soon as
  MAX(fid_column) is 64 bit. GetFeatureCount() modified to return 64
  bit values.
- MySQL: Integer64 and 64 bit FID supported in read/write. Similarly to
  PG, FID column is created as 32 bit by default, unless FID64=YES
  creation option is specified. OLMD_FID64 = "YES" advertised as soon
  as the FID column is 64 bit. GetFeatureCount() modified to return 64
  bit values.
- OCI: Integer64 and 64 bit FID supported in read/write. Detecting
  Integer/Integer64 on read is tricky since there's only a NUMBER SQL
  type with a field width. It is assumed that if the width is <= 9 or
  if it is the unspecified value (38), then it is an Integer. On
  creation, OGR will set a width of 20 for OFTInteger64, so a NUMBER
  without decimal part and with a width of 20 will be considered as an
  Integer64.
- MEM: Integer64 and 64 bit FID supported in read/write.
  GetFeatureCount() modified to return 64 bit values.
- VRT: Integer64, Integer64List and 64 bit FID supported in read/write.
  GetFeatureCount() modified to return 64 bit values.
- JML: Integer64 supported on creation (created as "OBJECT"). On read,
  returned as String.
- GML: Integer64, Integer64List and 64 bit FID supported in read/write.
  GetFeatureCount() modified to return 64 bit values.
- WFS: Integer64, Integer64List and 64 bit FID supported in read/write.
  GetFeatureCount() modified to return 64 bit values.
- CartoDB: Integer64 supported on creation. On read returned as Real
  (CartoDB only advertises a 'Number' type). GetFeatureCount() modified
  to return 64 bit values.
- XLSX: Integer64 supported in read/write.
- ODS: Integer64 supported in read/write.
- MSSQLSpatial: GetFeatureCount() modified to return 64 bit values. No
  Integer64 support implemented although could likely be done.
- OSM: FID is now always set even when sizeof(long) != 8
- LIBKML: KML 'uint' advertised as Integer64.
- MITAB: Change the way FIDs of Seamless tables are generated to make it
  more robust and accept an arbitrary number of index tables made of an
  arbitrary number of features, by using the full 64bit width of IDs.

Test Suite
----------

The test suite is extended to test the new capabilities:

- core SetField/GetField methods
- updated drivers: Shapefile, PG, GeoJSON, CSV, GPKG, SQLite, MySQL,
  VRT, GML, XLSX, ODS, MITAB
- OGR SQL
- option -mapFieldType of ogr2ogr

Compatibility Issues
--------------------

Driver Code Changes
~~~~~~~~~~~~~~~~~~~

- All drivers implementing SetNextByIndex(), DeleteFeature(),
  GetFeature(), GetFeatureCount() will need to change their prototype
  and do modest changes.

- Drivers supporting CreateField() likely ought to be extended to
  support OFTInteger64 as an integer/real/string field if nothing else
  is available (and if bApproxOK is TRUE). ogr2ogr will convert
  Integer64 to Real if Integer64 support is not advertised.

- Drivers reporting FIDs via Debug statements, printf's or using
  sprintf-like statements to format them for output have been updated
  to use CPL_FRMT_GIB to format the FID. Failure to make these changes
  may result in code crashing. Due to the use of GCC annotation to
  advertise printf()-like formatting syntax in CPL functions, we are
  reasonably confident to have done the required changes in in-tree
  drivers (except in some proprietary drivers, like SDE, IDB, INGRES,
  ArcObjects, where this couldn't be compile-checked). The same holds
  true for GetFeatureCount().

Application Code
~~~~~~~~~~~~~~~~

- Application code may need to be updated to use GIntBig for FIDs and
  feature count in order to avoid warnings about downcasting.

- Application code formatting FIDs or feature count using printf-like
  facilities may also need to be changed to downcast explicitly or to
  use CPL_FRMT_GIB.

- Application code may need to add Integer64 handling in order to
  utilize wide fields.

Behavioral Changes
~~~~~~~~~~~~~~~~~~

- Wide integer fields that were previously treated as "real" or Integer
  by the shapefile driver will now be treated as Integer64 which will
  likely not work with some applications, and translation to other
  formats may fail.

Related tickets
---------------

- `#3747 OGR FID needs to be 64
  bit <http://trac.osgeo.org/gdal/ticket/3747>`__
- `#3615 Shapefile : A 10-digit value doesn't necessarily fit into a 32
  bit integer. <http://trac.osgeo.org/gdal/ticket/3615>`__
- `#3150 Precision Problem for Numeric on OGR/OCI
  driver <http://trac.osgeo.org/gdal/ticket/3150>`__

Related topics out of scope of this RFC
---------------------------------------

The possibility of having a Numeric type that corresponds to the
matching SQL type, i.e. a decimal number with an arbitrary number of
significant figures, has been considered. In OGR, this could be
implemented as a full type like Integer, Integer64 etc., or possibly as
a subtype of String (see `RFC 50: OGR field
subtypes <./rfc50_ogr_field_subtype>`__). The latter approach would be
easier to implement and mostly useful for lossless conversion between
database drivers (and shapefile). The former approach would require more
work, and would ideally involve OGR SQL support, which would require
supporting arithmetic of arbitrary length.
The use cases for such a
numeric type have been considered marginal enough to set that aside for
now.

Implementation
--------------

Implementation will be done by Even Rouault
(`Spatialys <http://spatialys.com>`__), and sponsored by `LINZ (Land
Information New Zealand) <http://www.linz.govt.nz/>`__.

The proposed implementation lies in the "rfc31_64bit" branch of the
`https://github.com/rouault/gdal2/tree/rfc31_64bit <https://github.com/rouault/gdal2/tree/rfc31_64bit>`__
repository.

The list of changes:
`https://github.com/rouault/gdal2/compare/rfc31_64bit <https://github.com/rouault/gdal2/compare/rfc31_64bit>`__

Voting history
--------------

+1 from JukkaR, DanielM, TamasS, HowardB and EvenR