.. _rfc-31:

================================================================================
RFC 31: OGR 64bit Integer Fields and FIDs
================================================================================

Authors: Frank Warmerdam, Even Rouault

Contact: warmerdam@pobox.com, even dot rouault at spatialys.com

Status: Adopted, implemented in GDAL 2.0

Summary
-------

This RFC addresses steps to upgrade OGR to support 64bit integer fields
and feature ids. Many feature data formats support wide integers, and
the inability to transform these through OGR causes increasing numbers
of problems.

.. _64bit-fid-feature-index-and-feature-count:

64bit FID, feature index and feature count
------------------------------------------

Feature ids will be handled as type "GIntBig" instead of "long"
internally. This includes the nFID field of the OGRFeature. The
existing GetFID() and SetFID() methods on the OGRFeature use type long
and are changed to return (respectively accept) GIntBig instead. The
change of return type for GetFID() will require application code to
adapt carefully to avoid potential issues (for example if GetFID() is
used in a printf-like expression). The SetFID() change should be mostly
transparent. So the changes in the OGRFeature class are:

::

     GIntBig  GetFID();
     OGRErr   SetFID(GIntBig nFID );

At the C API level:

::

     GIntBig CPL_DLL OGR_F_GetFID( OGRFeatureH );
     OGRErr CPL_DLL OGR_F_SetFID( OGRFeatureH, GIntBig );

Note that the old interfaces using "long" are already 64bit on 64bit
operating systems (excluding Windows target compilers where long is
32bit even on 64bit builds), so there is little harm to applications
continuing to use these interfaces on 64bit operating systems.
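
To make the 32bit "long" case concrete, here is a minimal Python sketch of
what happens when a wide FID ends up in a 32bit signed integer. It is not
GDAL code: as_c_int32 and fits_in_int32 are hypothetical helper names, and
the sample FID is invented for illustration.

.. code-block:: python

    import ctypes

    INT32_MIN = -2**31
    INT32_MAX = 2**31 - 1


    def as_c_int32(value):
        """Emulate storing value in a 32bit signed C integer (wraps silently)."""
        return ctypes.c_int32(value).value


    def fits_in_int32(value):
        """Return True if value survives a round trip through a 32bit integer."""
        return INT32_MIN <= value <= INT32_MAX


    # Hypothetical FID just above the 32bit range.
    fid = 2**32 + 5

    print(fits_in_int32(fid))  # False: needs a 64bit type such as GIntBig
    print(as_c_int32(fid))     # 5: the upper bits are silently dropped

A range check of this kind before any explicit narrowing is one way to turn
silent truncation into a detectable error.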

A layer that can discover in a relatively cheap way that it holds
features with 64bit FID should set the OLMD_FID64 metadata item to
"YES", so ogr2ogr can pass the FID64 creation option to drivers that
support it.

The OGRLayer class allows several operations based on the FID. The
signature of these will be *altered* to accept GIntBig instead of long.
In theory this should not require any changes to application code since
long can be converted to GIntBig losslessly. However, all existing OGR
drivers require changes, including private drivers. This will also
result in a backwards incompatible change in the C ABI. While we are at
it, we want GetFeatureCount() to be able to return more than 2 billion
records (it currently returns a 32 bit integer), and thus it will return
GIntBig. Similarly to GetFID(), this change of return type will require
caution in application code.

So at the OGRLayer C++ class level:

::

       virtual OGRFeature *GetFeature( GIntBig nFID );
       virtual OGRErr      DeleteFeature( GIntBig nFID );
       virtual OGRErr      SetNextByIndex( GIntBig nIndex );
       virtual GIntBig     GetFeatureCount( int bForce = TRUE );

At the C API level:

::

     OGRFeatureH CPL_DLL OGR_L_GetFeature( OGRLayerH, GIntBig );
     OGRErr CPL_DLL OGR_L_DeleteFeature( OGRLayerH, GIntBig );
     OGRErr CPL_DLL OGR_L_SetNextByIndex( OGRLayerH, GIntBig );
     GIntBig CPL_DLL OGR_L_GetFeatureCount( OGRLayerH, int );

.. _64bit-fields:

64bit Fields
------------

New field types will be introduced for 64bit integers:

::

      OFTInteger64 = 12
      OFTInteger64List = 13

The OGRField union will be extended to include:

::

       GIntBig     Integer64;
       struct {
           int nCount;
           GIntBig *paList;
       } Integer64List;

The OGRFeature class will be extended with these new methods:

::

       GIntBig             GetFieldAsInteger64( int i );
       GIntBig             GetFieldAsInteger64( const char *pszFName );
       const GIntBig      *GetFieldAsInteger64List( const char *pszFName,
                                                  int *pnCount );
       const GIntBig      *GetFieldAsInteger64List( int i, int *pnCount );

       void                SetField( int i, GIntBig nValue );
       void                SetField( int i, int nCount, const GIntBig * panValues );
       void                SetField( const char *pszFName, GIntBig nValue );
       void                SetField( const char *pszFName, int nCount,
                                     const GIntBig * panValues );

At the C level, the following functions are added:

::

       GIntBig CPL_DLL OGR_F_GetFieldAsInteger64( OGRFeatureH, int );
       const GIntBig CPL_DLL *OGR_F_GetFieldAsInteger64List( OGRFeatureH, int, int * );
       void   CPL_DLL OGR_F_SetFieldInteger64( OGRFeatureH, int, GIntBig );
       void   CPL_DLL OGR_F_SetFieldInteger64List( OGRFeatureH, int, int, const GIntBig * );

Furthermore, the new interfaces will internally support setting/getting
integer fields, and the integer field methods will support
getting/setting 64bit integer fields, so that one case can be used for
both field types where convenient (except GetFieldAsInteger64List(), which
can only operate on Integer64List fields).
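
A rough Python model of this interchange is sketched below. It assumes that
reading a 64bit value through the 32bit getter clamps to the 32bit range
rather than wrapping, which this RFC does not specify. The Field class and
its method names are hypothetical stand-ins, not the OGRFeature API.

.. code-block:: python

    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


    class Field:
        """Toy field holding either an Integer or an Integer64 value."""

        def __init__(self, field_type, value):
            if field_type not in ("Integer", "Integer64"):
                raise ValueError("unsupported field type: %s" % field_type)
            self.field_type = field_type
            self.value = value

        def get_as_integer64(self):
            # Works for both types: a 32bit value always fits in 64 bits.
            return self.value

        def get_as_integer(self):
            # Assumption: out-of-range 64bit values are clamped, not wrapped.
            return max(INT32_MIN, min(INT32_MAX, self.value))


    small = Field("Integer", 42)
    wide = Field("Integer64", 12345678901)

    print(small.get_as_integer64())  # 42
    print(wide.get_as_integer64())   # 12345678901
    print(wide.get_as_integer())     # 2147483647 (clamped under the assumption)

Code that only calls the 64bit getter can treat both field types the same
way, which is the convenience the paragraph above describes.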

A GDAL_DMD_CREATIONFIELDDATATYPES = "DMD_CREATIONFIELDDATATYPES" driver
metadata item is added so that drivers can declare the field
types they support on creation. For example "Integer Integer64 Real
String Date DateTime Time IntegerList Integer64List RealList StringList
Binary". Commonly used drivers will be updated to declare it.
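
As an illustration of how such a space-separated value might be consumed,
the following Python sketch checks whether a given field type is declared.
The helper names are hypothetical, and fetching the metadata item from a
driver is deliberately left out.

.. code-block:: python

    def supported_field_types(metadata_value):
        """Split a DMD_CREATIONFIELDDATATYPES value into a set of type names."""
        return set((metadata_value or "").split())


    def driver_can_create(metadata_value, field_type):
        """Return True if field_type is listed as a whole word."""
        return field_type in supported_field_types(metadata_value)


    declared = ("Integer Integer64 Real String Date DateTime Time IntegerList "
                "Integer64List RealList StringList Binary")
    legacy = "Integer Real String"  # hypothetical driver without Integer64

    print(driver_can_create(declared, "Integer64"))  # True
    print(driver_can_create(legacy, "Integer64"))    # False
    print(driver_can_create(None, "Integer64"))      # False: item not declared

Splitting on whitespace, rather than using a substring search, avoids
matching Integer inside Integer64 or IntegerList.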

OGR SQL
-------

A SWQ_INTEGER64 internal type is added so as to be able to map to/from
OFTInteger64 fields. The int_value member of the swq_expr_node class is
extended from int to GIntBig (so both SWQ_INTEGER and SWQ_INTEGER64
refer to that member).

.. _python--java--c--perl-changes:

Python / Java / C# / perl Changes
---------------------------------

The following changes have been made:

-  GetFID(), GetFeatureCount() have been changed to return a 64 bit
   integer
-  SetFID(), GetFeature(), DeleteFeature(), SetNextByIndex() have been
   changed to accept a 64 bit integer as argument
-  GetFieldAsInteger64() and SetFieldInteger64() have been added
-  In Python, GetField(), SetField() can accept/return 64 bit values
-  GetFieldAsInteger64List() and SetFieldInteger64List() have been added
   (Python only, due to lack of relevant typemaps for other languages,
   but could potentially be done)

The change in return type of GetFID() and GetFeatureCount() might cause
warnings at compilation time in some languages (Java YES, Python not
relevant, Perl/C# ?). All changes to existing methods are an ABI
change for Java bytecode.

Utilities
---------

ogr2ogr and ogrinfo are updated to support the new 64bit interfaces.

A new option is added to ogr2ogr: -mapFieldType. It can be used like this:
-mapFieldType Integer64=Integer,Date=String, meaning that Integer64 fields
in the source layer should be created as Integer, and Date fields as String.
ogr2ogr will also warn if attempting to create a field in an output
driver that advertises a GDAL_DMD_CREATIONFIELDDATATYPES metadata item
that does not mention the required field type. For Integer64 fields, if
the type is not advertised in the GDAL_DMD_CREATIONFIELDDATATYPES metadata
item, or if that item is missing, conversion to Real is done
by default with a warning. ogr2ogr will also query the source layer to
check if the OLMD_FID64 metadata item is declared and if the output
driver has the FID64 layer creation option, in which case it will set
it.
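
The field type rules above can be sketched in Python as follows. This is
only one reading of the paragraph, not the ogr2ogr implementation: the
function names are hypothetical, and applying the Real fallback to the
already mapped type is an assumption.

.. code-block:: python

    def parse_map_field_type(option_value):
        """Parse a value such as 'Integer64=Integer,Date=String' into a dict."""
        mapping = {}
        for pair in option_value.split(","):
            source, sep, target = pair.partition("=")
            if not sep or not source.strip() or not target.strip():
                raise ValueError("invalid -mapFieldType entry: %r" % pair)
            mapping[source.strip()] = target.strip()
        return mapping


    def output_field_type(source_type, mapping, advertised):
        """Return (type_name, warn) for a field created in the output layer.

        advertised is the creation field types string of the output driver,
        or None when the driver does not declare the metadata item.
        """
        target = mapping.get(source_type, source_type)
        if advertised is not None and target in advertised.split():
            return target, False
        if target == "Integer64":
            # Not advertised, or item missing: fall back to Real and warn.
            return "Real", True
        # Other types are kept; warn only if a declared list lacks the type.
        return target, advertised is not None


    mapping = parse_map_field_type("Integer64=Integer,Date=String")
    print(mapping)  # {'Integer64': 'Integer', 'Date': 'String'}
    print(output_field_type("Integer64", mapping, "Integer Real String"))
    # ('Integer', False)
    print(output_field_type("Integer64", {}, None))
    # ('Real', True)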

Documentation
-------------

New/modified APIs are documented. Updates in drivers with new
options/behaviours are documented. MIGRATION_GUIDE.TXT is extended with a
section related to this RFC. OGR API updated.

File Formats
------------

As appropriate, existing OGR drivers have been updated to support the
new/updated interfaces. In particular an effort has been made to update
a few database drivers to support 64bit integer columns for use as
feature id, though they don't always create FID columns as 64bit by
default when creating new layers as this may cause problems for other
applications.

Apart from the mechanical changes due to interface changes, the detailed
list of changes is:

-  Shapefile: OFTInteger fields are created by default with a width of 9
   characters, so as to be unambiguously read as OFTInteger (and if
   integers that require 10 or 11 characters are written, the field is
   dynamically extended, as has been done for a few versions). OFTInteger64
   fields are created by default with a width of 18 digits, so as to be
   unambiguously read as OFTInteger64, and extended to 19 or 20 if
   needed. Integer fields of width between 10 and 18 will be read as
   OFTInteger64. Above that, they will be treated as OFTReal. In previous
   GDAL versions, Integer fields were created with a default width of 10,
   and thus will now be read as OFTInteger64. An open option,
   DETECT_TYPE=YES, can be specified so that OGR does a full scan of the
   DBF file to see if integer fields of size 10 or 11 hold 32 bit or 64
   bit values and adjusts the type accordingly (and likewise for integer
   fields of size 19 or 20: in case of overflow of 64 bit integer,
   OFTReal is chosen).
-  PG: updated to read and create OFTInteger64 as INT8 and
   OFTInteger64List as bigint[]. 64 bit FIDs are supported. By default,
   on layer creation, the FID field is created as a SERIAL (32 bit
   integer) to avoid compatibility issues. The FID64=YES creation option
   can be passed to create it as a BIGSERIAL instead. If needed, the
   driver will dynamically alter the schema to extend a 32 bit integer
   FID field to 64 bit. GetFeatureCount() modified to return 64 bit
   values. OLMD_FID64 = "YES" advertised as soon as the FID column is 64
   bit.
-  PGDump: Integer64, Integer64List and 64 bit FID supported in
   read/write. FID64=YES creation option available.
-  GeoJSON: Integer64, Integer64List and 64 bit FID supported in
   read/write. The 64 bit variants are reported only if needed,
   otherwise OFTInteger/OFTIntegerList is used. OLMD_FID64 = "YES"
   advertised if needed.
-  CSV: Integer64 supported in read/write, including the autodetection
   feature of field types.
-  GPKG: Integer64 and 64 bit FID supported in read/write. Conforming
   with the GeoPackage spec, "INT" or "INTEGER" columns are considered
   64 bits, whereas "MEDIUMINT" is considered 32 bit. OLMD_FID64 = "YES"
   advertised as soon as MAX(fid_column) is 64 bit. GetFeatureCount()
   modified to return 64 bit values.
-  SQLite: Integer64 and 64 bit FID supported in read/write. On write,
   Integer64 fields are created as "BIGINT" and on read BIGINT or INT8 are
   considered as Integer64. However it might be possible that databases
   produced by other tools are created with "INTEGER" and hold 64 bit
   values, in which case OGR will not be able to detect it. The
   OGR_PROMOTE_TO_INTEGER64=YES configuration option can then be passed
   to work around that issue. OLMD_FID64 = "YES" advertised as soon as
   MAX(fid_column) is 64 bit. GetFeatureCount() modified to return 64
   bit values.
-  MySQL: Integer64 and 64 bit FID supported in read/write. Similarly to
   PG, FID column is created as 32 bit by default, unless FID64=YES
   creation option is specified. OLMD_FID64 = "YES" advertised as soon
   as the FID column is 64 bit. GetFeatureCount() modified to return 64
   bit values.
-  OCI: Integer64 and 64 bit FID supported in read/write. Detecting
   Integer/Integer64 on read is tricky since there's only a NUMBER SQL
   type with a field width. It is assumed that if the width is <= 9 or
   if it is the unspecified value (38), then it is an Integer. On
   creation, OGR will set a width of 20 for OFTInteger64, so a NUMBER
   without decimal part and with a width of 20 will be considered as an
   Integer64.
-  MEM: Integer64 and 64 bit FID supported in read/write.
   GetFeatureCount() modified to return 64 bit values.
-  VRT: Integer64, Integer64List and 64 bit FID supported in read/write.
   GetFeatureCount() modified to return 64 bit values.
-  JML: Integer64 supported on creation (created as "OBJECT"). On read,
   returned as String.
-  GML: Integer64, Integer64List and 64 bit FID supported in read/write.
   GetFeatureCount() modified to return 64 bit values.
-  WFS: Integer64, Integer64List and 64 bit FID supported in read/write.
   GetFeatureCount() modified to return 64 bit values.
-  CartoDB: Integer64 supported on creation. On read returned as Real
   (CartoDB only advertises a 'Number' type). GetFeatureCount() modified
   to return 64 bit values.
-  XLSX: Integer64 supported in read/write.
-  ODS: Integer64 supported in read/write.
-  MSSQLSpatial: GetFeatureCount() modified to return 64 bit values. No
   Integer64 support implemented although could likely be done.
-  OSM: FID is now always set even when sizeof(long) != 8.
-  LIBKML: KML 'uint' advertised as Integer64.
-  MITAB: Change the way FID of Seamless tables are generated to make it
   more robust and accept arbitrary number of index tables made of an
   arbitrary number of features, by using full 64bit width of IDs.
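
The width rules in the Shapefile item above can be summarised with a small
Python sketch. It is one interpretation of that item, not the driver code:
classify_dbf_integer_field is a hypothetical name, and passing a list of
values stands in for the full scan done with DETECT_TYPE=YES.

.. code-block:: python

    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
    INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


    def classify_dbf_integer_field(width, values=None):
        """Guess the OGR type of a DBF numeric field without decimals.

        values is only consulted when given, mimicking DETECT_TYPE=YES.
        """
        if width <= 9:
            return "Integer"
        if width <= 18:
            if values is not None and width in (10, 11):
                if all(INT32_MIN <= v <= INT32_MAX for v in values):
                    return "Integer"
            return "Integer64"
        if values is not None and width in (19, 20):
            if all(INT64_MIN <= v <= INT64_MAX for v in values):
                return "Integer64"
        return "Real"


    print(classify_dbf_integer_field(9))                 # Integer
    print(classify_dbf_integer_field(10))                # Integer64
    print(classify_dbf_integer_field(10, [2147483647]))  # Integer
    print(classify_dbf_integer_field(19))                # Real
    print(classify_dbf_integer_field(19, [2**63 - 1]))   # Integer64

The second call shows why files written by earlier GDAL versions, with a
default width of 10, are now reported as Integer64 unless a scan is requested.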

Test Suite
----------

The test suite is extended to test the new capabilities:

-  core SetField/GetField methods
-  updated drivers: Shapefile, PG, GeoJSON, CSV, GPKG, SQLite, MySQL,
   VRT, GML, XLSX, ODS, MITAB
-  OGR SQL
-  option -mapFieldType of ogr2ogr

Compatibility Issues
--------------------

Driver Code Changes
~~~~~~~~~~~~~~~~~~~

-  All drivers implementing SetNextByIndex(), DeleteFeature(),
   GetFeature(), GetFeatureCount() will need to change their prototype
   and make modest changes.

-  Drivers supporting CreateField() likely ought to be extended to
   support OFTInteger64 as an integer/real/string field if nothing else
   is available (and if bApproxOK is TRUE). ogr2ogr will convert
   Integer64 to Real if Integer64 support is not advertised.

-  Drivers reporting FIDs via Debug statements, printf's or
   sprintf-like statements to format them for output have been updated
   to use CPL_FRMT_GIB to format the FID. Failure to make these changes
   may result in code crashing. Due to the use of GCC annotation to
   advertise printf()-like formatting syntax in CPL functions, we are
   reasonably confident to have done the required changes in in-tree
   drivers (except in some proprietary drivers, like SDE, IDB, INGRES,
   ArcObjects, where this couldn't be compile-checked). The same holds
   true for GetFeatureCount().

Application Code
~~~~~~~~~~~~~~~~

-  Application code may need to be updated to use GIntBig for FIDs and
   feature count in order to avoid warnings about downcasting.

-  Application code formatting FIDs or feature count using printf-like
   facilities may also need to be changed to downcast explicitly or to
   use CPL_FRMT_GIB.

-  Application code may need to add Integer64 handling in order to
   utilize wide fields.

Behavioral Changes
~~~~~~~~~~~~~~~~~~

-  Wide integer fields that were previously treated as "real" or Integer
   by the shapefile driver will now be treated as Integer64 which will
   likely not work with some applications, and translation to other
   formats may fail.

Related tickets
---------------

-  `#3747 OGR FID needs to be 64
   bit <http://trac.osgeo.org/gdal/ticket/3747>`__
-  `#3615 Shapefile : A 10-digit value doesn't necessarily fit into a 32
   bit integer. <http://trac.osgeo.org/gdal/ticket/3615>`__
-  `#3150 Precision Problem for Numeric on OGR/OCI
   driver <http://trac.osgeo.org/gdal/ticket/3150>`__

Related topics out of scope of this RFC
---------------------------------------

The possibility of having a Numeric type that corresponds to the
matching SQL type, i.e. a decimal number with an arbitrary number of
significant figures, has been considered. In OGR, this could be
implemented as a full type like Integer, Integer64 etc., or possibly as
a subtype of String (see `RFC 50: OGR field
subtypes <./rfc50_ogr_field_subtype>`__). The latter approach would be
easier to implement and mostly useful for lossless conversion between
database drivers (and shapefile). The former approach would require more
work, and would ideally involve OGR SQL support, which would require
supporting arithmetic of arbitrary length. The use cases for such a
numeric type have been considered marginal enough to leave that aside for
now.

Implementation
--------------

Implementation will be done by Even Rouault
(`Spatialys <http://spatialys.com>`__), and sponsored by `LINZ (Land
Information New Zealand) <http://www.linz.govt.nz/>`__.

The proposed implementation lies in the "rfc31_64bit" branch of the
`https://github.com/rouault/gdal2/tree/rfc31_64bit <https://github.com/rouault/gdal2/tree/rfc31_64bit>`__
repository.

The list of changes:
`https://github.com/rouault/gdal2/compare/rfc31_64bit <https://github.com/rouault/gdal2/compare/rfc31_64bit>`__

Voting history
--------------

+1 from JukkaR, DanielM, TamasS, HowardB and EvenR