1.. _vector.csv: 2 3Comma Separated Value (.csv) 4============================ 5 6.. shortname:: CSV 7 8.. built_in_by_default:: 9 10OGR supports reading and writing primarily non-spatial tabular data 11stored in text CSV files. CSV files are a common interchange format 12between software packages supporting tabular data and are also easily 13produced manually with a text editor or with end-user written scripts or 14programs. 15 16While in theory .csv files could have any extension, in order to 17auto-recognise the format OGR only supports CSV files ending with the 18extension ".csv". The datasource name may be either a single CSV file or 19point to a directory. For a directory to be recognised as a .csv 20datasource at least half the files in the directory need to have the 21extension .csv. One layer (table) is produced from each .csv file 22accessed. 23 24For files structured as CSV, but not ending 25with .CSV extension, the 'CSV:' prefix can be added before the filename 26to force loading by the CSV driver. 27 28The OGR CSV driver supports reading and writing. Because the CSV format 29has variable length text lines, reading is done sequentially. Reading 30features in random order will generally be very slow. OGR CSV layer 31might have a coordinate system stored in a .prj file (see GeoCSV 32specification). When reading a field named "WKT" is assumed to contain 33WKT geometry, but also is treated as a regular field. The OGR CSV driver 34returns all attribute columns as string data types if no field type 35information file (with .csvt extension) is available. 36 37Limited type recognition can be done for Integer, Real, String, Date 38(YYYY-MM-DD), Time (HH:MM:SS+nn), DateTime (YYYY-MM-DD HH:MM:SS+nn) 39columns through a descriptive file with the same name as the CSV file, 40but a .csvt extension. In a single line the types for each column have 41to be listed with double quotes and be comma separated (e.g., 42"Integer","String"). It is also possible to specify explicitly the width 43and precision of each column, e.g. 44"Integer(5)","Real(10.7)","String(15)". The driver will then use these 45types as specified for the csv columns. Subtypes 46can be passed between parenthesis, such as "Integer(Boolean)", 47"Integer(Int16)" and "Real(Float32)". Starting with GDAL 2.1, 48accordingly with the `GeoCSV 49specification <http://giswiki.hsr.ch/GeoCSV>`__, the "CoordX" or 50"Point(X)" type can be used to specify a column with longitude/easting 51values, "CoordY" or "Point(Y)" for latitude/northing values and "WKT" 52for geometries encoded in WKT 53 54Starting with GDAL 2.2, the "JSonStringList", "JSonIntegerList", 55"JSonInteger64List" and "JSonRealList" types can be used in .csvt to map 56to the corresponding OGR StringList, IntegerList, Integer64List and 57RealList types. The field values are then encoded as JSon arrays, with 58proper CSV escaping. 59 60Automatic field type guessing can also be done 61if specifying the open options described in the below "Open options" 62section. 63 64Driver capabilities 65------------------- 66 67.. supports_create:: 68 69.. supports_georeferencing:: 70 71.. supports_virtualio:: 72 73Format 74------ 75 76CSV files have one line for each feature (record) in the layer (table). 77The attribute field values are separated by commas. At least two fields 78per line must be present. Lines may be terminated by a DOS (CR/LF) or 79Unix (LF) style line terminators. Each record should have the same 80number of fields. The driver will also accept a semicolon, a tabulation 81or a space character as field separator . 82This autodetection will work only if there's no other potential 83separator on the first line of the CSV file. Otherwise it will default 84to comma as separator. 85 86Complex attribute values (such as those containing commas, quotes or 87newlines) may be placed in double quotes. Any occurrences of double 88quotes within the quoted string should be doubled up to "escape" them. 89 90By default, the driver attempts to treat the first line of the file as a 91list of field names for all the fields. However, if one or more of the 92names is all numeric it is assumed that the first line is actually data 93values and dummy field names are generated internally (field_1 through 94field_n) and the first record is treated as a feature. 95Numeric values are treated as field names if they are 96enclosed in double quotes. Starting with GDAL 2.1, this behavior can be 97modified via the HEADERS open option. 98 99All CSV files are treated as UTF-8 encoded. A 100Byte Order Mark (BOM) at the beginning of the file will be parsed 101correctly. The option WRITE_BOM can be used to create a file 102with a Byte Order Mark, which can improve compatibility with some 103software (particularly Excel). 104 105Example (employee.csv): 106 107:: 108 109 ID,Salary,Name,Comments 110 132,55000.0,John Walker,"The ""big"" cheese." 111 133,11000.0,Jane Lake,Cleaning Staff 112 113Note that the Comments value for the first data record is placed in 114double quotes because the value contains quotes, and those quotes have 115to be doubled up so we know we haven't reached the end of the quoted 116string yet. 117 118Many variations of textual input are sometimes called Comma Separated 119Value files, including files without commas, but fixed column widths, 120those using tabs as separators or those with other auxiliary data 121defining field types or structure. This driver does not attempt to 122support all such files, but instead to support simple .csv files that 123can be auto-recognised. Scripts or other mechanisms can generally be 124used to convert other variations into a form that is compatible with the 125OGR CSV driver. 126 127Reading CSV containing spatial information 128------------------------------------------ 129 130Building point geometries 131~~~~~~~~~~~~~~~~~~~~~~~~~ 132 133Consider the following CSV file (test.csv): 134 135:: 136 137 Latitude,Longitude,Name 138 48.1,0.25,"First point" 139 49.2,1.1,"Second point" 140 47.5,0.75,"Third point" 141 142Starting with GDAL 2.1, it is possible to directly specify the potential 143names of the columns that can contain X/longitude and Y/latitude with 144the X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES open option. 145 146*ogrinfo -ro -al test.csv -oo X_POSSIBLE_NAMES=Lon\* -oo 147Y_POSSIBLE_NAMES=Lat\* -oo KEEP_GEOM_COLUMNS=NO* will return : 148 149:: 150 151 OGRFeature(test):1 152 Name (String) = First point 153 POINT (0.25 48.1) 154 155 OGRFeature(test):2 156 Name (String) = Second point 157 POINT (1.1 49.2) 158 159 OGRFeature(test):3 160 Name (String) = Third point 161 POINT (0.75 47.5) 162 163If CSV file does not have a header line, the dummy "field_n" names can be 164used as possible names for coordinate fields. For example plain XYZ point 165data can be opened as 166 167*ogrinfo -ro -al elevation.xyz -oo X_POSSIBLE_NAMES=field_1 -oo 168Y_POSSIBLE_NAMES=field_2 -oo Z_POSSIBLE_NAMES=field_3* 169 170Otherwise, if one or several columns contain a geometry definition 171encoded as WKT, WKB (encoded in hexadecimal) or GeoJSON (in which case 172the GeoJSON content must be formatted to follow CSV rules, that is to 173say it must be surrounded by double-quotes, and double-quotes inside the 174string must be repeated for proper escaping), the name of such column(s) 175the GEOM_POSSIBLE_NAMES open option. 176 177For older versions, it is possible to extract spatial information 178(points) from a CSV file which has columns for the X and Y coordinates, 179through the use of the :ref:`VRT <vector.vrt>` driver. 180 181You can write the associated VRT file (test.vrt): 182 183:: 184 185 <OGRVRTDataSource> 186 <OGRVRTLayer name="test"> 187 <SrcDataSource>test.csv</SrcDataSource> 188 <GeometryType>wkbPoint</GeometryType> 189 <LayerSRS>WGS84</LayerSRS> 190 <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/> 191 </OGRVRTLayer> 192 </OGRVRTDataSource> 193 194and *ogrinfo -ro -al test.vrt* will return : 195 196:: 197 198 OGRFeature(test):1 199 Latitude (String) = 48.1 200 Longitude (String) = 0.25 201 Name (String) = First point 202 POINT (0.25 48.1 0) 203 204 OGRFeature(test):2 205 Latitude (String) = 49.2 206 Longitude (String) = 1.1 207 Name (String) = Second point 208 POINT (1.1 49.200000000000003 0) 209 210 OGRFeature(test):3 211 Latitude (String) = 47.5 212 Longitude (String) = 0.75 213 Name (String) = Third point 214 POINT (0.75 47.5 0) 215 216Building line geometries 217~~~~~~~~~~~~~~~~~~~~~~~~ 218 219Consider the following CSV file (test.csv): 220 221:: 222 223 way_id,pt_id,x,y 224 1,1,2,49 225 1,2,3,50 226 2,1,-2,49 227 2,2,-3,50 228 229With a GDAL build with Spatialite enabled, *ogrinfo test.csv -dialect 230SQLite -sql "SELECT way_id, MakeLine(MakePoint(CAST(x AS float),CAST(y 231AS float))) FROM test GROUP BY way_id"* will return : 232 233:: 234 235 OGRFeature(SELECT):0 236 way_id (String) = 1 237 LINESTRING (2 49,3 50) 238 239 OGRFeature(SELECT):1 240 way_id (String) = 2 241 LINESTRING (-2 49,-3 50) 242 243Open options 244------------ 245 246The following open options can be specified 247(typically with the -oo name=value parameters of ogrinfo or ogr2ogr): 248 249- **MERGE_SEPARATOR**\ =YES/NO (defaults to NO). Setting it to YES will 250 enable merging consecutive separators. Mostly useful when it is the 251 space character. 252- **AUTODETECT_TYPE**\ =YES/NO (defaults to NO). Setting it to YES will 253 enable auto-detection of field data types. If while reading the 254 records (beyond the records used for autodetection), a value is found 255 to not correspond to the autodetected data type, a warning will be 256 emitted and the field will be emptied. 257- **KEEP_SOURCE_COLUMNS**\ =YES/NO (default NO) keep a copy of the 258 original columns where the guessing is active, and the guessed type 259 is different from string. The name of the original columns will be 260 suffixed with "_original". This flag should be used only when 261 AUTODETECT_TYPE=YES. 262- **AUTODETECT_WIDTH**\ =YES/NO/STRING_ONLY (defaults to NO). Setting 263 it to YES to detect the width of string and integer fields, and the 264 width and precision of real fields. Setting it to STRING_ONLY 265 restricts to string fields. Setting it to NO select default size and 266 width. If while reading the records (beyond the records used for 267 autodetection), a value is found to not correspond to the 268 autodetected width/precision, a warning will be emitted and the field 269 will be emptied. 270- **AUTODETECT_SIZE_LIMIT**\ =size to specify the number of bytes to 271 inspect to determine the data type and width/precision. The default 272 will be 100000. Setting 0 means inspecting the whole file. Note : 273 specifying a value over 1 MB (or 0 if the file is larger than 1MB) 274 will prevent reading from standard input. 275- **QUOTED_FIELDS_AS_STRING**\ =YES/NO (default NO). Only used if 276 AUTODETECT_TYPE=YES. Whether to enforce quoted fields as string 277 fields when set to YES. Otherwise, by default, the content of quoted 278 fields will be tested for real, integer, etc... data types. 279- **X_POSSIBLE_NAMES**\ =list_of_names. (GDAL >= 2.1) Comma separated 280 list of possible names for X/longitude coordinate of a point. Each 281 name might be a pattern using the star character in starting and/or 282 ending position. E.g.: prefix*, \*suffix or \*middle*. The values in 283 the column must be floating point values. X_POSSIBLE_NAMES and 284 Y_POSSIBLE_NAMES must be both specified and a matching for each must 285 be found in the columns of the CSV file. Only one geometry column per 286 layer might be built when using X_POSSIBLE_NAMES/Y_POSSIBLE_NAMES. 287- **Y_POSSIBLE_NAMES**\ =list_of_names. (GDAL >= 2.1) Comma separated 288 list of possible names for Y/latitude coordinate of a point. Each 289 name might be a pattern using the star character in starting and/or 290 ending position. E.g.: prefix*, \*suffix or \*middle*. The values in 291 the column must be floating point values. X_POSSIBLE_NAMES and 292 Y_POSSIBLE_NAMES must be both specified and a matching for each must 293 be found in the columns of the CSV file. 294- **Z_POSSIBLE_NAMES**\ =list_of_names. (GDAL >= 2.1) Comma separated 295 list of possible names for Z/elevation coordinate of a point. Each 296 name might be a pattern using the star character in starting and/or 297 ending position. E.g.: prefix*, \*suffix or \*middle*. The values in 298 the column must be floating point values. Only taken into account in 299 combination with X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES. 300- **GEOM_POSSIBLE_NAMES**\ =list_of_names. (GDAL >= 2.1) Comma 301 separated list of possible names for geometry columns that contain 302 geometry definitions encoded as WKT, WKB (in hexadecimal form, 303 potentially in PostGIS 2.0 extended WKB) or GeoJSON. Each name might 304 be a pattern using the star character in starting and/or ending 305 position. E.g.: prefix*, \*suffix or \*middle\* 306- **KEEP_GEOM_COLUMNS**\ =YES/NO (default YES) Expose the detected 307 X,Y,Z or geometry columns as regular attribute fields. 308- **HEADERS**\ =YES/NO/AUTO (default AUTO) (GDAL >= 2.1) Whether the 309 first line of the file contains column names or not. When set to 310 AUTO, GDAL will assume the first line is column names if none of the 311 values are strictly numeric. 312- **EMPTY_STRING_AS_NULL**\ =YES/NO (default NO) (GDAL >= 2.1) Whether 313 to consider empty strings as null fields on reading'. 314 315Creation Issues 316--------------- 317 318The driver supports creating new databases (as a directory of .csv 319files), adding new .csv files to an existing directory or .csv files or 320appending features to an existing .csv table. Starting with GDAL 2.1, 321deleting or replacing existing features, or adding/modifying/deleting 322fields is supported, provided the modifications done are small enough to 323be stored in RAM temporarily before flushing to disk. 324 325Layer Creation options: 326 327- **LINEFORMAT**: By default when creating new .csv files they are 328 created with the line termination conventions of the local platform 329 (CR/LF on win32 or LF on all other systems). This may be overridden 330 through use of the LINEFORMAT layer creation option which may have a 331 value of **CRLF** (DOS format) or **LF** (Unix format). 332- **GEOMETRY**: By default, the geometry of 333 a feature written to a .csv file is discarded. It is possible to 334 export the geometry in its WKT representation by specifying 335 GEOMETRY=\ **AS_WKT**. It is also possible to export point geometries 336 into their X,Y,Z components (different columns in the csv file) by 337 specifying GEOMETRY=\ **AS_XYZ**, GEOMETRY=\ **AS_XY** or 338 GEOMETRY=\ **AS_YX**. The geometry column(s) will be prepended to the 339 columns with the attributes values. It is also possible to export 340 geometries in GeoJSON representation using SQLite SQL dialect query, 341 see example below. 342- **CREATE_CSVT**\ =YES/NO: Create the 343 associated .csvt file (see above paragraph) to describe the type of 344 each column of the layer and its optional width and precision. 345 Default value : NO 346- **SEPARATOR**\ =COMMA/SEMICOLON/TAB/SPACE: 347 Field separator character. Default value : COMMA 348- **WRITE_BOM**\ =YES/NO: Write a UTF-8 Byte 349 Order Mark (BOM) at the start of the file. Default value : NO 350- **GEOMETRY_NAME**\ =name (Starting with GDAL 2.1): Name of geometry 351 column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to 352 WKT 353- **STRING_QUOTING**\ =IF_NEEDED/IF_AMBIGUOUS/ALWAYS (Starting with 354 GDAL 2.3): whether to double-quote strings. IF_AMBIGUOUS means that 355 string values that look like numbers will be quoted (it also implies 356 IF_NEEDED). Defaults to IF_AMBIGUOUS (behavior in older versions was 357 IF_NEEDED) 358 359Configuration options (set with ``--config key value`` on command line 360utilities): 361 362- **OGR_WKT_PRECISION**\ =int: Number of decimals for coordinate 363 values. Default to 15. A heuristics is used to remove insignificant 364 trailing 00000x or 99999x that can appear when formatting decimal 365 numbers. 366- **OGR_WKT_ROUND**\ =YES/NO: (GDAL >= 2.3) Whether to enable the above 367 mentioned heuristics to remove insignificant trailing 00000x or 368 99999x. Default to YES. 369 370VSI Virtual File System API support 371----------------------------------- 372 373The driver supports reading and writing to files managed by VSI Virtual 374File System API, which include "regular" files, as well as files in the 375/vsizip/ (read-write) , /vsigzip/ (read-only) , /vsicurl/ (read-only) 376domains. 377 378Writing to /dev/stdout or /vsistdout/ is also supported. 379 380Examples 381~~~~~~~~ 382 383- This example shows using ogr2ogr to transform a shapefile with point 384 geometry into a .csv file with the X,Y,Z coordinates of the points as 385 first columns in the .csv file 386 387 :: 388 389 ogr2ogr -f CSV output.csv input.shp -lco GEOMETRY=AS_XYZ 390 391- This example shows using ogr2ogr to transform a shapefile into a .csv 392 file with geography field formatted using GeoJSON format. 393 394 :: 395 396 ogr2ogr -f CSV -dialect sqlite -sql "select AsGeoJSON(geometry) AS geom, * from input" output.csv input.shp 397 398- Convert a CSV into a GeoPackage. Specify the names of the coordinate columns and assign a coordinate reference system. 399 400 :: 401 402 ogr2ogr \ 403 -f GPKG output.gpkg \ 404 input.csv \ 405 -oo X_POSSIBLE_NAMES=longitude \ 406 -oo Y_POSSIBLE_NAMES=latitude \ 407 -a_srs 'EPSG:4326' 408 409 410Particular datasources 411---------------------- 412 413The CSV driver can also read files whose structure is close to CSV files 414: 415 416- Airport data files NfdcFacilities.xls, NfdcRunways.xls, 417 NfdcRemarks.xls and NfdcSchedules.xls found on that `FAA 418 website <http://www.faa.gov/airports/airport_safety/airportdata_5010/menu/index.cfm>`__ 419 420- Files from the `USGS 421 GNIS <http://geonames.usgs.gov/domestic/download_data.htm>`__ 422 (Geographic Names Information System) 423 424- The allCountries file from `GeoNames <http://www.geonames.org>`__ 425 426- `Eurostat .TSV 427 files <http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?file=read_me.pdf>`__ 428 429Other Notes 430----------- 431 432- `GeoCSV specification <http://giswiki.hsr.ch/GeoCSV>`__ (supported by 433 GDAL >= 2.1) 434- Initial development of the OGR CSV driver was supported by `DM 435 Solutions Group <http://www.dmsolutions.ca/>`__ and 436 `GoMOOS <http://www.gomoos.org/>`__. 437- `Carto <https://carto.com/>`__ funded field type auto-detection and 438 open options related to geometry columns. 439