# PyShp

The Python Shapefile Library (PyShp) reads and writes ESRI Shapefiles in pure Python.

![pyshp logo](http://4.bp.blogspot.com/_SBi37QEsCvg/TPQuOhlHQxI/AAAAAAAAAE0/QjFlWfMx0tQ/S350/GSP_Logo.png "PyShp")

[![Build Status](https://travis-ci.org/GeospatialPython/pyshp.svg?branch=master)](https://travis-ci.org/GeospatialPython/pyshp)

## Contents

[Overview](#overview)

[Version Changes](#version-changes)

[Examples](#examples)
- [Reading Shapefiles](#reading-shapefiles)
  - [The Reader Class](#the-reader-class)
  - [Reading Geometry](#reading-geometry)
  - [Reading Records](#reading-records)
  - [Reading Geometry and Records Simultaneously](#reading-geometry-and-records-simultaneously)
- [Writing Shapefiles](#writing-shapefiles)
  - [The Writer Class](#the-writer-class)
  - [Adding Records](#adding-records)
  - [Adding Geometry](#adding-geometry)
  - [Geometry and Record Balancing](#geometry-and-record-balancing)

[How To's](#how-tos)
- [3D and Other Geometry Types](#3d-and-other-geometry-types)
- [Working with Large Shapefiles](#working-with-large-shapefiles)
- [Unicode and Shapefile Encodings](#unicode-and-shapefile-encodings)

[Testing](#testing)


# Overview

The Python Shapefile Library (PyShp) provides read and write support for the
Esri Shapefile format. The Shapefile format is a popular Geographic
Information System vector data format created by Esri. For more information
about this format please read the well-written "ESRI Shapefile Technical
Description - July 1998" located at
[http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf](http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf).
The Esri document describes the shp and shx file formats. However a third
file format called dbf is also required. This format is documented on the web
as the "XBase File Format Description" and is a simple file-based database
format created in the 1960's.
For more on this specification see: [http://www.clicketyclick.dk/databases/xbase/format/index.html](http://www.clicketyclick.dk/databases/xbase/format/index.html)

Both the Esri and XBase file formats are very simple in design and memory
efficient, which is part of the reason the shapefile format remains popular
despite the numerous ways to store and exchange GIS data available today.

PyShp is compatible with Python 2.7-3.x.

This document provides examples for using PyShp to read and write shapefiles.
Many more examples are continually added to the blog [http://GeospatialPython.com](http://GeospatialPython.com),
and others can be found by searching for PyShp on [https://gis.stackexchange.com](https://gis.stackexchange.com).

Currently the sample census blockgroup shapefile referenced in the examples is available on the GitHub project site at
[https://github.com/GeospatialPython/pyshp](https://github.com/GeospatialPython/pyshp). These
examples are straightforward and you can also easily run them against your
own shapefiles with minimal modification.

Important: If you are new to GIS you should read about map projections.
Please visit: [https://github.com/GeospatialPython/pyshp/wiki/Map-Projections](https://github.com/GeospatialPython/pyshp/wiki/Map-Projections)

I sincerely hope this library eliminates the mundane distraction of simply
reading and writing data, and allows you to focus on the challenging and FUN
part of your geospatial project.


# Version Changes

## 2.1.3

### Bug fixes:

- Fix recent bug in geojson hole-in-polygon checking (see #205)
- Misc fixes to allow geo interface dump to json (eg dates as strings)
- Handle additional dbf date null values, and return faulty dates as unicode (see #187)
- Add writer target typecheck
- Fix bugs to allow reading shp/shx/dbf separately
- Allow delayed shapefile loading by passing no args
- Fix error with writing empty z/m shapefile (@mcuprjak)
- Fix signed_area() so it ignores z/m coords
- Enforce writing the 11th field name character as null-terminator (only first 10 are used)
- Minor README fixes
- Added more tests

## 2.1.2

### Bug fixes:

- Fix issue where warnings.simplefilter('always') changes global warning behavior [see #203]

## 2.1.1

### Improvements:

- Handle shapes with no coords and represent as geojson with no coords (GeoJSON null-equivalent)
- Expand testing to Python 3.6, 3.7, 3.8 and PyPy; drop 3.3 and 3.4 [@mwtoews]
- Added pytest testing [@jmoujaes]

### Bug fixes:

- Fix incorrect geo interface handling of multipolygons with complex exterior-hole relations [see #202]
- Enforce shapefile requirement of at least one field, to avoid writing invalid shapefiles [@Jonty]
- Fix Reader geo interface including DeletionFlag field in feature properties [@nnseva]
- Fix polygons not being auto closed, which was accidentally dropped
- Fix error for null geometries in feature geojson
- Misc docstring cleanup [@fiveham]

## 2.1.0

### New Features:

- Added back read/write support for unicode field names.
- Improved Record representation
- More support for geojson on Reader, ShapeRecord, ShapeRecords, and shapes()

### Bug fixes:

- Fixed error when reading optional m-values
- Fixed Record attribute autocomplete in Python 3
- Misc readme cleanup

## 2.0.0

The newest version of PyShp, version 2.0, introduced some major new improvements.
A great thanks to all who have contributed code and raised issues, and for everyone's
patience and understanding during the transition period.
Some of the new changes are incompatible with previous versions.
Users of the previous version 1.x should therefore take note of the following changes
(Note: Some contributor attributions may be missing):

### Major Changes:

- Full support for unicode text, with custom encoding, and exception handling.
  - Means that the Reader returns unicode, and the Writer accepts unicode.
- PyShp has been simplified to a pure input-output library using the Reader and Writer classes, dropping the Editor class.
- Switched to a new streaming approach when writing files, keeping memory-usage at a minimum:
  - Specify filepath/destination and text encoding when creating the Writer.
  - The file is written incrementally with each call to shape/record.
  - Adding shapes is now done using dedicated methods for each shapetype.
- Reading shapefiles is now more convenient:
  - Shapefiles can be opened using the context manager, and files are properly closed.
  - Shapefiles can be iterated, have a length, and support the geo interface.
  - New ways of inspecting shapefile metadata by printing. [@megies]
  - More convenient accessing of Record values as attributes. [@philippkraft]
  - More convenient shape type name checking. [@megies]
- Add more support and documentation for MultiPatch 3D shapes.
- The Reader "elevation" and "measure" attributes are now renamed "zbox" and "mbox", to make it clear they refer to the min/max values.
- Better documentation of previously unclear aspects, such as field types.

### Important Fixes:

- More reliable/robust:
  - Fixed shapefile bbox error for empty or point type shapefiles. [@mcuprjak]
  - Reading and writing Z and M type shapes is now more robust, fixing many errors, and has been added to the documentation. [@ShinNoNoir]
  - Improved parsing of field value types, fixed errors and made more flexible.
  - Fixed bug when writing shapefiles with datefield and date values earlier than 1900 [@megies]
- Fix some geo interface errors, including checking polygon directions.
- Bug fixes for reading from case sensitive file names, individual files separately, and from file-like objects. [@gastoneb, @kb003308, @erickskb]
- Enforce maximum field limit. [@mwtoews]


# Examples

Before doing anything you must import the library.


    >>> import shapefile

The examples below will use a shapefile created from the U.S. Census Bureau
Blockgroups data set near San Francisco, CA and available in the git
repository of the PyShp GitHub site.

## Reading Shapefiles

### The Reader Class

To read a shapefile create a new "Reader" object and pass it the name of an
existing shapefile. The shapefile format is actually a collection of three
files. You specify the base filename of the shapefile or the complete filename
of any of the shapefile component files.


    >>> sf = shapefile.Reader("shapefiles/blockgroups")

OR


    >>> sf = shapefile.Reader("shapefiles/blockgroups.shp")

OR


    >>> sf = shapefile.Reader("shapefiles/blockgroups.dbf")

OR any of the other 5+ formats which are potentially part of a shapefile. The
library does not care about file extensions.
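One way to picture why any of these names works: the extension can simply be dropped and the three core component names rebuilt from the base name. The snippet below is only a sketch of that idea using the standard library. The helper `component_paths` is hypothetical and is not part of PyShp, and PyShp's own lookup may differ (for example in how it handles upper-case extensions).

```python
import os

def component_paths(path):
    """Hypothetical helper (not a PyShp function): map a base name or any
    component file name to the three core shapefile component paths."""
    # splitext() returns an empty extension for a bare base name, so both
    # "blockgroups" and "blockgroups.dbf" reduce to the same base.
    base, _ext = os.path.splitext(path)
    return {ext: base + "." + ext for ext in ("shp", "shx", "dbf")}

paths = component_paths("shapefiles/blockgroups.dbf")
print(paths["shp"])
```

Note that this naive version would misbehave for base names that themselves contain a dot, which is one reason to treat it as an illustration only.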

#### Reading Shapefiles Using the Context Manager

The "Reader" class can be used as a context manager, to ensure open file
objects are properly closed when done reading the data:

    >>> with shapefile.Reader("shapefiles/blockgroups.shp") as shp:
    ...     print(shp)
    shapefile Reader
        663 shapes (type 'POLYGON')
        663 records (44 fields)

#### Reading Shapefiles from File-Like Objects

You can also load shapefiles from any Python file-like object using keyword
arguments to specify any of the three files. This feature is very powerful and
allows you to load shapefiles from a url, a zip file, a serialized object,
or in some cases a database.


    >>> myshp = open("shapefiles/blockgroups.shp", "rb")
    >>> mydbf = open("shapefiles/blockgroups.dbf", "rb")
    >>> r = shapefile.Reader(shp=myshp, dbf=mydbf)

Notice in the examples above the shx file is never used. The shx file is a
very simple fixed-record index for the variable-length records in the shp
file. This file is optional for reading. If it's available PyShp will use the
shx file to access shape records a little faster but will do just fine without
it.

#### Reading Shapefile Meta-Data

Shapefiles have a number of attributes for inspecting the file contents.
A shapefile is a container for a specific type of geometry, and this can be checked using the
shapeType attribute.


    >>> sf.shapeType
    5

Shape types are represented by numbers between 0 and 31 as defined by the
shapefile specification and listed below.
It is important to note that the numbering system has
several reserved numbers that have not been used yet, therefore the numbers of
the existing shape types are not sequential:

- NULL = 0
- POINT = 1
- POLYLINE = 3
- POLYGON = 5
- MULTIPOINT = 8
- POINTZ = 11
- POLYLINEZ = 13
- POLYGONZ = 15
- MULTIPOINTZ = 18
- POINTM = 21
- POLYLINEM = 23
- POLYGONM = 25
- MULTIPOINTM = 28
- MULTIPATCH = 31

Based on this we can see that our blockgroups shapefile contains
Polygon type shapes. The shape types are also defined as constants in
the shapefile module, so that we can compare types more intuitively:


    >>> sf.shapeType == shapefile.POLYGON
    True

For convenience, you can also get the name of the shape type as a string:


    >>> sf.shapeTypeName == 'POLYGON'
    True

Other pieces of meta-data that we can check include the number of features
and the bounding box area the shapefile covers:


    >>> len(sf)
    663
    >>> sf.bbox
    [-122.515048, 37.652916, -122.327622, 37.863433]

Finally, if you would prefer to work with the entire shapefile in a different
format, you can convert all of it to a GeoJSON dictionary, although you may lose
some information in the process, such as z- and m-values:


    >>> sf.__geo_interface__['type']
    'FeatureCollection'

### Reading Geometry

A shapefile's geometry is the collection of points or shapes made from
vertices and implied arcs representing physical locations. All types of
shapefiles just store points. The metadata about the points determines how they
are handled by software.

You can get a list of the shapefile's geometry by calling the shapes()
method.


    >>> shapes = sf.shapes()

The shapes method returns a list of Shape objects describing the geometry of
each shape record.


    >>> len(shapes)
    663

To read a single shape by its index, use the shape() method. The index
is the shape's count from 0. So to read the 8th shape record you would use its
index which is 7.


    >>> s = sf.shape(7)

    >>> # Read the bbox of the 8th shape to verify
    >>> # Round coordinates to 3 decimal places
    >>> ['%.3f' % coord for coord in s.bbox]
    ['-122.450', '37.801', '-122.442', '37.808']

Each shape record (except Points) contains the following attributes. Records of
shapeType Point do not have a bounding box 'bbox'.


    >>> for name in dir(shapes[3]):
    ...     if not name.startswith('_'):
    ...         name
    'bbox'
    'parts'
    'points'
    'shapeType'
    'shapeTypeName'

  * shapeType: an integer representing the type of shape as defined by the
    shapefile specification.


        >>> shapes[3].shapeType
        5

  * shapeTypeName: a string representation of the type of shape as defined by shapeType. Read-only.


        >>> shapes[3].shapeTypeName
        'POLYGON'

  * bbox: If the shape type contains multiple points this tuple describes the
    lower left (x,y) coordinate and upper right corner coordinate creating a
    complete box around the points. If the shapeType is a
    Null (shapeType == 0) then an AttributeError is raised.


        >>> # Get the bounding box of the 4th shape.
        >>> # Round coordinates to 3 decimal places
        >>> bbox = shapes[3].bbox
        >>> ['%.3f' % coord for coord in bbox]
        ['-122.486', '37.787', '-122.446', '37.811']

  * parts: Parts simply group collections of points into shapes. If the shape
    record has multiple parts this attribute contains the index of the first
    point of each part. If there is only one part then a list containing 0 is
    returned.


        >>> shapes[3].parts
        [0]

  * points: The points attribute contains a list of tuples containing an
    (x,y) coordinate for each point in the shape.


        >>> len(shapes[3].points)
        173
        >>> # Get the 8th point of the fourth shape
        >>> # Round coordinates to 3 decimal places
        >>> shape = shapes[3].points[7]
        >>> ['%.3f' % coord for coord in shape]
        ['-122.471', '37.787']

In most cases, however, if you need to do more than just type or bounds checking, you may want
to convert the geometry to the more human-readable [GeoJSON format](http://geojson.org),
where lines and polygons are grouped for you:


    >>> s = sf.shape(0)
    >>> geoj = s.__geo_interface__
    >>> geoj["type"]
    'MultiPolygon'

The results from the shapes() method similarly support converting to GeoJSON:


    >>> shapes.__geo_interface__['type']
    'GeometryCollection'


### Reading Records

A record in a shapefile contains the attributes for each shape in the
collection of geometries. Records are stored in the dbf file. The link between
geometry and attributes is the foundation of all geographic information systems.
This critical link is implied by the order of shapes and corresponding records
in the shp geometry file and the dbf attribute file.

The field names of a shapefile are available as soon as you read a shapefile.
You can access the "fields" attribute of the shapefile as a Python list. Each
field is a Python list with the following information:

  * Field name: the name describing the data at this column index.
  * Field type: the type of data at this column index. Types can be:
    * "C": Characters, text.
    * "N": Numbers, with or without decimals.
    * "F": Floats (same as "N").
    * "L": Logical, for boolean True/False values.
    * "D": Dates.
    * "M": Memo, has no meaning within a GIS and is part of the xbase spec instead.
  * Field length: the length of the data found at this column index. Older GIS
    software may truncate this length to 8 or 11 characters for "Character"
    fields.
  * Decimal length: the number of decimal places found in "Number" fields.

To see the fields for the Reader object above (sf) call the "fields"
attribute:


    >>> fields = sf.fields

    >>> assert fields == [("DeletionFlag", "C", 1, 0), ["AREA", "N", 18, 5],
    ... ["BKG_KEY", "C", 12, 0], ["POP1990", "N", 9, 0], ["POP90_SQMI", "N", 10, 1],
    ... ["HOUSEHOLDS", "N", 9, 0],
    ... ["MALES", "N", 9, 0], ["FEMALES", "N", 9, 0], ["WHITE", "N", 9, 0],
    ... ["BLACK", "N", 8, 0], ["AMERI_ES", "N", 7, 0], ["ASIAN_PI", "N", 8, 0],
    ... ["OTHER", "N", 8, 0], ["HISPANIC", "N", 8, 0], ["AGE_UNDER5", "N", 8, 0],
    ... ["AGE_5_17", "N", 8, 0], ["AGE_18_29", "N", 8, 0], ["AGE_30_49", "N", 8, 0],
    ... ["AGE_50_64", "N", 8, 0], ["AGE_65_UP", "N", 8, 0],
    ... ["NEVERMARRY", "N", 8, 0], ["MARRIED", "N", 9, 0], ["SEPARATED", "N", 7, 0],
    ... ["WIDOWED", "N", 8, 0], ["DIVORCED", "N", 8, 0], ["HSEHLD_1_M", "N", 8, 0],
    ... ["HSEHLD_1_F", "N", 8, 0], ["MARHH_CHD", "N", 8, 0],
    ... ["MARHH_NO_C", "N", 8, 0], ["MHH_CHILD", "N", 7, 0],
    ... ["FHH_CHILD", "N", 7, 0], ["HSE_UNITS", "N", 9, 0], ["VACANT", "N", 7, 0],
    ... ["OWNER_OCC", "N", 8, 0], ["RENTER_OCC", "N", 8, 0],
    ... ["MEDIAN_VAL", "N", 7, 0], ["MEDIANRENT", "N", 4, 0],
    ... ["UNITS_1DET", "N", 8, 0], ["UNITS_1ATT", "N", 7, 0], ["UNITS2", "N", 7, 0],
    ... ["UNITS3_9", "N", 8, 0], ["UNITS10_49", "N", 8, 0],
    ... ["UNITS50_UP", "N", 8, 0], ["MOBILEHOME", "N", 7, 0]]

You can get a list of the shapefile's records by calling the records() method:


    >>> records = sf.records()

    >>> len(records)
    663

To read a single record call the record() method with the record's index:


    >>> rec = sf.record(3)

Each record is a list-like Record object containing the values corresponding to each field in
the field list. A record's values can be accessed by positional indexing or slicing.
For example in the blockgroups shapefile the 2nd and 3rd fields are the blockgroup id
and the 1990 population count of that San Francisco blockgroup:


    >>> rec[1:3]
    ['060750601001', 4715]

For simpler access, the fields of a record can also be accessed via the name of the field,
either as a key or as an attribute name. The blockgroup id (BKG_KEY) of the blockgroups shapefile
can also be retrieved as:


    >>> rec['BKG_KEY']
    '060750601001'

    >>> rec.BKG_KEY
    '060750601001'

The record values can be easily integrated with other programs by converting them to a field-value dictionary:


    >>> dct = rec.as_dict()
    >>> sorted(dct.items())
    [('AGE_18_29', 1467), ('AGE_30_49', 1681), ('AGE_50_64', 92), ('AGE_5_17', 848), ('AGE_65_UP', 30), ('AGE_UNDER5', 597), ('AMERI_ES', 6), ('AREA', 2.34385), ('ASIAN_PI', 452), ('BKG_KEY', '060750601001'), ('BLACK', 1007), ('DIVORCED', 149), ('FEMALES', 2095), ('FHH_CHILD', 16), ('HISPANIC', 416), ('HOUSEHOLDS', 1195), ('HSEHLD_1_F', 40), ('HSEHLD_1_M', 22), ('HSE_UNITS', 1258), ('MALES', 2620), ('MARHH_CHD', 79), ('MARHH_NO_C', 958), ('MARRIED', 2021), ('MEDIANRENT', 739), ('MEDIAN_VAL', 337500), ('MHH_CHILD', 0), ('MOBILEHOME', 0), ('NEVERMARRY', 703), ('OTHER', 288), ('OWNER_OCC', 66), ('POP1990', 4715), ('POP90_SQMI', 2011.6), ('RENTER_OCC', 3733), ('SEPARATED', 49), ('UNITS10_49', 49), ('UNITS2', 160), ('UNITS3_9', 672), ('UNITS50_UP', 0), ('UNITS_1ATT', 302), ('UNITS_1DET', 43), ('VACANT', 93), ('WHITE', 2962), ('WIDOWED', 37)]

If at a later point you need to check the record's index position in the original
shapefile, you can do this through the "oid" attribute:


    >>> rec.oid
    3

### Reading Geometry and Records Simultaneously

You may want to examine both the geometry and the attributes for a record at
the same time. The shapeRecord() and shapeRecords() methods let you do just
that.
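Because the link between a shape and its record is implied purely by position (as noted under "Reading Records"), the pairing can be pictured as zipping two equally long sequences. The following is only a conceptual sketch with plain lists standing in for shapes and records. The names `ShapeRecordPair` and `pair_by_position` are made up for this illustration and are not PyShp classes or functions.

```python
from collections import namedtuple

# Hypothetical stand-in for a shape/record pair; not PyShp's ShapeRecord class.
ShapeRecordPair = namedtuple("ShapeRecordPair", ["shape", "record"])

def pair_by_position(shapes, records):
    """Pair the i-th shape with the i-th record, refusing unbalanced input."""
    if len(shapes) != len(records):
        raise ValueError("number of shapes and records must match")
    return [ShapeRecordPair(s, r) for s, r in zip(shapes, records)]

# Toy data: two single-point "shapes" and their attribute rows.
shapes = [[(1.0, 1.0)], [(2.0, 2.0)]]
records = [["row", "one"], ["row", "two"]]

pairs = pair_by_position(shapes, records)
print(pairs[1].shape, pairs[1].record)
```

The length check mirrors the requirement, discussed later under "Geometry and Record Balancing", that a valid shapefile has the same number of shapes and records.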

Calling the shapeRecords() method will return the geometry and attributes for
all shapes as a list of ShapeRecord objects. Each ShapeRecord instance has a
"shape" and "record" attribute. The shape attribute is a Shape object as
discussed in the first section "Reading Geometry". The record attribute is a
list-like object containing field values as demonstrated in the "Reading Records" section.


    >>> shapeRecs = sf.shapeRecords()

Let's read the blockgroup key and the population for the 4th blockgroup:


    >>> shapeRecs[3].record[1:3]
    ['060750601001', 4715]

The result from the shapeRecords() method is a list-like object that can be easily converted
to GeoJSON through the _\_geo_interface\_\_:


    >>> shapeRecs.__geo_interface__['type']
    'FeatureCollection'

The shapeRecord() method reads a single shape/record pair at the specified index.
To get the 4th shape record from the blockgroups shapefile use index 3:


    >>> shapeRec = sf.shapeRecord(3)

Each individual shape record also supports the _\_geo_interface\_\_ to convert it to a GeoJSON feature:


    >>> shapeRec.__geo_interface__['type']
    'Feature'

The blockgroup key and population count:


    >>> shapeRec.record[1:3]
    ['060750601001', 4715]


## Writing Shapefiles

### The Writer Class

PyShp tries to be as flexible as possible when writing shapefiles while
maintaining some degree of automatic validation to make sure you don't
accidentally write an invalid file.

PyShp can write just one of the component files such as the shp or dbf file
without writing the others. So in addition to being a complete shapefile
library, it can also be used as a basic dbf (xbase) library. Dbf files are a
common database format which are often useful as a standalone simple database
format. And even shp files occasionally have uses as a standalone format.
Some
web-based GIS systems use a user-uploaded shp file to specify an area of
interest. Many precision agriculture chemical field sprayers also use the shp
format as a control file for the sprayer system (usually in combination with
custom database file formats).

To create a shapefile you begin by initiating a new Writer instance, passing it
the file path and name to save to:


    >>> w = shapefile.Writer('shapefiles/test/testfile')
    >>> w.field('field1', 'C')

File extensions are optional when reading or writing shapefiles. If you specify
them PyShp ignores them anyway. When you save files you can specify a base
file name that is used for all three file types. Or you can specify a name for
one or more file types:


    >>> w = shapefile.Writer(dbf='shapefiles/test/onlydbf.dbf')
    >>> w.field('field1', 'C')

In that case, only the file types that were given a file name are saved;
any file types not assigned are skipped.

#### Writing Shapefiles Using the Context Manager

The "Writer" class automatically closes the open files and writes the final headers once it is garbage collected.
In case of a crash and to make the code more readable, it is nevertheless recommended
you do this manually by calling the "close()" method:


    >>> w.close()

Alternatively, you can also use the "Writer" class as a context manager, to ensure open file
objects are properly closed and final headers written once you exit the with-clause:


    >>> with shapefile.Writer("shapefiles/test/contextwriter") as w:
    ...     w.field('field1', 'C')
    ...     pass

#### Writing Shapefiles to File-Like Objects

Just as you can read shapefiles from python file-like objects you can also
write to them:


    >>> try:
    ...     from StringIO import StringIO
    ... except ImportError:
    ...     from io import BytesIO as StringIO
    >>> shp = StringIO()
    >>> shx = StringIO()
    >>> dbf = StringIO()
    >>> w = shapefile.Writer(shp=shp, shx=shx, dbf=dbf)
    >>> w.field('field1', 'C')
    >>> w.record()
    >>> w.null()
    >>> w.close()
    >>> # To read back the files you could call the "StringIO.getvalue()" method later.

#### Setting the Shape Type

The shape type defines the type of geometry contained in the shapefile. All of
the shapes must match the shape type setting.

There are three ways to set the shape type:
  * Set it when creating the class instance.
  * Set it by assigning a value to an existing class instance.
  * Set it automatically to the type of the first non-null shape by saving the shapefile.

To manually set the shape type for a Writer object when creating the Writer:


    >>> w = shapefile.Writer('shapefiles/test/shapetype', shapeType=3)
    >>> w.field('field1', 'C')

    >>> w.shapeType
    3

OR you can set it after the Writer is created:


    >>> w.shapeType = 1

    >>> w.shapeType
    1


### Adding Records

Before you can add records you must first create the fields that define what types of
values will go into each attribute.

There are several different field types, all of which support storing None values as NULL.
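As a rough mental model, each field definition is a small (name, type, length, decimal) description, using the type codes listed under "Reading Records". The sketch below checks such a description in plain Python. `describe_field` and `FIELD_TYPES` are hypothetical names for this illustration, not PyShp API, and rejecting names longer than 10 characters is an assumption of this sketch based on the changelog note that only the first 10 characters of a field name are used.

```python
# Type codes as listed in the "Reading Records" section of this document.
FIELD_TYPES = {
    "C": "text",
    "N": "number",
    "F": "float (same as N)",
    "L": "boolean",
    "D": "date",
    "M": "memo",
}

def describe_field(name, field_type, size, decimal=0):
    """Hypothetical validator (not a PyShp function) for a field definition."""
    if field_type not in FIELD_TYPES:
        raise ValueError("unknown field type: %r" % field_type)
    # Assumption of this sketch: reject over-long names instead of truncating.
    if len(name) > 10:
        raise ValueError("field name longer than 10 characters: %r" % name)
    return (name, field_type, size, decimal)

print(describe_field("POP1990", "N", 9))
print(describe_field("AREA", "N", 18, 5))
```

The two example definitions reuse values that appear in the blockgroups field list shown earlier.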
652 653Text fields are created using the 'C' type, and the third 'size' argument can be customized to the expected 654length of text values to save space: 655 656 657 >>> w = shapefile.Writer('shapefiles/test/dtype') 658 >>> w.field('TEXT', 'C') 659 >>> w.field('SHORT_TEXT', 'C', size=5) 660 >>> w.field('LONG_TEXT', 'C', size=250) 661 >>> w.null() 662 >>> w.record('Hello', 'World', 'World'*50) 663 >>> w.close() 664 665 >>> r = shapefile.Reader('shapefiles/test/dtype') 666 >>> assert r.record(0) == ['Hello', 'World', 'World'*50] 667 668Date fields are created using the 'D' type, and can be created using either 669date objects, lists, or a YYYYMMDD formatted string. 670Field length or decimal have no impact on this type: 671 672 673 >>> from datetime import date 674 >>> w = shapefile.Writer('shapefiles/test/dtype') 675 >>> w.field('DATE', 'D') 676 >>> w.null() 677 >>> w.null() 678 >>> w.null() 679 >>> w.null() 680 >>> w.record(date(1898,1,30)) 681 >>> w.record([1998,1,30]) 682 >>> w.record('19980130') 683 >>> w.record(None) 684 >>> w.close() 685 686 >>> r = shapefile.Reader('shapefiles/test/dtype') 687 >>> assert r.record(0) == [date(1898,1,30)] 688 >>> assert r.record(1) == [date(1998,1,30)] 689 >>> assert r.record(2) == [date(1998,1,30)] 690 >>> assert r.record(3) == [None] 691 692Numeric fields are created using the 'N' type (or the 'F' type, which is exactly the same). 693By default the fourth decimal argument is set to zero, essentially creating an integer field. 694To store floats you must set the decimal argument to the precision of your choice. 695To store very large numbers you must increase the field length size to the total number of digits 696(including comma and minus). 
697 698 699 >>> w = shapefile.Writer('shapefiles/test/dtype') 700 >>> w.field('INT', 'N') 701 >>> w.field('LOWPREC', 'N', decimal=2) 702 >>> w.field('MEDPREC', 'N', decimal=10) 703 >>> w.field('HIGHPREC', 'N', decimal=30) 704 >>> w.field('FTYPE', 'F', decimal=10) 705 >>> w.field('LARGENR', 'N', 101) 706 >>> nr = 1.3217328 707 >>> w.null() 708 >>> w.null() 709 >>> w.record(INT=nr, LOWPREC=nr, MEDPREC=nr, HIGHPREC=-3.2302e-25, FTYPE=nr, LARGENR=int(nr)*10**100) 710 >>> w.record(None, None, None, None, None, None) 711 >>> w.close() 712 713 >>> r = shapefile.Reader('shapefiles/test/dtype') 714 >>> assert r.record(0) == [1, 1.32, 1.3217328, -3.2302e-25, 1.3217328, 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] 715 >>> assert r.record(1) == [None, None, None, None, None, None] 716 717 718Finally, we can create boolean fields by setting the type to 'L'. 719This field can take True or False values, or 1 (True) or 0 (False). 720None is interpreted as missing. 721 722 723 >>> w = shapefile.Writer('shapefiles/test/dtype') 724 >>> w.field('BOOLEAN', 'L') 725 >>> w.null() 726 >>> w.null() 727 >>> w.null() 728 >>> w.null() 729 >>> w.null() 730 >>> w.null() 731 >>> w.record(True) 732 >>> w.record(1) 733 >>> w.record(False) 734 >>> w.record(0) 735 >>> w.record(None) 736 >>> w.record("Nonesense") 737 >>> w.close() 738 739 >>> r = shapefile.Reader('shapefiles/test/dtype') 740 >>> r.record(0) 741 Record #0: [True] 742 >>> r.record(1) 743 Record #1: [True] 744 >>> r.record(2) 745 Record #2: [False] 746 >>> r.record(3) 747 Record #3: [False] 748 >>> r.record(4) 749 Record #4: [None] 750 >>> r.record(5) 751 Record #5: [None] 752 753You can also add attributes using keyword arguments where the keys are field names. 

    >>> w = shapefile.Writer('shapefiles/test/dtype')
    >>> w.field('FIRST_FLD','C','40')
    >>> w.field('SECOND_FLD','C','40')
    >>> w.null()
    >>> w.null()
    >>> w.record('First', 'Line')
    >>> w.record(FIRST_FLD='First', SECOND_FLD='Line')
    >>> w.close()

### Adding Geometry

Geometry is added using one of several convenience methods. The "null" method is used
for null shapes, "point" is used for point shapes, "multipoint" is used for multipoint shapes, "line" for lines,
and "poly" for polygons.

**Adding a Null shape**

A shapefile may contain some records for which geometry is not available. Such records are given
a null shape using the "null" method.
Because Null shape types (shape type 0) have no geometry, the "null" method is called without any arguments.

    >>> w = shapefile.Writer('shapefiles/test/null')
    >>> w.field('name', 'C')

    >>> w.null()
    >>> w.record('nullgeom')

    >>> w.close()

**Adding a Point shape**

Point shapes are added using the "point" method. A point is specified by an x and
y value.

    >>> w = shapefile.Writer('shapefiles/test/point')
    >>> w.field('name', 'C')

    >>> w.point(122, 37)
    >>> w.record('point1')

    >>> w.close()

**Adding a MultiPoint shape**

If your point data allows for the possibility of multiple points per feature, use "multipoint" instead.
These are specified as a list of xy point coordinates.

    >>> w = shapefile.Writer('shapefiles/test/multipoint')
    >>> w.field('name', 'C')

    >>> w.multipoint([[122,37], [124,32]])
    >>> w.record('multipoint1')

    >>> w.close()

**Adding a LineString shape**

For LineString shapefiles, each shape is given as a list of one or more linear features.
Each of the linear features must have at least two points.

    >>> w = shapefile.Writer('shapefiles/test/line')
    >>> w.field('name', 'C')

    >>> w.line([
    ...         [[1,5],[5,5],[5,1],[3,3],[1,1]], # line 1
    ...         [[3,2],[2,6]] # line 2
    ...         ])

    >>> w.record('linestring1')

    >>> w.close()

**Adding a Polygon shape**

As with LineString, a Polygon shape can consist of multiple polygons, and must be given as a list of polygons.
The main difference is that each polygon must have at least 4 points and the last point must be the same as the first.
It's also okay if you forget to repeat the first point at the end; PyShp automatically checks and closes the polygons
if you don't.

It's important to note that for Polygon shapefiles, your polygon coordinates must be ordered in a clockwise direction.
If any of the polygons have holes, then the hole polygon coordinates must be ordered in a counterclockwise direction.
The direction of your polygons determines how shapefile readers will distinguish between polygon outlines and holes.

    >>> w = shapefile.Writer('shapefiles/test/polygon')
    >>> w.field('name', 'C')

    >>> w.poly([
    ...         [[113,24], [112,32], [117,36], [122,37], [118,20]], # poly 1
    ...         [[116,29],[116,26],[119,29],[119,32]], # hole 1
    ...         [[15,2], [17,6], [22,7]] # poly 2
    ...        ])
    >>> w.record('polygon1')

    >>> w.close()

**Adding from an existing Shape object**

Finally, geometry can be added by passing an existing "Shape" object to the "shape" method.
You can also pass it any GeoJSON dictionary or \_\_geo_interface\_\_ compatible object.
This can be particularly useful for copying from one file to another:

    >>> r = shapefile.Reader('shapefiles/test/polygon')

    >>> w = shapefile.Writer('shapefiles/test/copy')
    >>> w.fields = r.fields[1:] # skip first deletion field

    >>> # adding existing Shape objects
    >>> for shaperec in r.iterShapeRecords():
    ...     w.record(*shaperec.record)
    ...     w.shape(shaperec.shape)

    >>> # or GeoJSON dicts
    >>> for shaperec in r.iterShapeRecords():
    ...     w.record(*shaperec.record)
    ...     w.shape(shaperec.shape.__geo_interface__)

    >>> w.close()

### Geometry and Record Balancing

Because every shape must have a corresponding record, it is critical that the
number of records equals the number of shapes to create a valid shapefile. You
must take care to add records and shapes in the same order so that the record
data lines up with the geometry data. For example:

    >>> w = shapefile.Writer('shapefiles/test/balancing', shapeType=shapefile.POINT)
    >>> w.field("field1", "C")
    >>> w.field("field2", "C")

    >>> w.record("row", "one")
    >>> w.point(1, 1)

    >>> w.record("row", "two")
    >>> w.point(2, 2)

To help prevent accidental misalignment, PyShp has an "auto balance" feature that
makes sure the two sides of the equation line up whenever you add either a shape
or a record. This way, if you forget to update an entry, the
shapefile will still be valid and handled correctly by most shapefile
software. Autobalancing is NOT turned on by default. To activate it, set
the attribute autoBalance to 1 or True:

    >>> w.autoBalance = 1
    >>> w.record("row", "three")
    >>> w.record("row", "four")
    >>> w.point(4, 4)

    >>> w.recNum == w.shpNum
    True

You also have the option of manually calling the balance() method at any time
to ensure the other side is up to date. When balancing is used,
null shapes are created on the geometry side, or records
with a value of "NULL" for each field are created on the attribute side.
This gives you flexibility in how you build the shapefile.
You can create all of the shapes and then create all of the records or vice versa.
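
As a conceptual illustration only, the padding rule described above can be sketched in plain Python without PyShp. The `balance` helper and its list-based inputs are hypothetical and are not PyShp's implementation; they only mimic the documented behaviour of filling the shorter side with null shapes or "NULL" records.

```python
def balance(shapes, records, field_count):
    """Pad the shorter of two lists so both end up the same length.

    Hypothetical helper for illustration; not part of PyShp.
    """
    # Missing geometry is filled with None, standing in for a null shape.
    while len(shapes) < len(records):
        shapes.append(None)
    # Missing attributes are filled with one "NULL" value per field.
    while len(records) < len(shapes):
        records.append(["NULL"] * field_count)
    return shapes, records


# Three records but only two shapes: one null shape is appended.
shapes, records = balance(
    [(5, 5), (6, 6)],
    [["row", "five"], ["row", "six"], ["row", "seven"]],
    field_count=2,
)
print(len(shapes) == len(records))  # True
print(shapes[-1])                   # None
```

The same rule applies in the other direction: with more shapes than records, the missing rows are filled with "NULL" values.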

    >>> w.autoBalance = 0
    >>> w.record("row", "five")
    >>> w.record("row", "six")
    >>> w.record("row", "seven")
    >>> w.point(5, 5)
    >>> w.point(6, 6)
    >>> w.balance()

    >>> w.recNum == w.shpNum
    True

If you do not use the autoBalance option or the balance() method and forget to manually
balance the geometry and attributes, the shapefile will be viewed as corrupt by
most shapefile software.


# How To's

## 3D and Other Geometry Types

Most shapefiles store conventional 2D points, lines, or polygons. But the shapefile format is also capable
of storing various other types of geometries, including complex 3D surfaces and objects.

**Shapefiles with measurement (M) values**

Measured shape types are shapes that include a measurement value at each vertex, for instance
speed measurements from a GPS device. Shapes with measurement (M) values are added with the following
methods: "pointm", "multipointm", "linem", and "polym". The M-values are specified by adding a
third M value to each XY coordinate. Missing or unobserved M-values are specified with a None value,
or by simply omitting the third M-coordinate.

    >>> w = shapefile.Writer('shapefiles/test/linem')
    >>> w.field('name', 'C')

    >>> w.linem([
    ...         [[1,5,0],[5,5],[5,1,3],[3,3,None],[1,1,0]], # line with one omitted and one missing M-value
    ...         [[3,2],[2,6]] # line without any M-values
    ...         ])

    >>> w.record('linem1')

    >>> w.close()

Shapefiles containing M-values can be examined in several ways:

    >>> r = shapefile.Reader('shapefiles/test/linem')

    >>> r.mbox # the lower and upper bound of M-values in the shapefile
    [0.0, 3.0]

    >>> r.shape(0).m # flat list of M-values
    [0.0, None, 3.0, None, 0.0, None, None]

**Shapefiles with elevation (Z) values**

Elevation shape types are shapes that include an elevation value at each vertex, for instance elevation from a GPS device.
Shapes with elevation (Z) values are added with the following methods: "pointz", "multipointz", "linez", and "polyz".
The Z-values are specified by adding a third Z value to each XY coordinate. Z-values do not support the concept of missing data,
but if you omit the third Z-coordinate it will default to 0. Note that Z-type shapes also support measurement (M) values added
as a fourth M-coordinate. This too is optional.

    >>> w = shapefile.Writer('shapefiles/test/linez')
    >>> w.field('name', 'C')

    >>> w.linez([
    ...         [[1,5,18],[5,5,20],[5,1,22],[3,3],[1,1]], # line with some omitted Z-values
    ...         [[3,2],[2,6]], # line without any Z-values
    ...         [[3,2,15,0],[2,6,13,3],[1,9,14,2]] # line with both Z- and M-values
    ...         ])

    >>> w.record('linez1')

    >>> w.close()

To examine a Z-type shapefile you can do:

    >>> r = shapefile.Reader('shapefiles/test/linez')

    >>> r.zbox # the lower and upper bound of Z-values in the shapefile
    [0.0, 22.0]

    >>> r.shape(0).z # flat list of Z-values
    [18.0, 20.0, 22.0, 0.0, 0.0, 0.0, 0.0, 15.0, 13.0, 14.0]

**3D MultiPatch Shapefiles**

Multipatch shapes are useful for storing composite 3-Dimensional objects.
A MultiPatch shape represents a 3D object made up of one or more surface parts.
Each surface in "parts" is defined by a list of XYZM values (Z and M values optional), and its corresponding type is
given in the "partTypes" argument. The part type decides how the coordinate sequence is to be interpreted, and can be one
of the following module constants: TRIANGLE_STRIP, TRIANGLE_FAN, OUTER_RING, INNER_RING, FIRST_RING, or RING.
For instance, a TRIANGLE_STRIP may be used to represent the walls of a building, combined with a TRIANGLE_FAN to represent
its roof:

    >>> from shapefile import TRIANGLE_STRIP, TRIANGLE_FAN

    >>> w = shapefile.Writer('shapefiles/test/multipatch')
    >>> w.field('name', 'C')

    >>> w.multipatch([
    ...              [[0,0,0],[0,0,3],[5,0,0],[5,0,3],[5,5,0],[5,5,3],[0,5,0],[0,5,3],[0,0,0],[0,0,3]], # TRIANGLE_STRIP for house walls
    ...              [[2.5,2.5,5],[0,0,3],[5,0,3],[5,5,3],[0,5,3],[0,0,3]], # TRIANGLE_FAN for pointed house roof
    ...              ],
    ...              partTypes=[TRIANGLE_STRIP, TRIANGLE_FAN]) # one type for each part

    >>> w.record('house1')

    >>> w.close()

For an introduction to the various multipatch part types and examples of how to create 3D MultiPatch objects see
[this ESRI White Paper](http://downloads.esri.com/support/whitepapers/ao_/J9749_MultiPatch_Geometry_Type.pdf).

## Working with Large Shapefiles

Despite being a lightweight library, PyShp is designed to be able to read and write
shapefiles of any size, allowing you to work with hundreds of thousands or even millions
of records and complex geometries.

When first creating the Reader class, the library only reads the header information
and leaves the rest of the file contents alone. Once you call the records() or shapes()
method, however, it will attempt to read the entire file into memory at once.
For very large files this can result in MemoryError.
So when working with large files
it is recommended to use the iterShapes(), iterRecords(), or iterShapeRecords()
methods instead. These iterate through the file contents one item at a time, enabling you to loop
through them while keeping memory usage at a minimum.

    >>> for shape in sf.iterShapes():
    ...     # do something here
    ...     pass

    >>> for rec in sf.iterRecords():
    ...     # do something here
    ...     pass

    >>> for shapeRec in sf.iterShapeRecords():
    ...     # do something here
    ...     pass

    >>> for shapeRec in sf: # same as iterShapeRecords()
    ...     # do something here
    ...     pass

The shapefile Writer class uses a similar streaming approach to keep memory
usage at a minimum. The library takes care of this under-the-hood by immediately
writing each geometry and record to disk the moment they
are added using shape() or record(). Once the writer is closed, exited, or garbage
collected, the final header information is calculated and written to the beginning of
the file.

This means that as long as you are able to iterate through a source file without having
to load everything into memory, such as a large CSV table or a large shapefile, you can
process and write any number of items, and even merge many different source files into a single
large shapefile. If you need to edit or undo any of your writing you would have to read the
file back in, one record at a time, make your changes, and write it back out.

## Unicode and Shapefile Encodings

PyShp has full support for unicode and shapefile encodings, so you can always expect to be working
with unicode strings in shapefiles that have text fields.
Most shapefiles are written in UTF-8 encoding, PyShp's default encoding, so in most cases you don't
have to specify the encoding.
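
To see why the encoding matters, the following plain-Python sketch (it does not use PyShp) stores a text value as Latin-1 bytes and decodes it in two ways. The sample string is the one used in the examples of this section; the variable names are chosen here for illustration.

```python
text = u"Ñandú"

# The bytes a Latin-1 encoded text field would hold for this value.
raw = text.encode("latin1")

# Decoding with the matching encoding recovers the original string.
print(raw.decode("latin1") == text)  # True

# Decoding the same bytes as UTF-8 fails: the first byte (0xD1) is not
# followed by a valid UTF-8 continuation byte.
try:
    raw.decode("utf8")
except UnicodeDecodeError as err:
    print("utf8 failed:", err.reason)
```

Decoding succeeds only when the declared encoding matches the bytes on disk.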
For reading shapefiles in any other encoding, such as Latin-1, just
supply the encoding option when creating the Reader class.

    >>> r = shapefile.Reader("shapefiles/test/latin1.shp", encoding="latin1")
    >>> r.record(0) == [2, u'Ñandú']
    True

Once you have loaded the shapefile, you may choose to save it using another more supportive encoding such
as UTF-8. Provided the new encoding supports the characters you are trying to write, reading it back in
should give you the same unicode string you started with.

    >>> w = shapefile.Writer("shapefiles/test/latin_as_utf8.shp", encoding="utf8")
    >>> w.fields = r.fields[1:]
    >>> w.record(*r.record(0))
    >>> w.null()
    >>> w.close()

    >>> r = shapefile.Reader("shapefiles/test/latin_as_utf8.shp", encoding="utf8")
    >>> r.record(0) == [2, u'Ñandú']
    True

If you supply the wrong encoding and the string cannot be decoded, PyShp will by default raise an
exception. If, however, on rare occasion, you are unable to find the correct encoding and want to ignore
or replace encoding errors, you can specify the "encodingErrors" option to be used by the decode method. This
applies to both reading and writing.

    >>> r = shapefile.Reader("shapefiles/test/latin1.shp", encoding="ascii", encodingErrors="replace")
    >>> r.record(0) == [2, u'�and�']
    True


# Testing

The testing framework is doctest; the tests are located in this file, README.md.
In the same folder as README.md and shapefile.py, from the command line run
```
$ python shapefile.py
```

Linux/Mac and similar platforms will need to run `$ dos2unix README.md` in order
to correct line endings in README.md.

# Contributors

```
Atle Frenvik Sveen
Bas Couwenberg
Casey Meisenzahl
Charles Arnold
David A. Riggs
davidh-ssec
Evan Heidtmann
ezcitron
fiveham
geospatialpython
Hannes
Ignacio Martinez Vazquez
Jason Moujaes
Jonty Wareing
Karim Bahgat
Kyle Kelley
Louis Tiao
Marcin Cuprjak
mcuprjak
Micah Cochran
Michael Davis
Michal Čihař
Mike Toews
Nilo
pakoun
Paulo Ernesto
Raynor Vliegendhart
Razzi Abuissa
RosBer97
Ross Rogers
Ryan Brideau
Tobias Megies
Tommi Penttinen
Uli Köhler
Vsevolod Novikov
Zac Miller
```