1.. _rfc-54: 2 3======================================================================================= 4RFC 54: Dataset transactions 5======================================================================================= 6 7Authors: Even Rouault 8 9Contact: even dot rouault at spatialys.com 10 11Status: Adopted, implemented in GDAL 2.0 12 13Summary 14------- 15 16This RFC introduces an API to offer a transaction mechanism at dataset 17level and uses it in the PostgreSQL, SQLite and GPKG drivers. It also 18reworks significantly how transactions are handled in the PostgreSQL 19driver. It also introduces a generic mechanism to implement an emulation 20of transactions for datasources that would not natively support it, and 21uses it in the FileGDB driver. 22 23Rationale 24--------- 25 26The current abstraction offers a transaction API at the layer level. 27However, this is generally misleading since, when it is implemented in 28DBMS with BEGIN/COMMIT/ROLLBACK sql statements (PostgreSQL, SQLite, 29GPKG, PGDump, MSSQLSpatial), the semantics is really a transaction at 30database level that spans over all layers/tables. So even if calling 31StartTransaction() on a layer, it also extends on the changes done on 32other layers. In a very few drivers 33StartTransaction()/CommitTransaction() is sometimes used as a mechanism 34to do bulk insertion. This is for example the case of WFS, CartoDB, GFT, 35GME. For some of them, it could rather be at dataset level too since 36potentially multiple layer modifications could be stacked together. 37 38Furthermode some use cases require updating several layers consistently, 39hence the need for a real database level transaction abstraction. 40 41The current situation of various drivers is the following (some of the 42below observations resulting from the analysis are kept mainly for the 43benefit of developers that would need to work in the drivers) : 44 45PostgreSQL 46~~~~~~~~~~ 47 48A few facts about cursors used to run GetNextFeature() requests: 49 50- Cursors are needed for retrieval of huge amount of data without being 51 memory bound. 52- Cursors need transactions to run. 53- Default cursors (WITHOUT HOLD) cannot be used outside of the 54 transaction that created tem 55- You cannot modify the structure of a table while the transaction 56 (yes, the transaction, not the cursor) is still active and if you do 57 that on another connection, it hangs until the other connection 58 commits or rollbacks) 59- Within a transaction, deleted/modified rows are only visible if they 60 are done before declaring the cursor. 61- Cursors WITH HOLD: may be used outside of transaction but cause a 62 copy of the table to be done --> bad for performance 63 64Current flaws are : 65 66- one cannot do interleaved layer reading (beyond the first 500 67 features fetched, can be easily seen with OGR_PG_CURSOR_PAGE=1) due 68 to the underlying implicit transaction created to read layer A being 69 closed when the reading of layer B starts. 70- GetFeature() flushes the current transaction and starts a new one to 71 do a cursor SELECT. Which is unnecessary since we retrieve only one 72 record 73- SetAttributeFilter() issues a ResetReading() which currently 74 FlushSoftTransaction() the ongoing transaction. Can be annoying in a 75 scenario with long update where you need transactional guarantee 76 77What works : 78 79- Transaction support at the layer level forwarded to datasource. 80- Interleaved writing works (even with copy mode) 81 82SQLite/GPKG 83~~~~~~~~~~~ 84 85- Mechanisms used to read table content (sqlite3_prepare() / 86 sqlite3_step()) do not need transactions. 87- Step sees structure modifications (e.g. column addition) if run after 88 prepared statement but before first step. 89- Step sees row modifications/additions as soon as they occur. 90- Transaction support at the layer level forwarded to datasource. 91 92MySQL 93~~~~~ 94 95- Cannot do interleaved layer reading (reading in one layer resets the 96 other reading) because of the use of mysql_use_result() that can work 97 with one single request at a time. mysql_store_result() would be a 98 solution but requires ingesting the whole result set into memory, 99 which is inpractical for big layers. 100- step does not set row changes once the query has started (if done 101 through another connection, because if done through ExecuteSQL() the 102 long transaction is interrupted) 103- No transaction support 104 105OCI 106~~~ 107 108- Interleaved layer reading works 109- Changes done after SELECT seem not to be seen. 110- No transaction support 111 112FileGDB 113~~~~~~~ 114 115- Interleaved layer reading works 116- Changes done after SELECT seem not to be seen. 117- No transaction support 118 119Proposed changes 120---------------- 121 122GDALDataset changes 123~~~~~~~~~~~~~~~~~~~ 124 125The following methods are added to GDALDataset (and usable by 126OGRDataSource which inherits from GDALDataset). 127 128:: 129 130 /************************************************************************/ 131 /* StartTransaction() */ 132 /************************************************************************/ 133 134 /** 135 \brief For datasources which support transactions, StartTransaction creates a transaction. 136 137 If starting the transaction fails, will return 138 OGRERR_FAILURE. Datasources which do not support transactions will 139 always return OGRERR_UNSUPPORTED_OPERATION. 140 141 Nested transactions are not supported. 142 143 All changes done after the start of the transaction are definitely applied in the 144 datasource if CommitTransaction() is called. They may be canceled by calling 145 RollbackTransaction() instead. 146 147 At the time of writing, transactions only apply on vector layers. 148 149 Datasets that support transactions will advertise the ODsCTransactions capability. 150 Use of transactions at dataset level is generally preferred to transactions at 151 layer level, whose scope is rarely limited to the layer from which it was started. 152 153 In case StartTransaction() fails, neither CommitTransaction() or RollbackTransaction() 154 should be called. 155 156 If an error occurs after a successful StartTransaction(), the whole 157 transaction may or may not be implicitly canceled, depending on drivers. (e.g. 158 the PG driver will cancel it, SQLite/GPKG not). In any case, in the event of an 159 error, an explicit call to RollbackTransaction() should be done to keep things balanced. 160 161 By default, when bForce is set to FALSE, only "efficient" transactions will be 162 attempted. Some drivers may offer an emulation of transactions, but sometimes 163 with significant overhead, in which case the user must explicitly allow for such 164 an emulation by setting bForce to TRUE. Drivers that offer emulated transactions 165 should advertise the ODsCEmulatedTransactions capability (and not ODsCTransactions). 166 167 This function is the same as the C function GDALDatasetStartTransaction(). 168 169 @param bForce can be set to TRUE if an emulation, possibly slow, of a transaction 170 mechanism is acceptable. 171 172 @return OGRERR_NONE on success. 173 @since GDAL 2.0 174 */ 175 OGRErr GDALDataset::StartTransaction(CPL_UNUSED int bForce); 176 177 178 /************************************************************************/ 179 /* CommitTransaction() */ 180 /************************************************************************/ 181 182 /** 183 \brief For datasources which support transactions, CommitTransaction commits a transaction. 184 185 If no transaction is active, or the commit fails, will return 186 OGRERR_FAILURE. Datasources which do not support transactions will 187 always return OGRERR_UNSUPPORTED_OPERATION. 188 189 Depending on drivers, this may or may not abort layer sequential readings that 190 are active. 191 192 This function is the same as the C function GDALDatasetCommitTransaction(). 193 194 @return OGRERR_NONE on success. 195 @since GDAL 2.0 196 */ 197 OGRErr GDALDataset::CommitTransaction(); 198 199 /************************************************************************/ 200 /* RollbackTransaction() */ 201 /************************************************************************/ 202 203 /** 204 \brief For datasources which support transactions, RollbackTransaction will roll 205 back a datasource to its state before the start of the current transaction. 206 207 If no transaction is active, or the rollback fails, will return 208 OGRERR_FAILURE. Datasources which do not support transactions will 209 always return OGRERR_UNSUPPORTED_OPERATION. 210 211 This function is the same as the C function GDALDatasetRollbackTransaction(). 212 213 @return OGRERR_NONE on success. 214 @since GDAL 2.0 215 */ 216 OGRErr GDALDataset::RollbackTransaction(); 217 218Note: in the GDALDataset class itself, those methods have an empty 219implementation that returns OGRERR_UNSUPPORTED_OPERATION. 220 221Those 3 methods are mapped at the C level as : 222 223:: 224 225 OGRErr CPL_DLL GDALDatasetStartTransaction(GDALDatasetH hDS, int bForce); 226 OGRErr CPL_DLL GDALDatasetCommitTransaction(GDALDatasetH hDS); 227 OGRErr CPL_DLL GDALDatasetRollbackTransaction(GDALDatasetH hDS); 228 229Two news dataset capabilities are added : 230 231- ODsCTransactions: True if this datasource supports (efficient) 232 transactions. 233- ODsCEmulatedTransactions: True if this datasource supports 234 transactions through emulation. 235 236Emulated transactions 237~~~~~~~~~~~~~~~~~~~~~ 238 239A new function OGRCreateEmulatedTransactionDataSourceWrapper() is added 240for used by drivers that do not natively support transactions but want 241an emulation of them. It could potentially be adopted by any datasource 242whose data is supported by files/directories. 243 244:: 245 246 /** Returns a new datasource object that adds transactional behavior to an existing datasource. 247 * 248 * The provided poTransactionBehaviour object should implement driver-specific 249 * behavior for transactions. 250 * 251 * The generic mechanisms offered by the wrapper class do not cover concurrent 252 * updates (though different datasource connections) to the same datasource files. 253 * 254 * There are restrictions on what can be accomplished. For example it is not 255 * allowed to have a unreleased layer returned by ExecuteSQL() before calling 256 * StartTransaction(), CommitTransaction() or RollbackTransaction(). 257 * 258 * Layer structural changes are not allowed after StartTransaction() if the 259 * layer definition object has been returned previously with GetLayerDefn(). 260 * 261 * @param poBaseDataSource the datasource to which to add transactional behavior. 262 * @param poTransactionBehaviour an implementation of the IOGRTransactionBehaviour interface. 263 * @param bTakeOwnershipDataSource whether the returned object should own the 264 * passed poBaseDataSource (and thus destroy it 265 * when it is destroyed itself). 266 * @param bTakeOwnershipTransactionBehavior whether the returned object should own 267 * the passed poTransactionBehaviour 268 * (and thus destroy it when 269 * it is destroyed itself). 270 * @return a new datasource handle 271 * @since GDAL 2.0 272 */ 273 OGRDataSource CPL_DLL* OGRCreateEmulatedTransactionDataSourceWrapper( 274 OGRDataSource* poBaseDataSource, 275 IOGRTransactionBehaviour* poTransactionBehaviour, 276 int bTakeOwnershipDataSource, 277 int bTakeOwnershipTransactionBehavior); 278 279The definition of the IOGRTransactionBehaviour interface is the 280following: 281 282:: 283 284 /** IOGRTransactionBehaviour is an interface that a driver must implement 285 * to provide emulation of transactions. 286 * 287 * @since GDAL 2.0 288 */ 289 class CPL_DLL IOGRTransactionBehaviour 290 { 291 public: 292 293 /** Start a transaction. 294 * 295 * The implementation may update the poDSInOut reference by closing 296 * and reopening the datasource (or assigning it to NULL in case of error). 297 * In which case bOutHasReopenedDS must be set to TRUE. 298 * 299 * The implementation can for example backup the existing files/directories 300 * that compose the current datasource. 301 * 302 * @param poDSInOut datasource handle that may be modified 303 * @param bOutHasReopenedDS output boolean to indicate if datasource has been closed 304 * @return OGRERR_NONE in case of success 305 */ 306 virtual OGRErr StartTransaction(OGRDataSource*& poDSInOut, 307 int& bOutHasReopenedDS) = 0; 308 309 /** Commit a transaction. 310 * 311 * The implementation may update the poDSInOut reference by closing 312 * and reopening the datasource (or assigning it to NULL in case of error). 313 * In which case bOutHasReopenedDS must be set to TRUE. 314 * 315 * The implementation can for example remove the backup it may have done 316 * at StartTransaction() time. 317 * 318 * @param poDSInOut datasource handle that may be modified 319 * @param bOutHasReopenedDS output boolean to indicate if datasource has been closed 320 * @return OGRERR_NONE in case of success 321 */ 322 virtual OGRErr CommitTransaction(OGRDataSource*& poDSInOut, 323 int& bOutHasReopenedDS) = 0; 324 325 /** Rollback a transaction. 326 * 327 * The implementation may update the poDSInOut reference by closing 328 * and reopening the datasource (or assigning it to NULL in case of error). 329 * In which case bOutHasReopenedDS must be set to TRUE. 330 * 331 * The implementation can for example restore the backup it may have done 332 * at StartTransaction() time. 333 * 334 * @param poDSInOut datasource handle that may be modified 335 * @param bOutHasReopenedDS output boolean to indicate if datasource has been closed 336 * @return OGRERR_NONE in case of success 337 */ 338 virtual OGRErr RollbackTransaction(OGRDataSource*& poDSInOut, 339 int& bOutHasReopenedDS) = 0; 340 }; 341 342OPGRLayer changes 343~~~~~~~~~~~~~~~~~ 344 345At the OGRLayer level, the documentation of GetNextFeature() receives 346the following additional information to clarify its semantics : 347 348:: 349 350 Features returned by GetNextFeature() may or may not be affected by concurrent 351 modifications depending on drivers. A guaranteed way of seeing modifications in 352 effect is to call ResetReading() on layers where GetNextFeature() has been called, 353 before reading again. Structural changes in layers (field addition, deletion, ...) 354 when a read is in progress may or may not be possible depending on drivers. 355 If a transaction is committed/aborted, the current sequential reading may or may 356 not be valid after that operation and a call to ResetReading() might be needed. 357 358PG driver changes 359~~~~~~~~~~~~~~~~~ 360 361Dataset level transactions have been implemented, and use of implicitly 362created transactions reworked. 363 364Interleaved layer reading is now possible. 365 366GetFeature() has been modified to run without a cursor or a transaction, 367and all other calls to transactions have been checked/modified to not 368reset accidentally a transaction initiated by the user. 369 370Below the new behavior as described in the updated drv_pg_advanced.html 371help page : 372 373:: 374 375 Efficient sequential reading in PostgreSQL requires to be done within a transaction 376 (technically this is a CURSOR WITHOUT HOLD). 377 So the PG driver will implicitly open such a transaction if none is currently 378 opened as soon as a feature is retrieved. This transaction will be released if 379 ResetReading() is called (provided that no other layer is still being read). 380 381 If within such an implicit transaction, an explicit dataset level StartTransaction() 382 is issued, the PG driver will use a SAVEPOINT to emulate properly the transaction 383 behavior while making the active cursor on the read layer still opened. 384 385 If an explicit transaction is opened with dataset level StartTransaction() 386 before reading a layer, this transaction will be used for the cursor that iterates 387 over the layer. When explicitly committing or rolling back the transaction, the 388 cursor will become invalid, and ResetReading() should be issued again to restart 389 reading from the beginning. 390 391 As calling SetAttributeFilter() or SetSpatialFilter() implies an implicit 392 ResetReading(), they have the same effect as ResetReading(). That is to say, 393 while an implicit transaction is in progress, the transaction will be committed 394 (if no other layer is being read), and a new one will be started again at the next 395 GetNextFeature() call. On the contrary, if they are called within an explicit 396 transaction, the transaction is maintained. 397 398 With the above rules, the below examples show the SQL instructions that are 399 run when using the OGR API in different scenarios. 400 401 402 lyr1->GetNextFeature() BEGIN (implicit) 403 DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 404 FETCH 1 IN cur1 405 406 lyr1->SetAttributeFilter('xxx') 407 --> lyr1->ResetReading() CLOSE cur1 408 COMMIT (implicit) 409 410 lyr1->GetNextFeature() BEGIN (implicit) 411 DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 WHERE xxx 412 FETCH 1 IN cur1 413 414 lyr2->GetNextFeature() DECLARE cur2 CURSOR FOR SELECT * FROM lyr2 415 FETCH 1 IN cur2 416 417 lyr1->GetNextFeature() FETCH 1 IN cur1 418 419 lyr2->GetNextFeature() FETCH 1 IN cur2 420 421 lyr1->CreateFeature(f) INSERT INTO cur1 ... 422 423 lyr1->SetAttributeFilter('xxx') 424 --> lyr1->ResetReading() CLOSE cur1 425 COMMIT (implicit) 426 427 lyr1->GetNextFeature() DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 WHERE xxx 428 FETCH 1 IN cur1 429 430 lyr1->ResetReading() CLOSE cur1 431 432 lyr2->ResetReading() CLOSE cur2 433 COMMIT (implicit) 434 435 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 436 437 ds->StartTransaction() BEGIN 438 439 lyr1->GetNextFeature() DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 440 FETCH 1 IN cur1 441 442 lyr2->GetNextFeature() DECLARE cur2 CURSOR FOR SELECT * FROM lyr2 443 FETCH 1 IN cur2 444 445 lyr1->CreateFeature(f) INSERT INTO cur1 ... 446 447 lyr1->SetAttributeFilter('xxx') 448 --> lyr1->ResetReading() CLOSE cur1 449 COMMIT (implicit) 450 451 lyr1->GetNextFeature() DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 WHERE xxx 452 FETCH 1 IN cur1 453 454 lyr1->ResetReading() CLOSE cur1 455 456 lyr2->ResetReading() CLOSE cur2 457 458 ds->CommitTransaction() COMMIT 459 460 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 461 462 ds->StartTransaction() BEGIN 463 464 lyr1->GetNextFeature() DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 465 FETCH 1 IN cur1 466 467 lyr1->CreateFeature(f) INSERT INTO cur1 ... 468 469 ds->CommitTransaction() CLOSE cur1 (implicit) 470 COMMIT 471 472 lyr1->GetNextFeature() FETCH 1 IN cur1 ==> Error since the cursor was closed with the commit. Explicit ResetReading() required before 473 474 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 475 476 lyr1->GetNextFeature() BEGIN (implicit) 477 DECLARE cur1 CURSOR FOR SELECT * FROM lyr1 478 FETCH 1 IN cur1 479 480 ds->StartTransaction() SAVEPOINT savepoint 481 482 lyr1->CreateFeature(f) INSERT INTO cur1 ... 483 484 ds->CommitTransaction() RELEASE SAVEPOINT savepoint 485 486 lyr1->ResetReading() CLOSE cur1 487 COMMIT (implicit) 488 489 490 Note: in reality, the PG drivers fetches 500 features at once. The FETCH 1 491 is for clarity of the explanation. 492 493It is recommended to do operations within explicit transactions for ease 494of mind (some troubles fixing ogr_pg.py, but which does admittedly weird 495things like reopening connections, which does not fly very well with 496'implicit' transactions) 497 498GPKG and SQLite driver changes 499~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 500 501Dataset level transactions have been implemented. A few fixes made here 502and there to avoid resetting accidentally a transaction initiated by the 503user. 504 505FileGDB driver changes 506~~~~~~~~~~~~~~~~~~~~~~ 507 508The FileGDB driver uses the above described emulation to offer a 509transaction mechanism. This works by backing up the current state of a 510geodatabase when StartTransaction(force=TRUE) is called. If the 511transaction is committed, the backup copy is destroyed. If the 512transaction is rolled back, the backup copy is restored. So this might 513be costly when operating on huge geodatabases. Note that this emulation 514has an unspecified behavior in case of concurrent updates (with 515different connections in the same or another process). 516 517SWIG bindings (Python / Java / C# / Perl) changes 518------------------------------------------------- 519 520The following additions have been done : 521 522- Dataset.StartTransaction(int force=FALSE) 523- Dataset.CommitTransaction() 524- Dataset.RollbackTransaction() 525- ogr.ODsCTransactions constant 526- ogr.ODsCEmulatedTransactions constant 527 528Utilities 529--------- 530 531ogr2ogr now uses dataset transactions (instead of layer transactions) if 532ODsCTransactions is advertized. 533 534Documentation 535------------- 536 537New/modified API are documented. MIGRATION_GUIDE.TXT updated with 538mention to below compatibility issues. 539 540Test Suite 541---------- 542 543The test suite is extended to test 544 545- updated drivers: PG, GPKG, SQLite, FileGDB 546- use of database transactions by ogr2ogr 547 548Compatibility Issues 549-------------------- 550 551As described above, subtle behavior changes can be observed with the PG 552driver, related to implicit transactions that were flushed before and 553are no longer now, but this should hopefully be restricted to 554non-typical use cases. So some cases that "worked" before might no 555longer work, but the new behavior should hopefully be more 556understandable. 557 558The PG and SQLite drivers could accept apparently nested calls to 559StartTransaction() (at the layer level). This is no longer possible 560since they are now redirected to dataset transactions, that explicitly 561do not support it. 562 563Out of scope 564------------ 565 566The following drivers that implement BEGIN/COMMIT/ROLLBACK could be 567later enhanced to support dataset transactions: OCI, MySQL, 568MSSQLSpatial. 569 570GFT, CartoDB, WFS could also benefit for dataset transactions. 571 572VRT currently supports layer transactions (if the underlying dataset 573support it, and excluding union layers). If dataset transaction were to 574be implemented, should it consist in forwarding dataset transaction to 575source dataset(s) ? Implementation might be complicated in case the same 576dataset is used by multiple sources, but more fundamentally one cannot 577guarantee ACID on multiple datasets. 578 579Related tickets 580--------------- 581 582A proposed revision on how transactions are implemented in the PG driver 583was proposed a long time ago 584(`https://trac.osgeo.org/gdal/ticket/1265 <https://trac.osgeo.org/gdal/ticket/1265>`__) 585to solve some of the above issues. The patch no longer applies but it is 586expected that the changes done for this RFC cover the issues that the 587ticket wanted to address. 588 589Implementation 590-------------- 591 592Implementation will be done by Even Rouault 593(`Spatialys <http://spatialys.com>`__), and sponsored by `LINZ (Land 594Information New Zealand) <http://www.linz.govt.nz/>`__. 595 596The proposed implementation lies in the "rfc54_dataset_transactions" 597branch of the 598`https://github.com/rouault/gdal2/tree/rfc54_dataset_transactions <https://github.com/rouault/gdal2/tree/rfc54_dataset_transactions>`__ 599repository. 600 601The list of changes: 602`https://github.com/rouault/gdal2/compare/rfc54_dataset_transactions <https://github.com/rouault/gdal2/compare/rfc54_dataset_transactions>`__ 603 604Voting history 605-------------- 606 607+1 from JukkaR, HowardB and EvenR 608