1src/backend/storage/lmgr/README-SSI 2 3Serializable Snapshot Isolation (SSI) and Predicate Locking 4=========================================================== 5 6This code is in the lmgr directory because about 90% of it is an 7implementation of predicate locking, which is required for SSI, 8rather than being directly related to SSI itself. When another use 9for predicate locking justifies the effort to tease these two things 10apart, this README file should probably be split. 11 12 13Credits 14------- 15 16This feature was developed by Kevin Grittner and Dan R. K. Ports, 17with review and suggestions from Joe Conway, Heikki Linnakangas, and 18Jeff Davis. It is based on work published in these papers: 19 20 Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. 21 Serializable isolation for snapshot databases. 22 In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD 23 international conference on Management of data, 24 pages 729-738, New York, NY, USA. ACM. 25 http://doi.acm.org/10.1145/1376616.1376690 26 27 Michael James Cahill. 2009. 28 Serializable Isolation for Snapshot Databases. 29 Sydney Digital Theses. 30 University of Sydney, School of Information Technologies. 31 http://hdl.handle.net/2123/5353 32 33 34Overview 35-------- 36 37With true serializable transactions, if you can show that your 38transaction will do the right thing if there are no concurrent 39transactions, it will do the right thing in any mix of serializable 40transactions or be rolled back with a serialization failure. This 41feature has been implemented in PostgreSQL using SSI. 42 43 44Serializable and Snapshot Transaction Isolation Levels 45------------------------------------------------------ 46 47Serializable transaction isolation is attractive for shops with 48active development by many programmers against a complex schema 49because it guarantees data integrity with very little staff time -- 50if a transaction can be shown to always do the right thing when it is 51run alone (before or after any other transaction), it will always do 52the right thing in any mix of concurrent serializable transactions. 53Where conflicts with other transactions would result in an 54inconsistent state within the database or an inconsistent view of 55the data, a serializable transaction will block or roll back to 56prevent the anomaly. The SQL standard provides a specific SQLSTATE 57for errors generated when a transaction rolls back for this reason, 58so that transactions can be retried automatically. 59 60Before version 9.1, PostgreSQL did not support a full serializable 61isolation level. A request for serializable transaction isolation 62actually provided snapshot isolation. This has well known anomalies 63which can allow data corruption or inconsistent views of the data 64during concurrent transactions; although these anomalies only occur 65when certain patterns of read-write dependencies exist within a set 66of concurrent transactions. Where these patterns exist, the anomalies 67can be prevented by introducing conflicts through explicitly 68programmed locks or otherwise unnecessary writes to the database. 69Snapshot isolation is popular because performance is better than 70serializable isolation and the integrity guarantees which it does 71provide allow anomalies to be avoided or managed with reasonable 72effort in many environments. 73 74 75Serializable Isolation Implementation Strategies 76------------------------------------------------ 77 78Techniques for implementing full serializable isolation have been 79published and in use in many database products for decades. The 80primary technique which has been used is Strict Two-Phase Locking 81(S2PL), which operates by blocking writes against data which has been 82read by concurrent transactions and blocking any access (read or 83write) against data which has been written by concurrent 84transactions. A cycle in a graph of blocking indicates a deadlock, 85requiring a rollback. Blocking and deadlocks under S2PL in high 86contention workloads can be debilitating, crippling throughput and 87response time. 88 89A new technique for implementing full serializable isolation in an 90MVCC database appears in the literature beginning in 2008. This 91technique, known as Serializable Snapshot Isolation (SSI) has many of 92the advantages of snapshot isolation. In particular, reads don't 93block anything and writes don't block reads. Essentially, it runs 94snapshot isolation but monitors the read-write conflicts between 95transactions to identify dangerous structures in the transaction 96graph which indicate that a set of concurrent transactions might 97produce an anomaly, and rolls back transactions to ensure that no 98anomalies occur. It will produce some false positives (where a 99transaction is rolled back even though there would not have been an 100anomaly), but will never let an anomaly occur. In the two known 101prototype implementations, performance for many workloads (even with 102the need to restart transactions which are rolled back) is very close 103to snapshot isolation and generally far better than an S2PL 104implementation. 105 106 107Apparent Serial Order of Execution 108---------------------------------- 109 110One way to understand when snapshot anomalies can occur, and to 111visualize the difference between the serializable implementations 112described above, is to consider that among transactions executing at 113the serializable transaction isolation level, the results are 114required to be consistent with some serial (one-at-a-time) execution 115of the transactions [1]. How is that order determined in each? 116 117In S2PL, each transaction locks any data it accesses. It holds the 118locks until committing, preventing other transactions from making 119conflicting accesses to the same data in the interim. Some 120transactions may have to be rolled back to prevent deadlock. But 121successful transactions can always be viewed as having occurred 122sequentially, in the order they committed. 123 124With snapshot isolation, reads never block writes, nor vice versa, so 125more concurrency is possible. The order in which transactions appear 126to have executed is determined by something more subtle than in S2PL: 127read/write dependencies. If a transaction reads data, it appears to 128execute after the transaction that wrote the data it is reading. 129Similarly, if it updates data, it appears to execute after the 130transaction that wrote the previous version. These dependencies, which 131we call "wr-dependencies" and "ww-dependencies", are consistent with 132the commit order, because the first transaction must have committed 133before the second starts. However, there can also be dependencies 134between two *concurrent* transactions, i.e. where one was running when 135the other acquired its snapshot. These "rw-conflicts" occur when one 136transaction attempts to read data which is not visible to it because 137the transaction which wrote it (or will later write it) is 138concurrent. The reading transaction appears to have executed first, 139regardless of the actual sequence of transaction starts or commits, 140because it sees a database state prior to that in which the other 141transaction leaves it. 142 143Anomalies occur when a cycle is created in the graph of dependencies: 144when a dependency or series of dependencies causes transaction A to 145appear to have executed before transaction B, but another series of 146dependencies causes B to appear before A. If that's the case, then 147the results can't be consistent with any serial execution of the 148transactions. 149 150 151SSI Algorithm 152------------- 153 154As of 9.1, serializable transactions in PostgreSQL are implemented using 155Serializable Snapshot Isolation (SSI), based on the work of Cahill 156et al. Fundamentally, this allows snapshot isolation to run as it 157previously did, while monitoring for conditions which could create a 158serialization anomaly. 159 160SSI is based on the observation [2] that each snapshot isolation 161anomaly corresponds to a cycle that contains a "dangerous structure" 162of two adjacent rw-conflict edges: 163 164 Tin ------> Tpivot ------> Tout 165 rw rw 166 167SSI works by watching for this dangerous structure, and rolling 168back a transaction when needed to prevent any anomaly. This means it 169only needs to track rw-conflicts between concurrent transactions, not 170wr- and ww-dependencies. It also means there is a risk of false 171positives, because not every dangerous structure is embedded in an 172actual cycle. The number of false positives is low in practice, so 173this represents an acceptable tradeoff for keeping the detection 174overhead low. 175 176The PostgreSQL implementation uses two additional optimizations: 177 178* Tout must commit before any other transaction in the cycle 179 (see proof of Theorem 2.1 of [2]). We only roll back a transaction 180 if Tout commits before Tpivot and Tin. 181 182* if Tin is read-only, there can only be an anomaly if Tout committed 183 before Tin takes its snapshot. This optimization is an original 184 one. Proof: 185 186 - Because there is a cycle, there must be some transaction T0 that 187 precedes Tin in the cycle. (T0 might be the same as Tout.) 188 189 - The edge between T0 and Tin can't be a rw-conflict or ww-dependency, 190 because Tin was read-only, so it must be a wr-dependency. 191 Those can only occur if T0 committed before Tin took its snapshot, 192 else Tin would have ignored T0's output. 193 194 - Because Tout must commit before any other transaction in the 195 cycle, it must commit before T0 commits -- and thus before Tin 196 starts. 197 198 199PostgreSQL Implementation 200------------------------- 201 202 * Since this technique is based on Snapshot Isolation (SI), those 203areas in PostgreSQL which don't use SI can't be brought under SSI. 204This includes system tables, temporary tables, sequences, hint bit 205rewrites, etc. SSI can not eliminate existing anomalies in these 206areas. 207 208 * Any transaction which is run at a transaction isolation level 209other than SERIALIZABLE will not be affected by SSI. If you want to 210enforce business rules through SSI, all transactions should be run at 211the SERIALIZABLE transaction isolation level, and that should 212probably be set as the default. 213 214 * If all transactions are run at the SERIALIZABLE transaction 215isolation level, business rules can be enforced in triggers or 216application code without ever having a need to acquire an explicit 217lock or to use SELECT FOR SHARE or SELECT FOR UPDATE. 218 219 * Those who want to continue to use snapshot isolation without 220the additional protections of SSI (and the associated costs of 221enforcing those protections), can use the REPEATABLE READ transaction 222isolation level. This level retains its legacy behavior, which 223is identical to the old SERIALIZABLE implementation and fully 224consistent with the standard's requirements for the REPEATABLE READ 225transaction isolation level. 226 227 * Performance under this SSI implementation will be significantly 228improved if transactions which don't modify permanent tables are 229declared to be READ ONLY before they begin reading data. 230 231 * Performance under SSI will tend to degrade more rapidly with a 232large number of active database transactions than under less strict 233isolation levels. Limiting the number of active transactions through 234use of a connection pool or similar techniques may be necessary to 235maintain good performance. 236 237 * Any transaction which must be rolled back to prevent 238serialization anomalies will fail with SQLSTATE 40001, which has a 239standard meaning of "serialization failure". 240 241 * This SSI implementation makes an effort to choose the 242transaction to be canceled such that an immediate retry of the 243transaction will not fail due to conflicts with exactly the same 244transactions. Pursuant to this goal, no transaction is canceled 245until one of the other transactions in the set of conflicts which 246could generate an anomaly has successfully committed. This is 247conceptually similar to how write conflicts are handled. To fully 248implement this guarantee there needs to be a way to roll back the 249active transaction for another process with a serialization failure 250SQLSTATE, even if it is "idle in transaction". 251 252 253Predicate Locking 254----------------- 255 256Both S2PL and SSI require some form of predicate locking to handle 257situations where reads conflict with later inserts or with later 258updates which move data into the selected range. PostgreSQL didn't 259already have predicate locking, so it needed to be added to support 260full serializable transactions under either strategy. Practical 261implementations of predicate locking generally involve acquiring 262locks against data as it is accessed, using multiple granularities 263(tuple, page, table, etc.) with escalation as needed to keep the lock 264count to a number which can be tracked within RAM structures. This 265approach was used in PostgreSQL. Coarse granularities can cause some 266false positive indications of conflict. The number of false positives 267can be influenced by plan choice. 268 269 270Implementation overview 271----------------------- 272 273New RAM structures, inspired by those used to track traditional locks 274in PostgreSQL, but tailored to the needs of SIREAD predicate locking, 275are used. These refer to physical objects actually accessed in the 276course of executing the query, to model the predicates through 277inference. Anyone interested in this subject should review the 278Hellerstein, Stonebraker and Hamilton paper [3], along with the 279locking papers referenced from that and the Cahill papers. 280 281Because the SIREAD locks don't block, traditional locking techniques 282have to be modified. Intent locking (locking higher level objects 283before locking lower level objects) doesn't work with non-blocking 284"locks" (which are, in some respects, more like flags than locks). 285 286A configurable amount of shared memory is reserved at postmaster 287start-up to track predicate locks. This size cannot be changed 288without a restart. 289 290To prevent resource exhaustion, multiple fine-grained locks may 291be promoted to a single coarser-grained lock as needed. 292 293An attempt to acquire an SIREAD lock on a tuple when the same 294transaction already holds an SIREAD lock on the page or the relation 295will be ignored. Likewise, an attempt to lock a page when the 296relation is locked will be ignored, and the acquisition of a coarser 297lock will result in the automatic release of all finer-grained locks 298it covers. 299 300 301Heap locking 302------------ 303 304Predicate locks will be acquired for the heap based on the following: 305 306 * For a table scan, the entire relation will be locked. 307 308 * Each tuple read which is visible to the reading transaction 309will be locked, whether or not it meets selection criteria; except 310that there is no need to acquire an SIREAD lock on a tuple when the 311transaction already holds a write lock on any tuple representing the 312row, since a rw-conflict would also create a ww-dependency which 313has more aggressive enforcement and thus will prevent any anomaly. 314 315 * Modifying a heap tuple creates a rw-conflict with any transaction 316that holds a SIREAD lock on that tuple, or on the page or relation 317that contains it. 318 319 * Inserting a new tuple creates a rw-conflict with any transaction 320holding a SIREAD lock on the entire relation. It doesn't conflict with 321page-level locks, because page-level locks are only used to aggregate 322tuple locks. Unlike index page locks, they don't lock "gaps" on the page. 323 324 325Index AM implementations 326------------------------ 327 328Since predicate locks only exist to detect writes which conflict with 329earlier reads, and heap tuple locks are acquired to cover all heap 330tuples actually read, including those read through indexes, the index 331tuples which were actually scanned are not of interest in themselves; 332we only care about their "new neighbors" -- later inserts into the 333index which would have been included in the scan had they existed at 334the time. Conceptually, we want to lock the gaps between and 335surrounding index entries within the scanned range. 336 337Correctness requires that any insert into an index generates a 338rw-conflict with a concurrent serializable transaction if, after that 339insert, re-execution of any index scan of the other transaction would 340access the heap for a row not accessed during the previous execution. 341Note that a non-HOT update which expires an old index entry covered 342by the scan and adds a new entry for the modified row's new tuple 343need not generate a conflict, although an update which "moves" a row 344into the scan must generate a conflict. While correctness allows 345false positives, they should be minimized for performance reasons. 346 347Several optimizations are possible, though not all are implemented yet: 348 349 * An index scan which is just finding the right position for an 350index insertion or deletion need not acquire a predicate lock. 351 352 * An index scan which is comparing for equality on the entire key 353for a unique index need not acquire a predicate lock as long as a key 354is found corresponding to a visible tuple which has not been modified 355by another transaction -- there are no "between or around" gaps to 356cover. 357 358 * As long as built-in foreign key enforcement continues to use 359its current "special tricks" to deal with MVCC issues, predicate 360locks should not be needed for scans done by enforcement code. 361 362 * If a search determines that no rows can be found regardless of 363index contents because the search conditions are contradictory (e.g., 364x = 1 AND x = 2), then no predicate lock is needed. 365 366Other index AM implementation considerations: 367 368 * For an index AM that doesn't have support for predicate locking, 369we just acquire a predicate lock on the whole index for any search. 370 371 * B-tree index searches acquire predicate locks only on the 372index *leaf* pages needed to lock the appropriate index range. If, 373however, a search discovers that no root page has yet been created, a 374predicate lock on the index relation is required. 375 376 * GiST searches can determine that there are no matches at any 377level of the index, so there must be a predicate lock at each index 378level during a GiST search. An index insert at the leaf level can 379then be trusted to ripple up to all levels and locations where 380conflicting predicate locks may exist. 381 382 * The effects of page splits, overflows, consolidations, and 383removals must be carefully reviewed to ensure that predicate locks 384aren't "lost" during those operations, or kept with pages which could 385get re-used for different parts of the index. 386 387 388Innovations 389----------- 390 391The PostgreSQL implementation of Serializable Snapshot Isolation 392differs from what is described in the cited papers for several 393reasons: 394 395 1. PostgreSQL didn't have any existing predicate locking. It had 396to be added from scratch. 397 398 2. The existing in-memory lock structures were not suitable for 399tracking SIREAD locks. 400 * In PostgreSQL, tuple level locks are not held in RAM for 401any length of time; lock information is written to the tuples 402involved in the transactions. 403 * In PostgreSQL, existing lock structures have pointers to 404memory which is related to a session. SIREAD locks need to persist 405past the end of the originating transaction and even the session 406which ran it. 407 * PostgreSQL needs to be able to tolerate a large number of 408transactions executing while one long-running transaction stays open 409-- the in-RAM techniques discussed in the papers wouldn't support 410that. 411 412 3. Unlike the database products used for the prototypes described 413in the papers, PostgreSQL didn't already have a true serializable 414isolation level distinct from snapshot isolation. 415 416 4. PostgreSQL supports subtransactions -- an issue not mentioned 417in the papers. 418 419 5. PostgreSQL doesn't assign a transaction number to a database 420transaction until and unless necessary (normally, when the transaction 421attempts to modify data). 422 423 6. PostgreSQL has pluggable data types with user-definable 424operators, as well as pluggable index types, not all of which are 425based around data types which support ordering. 426 427 7. Some possible optimizations became apparent during development 428and testing. 429 430Differences from the implementation described in the papers are 431listed below. 432 433 * New structures needed to be created in shared memory to track 434the proper information for serializable transactions and their SIREAD 435locks. 436 437 * Because PostgreSQL does not have the same concept of an "oldest 438transaction ID" for all serializable transactions as assumed in the 439Cahill thesis, we track the oldest snapshot xmin among serializable 440transactions, and a count of how many active transactions use that 441xmin. When the count hits zero we find the new oldest xmin and run a 442clean-up based on that. 443 444 * Because reads in a subtransaction may cause that subtransaction 445to roll back, thereby affecting what is written by the top level 446transaction, predicate locks must survive a subtransaction rollback. 447As a consequence, all xid usage in SSI, including predicate locking, 448is based on the top level xid. When looking at an xid that comes 449from a tuple's xmin or xmax, for example, we always call 450SubTransGetTopmostTransaction() before doing much else with it. 451 452 * PostgreSQL does not use "update in place" with a rollback log 453for its MVCC implementation. Where possible it uses "HOT" updates on 454the same page (if there is room and no indexed value is changed). 455For non-HOT updates the old tuple is expired in place and a new tuple 456is inserted at a new location. Because of this difference, a tuple 457lock in PostgreSQL doesn't automatically lock any other versions of a 458row. We don't try to copy or expand a tuple lock to any other 459versions of the row, based on the following proof that any additional 460serialization failures we would get from that would be false 461positives: 462 463 o If transaction T1 reads a row version (thus acquiring a 464predicate lock on it) and a second transaction T2 updates that row 465version (thus creating a rw-conflict graph edge from T1 to T2), must a 466third transaction T3 which re-updates the new version of the row also 467have a rw-conflict in from T1 to prevent anomalies? In other words, 468does it matter whether we recognize the edge T1 -> T3? 469 470 o If T1 has a conflict in, it certainly doesn't. Adding the 471edge T1 -> T3 would create a dangerous structure, but we already had 472one from the edge T1 -> T2, so we would have aborted something anyway. 473(T2 has already committed, else T3 could not have updated its output; 474but we would have aborted either T1 or T1's predecessor(s). Hence 475no cycle involving T1 and T3 can survive.) 476 477 o Now let's consider the case where T1 doesn't have a 478rw-conflict in. If that's the case, for this edge T1 -> T3 to make a 479difference, T3 must have a rw-conflict out that induces a cycle in the 480dependency graph, i.e. a conflict out to some transaction preceding T1 481in the graph. (A conflict out to T1 itself would be problematic too, 482but that would mean T1 has a conflict in, the case we already 483eliminated.) 484 485 o So now we're trying to figure out if there can be an 486rw-conflict edge T3 -> T0, where T0 is some transaction that precedes 487T1. For T0 to precede T1, there has to be some edge, or sequence of 488edges, from T0 to T1. At least the last edge has to be a wr-dependency 489or ww-dependency rather than a rw-conflict, because T1 doesn't have a 490rw-conflict in. And that gives us enough information about the order 491of transactions to see that T3 can't have a rw-conflict to T0: 492 - T0 committed before T1 started (the wr/ww-dependency implies this) 493 - T1 started before T2 committed (the T1->T2 rw-conflict implies this) 494 - T2 committed before T3 started (otherwise, T3 would get aborted 495 because of an update conflict) 496 497 o That means T0 committed before T3 started, and therefore 498there can't be a rw-conflict from T3 to T0. 499 500 o So in all cases, we don't need the T1 -> T3 edge to 501recognize cycles. Therefore it's not necessary for T1's SIREAD lock 502on the original tuple version to cover later versions as well. 503 504 * Predicate locking in PostgreSQL starts at the tuple level 505when possible. Multiple fine-grained locks are promoted to a single 506coarser-granularity lock as needed to avoid resource exhaustion. The 507amount of memory used for these structures is configurable, to balance 508RAM usage against SIREAD lock granularity. 509 510 * Each backend keeps a process-local table of the locks it holds. 511To support granularity promotion decisions with low CPU and locking 512overhead, this table also includes the coarser covering locks and the 513number of finer-granularity locks they cover. 514 515 * Conflicts are identified by looking for predicate locks 516when tuples are written, and by looking at the MVCC information when 517tuples are read. There is no matching between two RAM-based locks. 518 519 * Because write locks are stored in the heap tuples rather than a 520RAM-based lock table, the optimization described in the Cahill thesis 521which eliminates an SIREAD lock where there is a write lock is 522implemented by the following: 523 1. When checking a heap write for conflicts against existing 524predicate locks, a tuple lock on the tuple being written is removed. 525 2. When acquiring a predicate lock on a heap tuple, we 526return quickly without doing anything if it is a tuple written by the 527reading transaction. 528 529 * Rather than using conflictIn and conflictOut pointers which use 530NULL to indicate no conflict and a self-reference to indicate 531multiple conflicts or conflicts with committed transactions, we use a 532list of rw-conflicts. With the more complete information, false 533positives are reduced and we have sufficient data for more aggressive 534clean-up and other optimizations: 535 536 o We can avoid ever rolling back a transaction until and 537unless there is a pivot where a transaction on the conflict *out* 538side of the pivot committed before either of the other transactions. 539 540 o We can avoid ever rolling back a transaction when the 541transaction on the conflict *in* side of the pivot is explicitly or 542implicitly READ ONLY unless the transaction on the conflict *out* 543side of the pivot committed before the READ ONLY transaction acquired 544its snapshot. (An implicit READ ONLY transaction is one which 545committed without writing, even though it was not explicitly declared 546to be READ ONLY.) 547 548 o We can more aggressively clean up conflicts, predicate 549locks, and SSI transaction information. 550 551 * We allow a READ ONLY transaction to "opt out" of SSI if there are 552no READ WRITE transactions which could cause the READ ONLY 553transaction to ever become part of a "dangerous structure" of 554overlapping transaction dependencies. 555 556 * We allow the user to request that a READ ONLY transaction wait 557until the conditions are right for it to start in the "opt out" state 558described above. We add a DEFERRABLE state to transactions, which is 559specified and maintained in a way similar to READ ONLY. It is 560ignored for transactions that are not SERIALIZABLE and READ ONLY. 561 562 * When a transaction must be rolled back, we pick among the 563active transactions such that an immediate retry will not fail again 564on conflicts with the same transactions. 565 566 * We use the PostgreSQL SLRU system to hold summarized 567information about older committed transactions to put an upper bound 568on RAM used. Beyond that limit, information spills to disk. 569Performance can degrade in a pessimal situation, but it should be 570tolerable, and transactions won't need to be canceled or blocked 571from starting. 572 573 574R&D Issues 575---------- 576 577This is intended to be the place to record specific issues which need 578more detailed review or analysis. 579 580 * WAL file replay. While serializable implementations using S2PL 581can guarantee that the write-ahead log contains commits in a sequence 582consistent with some serial execution of serializable transactions, 583SSI cannot make that guarantee. While the WAL replay is no less 584consistent than under snapshot isolation, it is possible that under 585PITR recovery or hot standby a database could reach a readable state 586where some transactions appear before other transactions which would 587have had to precede them to maintain serializable consistency. In 588essence, if we do nothing, WAL replay will be at snapshot isolation 589even for serializable transactions. Is this OK? If not, how do we 590address it? 591 592 * External replication. Look at how this impacts external 593replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc. 594This is related to the "WAL file replay" issue. 595 596 * UNIQUE btree search for equality on all columns. Since a search 597of a UNIQUE index using equality tests on all columns will lock the 598heap tuple if an entry is found, it appears that there is no need to 599get a predicate lock on the index in that case. A predicate lock is 600still needed for such a search if a matching index entry which points 601to a visible tuple is not found. 602 603 * Minimize touching of shared memory. Should lists in shared 604memory push entries which have just been returned to the front of the 605available list, so they will be popped back off soon and some memory 606might never be touched, or should we keep adding returned items to 607the end of the available list? 608 609 610References 611---------- 612 613[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt 614Search for serial execution to find the relevant section. 615 616[2] A. Fekete et al. Making Snapshot Isolation Serializable. In ACM 617Transactions on Database Systems 30:2, Jun. 2005. 618http://dx.doi.org/10.1145/1071610.1071615 619 620[3] Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. 621Architecture of a Database System. Foundations and Trends(R) in 622Databases Vol. 1, No. 2 (2007) 141-259. 623http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf 624 Of particular interest: 625 * 6.1 A Note on ACID 626 * 6.2 A Brief Review of Serializability 627 * 6.3 Locking and Latching 628 * 6.3.1 Transaction Isolation Levels 629 * 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties 630