1<!-- doc/src/sgml/mvcc.sgml --> 2 3 <chapter id="mvcc"> 4 <title>Concurrency Control</title> 5 6 <indexterm> 7 <primary>concurrency</primary> 8 </indexterm> 9 10 <para> 11 This chapter describes the behavior of the 12 <productname>PostgreSQL</productname> database system when two or 13 more sessions try to access the same data at the same time. The 14 goals in that situation are to allow efficient access for all 15 sessions while maintaining strict data integrity. Every developer 16 of database applications should be familiar with the topics covered 17 in this chapter. 18 </para> 19 20 <sect1 id="mvcc-intro"> 21 <title>Introduction</title> 22 23 <indexterm> 24 <primary>Multiversion Concurrency Control</primary> 25 </indexterm> 26 27 <indexterm> 28 <primary>MVCC</primary> 29 </indexterm> 30 31 <indexterm> 32 <primary>Serializable Snapshot Isolation</primary> 33 </indexterm> 34 35 <indexterm> 36 <primary>SSI</primary> 37 </indexterm> 38 39 <para> 40 <productname>PostgreSQL</productname> provides a rich set of tools 41 for developers to manage concurrent access to data. Internally, 42 data consistency is maintained by using a multiversion 43 model (Multiversion Concurrency Control, <acronym>MVCC</acronym>). 44 This means that each SQL statement sees 45 a snapshot of data (a <firstterm>database version</firstterm>) 46 as it was some 47 time ago, regardless of the current state of the underlying data. 48 This prevents statements from viewing inconsistent data produced 49 by concurrent transactions performing updates on the same 50 data rows, providing <firstterm>transaction isolation</firstterm> 51 for each database session. <acronym>MVCC</acronym>, by eschewing 52 the locking methodologies of traditional database systems, 53 minimizes lock contention in order to allow for reasonable 54 performance in multiuser environments. 55 </para> 56 57 <para> 58 The main advantage of using the <acronym>MVCC</acronym> model of 59 concurrency control rather than locking is that in 60 <acronym>MVCC</acronym> locks acquired for querying (reading) data 61 do not conflict with locks acquired for writing data, and so 62 reading never blocks writing and writing never blocks reading. 63 <productname>PostgreSQL</productname> maintains this guarantee 64 even when providing the strictest level of transaction 65 isolation through the use of an innovative <firstterm>Serializable 66 Snapshot Isolation</firstterm> (<acronym>SSI</acronym>) level. 67 </para> 68 69 <para> 70 Table- and row-level locking facilities are also available in 71 <productname>PostgreSQL</productname> for applications which don't 72 generally need full transaction isolation and prefer to explicitly 73 manage particular points of conflict. However, proper 74 use of <acronym>MVCC</acronym> will generally provide better 75 performance than locks. In addition, application-defined advisory 76 locks provide a mechanism for acquiring locks that are not tied 77 to a single transaction. 78 </para> 79 </sect1> 80 81 <sect1 id="transaction-iso"> 82 <title>Transaction Isolation</title> 83 84 <indexterm> 85 <primary>transaction isolation</primary> 86 </indexterm> 87 88 <para> 89 The <acronym>SQL</acronym> standard defines four levels of 90 transaction isolation. The most strict is Serializable, 91 which is defined by the standard in a paragraph which says that any 92 concurrent execution of a set of Serializable transactions is guaranteed 93 to produce the same effect as running them one at a time in some order. 94 The other three levels are defined in terms of phenomena, resulting from 95 interaction between concurrent transactions, which must not occur at 96 each level. The standard notes that due to the definition of 97 Serializable, none of these phenomena are possible at that level. (This 98 is hardly surprising -- if the effect of the transactions must be 99 consistent with having been run one at a time, how could you see any 100 phenomena caused by interactions?) 101 </para> 102 103 <para> 104 The phenomena which are prohibited at various levels are: 105 106 <variablelist> 107 <varlistentry> 108 <term> 109 dirty read 110 <indexterm><primary>dirty read</primary></indexterm> 111 </term> 112 <listitem> 113 <para> 114 A transaction reads data written by a concurrent uncommitted transaction. 115 </para> 116 </listitem> 117 </varlistentry> 118 119 <varlistentry> 120 <term> 121 nonrepeatable read 122 <indexterm><primary>nonrepeatable read</primary></indexterm> 123 </term> 124 <listitem> 125 <para> 126 A transaction re-reads data it has previously read and finds that data 127 has been modified by another transaction (that committed since the 128 initial read). 129 </para> 130 </listitem> 131 </varlistentry> 132 133 <varlistentry> 134 <term> 135 phantom read 136 <indexterm><primary>phantom read</primary></indexterm> 137 </term> 138 <listitem> 139 <para> 140 A transaction re-executes a query returning a set of rows that satisfy a 141 search condition and finds that the set of rows satisfying the condition 142 has changed due to another recently-committed transaction. 143 </para> 144 </listitem> 145 </varlistentry> 146 147 <varlistentry> 148 <term> 149 serialization anomaly 150 <indexterm><primary>serialization anomaly</primary></indexterm> 151 </term> 152 <listitem> 153 <para> 154 The result of successfully committing a group of transactions 155 is inconsistent with all possible orderings of running those 156 transactions one at a time. 157 </para> 158 </listitem> 159 </varlistentry> 160 </variablelist> 161 </para> 162 163 <para> 164 <indexterm> 165 <primary>transaction isolation level</primary> 166 </indexterm> 167 The SQL standard and PostgreSQL-implemented transaction isolation levels 168 are described in <xref linkend="mvcc-isolevel-table"/>. 169 </para> 170 171 <table tocentry="1" id="mvcc-isolevel-table"> 172 <title>Transaction Isolation Levels</title> 173 <tgroup cols="5"> 174 <thead> 175 <row> 176 <entry> 177 Isolation Level 178 </entry> 179 <entry> 180 Dirty Read 181 </entry> 182 <entry> 183 Nonrepeatable Read 184 </entry> 185 <entry> 186 Phantom Read 187 </entry> 188 <entry> 189 Serialization Anomaly 190 </entry> 191 </row> 192 </thead> 193 <tbody> 194 <row> 195 <entry> 196 Read uncommitted 197 </entry> 198 <entry> 199 Allowed, but not in PG 200 </entry> 201 <entry> 202 Possible 203 </entry> 204 <entry> 205 Possible 206 </entry> 207 <entry> 208 Possible 209 </entry> 210 </row> 211 212 <row> 213 <entry> 214 Read committed 215 </entry> 216 <entry> 217 Not possible 218 </entry> 219 <entry> 220 Possible 221 </entry> 222 <entry> 223 Possible 224 </entry> 225 <entry> 226 Possible 227 </entry> 228 </row> 229 230 <row> 231 <entry> 232 Repeatable read 233 </entry> 234 <entry> 235 Not possible 236 </entry> 237 <entry> 238 Not possible 239 </entry> 240 <entry> 241 Allowed, but not in PG 242 </entry> 243 <entry> 244 Possible 245 </entry> 246 </row> 247 248 <row> 249 <entry> 250 Serializable 251 </entry> 252 <entry> 253 Not possible 254 </entry> 255 <entry> 256 Not possible 257 </entry> 258 <entry> 259 Not possible 260 </entry> 261 <entry> 262 Not possible 263 </entry> 264 </row> 265 </tbody> 266 </tgroup> 267 </table> 268 269 <para> 270 In <productname>PostgreSQL</productname>, you can request any of 271 the four standard transaction isolation levels, but internally only 272 three distinct isolation levels are implemented, i.e., PostgreSQL's 273 Read Uncommitted mode behaves like Read Committed. This is because 274 it is the only sensible way to map the standard isolation levels to 275 PostgreSQL's multiversion concurrency control architecture. 276 </para> 277 278 <para> 279 The table also shows that PostgreSQL's Repeatable Read implementation 280 does not allow phantom reads. Stricter behavior is permitted by the 281 SQL standard: the four isolation levels only define which phenomena 282 must not happen, not which phenomena <emphasis>must</emphasis> happen. 283 The behavior of the available isolation levels is detailed in the 284 following subsections. 285 </para> 286 287 <para> 288 To set the transaction isolation level of a transaction, use the 289 command <xref linkend="sql-set-transaction"/>. 290 </para> 291 292 <important> 293 <para> 294 Some <productname>PostgreSQL</productname> data types and functions have 295 special rules regarding transactional behavior. In particular, changes 296 made to a sequence (and therefore the counter of a 297 column declared using <type>serial</type>) are immediately visible 298 to all other transactions and are not rolled back if the transaction 299 that made the changes aborts. See <xref linkend="functions-sequence"/> 300 and <xref linkend="datatype-serial"/>. 301 </para> 302 </important> 303 304 <sect2 id="xact-read-committed"> 305 <title>Read Committed Isolation Level</title> 306 307 <indexterm> 308 <primary>transaction isolation level</primary> 309 <secondary>read committed</secondary> 310 </indexterm> 311 312 <indexterm> 313 <primary>read committed</primary> 314 </indexterm> 315 316 <para> 317 <firstterm>Read Committed</firstterm> is the default isolation 318 level in <productname>PostgreSQL</productname>. When a transaction 319 uses this isolation level, a <command>SELECT</command> query 320 (without a <literal>FOR UPDATE/SHARE</literal> clause) sees only data 321 committed before the query began; it never sees either uncommitted 322 data or changes committed during query execution by concurrent 323 transactions. In effect, a <command>SELECT</command> query sees 324 a snapshot of the database as of the instant the query begins to 325 run. However, <command>SELECT</command> does see the effects 326 of previous updates executed within its own transaction, even 327 though they are not yet committed. Also note that two successive 328 <command>SELECT</command> commands can see different data, even 329 though they are within a single transaction, if other transactions 330 commit changes after the first <command>SELECT</command> starts and 331 before the second <command>SELECT</command> starts. 332 </para> 333 334 <para> 335 <command>UPDATE</command>, <command>DELETE</command>, <command>SELECT 336 FOR UPDATE</command>, and <command>SELECT FOR SHARE</command> commands 337 behave the same as <command>SELECT</command> 338 in terms of searching for target rows: they will only find target rows 339 that were committed as of the command start time. However, such a target 340 row might have already been updated (or deleted or locked) by 341 another concurrent transaction by the time it is found. In this case, the 342 would-be updater will wait for the first updating transaction to commit or 343 roll back (if it is still in progress). If the first updater rolls back, 344 then its effects are negated and the second updater can proceed with 345 updating the originally found row. If the first updater commits, the 346 second updater will ignore the row if the first updater deleted it, 347 otherwise it will attempt to apply its operation to the updated version of 348 the row. The search condition of the command (the <literal>WHERE</literal> clause) is 349 re-evaluated to see if the updated version of the row still matches the 350 search condition. If so, the second updater proceeds with its operation 351 using the updated version of the row. In the case of 352 <command>SELECT FOR UPDATE</command> and <command>SELECT FOR 353 SHARE</command>, this means it is the updated version of the row that is 354 locked and returned to the client. 355 </para> 356 357 <para> 358 <command>INSERT</command> with an <literal>ON CONFLICT DO UPDATE</literal> clause 359 behaves similarly. In Read Committed mode, each row proposed for insertion 360 will either insert or update. Unless there are unrelated errors, one of 361 those two outcomes is guaranteed. If a conflict originates in another 362 transaction whose effects are not yet visible to the <command>INSERT 363 </command>, the <command>UPDATE</command> clause will affect that row, 364 even though possibly <emphasis>no</emphasis> version of that row is 365 conventionally visible to the command. 366 </para> 367 368 <para> 369 <command>INSERT</command> with an <literal>ON CONFLICT DO 370 NOTHING</literal> clause may have insertion not proceed for a row due to 371 the outcome of another transaction whose effects are not visible 372 to the <command>INSERT</command> snapshot. Again, this is only 373 the case in Read Committed mode. 374 </para> 375 376 <para> 377 Because of the above rules, it is possible for an updating command to see 378 an inconsistent snapshot: it can see the effects of concurrent updating 379 commands on the same rows it is trying to update, but it 380 does not see effects of those commands on other rows in the database. 381 This behavior makes Read Committed mode unsuitable for commands that 382 involve complex search conditions; however, it is just right for simpler 383 cases. For example, consider updating bank balances with transactions 384 like: 385 386<screen> 387BEGIN; 388UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345; 389UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534; 390COMMIT; 391</screen> 392 393 If two such transactions concurrently try to change the balance of account 394 12345, we clearly want the second transaction to start with the updated 395 version of the account's row. Because each command is affecting only a 396 predetermined row, letting it see the updated version of the row does 397 not create any troublesome inconsistency. 398 </para> 399 400 <para> 401 More complex usage can produce undesirable results in Read Committed 402 mode. For example, consider a <command>DELETE</command> command 403 operating on data that is being both added and removed from its 404 restriction criteria by another command, e.g., assume 405 <literal>website</literal> is a two-row table with 406 <literal>website.hits</literal> equaling <literal>9</literal> and 407 <literal>10</literal>: 408 409<screen> 410BEGIN; 411UPDATE website SET hits = hits + 1; 412-- run from another session: DELETE FROM website WHERE hits = 10; 413COMMIT; 414</screen> 415 416 The <command>DELETE</command> will have no effect even though 417 there is a <literal>website.hits = 10</literal> row before and 418 after the <command>UPDATE</command>. This occurs because the 419 pre-update row value <literal>9</literal> is skipped, and when the 420 <command>UPDATE</command> completes and <command>DELETE</command> 421 obtains a lock, the new row value is no longer <literal>10</literal> but 422 <literal>11</literal>, which no longer matches the criteria. 423 </para> 424 425 <para> 426 Because Read Committed mode starts each command with a new snapshot 427 that includes all transactions committed up to that instant, 428 subsequent commands in the same transaction will see the effects 429 of the committed concurrent transaction in any case. The point 430 at issue above is whether or not a <emphasis>single</emphasis> command 431 sees an absolutely consistent view of the database. 432 </para> 433 434 <para> 435 The partial transaction isolation provided by Read Committed mode 436 is adequate for many applications, and this mode is fast and simple 437 to use; however, it is not sufficient for all cases. Applications 438 that do complex queries and updates might require a more rigorously 439 consistent view of the database than Read Committed mode provides. 440 </para> 441 </sect2> 442 443 <sect2 id="xact-repeatable-read"> 444 <title>Repeatable Read Isolation Level</title> 445 446 <indexterm> 447 <primary>transaction isolation level</primary> 448 <secondary>repeatable read</secondary> 449 </indexterm> 450 451 <indexterm> 452 <primary>repeatable read</primary> 453 </indexterm> 454 455 <para> 456 The <firstterm>Repeatable Read</firstterm> isolation level only sees 457 data committed before the transaction began; it never sees either 458 uncommitted data or changes committed during transaction execution 459 by concurrent transactions. (However, the query does see the 460 effects of previous updates executed within its own transaction, 461 even though they are not yet committed.) This is a stronger 462 guarantee than is required by the <acronym>SQL</acronym> standard 463 for this isolation level, and prevents all of the phenomena described 464 in <xref linkend="mvcc-isolevel-table"/> except for serialization 465 anomalies. As mentioned above, this is 466 specifically allowed by the standard, which only describes the 467 <emphasis>minimum</emphasis> protections each isolation level must 468 provide. 469 </para> 470 471 <para> 472 This level is different from Read Committed in that a query in a 473 repeatable read transaction sees a snapshot as of the start of the 474 first non-transaction-control statement in the 475 <emphasis>transaction</emphasis>, not as of the start 476 of the current statement within the transaction. Thus, successive 477 <command>SELECT</command> commands within a <emphasis>single</emphasis> 478 transaction see the same data, i.e., they do not see changes made by 479 other transactions that committed after their own transaction started. 480 </para> 481 482 <para> 483 Applications using this level must be prepared to retry transactions 484 due to serialization failures. 485 </para> 486 487 <para> 488 <command>UPDATE</command>, <command>DELETE</command>, <command>SELECT 489 FOR UPDATE</command>, and <command>SELECT FOR SHARE</command> commands 490 behave the same as <command>SELECT</command> 491 in terms of searching for target rows: they will only find target rows 492 that were committed as of the transaction start time. However, such a 493 target row might have already been updated (or deleted or locked) by 494 another concurrent transaction by the time it is found. In this case, the 495 repeatable read transaction will wait for the first updating transaction to commit or 496 roll back (if it is still in progress). If the first updater rolls back, 497 then its effects are negated and the repeatable read transaction can proceed 498 with updating the originally found row. But if the first updater commits 499 (and actually updated or deleted the row, not just locked it) 500 then the repeatable read transaction will be rolled back with the message 501 502<screen> 503ERROR: could not serialize access due to concurrent update 504</screen> 505 506 because a repeatable read transaction cannot modify or lock rows changed by 507 other transactions after the repeatable read transaction began. 508 </para> 509 510 <para> 511 When an application receives this error message, it should abort 512 the current transaction and retry the whole transaction from 513 the beginning. The second time through, the transaction will see the 514 previously-committed change as part of its initial view of the database, 515 so there is no logical conflict in using the new version of the row 516 as the starting point for the new transaction's update. 517 </para> 518 519 <para> 520 Note that only updating transactions might need to be retried; read-only 521 transactions will never have serialization conflicts. 522 </para> 523 524 <para> 525 The Repeatable Read mode provides a rigorous guarantee that each 526 transaction sees a completely stable view of the database. However, 527 this view will not necessarily always be consistent with some serial 528 (one at a time) execution of concurrent transactions of the same level. 529 For example, even a read only transaction at this level may see a 530 control record updated to show that a batch has been completed but 531 <emphasis>not</emphasis> see one of the detail records which is logically 532 part of the batch because it read an earlier revision of the control 533 record. Attempts to enforce business rules by transactions running at 534 this isolation level are not likely to work correctly without careful use 535 of explicit locks to block conflicting transactions. 536 </para> 537 538 <para> 539 The Repeatable Read isolation level is implemented using a technique 540 known in academic database literature and in some other database products 541 as <firstterm>Snapshot Isolation</firstterm>. Differences in behavior 542 and performance may be observed when compared with systems that use a 543 traditional locking technique that reduces concurrency. Some other 544 systems may even offer Repeatable Read and Snapshot Isolation as distinct 545 isolation levels with different behavior. The permitted phenomena that 546 distinguish the two techniques were not formalized by database researchers 547 until after the SQL standard was developed, and are outside the scope of 548 this manual. For a full treatment, please see 549 <xref linkend="berenson95"/>. 550 </para> 551 552 <note> 553 <para> 554 Prior to <productname>PostgreSQL</productname> version 9.1, a request 555 for the Serializable transaction isolation level provided exactly the 556 same behavior described here. To retain the legacy Serializable 557 behavior, Repeatable Read should now be requested. 558 </para> 559 </note> 560 </sect2> 561 562 <sect2 id="xact-serializable"> 563 <title>Serializable Isolation Level</title> 564 565 <indexterm> 566 <primary>transaction isolation level</primary> 567 <secondary>serializable</secondary> 568 </indexterm> 569 570 <indexterm> 571 <primary>serializable</primary> 572 </indexterm> 573 574 <indexterm> 575 <primary>predicate locking</primary> 576 </indexterm> 577 578 <indexterm> 579 <primary>serialization anomaly</primary> 580 </indexterm> 581 582 <para> 583 The <firstterm>Serializable</firstterm> isolation level provides 584 the strictest transaction isolation. This level emulates serial 585 transaction execution for all committed transactions; 586 as if transactions had been executed one after another, serially, 587 rather than concurrently. However, like the Repeatable Read level, 588 applications using this level must 589 be prepared to retry transactions due to serialization failures. 590 In fact, this isolation level works exactly the same as Repeatable 591 Read except that it monitors for conditions which could make 592 execution of a concurrent set of serializable transactions behave 593 in a manner inconsistent with all possible serial (one at a time) 594 executions of those transactions. This monitoring does not 595 introduce any blocking beyond that present in repeatable read, but 596 there is some overhead to the monitoring, and detection of the 597 conditions which could cause a 598 <firstterm>serialization anomaly</firstterm> will trigger a 599 <firstterm>serialization failure</firstterm>. 600 </para> 601 602 <para> 603 As an example, 604 consider a table <structname>mytab</structname>, initially containing: 605<screen> 606 class | value 607-------+------- 608 1 | 10 609 1 | 20 610 2 | 100 611 2 | 200 612</screen> 613 Suppose that serializable transaction A computes: 614<screen> 615SELECT SUM(value) FROM mytab WHERE class = 1; 616</screen> 617 and then inserts the result (30) as the <structfield>value</structfield> in a 618 new row with <structfield>class</structfield><literal> = 2</literal>. Concurrently, serializable 619 transaction B computes: 620<screen> 621SELECT SUM(value) FROM mytab WHERE class = 2; 622</screen> 623 and obtains the result 300, which it inserts in a new row with 624 <structfield>class</structfield><literal> = 1</literal>. Then both transactions try to commit. 625 If either transaction were running at the Repeatable Read isolation level, 626 both would be allowed to commit; but since there is no serial order of execution 627 consistent with the result, using Serializable transactions will allow one 628 transaction to commit and will roll the other back with this message: 629 630<screen> 631ERROR: could not serialize access due to read/write dependencies among transactions 632</screen> 633 634 This is because if A had 635 executed before B, B would have computed the sum 330, not 300, and 636 similarly the other order would have resulted in a different sum 637 computed by A. 638 </para> 639 640 <para> 641 When relying on Serializable transactions to prevent anomalies, it is 642 important that any data read from a permanent user table not be 643 considered valid until the transaction which read it has successfully 644 committed. This is true even for read-only transactions, except that 645 data read within a <firstterm>deferrable</firstterm> read-only 646 transaction is known to be valid as soon as it is read, because such a 647 transaction waits until it can acquire a snapshot guaranteed to be free 648 from such problems before starting to read any data. In all other cases 649 applications must not depend on results read during a transaction that 650 later aborted; instead, they should retry the transaction until it 651 succeeds. 652 </para> 653 654 <para> 655 To guarantee true serializability <productname>PostgreSQL</productname> 656 uses <firstterm>predicate locking</firstterm>, which means that it keeps locks 657 which allow it to determine when a write would have had an impact on 658 the result of a previous read from a concurrent transaction, had it run 659 first. In <productname>PostgreSQL</productname> these locks do not 660 cause any blocking and therefore can <emphasis>not</emphasis> play any part in 661 causing a deadlock. They are used to identify and flag dependencies 662 among concurrent Serializable transactions which in certain combinations 663 can lead to serialization anomalies. In contrast, a Read Committed or 664 Repeatable Read transaction which wants to ensure data consistency may 665 need to take out a lock on an entire table, which could block other 666 users attempting to use that table, or it may use <literal>SELECT FOR 667 UPDATE</literal> or <literal>SELECT FOR SHARE</literal> which not only 668 can block other transactions but cause disk access. 669 </para> 670 671 <para> 672 Predicate locks in <productname>PostgreSQL</productname>, like in most 673 other database systems, are based on data actually accessed by a 674 transaction. These will show up in the 675 <link linkend="view-pg-locks"><structname>pg_locks</structname></link> 676 system view with a <literal>mode</literal> of <literal>SIReadLock</literal>. The 677 particular locks 678 acquired during execution of a query will depend on the plan used by 679 the query, and multiple finer-grained locks (e.g., tuple locks) may be 680 combined into fewer coarser-grained locks (e.g., page locks) during the 681 course of the transaction to prevent exhaustion of the memory used to 682 track the locks. A <literal>READ ONLY</literal> transaction may be able to 683 release its SIRead locks before completion, if it detects that no 684 conflicts can still occur which could lead to a serialization anomaly. 685 In fact, <literal>READ ONLY</literal> transactions will often be able to 686 establish that fact at startup and avoid taking any predicate locks. 687 If you explicitly request a <literal>SERIALIZABLE READ ONLY DEFERRABLE</literal> 688 transaction, it will block until it can establish this fact. (This is 689 the <emphasis>only</emphasis> case where Serializable transactions block but 690 Repeatable Read transactions don't.) On the other hand, SIRead locks 691 often need to be kept past transaction commit, until overlapping read 692 write transactions complete. 693 </para> 694 695 <para> 696 Consistent use of Serializable transactions can simplify development. 697 The guarantee that any set of successfully committed concurrent 698 Serializable transactions will have the same effect as if they were run 699 one at a time means that if you can demonstrate that a single transaction, 700 as written, will do the right thing when run by itself, you can have 701 confidence that it will do the right thing in any mix of Serializable 702 transactions, even without any information about what those other 703 transactions might do, or it will not successfully commit. It is 704 important that an environment which uses this technique have a 705 generalized way of handling serialization failures (which always return 706 with a SQLSTATE value of '40001'), because it will be very hard to 707 predict exactly which transactions might contribute to the read/write 708 dependencies and need to be rolled back to prevent serialization 709 anomalies. The monitoring of read/write dependencies has a cost, as does 710 the restart of transactions which are terminated with a serialization 711 failure, but balanced against the cost and blocking involved in use of 712 explicit locks and <literal>SELECT FOR UPDATE</literal> or <literal>SELECT FOR 713 SHARE</literal>, Serializable transactions are the best performance choice 714 for some environments. 715 </para> 716 717 <para> 718 While <productname>PostgreSQL</productname>'s Serializable transaction isolation 719 level only allows concurrent transactions to commit if it can prove there 720 is a serial order of execution that would produce the same effect, it 721 doesn't always prevent errors from being raised that would not occur in 722 true serial execution. In particular, it is possible to see unique 723 constraint violations caused by conflicts with overlapping Serializable 724 transactions even after explicitly checking that the key isn't present 725 before attempting to insert it. This can be avoided by making sure 726 that <emphasis>all</emphasis> Serializable transactions that insert potentially 727 conflicting keys explicitly check if they can do so first. For example, 728 imagine an application that asks the user for a new key and then checks 729 that it doesn't exist already by trying to select it first, or generates 730 a new key by selecting the maximum existing key and adding one. If some 731 Serializable transactions insert new keys directly without following this 732 protocol, unique constraints violations might be reported even in cases 733 where they could not occur in a serial execution of the concurrent 734 transactions. 735 </para> 736 737 <para> 738 For optimal performance when relying on Serializable transactions for 739 concurrency control, these issues should be considered: 740 741 <itemizedlist> 742 <listitem> 743 <para> 744 Declare transactions as <literal>READ ONLY</literal> when possible. 745 </para> 746 </listitem> 747 <listitem> 748 <para> 749 Control the number of active connections, using a connection pool if 750 needed. This is always an important performance consideration, but 751 it can be particularly important in a busy system using Serializable 752 transactions. 753 </para> 754 </listitem> 755 <listitem> 756 <para> 757 Don't put more into a single transaction than needed for integrity 758 purposes. 759 </para> 760 </listitem> 761 <listitem> 762 <para> 763 Don't leave connections dangling <quote>idle in transaction</quote> 764 longer than necessary. The configuration parameter 765 <xref linkend="guc-idle-in-transaction-session-timeout"/> may be used to 766 automatically disconnect lingering sessions. 767 </para> 768 </listitem> 769 <listitem> 770 <para> 771 Eliminate explicit locks, <literal>SELECT FOR UPDATE</literal>, and 772 <literal>SELECT FOR SHARE</literal> where no longer needed due to the 773 protections automatically provided by Serializable transactions. 774 </para> 775 </listitem> 776 <listitem> 777 <para> 778 When the system is forced to combine multiple page-level predicate 779 locks into a single relation-level predicate lock because the predicate 780 lock table is short of memory, an increase in the rate of serialization 781 failures may occur. You can avoid this by increasing 782 <xref linkend="guc-max-pred-locks-per-transaction"/>, 783 <xref linkend="guc-max-pred-locks-per-relation"/>, and/or 784 <xref linkend="guc-max-pred-locks-per-page"/>. 785 </para> 786 </listitem> 787 <listitem> 788 <para> 789 A sequential scan will always necessitate a relation-level predicate 790 lock. This can result in an increased rate of serialization failures. 791 It may be helpful to encourage the use of index scans by reducing 792 <xref linkend="guc-random-page-cost"/> and/or increasing 793 <xref linkend="guc-cpu-tuple-cost"/>. Be sure to weigh any decrease 794 in transaction rollbacks and restarts against any overall change in 795 query execution time. 796 </para> 797 </listitem> 798 </itemizedlist> 799 </para> 800 801 <para> 802 The Serializable isolation level is implemented using a technique known 803 in academic database literature as Serializable Snapshot Isolation, which 804 builds on Snapshot Isolation by adding checks for serialization anomalies. 805 Some differences in behavior and performance may be observed when compared 806 with other systems that use a traditional locking technique. Please see 807 <xref linkend="ports12"/> for detailed information. 808 </para> 809 </sect2> 810 </sect1> 811 812 <sect1 id="explicit-locking"> 813 <title>Explicit Locking</title> 814 815 <indexterm> 816 <primary>lock</primary> 817 </indexterm> 818 819 <para> 820 <productname>PostgreSQL</productname> provides various lock modes 821 to control concurrent access to data in tables. These modes can 822 be used for application-controlled locking in situations where 823 <acronym>MVCC</acronym> does not give the desired behavior. Also, 824 most <productname>PostgreSQL</productname> commands automatically 825 acquire locks of appropriate modes to ensure that referenced 826 tables are not dropped or modified in incompatible ways while the 827 command executes. (For example, <command>TRUNCATE</command> cannot safely be 828 executed concurrently with other operations on the same table, so it 829 obtains an <literal>ACCESS EXCLUSIVE</literal> lock on the table to 830 enforce that.) 831 </para> 832 833 <para> 834 To examine a list of the currently outstanding locks in a database 835 server, use the 836 <link linkend="view-pg-locks"><structname>pg_locks</structname></link> 837 system view. For more information on monitoring the status of the lock 838 manager subsystem, refer to <xref linkend="monitoring"/>. 839 </para> 840 841 <sect2 id="locking-tables"> 842 <title>Table-Level Locks</title> 843 844 <indexterm zone="locking-tables"> 845 <primary>LOCK</primary> 846 </indexterm> 847 848 <para> 849 The list below shows the available lock modes and the contexts in 850 which they are used automatically by 851 <productname>PostgreSQL</productname>. You can also acquire any 852 of these locks explicitly with the command <xref 853 linkend="sql-lock"/>. 854 Remember that all of these lock modes are table-level locks, 855 even if the name contains the word 856 <quote>row</quote>; the names of the lock modes are historical. 857 To some extent the names reflect the typical usage of each lock 858 mode — but the semantics are all the same. The only real difference 859 between one lock mode and another is the set of lock modes with 860 which each conflicts (see <xref linkend="table-lock-compatibility"/>). 861 Two transactions cannot hold locks of conflicting 862 modes on the same table at the same time. (However, a transaction 863 never conflicts with itself. For example, it might acquire 864 <literal>ACCESS EXCLUSIVE</literal> lock and later acquire 865 <literal>ACCESS SHARE</literal> lock on the same table.) Non-conflicting 866 lock modes can be held concurrently by many transactions. Notice in 867 particular that some lock modes are self-conflicting (for example, 868 an <literal>ACCESS EXCLUSIVE</literal> lock cannot be held by more than one 869 transaction at a time) while others are not self-conflicting (for example, 870 an <literal>ACCESS SHARE</literal> lock can be held by multiple transactions). 871 </para> 872 873 <variablelist> 874 <title>Table-Level Lock Modes</title> 875 <varlistentry> 876 <term> 877 <literal>ACCESS SHARE</literal> 878 </term> 879 <listitem> 880 <para> 881 Conflicts with the <literal>ACCESS EXCLUSIVE</literal> lock 882 mode only. 883 </para> 884 885 <para> 886 The <command>SELECT</command> command acquires a lock of this mode on 887 referenced tables. In general, any query that only <emphasis>reads</emphasis> a table 888 and does not modify it will acquire this lock mode. 889 </para> 890 </listitem> 891 </varlistentry> 892 893 <varlistentry> 894 <term> 895 <literal>ROW SHARE</literal> 896 </term> 897 <listitem> 898 <para> 899 Conflicts with the <literal>EXCLUSIVE</literal> and 900 <literal>ACCESS EXCLUSIVE</literal> lock modes. 901 </para> 902 903 <para> 904 The <command>SELECT FOR UPDATE</command> and 905 <command>SELECT FOR SHARE</command> commands acquire a 906 lock of this mode on the target table(s) (in addition to 907 <literal>ACCESS SHARE</literal> locks on any other tables 908 that are referenced but not selected 909 <option>FOR UPDATE/FOR SHARE</option>). 910 </para> 911 </listitem> 912 </varlistentry> 913 914 <varlistentry> 915 <term> 916 <literal>ROW EXCLUSIVE</literal> 917 </term> 918 <listitem> 919 <para> 920 Conflicts with the <literal>SHARE</literal>, <literal>SHARE ROW 921 EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and 922 <literal>ACCESS EXCLUSIVE</literal> lock modes. 923 </para> 924 925 <para> 926 The commands <command>UPDATE</command>, 927 <command>DELETE</command>, and <command>INSERT</command> 928 acquire this lock mode on the target table (in addition to 929 <literal>ACCESS SHARE</literal> locks on any other referenced 930 tables). In general, this lock mode will be acquired by any 931 command that <emphasis>modifies data</emphasis> in a table. 932 </para> 933 </listitem> 934 </varlistentry> 935 936 <varlistentry> 937 <term> 938 <literal>SHARE UPDATE EXCLUSIVE</literal> 939 </term> 940 <listitem> 941 <para> 942 Conflicts with the <literal>SHARE UPDATE EXCLUSIVE</literal>, 943 <literal>SHARE</literal>, <literal>SHARE ROW 944 EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and 945 <literal>ACCESS EXCLUSIVE</literal> lock modes. 946 This mode protects a table against 947 concurrent schema changes and <command>VACUUM</command> runs. 948 </para> 949 950 <para> 951 Acquired by <command>VACUUM</command> (without <option>FULL</option>), 952 <command>ANALYZE</command>, <command>CREATE INDEX CONCURRENTLY</command>, 953 <command>REINDEX CONCURRENTLY</command>, 954 <command>CREATE STATISTICS</command>, and certain <command>ALTER 955 INDEX</command> and <command>ALTER TABLE</command> variants (for full 956 details see <xref linkend="sql-alterindex"/> and <xref 957 linkend="sql-altertable"/>). 958 </para> 959 </listitem> 960 </varlistentry> 961 962 <varlistentry> 963 <term> 964 <literal>SHARE</literal> 965 </term> 966 <listitem> 967 <para> 968 Conflicts with the <literal>ROW EXCLUSIVE</literal>, 969 <literal>SHARE UPDATE EXCLUSIVE</literal>, <literal>SHARE ROW 970 EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and 971 <literal>ACCESS EXCLUSIVE</literal> lock modes. 972 This mode protects a table against concurrent data changes. 973 </para> 974 975 <para> 976 Acquired by <command>CREATE INDEX</command> 977 (without <option>CONCURRENTLY</option>). 978 </para> 979 </listitem> 980 </varlistentry> 981 982 <varlistentry> 983 <term> 984 <literal>SHARE ROW EXCLUSIVE</literal> 985 </term> 986 <listitem> 987 <para> 988 Conflicts with the <literal>ROW EXCLUSIVE</literal>, 989 <literal>SHARE UPDATE EXCLUSIVE</literal>, 990 <literal>SHARE</literal>, <literal>SHARE ROW 991 EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and 992 <literal>ACCESS EXCLUSIVE</literal> lock modes. 993 This mode protects a table against concurrent data changes, and 994 is self-exclusive so that only one session can hold it at a time. 995 </para> 996 997 <para> 998 Acquired by <command>CREATE TRIGGER</command> and some forms of 999 <command>ALTER TABLE</command> (see <xref linkend="sql-altertable"/>). 1000 </para> 1001 </listitem> 1002 </varlistentry> 1003 1004 <varlistentry> 1005 <term> 1006 <literal>EXCLUSIVE</literal> 1007 </term> 1008 <listitem> 1009 <para> 1010 Conflicts with the <literal>ROW SHARE</literal>, <literal>ROW 1011 EXCLUSIVE</literal>, <literal>SHARE UPDATE 1012 EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE 1013 ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and 1014 <literal>ACCESS EXCLUSIVE</literal> lock modes. 1015 This mode allows only concurrent <literal>ACCESS SHARE</literal> locks, 1016 i.e., only reads from the table can proceed in parallel with a 1017 transaction holding this lock mode. 1018 </para> 1019 1020 <para> 1021 Acquired by <command>REFRESH MATERIALIZED VIEW CONCURRENTLY</command>. 1022 </para> 1023 </listitem> 1024 </varlistentry> 1025 1026 <varlistentry> 1027 <term> 1028 <literal>ACCESS EXCLUSIVE</literal> 1029 </term> 1030 <listitem> 1031 <para> 1032 Conflicts with locks of all modes (<literal>ACCESS 1033 SHARE</literal>, <literal>ROW SHARE</literal>, <literal>ROW 1034 EXCLUSIVE</literal>, <literal>SHARE UPDATE 1035 EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE 1036 ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and 1037 <literal>ACCESS EXCLUSIVE</literal>). 1038 This mode guarantees that the 1039 holder is the only transaction accessing the table in any way. 1040 </para> 1041 1042 <para> 1043 Acquired by the <command>DROP TABLE</command>, 1044 <command>TRUNCATE</command>, <command>REINDEX</command>, 1045 <command>CLUSTER</command>, <command>VACUUM FULL</command>, 1046 and <command>REFRESH MATERIALIZED VIEW</command> (without 1047 <option>CONCURRENTLY</option>) 1048 commands. Many forms of <command>ALTER INDEX</command> and <command>ALTER TABLE</command> also acquire 1049 a lock at this level. This is also the default lock mode for 1050 <command>LOCK TABLE</command> statements that do not specify 1051 a mode explicitly. 1052 </para> 1053 </listitem> 1054 </varlistentry> 1055 </variablelist> 1056 1057 <tip> 1058 <para> 1059 Only an <literal>ACCESS EXCLUSIVE</literal> lock blocks a 1060 <command>SELECT</command> (without <option>FOR UPDATE/SHARE</option>) 1061 statement. 1062 </para> 1063 </tip> 1064 1065 <para> 1066 Once acquired, a lock is normally held until the end of the transaction. But if a 1067 lock is acquired after establishing a savepoint, the lock is released 1068 immediately if the savepoint is rolled back to. This is consistent with 1069 the principle that <command>ROLLBACK</command> cancels all effects of the 1070 commands since the savepoint. The same holds for locks acquired within a 1071 <application>PL/pgSQL</application> exception block: an error escape from the block 1072 releases locks acquired within it. 1073 </para> 1074 1075 1076 1077 <table tocentry="1" id="table-lock-compatibility"> 1078 <title> Conflicting Lock Modes</title> 1079 <tgroup cols="9"> 1080 <colspec colnum="1" colwidth="1.25*"/> 1081 <colspec colnum="2" colwidth="1*" colname="lockst"/> 1082 <colspec colnum="3" colwidth="1*"/> 1083 <colspec colnum="4" colwidth="1*"/> 1084 <colspec colnum="5" colwidth="1*"/> 1085 <colspec colnum="6" colwidth="1*"/> 1086 <colspec colnum="7" colwidth="1*"/> 1087 <colspec colnum="8" colwidth="1*"/> 1088 <colspec colnum="9" colwidth="1*" colname="lockend"/> 1089 <spanspec spanname="lockreq" namest="lockst" nameend="lockend" align="center"/> 1090 <thead> 1091 <row> 1092 <entry morerows="1">Requested Lock Mode</entry> 1093 <entry spanname="lockreq">Existing Lock Mode</entry> 1094 </row> 1095 <row> 1096 <entry><literal>ACCESS SHARE</literal></entry> 1097 <entry><literal>ROW SHARE</literal></entry> 1098 <entry><literal>ROW EXCL.</literal></entry> 1099 <entry><literal>SHARE UPDATE EXCL.</literal></entry> 1100 <entry><literal>SHARE</literal></entry> 1101 <entry><literal>SHARE ROW EXCL.</literal></entry> 1102 <entry><literal>EXCL.</literal></entry> 1103 <entry><literal>ACCESS EXCL.</literal></entry> 1104 </row> 1105 </thead> 1106 <tbody> 1107 <row> 1108 <entry><literal>ACCESS SHARE</literal></entry> 1109 <entry align="center"></entry> 1110 <entry align="center"></entry> 1111 <entry align="center"></entry> 1112 <entry align="center"></entry> 1113 <entry align="center"></entry> 1114 <entry align="center"></entry> 1115 <entry align="center"></entry> 1116 <entry align="center">X</entry> 1117 </row> 1118 <row> 1119 <entry><literal>ROW SHARE</literal></entry> 1120 <entry align="center"></entry> 1121 <entry align="center"></entry> 1122 <entry align="center"></entry> 1123 <entry align="center"></entry> 1124 <entry align="center"></entry> 1125 <entry align="center"></entry> 1126 <entry align="center">X</entry> 1127 <entry align="center">X</entry> 1128 </row> 1129 <row> 1130 <entry><literal>ROW EXCL.</literal></entry> 1131 <entry align="center"></entry> 1132 <entry align="center"></entry> 1133 <entry align="center"></entry> 1134 <entry align="center"></entry> 1135 <entry align="center">X</entry> 1136 <entry align="center">X</entry> 1137 <entry align="center">X</entry> 1138 <entry align="center">X</entry> 1139 </row> 1140 <row> 1141 <entry><literal>SHARE UPDATE EXCL.</literal></entry> 1142 <entry align="center"></entry> 1143 <entry align="center"></entry> 1144 <entry align="center"></entry> 1145 <entry align="center">X</entry> 1146 <entry align="center">X</entry> 1147 <entry align="center">X</entry> 1148 <entry align="center">X</entry> 1149 <entry align="center">X</entry> 1150 </row> 1151 <row> 1152 <entry><literal>SHARE</literal></entry> 1153 <entry align="center"></entry> 1154 <entry align="center"></entry> 1155 <entry align="center">X</entry> 1156 <entry align="center">X</entry> 1157 <entry align="center"></entry> 1158 <entry align="center">X</entry> 1159 <entry align="center">X</entry> 1160 <entry align="center">X</entry> 1161 </row> 1162 <row> 1163 <entry><literal>SHARE ROW EXCL.</literal></entry> 1164 <entry align="center"></entry> 1165 <entry align="center"></entry> 1166 <entry align="center">X</entry> 1167 <entry align="center">X</entry> 1168 <entry align="center">X</entry> 1169 <entry align="center">X</entry> 1170 <entry align="center">X</entry> 1171 <entry align="center">X</entry> 1172 </row> 1173 <row> 1174 <entry><literal>EXCL.</literal></entry> 1175 <entry align="center"></entry> 1176 <entry align="center">X</entry> 1177 <entry align="center">X</entry> 1178 <entry align="center">X</entry> 1179 <entry align="center">X</entry> 1180 <entry align="center">X</entry> 1181 <entry align="center">X</entry> 1182 <entry align="center">X</entry> 1183 </row> 1184 <row> 1185 <entry><literal>ACCESS EXCL.</literal></entry> 1186 <entry align="center">X</entry> 1187 <entry align="center">X</entry> 1188 <entry align="center">X</entry> 1189 <entry align="center">X</entry> 1190 <entry align="center">X</entry> 1191 <entry align="center">X</entry> 1192 <entry align="center">X</entry> 1193 <entry align="center">X</entry> 1194 </row> 1195 </tbody> 1196 </tgroup> 1197 </table> 1198 </sect2> 1199 1200 <sect2 id="locking-rows"> 1201 <title>Row-Level Locks</title> 1202 1203 <para> 1204 In addition to table-level locks, there are row-level locks, which 1205 are listed as below with the contexts in which they are used 1206 automatically by <productname>PostgreSQL</productname>. See 1207 <xref linkend="row-lock-compatibility"/> for a complete table of 1208 row-level lock conflicts. Note that a transaction can hold 1209 conflicting locks on the same row, even in different subtransactions; 1210 but other than that, two transactions can never hold conflicting locks 1211 on the same row. Row-level locks do not affect data querying; they 1212 block only <emphasis>writers and lockers</emphasis> to the same 1213 row. Row-level locks are released at transaction end or during 1214 savepoint rollback, just like table-level locks. 1215 1216 </para> 1217 1218 <variablelist> 1219 <title>Row-Level Lock Modes</title> 1220 <varlistentry> 1221 <term> 1222 <literal>FOR UPDATE</literal> 1223 </term> 1224 <listitem> 1225 <para> 1226 <literal>FOR UPDATE</literal> causes the rows retrieved by the 1227 <command>SELECT</command> statement to be locked as though for 1228 update. This prevents them from being locked, modified or deleted by 1229 other transactions until the current transaction ends. That is, 1230 other transactions that attempt <command>UPDATE</command>, 1231 <command>DELETE</command>, 1232 <command>SELECT FOR UPDATE</command>, 1233 <command>SELECT FOR NO KEY UPDATE</command>, 1234 <command>SELECT FOR SHARE</command> or 1235 <command>SELECT FOR KEY SHARE</command> 1236 of these rows will be blocked until the current transaction ends; 1237 conversely, <command>SELECT FOR UPDATE</command> will wait for a 1238 concurrent transaction that has run any of those commands on the 1239 same row, 1240 and will then lock and return the updated row (or no row, if the 1241 row was deleted). Within a <literal>REPEATABLE READ</literal> or 1242 <literal>SERIALIZABLE</literal> transaction, 1243 however, an error will be thrown if a row to be locked has changed 1244 since the transaction started. For further discussion see 1245 <xref linkend="applevel-consistency"/>. 1246 </para> 1247 <para> 1248 The <literal>FOR UPDATE</literal> lock mode 1249 is also acquired by any <command>DELETE</command> on a row, and also by an 1250 <command>UPDATE</command> that modifies the values of certain columns. Currently, 1251 the set of columns considered for the <command>UPDATE</command> case are those that 1252 have a unique index on them that can be used in a foreign key (so partial 1253 indexes and expressional indexes are not considered), but this may change 1254 in the future. 1255 </para> 1256 </listitem> 1257 </varlistentry> 1258 1259 <varlistentry> 1260 <term> 1261 <literal>FOR NO KEY UPDATE</literal> 1262 </term> 1263 <listitem> 1264 <para> 1265 Behaves similarly to <literal>FOR UPDATE</literal>, except that the lock 1266 acquired is weaker: this lock will not block 1267 <literal>SELECT FOR KEY SHARE</literal> commands that attempt to acquire 1268 a lock on the same rows. This lock mode is also acquired by any 1269 <command>UPDATE</command> that does not acquire a <literal>FOR UPDATE</literal> lock. 1270 </para> 1271 </listitem> 1272 </varlistentry> 1273 1274 <varlistentry> 1275 <term> 1276 <literal>FOR SHARE</literal> 1277 </term> 1278 <listitem> 1279 <para> 1280 Behaves similarly to <literal>FOR NO KEY UPDATE</literal>, except that it 1281 acquires a shared lock rather than exclusive lock on each retrieved 1282 row. A shared lock blocks other transactions from performing 1283 <command>UPDATE</command>, <command>DELETE</command>, 1284 <command>SELECT FOR UPDATE</command> or 1285 <command>SELECT FOR NO KEY UPDATE</command> on these rows, but it does not 1286 prevent them from performing <command>SELECT FOR SHARE</command> or 1287 <command>SELECT FOR KEY SHARE</command>. 1288 </para> 1289 </listitem> 1290 </varlistentry> 1291 1292 <varlistentry> 1293 <term> 1294 <literal>FOR KEY SHARE</literal> 1295 </term> 1296 <listitem> 1297 <para> 1298 Behaves similarly to <literal>FOR SHARE</literal>, except that the 1299 lock is weaker: <literal>SELECT FOR UPDATE</literal> is blocked, but not 1300 <literal>SELECT FOR NO KEY UPDATE</literal>. A key-shared lock blocks 1301 other transactions from performing <command>DELETE</command> or 1302 any <command>UPDATE</command> that changes the key values, but not 1303 other <command>UPDATE</command>, and neither does it prevent 1304 <command>SELECT FOR NO KEY UPDATE</command>, <command>SELECT FOR SHARE</command>, 1305 or <command>SELECT FOR KEY SHARE</command>. 1306 </para> 1307 </listitem> 1308 </varlistentry> 1309 </variablelist> 1310 1311 <para> 1312 <productname>PostgreSQL</productname> doesn't remember any 1313 information about modified rows in memory, so there is no limit on 1314 the number of rows locked at one time. However, locking a row 1315 might cause a disk write, e.g., <command>SELECT FOR 1316 UPDATE</command> modifies selected rows to mark them locked, and so 1317 will result in disk writes. 1318 </para> 1319 1320 <table tocentry="1" id="row-lock-compatibility"> 1321 <title>Conflicting Row-Level Locks</title> 1322 <tgroup cols="5"> 1323 <colspec colname="col1" colwidth="1.5*"/> 1324 <colspec colname="lockst" colwidth="1*"/> 1325 <colspec colname="col3" colwidth="1*"/> 1326 <colspec colname="col4" colwidth="1*"/> 1327 <colspec colname="lockend" colwidth="1*"/> 1328 <spanspec namest="lockst" nameend="lockend" spanname="lockreq"/> 1329 <thead> 1330 <row> 1331 <entry morerows="1">Requested Lock Mode</entry> 1332 <entry spanname="lockreq">Current Lock Mode</entry> 1333 </row> 1334 <row> 1335 <entry>FOR KEY SHARE</entry> 1336 <entry>FOR SHARE</entry> 1337 <entry>FOR NO KEY UPDATE</entry> 1338 <entry>FOR UPDATE</entry> 1339 </row> 1340 </thead> 1341 <tbody> 1342 <row> 1343 <entry>FOR KEY SHARE</entry> 1344 <entry align="center"></entry> 1345 <entry align="center"></entry> 1346 <entry align="center"></entry> 1347 <entry align="center">X</entry> 1348 </row> 1349 <row> 1350 <entry>FOR SHARE</entry> 1351 <entry align="center"></entry> 1352 <entry align="center"></entry> 1353 <entry align="center">X</entry> 1354 <entry align="center">X</entry> 1355 </row> 1356 <row> 1357 <entry>FOR NO KEY UPDATE</entry> 1358 <entry align="center"></entry> 1359 <entry align="center">X</entry> 1360 <entry align="center">X</entry> 1361 <entry align="center">X</entry> 1362 </row> 1363 <row> 1364 <entry>FOR UPDATE</entry> 1365 <entry align="center">X</entry> 1366 <entry align="center">X</entry> 1367 <entry align="center">X</entry> 1368 <entry align="center">X</entry> 1369 </row> 1370 </tbody> 1371 </tgroup> 1372 </table> 1373 </sect2> 1374 1375 <sect2 id="locking-pages"> 1376 <title>Page-Level Locks</title> 1377 1378 <para> 1379 In addition to table and row locks, page-level share/exclusive locks are 1380 used to control read/write access to table pages in the shared buffer 1381 pool. These locks are released immediately after a row is fetched or 1382 updated. Application developers normally need not be concerned with 1383 page-level locks, but they are mentioned here for completeness. 1384 </para> 1385 1386 </sect2> 1387 1388 <sect2 id="locking-deadlocks"> 1389 <title>Deadlocks</title> 1390 1391 <indexterm zone="locking-deadlocks"> 1392 <primary>deadlock</primary> 1393 </indexterm> 1394 1395 <para> 1396 The use of explicit locking can increase the likelihood of 1397 <firstterm>deadlocks</firstterm>, wherein two (or more) transactions each 1398 hold locks that the other wants. For example, if transaction 1 1399 acquires an exclusive lock on table A and then tries to acquire 1400 an exclusive lock on table B, while transaction 2 has already 1401 exclusive-locked table B and now wants an exclusive lock on table 1402 A, then neither one can proceed. 1403 <productname>PostgreSQL</productname> automatically detects 1404 deadlock situations and resolves them by aborting one of the 1405 transactions involved, allowing the other(s) to complete. 1406 (Exactly which transaction will be aborted is difficult to 1407 predict and should not be relied upon.) 1408 </para> 1409 1410 <para> 1411 Note that deadlocks can also occur as the result of row-level 1412 locks (and thus, they can occur even if explicit locking is not 1413 used). Consider the case in which two concurrent 1414 transactions modify a table. The first transaction executes: 1415 1416<screen> 1417UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 11111; 1418</screen> 1419 1420 This acquires a row-level lock on the row with the specified 1421 account number. Then, the second transaction executes: 1422 1423<screen> 1424UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222; 1425UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111; 1426</screen> 1427 1428 The first <command>UPDATE</command> statement successfully 1429 acquires a row-level lock on the specified row, so it succeeds in 1430 updating that row. However, the second <command>UPDATE</command> 1431 statement finds that the row it is attempting to update has 1432 already been locked, so it waits for the transaction that 1433 acquired the lock to complete. Transaction two is now waiting on 1434 transaction one to complete before it continues execution. Now, 1435 transaction one executes: 1436 1437<screen> 1438UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222; 1439</screen> 1440 1441 Transaction one attempts to acquire a row-level lock on the 1442 specified row, but it cannot: transaction two already holds such 1443 a lock. So it waits for transaction two to complete. Thus, 1444 transaction one is blocked on transaction two, and transaction 1445 two is blocked on transaction one: a deadlock 1446 condition. <productname>PostgreSQL</productname> will detect this 1447 situation and abort one of the transactions. 1448 </para> 1449 1450 <para> 1451 The best defense against deadlocks is generally to avoid them by 1452 being certain that all applications using a database acquire 1453 locks on multiple objects in a consistent order. In the example 1454 above, if both transactions 1455 had updated the rows in the same order, no deadlock would have 1456 occurred. One should also ensure that the first lock acquired on 1457 an object in a transaction is the most restrictive mode that will be 1458 needed for that object. If it is not feasible to verify this in 1459 advance, then deadlocks can be handled on-the-fly by retrying 1460 transactions that abort due to deadlocks. 1461 </para> 1462 1463 <para> 1464 So long as no deadlock situation is detected, a transaction seeking 1465 either a table-level or row-level lock will wait indefinitely for 1466 conflicting locks to be released. This means it is a bad idea for 1467 applications to hold transactions open for long periods of time 1468 (e.g., while waiting for user input). 1469 </para> 1470 </sect2> 1471 1472 <sect2 id="advisory-locks"> 1473 <title>Advisory Locks</title> 1474 1475 <indexterm zone="advisory-locks"> 1476 <primary>advisory lock</primary> 1477 </indexterm> 1478 1479 <indexterm zone="advisory-locks"> 1480 <primary>lock</primary> 1481 <secondary>advisory</secondary> 1482 </indexterm> 1483 1484 <para> 1485 <productname>PostgreSQL</productname> provides a means for 1486 creating locks that have application-defined meanings. These are 1487 called <firstterm>advisory locks</firstterm>, because the system does not 1488 enforce their use — it is up to the application to use them 1489 correctly. Advisory locks can be useful for locking strategies 1490 that are an awkward fit for the MVCC model. 1491 For example, a common use of advisory locks is to emulate pessimistic 1492 locking strategies typical of so-called <quote>flat file</quote> data 1493 management systems. 1494 While a flag stored in a table could be used for the same purpose, 1495 advisory locks are faster, avoid table bloat, and are automatically 1496 cleaned up by the server at the end of the session. 1497 </para> 1498 1499 <para> 1500 There are two ways to acquire an advisory lock in 1501 <productname>PostgreSQL</productname>: at session level or at 1502 transaction level. 1503 Once acquired at session level, an advisory lock is held until 1504 explicitly released or the session ends. Unlike standard lock requests, 1505 session-level advisory lock requests do not honor transaction semantics: 1506 a lock acquired during a transaction that is later rolled back will still 1507 be held following the rollback, and likewise an unlock is effective even 1508 if the calling transaction fails later. A lock can be acquired multiple 1509 times by its owning process; for each completed lock request there must 1510 be a corresponding unlock request before the lock is actually released. 1511 Transaction-level lock requests, on the other hand, behave more like 1512 regular lock requests: they are automatically released at the end of the 1513 transaction, and there is no explicit unlock operation. This behavior 1514 is often more convenient than the session-level behavior for short-term 1515 usage of an advisory lock. 1516 Session-level and transaction-level lock requests for the same advisory 1517 lock identifier will block each other in the expected way. 1518 If a session already holds a given advisory lock, additional requests by 1519 it will always succeed, even if other sessions are awaiting the lock; this 1520 statement is true regardless of whether the existing lock hold and new 1521 request are at session level or transaction level. 1522 </para> 1523 1524 <para> 1525 Like all locks in 1526 <productname>PostgreSQL</productname>, a complete list of advisory locks 1527 currently held by any session can be found in the <link 1528 linkend="view-pg-locks"><structname>pg_locks</structname></link> system 1529 view. 1530 </para> 1531 1532 <para> 1533 Both advisory locks and regular locks are stored in a shared memory 1534 pool whose size is defined by the configuration variables 1535 <xref linkend="guc-max-locks-per-transaction"/> and 1536 <xref linkend="guc-max-connections"/>. 1537 Care must be taken not to exhaust this 1538 memory or the server will be unable to grant any locks at all. 1539 This imposes an upper limit on the number of advisory locks 1540 grantable by the server, typically in the tens to hundreds of thousands 1541 depending on how the server is configured. 1542 </para> 1543 1544 <para> 1545 In certain cases using advisory locking methods, especially in queries 1546 involving explicit ordering and <literal>LIMIT</literal> clauses, care must be 1547 taken to control the locks acquired because of the order in which SQL 1548 expressions are evaluated. For example: 1549<screen> 1550SELECT pg_advisory_lock(id) FROM foo WHERE id = 12345; -- ok 1551SELECT pg_advisory_lock(id) FROM foo WHERE id > 12345 LIMIT 100; -- danger! 1552SELECT pg_advisory_lock(q.id) FROM 1553( 1554 SELECT id FROM foo WHERE id > 12345 LIMIT 100 1555) q; -- ok 1556</screen> 1557 In the above queries, the second form is dangerous because the 1558 <literal>LIMIT</literal> is not guaranteed to be applied before the locking 1559 function is executed. This might cause some locks to be acquired 1560 that the application was not expecting, and hence would fail to release 1561 (until it ends the session). 1562 From the point of view of the application, such locks 1563 would be dangling, although still viewable in 1564 <structname>pg_locks</structname>. 1565 </para> 1566 1567 <para> 1568 The functions provided to manipulate advisory locks are described in 1569 <xref linkend="functions-advisory-locks"/>. 1570 </para> 1571 </sect2> 1572 1573 </sect1> 1574 1575 <sect1 id="applevel-consistency"> 1576 <title>Data Consistency Checks at the Application Level</title> 1577 1578 <para> 1579 It is very difficult to enforce business rules regarding data integrity 1580 using Read Committed transactions because the view of the data is 1581 shifting with each statement, and even a single statement may not 1582 restrict itself to the statement's snapshot if a write conflict occurs. 1583 </para> 1584 1585 <para> 1586 While a Repeatable Read transaction has a stable view of the data 1587 throughout its execution, there is a subtle issue with using 1588 <acronym>MVCC</acronym> snapshots for data consistency checks, involving 1589 something known as <firstterm>read/write conflicts</firstterm>. 1590 If one transaction writes data and a concurrent transaction attempts 1591 to read the same data (whether before or after the write), it cannot 1592 see the work of the other transaction. The reader then appears to have 1593 executed first regardless of which started first or which committed 1594 first. If that is as far as it goes, there is no problem, but 1595 if the reader also writes data which is read by a concurrent transaction 1596 there is now a transaction which appears to have run before either of 1597 the previously mentioned transactions. If the transaction which appears 1598 to have executed last actually commits first, it is very easy for a 1599 cycle to appear in a graph of the order of execution of the transactions. 1600 When such a cycle appears, integrity checks will not work correctly 1601 without some help. 1602 </para> 1603 1604 <para> 1605 As mentioned in <xref linkend="xact-serializable"/>, Serializable 1606 transactions are just Repeatable Read transactions which add 1607 nonblocking monitoring for dangerous patterns of read/write conflicts. 1608 When a pattern is detected which could cause a cycle in the apparent 1609 order of execution, one of the transactions involved is rolled back to 1610 break the cycle. 1611 </para> 1612 1613 <sect2 id="serializable-consistency"> 1614 <title>Enforcing Consistency with Serializable Transactions</title> 1615 1616 <para> 1617 If the Serializable transaction isolation level is used for all writes 1618 and for all reads which need a consistent view of the data, no other 1619 effort is required to ensure consistency. Software from other 1620 environments which is written to use serializable transactions to 1621 ensure consistency should <quote>just work</quote> in this regard in 1622 <productname>PostgreSQL</productname>. 1623 </para> 1624 1625 <para> 1626 When using this technique, it will avoid creating an unnecessary burden 1627 for application programmers if the application software goes through a 1628 framework which automatically retries transactions which are rolled 1629 back with a serialization failure. It may be a good idea to set 1630 <literal>default_transaction_isolation</literal> to <literal>serializable</literal>. 1631 It would also be wise to take some action to ensure that no other 1632 transaction isolation level is used, either inadvertently or to 1633 subvert integrity checks, through checks of the transaction isolation 1634 level in triggers. 1635 </para> 1636 1637 <para> 1638 See <xref linkend="xact-serializable"/> for performance suggestions. 1639 </para> 1640 1641 <warning> 1642 <para> 1643 This level of integrity protection using Serializable transactions 1644 does not yet extend to hot standby mode (<xref linkend="hot-standby"/>). 1645 Because of that, those using hot standby may want to use Repeatable 1646 Read and explicit locking on the master. 1647 </para> 1648 </warning> 1649 </sect2> 1650 1651 <sect2 id="non-serializable-consistency"> 1652 <title>Enforcing Consistency with Explicit Blocking Locks</title> 1653 1654 <para> 1655 When non-serializable writes are possible, 1656 to ensure the current validity of a row and protect it against 1657 concurrent updates one must use <command>SELECT FOR UPDATE</command>, 1658 <command>SELECT FOR SHARE</command>, or an appropriate <command>LOCK 1659 TABLE</command> statement. (<command>SELECT FOR UPDATE</command> 1660 and <command>SELECT FOR SHARE</command> lock just the 1661 returned rows against concurrent updates, while <command>LOCK 1662 TABLE</command> locks the whole table.) This should be taken into 1663 account when porting applications to 1664 <productname>PostgreSQL</productname> from other environments. 1665 </para> 1666 1667 <para> 1668 Also of note to those converting from other environments is the fact 1669 that <command>SELECT FOR UPDATE</command> does not ensure that a 1670 concurrent transaction will not update or delete a selected row. 1671 To do that in <productname>PostgreSQL</productname> you must actually 1672 update the row, even if no values need to be changed. 1673 <command>SELECT FOR UPDATE</command> <emphasis>temporarily blocks</emphasis> 1674 other transactions from acquiring the same lock or executing an 1675 <command>UPDATE</command> or <command>DELETE</command> which would 1676 affect the locked row, but once the transaction holding this lock 1677 commits or rolls back, a blocked transaction will proceed with the 1678 conflicting operation unless an actual <command>UPDATE</command> of 1679 the row was performed while the lock was held. 1680 </para> 1681 1682 <para> 1683 Global validity checks require extra thought under 1684 non-serializable <acronym>MVCC</acronym>. 1685 For example, a banking application might wish to check that the sum of 1686 all credits in one table equals the sum of debits in another table, 1687 when both tables are being actively updated. Comparing the results of two 1688 successive <literal>SELECT sum(...)</literal> commands will not work reliably in 1689 Read Committed mode, since the second query will likely include the results 1690 of transactions not counted by the first. Doing the two sums in a 1691 single repeatable read transaction will give an accurate picture of only the 1692 effects of transactions that committed before the repeatable read transaction 1693 started — but one might legitimately wonder whether the answer is still 1694 relevant by the time it is delivered. If the repeatable read transaction 1695 itself applied some changes before trying to make the consistency check, 1696 the usefulness of the check becomes even more debatable, since now it 1697 includes some but not all post-transaction-start changes. In such cases 1698 a careful person might wish to lock all tables needed for the check, 1699 in order to get an indisputable picture of current reality. A 1700 <literal>SHARE</literal> mode (or higher) lock guarantees that there are no 1701 uncommitted changes in the locked table, other than those of the current 1702 transaction. 1703 </para> 1704 1705 <para> 1706 Note also that if one is relying on explicit locking to prevent concurrent 1707 changes, one should either use Read Committed mode, or in Repeatable Read 1708 mode be careful to obtain 1709 locks before performing queries. A lock obtained by a 1710 repeatable read transaction guarantees that no other transactions modifying 1711 the table are still running, but if the snapshot seen by the 1712 transaction predates obtaining the lock, it might predate some now-committed 1713 changes in the table. A repeatable read transaction's snapshot is actually 1714 frozen at the start of its first query or data-modification command 1715 (<literal>SELECT</literal>, <literal>INSERT</literal>, 1716 <literal>UPDATE</literal>, or <literal>DELETE</literal>), so 1717 it is possible to obtain locks explicitly before the snapshot is 1718 frozen. 1719 </para> 1720 </sect2> 1721 </sect1> 1722 1723 <sect1 id="mvcc-caveats"> 1724 <title>Caveats</title> 1725 1726 <para> 1727 Some DDL commands, currently only <xref linkend="sql-truncate"/> and the 1728 table-rewriting forms of <xref linkend="sql-altertable"/>, are not 1729 MVCC-safe. This means that after the truncation or rewrite commits, the 1730 table will appear empty to concurrent transactions, if they are using a 1731 snapshot taken before the DDL command committed. This will only be an 1732 issue for a transaction that did not access the table in question 1733 before the DDL command started — any transaction that has done so 1734 would hold at least an <literal>ACCESS SHARE</literal> table lock, 1735 which would block the DDL command until that transaction completes. 1736 So these commands will not cause any apparent inconsistency in the 1737 table contents for successive queries on the target table, but they 1738 could cause visible inconsistency between the contents of the target 1739 table and other tables in the database. 1740 </para> 1741 1742 <para> 1743 Support for the Serializable transaction isolation level has not yet 1744 been added to Hot Standby replication targets (described in 1745 <xref linkend="hot-standby"/>). The strictest isolation level currently 1746 supported in hot standby mode is Repeatable Read. While performing all 1747 permanent database writes within Serializable transactions on the 1748 master will ensure that all standbys will eventually reach a consistent 1749 state, a Repeatable Read transaction run on the standby can sometimes 1750 see a transient state that is inconsistent with any serial execution 1751 of the transactions on the master. 1752 </para> 1753 1754 <para> 1755 Internal access to the system catalogs is not done using the isolation 1756 level of the current transaction. This means that newly created database 1757 objects such as tables are visible to concurrent Repeatable Read and 1758 Serializable transactions, even though the rows they contain are not. In 1759 contrast, queries that explicitly examine the system catalogs don't see 1760 rows representing concurrently created database objects, in the higher 1761 isolation levels. 1762 </para> 1763 </sect1> 1764 1765 <sect1 id="locking-indexes"> 1766 <title>Locking and Indexes</title> 1767 1768 <indexterm zone="locking-indexes"> 1769 <primary>index</primary> 1770 <secondary>locks</secondary> 1771 </indexterm> 1772 1773 <para> 1774 Though <productname>PostgreSQL</productname> 1775 provides nonblocking read/write access to table 1776 data, nonblocking read/write access is not currently offered for every 1777 index access method implemented 1778 in <productname>PostgreSQL</productname>. 1779 The various index types are handled as follows: 1780 1781 <variablelist> 1782 <varlistentry> 1783 <term> 1784 B-tree, <acronym>GiST</acronym> and <acronym>SP-GiST</acronym> indexes 1785 </term> 1786 <listitem> 1787 <para> 1788 Short-term share/exclusive page-level locks are used for 1789 read/write access. Locks are released immediately after each 1790 index row is fetched or inserted. These index types provide 1791 the highest concurrency without deadlock conditions. 1792 </para> 1793 </listitem> 1794 </varlistentry> 1795 1796 <varlistentry> 1797 <term> 1798 Hash indexes 1799 </term> 1800 <listitem> 1801 <para> 1802 Share/exclusive hash-bucket-level locks are used for read/write 1803 access. Locks are released after the whole bucket is processed. 1804 Bucket-level locks provide better concurrency than index-level 1805 ones, but deadlock is possible since the locks are held longer 1806 than one index operation. 1807 </para> 1808 </listitem> 1809 </varlistentry> 1810 1811 <varlistentry> 1812 <term> 1813 <acronym>GIN</acronym> indexes 1814 </term> 1815 <listitem> 1816 <para> 1817 Short-term share/exclusive page-level locks are used for 1818 read/write access. Locks are released immediately after each 1819 index row is fetched or inserted. But note that insertion of a 1820 GIN-indexed value usually produces several index key insertions 1821 per row, so GIN might do substantial work for a single value's 1822 insertion. 1823 </para> 1824 </listitem> 1825 </varlistentry> 1826 </variablelist> 1827 </para> 1828 1829 <para> 1830 Currently, B-tree indexes offer the best performance for concurrent 1831 applications; since they also have more features than hash 1832 indexes, they are the recommended index type for concurrent 1833 applications that need to index scalar data. When dealing with 1834 non-scalar data, B-trees are not useful, and GiST, SP-GiST or GIN 1835 indexes should be used instead. 1836 </para> 1837 </sect1> 1838 </chapter> 1839