1<!-- doc/src/sgml/mvcc.sgml -->
2
3 <chapter id="mvcc">
4  <title>Concurrency Control</title>
5
6  <indexterm>
7   <primary>concurrency</primary>
8  </indexterm>
9
10  <para>
11   This chapter describes the behavior of the
12   <productname>PostgreSQL</productname> database system when two or
13   more sessions try to access the same data at the same time.  The
14   goals in that situation are to allow efficient access for all
15   sessions while maintaining strict data integrity.  Every developer
16   of database applications should be familiar with the topics covered
17   in this chapter.
18  </para>
19
20  <sect1 id="mvcc-intro">
21   <title>Introduction</title>
22
23   <indexterm>
24    <primary>Multiversion Concurrency Control</primary>
25   </indexterm>
26
27   <indexterm>
28    <primary>MVCC</primary>
29   </indexterm>
30
31   <indexterm>
32    <primary>Serializable Snapshot Isolation</primary>
33   </indexterm>
34
35   <indexterm>
36    <primary>SSI</primary>
37   </indexterm>
38
39   <para>
40    <productname>PostgreSQL</productname> provides a rich set of tools
41    for developers to manage concurrent access to data.  Internally,
42    data consistency is maintained by using a multiversion
43    model (Multiversion Concurrency Control, <acronym>MVCC</acronym>).
44    This means that each SQL statement sees
45    a snapshot of data (a <firstterm>database version</firstterm>)
46    as it was some
47    time ago, regardless of the current state of the underlying data.
48    This prevents statements from viewing inconsistent data produced
49    by concurrent transactions performing updates on the same
50    data rows, providing <firstterm>transaction isolation</firstterm>
51    for each database session.  <acronym>MVCC</acronym>, by eschewing
52    the locking methodologies of traditional database systems,
53    minimizes lock contention in order to allow for reasonable
54    performance in multiuser environments.
55   </para>
56
57   <para>
58    The main advantage of using the <acronym>MVCC</acronym> model of
59    concurrency control rather than locking is that in
60    <acronym>MVCC</acronym> locks acquired for querying (reading) data
61    do not conflict with locks acquired for writing data, and so
62    reading never blocks writing and writing never blocks reading.
63    <productname>PostgreSQL</productname> maintains this guarantee
64    even when providing the strictest level of transaction
65    isolation through the use of an innovative <firstterm>Serializable
66    Snapshot Isolation</firstterm> (<acronym>SSI</acronym>) level.
67   </para>
68
69   <para>
70    Table- and row-level locking facilities are also available in
71    <productname>PostgreSQL</productname> for applications which don't
72    generally need full transaction isolation and prefer to explicitly
73    manage particular points of conflict.  However, proper
74    use of <acronym>MVCC</acronym> will generally provide better
75    performance than locks.  In addition, application-defined advisory
76    locks provide a mechanism for acquiring locks that are not tied
77    to a single transaction.
78   </para>
79  </sect1>
80
81  <sect1 id="transaction-iso">
82   <title>Transaction Isolation</title>
83
84   <indexterm>
85    <primary>transaction isolation</primary>
86   </indexterm>
87
88   <para>
89    The <acronym>SQL</acronym> standard defines four levels of
90    transaction isolation.  The most strict is Serializable,
91    which is defined by the standard in a paragraph which says that any
92    concurrent execution of a set of Serializable transactions is guaranteed
93    to produce the same effect as running them one at a time in some order.
94    The other three levels are defined in terms of phenomena, resulting from
95    interaction between concurrent transactions, which must not occur at
96    each level.  The standard notes that due to the definition of
97    Serializable, none of these phenomena are possible at that level.  (This
98    is hardly surprising -- if the effect of the transactions must be
99    consistent with having been run one at a time, how could you see any
100    phenomena caused by interactions?)
101   </para>
102
103   <para>
104    The phenomena which are prohibited at various levels are:
105
106    <variablelist>
107     <varlistentry>
108      <term>
109       dirty read
110       <indexterm><primary>dirty read</primary></indexterm>
111      </term>
112     <listitem>
113      <para>
114        A transaction reads data written by a concurrent uncommitted transaction.
115       </para>
116      </listitem>
117     </varlistentry>
118
119     <varlistentry>
120      <term>
121       nonrepeatable read
122       <indexterm><primary>nonrepeatable read</primary></indexterm>
123      </term>
124     <listitem>
125      <para>
126        A transaction re-reads data it has previously read and finds that data
127        has been modified by another transaction (that committed since the
128        initial read).
129       </para>
130      </listitem>
131     </varlistentry>
132
133     <varlistentry>
134      <term>
135       phantom read
136       <indexterm><primary>phantom read</primary></indexterm>
137      </term>
138     <listitem>
139      <para>
140        A transaction re-executes a query returning a set of rows that satisfy a
141        search condition and finds that the set of rows satisfying the condition
142        has changed due to another recently-committed transaction.
143       </para>
144      </listitem>
145     </varlistentry>
146
147     <varlistentry>
148      <term>
149       serialization anomaly
150       <indexterm><primary>serialization anomaly</primary></indexterm>
151      </term>
152     <listitem>
153      <para>
154        The result of successfully committing a group of transactions
155        is inconsistent with all possible orderings of running those
156        transactions one at a time.
157       </para>
158      </listitem>
159     </varlistentry>
160    </variablelist>
161   </para>
162
163   <para>
164    <indexterm>
165     <primary>transaction isolation level</primary>
166    </indexterm>
167    The SQL standard and PostgreSQL-implemented transaction isolation levels
168    are described in <xref linkend="mvcc-isolevel-table"/>.
169   </para>
170
171    <table tocentry="1" id="mvcc-isolevel-table">
172     <title>Transaction Isolation Levels</title>
173     <tgroup cols="5">
174      <thead>
175       <row>
176        <entry>
177         Isolation Level
178        </entry>
179        <entry>
180         Dirty Read
181        </entry>
182        <entry>
183         Nonrepeatable Read
184        </entry>
185        <entry>
186         Phantom Read
187        </entry>
188        <entry>
189         Serialization Anomaly
190        </entry>
191       </row>
192      </thead>
193      <tbody>
194       <row>
195        <entry>
196         Read uncommitted
197        </entry>
198        <entry>
199         Allowed, but not in PG
200        </entry>
201        <entry>
202         Possible
203        </entry>
204        <entry>
205         Possible
206        </entry>
207        <entry>
208         Possible
209        </entry>
210       </row>
211
212       <row>
213        <entry>
214         Read committed
215        </entry>
216        <entry>
217         Not possible
218        </entry>
219        <entry>
220         Possible
221        </entry>
222        <entry>
223         Possible
224        </entry>
225        <entry>
226         Possible
227        </entry>
228       </row>
229
230       <row>
231        <entry>
232         Repeatable read
233        </entry>
234        <entry>
235         Not possible
236        </entry>
237        <entry>
238         Not possible
239        </entry>
240        <entry>
241         Allowed, but not in PG
242        </entry>
243        <entry>
244         Possible
245        </entry>
246       </row>
247
248       <row>
249        <entry>
250         Serializable
251        </entry>
252        <entry>
253         Not possible
254        </entry>
255        <entry>
256         Not possible
257        </entry>
258        <entry>
259         Not possible
260        </entry>
261        <entry>
262         Not possible
263        </entry>
264       </row>
265      </tbody>
266     </tgroup>
267    </table>
268
269   <para>
270    In <productname>PostgreSQL</productname>, you can request any of
271    the four standard transaction isolation levels, but internally only
272    three distinct isolation levels are implemented, i.e., PostgreSQL's
273    Read Uncommitted mode behaves like Read Committed.  This is because
274    it is the only sensible way to map the standard isolation levels to
275    PostgreSQL's multiversion concurrency control architecture.
276   </para>
277
278   <para>
279    The table also shows that PostgreSQL's Repeatable Read implementation
280    does not allow phantom reads.  Stricter behavior is permitted by the
281    SQL standard: the four isolation levels only define which phenomena
282    must not happen, not which phenomena <emphasis>must</emphasis> happen.
283    The behavior of the available isolation levels is detailed in the
284    following subsections.
285   </para>
286
287   <para>
288    To set the transaction isolation level of a transaction, use the
289    command <xref linkend="sql-set-transaction"/>.
290   </para>
291
292   <important>
293     <para>
294       Some <productname>PostgreSQL</productname> data types and functions have
295       special rules regarding transactional behavior.  In particular, changes
296       made to a sequence (and therefore the counter of a
297       column declared using <type>serial</type>) are immediately visible
298       to all other transactions and are not rolled back if the transaction
299       that made the changes aborts.  See <xref linkend="functions-sequence"/>
300       and <xref linkend="datatype-serial"/>.
301     </para>
302   </important>
303
304  <sect2 id="xact-read-committed">
305   <title>Read Committed Isolation Level</title>
306
307   <indexterm>
308    <primary>transaction isolation level</primary>
309    <secondary>read committed</secondary>
310   </indexterm>
311
312   <indexterm>
313    <primary>read committed</primary>
314   </indexterm>
315
316   <para>
317    <firstterm>Read Committed</firstterm> is the default isolation
318    level in <productname>PostgreSQL</productname>.  When a transaction
319    uses this isolation level, a <command>SELECT</command> query
320    (without a <literal>FOR UPDATE/SHARE</literal> clause) sees only data
321    committed before the query began; it never sees either uncommitted
322    data or changes committed during query execution by concurrent
323    transactions.  In effect, a <command>SELECT</command> query sees
324    a snapshot of the database as of the instant the query begins to
325    run.   However, <command>SELECT</command> does see the effects
326    of previous updates executed within its own transaction, even
327    though they are not yet committed.  Also note that two successive
328    <command>SELECT</command> commands can see different data, even
329    though they are within a single transaction, if other transactions
330    commit changes after the first <command>SELECT</command> starts and
331    before the second <command>SELECT</command> starts.
332   </para>
333
334   <para>
335    <command>UPDATE</command>, <command>DELETE</command>, <command>SELECT
336    FOR UPDATE</command>, and <command>SELECT FOR SHARE</command> commands
337    behave the same as <command>SELECT</command>
338    in terms of searching for target rows: they will only find target rows
339    that were committed as of the command start time.  However, such a target
340    row might have already been updated (or deleted or locked) by
341    another concurrent transaction by the time it is found.  In this case, the
342    would-be updater will wait for the first updating transaction to commit or
343    roll back (if it is still in progress).  If the first updater rolls back,
344    then its effects are negated and the second updater can proceed with
345    updating the originally found row.  If the first updater commits, the
346    second updater will ignore the row if the first updater deleted it,
347    otherwise it will attempt to apply its operation to the updated version of
348    the row.  The search condition of the command (the <literal>WHERE</literal> clause) is
349    re-evaluated to see if the updated version of the row still matches the
350    search condition.  If so, the second updater proceeds with its operation
351    using the updated version of the row.  In the case of
352    <command>SELECT FOR UPDATE</command> and <command>SELECT FOR
353    SHARE</command>, this means it is the updated version of the row that is
354    locked and returned to the client.
355   </para>
356
357   <para>
358    <command>INSERT</command> with an <literal>ON CONFLICT DO UPDATE</literal> clause
359    behaves similarly. In Read Committed mode, each row proposed for insertion
360    will either insert or update. Unless there are unrelated errors, one of
361    those two outcomes is guaranteed.  If a conflict originates in another
362    transaction whose effects are not yet visible to the <command>INSERT
363    </command>, the <command>UPDATE</command> clause will affect that row,
364    even though possibly <emphasis>no</emphasis> version of that row is
365    conventionally visible to the command.
366   </para>
367
368   <para>
369    <command>INSERT</command> with an <literal>ON CONFLICT DO
370    NOTHING</literal> clause may have insertion not proceed for a row due to
371    the outcome of another transaction whose effects are not visible
372    to the <command>INSERT</command> snapshot.  Again, this is only
373    the case in Read Committed mode.
374   </para>
375
376   <para>
377    Because of the above rules, it is possible for an updating command to see
378    an inconsistent snapshot: it can see the effects of concurrent updating
379    commands on the same rows it is trying to update, but it
380    does not see effects of those commands on other rows in the database.
381    This behavior makes Read Committed mode unsuitable for commands that
382    involve complex search conditions; however, it is just right for simpler
383    cases.  For example, consider updating bank balances with transactions
384    like:
385
386<screen>
387BEGIN;
388UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
389UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534;
390COMMIT;
391</screen>
392
393    If two such transactions concurrently try to change the balance of account
394    12345, we clearly want the second transaction to start with the updated
395    version of the account's row.  Because each command is affecting only a
396    predetermined row, letting it see the updated version of the row does
397    not create any troublesome inconsistency.
398   </para>
399
400   <para>
401    More complex usage can produce undesirable results in Read Committed
402    mode.  For example, consider a <command>DELETE</command> command
403    operating on data that is being both added and removed from its
404    restriction criteria by another command, e.g., assume
405    <literal>website</literal> is a two-row table with
406    <literal>website.hits</literal> equaling <literal>9</literal> and
407    <literal>10</literal>:
408
409<screen>
410BEGIN;
411UPDATE website SET hits = hits + 1;
412-- run from another session:  DELETE FROM website WHERE hits = 10;
413COMMIT;
414</screen>
415
416    The <command>DELETE</command> will have no effect even though
417    there is a <literal>website.hits = 10</literal> row before and
418    after the <command>UPDATE</command>. This occurs because the
419    pre-update row value <literal>9</literal> is skipped, and when the
420    <command>UPDATE</command> completes and <command>DELETE</command>
421    obtains a lock, the new row value is no longer <literal>10</literal> but
422    <literal>11</literal>, which no longer matches the criteria.
423   </para>
424
425   <para>
426    Because Read Committed mode starts each command with a new snapshot
427    that includes all transactions committed up to that instant,
428    subsequent commands in the same transaction will see the effects
429    of the committed concurrent transaction in any case.  The point
430    at issue above is whether or not a <emphasis>single</emphasis> command
431    sees an absolutely consistent view of the database.
432   </para>
433
434   <para>
435    The partial transaction isolation provided by Read Committed mode
436    is adequate for many applications, and this mode is fast and simple
437    to use;  however, it is not sufficient for all cases.  Applications
438    that do complex queries and updates might require a more rigorously
439    consistent view of the database than Read Committed mode provides.
440   </para>
441  </sect2>
442
443  <sect2 id="xact-repeatable-read">
444   <title>Repeatable Read Isolation Level</title>
445
446   <indexterm>
447    <primary>transaction isolation level</primary>
448    <secondary>repeatable read</secondary>
449   </indexterm>
450
451   <indexterm>
452    <primary>repeatable read</primary>
453   </indexterm>
454
455   <para>
456    The <firstterm>Repeatable Read</firstterm> isolation level only sees
457    data committed before the transaction began; it never sees either
458    uncommitted data or changes committed during transaction execution
459    by concurrent transactions.  (However, the query does see the
460    effects of previous updates executed within its own transaction,
461    even though they are not yet committed.)  This is a stronger
462    guarantee than is required by the <acronym>SQL</acronym> standard
463    for this isolation level, and prevents all of the phenomena described
464    in <xref linkend="mvcc-isolevel-table"/> except for serialization
465    anomalies.  As mentioned above, this is
466    specifically allowed by the standard, which only describes the
467    <emphasis>minimum</emphasis> protections each isolation level must
468    provide.
469   </para>
470
471   <para>
472    This level is different from Read Committed in that a query in a
473    repeatable read transaction sees a snapshot as of the start of the
474    first non-transaction-control statement in the
475    <emphasis>transaction</emphasis>, not as of the start
476    of the current statement within the transaction.  Thus, successive
477    <command>SELECT</command> commands within a <emphasis>single</emphasis>
478    transaction see the same data, i.e., they do not see changes made by
479    other transactions that committed after their own transaction started.
480   </para>
481
482   <para>
483    Applications using this level must be prepared to retry transactions
484    due to serialization failures.
485   </para>
486
487   <para>
488    <command>UPDATE</command>, <command>DELETE</command>, <command>SELECT
489    FOR UPDATE</command>, and <command>SELECT FOR SHARE</command> commands
490    behave the same as <command>SELECT</command>
491    in terms of searching for target rows: they will only find target rows
492    that were committed as of the transaction start time.  However, such a
493    target row might have already been updated (or deleted or locked) by
494    another concurrent transaction by the time it is found.  In this case, the
495    repeatable read transaction will wait for the first updating transaction to commit or
496    roll back (if it is still in progress).  If the first updater rolls back,
497    then its effects are negated and the repeatable read transaction can proceed
498    with updating the originally found row.  But if the first updater commits
499    (and actually updated or deleted the row, not just locked it)
500    then the repeatable read transaction will be rolled back with the message
501
502<screen>
503ERROR:  could not serialize access due to concurrent update
504</screen>
505
506    because a repeatable read transaction cannot modify or lock rows changed by
507    other transactions after the repeatable read transaction began.
508   </para>
509
510   <para>
511    When an application receives this error message, it should abort
512    the current transaction and retry the whole transaction from
513    the beginning.  The second time through, the transaction will see the
514    previously-committed change as part of its initial view of the database,
515    so there is no logical conflict in using the new version of the row
516    as the starting point for the new transaction's update.
517   </para>
518
519   <para>
520    Note that only updating transactions might need to be retried; read-only
521    transactions will never have serialization conflicts.
522   </para>
523
524   <para>
525    The Repeatable Read mode provides a rigorous guarantee that each
526    transaction sees a completely stable view of the database.  However,
527    this view will not necessarily always be consistent with some serial
528    (one at a time) execution of concurrent transactions of the same level.
529    For example, even a read only transaction at this level may see a
530    control record updated to show that a batch has been completed but
531    <emphasis>not</emphasis> see one of the detail records which is logically
532    part of the batch because it read an earlier revision of the control
533    record.  Attempts to enforce business rules by transactions running at
534    this isolation level are not likely to work correctly without careful use
535    of explicit locks to block conflicting transactions.
536   </para>
537
538   <para>
539    The Repeatable Read isolation level is implemented using a technique
540    known in academic database literature and in some other database products
541    as <firstterm>Snapshot Isolation</firstterm>.  Differences in behavior
542    and performance may be observed when compared with systems that use a
543    traditional locking technique that reduces concurrency.  Some other
544    systems may even offer Repeatable Read and Snapshot Isolation as distinct
545    isolation levels with different behavior.  The permitted phenomena that
546    distinguish the two techniques were not formalized by database researchers
547    until after the SQL standard was developed, and are outside the scope of
548    this manual.  For a full treatment, please see
549    <xref linkend="berenson95"/>.
550   </para>
551
552   <note>
553    <para>
554     Prior to <productname>PostgreSQL</productname> version 9.1, a request
555     for the Serializable transaction isolation level provided exactly the
556     same behavior described here.  To retain the legacy Serializable
557     behavior, Repeatable Read should now be requested.
558    </para>
559   </note>
560  </sect2>
561
562  <sect2 id="xact-serializable">
563   <title>Serializable Isolation Level</title>
564
565   <indexterm>
566    <primary>transaction isolation level</primary>
567    <secondary>serializable</secondary>
568   </indexterm>
569
570   <indexterm>
571    <primary>serializable</primary>
572   </indexterm>
573
574   <indexterm>
575    <primary>predicate locking</primary>
576   </indexterm>
577
578   <indexterm>
579    <primary>serialization anomaly</primary>
580   </indexterm>
581
582   <para>
583    The <firstterm>Serializable</firstterm> isolation level provides
584    the strictest transaction isolation.  This level emulates serial
585    transaction execution for all committed transactions;
586    as if transactions had been executed one after another, serially,
587    rather than concurrently.  However, like the Repeatable Read level,
588    applications using this level must
589    be prepared to retry transactions due to serialization failures.
590    In fact, this isolation level works exactly the same as Repeatable
591    Read except that it monitors for conditions which could make
592    execution of a concurrent set of serializable transactions behave
593    in a manner inconsistent with all possible serial (one at a time)
594    executions of those transactions.  This monitoring does not
595    introduce any blocking beyond that present in repeatable read, but
596    there is some overhead to the monitoring, and detection of the
597    conditions which could cause a
598    <firstterm>serialization anomaly</firstterm> will trigger a
599    <firstterm>serialization failure</firstterm>.
600   </para>
601
602   <para>
603    As an example,
604    consider a table <structname>mytab</structname>, initially containing:
605<screen>
606 class | value
607-------+-------
608     1 |    10
609     1 |    20
610     2 |   100
611     2 |   200
612</screen>
613    Suppose that serializable transaction A computes:
614<screen>
615SELECT SUM(value) FROM mytab WHERE class = 1;
616</screen>
617    and then inserts the result (30) as the <structfield>value</structfield> in a
618    new row with <structfield>class</structfield><literal> = 2</literal>.  Concurrently, serializable
619    transaction B computes:
620<screen>
621SELECT SUM(value) FROM mytab WHERE class = 2;
622</screen>
623    and obtains the result 300, which it inserts in a new row with
624    <structfield>class</structfield><literal> = 1</literal>.  Then both transactions try to commit.
625    If either transaction were running at the Repeatable Read isolation level,
626    both would be allowed to commit; but since there is no serial order of execution
627    consistent with the result, using Serializable transactions will allow one
628    transaction to commit and will roll the other back with this message:
629
630<screen>
631ERROR:  could not serialize access due to read/write dependencies among transactions
632</screen>
633
634    This is because if A had
635    executed before B, B would have computed the sum 330, not 300, and
636    similarly the other order would have resulted in a different sum
637    computed by A.
638   </para>
639
640   <para>
641    When relying on Serializable transactions to prevent anomalies, it is
642    important that any data read from a permanent user table not be
643    considered valid until the transaction which read it has successfully
644    committed.  This is true even for read-only transactions, except that
645    data read within a <firstterm>deferrable</firstterm> read-only
646    transaction is known to be valid as soon as it is read, because such a
647    transaction waits until it can acquire a snapshot guaranteed to be free
648    from such problems before starting to read any data.  In all other cases
649    applications must not depend on results read during a transaction that
650    later aborted; instead, they should retry the transaction until it
651    succeeds.
652   </para>
653
654   <para>
655    To guarantee true serializability <productname>PostgreSQL</productname>
656    uses <firstterm>predicate locking</firstterm>, which means that it keeps locks
657    which allow it to determine when a write would have had an impact on
658    the result of a previous read from a concurrent transaction, had it run
659    first.  In <productname>PostgreSQL</productname> these locks do not
660    cause any blocking and therefore can <emphasis>not</emphasis> play any part in
661    causing a deadlock.  They are used to identify and flag dependencies
662    among concurrent Serializable transactions which in certain combinations
663    can lead to serialization anomalies.  In contrast, a Read Committed or
664    Repeatable Read transaction which wants to ensure data consistency may
665    need to take out a lock on an entire table, which could block other
666    users attempting to use that table, or it may use <literal>SELECT FOR
667    UPDATE</literal> or <literal>SELECT FOR SHARE</literal> which not only
668    can block other transactions but cause disk access.
669   </para>
670
671   <para>
672    Predicate locks in <productname>PostgreSQL</productname>, like in most
673    other database systems, are based on data actually accessed by a
674    transaction.  These will show up in the
675    <link linkend="view-pg-locks"><structname>pg_locks</structname></link>
676    system view with a <literal>mode</literal> of <literal>SIReadLock</literal>.  The
677    particular locks
678    acquired during execution of a query will depend on the plan used by
679    the query, and multiple finer-grained locks (e.g., tuple locks) may be
680    combined into fewer coarser-grained locks (e.g., page locks) during the
681    course of the transaction to prevent exhaustion of the memory used to
682    track the locks.  A <literal>READ ONLY</literal> transaction may be able to
683    release its SIRead locks before completion, if it detects that no
684    conflicts can still occur which could lead to a serialization anomaly.
685    In fact, <literal>READ ONLY</literal> transactions will often be able to
686    establish that fact at startup and avoid taking any predicate locks.
687    If you explicitly request a <literal>SERIALIZABLE READ ONLY DEFERRABLE</literal>
688    transaction, it will block until it can establish this fact.  (This is
689    the <emphasis>only</emphasis> case where Serializable transactions block but
690    Repeatable Read transactions don't.)  On the other hand, SIRead locks
691    often need to be kept past transaction commit, until overlapping read
692    write transactions complete.
693   </para>
694
695   <para>
696    Consistent use of Serializable transactions can simplify development.
697    The guarantee that any set of successfully committed concurrent
698    Serializable transactions will have the same effect as if they were run
699    one at a time means that if you can demonstrate that a single transaction,
700    as written, will do the right thing when run by itself, you can have
701    confidence that it will do the right thing in any mix of Serializable
702    transactions, even without any information about what those other
703    transactions might do, or it will not successfully commit.  It is
704    important that an environment which uses this technique have a
705    generalized way of handling serialization failures (which always return
706    with a SQLSTATE value of '40001'), because it will be very hard to
707    predict exactly which transactions might contribute to the read/write
708    dependencies and need to be rolled back to prevent serialization
709    anomalies.  The monitoring of read/write dependencies has a cost, as does
710    the restart of transactions which are terminated with a serialization
711    failure, but balanced against the cost and blocking involved in use of
712    explicit locks and <literal>SELECT FOR UPDATE</literal> or <literal>SELECT FOR
713    SHARE</literal>, Serializable transactions are the best performance choice
714    for some environments.
715   </para>
716
717   <para>
718    While <productname>PostgreSQL</productname>'s Serializable transaction isolation
719    level only allows concurrent transactions to commit if it can prove there
720    is a serial order of execution that would produce the same effect, it
721    doesn't always prevent errors from being raised that would not occur in
722    true serial execution.  In particular, it is possible to see unique
723    constraint violations caused by conflicts with overlapping Serializable
724    transactions even after explicitly checking that the key isn't present
725    before attempting to insert it.  This can be avoided by making sure
726    that <emphasis>all</emphasis> Serializable transactions that insert potentially
727    conflicting keys explicitly check if they can do so first.  For example,
728    imagine an application that asks the user for a new key and then checks
729    that it doesn't exist already by trying to select it first, or generates
730    a new key by selecting the maximum existing key and adding one.  If some
731    Serializable transactions insert new keys directly without following this
732    protocol, unique constraints violations might be reported even in cases
733    where they could not occur in a serial execution of the concurrent
734    transactions.
735   </para>
736
737   <para>
738    For optimal performance when relying on Serializable transactions for
739    concurrency control, these issues should be considered:
740
741    <itemizedlist>
742     <listitem>
743      <para>
744       Declare transactions as <literal>READ ONLY</literal> when possible.
745      </para>
746     </listitem>
747     <listitem>
748      <para>
749       Control the number of active connections, using a connection pool if
750       needed.  This is always an important performance consideration, but
751       it can be particularly important in a busy system using Serializable
752       transactions.
753      </para>
754     </listitem>
755     <listitem>
756      <para>
757       Don't put more into a single transaction than needed for integrity
758       purposes.
759      </para>
760     </listitem>
761     <listitem>
762      <para>
763       Don't leave connections dangling <quote>idle in transaction</quote>
764       longer than necessary.  The configuration parameter
765       <xref linkend="guc-idle-in-transaction-session-timeout"/> may be used to
766       automatically disconnect lingering sessions.
767      </para>
768     </listitem>
769     <listitem>
770      <para>
771       Eliminate explicit locks, <literal>SELECT FOR UPDATE</literal>, and
772       <literal>SELECT FOR SHARE</literal> where no longer needed due to the
773       protections automatically provided by Serializable transactions.
774      </para>
775     </listitem>
776     <listitem>
777      <para>
778       When the system is forced to combine multiple page-level predicate
779       locks into a single relation-level predicate lock because the predicate
780       lock table is short of memory, an increase in the rate of serialization
781       failures may occur.  You can avoid this by increasing
782       <xref linkend="guc-max-pred-locks-per-transaction"/>,
783       <xref linkend="guc-max-pred-locks-per-relation"/>, and/or
784       <xref linkend="guc-max-pred-locks-per-page"/>.
785      </para>
786     </listitem>
787     <listitem>
788      <para>
789       A sequential scan will always necessitate a relation-level predicate
790       lock.  This can result in an increased rate of serialization failures.
791       It may be helpful to encourage the use of index scans by reducing
792       <xref linkend="guc-random-page-cost"/> and/or increasing
793       <xref linkend="guc-cpu-tuple-cost"/>.  Be sure to weigh any decrease
794       in transaction rollbacks and restarts against any overall change in
795       query execution time.
796      </para>
797     </listitem>
798    </itemizedlist>
799   </para>
800
801   <para>
802    The Serializable isolation level is implemented using a technique known
803    in academic database literature as Serializable Snapshot Isolation, which
804    builds on Snapshot Isolation by adding checks for serialization anomalies.
805    Some differences in behavior and performance may be observed when compared
806    with other systems that use a traditional locking technique.  Please see
807    <xref linkend="ports12"/> for detailed information.
808   </para>
809  </sect2>
810 </sect1>
811
812  <sect1 id="explicit-locking">
813   <title>Explicit Locking</title>
814
815   <indexterm>
816    <primary>lock</primary>
817   </indexterm>
818
819   <para>
820    <productname>PostgreSQL</productname> provides various lock modes
821    to control concurrent access to data in tables.  These modes can
822    be used for application-controlled locking in situations where
823    <acronym>MVCC</acronym> does not give the desired behavior.  Also,
824    most <productname>PostgreSQL</productname> commands automatically
825    acquire locks of appropriate modes to ensure that referenced
826    tables are not dropped or modified in incompatible ways while the
827    command executes.  (For example, <command>TRUNCATE</command> cannot safely be
828    executed concurrently with other operations on the same table, so it
829    obtains an <literal>ACCESS EXCLUSIVE</literal> lock on the table to
830    enforce that.)
831   </para>
832
833   <para>
834    To examine a list of the currently outstanding locks in a database
835    server, use the
836    <link linkend="view-pg-locks"><structname>pg_locks</structname></link>
837    system view. For more information on monitoring the status of the lock
838    manager subsystem, refer to <xref linkend="monitoring"/>.
839   </para>
840
841  <sect2 id="locking-tables">
842   <title>Table-Level Locks</title>
843
844   <indexterm zone="locking-tables">
845    <primary>LOCK</primary>
846   </indexterm>
847
848   <para>
849    The list below shows the available lock modes and the contexts in
850    which they are used automatically by
851    <productname>PostgreSQL</productname>.  You can also acquire any
852    of these locks explicitly with the command <xref
853    linkend="sql-lock"/>.
854    Remember that all of these lock modes are table-level locks,
855    even if the name contains the word
856    <quote>row</quote>; the names of the lock modes are historical.
857    To some extent the names reflect the typical usage of each lock
858    mode &mdash; but the semantics are all the same.  The only real difference
859    between one lock mode and another is the set of lock modes with
860    which each conflicts (see <xref linkend="table-lock-compatibility"/>).
861    Two transactions cannot hold locks of conflicting
862    modes on the same table at the same time.  (However, a transaction
863    never conflicts with itself.  For example, it might acquire
864    <literal>ACCESS EXCLUSIVE</literal> lock and later acquire
865    <literal>ACCESS SHARE</literal> lock on the same table.)  Non-conflicting
866    lock modes can be held concurrently by many transactions.  Notice in
867    particular that some lock modes are self-conflicting (for example,
868    an <literal>ACCESS EXCLUSIVE</literal> lock cannot be held by more than one
869    transaction at a time) while others are not self-conflicting (for example,
870    an <literal>ACCESS SHARE</literal> lock can be held by multiple transactions).
871   </para>
872
873     <variablelist>
874      <title>Table-Level Lock Modes</title>
875      <varlistentry>
876       <term>
877        <literal>ACCESS SHARE</literal>
878       </term>
879       <listitem>
880        <para>
881         Conflicts with the <literal>ACCESS EXCLUSIVE</literal> lock
882         mode only.
883        </para>
884
885        <para>
886         The <command>SELECT</command> command acquires a lock of this mode on
887         referenced tables.  In general, any query that only <emphasis>reads</emphasis> a table
888         and does not modify it will acquire this lock mode.
889        </para>
890       </listitem>
891      </varlistentry>
892
893      <varlistentry>
894       <term>
895        <literal>ROW SHARE</literal>
896       </term>
897       <listitem>
898        <para>
899         Conflicts with the <literal>EXCLUSIVE</literal> and
900         <literal>ACCESS EXCLUSIVE</literal> lock modes.
901        </para>
902
903        <para>
904         The <command>SELECT FOR UPDATE</command> and
905         <command>SELECT FOR SHARE</command> commands acquire a
906         lock of this mode on the target table(s) (in addition to
907         <literal>ACCESS SHARE</literal> locks on any other tables
908         that are referenced but not selected
909         <option>FOR UPDATE/FOR SHARE</option>).
910        </para>
911       </listitem>
912      </varlistentry>
913
914      <varlistentry>
915       <term>
916        <literal>ROW EXCLUSIVE</literal>
917       </term>
918       <listitem>
919        <para>
920         Conflicts with the <literal>SHARE</literal>, <literal>SHARE ROW
921         EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
922         <literal>ACCESS EXCLUSIVE</literal> lock modes.
923        </para>
924
925        <para>
926         The commands <command>UPDATE</command>,
927         <command>DELETE</command>, and <command>INSERT</command>
928         acquire this lock mode on the target table (in addition to
929         <literal>ACCESS SHARE</literal> locks on any other referenced
930         tables).  In general, this lock mode will be acquired by any
931         command that <emphasis>modifies data</emphasis> in a table.
932        </para>
933       </listitem>
934      </varlistentry>
935
936      <varlistentry>
937       <term>
938        <literal>SHARE UPDATE EXCLUSIVE</literal>
939       </term>
940       <listitem>
941        <para>
942         Conflicts with the <literal>SHARE UPDATE EXCLUSIVE</literal>,
943         <literal>SHARE</literal>, <literal>SHARE ROW
944         EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
945         <literal>ACCESS EXCLUSIVE</literal> lock modes.
946         This mode protects a table against
947         concurrent schema changes and <command>VACUUM</command> runs.
948        </para>
949
950        <para>
951         Acquired by <command>VACUUM</command> (without <option>FULL</option>),
952         <command>ANALYZE</command>, <command>CREATE INDEX CONCURRENTLY</command>,
953         <command>REINDEX CONCURRENTLY</command>,
954         <command>CREATE STATISTICS</command>, and certain <command>ALTER
955         INDEX</command> and <command>ALTER TABLE</command> variants (for full
956         details see <xref linkend="sql-alterindex"/> and <xref
957         linkend="sql-altertable"/>).
958        </para>
959       </listitem>
960      </varlistentry>
961
962      <varlistentry>
963       <term>
964        <literal>SHARE</literal>
965       </term>
966       <listitem>
967        <para>
968         Conflicts with the <literal>ROW EXCLUSIVE</literal>,
969         <literal>SHARE UPDATE EXCLUSIVE</literal>, <literal>SHARE ROW
970         EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
971         <literal>ACCESS EXCLUSIVE</literal> lock modes.
972         This mode protects a table against concurrent data changes.
973        </para>
974
975        <para>
976         Acquired by <command>CREATE INDEX</command>
977         (without <option>CONCURRENTLY</option>).
978        </para>
979       </listitem>
980      </varlistentry>
981
982      <varlistentry>
983       <term>
984        <literal>SHARE ROW EXCLUSIVE</literal>
985       </term>
986       <listitem>
987        <para>
988         Conflicts with the <literal>ROW EXCLUSIVE</literal>,
989         <literal>SHARE UPDATE EXCLUSIVE</literal>,
990         <literal>SHARE</literal>, <literal>SHARE ROW
991         EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
992         <literal>ACCESS EXCLUSIVE</literal> lock modes.
993         This mode protects a table against concurrent data changes, and
994         is self-exclusive so that only one session can hold it at a time.
995        </para>
996
997        <para>
998         Acquired by <command>CREATE TRIGGER</command> and some forms of
999         <command>ALTER TABLE</command> (see <xref linkend="sql-altertable"/>).
1000        </para>
1001       </listitem>
1002      </varlistentry>
1003
1004      <varlistentry>
1005       <term>
1006        <literal>EXCLUSIVE</literal>
1007       </term>
1008       <listitem>
1009        <para>
1010         Conflicts with the <literal>ROW SHARE</literal>, <literal>ROW
1011         EXCLUSIVE</literal>, <literal>SHARE UPDATE
1012         EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
1013         ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
1014         <literal>ACCESS EXCLUSIVE</literal> lock modes.
1015         This mode allows only concurrent <literal>ACCESS SHARE</literal> locks,
1016         i.e., only reads from the table can proceed in parallel with a
1017         transaction holding this lock mode.
1018        </para>
1019
1020        <para>
1021         Acquired by <command>REFRESH MATERIALIZED VIEW CONCURRENTLY</command>.
1022        </para>
1023       </listitem>
1024      </varlistentry>
1025
1026      <varlistentry>
1027       <term>
1028        <literal>ACCESS EXCLUSIVE</literal>
1029       </term>
1030       <listitem>
1031        <para>
1032         Conflicts with locks of all modes (<literal>ACCESS
1033         SHARE</literal>, <literal>ROW SHARE</literal>, <literal>ROW
1034         EXCLUSIVE</literal>, <literal>SHARE UPDATE
1035         EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
1036         ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
1037         <literal>ACCESS EXCLUSIVE</literal>).
1038         This mode guarantees that the
1039         holder is the only transaction accessing the table in any way.
1040        </para>
1041
1042        <para>
1043         Acquired by the <command>DROP TABLE</command>,
1044         <command>TRUNCATE</command>, <command>REINDEX</command>,
1045         <command>CLUSTER</command>, <command>VACUUM FULL</command>,
1046         and <command>REFRESH MATERIALIZED VIEW</command> (without
1047         <option>CONCURRENTLY</option>)
1048         commands. Many forms of <command>ALTER INDEX</command> and <command>ALTER TABLE</command> also acquire
1049         a lock at this level. This is also the default lock mode for
1050         <command>LOCK TABLE</command> statements that do not specify
1051         a mode explicitly.
1052        </para>
1053       </listitem>
1054      </varlistentry>
1055     </variablelist>
1056
1057     <tip>
1058      <para>
1059       Only an <literal>ACCESS EXCLUSIVE</literal> lock blocks a
1060       <command>SELECT</command> (without <option>FOR UPDATE/SHARE</option>)
1061       statement.
1062      </para>
1063     </tip>
1064
1065   <para>
1066    Once acquired, a lock is normally held until the end of the transaction.  But if a
1067    lock is acquired after establishing a savepoint, the lock is released
1068    immediately if the savepoint is rolled back to.  This is consistent with
1069    the principle that <command>ROLLBACK</command> cancels all effects of the
1070    commands since the savepoint.  The same holds for locks acquired within a
1071    <application>PL/pgSQL</application> exception block: an error escape from the block
1072    releases locks acquired within it.
1073   </para>
1074
1075
1076
1077    <table tocentry="1" id="table-lock-compatibility">
1078     <title> Conflicting Lock Modes</title>
1079     <tgroup cols="9">
1080      <colspec colnum="1" colwidth="1.25*"/>
1081      <colspec colnum="2" colwidth="1*" colname="lockst"/>
1082      <colspec colnum="3" colwidth="1*"/>
1083      <colspec colnum="4" colwidth="1*"/>
1084      <colspec colnum="5" colwidth="1*"/>
1085      <colspec colnum="6" colwidth="1*"/>
1086      <colspec colnum="7" colwidth="1*"/>
1087      <colspec colnum="8" colwidth="1*"/>
1088      <colspec colnum="9" colwidth="1*" colname="lockend"/>
1089      <spanspec spanname="lockreq" namest="lockst" nameend="lockend" align="center"/>
1090      <thead>
1091       <row>
1092        <entry morerows="1">Requested Lock Mode</entry>
1093        <entry spanname="lockreq">Existing Lock Mode</entry>
1094       </row>
1095       <row>
1096        <entry><literal>ACCESS SHARE</literal></entry>
1097        <entry><literal>ROW SHARE</literal></entry>
1098        <entry><literal>ROW EXCL.</literal></entry>
1099        <entry><literal>SHARE UPDATE EXCL.</literal></entry>
1100        <entry><literal>SHARE</literal></entry>
1101        <entry><literal>SHARE ROW EXCL.</literal></entry>
1102        <entry><literal>EXCL.</literal></entry>
1103        <entry><literal>ACCESS EXCL.</literal></entry>
1104       </row>
1105      </thead>
1106      <tbody>
1107       <row>
1108        <entry><literal>ACCESS SHARE</literal></entry>
1109        <entry align="center"></entry>
1110        <entry align="center"></entry>
1111        <entry align="center"></entry>
1112        <entry align="center"></entry>
1113        <entry align="center"></entry>
1114        <entry align="center"></entry>
1115        <entry align="center"></entry>
1116        <entry align="center">X</entry>
1117       </row>
1118       <row>
1119        <entry><literal>ROW SHARE</literal></entry>
1120        <entry align="center"></entry>
1121        <entry align="center"></entry>
1122        <entry align="center"></entry>
1123        <entry align="center"></entry>
1124        <entry align="center"></entry>
1125        <entry align="center"></entry>
1126        <entry align="center">X</entry>
1127        <entry align="center">X</entry>
1128       </row>
1129       <row>
1130        <entry><literal>ROW EXCL.</literal></entry>
1131        <entry align="center"></entry>
1132        <entry align="center"></entry>
1133        <entry align="center"></entry>
1134        <entry align="center"></entry>
1135        <entry align="center">X</entry>
1136        <entry align="center">X</entry>
1137        <entry align="center">X</entry>
1138        <entry align="center">X</entry>
1139       </row>
1140       <row>
1141        <entry><literal>SHARE UPDATE EXCL.</literal></entry>
1142        <entry align="center"></entry>
1143        <entry align="center"></entry>
1144        <entry align="center"></entry>
1145        <entry align="center">X</entry>
1146        <entry align="center">X</entry>
1147        <entry align="center">X</entry>
1148        <entry align="center">X</entry>
1149        <entry align="center">X</entry>
1150       </row>
1151       <row>
1152        <entry><literal>SHARE</literal></entry>
1153        <entry align="center"></entry>
1154        <entry align="center"></entry>
1155        <entry align="center">X</entry>
1156        <entry align="center">X</entry>
1157        <entry align="center"></entry>
1158        <entry align="center">X</entry>
1159        <entry align="center">X</entry>
1160        <entry align="center">X</entry>
1161       </row>
1162       <row>
1163        <entry><literal>SHARE ROW EXCL.</literal></entry>
1164        <entry align="center"></entry>
1165        <entry align="center"></entry>
1166        <entry align="center">X</entry>
1167        <entry align="center">X</entry>
1168        <entry align="center">X</entry>
1169        <entry align="center">X</entry>
1170        <entry align="center">X</entry>
1171        <entry align="center">X</entry>
1172       </row>
1173       <row>
1174        <entry><literal>EXCL.</literal></entry>
1175        <entry align="center"></entry>
1176        <entry align="center">X</entry>
1177        <entry align="center">X</entry>
1178        <entry align="center">X</entry>
1179        <entry align="center">X</entry>
1180        <entry align="center">X</entry>
1181        <entry align="center">X</entry>
1182        <entry align="center">X</entry>
1183       </row>
1184       <row>
1185        <entry><literal>ACCESS EXCL.</literal></entry>
1186        <entry align="center">X</entry>
1187        <entry align="center">X</entry>
1188        <entry align="center">X</entry>
1189        <entry align="center">X</entry>
1190        <entry align="center">X</entry>
1191        <entry align="center">X</entry>
1192        <entry align="center">X</entry>
1193        <entry align="center">X</entry>
1194       </row>
1195      </tbody>
1196     </tgroup>
1197    </table>
1198   </sect2>
1199
1200   <sect2 id="locking-rows">
1201    <title>Row-Level Locks</title>
1202
1203    <para>
1204     In addition to table-level locks, there are row-level locks, which
1205     are listed as below with the contexts in which they are used
1206     automatically by <productname>PostgreSQL</productname>.  See
1207     <xref linkend="row-lock-compatibility"/> for a complete table of
1208     row-level lock conflicts.  Note that a transaction can hold
1209     conflicting locks on the same row, even in different subtransactions;
1210     but other than that, two transactions can never hold conflicting locks
1211     on the same row.  Row-level locks do not affect data querying; they
1212     block only <emphasis>writers and lockers</emphasis> to the same
1213     row.  Row-level locks are released at transaction end or during
1214     savepoint rollback, just like table-level locks.
1215
1216    </para>
1217
1218     <variablelist>
1219      <title>Row-Level Lock Modes</title>
1220      <varlistentry>
1221       <term>
1222        <literal>FOR UPDATE</literal>
1223       </term>
1224       <listitem>
1225        <para>
1226         <literal>FOR UPDATE</literal> causes the rows retrieved by the
1227         <command>SELECT</command> statement to be locked as though for
1228         update.  This prevents them from being locked, modified or deleted by
1229         other transactions until the current transaction ends.  That is,
1230         other transactions that attempt <command>UPDATE</command>,
1231         <command>DELETE</command>,
1232         <command>SELECT FOR UPDATE</command>,
1233         <command>SELECT FOR NO KEY UPDATE</command>,
1234         <command>SELECT FOR SHARE</command> or
1235         <command>SELECT FOR KEY SHARE</command>
1236         of these rows will be blocked until the current transaction ends;
1237         conversely, <command>SELECT FOR UPDATE</command> will wait for a
1238         concurrent transaction that has run any of those commands on the
1239         same row,
1240         and will then lock and return the updated row (or no row, if the
1241         row was deleted).  Within a <literal>REPEATABLE READ</literal> or
1242         <literal>SERIALIZABLE</literal> transaction,
1243         however, an error will be thrown if a row to be locked has changed
1244         since the transaction started.  For further discussion see
1245         <xref linkend="applevel-consistency"/>.
1246        </para>
1247        <para>
1248         The <literal>FOR UPDATE</literal> lock mode
1249         is also acquired by any <command>DELETE</command> on a row, and also by an
1250         <command>UPDATE</command> that modifies the values of certain columns.  Currently,
1251         the set of columns considered for the <command>UPDATE</command> case are those that
1252         have a unique index on them that can be used in a foreign key (so partial
1253         indexes and expressional indexes are not considered), but this may change
1254         in the future.
1255        </para>
1256       </listitem>
1257      </varlistentry>
1258
1259      <varlistentry>
1260       <term>
1261        <literal>FOR NO KEY UPDATE</literal>
1262       </term>
1263       <listitem>
1264        <para>
1265         Behaves similarly to <literal>FOR UPDATE</literal>, except that the lock
1266         acquired is weaker: this lock will not block
1267         <literal>SELECT FOR KEY SHARE</literal> commands that attempt to acquire
1268         a lock on the same rows. This lock mode is also acquired by any
1269         <command>UPDATE</command> that does not acquire a <literal>FOR UPDATE</literal> lock.
1270        </para>
1271       </listitem>
1272      </varlistentry>
1273
1274      <varlistentry>
1275       <term>
1276        <literal>FOR SHARE</literal>
1277       </term>
1278       <listitem>
1279        <para>
1280         Behaves similarly to <literal>FOR NO KEY UPDATE</literal>, except that it
1281         acquires a shared lock rather than exclusive lock on each retrieved
1282         row.  A shared lock blocks other transactions from performing
1283         <command>UPDATE</command>, <command>DELETE</command>,
1284         <command>SELECT FOR UPDATE</command> or
1285         <command>SELECT FOR NO KEY UPDATE</command> on these rows, but it does not
1286         prevent them from performing <command>SELECT FOR SHARE</command> or
1287         <command>SELECT FOR KEY SHARE</command>.
1288        </para>
1289       </listitem>
1290      </varlistentry>
1291
1292      <varlistentry>
1293       <term>
1294        <literal>FOR KEY SHARE</literal>
1295       </term>
1296       <listitem>
1297        <para>
1298         Behaves similarly to <literal>FOR SHARE</literal>, except that the
1299         lock is weaker: <literal>SELECT FOR UPDATE</literal> is blocked, but not
1300         <literal>SELECT FOR NO KEY UPDATE</literal>.  A key-shared lock blocks
1301         other transactions from performing <command>DELETE</command> or
1302         any <command>UPDATE</command> that changes the key values, but not
1303         other <command>UPDATE</command>, and neither does it prevent
1304         <command>SELECT FOR NO KEY UPDATE</command>, <command>SELECT FOR SHARE</command>,
1305         or <command>SELECT FOR KEY SHARE</command>.
1306        </para>
1307       </listitem>
1308      </varlistentry>
1309     </variablelist>
1310
1311    <para>
1312     <productname>PostgreSQL</productname> doesn't remember any
1313     information about modified rows in memory, so there is no limit on
1314     the number of rows locked at one time.  However, locking a row
1315     might cause a disk write, e.g., <command>SELECT FOR
1316     UPDATE</command> modifies selected rows to mark them locked, and so
1317     will result in disk writes.
1318    </para>
1319
1320    <table tocentry="1" id="row-lock-compatibility">
1321     <title>Conflicting Row-Level Locks</title>
1322     <tgroup cols="5">
1323      <colspec colname="col1"    colwidth="1.5*"/>
1324      <colspec colname="lockst"  colwidth="1*"/>
1325      <colspec colname="col3"    colwidth="1*"/>
1326      <colspec colname="col4"    colwidth="1*"/>
1327      <colspec colname="lockend" colwidth="1*"/>
1328      <spanspec namest="lockst" nameend="lockend" spanname="lockreq"/>
1329      <thead>
1330       <row>
1331        <entry morerows="1">Requested Lock Mode</entry>
1332        <entry spanname="lockreq">Current Lock Mode</entry>
1333       </row>
1334       <row>
1335        <entry>FOR KEY SHARE</entry>
1336        <entry>FOR SHARE</entry>
1337        <entry>FOR NO KEY UPDATE</entry>
1338        <entry>FOR UPDATE</entry>
1339       </row>
1340      </thead>
1341      <tbody>
1342       <row>
1343        <entry>FOR KEY SHARE</entry>
1344        <entry align="center"></entry>
1345        <entry align="center"></entry>
1346        <entry align="center"></entry>
1347        <entry align="center">X</entry>
1348       </row>
1349       <row>
1350        <entry>FOR SHARE</entry>
1351        <entry align="center"></entry>
1352        <entry align="center"></entry>
1353        <entry align="center">X</entry>
1354        <entry align="center">X</entry>
1355       </row>
1356       <row>
1357        <entry>FOR NO KEY UPDATE</entry>
1358        <entry align="center"></entry>
1359        <entry align="center">X</entry>
1360        <entry align="center">X</entry>
1361        <entry align="center">X</entry>
1362       </row>
1363       <row>
1364        <entry>FOR UPDATE</entry>
1365        <entry align="center">X</entry>
1366        <entry align="center">X</entry>
1367        <entry align="center">X</entry>
1368        <entry align="center">X</entry>
1369       </row>
1370      </tbody>
1371     </tgroup>
1372    </table>
1373   </sect2>
1374
1375   <sect2 id="locking-pages">
1376    <title>Page-Level Locks</title>
1377
1378    <para>
1379     In addition to table and row locks, page-level share/exclusive locks are
1380     used to control read/write access to table pages in the shared buffer
1381     pool.  These locks are released immediately after a row is fetched or
1382     updated.  Application developers normally need not be concerned with
1383     page-level locks, but they are mentioned here for completeness.
1384    </para>
1385
1386   </sect2>
1387
1388   <sect2 id="locking-deadlocks">
1389    <title>Deadlocks</title>
1390
1391    <indexterm zone="locking-deadlocks">
1392     <primary>deadlock</primary>
1393    </indexterm>
1394
1395    <para>
1396     The use of explicit locking can increase the likelihood of
1397     <firstterm>deadlocks</firstterm>, wherein two (or more) transactions each
1398     hold locks that the other wants.  For example, if transaction 1
1399     acquires an exclusive lock on table A and then tries to acquire
1400     an exclusive lock on table B, while transaction 2 has already
1401     exclusive-locked table B and now wants an exclusive lock on table
1402     A, then neither one can proceed.
1403     <productname>PostgreSQL</productname> automatically detects
1404     deadlock situations and resolves them by aborting one of the
1405     transactions involved, allowing the other(s) to complete.
1406     (Exactly which transaction will be aborted is difficult to
1407     predict and should not be relied upon.)
1408    </para>
1409
1410    <para>
1411     Note that deadlocks can also occur as the result of row-level
1412     locks (and thus, they can occur even if explicit locking is not
1413     used). Consider the case in which two concurrent
1414     transactions modify a table. The first transaction executes:
1415
1416<screen>
1417UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 11111;
1418</screen>
1419
1420     This acquires a row-level lock on the row with the specified
1421     account number. Then, the second transaction executes:
1422
1423<screen>
1424UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222;
1425UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;
1426</screen>
1427
1428     The first <command>UPDATE</command> statement successfully
1429     acquires a row-level lock on the specified row, so it succeeds in
1430     updating that row. However, the second <command>UPDATE</command>
1431     statement finds that the row it is attempting to update has
1432     already been locked, so it waits for the transaction that
1433     acquired the lock to complete. Transaction two is now waiting on
1434     transaction one to complete before it continues execution. Now,
1435     transaction one executes:
1436
1437<screen>
1438UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
1439</screen>
1440
1441     Transaction one attempts to acquire a row-level lock on the
1442     specified row, but it cannot: transaction two already holds such
1443     a lock. So it waits for transaction two to complete. Thus,
1444     transaction one is blocked on transaction two, and transaction
1445     two is blocked on transaction one: a deadlock
1446     condition. <productname>PostgreSQL</productname> will detect this
1447     situation and abort one of the transactions.
1448    </para>
1449
1450    <para>
1451     The best defense against deadlocks is generally to avoid them by
1452     being certain that all applications using a database acquire
1453     locks on multiple objects in a consistent order. In the example
1454     above, if both transactions
1455     had updated the rows in the same order, no deadlock would have
1456     occurred. One should also ensure that the first lock acquired on
1457     an object in a transaction is the most restrictive mode that will be
1458     needed for that object.  If it is not feasible to verify this in
1459     advance, then deadlocks can be handled on-the-fly by retrying
1460     transactions that abort due to deadlocks.
1461    </para>
1462
1463    <para>
1464     So long as no deadlock situation is detected, a transaction seeking
1465     either a table-level or row-level lock will wait indefinitely for
1466     conflicting locks to be released.  This means it is a bad idea for
1467     applications to hold transactions open for long periods of time
1468     (e.g., while waiting for user input).
1469    </para>
1470   </sect2>
1471
1472   <sect2 id="advisory-locks">
1473    <title>Advisory Locks</title>
1474
1475    <indexterm zone="advisory-locks">
1476     <primary>advisory lock</primary>
1477    </indexterm>
1478
1479    <indexterm zone="advisory-locks">
1480     <primary>lock</primary>
1481     <secondary>advisory</secondary>
1482    </indexterm>
1483
1484    <para>
1485     <productname>PostgreSQL</productname> provides a means for
1486     creating locks that have application-defined meanings.  These are
1487     called <firstterm>advisory locks</firstterm>, because the system does not
1488     enforce their use &mdash; it is up to the application to use them
1489     correctly.  Advisory locks can be useful for locking strategies
1490     that are an awkward fit for the MVCC model.
1491     For example, a common use of advisory locks is to emulate pessimistic
1492     locking strategies typical of so-called <quote>flat file</quote> data
1493     management systems.
1494     While a flag stored in a table could be used for the same purpose,
1495     advisory locks are faster, avoid table bloat, and are automatically
1496     cleaned up by the server at the end of the session.
1497    </para>
1498
1499    <para>
1500     There are two ways to acquire an advisory lock in
1501     <productname>PostgreSQL</productname>: at session level or at
1502     transaction level.
1503     Once acquired at session level, an advisory lock is held until
1504     explicitly released or the session ends.  Unlike standard lock requests,
1505     session-level advisory lock requests do not honor transaction semantics:
1506     a lock acquired during a transaction that is later rolled back will still
1507     be held following the rollback, and likewise an unlock is effective even
1508     if the calling transaction fails later.  A lock can be acquired multiple
1509     times by its owning process; for each completed lock request there must
1510     be a corresponding unlock request before the lock is actually released.
1511     Transaction-level lock requests, on the other hand, behave more like
1512     regular lock requests: they are automatically released at the end of the
1513     transaction, and there is no explicit unlock operation.  This behavior
1514     is often more convenient than the session-level behavior for short-term
1515     usage of an advisory lock.
1516     Session-level and transaction-level lock requests for the same advisory
1517     lock identifier will block each other in the expected way.
1518     If a session already holds a given advisory lock, additional requests by
1519     it will always succeed, even if other sessions are awaiting the lock; this
1520     statement is true regardless of whether the existing lock hold and new
1521     request are at session level or transaction level.
1522    </para>
1523
1524    <para>
1525     Like all locks in
1526     <productname>PostgreSQL</productname>, a complete list of advisory locks
1527     currently held by any session can be found in the <link
1528     linkend="view-pg-locks"><structname>pg_locks</structname></link> system
1529     view.
1530    </para>
1531
1532    <para>
1533     Both advisory locks and regular locks are stored in a shared memory
1534     pool whose size is defined by the configuration variables
1535     <xref linkend="guc-max-locks-per-transaction"/> and
1536     <xref linkend="guc-max-connections"/>.
1537     Care must be taken not to exhaust this
1538     memory or the server will be unable to grant any locks at all.
1539     This imposes an upper limit on the number of advisory locks
1540     grantable by the server, typically in the tens to hundreds of thousands
1541     depending on how the server is configured.
1542    </para>
1543
1544    <para>
1545     In certain cases using advisory locking methods, especially in queries
1546     involving explicit ordering and <literal>LIMIT</literal> clauses, care must be
1547     taken to control the locks acquired because of the order in which SQL
1548     expressions are evaluated.  For example:
1549<screen>
1550SELECT pg_advisory_lock(id) FROM foo WHERE id = 12345; -- ok
1551SELECT pg_advisory_lock(id) FROM foo WHERE id &gt; 12345 LIMIT 100; -- danger!
1552SELECT pg_advisory_lock(q.id) FROM
1553(
1554  SELECT id FROM foo WHERE id &gt; 12345 LIMIT 100
1555) q; -- ok
1556</screen>
1557     In the above queries, the second form is dangerous because the
1558     <literal>LIMIT</literal> is not guaranteed to be applied before the locking
1559     function is executed.  This might cause some locks to be acquired
1560     that the application was not expecting, and hence would fail to release
1561     (until it ends the session).
1562     From the point of view of the application, such locks
1563     would be dangling, although still viewable in
1564     <structname>pg_locks</structname>.
1565    </para>
1566
1567    <para>
1568     The functions provided to manipulate advisory locks are described in
1569     <xref linkend="functions-advisory-locks"/>.
1570    </para>
1571   </sect2>
1572
1573  </sect1>
1574
1575  <sect1 id="applevel-consistency">
1576   <title>Data Consistency Checks at the Application Level</title>
1577
1578   <para>
1579    It is very difficult to enforce business rules regarding data integrity
1580    using Read Committed transactions because the view of the data is
1581    shifting with each statement, and even a single statement may not
1582    restrict itself to the statement's snapshot if a write conflict occurs.
1583   </para>
1584
1585   <para>
1586    While a Repeatable Read transaction has a stable view of the data
1587    throughout its execution, there is a subtle issue with using
1588    <acronym>MVCC</acronym> snapshots for data consistency checks, involving
1589    something known as <firstterm>read/write conflicts</firstterm>.
1590    If one transaction writes data and a concurrent transaction attempts
1591    to read the same data (whether before or after the write), it cannot
1592    see the work of the other transaction.  The reader then appears to have
1593    executed first regardless of which started first or which committed
1594    first.  If that is as far as it goes, there is no problem, but
1595    if the reader also writes data which is read by a concurrent transaction
1596    there is now a transaction which appears to have run before either of
1597    the previously mentioned transactions.  If the transaction which appears
1598    to have executed last actually commits first, it is very easy for a
1599    cycle to appear in a graph of the order of execution of the transactions.
1600    When such a cycle appears, integrity checks will not work correctly
1601    without some help.
1602   </para>
1603
1604   <para>
1605    As mentioned in <xref linkend="xact-serializable"/>, Serializable
1606    transactions are just Repeatable Read transactions which add
1607    nonblocking monitoring for dangerous patterns of read/write conflicts.
1608    When a pattern is detected which could cause a cycle in the apparent
1609    order of execution, one of the transactions involved is rolled back to
1610    break the cycle.
1611   </para>
1612
1613   <sect2 id="serializable-consistency">
1614    <title>Enforcing Consistency with Serializable Transactions</title>
1615
1616    <para>
1617     If the Serializable transaction isolation level is used for all writes
1618     and for all reads which need a consistent view of the data, no other
1619     effort is required to ensure consistency.  Software from other
1620     environments which is written to use serializable transactions to
1621     ensure consistency should <quote>just work</quote> in this regard in
1622     <productname>PostgreSQL</productname>.
1623    </para>
1624
1625    <para>
1626     When using this technique, it will avoid creating an unnecessary burden
1627     for application programmers if the application software goes through a
1628     framework which automatically retries transactions which are rolled
1629     back with a serialization failure.  It may be a good idea to set
1630     <literal>default_transaction_isolation</literal> to <literal>serializable</literal>.
1631     It would also be wise to take some action to ensure that no other
1632     transaction isolation level is used, either inadvertently or to
1633     subvert integrity checks, through checks of the transaction isolation
1634     level in triggers.
1635    </para>
1636
1637    <para>
1638     See <xref linkend="xact-serializable"/> for performance suggestions.
1639    </para>
1640
1641    <warning>
1642     <para>
1643      This level of integrity protection using Serializable transactions
1644      does not yet extend to hot standby mode (<xref linkend="hot-standby"/>).
1645      Because of that, those using hot standby may want to use Repeatable
1646      Read and explicit locking on the master.
1647     </para>
1648    </warning>
1649   </sect2>
1650
1651   <sect2 id="non-serializable-consistency">
1652    <title>Enforcing Consistency with Explicit Blocking Locks</title>
1653
1654    <para>
1655     When non-serializable writes are possible,
1656     to ensure the current validity of a row and protect it against
1657     concurrent updates one must use <command>SELECT FOR UPDATE</command>,
1658     <command>SELECT FOR SHARE</command>, or an appropriate <command>LOCK
1659     TABLE</command> statement.  (<command>SELECT FOR UPDATE</command>
1660     and <command>SELECT FOR SHARE</command> lock just the
1661     returned rows against concurrent updates, while <command>LOCK
1662     TABLE</command> locks the whole table.)  This should be taken into
1663     account when porting applications to
1664     <productname>PostgreSQL</productname> from other environments.
1665    </para>
1666
1667    <para>
1668     Also of note to those converting from other environments is the fact
1669     that <command>SELECT FOR UPDATE</command> does not ensure that a
1670     concurrent transaction will not update or delete a selected row.
1671     To do that in <productname>PostgreSQL</productname> you must actually
1672     update the row, even if no values need to be changed.
1673     <command>SELECT FOR UPDATE</command> <emphasis>temporarily blocks</emphasis>
1674     other transactions from acquiring the same lock or executing an
1675     <command>UPDATE</command> or <command>DELETE</command> which would
1676     affect the locked row, but once the transaction holding this lock
1677     commits or rolls back, a blocked transaction will proceed with the
1678     conflicting operation unless an actual <command>UPDATE</command> of
1679     the row was performed while the lock was held.
1680    </para>
1681
1682    <para>
1683     Global validity checks require extra thought under
1684     non-serializable <acronym>MVCC</acronym>.
1685     For example, a banking application might wish to check that the sum of
1686     all credits in one table equals the sum of debits in another table,
1687     when both tables are being actively updated.  Comparing the results of two
1688     successive <literal>SELECT sum(...)</literal> commands will not work reliably in
1689     Read Committed mode, since the second query will likely include the results
1690     of transactions not counted by the first.  Doing the two sums in a
1691     single repeatable read transaction will give an accurate picture of only the
1692     effects of transactions that committed before the repeatable read transaction
1693     started &mdash; but one might legitimately wonder whether the answer is still
1694     relevant by the time it is delivered.  If the repeatable read transaction
1695     itself applied some changes before trying to make the consistency check,
1696     the usefulness of the check becomes even more debatable, since now it
1697     includes some but not all post-transaction-start changes.  In such cases
1698     a careful person might wish to lock all tables needed for the check,
1699     in order to get an indisputable picture of current reality.  A
1700     <literal>SHARE</literal> mode (or higher) lock guarantees that there are no
1701     uncommitted changes in the locked table, other than those of the current
1702     transaction.
1703    </para>
1704
1705    <para>
1706     Note also that if one is relying on explicit locking to prevent concurrent
1707     changes, one should either use Read Committed mode, or in Repeatable Read
1708     mode be careful to obtain
1709     locks before performing queries.  A lock obtained by a
1710     repeatable read transaction guarantees that no other transactions modifying
1711     the table are still running, but if the snapshot seen by the
1712     transaction predates obtaining the lock, it might predate some now-committed
1713     changes in the table.  A repeatable read transaction's snapshot is actually
1714     frozen at the start of its first query or data-modification command
1715     (<literal>SELECT</literal>, <literal>INSERT</literal>,
1716     <literal>UPDATE</literal>, or <literal>DELETE</literal>), so
1717     it is possible to obtain locks explicitly before the snapshot is
1718     frozen.
1719    </para>
1720   </sect2>
1721  </sect1>
1722
1723  <sect1 id="mvcc-caveats">
1724   <title>Caveats</title>
1725
1726   <para>
1727    Some DDL commands, currently only <xref linkend="sql-truncate"/> and the
1728    table-rewriting forms of <xref linkend="sql-altertable"/>, are not
1729    MVCC-safe.  This means that after the truncation or rewrite commits, the
1730    table will appear empty to concurrent transactions, if they are using a
1731    snapshot taken before the DDL command committed.  This will only be an
1732    issue for a transaction that did not access the table in question
1733    before the DDL command started &mdash; any transaction that has done so
1734    would hold at least an <literal>ACCESS SHARE</literal> table lock,
1735    which would block the DDL command until that transaction completes.
1736    So these commands will not cause any apparent inconsistency in the
1737    table contents for successive queries on the target table, but they
1738    could cause visible inconsistency between the contents of the target
1739    table and other tables in the database.
1740   </para>
1741
1742   <para>
1743    Support for the Serializable transaction isolation level has not yet
1744    been added to Hot Standby replication targets (described in
1745    <xref linkend="hot-standby"/>).  The strictest isolation level currently
1746    supported in hot standby mode is Repeatable Read.  While performing all
1747    permanent database writes within Serializable transactions on the
1748    master will ensure that all standbys will eventually reach a consistent
1749    state, a Repeatable Read transaction run on the standby can sometimes
1750    see a transient state that is inconsistent with any serial execution
1751    of the transactions on the master.
1752   </para>
1753
1754   <para>
1755    Internal access to the system catalogs is not done using the isolation
1756    level of the current transaction.  This means that newly created database
1757    objects such as tables are visible to concurrent Repeatable Read and
1758    Serializable transactions, even though the rows they contain are not.  In
1759    contrast, queries that explicitly examine the system catalogs don't see
1760    rows representing concurrently created database objects, in the higher
1761    isolation levels.
1762   </para>
1763  </sect1>
1764
1765  <sect1 id="locking-indexes">
1766   <title>Locking and Indexes</title>
1767
1768   <indexterm zone="locking-indexes">
1769    <primary>index</primary>
1770    <secondary>locks</secondary>
1771   </indexterm>
1772
1773   <para>
1774    Though <productname>PostgreSQL</productname>
1775    provides nonblocking read/write access to table
1776    data, nonblocking read/write access is not currently offered for every
1777    index access method implemented
1778    in <productname>PostgreSQL</productname>.
1779    The various index types are handled as follows:
1780
1781    <variablelist>
1782     <varlistentry>
1783      <term>
1784       B-tree, <acronym>GiST</acronym> and <acronym>SP-GiST</acronym> indexes
1785      </term>
1786      <listitem>
1787       <para>
1788        Short-term share/exclusive page-level locks are used for
1789        read/write access. Locks are released immediately after each
1790        index row is fetched or inserted.  These index types provide
1791        the highest concurrency without deadlock conditions.
1792       </para>
1793      </listitem>
1794     </varlistentry>
1795
1796     <varlistentry>
1797      <term>
1798       Hash indexes
1799      </term>
1800      <listitem>
1801       <para>
1802        Share/exclusive hash-bucket-level locks are used for read/write
1803        access.  Locks are released after the whole bucket is processed.
1804        Bucket-level locks provide better concurrency than index-level
1805        ones, but deadlock is possible since the locks are held longer
1806        than one index operation.
1807       </para>
1808      </listitem>
1809     </varlistentry>
1810
1811     <varlistentry>
1812      <term>
1813       <acronym>GIN</acronym> indexes
1814      </term>
1815      <listitem>
1816       <para>
1817        Short-term share/exclusive page-level locks are used for
1818        read/write access. Locks are released immediately after each
1819        index row is fetched or inserted. But note that insertion of a
1820        GIN-indexed value usually produces several index key insertions
1821        per row, so GIN might do substantial work for a single value's
1822        insertion.
1823       </para>
1824      </listitem>
1825     </varlistentry>
1826    </variablelist>
1827   </para>
1828
1829   <para>
1830    Currently, B-tree indexes offer the best performance for concurrent
1831    applications; since they also have more features than hash
1832    indexes, they are the recommended index type for concurrent
1833    applications that need to index scalar data. When dealing with
1834    non-scalar data, B-trees are not useful, and GiST, SP-GiST or GIN
1835    indexes should be used instead.
1836   </para>
1837  </sect1>
1838 </chapter>
1839