1============================
2Transactions and concurrency
3============================
4
5.. contents::
6
7`Transactions <https://en.wikipedia.org/wiki/Database_transaction>`_
8are a core feature of ZODB.  Much has been written about transactions,
9and we won't go into much detail here.  Transactions provide two core
10benefits:
11
12Atomicity
13  When a transaction executes, it succeeds or fails completely. If
14  some data are updated and then an error occurs, causing the
15  transaction to fail, the updates are rolled back automatically. The
16  application using the transactional system doesn't have to undo
17  partial changes.  This takes a significant burden from developers
18  and increases the reliability of applications.
19
20Concurrency
21  Transactions provide a way of managing concurrent updates to data.
22  Different programs operate on the data independently, without having
23  to use low-level techniques to moderate their access. Coordination
24  and synchronization happen via transactions.
25
26
27.. _using-transactions-label:
28
29Using transactions
30==================
31
32All activity in ZODB happens in the context of database connections
33and transactions.  Here's a simple example::
34
35  import ZODB, transaction
36  db = ZODB.DB(None) # Use a mapping storage
37  conn = db.open()
38
39  conn.root.x = 1
40  transaction.commit()
41
42.. -> src
43
44   >>> exec(src)
45
46In the example above, we used ``transaction.commit()`` to commit a
47transaction, making the change to ``conn.root`` permanent.  This is
48the most common way to use ZODB, at least historically.
49
50If we decide we don't want to commit a transaction, we can use
51``abort``::
52
53  conn.root.x = 2
54  transaction.abort() # conn.root.x goes back to 1
55
56.. -> src
57
58   >>> exec(src)
59   >>> conn.root.x
60   1
61   >>> conn.close()
62
63In this example, because we aborted the transaction, the value of
64``conn.root.x`` was rolled back to 1.
65
66There are a number of things going on here that deserve some
67explanation.  When using transactions, there are three kinds of
68objects involved:
69
70Transaction
71   Transactions represent units of work.  Each transaction has a beginning and
72   an end. Transactions provide the
73   :interface:`~transaction.interfaces.ITransaction` interface.
74
75Transaction manager
76   Transaction managers create transactions and
77   provide APIs to start and end transactions.  The transactions
78   managed are always sequential. There is always exactly one active
79   transaction associated with a transaction manager at any point in
80   time. Transaction managers provide the
81   :interface:`~transaction.interfaces.ITransactionManager` interface.
82
83Data manager
84   Data managers manage data associated with transactions.  ZODB
85   connections are data managers.  The details of how they interact
86   with transactions aren't important here.
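
As a rough illustration of how these three kinds of objects relate,
the sketch below inspects the thread-local transaction manager and its
current transaction.  The variable names are ours, and a throwaway
in-memory database is assumed::

  import ZODB, transaction
  from transaction.interfaces import ITransaction, ITransactionManager

  db = ZODB.DB(None)               # in-memory mapping storage
  conn = db.open()                 # a data manager

  manager = transaction.manager    # the thread-local transaction manager
  current = manager.get()          # the manager's one active transaction

  is_manager = ITransactionManager.providedBy(manager)
  is_transaction = ITransaction.providedBy(current)
  same_transaction = manager.get() is current   # still the same unit of work

  manager.abort()                  # end the transaction; nothing was changed
  new_transaction = manager.get() is not current
  conn.close()
  db.close()

After the abort, asking the manager for the current transaction yields
a new transaction object, which reflects the sequential nature of the
transactions a manager handles.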
87
88Explicit transaction managers
89-----------------------------
90
91ZODB connections have transaction managers associated with them when
92they're opened. When we call the database :meth:`~ZODB.DB.open` method
93without an argument, a thread-local transaction manager is used. Each
94thread has its own transaction manager.  When we called
95``transaction.commit()`` above we were calling commit on the
96thread-local transaction manager.
97
98Because we used a thread-local transaction manager, all of the work in
99the transaction needs to happen in the same thread.  Similarly, only
100one transaction can be active in a thread.
101
102If we want to run multiple simultaneous transactions in a single
103thread, or if we want to spread the work of a transaction over
104multiple threads [#bad-idea-using-multiple-threads-per-transaction]_,
105then we can create transaction managers ourselves and pass them to
106:meth:`~ZODB.DB.open`::
107
108  my_transaction_manager = transaction.TransactionManager()
109  conn = db.open(my_transaction_manager)
110  conn.root.x = 2
111  my_transaction_manager.commit()
112
113.. -> src
114
115   >>> exec(src)
116
117In this example, to commit our work, we called ``commit()`` on the
118transaction manager we created and passed to :meth:`~ZODB.DB.open`.
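
To make the idea of several simultaneous transactions in one thread
more concrete, here is a minimal sketch that gives each of two
connections its own transaction manager.  The names ``tm1``, ``tm2``,
``conn1`` and ``conn2`` are ours, and a separate in-memory database is
assumed::

  import ZODB, transaction

  db = ZODB.DB(None)  # in-memory mapping storage

  # One transaction manager, and so one independent transaction,
  # per connection.
  tm1 = transaction.TransactionManager()
  tm2 = transaction.TransactionManager()
  conn1 = db.open(tm1)
  conn2 = db.open(tm2)

  conn1.root.x = 1
  tm1.commit()                 # commits conn1's work only

  tm2.begin()                  # start a fresh transaction for conn2
  conn2.root.x = 2             # a change made in conn2's transaction...
  tm2.abort()                  # ...and discarded without touching conn1

  value_seen_by_conn2 = conn2.root.x   # 1, the value committed via tm1

Committing or aborting through one manager leaves the transaction of
the other manager alone.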
119
120Context managers
121----------------
122
123In the examples above, the transaction beginnings were
124implicit. Transactions were effectively
125[#implicit-transaction-creation]_ created when the transaction
126managers were created and when previous transactions were committed.
127We can create transactions explicitly using
128:meth:`~transaction.interfaces.ITransactionManager.begin`::
129
130  my_transaction_manager.begin()
131
132.. -> src
133
134   >>> exec(src)
135
136A more modern [#context-managers-are-new]_ way to manage transaction
137boundaries is to use context managers and the Python ``with``
138statement. Transaction managers are context managers, so we can use
139them with the ``with`` statement directly::
140
141  with my_transaction_manager as trans:
142     trans.note(u"incrementing x")
143     conn.root.x += 1
144
145.. -> src
146
147   >>> exec(src)
148   >>> conn.root.x
149   3
150
151
When used as a context manager, a transaction manager explicitly
begins a new transaction and executes the code block.  It then commits
the transaction if the block completes without an error, or aborts it
if the block raises one.
156
157We used ``as trans`` above to get the transaction.
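
The abort-on-error behavior can be seen in a small sketch.  It assumes
a fresh in-memory database, and the ``ValueError`` is a made-up
failure used only for illustration::

  import ZODB, transaction

  db = ZODB.DB(None)  # in-memory mapping storage
  tm = transaction.TransactionManager()
  conn = db.open(tm)

  with tm:
      conn.root.x = 1          # the block completes, so this is committed

  try:
      with tm:
          conn.root.x = 2
          raise ValueError("simulated failure")   # hypothetical error
  except ValueError:
      pass                     # the transaction was aborted for us

  x_after_failure = conn.root.x   # 1: the second change was rolled back

Note that the exception still propagates out of the ``with``
statement; the transaction manager only takes care of the abort.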
158
159Databases provide the :meth:`~ZODB.DB.transaction` method to execute a code
160block as a transaction::
161
162  with db.transaction() as conn2:
163     conn2.root.x += 1
164
165.. -> src
166
167   >>> exec(src)
168
This opens a connection, assigns it its own transaction manager, and
executes the nested code in a transaction.  We used ``as conn2`` to
get the connection.  The transaction boundaries are defined by the
``with`` statement.
173
174Getting a connection's transaction manager
175------------------------------------------
176
The previous example gave us a connection, but no direct reference to
the current transaction.  Every connection has an associated
transaction manager, which is available as the ``transaction_manager``
attribute.  We can use it, for example, to set a transaction note::
181
182
183  with db.transaction() as conn2:
184     conn2.transaction_manager.get().note(u"incrementing x again")
185     conn2.root.x += 1
186
187.. -> src
188
189   >>> exec(src)
190   >>> (db.history(conn.root()._p_oid)[0]['description'] ==
191   ...  u'incrementing x again')
192   True
193
194Here, we used the
195:meth:`~transaction.interfaces.ITransactionManager.get` method to get
196the current transaction.
197
198Connection isolation
199--------------------
200
In the last few examples, we used a connection opened using
:meth:`~ZODB.DB.transaction`.  That connection was distinct from the
original connection and used a different transaction manager.  If we
looked at the original connection, ``conn``, we'd see that it still
has the value for ``x`` that we set earlier:
206
207  >>> conn.root.x
208  3
209
210This is because it's still in the same transaction that was begun when
211a change was last committed against it.  If we want to see changes, we
212have to begin a new transaction:
213
214  >>> trans = my_transaction_manager.begin()
215  >>> conn.root.x
216  5
217
218ZODB uses a timestamp-based commit protocol that provides `snapshot
219isolation <https://en.wikipedia.org/wiki/Snapshot_isolation>`_.
220Whenever we look at ZODB data, we see its state as of the time the
221transaction began.
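
The following self-contained sketch replays this behavior against a
separate in-memory database.  The names ``before`` and ``after`` are
ours::

  import ZODB, transaction

  db = ZODB.DB(None)  # in-memory mapping storage
  tm = transaction.TransactionManager()
  conn = db.open(tm)
  with tm:
      conn.root.x = 1

  # A second connection, with its own transaction manager, commits
  # a change while conn's transaction is still in progress.
  with db.transaction() as other:
      other.root.x = 2

  before = conn.root.x   # 1: conn still sees its snapshot
  tm.begin()             # a new transaction gets a new snapshot
  after = conn.root.x    # 2: the other connection's change is visible
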
222
223.. _conflicts-label:
224
225Conflict errors
226---------------
227
228As mentioned in the previous section, each connection sees and
229operates on a view of the database as of the transaction start time.
230If two connections modify the same object at the same time, one of the
231connections will get a conflict error when it tries to commit::
232
233  with db.transaction() as conn2:
234     conn2.root.x += 1
235
236  conn.root.x = 9
237  my_transaction_manager.commit() # will raise a conflict error
238
239.. -> src
240
241    >>> exec(src) # doctest: +ELLIPSIS
242    Traceback (most recent call last):
243    ...
244    ZODB.POSException.ConflictError: ...
245
246If we executed this code, we'd get a ``ConflictError`` exception on the
247last line.  After a conflict error is raised, we'd need to abort the
248transaction, or begin a new one, at which point we'd see the data as
249written by the other connection:
250
251    >>> my_transaction_manager.abort()
252    >>> conn.root.x
253    6
254
The timestamp-based approach used by ZODB is referred to as an
*optimistic* approach, because it assumes that conflicts are uncommon
and detects them only at commit time.  It works best if there are no
conflicts.
258
259The best way to avoid conflicts is to design your application so that
260multiple connections don't update the same object at the same time.
261This isn't always easy.
262
263Sometimes you may need to queue some operations that update shared
264data structures, like indexes, so the updates can be made by a
265dedicated thread or process, without making simultaneous updates.
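
One possible shape for such a queue is sketched below, using the
standard library's ``queue`` and ``threading`` modules.  The
``writer`` function, the ``STOP`` sentinel and the ``total`` key are
all hypothetical names chosen for this example::

  import queue
  import threading
  import ZODB

  db = ZODB.DB(None)  # in-memory mapping storage
  pending = queue.Queue()
  STOP = object()     # sentinel telling the writer to exit

  def writer():
      # The only code that modifies the shared counter, so its
      # transactions never conflict with one another.
      while True:
          amount = pending.get()
          if amount is STOP:
              break
          with db.transaction() as conn:
              root = conn.root()
              root['total'] = root.get('total', 0) + amount

  thread = threading.Thread(target=writer)
  thread.start()

  # Other threads enqueue work instead of updating the counter directly.
  for amount in (1, 2, 3):
      pending.put(amount)
  pending.put(STOP)
  thread.join()

  with db.transaction() as conn:
      total = conn.root()['total']   # 6

The trade-off is that updates become asynchronous: callers that
enqueue work don't see the result until the writer has committed it.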
266
267Retrying transactions
268~~~~~~~~~~~~~~~~~~~~~
269
270The most common way to deal with conflict errors is to catch them and
271retry transactions.  To do this manually involves code that looks
272something like this::
273
  import transaction

  max_attempts = 3
  attempts = 0
  while True:
      try:
          with transaction.manager:
              ...  # code that updates a database
      except transaction.interfaces.TransientError:
          attempts += 1
          if attempts == max_attempts:
              raise  # give up after max_attempts failed tries
      else:
          break
286
In the example above, we used ``transaction.manager`` to refer to the
thread-local transaction manager, which we then used with the
``with`` statement.  When a conflict error occurs, the transaction
must be aborted before retrying the update. Using the transaction
manager as a context manager in the ``with`` statement takes care of this
for us.
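
If the manual loop is needed in several places, it can be wrapped in a
small helper.  The sketch below is one way to do that;
``run_with_retries`` and ``flaky_update`` are hypothetical names, and
the failure is simulated rather than a real conflict::

  import transaction
  from transaction.interfaces import TransientError

  def run_with_retries(work, max_attempts=3):
      """Call work() in a transaction, retrying on transient errors."""
      for attempt in range(1, max_attempts + 1):
          try:
              with transaction.manager:
                  return work()   # committed when the block exits
          except TransientError:
              if attempt == max_attempts:
                  raise           # out of attempts, let the caller know

  calls = []

  def flaky_update():
      # Stand-in for real database work: fails twice, then succeeds.
      calls.append(1)
      if len(calls) < 3:
          raise TransientError("simulated conflict")
      return "done"

  result = run_with_retries(flaky_update)   # "done", after three calls
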
293
294The example above is rather tedious.  There are a number of tools to
295automate transaction retry.  The `transaction
296<http://zodb.readthedocs.io/en/latest/transactions.html#retrying-transactions>`_
297package provides a context-manager-based mechanism for retrying
298transactions::
299
  for attempt in transaction.manager.attempts():
      with attempt:
          ...  # code that updates a database
303
This is shorter and simpler [#but-obscure]_.
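
To see the retry loop in action without a real conflict, the sketch
below raises a ``TransientError`` by hand on the first try.  It
assumes a reasonably recent release of the ``transaction`` package, in
which the loop stops after the first successful attempt::

  import transaction
  from transaction.interfaces import TransientError

  tries = []
  for attempt in transaction.manager.attempts(3):
      with attempt:
          tries.append(1)
          if len(tries) == 1:
              # Hypothetical failure standing in for a real conflict.
              raise TransientError("simulated conflict")

  number_of_tries = len(tries)   # the first try fails, the second succeeds
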
305
For Python web frameworks, there are WSGI [#wtf-wsgi]_ middleware
components, such as `repoze.tm2
<https://pypi.org/project/repoze.tm2/>`_, that align transaction
boundaries with HTTP requests and retry transactions when there are
transient errors.
311
312For applications like queue workers or `cron jobs
313<https://en.wikipedia.org/wiki/Cron>`_, conflicts can sometimes be
314allowed to fail, letting other queue workers or subsequent cron-job
315runs retry the work.
316
317Conflict resolution
318~~~~~~~~~~~~~~~~~~~
319
320ZODB provides a conflict-resolution framework for merging conflicting
321changes.  When conflicts occur, conflict resolution is used, when
322possible, to resolve the conflicts without raising a ConflictError to
323the application.
324
Commonly used objects that implement conflict resolution are
buckets and ``Length`` objects provided by the `BTrees
<https://pythonhosted.org/BTrees/>`_ package.

The main data structures provided by the BTrees package, BTrees and
TreeSets, spread their data over multiple objects.  The leaf-level
objects, called *buckets*, allow distinct keys to be updated without
causing conflicts [#usually-avoids-conflicts]_.
333
334``Length`` objects are conflict-free counters that merge changes by
335simply accumulating changes.
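
A minimal sketch of two connections changing the same ``Length``
object concurrently follows.  It assumes a file storage in a temporary
directory, since conflict resolution depends on storage support and
the in-memory mapping storage used elsewhere in this document may not
provide it.  The ``hits`` name is ours::

  import os
  import tempfile
  import ZODB, transaction
  from BTrees.Length import Length

  with tempfile.TemporaryDirectory() as directory:
      # Assumption: a file storage, which supports conflict resolution.
      db = ZODB.DB(os.path.join(directory, "data.fs"))
      tm = transaction.TransactionManager()
      conn = db.open(tm)
      with tm:
          conn.root.hits = Length()

      with db.transaction() as other:
          other.root.hits.change(1)    # committed first

      conn.root.hits.change(2)         # made against the older state
      tm.commit()                      # merged instead of raising

      with db.transaction() as reader:
          hits = reader.root.hits()    # 3: both changes were accumulated
      conn.close()
      db.close()

Had ``hits`` been a plain integer stored on the root object, the
second commit would have been expected to fail with a conflict error
instead.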
336
337.. caution::
338   Conflict resolution weakens consistency.  Resist the temptation to
339   try to implement conflict resolution yourself.  In the future, ZODB
340   will provide greater control over conflict resolution, including
341   the option of disabling it.
342
343   It's generally best to avoid conflicts in the first place, if possible.
344
345ZODB and atomicity
346==================
347
ZODB provides atomic transactions. When using ZODB, it's important to
align work with transactions.  Once a transaction is committed, it
can't be rolled back [#undo]_ automatically.  For applications, this
implies that work that should be atomic shouldn't be split over
multiple transactions.  This may seem somewhat obvious, but the rule
can be broken in non-obvious ways. For example, a web API that splits
logical operations over multiple web requests, as is often done in
`REST
<https://en.wikipedia.org/wiki/Representational_state_transfer>`_
APIs, violates this rule.
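
As a small illustration of keeping related updates in one transaction,
consider the sketch below.  The ``transfer`` function and the
``checking`` and ``savings`` names are hypothetical::

  import ZODB

  db = ZODB.DB(None)  # in-memory mapping storage
  with db.transaction() as conn:
      conn.root.checking = 100
      conn.root.savings = 0

  def transfer(amount):
      # Both updates happen in one transaction, so either both are
      # committed or neither is.
      with db.transaction() as conn:
          conn.root.checking -= amount
          if conn.root.checking < 0:
              raise ValueError("insufficient funds")
          conn.root.savings += amount

  transfer(30)
  try:
      transfer(500)   # fails after the first update; nothing is kept
  except ValueError:
      pass

  with db.transaction() as conn:
      balances = (conn.root.checking, conn.root.savings)   # (70, 30)

If the debit and the credit were committed in separate transactions, a
failure between them would leave the data inconsistent, with no
automatic way to roll back the first commit.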
358
359Partial transaction error recovery using savepoints
360---------------------------------------------------
361
362A transaction can be split into multiple steps that can be rolled back
363individually.  This is done by creating savepoints.  Changes in a
364savepoint can be rolled back without rolling back an entire
365transaction::
366
367  import ZODB
368  db = ZODB.DB(None) # using a mapping storage
369  with db.transaction() as conn:
370      conn.root.x = 1
371      conn.root.y = 0
372      savepoint = conn.transaction_manager.savepoint()
373      conn.root.y = 2
374      savepoint.rollback()
375
376  with db.transaction() as conn:
377      print([conn.root.x, conn.root.y]) # prints 1 0
378
379.. -> src
380
381   >>> exec(src)
382   [1, 0]
383
384If we executed this code, it would print 1 and 0, because while the
385initial changes were committed, the changes in the savepoint were
386rolled back.
387
388A secondary benefit of savepoints is that they save any changes made
389before the savepoint to a file, so that memory of changed objects can
390be freed if they aren't used later in the transaction.
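
Savepoints are handy for recovering from an error in one step of a
longer transaction.  The sketch below processes a list of made-up
input records and rolls back only the step that fails; the record
values and the ``seen`` and ``total`` keys are hypothetical::

  import ZODB

  db = ZODB.DB(None)  # in-memory mapping storage
  with db.transaction() as conn:
      root = conn.root()
      root['seen'] = 0
      root['total'] = 0
      for raw in ["1", "oops", "3"]:       # hypothetical input records
          savepoint = conn.transaction_manager.savepoint()
          try:
              root['seen'] += 1
              root['total'] += int(raw)    # raises ValueError for "oops"
          except ValueError:
              savepoint.rollback()         # undo this record's partial update

  with db.transaction() as conn:
      result = (conn.root()['seen'], conn.root()['total'])   # (2, 4)

The bad record's increment of ``seen`` is undone by the rollback,
while the work done for the other records is kept and committed.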
391
392Concurrency, threads and processes
393==================================
394
395ZODB supports concurrency through transactions.  Multiple programs
396[#wtf-program]_ can operate independently in separate transactions.
397They synchronize at transaction boundaries.
398
399The most common way to run ZODB is with each program running in its
400own thread.  Usually the thread-local transaction manager is used.
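
A minimal sketch of this arrangement follows.  Each worker opens its
own connection, which uses that thread's transaction manager.  Because
every worker in this example updates the root object, commits can
conflict, so the work is wrapped in a retry loop.  The ``worker``
function and the key names are hypothetical::

  import threading
  import ZODB, transaction

  db = ZODB.DB(None)  # in-memory mapping storage

  def worker(name):
      # db.open() without arguments uses this thread's own
      # transaction manager.
      conn = db.open()
      try:
          # All workers update the root object, so a commit may
          # conflict; retry a few times.
          for attempt in transaction.manager.attempts(5):
              with attempt:
                  conn.root()[name] = threading.get_ident()
      finally:
          conn.close()

  threads = [threading.Thread(target=worker, args=("worker-%d" % i,))
             for i in range(3)]
  for thread in threads:
      thread.start()
  for thread in threads:
      thread.join()

  with db.transaction() as conn:
      names = sorted(conn.root().keys())

In a real application, the threads would ideally update different
objects, so that retries are rarely needed.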
401
You can use multiple threads per transaction and you can run multiple
transactions in a single thread. To do this, you need to instantiate
and use your own transaction manager, as described in `Explicit
transaction managers`_.  To run multiple transactions simultaneously
in a thread, you need to use a separate transaction manager for each
transaction.
408
409To spread a transaction over multiple threads, you need to keep in
410mind that database connections, transaction managers and transactions
411are **not thread-safe**.  You have to prevent simultaneous access from
412multiple threads.  For this reason, **using multiple threads with a
413single transaction is not recommended**, but it is possible with care.
414
415Using multiple processes
416------------------------
417
418Using multiple Python processes is a good way to scale an application
419horizontally, especially given Python's `global interpreter lock
420<https://wiki.python.org/moin/GlobalInterpreterLock>`_.
421
422Some things to keep in mind when utilizing multiple processes:
423
424- If using the :mod:`multiprocessing` module, you can't
425  [#cant-share-now]_ share databases or connections between
426  processes. When you launch a subprocess, you'll need to
427  re-instantiate your storage and database.
428
429- You'll need to use a storage such as `ZEO
430  <https://github.com/zopefoundation/ZEO>`_, `RelStorage
431  <http://relstorage.readthedocs.io/en/latest/>`_, or `NEO
432  <http://www.neoppod.org/>`_, that supports multiple processes.  None
433  of the included storages do.
434
435.. [#but-obscure] But also a bit obscure.  The Python context-manager
436   mechanism isn't a great fit for the transaction-retry use case.
437
438.. [#wtf-wsgi] `Web Server Gateway Interface
439   <http://wsgi.readthedocs.io/en/latest/>`_
440
441.. [#usually-avoids-conflicts] Conflicts can still occur when buckets
442   split due to added objects causing them to exceed their maximum size.
443
444.. [#undo] Transactions can't be rolled back, but they may be undone
445   in some cases, especially if subsequent transactions
446   haven't modified the same objects.
447
.. [#bad-idea-using-multiple-threads-per-transaction] While it's
   possible to spread transaction work over multiple threads, **it's
   not a good idea**. See `Concurrency, threads and processes`_.
451
452.. [#implicit-transaction-creation] Transactions are implicitly
453   created when needed, such as when data are first modified.
454
455.. [#context-managers-are-new] ZODB and the transaction package
456   predate context managers and the Python ``with`` statement.
457
458.. [#wtf-program] We're using *program* here in a fairly general
459   sense, meaning some logic that we want to run to
460   perform some function, as opposed to an operating system program.
461
.. [#cant-share-now] At least not now.
463