============================
Transactions and concurrency
============================

.. contents::

`Transactions <https://en.wikipedia.org/wiki/Database_transaction>`_
are a core feature of ZODB. Much has been written about transactions,
and we won't go into much detail here. Transactions provide two core
benefits:

Atomicity
  When a transaction executes, it succeeds or fails completely. If
  some data are updated and then an error occurs, causing the
  transaction to fail, the updates are rolled back automatically. The
  application using the transactional system doesn't have to undo
  partial changes. This takes a significant burden from developers
  and increases the reliability of applications.

Concurrency
  Transactions provide a way of managing concurrent updates to data.
  Different programs operate on the data independently, without having
  to use low-level techniques to moderate their access. Coordination
  and synchronization happen via transactions.


.. _using-transactions-label:

Using transactions
==================

All activity in ZODB happens in the context of database connections
and transactions. Here's a simple example::

  import ZODB, transaction
  db = ZODB.DB(None) # Use a mapping storage
  conn = db.open()

  conn.root.x = 1
  transaction.commit()

.. -> src

    >>> exec(src)

In the example above, we used ``transaction.commit()`` to commit a
transaction, making the change to ``conn.root`` permanent. This is
the most common way to use ZODB, at least historically.

If we decide we don't want to commit a transaction, we can use
``abort``::

  conn.root.x = 2
  transaction.abort() # conn.root.x goes back to 1

.. -> src

    >>> exec(src)
    >>> conn.root.x
    1
    >>> conn.close()

In this example, because we aborted the transaction, the value of
``conn.root.x`` was rolled back to 1.

There are a number of things going on here that deserve some
explanation. When using transactions, there are three kinds of
objects involved:

Transaction
   Transactions represent units of work. Each transaction has a beginning and
   an end. Transactions provide the
   :interface:`~transaction.interfaces.ITransaction` interface.

Transaction manager
   Transaction managers create transactions and
   provide APIs to start and end transactions. The transactions
   managed are always sequential. There is always exactly one active
   transaction associated with a transaction manager at any point in
   time. Transaction managers provide the
   :interface:`~transaction.interfaces.ITransactionManager` interface.

Data manager
   Data managers manage data associated with transactions. ZODB
   connections are data managers. The details of how they interact
   with transactions aren't important here.

Explicit transaction managers
-----------------------------

ZODB connections have transaction managers associated with them when
they're opened. When we call the database :meth:`~ZODB.DB.open` method
without an argument, a thread-local transaction manager is used. Each
thread has its own transaction manager. When we called
``transaction.commit()`` above we were calling commit on the
thread-local transaction manager.

Because we used a thread-local transaction manager, all of the work in
the transaction needs to happen in the same thread. Similarly, only
one transaction can be active in a thread.
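
The thread-local behavior described above can be pictured with a small
standard-library sketch. ``ToyManager`` is a hypothetical stand-in
invented for illustration; it is not the ``transaction`` package's
implementation. It only shows how a single shared name can give every
thread its own independent state:

```python
import threading


class ToyManager(threading.local):
    """Hypothetical stand-in for a thread-local transaction manager."""

    def __init__(self):
        # threading.local runs __init__ once in every thread that
        # touches the instance, so each thread gets its own list.
        self.pending = []

    def record(self, change):
        self.pending.append(change)


manager = ToyManager()  # one module-level name, shared by all threads
seen_by_worker = []


def worker():
    # The worker starts with an empty list; it never sees main's change.
    manager.record("worker change")
    seen_by_worker.extend(manager.pending)


manager.record("main change")
thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(manager.pending)   # ['main change']
print(seen_by_worker)    # ['worker change']
```

Because each thread sees only its own pending changes, work started in
one thread cannot be finished from another through the same name.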

If we want to run multiple simultaneous transactions in a single
thread, or if we want to spread the work of a transaction over
multiple threads [#bad-idea-using-multiple-threads-per-transaction]_,
then we can create transaction managers ourselves and pass them to
:meth:`~ZODB.DB.open`::

  my_transaction_manager = transaction.TransactionManager()
  conn = db.open(my_transaction_manager)
  conn.root.x = 2
  my_transaction_manager.commit()

.. -> src

    >>> exec(src)

In this example, to commit our work, we called ``commit()`` on the
transaction manager we created and passed to :meth:`~ZODB.DB.open`.

Context managers
----------------

In the examples above, the transaction beginnings were
implicit. Transactions were effectively
[#implicit-transaction-creation]_ created when the transaction
managers were created and when previous transactions were committed.
We can create transactions explicitly using
:meth:`~transaction.interfaces.ITransactionManager.begin`::

  my_transaction_manager.begin()

.. -> src

    >>> exec(src)

A more modern [#context-managers-are-new]_ way to manage transaction
boundaries is to use context managers and the Python ``with``
statement. Transaction managers are context managers, so we can use
them with the ``with`` statement directly::

  with my_transaction_manager as trans:
      trans.note(u"incrementing x")
      conn.root.x += 1

.. -> src

    >>> exec(src)
    >>> conn.root.x
    3


When used as a context manager, a transaction manager explicitly
begins a new transaction and executes the code block. It then commits
the transaction if the block finished without an error, or aborts it
if the block raised one.

We used ``as trans`` above to get the transaction.
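
The begin, commit-or-abort sequence performed by the ``with``
statement can be sketched with a toy class. ``ToyTransactionManager``
is a hypothetical stand-in that only records which calls were made; it
is an illustration of the context-manager protocol, not the
``transaction`` package's code:

```python
class ToyTransactionManager:
    """Hypothetical stand-in that only records which calls were made."""

    def __init__(self):
        self.calls = []

    def begin(self):
        self.calls.append("begin")

    def commit(self):
        self.calls.append("commit")

    def abort(self):
        self.calls.append("abort")

    def __enter__(self):
        # Entering the block always starts a fresh transaction.
        self.begin()

    def __exit__(self, exc_type, exc_value, traceback):
        # Commit when the block finished normally, abort otherwise.
        if exc_type is None:
            self.commit()
        else:
            self.abort()
        return False  # never swallow the exception


manager = ToyTransactionManager()

with manager:
    pass  # work that succeeds

try:
    with manager:
        raise ValueError("work that fails")
except ValueError:
    pass

print(manager.calls)  # ['begin', 'commit', 'begin', 'abort']
```

Note that the exception still propagates to the caller after the
abort, so the application decides what to do about the failure.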

Databases provide the :meth:`~ZODB.DB.transaction` method to execute a code
block as a transaction::

  with db.transaction() as conn2:
      conn2.root.x += 1

.. -> src

    >>> exec(src)

This opens a connection, assigns it its own transaction manager, and
executes the nested code in a transaction. We used ``as conn2`` to
get the connection. The transaction boundaries are defined by the
``with`` statement.

Getting a connection's transaction manager
------------------------------------------

In the previous example, you may have wondered how one might get the
current transaction. Every connection has an associated transaction
manager, which is available as the ``transaction_manager`` attribute.
So, for example, if we wanted to set a transaction note::

  with db.transaction() as conn2:
      conn2.transaction_manager.get().note(u"incrementing x again")
      conn2.root.x += 1

.. -> src

    >>> exec(src)
    >>> (db.history(conn.root()._p_oid)[0]['description'] ==
    ...  u'incrementing x again')
    True

Here, we used the
:meth:`~transaction.interfaces.ITransactionManager.get` method to get
the current transaction.

Connection isolation
--------------------

In the last few examples, we used a connection opened using
:meth:`~ZODB.DB.transaction`. That connection was distinct from the
original connection and used a different transaction manager. If we
looked at the original connection, ``conn``, we'd see that it has the
same value for ``x`` that we set earlier:

    >>> conn.root.x
    3

This is because it's still in the same transaction that was begun when
a change was last committed against it.
If we want to see changes, we
have to begin a new transaction:

    >>> trans = my_transaction_manager.begin()
    >>> conn.root.x
    5

ZODB uses a timestamp-based commit protocol that provides `snapshot
isolation <https://en.wikipedia.org/wiki/Snapshot_isolation>`_.
Whenever we look at ZODB data, we see its state as of the time the
transaction began.

.. _conflicts-label:

Conflict errors
---------------

As mentioned in the previous section, each connection sees and
operates on a view of the database as of the transaction start time.
If two connections modify the same object at the same time, one of the
connections will get a conflict error when it tries to commit::

  with db.transaction() as conn2:
      conn2.root.x += 1

  conn.root.x = 9
  my_transaction_manager.commit() # will raise a conflict error

.. -> src

    >>> exec(src) # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ZODB.POSException.ConflictError: ...

If we executed this code, we'd get a ``ConflictError`` exception on the
last line. After a conflict error is raised, we'd need to abort the
transaction, or begin a new one, at which point we'd see the data as
written by the other connection:

    >>> my_transaction_manager.abort()
    >>> conn.root.x
    6

The timestamp-based approach used by ZODB is referred to as an
*optimistic* approach, because it works best if there are no
conflicts.

The best way to avoid conflicts is to design your application so that
multiple connections don't update the same object at the same time.
This isn't always easy.

Sometimes you may need to queue some operations that update shared
data structures, like indexes, so the updates can be made by a
dedicated thread or process, without making simultaneous updates.
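
The queueing idea can be sketched with the standard library alone. In
this minimal sketch the shared "index" is a plain dictionary, and the
names ``writer``, ``producer`` and ``STOP`` are hypothetical; in a real
ZODB application the dedicated writer would presumably apply each
update to persistent objects and commit it in its own transaction:

```python
import queue
import threading

index = {}               # shared structure; only the writer touches it
updates = queue.Queue()  # thread-safe hand-off to the writer
STOP = object()          # sentinel telling the writer to finish


def writer():
    # The only code that modifies ``index``, so updates never overlap.
    while True:
        item = updates.get()
        if item is STOP:
            break
        word, doc_id = item
        index.setdefault(word, set()).add(doc_id)


def producer(doc_id, words):
    # Producers never modify ``index``; they only enqueue requests.
    for word in words:
        updates.put((word, doc_id))


writer_thread = threading.Thread(target=writer)
writer_thread.start()

producers = [
    threading.Thread(target=producer, args=(1, ["zodb", "python"])),
    threading.Thread(target=producer, args=(2, ["zodb", "btrees"])),
]
for thread in producers:
    thread.start()
for thread in producers:
    thread.join()

updates.put(STOP)  # enqueued last, so every update is applied first
writer_thread.join()

print(sorted(index))          # ['btrees', 'python', 'zodb']
print(sorted(index["zodb"]))  # [1, 2]
```

Both producers update the ``"zodb"`` entry, but because a single
thread applies the changes one at a time, they can never collide.
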

Retrying transactions
~~~~~~~~~~~~~~~~~~~~~

The most common way to deal with conflict errors is to catch them and
retry transactions. To do this manually involves code that looks
something like this::

  import transaction

  max_attempts = 3
  attempts = 0
  while True:
      try:
          with transaction.manager:
              ...  # code that updates a database
      except transaction.interfaces.TransientError:
          # ConflictError is a kind of TransientError
          attempts += 1
          if attempts == max_attempts:
              raise  # give up after max_attempts failures
      else:
          break  # the transaction committed

In the example above, we used ``transaction.manager`` to refer to the
thread-local transaction manager, which we then used with the
``with`` statement. When a conflict error occurs, the transaction
must be aborted before retrying the update. Using the transaction
manager as a context manager in the ``with`` statement takes care of this
for us.

The example above is rather tedious. There are a number of tools to
automate transaction retry. The `transaction
<http://zodb.readthedocs.io/en/latest/transactions.html#retrying-transactions>`_
package provides a context-manager-based mechanism for retrying
transactions::

  for attempt in transaction.manager.attempts():
      with attempt:
          ...  # code that updates a database

This is shorter and simpler [#but-obscure]_.

For Python web frameworks, there are WSGI [#wtf-wsgi]_ middleware
components, such as `repoze.tm2
<https://pypi.org/project/repoze.tm2/>`_, that align transaction
boundaries with HTTP requests and retry transactions when there are
transient errors.

For applications like queue workers or `cron jobs
<https://en.wikipedia.org/wiki/Cron>`_, a transaction that hits a
conflict can sometimes simply be allowed to fail, letting other queue
workers or subsequent cron-job runs retry the work.

Conflict resolution
~~~~~~~~~~~~~~~~~~~

ZODB provides a conflict-resolution framework for merging conflicting
changes.
When a conflict occurs, ZODB uses conflict resolution, when
possible, to resolve it without raising a ``ConflictError`` to
the application.

Commonly used objects that implement conflict resolution are
buckets and ``Length`` objects provided by the `BTrees
<https://pythonhosted.org/BTrees/>`_ package.

The main data structures provided by BTrees, BTrees and TreeSets,
spread their data over multiple objects. The leaf-level objects,
called *buckets*, allow distinct keys to be updated without causing
conflicts [#usually-avoids-conflicts]_.

``Length`` objects are conflict-free counters that merge concurrent
updates by simply accumulating the changes.

.. caution::
   Conflict resolution weakens consistency. Resist the temptation to
   try to implement conflict resolution yourself. In the future, ZODB
   will provide greater control over conflict resolution, including
   the option of disabling it.

   It's generally best to avoid conflicts in the first place, if possible.

ZODB and atomicity
==================

ZODB provides atomic transactions. When using ZODB, it's important to
align work with transactions. Once a transaction is committed, it
can't be rolled back [#undo]_ automatically. For applications, this
implies that work that should be atomic shouldn't be split over
multiple transactions. This may seem somewhat obvious, but the rule
can be broken in non-obvious ways. For example, a web API that splits
logical operations over multiple web requests, as is often done in
`REST
<https://en.wikipedia.org/wiki/Representational_state_transfer>`_
APIs, violates this rule.

Partial transaction error recovery using savepoints
---------------------------------------------------

A transaction can be split into multiple steps that can be rolled back
individually. This is done by creating savepoints.
Changes made after a
savepoint can be rolled back without rolling back the entire
transaction::

  import ZODB
  db = ZODB.DB(None) # using a mapping storage
  with db.transaction() as conn:
      conn.root.x = 1
      conn.root.y = 0
      savepoint = conn.transaction_manager.savepoint()
      conn.root.y = 2
      savepoint.rollback() # undoes only the change to y made above

  with db.transaction() as conn:
      print([conn.root.x, conn.root.y]) # prints [1, 0]

.. -> src

    >>> exec(src)
    [1, 0]

If we executed this code, it would print ``[1, 0]``, because while the
initial changes were committed, the changes made after the savepoint
were rolled back.

A secondary benefit of savepoints is that they save any changes made
before the savepoint to a file, so that the memory used by changed
objects can be freed if they aren't used later in the transaction.

Concurrency, threads and processes
==================================

ZODB supports concurrency through transactions. Multiple programs
[#wtf-program]_ can operate independently in separate transactions.
They synchronize at transaction boundaries.

The most common way to run ZODB is with each program running in its
own thread. Usually the thread-local transaction manager is used.

You can use multiple threads per transaction and you can run multiple
transactions in a single thread. To do this, you need to instantiate
and use your own transaction manager, as described in `Explicit
transaction managers`_. To run multiple transactions
simultaneously in a thread, you need to use a separate transaction
manager for each transaction.

To spread a transaction over multiple threads, you need to keep in
mind that database connections, transaction managers and transactions
are **not thread-safe**. You have to prevent simultaneous access from
multiple threads.
For this reason, **using multiple threads with a
single transaction is not recommended**, but it is possible with care.

Using multiple processes
------------------------

Using multiple Python processes is a good way to scale an application
horizontally, especially given Python's `global interpreter lock
<https://wiki.python.org/moin/GlobalInterpreterLock>`_.

Some things to keep in mind when using multiple processes:

- If using the :mod:`multiprocessing` module, you can't
  [#cant-share-now]_ share databases or connections between
  processes. When you launch a subprocess, you'll need to
  re-instantiate your storage and database.

- You'll need to use a storage such as `ZEO
  <https://github.com/zopefoundation/ZEO>`_, `RelStorage
  <http://relstorage.readthedocs.io/en/latest/>`_, or `NEO
  <http://www.neoppod.org/>`_, that supports multiple processes. None
  of the included storages do.

.. [#but-obscure] But also a bit obscure. The Python context-manager
   mechanism isn't a great fit for the transaction-retry use case.

.. [#wtf-wsgi] `Web Server Gateway Interface
   <http://wsgi.readthedocs.io/en/latest/>`_

.. [#usually-avoids-conflicts] Conflicts can still occur when buckets
   split due to added objects causing them to exceed their maximum size.

.. [#undo] Committed transactions can't be rolled back, but they may
   be undone in some cases, especially if subsequent transactions
   haven't modified the same objects.

.. [#bad-idea-using-multiple-threads-per-transaction] While it's
   possible to spread transaction work over multiple threads, **it's
   not a good idea**. See `Concurrency, threads and processes`_.

.. [#implicit-transaction-creation] Transactions are implicitly
   created when needed, such as when data are first modified.

.. [#context-managers-are-new] ZODB and the transaction package
   predate context managers and the Python ``with`` statement.

.. [#wtf-program] We're using *program* here in a fairly general
   sense, meaning some logic that we want to run to
   perform some function, as opposed to an operating system program.

.. [#cant-share-now] At least not now.