1==================
2Transactions Tests
3==================
4
5.. contents::
6
7----
8
9Introduction
10============
11
12The YAML and JSON files in this directory are platform-independent tests that
13drivers can use to prove their conformance to the Transactions Spec. They are
14designed with the intention of sharing some test-runner code with the CRUD Spec
15tests and the Command Monitoring Spec tests.
16
17Several prose tests, which are not easily expressed in YAML, are also presented
18in this file. Those tests will need to be manually implemented by each driver.
19
20Server Fail Point
21=================
22
23failCommand
24```````````
25
Some tests depend on a server fail point, expressed in the ``failPoint`` field.
For example, the ``failCommand`` fail point allows the client to force the
server to return an error. Keep in mind that the fail point only triggers for
commands listed in the ``failCommands`` field. See `SERVER-35004`_ and
`SERVER-35083`_ for more information.
31
32.. _SERVER-35004: https://jira.mongodb.org/browse/SERVER-35004
33.. _SERVER-35083: https://jira.mongodb.org/browse/SERVER-35083
34
35The ``failCommand`` fail point may be configured like so::
36
    db.adminCommand({
        configureFailPoint: "failCommand",
        mode: <string|document>,
        data: {
            failCommands: ["commandName", "commandName2"],
            closeConnection: <true|false>,
            errorCode: <Number>,
            writeConcernError: <document>
        }
    });
47
48``mode`` is a generic fail point option and may be assigned a string or document
49value. The string values ``"alwaysOn"`` and ``"off"`` may be used to enable or
50disable the fail point, respectively. A document may be used to specify either
51``times`` or ``skip``, which are mutually exclusive:
52
53- ``{ times: <integer> }`` may be used to limit the number of times the fail
54 point may trigger before transitioning to ``"off"``.
55- ``{ skip: <integer> }`` may be used to defer the first trigger of a fail
56 point, after which it will transition to ``"alwaysOn"``.
57
58The ``data`` option is a document that may be used to specify options that
59control the fail point's behavior. ``failCommand`` supports the following
60``data`` options, which may be combined if desired:
61
62- ``failCommands``: Required, the list of command names to fail.
63- ``closeConnection``: Boolean option, which defaults to ``false``. If
64 ``true``, the command will not be executed, the connection will be closed, and
65 the client will see a network error.
66- ``errorCode``: Integer option, which is unset by default. If set, the command
67 will not be executed and the specified command error code will be returned as
68 a command error.
69- ``writeConcernError``: A document, which is unset by default. If set, the
70 server will return this document in the "writeConcernError" field. This
71 failure response only applies to commands that support write concern and
72 happens *after* the command finishes (regardless of success or failure).
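
As a rough illustration of how a test runner might assemble this command, the
following Python sketch builds the ``configureFailPoint`` document from the
options described above. The helper name ``build_fail_command`` and its keyword
arguments are hypothetical; only the field names and the allowed ``mode``
values come from this document.

.. code:: python

    def build_fail_command(mode, fail_commands, close_connection=None,
                           error_code=None, write_concern_error=None):
        """Return a configureFailPoint command document for failCommand."""
        if isinstance(mode, dict):
            # A document mode holds exactly one of "times" or "skip".
            if len(mode) != 1 or next(iter(mode)) not in ("times", "skip"):
                raise ValueError("mode document must hold only 'times' or 'skip'")
        elif mode not in ("alwaysOn", "off"):
            # Only the string values named in this document are accepted here.
            raise ValueError("mode must be 'alwaysOn', 'off', or a document")
        if not fail_commands:
            raise ValueError("failCommands is required")

        data = {"failCommands": list(fail_commands)}
        # Optional settings are added only when given, so server defaults apply.
        if close_connection is not None:
            data["closeConnection"] = close_connection
        if error_code is not None:
            data["errorCode"] = error_code
        if write_concern_error is not None:
            data["writeConcernError"] = write_concern_error
        return {"configureFailPoint": "failCommand", "mode": mode, "data": data}


    command = build_fail_command({"times": 1}, ["commitTransaction"],
                                 close_connection=True)

Assuming a PyMongo client, the resulting document could then be sent with
``client.admin.command(command)``.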
73
74Test Format
75===========
76
77Each YAML file has the following keys:
78
79- ``runOn`` (optional): An array of server version and/or topology requirements
80 for which the tests can be run. If the test environment satisfies one or more
81 of these requirements, the tests may be executed; otherwise, this file should
82 be skipped. If this field is omitted, the tests can be assumed to have no
83 particular requirements and should be executed. Each element will have some or
84 all of the following fields:
85
86 - ``minServerVersion`` (optional): The minimum server version (inclusive)
87 required to successfully run the tests. If this field is omitted, it should
88 be assumed that there is no lower bound on the required server version.
89
90 - ``maxServerVersion`` (optional): The maximum server version (inclusive)
91 against which the tests can be run successfully. If this field is omitted,
92 it should be assumed that there is no upper bound on the required server
93 version.
94
95 - ``topology`` (optional): An array of server topologies against which the
96 tests can be run successfully. Valid topologies are "single", "replicaset",
97 and "sharded". If this field is omitted, the default is all topologies (i.e.
98 ``["single", "replicaset", "sharded"]``).
99
100- ``database_name`` and ``collection_name``: The database and collection to use
101 for testing.
102
103- ``data``: The data that should exist in the collection under test before each
104 test run.
105
106- ``tests``: An array of tests that are to be run independently of each other.
107 Each test will have some or all of the following fields:
108
109 - ``description``: The name of the test.
110
111 - ``skipReason``: Optional, string describing why this test should be
112 skipped.
113
114 - ``useMultipleMongoses`` (optional): If ``true``, the MongoClient for this
115 test should be initialized with multiple mongos seed addresses. If ``false``
116 or omitted, only a single mongos address should be specified. This field has
117 no effect for non-sharded topologies.
118
119 - ``clientOptions``: Optional, parameters to pass to MongoClient().
120
121 - ``failPoint``: Optional, a server failpoint to enable expressed as the
122 configureFailPoint command to run on the admin database. This option and
123 ``useMultipleMongoses: true`` are mutually exclusive.
124
125 - ``sessionOptions``: Optional, map of session names (e.g. "session0") to
126 parameters to pass to MongoClient.startSession() when creating that session.
127
128 - ``operations``: Array of documents, each describing an operation to be
129 executed. Each document has the following fields:
130
131 - ``name``: The name of the operation on ``object``.
132
133 - ``object``: The name of the object to perform the operation on. Can be
134 "database", "collection", "session0", "session1", or "testRunner". See
135 the "targetedFailPoint" operation in `Special Test Operations`_.
136
137 - ``collectionOptions``: Optional, parameters to pass to the Collection()
138 used for this operation.
139
140 - ``databaseOptions``: Optional, parameters to pass to the Database()
141 used for this operation.
142
 - ``command_name``: Present only when ``name`` is "runCommand". The name
 of the command to run. Required for languages that are unable to preserve
 the order of keys in the "command" argument when parsing JSON/YAML.
146
147 - ``arguments``: Optional, the names and values of arguments.
148
149 - ``error``: Optional. If true, the test should expect an error or
150 exception. This could be a server-generated or a driver-generated error.
151
152 - ``result``: The return value from the operation, if any. This field may
153 be a single document or an array of documents in the case of a
154 multi-document read. If the operation is expected to return an error, the
155 ``result`` is a single document that has one or more of the following
156 fields:
157
158 - ``errorContains``: A substring of the expected error message.
159
160 - ``errorCodeName``: The expected "codeName" field in the server
161 error response.
162
163 - ``errorLabelsContain``: A list of error label strings that the
164 error is expected to have.
165
166 - ``errorLabelsOmit``: A list of error label strings that the
167 error is expected not to have.
168
169 - ``expectations``: Optional list of command-started events.
170
171 - ``outcome``: Document describing the return value and/or expected state of
172 the collection after the operation is executed. Contains the following
173 fields:
174
175 - ``collection``:
176
177 - ``data``: The data that should exist in the collection after the
178 operations have run.
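
The ``runOn`` check described at the top of this list could be sketched in
Python as follows. The function names are hypothetical, and the sketch assumes
that versions are plain dotted integers (no pre-release suffix) and that short
versions such as "4.2" are padded with zeros to three components; drivers may
choose a different comparison.

.. code:: python

    ALL_TOPOLOGIES = ["single", "replicaset", "sharded"]


    def parse_version(version):
        """Convert "4.1.6" to (4, 1, 6), padding "4.2" to (4, 2, 0)."""
        parts = [int(part) for part in version.split(".")]
        parts += [0] * (3 - len(parts))
        return tuple(parts)


    def requirement_met(requirement, server_version, topology):
        """Check one runOn element against the test environment."""
        server = parse_version(server_version)
        min_version = requirement.get("minServerVersion")
        max_version = requirement.get("maxServerVersion")
        # Both bounds are inclusive; an omitted bound is no restriction.
        if min_version is not None and server < parse_version(min_version):
            return False
        if max_version is not None and server > parse_version(max_version):
            return False
        return topology in requirement.get("topology", ALL_TOPOLOGIES)


    def should_run(run_on, server_version, topology):
        """Return True if the file's tests may run in this environment."""
        if run_on is None:
            # An omitted runOn means there are no particular requirements.
            return True
        return any(requirement_met(r, server_version, topology) for r in run_on)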
179
180Use as Integration Tests
181========================
182
183Run a MongoDB replica set with a primary, a secondary, and an arbiter,
184**server version 4.0.0 or later**. (Including a secondary ensures that
185server selection in a transaction works properly. Including an arbiter helps
186ensure that no new bugs have been introduced related to arbiters.)
187
188A driver that implements support for sharded transactions MUST also run these
189tests against a MongoDB sharded cluster with multiple mongoses and
**server version 4.2 or later**. Some tests require
initializing the MongoClient with multiple mongos seeds to ensure that mongos
transaction pinning and the recoveryToken work properly.
193
194Load each YAML (or JSON) file using a Canonical Extended JSON parser.
195
196Then for each element in ``tests``:
197
198#. If the ``skipReason`` field is present, skip this test completely.
199#. Create a MongoClient and call
200 ``client.admin.runCommand({killAllSessions: []})`` to clean up any open
201 transactions from previous test failures. Ignore a command failure with
202 error code 11601 ("Interrupted") to work around `SERVER-38335`_.
203
204 - Running ``killAllSessions`` cleans up any open transactions from
205 a previously failed test to prevent the current test from blocking.
206 It is sufficient to run this command once before starting the test suite
207 and once after each failed test.
 - When testing against a sharded cluster, run this command on ALL mongoses.
209
210#. Create a collection object from the MongoClient, using the ``database_name``
211 and ``collection_name`` fields of the YAML file.
212#. Drop the test collection, using writeConcern "majority".
213#. Execute the "create" command to recreate the collection, using writeConcern
214 "majority". (Creating the collection inside a transaction is prohibited, so
215 create it explicitly.)
216#. If the YAML file contains a ``data`` array, insert the documents in ``data``
217 into the test collection, using writeConcern "majority".
#. When testing against a sharded cluster, run a ``distinct`` command on the
 newly created collection on all mongoses. For an explanation, see
 `Why do tests that run distinct sometimes fail with StaleDbVersion?`_
221#. If ``failPoint`` is specified, its value is a configureFailPoint command.
222 Run the command on the admin database to enable the fail point.
223#. Create a **new** MongoClient ``client``, with Command Monitoring listeners
224 enabled. (Using a new MongoClient for each test ensures a fresh session pool
225 that hasn't executed any transactions previously, so the tests can assert
226 actual txnNumbers, starting from 1.) Pass this test's ``clientOptions`` if
227 present.
228
 - When testing against a sharded cluster and ``useMultipleMongoses`` is
 ``true``, the client MUST be created with multiple (valid) mongos seed
 addresses.
232
#. Call ``client.startSession`` twice to create ClientSession objects
 ``session0`` and ``session1``, using the test's "sessionOptions" if they
 are present. Save their lsids so they are available after calling
 ``endSession`` (see `Logical Session Id`_).
237#. For each element in ``operations``:
238
239 - If the operation ``name`` is a special test operation type, execute it and
240 go to the next operation, otherwise proceed to the next step.
241 - Enter a "try" block or your programming language's closest equivalent.
242 - Create a Database object from the MongoClient, using the ``database_name``
243 field at the top level of the test file.
244 - Create a Collection object from the Database, using the
245 ``collection_name`` field at the top level of the test file.
246 If ``collectionOptions`` or ``databaseOptions`` is present, create the
247 Collection or Database object with the provided options, respectively.
248 Otherwise create the object with the default options.
249 - Execute the named method on the provided ``object``, passing the
250 arguments listed. Pass ``session0`` or ``session1`` to the method,
251 depending on which session's name is in the arguments list.
252 If ``arguments`` contains no "session", pass no explicit session to the
253 method.
254 - If the driver throws an exception / returns an error while executing this
255 series of operations, store the error message and server error code.
256 - If the operation's ``error`` field is ``true``, verify that the method
257 threw an exception or returned an error.
258 - If the result document has an "errorContains" field, verify that the
259 method threw an exception or returned an error, and that the value of the
260 "errorContains" field matches the error string. "errorContains" is a
261 substring (case-insensitive) of the actual error message.
262
263 If the result document has an "errorCodeName" field, verify that the
264 method threw a command failed exception or returned an error, and that
265 the value of the "errorCodeName" field matches the "codeName" in the
266 server error response.
267
268 If the result document has an "errorLabelsContain" field, verify that the
269 method threw an exception or returned an error. Verify that all of the
270 error labels in "errorLabelsContain" are present in the error or exception
271 using the ``hasErrorLabel`` method.
272
273 If the result document has an "errorLabelsOmit" field, verify that the
274 method threw an exception or returned an error. Verify that none of the
275 error labels in "errorLabelsOmit" are present in the error or exception
276 using the ``hasErrorLabel`` method.
 - If the operation returns a raw command response, e.g. from ``runCommand``,
 then compare only the fields present in the expected result document.
 Otherwise, compare the method's return value to ``result`` using the same
 logic as the CRUD Spec Tests runner.
281
#. Call ``session0.endSession()`` and ``session1.endSession()``.
283#. If the test includes a list of command-started events in ``expectations``,
284 compare them to the actual command-started events using the
285 same logic as the Command Monitoring Spec Tests runner, plus the rules in
286 the Command-Started Events instructions below.
287#. If ``failPoint`` is specified, disable the fail point to avoid spurious
288 failures in subsequent tests. The fail point may be disabled like so::
289
    db.adminCommand({
        configureFailPoint: <fail point name>,
        mode: "off"
    });
294
295#. For each element in ``outcome``:
296
297 - If ``name`` is "collection", verify that the test collection contains
298 exactly the documents in the ``data`` array. Ensure this find reads the
299 latest data by using **primary read preference** with
300 **local read concern** even when the MongoClient is configured with
301 another read preference or read concern.
302
303.. _SERVER-38335: https://jira.mongodb.org/browse/SERVER-38335
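
The error assertions in the per-operation steps above can be sketched in
Python as follows. This is only an illustration: the function name
``check_expected_error`` is hypothetical, and the error details (message,
"codeName", and labels) are assumed to have been captured from the driver's
exception beforehand. A real runner would query labels through the driver's
``hasErrorLabel`` method rather than a plain set.

.. code:: python

    def check_expected_error(expected, message, code_name, labels):
        """Return a list of mismatches between an expected ``result``
        document and the details captured from a failed operation."""
        problems = []
        if "errorContains" in expected:
            # The expected text is a case-insensitive substring of the message.
            if expected["errorContains"].lower() not in message.lower():
                problems.append("errorContains")
        if "errorCodeName" in expected:
            if expected["errorCodeName"] != code_name:
                problems.append("errorCodeName")
        # Every label listed in errorLabelsContain must be present.
        for label in expected.get("errorLabelsContain", []):
            if label not in labels:
                problems.append("errorLabelsContain: " + label)
        # No label listed in errorLabelsOmit may be present.
        for label in expected.get("errorLabelsOmit", []):
            if label in labels:
                problems.append("errorLabelsOmit: " + label)
        return problems

An empty list means that the captured error satisfies every assertion in the
expected document.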
304
305Special Test Operations
306```````````````````````
307
308Certain operations that appear in the "operations" array do not correspond to
309API methods but instead represent special test operations. Such operations are
310defined on the "testRunner" object and documented here:
311
312targetedFailPoint
313~~~~~~~~~~~~~~~~~
314
The "targetedFailPoint" operation instructs the test runner to configure a fail
point on a specific mongos. The mongos on which to run ``configureFailPoint`` is
determined by the "session" argument (either "session0" or "session1").
The session must already be pinned to a mongos server. The "failPoint" argument
is the ``configureFailPoint`` command to run.
320
321If a test uses ``targetedFailPoint``, disable the fail point after running
322all ``operations`` to avoid spurious failures in subsequent tests. The fail
323point may be disabled like so::
324
    db.adminCommand({
        configureFailPoint: <fail point name>,
        mode: "off"
    });
329
330Here is an example which instructs the test runner to enable the failCommand
331fail point on the mongos server which "session0" is pinned to::
332
    # Enable the fail point only on the Mongos that session0 is pinned to.
    - name: targetedFailPoint
      object: testRunner
      arguments:
        session: session0
        failPoint:
          configureFailPoint: failCommand
          mode: { times: 1 }
          data:
            failCommands: ["commitTransaction"]
            closeConnection: true
344
345Tests that use the "targetedFailPoint" operation do not include
346``configureFailPoint`` commands in their command expectations. Drivers MUST
347ensure that ``configureFailPoint`` commands do not appear in the list of logged
348commands, either by manually filtering it from the list of observed commands or
349by using a different MongoClient to execute ``configureFailPoint``.
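
The manual filtering option might look like the following Python sketch. The
function name is hypothetical, and each observed event is assumed to be a
``(command_name, command_document)`` pair; a real listener would expose these
through the driver's own event type.

.. code:: python

    def drop_ignored_commands(started_events, ignored=("configureFailPoint",)):
        """Return the command-started events whose command name is not ignored."""
        return [event for event in started_events if event[0] not in ignored]


    observed = [
        ("insert", {"insert": "test"}),
        ("configureFailPoint", {"configureFailPoint": "failCommand", "mode": "off"}),
        ("commitTransaction", {"commitTransaction": 1}),
    ]
    filtered = drop_ignored_commands(observed)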
350
351assertSessionTransactionState
352~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
353
354The "assertSessionTransactionState" operation instructs the test runner to
355assert that the transaction state of the given session is equal to the
356specified value. The possible values are as follows: ``none``, ``starting``,
357``in_progress``, ``committed``, ``aborted``::
358
    - name: assertSessionTransactionState
      object: testRunner
      arguments:
        session: session0
        state: in_progress
364
365assertSessionPinned
366~~~~~~~~~~~~~~~~~~~
367
368The "assertSessionPinned" operation instructs the test runner to assert that
369the given session is pinned to a mongos::
370
    - name: assertSessionPinned
      object: testRunner
      arguments:
        session: session0
375
376assertSessionUnpinned
377~~~~~~~~~~~~~~~~~~~~~
378
379The "assertSessionUnpinned" operation instructs the test runner to assert that
380the given session is not pinned to a mongos::
381
    - name: assertSessionUnpinned
      object: testRunner
      arguments:
        session: session0
386
387Command-Started Events
388``````````````````````
389
390The event listener used for these tests MUST ignore the security commands
391listed in the Command Monitoring Spec.
392
393Logical Session Id
394~~~~~~~~~~~~~~~~~~
395
396Each command-started event in ``expectations`` includes an ``lsid`` with the
397value "session0" or "session1". Tests MUST assert that the command's actual
398``lsid`` matches the id of the correct ClientSession named ``session0`` or
399``session1``.
400
401Null Values
402~~~~~~~~~~~
403
404Some command-started events in ``expectations`` include ``null`` values for
405fields such as ``txnNumber``, ``autocommit``, and ``writeConcern``.
406Tests MUST assert that the actual command **omits** any field that has a
407``null`` value in the expected command.
408
409Cursor Id
410^^^^^^^^^
411
412A ``getMore`` value of ``"42"`` in a command-started event is a fake cursorId
413that MUST be ignored. (In the Command Monitoring Spec tests, fake cursorIds are
414correlated with real ones, but that is not necessary for Transactions Spec
415tests.)
416
417afterClusterTime
418^^^^^^^^^^^^^^^^
419
420A ``readConcern.afterClusterTime`` value of ``42`` in a command-started event
421is a fake cluster time. Drivers MUST assert that the actual command includes an
422afterClusterTime.
423
424recoveryToken
425^^^^^^^^^^^^^
426
427A ``recoveryToken`` value of ``42`` in a command-started event is a
428placeholder for an arbitrary recovery token. Drivers MUST assert that the
429actual command includes a "recoveryToken" field and SHOULD assert that field
430is a BSON document.
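
The rules in this Command-Started Events section can be sketched in Python as
follows. The function name ``command_mismatches`` and the ``session_lsids``
mapping (session name to saved lsid) are hypothetical, and a ``dict`` stands in
for a BSON document. The sketch covers only the rules stated here; the
remaining comparison logic of the Command Monitoring Spec Tests runner is not
reproduced, and nested documents are compared only on their expected fields.

.. code:: python

    def command_mismatches(expected, actual, session_lsids, path=""):
        """Return a list of differences between an expected and actual command."""
        problems = []
        for field, want in expected.items():
            name = path + field
            if want is None:
                # A null expectation means the actual command omits the field.
                if field in actual:
                    problems.append(name + " should be omitted")
            elif field not in actual:
                problems.append(name + " is missing")
            elif name == "lsid":
                # The expected value is a session name such as "session0".
                if actual[field] != session_lsids[want]:
                    problems.append("lsid does not match " + want)
            elif name == "getMore" and str(want) == "42":
                continue  # fake cursorId: the value is ignored
            elif name == "readConcern.afterClusterTime" and want == 42:
                continue  # fake cluster time: presence is all that is checked
            elif name == "recoveryToken" and want == 42:
                # Placeholder token: the actual value SHOULD be a document.
                if not isinstance(actual[field], dict):
                    problems.append("recoveryToken is not a document")
            elif isinstance(want, dict) and isinstance(actual[field], dict):
                problems.extend(command_mismatches(
                    want, actual[field], session_lsids, name + "."))
            elif actual[field] != want:
                problems.append(name + " differs")
        return problems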
431
432Mongos Pinning Prose Tests
433==========================
434
435The following tests ensure that a ClientSession is properly unpinned after
436a sharded transaction. Initialize these tests with a MongoClient connected
437to multiple mongoses.
438
439These tests use a cursor's address field to track which server an operation
440was run on. If this is not possible in your driver, use command monitoring
441instead.
442
#. Test that starting a new transaction on a pinned ClientSession unpins the
   session and normal server selection is performed for the next operation.

   .. code:: python

      @require_server_version(4, 1, 6)
      @require_mongos_count_at_least(2)
      def test_unpin_for_next_transaction(self):
          # Increase localThresholdMS and wait until both nodes are discovered
          # to avoid false positives.
          client = MongoClient(mongos_hosts, localThresholdMS=1000)
          wait_until(lambda: len(client.nodes) > 1)
          # Create the collection.
          client.test.test.insert_one({})
          with client.start_session() as s:
              # Session is pinned to Mongos.
              with s.start_transaction():
                  client.test.test.insert_one({}, session=s)

              addresses = set()
              for _ in range(50):
                  with s.start_transaction():
                      cursor = client.test.test.find({}, session=s)
                      assert next(cursor)
                      addresses.add(cursor.address)

              assert len(addresses) > 1
470
#. Test that a non-transaction operation using a pinned ClientSession unpins
   the session and normal server selection is performed.

   .. code:: python

      @require_server_version(4, 1, 6)
      @require_mongos_count_at_least(2)
      def test_unpin_for_non_transaction_operation(self):
          # Increase localThresholdMS and wait until both nodes are discovered
          # to avoid false positives.
          client = MongoClient(mongos_hosts, localThresholdMS=1000)
          wait_until(lambda: len(client.nodes) > 1)
          # Create the collection.
          client.test.test.insert_one({})
          with client.start_session() as s:
              # Session is pinned to Mongos.
              with s.start_transaction():
                  client.test.test.insert_one({}, session=s)

              addresses = set()
              for _ in range(50):
                  cursor = client.test.test.find({}, session=s)
                  assert next(cursor)
                  addresses.add(cursor.address)

              assert len(addresses) > 1
497
498Q & A
499=====
500
501Why do some tests appear to hang for 60 seconds on a sharded cluster?
502`````````````````````````````````````````````````````````````````````
503
There are two cases where this can happen. The first is when the initial
commitTransaction attempt fails on mongos A and is retried on mongos B. Mongos B
will block waiting for the transaction to complete. However, because the initial
commit attempt failed, the command will only complete after the transaction is
automatically aborted for exceeding the shard's
transactionLifetimeLimitSeconds setting. `SERVER-39726`_ requests that
recovering the outcome of an uncommitted transaction should immediately abort
the transaction.
512
513The second case is when a *single-shard* transaction is committed successfully
514on mongos A and then explicitly committed again on mongos B. Mongos B will also
515block until the transactionLifetimeLimitSeconds timeout is hit at which point
516``{ok:1}`` will be returned. `SERVER-39349`_ requests that recovering the
517outcome of a completed single-shard transaction should not block.
Note that this test suite only includes single-shard transactions.
519
To work around these issues, drivers SHOULD decrease the transaction timeout
setting by running setParameter **on each shard**. Setting the timeout to 3
seconds significantly speeds up the test suite without a high risk of
prematurely timing out any tests' transactions. To decrease the timeout, run::
524
525 db.adminCommand( { setParameter: 1, transactionLifetimeLimitSeconds: 3 } )
526
527Note that mongo-orchestration >=0.6.13 automatically sets this timeout to 3
528seconds so drivers using mongo-orchestration do not need to run these commands
529manually.
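
For drivers that do need to set the timeout themselves, a minimal Python sketch
is shown below. The function name is hypothetical, and ``shard_clients`` is
assumed to be an iterable of PyMongo ``MongoClient`` objects, each connected
directly to one shard rather than to a mongos.

.. code:: python

    def lower_transaction_timeout(shard_clients, seconds=3):
        """Run setParameter on every shard to shorten the transaction lifetime."""
        for client in shard_clients:
            # Sends {setParameter: 1, transactionLifetimeLimitSeconds: <seconds>}.
            client.admin.command(
                "setParameter", 1, transactionLifetimeLimitSeconds=seconds)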
530
531.. _SERVER-39726: https://jira.mongodb.org/browse/SERVER-39726
532
533.. _SERVER-39349: https://jira.mongodb.org/browse/SERVER-39349
534
535Why do tests that run distinct sometimes fail with StaleDbVersion?
536``````````````````````````````````````````````````````````````````
537
538When a shard receives its first command that contains a dbVersion, the shard
539returns a StaleDbVersion error and the Mongos retries the operation. In a
540sharded transaction, Mongos does not retry these operations and instead returns
541the error to the client. For example::
542
543 Command distinct failed: Transaction aa09e296-472a-494f-8334-48d57ab530b6:1 was aborted on statement 0 due to: an error from cluster data placement change :: caused by :: got stale databaseVersion response from shard sh01 at host localhost:27217 :: caused by :: don't know dbVersion.
544
To work around this limitation, a driver test runner MUST run a
non-transactional ``distinct`` command on each Mongos before running any test
that uses ``distinct``. To ease implementation, drivers can simply run
``distinct`` before *every* test.
549
550Note that drivers can remove this workaround once `SERVER-39704`_ is resolved
551so that mongos retries this operation transparently. The ``distinct`` command
552is the only command allowed in a sharded transaction that uses the
553``dbVersion`` concept so it is the only command affected.
554
555.. _SERVER-39704: https://jira.mongodb.org/browse/SERVER-39704
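
A test runner might implement the workaround along the lines of the following
Python sketch. The function name is hypothetical, ``mongos_clients`` is assumed
to be an iterable of PyMongo ``MongoClient`` objects with one client per
mongos, and the choice of ``_id`` as the key is arbitrary.

.. code:: python

    def warm_up_distinct(mongos_clients, database_name, collection_name):
        """Run a non-transactional distinct through every mongos."""
        for client in mongos_clients:
            # No session is passed, so the command runs outside any transaction.
            client[database_name][collection_name].distinct("_id")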
556
557Changelog
558=========
559
560:2019-05-15: Add operation level ``error`` field to assert any error.
561:2019-03-25: Add workaround for StaleDbVersion on distinct.
:2019-03-01: Add top-level ``runOn`` field to denote server version and/or
 topology requirements for the test file. Removes the
 ``topology`` top-level field, which is now expressed within
 ``runOn`` elements.
566:2019-02-28: ``useMultipleMongoses: true`` and non-targeted fail points are
567 mutually exclusive.
568:2019-02-13: Modify test format for 4.2 sharded transactions, including
569 "useMultipleMongoses", ``object: testRunner``, the
570 ``targetedFailPoint`` operation, and recoveryToken assertions.
571