.. _handling-uuid-data-example:

Handling UUID Data
==================

PyMongo ships with built-in support for dealing with UUID types.
It is straightforward to store native :class:`uuid.UUID` objects
in MongoDB and retrieve them as native :class:`uuid.UUID` objects::

  from pymongo import MongoClient
  from uuid import uuid4

  # use the 'standard' representation for cross-language compatibility.
  client = MongoClient(uuidRepresentation='standard')
  collection = client.get_database('uuid_db').get_collection('uuid_coll')

  # remove all documents from collection
  collection.delete_many({})

  # create a native uuid object
  uuid_obj = uuid4()

  # save the native uuid object to MongoDB
  collection.insert_one({'uuid': uuid_obj})

  # retrieve the stored uuid object from MongoDB
  document = collection.find_one({})

  # check that the retrieved UUID matches the inserted UUID
  assert document['uuid'] == uuid_obj

Native :class:`uuid.UUID` objects can also be used as part of MongoDB
queries::

  document = collection.find_one({'uuid': uuid_obj})
  assert document['uuid'] == uuid_obj

The above examples illustrate the simplest of use-cases - one where the
UUID is generated by, and used in, the same application. However,
the situation can be significantly more complex when dealing with a MongoDB
deployment that contains UUIDs created by other drivers, because the Java and
C# drivers have historically encoded UUIDs using a byte-order that is different
from the one used by PyMongo. Applications that require interoperability across
these drivers must specify the appropriate
:class:`~bson.binary.UuidRepresentation`.

In the following sections, we describe how drivers have historically differed
in their encoding of UUIDs, and how applications can use the
:class:`~bson.binary.UuidRepresentation` configuration option to maintain
cross-language compatibility.

.. attention:: New applications that do not share a MongoDB deployment with
   any other application and that have never stored UUIDs in MongoDB
   should use the ``standard`` UUID representation for cross-language
   compatibility. See :ref:`configuring-uuid-representation` for details
   on how to configure the :class:`~bson.binary.UuidRepresentation`.

.. _example-legacy-uuid:

Legacy Handling of UUID Data
----------------------------

Historically, MongoDB drivers have used different byte-ordering
while serializing UUID types to :class:`~bson.binary.Binary`.
Consider, for instance, a UUID with the following canonical textual
representation::

  00112233-4455-6677-8899-aabbccddeeff

This UUID would historically be serialized by the Python driver as::

  00112233-4455-6677-8899-aabbccddeeff

The same UUID would historically be serialized by the C# driver as::

  33221100-5544-7766-8899-aabbccddeeff

Finally, the same UUID would historically be serialized by the Java driver as::

  77665544-3322-1100-ffee-ddccbbaa9988

.. note:: For in-depth information about the byte-order historically
   used by different drivers, see the `Handling of Native UUID Types
   Specification
   <https://github.com/mongodb/specifications/blob/master/source/uuid.rst>`_.
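
The three layouts above can be reproduced with nothing but the standard
library. The following is a minimal sketch, not PyMongo's implementation; the
helper names (``python_legacy_bytes`` and so on) are hypothetical and exist
only for illustration::

  from uuid import UUID

  def python_legacy_bytes(u):
      # Python legacy keeps the canonical (big-endian) byte-order.
      return u.bytes

  def csharp_legacy_bytes(u):
      # C# legacy reverses bytes 0-3, 4-5 and 6-7; the standard library
      # exposes exactly this layout as UUID.bytes_le.
      return u.bytes_le

  def java_legacy_bytes(u):
      # Java legacy reverses bytes 0-7 and bytes 8-15 independently.
      return u.bytes[7::-1] + u.bytes[:7:-1]

  example = UUID('00112233-4455-6677-8899-aabbccddeeff')
  for name, layout in [('python', python_legacy_bytes),
                       ('csharp', csharp_legacy_bytes),
                       ('java', java_legacy_bytes)]:
      # Print each layout using the canonical textual grouping.
      print(name, UUID(bytes=layout(example)))

Printing each layout as a UUID makes the reordering visible in the same
textual grouping used in the listings above.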

This difference in the byte-order of UUIDs encoded by different drivers can
result in highly unintuitive behavior in some scenarios. We detail two such
scenarios in the next sections.

Scenario 1: Applications Share a MongoDB Deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Consider the following situation:

* Application ``C`` written in C# generates a UUID and uses it as the ``_id``
  of a document that it proceeds to insert into the ``uuid_test`` collection of
  the ``example_db`` database. Let's assume that the canonical textual
  representation of the generated UUID is::

    00112233-4455-6677-8899-aabbccddeeff

* Application ``P`` written in Python attempts to ``find`` the document
  written by application ``C`` in the following manner::

    from uuid import UUID
    collection = client.example_db.uuid_test
    result = collection.find_one({'_id': UUID('00112233-4455-6677-8899-aabbccddeeff')})

  In this instance, ``result`` will never be the document that
  was inserted by application ``C`` in the previous step. This is because of
  the different byte-order used by the C# driver for representing UUIDs as
  BSON Binary. The following query, on the other hand, will successfully find
  this document::

    result = collection.find_one({'_id': UUID('33221100-5544-7766-8899-aabbccddeeff')})

This example demonstrates how the differing byte-order used by different
drivers can hamper interoperability. To work around this problem, users should
configure their ``MongoClient`` with the appropriate
:class:`~bson.binary.UuidRepresentation` (in this case, ``client`` in application
``P`` can be configured to use the
:data:`~bson.binary.UuidRepresentation.CSHARP_LEGACY` representation to
avoid the unintuitive behavior) as described in
:ref:`configuring-uuid-representation`.
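
The mismatch in this scenario can be checked offline with the standard library
alone. This is a rough sketch rather than driver code; it assumes, as the
note in :ref:`csharp-legacy-representation-details` states, that the legacy
C# layout swaps bytes 0-3, 4-5 and 6-7, which is the layout that
:attr:`uuid.UUID.bytes_le` produces::

  from uuid import UUID

  original = UUID('00112233-4455-6677-8899-aabbccddeeff')

  # Bytes that application C would have stored for the _id field.
  stored = original.bytes_le

  # Application P interprets the stored bytes without any reordering,
  # so it sees a different UUID and its query for `original` matches nothing.
  seen_by_p = UUID(bytes=stored)
  print(seen_by_p)

  # Undoing the C# reordering recovers the UUID that C generated.
  recovered = UUID(bytes_le=stored)
  print(recovered == original)

The first value printed is the UUID that application ``P`` has to query for
as long as it does not apply the C# legacy reordering.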

Scenario 2: Round-Tripping UUIDs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the following examples, we see how using a misconfigured
:class:`~bson.binary.UuidRepresentation` can cause an application
to inadvertently change the :class:`~bson.binary.Binary` subtype, and in some
cases, the bytes of the :class:`~bson.binary.Binary` field itself when
round-tripping documents containing UUIDs.

Consider the following situation::

  from bson.codec_options import CodecOptions
  from bson.binary import Binary, UuidRepresentation
  from uuid import uuid4

  # Using UuidRepresentation.PYTHON_LEGACY stores a Binary subtype-3 UUID
  python_opts = CodecOptions(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
  input_uuid = uuid4()
  collection = client.testdb.get_collection('test', codec_options=python_opts)
  collection.insert_one({'_id': 'foo', 'uuid': input_uuid})
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)})['_id'] == 'foo'

  # Retrieving this document using UuidRepresentation.STANDARD returns a native UUID
  std_opts = CodecOptions(uuid_representation=UuidRepresentation.STANDARD)
  std_collection = client.testdb.get_collection('test', codec_options=std_opts)
  doc = std_collection.find_one({'_id': 'foo'})
  assert doc['uuid'] == input_uuid

  # Round-tripping the retrieved document silently changes the Binary subtype to 4
  std_collection.replace_one({'_id': 'foo'}, doc)
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
  round_tripped_doc = collection.find_one({'uuid': Binary(input_uuid.bytes, 4)})
  assert doc == round_tripped_doc

In this example, round-tripping the document using the incorrect
:class:`~bson.binary.UuidRepresentation` (``STANDARD`` instead of
``PYTHON_LEGACY``) changes the :class:`~bson.binary.Binary` subtype as a
side-effect. **Note that this can also happen when the situation is reversed -
i.e. when the original document is written using the ``STANDARD`` representation
and then round-tripped using the ``PYTHON_LEGACY`` representation.**

In the next example, we see the consequences of incorrectly using a
representation that modifies byte-order (``CSHARP_LEGACY`` or ``JAVA_LEGACY``)
when round-tripping documents::

  from bson.codec_options import CodecOptions
  from bson.binary import Binary, UuidRepresentation
  from uuid import uuid4

  # Using UuidRepresentation.STANDARD stores a Binary subtype-4 UUID
  std_opts = CodecOptions(uuid_representation=UuidRepresentation.STANDARD)
  input_uuid = uuid4()
  collection = client.testdb.get_collection('test', codec_options=std_opts)
  collection.insert_one({'_id': 'baz', 'uuid': input_uuid})
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)})['_id'] == 'baz'

  # Retrieving this document using UuidRepresentation.JAVA_LEGACY returns a native UUID
  # without modifying the UUID byte-order
  java_opts = CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY)
  java_collection = client.testdb.get_collection('test', codec_options=java_opts)
  doc = java_collection.find_one({'_id': 'baz'})
  assert doc['uuid'] == input_uuid

  # Round-tripping the retrieved document silently changes the Binary bytes and subtype
  java_collection.replace_one({'_id': 'baz'}, doc)
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)}) is None
  round_tripped_doc = collection.find_one({'_id': 'baz'})
  assert round_tripped_doc['uuid'] == Binary(input_uuid.bytes, 3).as_uuid(UuidRepresentation.JAVA_LEGACY)

In this case, using the incorrect :class:`~bson.binary.UuidRepresentation`
(``JAVA_LEGACY`` instead of ``STANDARD``) changes the
:class:`~bson.binary.Binary` bytes and subtype as a side-effect.
**Note that this happens when any representation that
manipulates byte-order (``CSHARP_LEGACY`` or ``JAVA_LEGACY``) is incorrectly
used to round-trip UUIDs written with ``STANDARD``. When the situation is
reversed - i.e. when the original document is written using ``CSHARP_LEGACY``
or ``JAVA_LEGACY`` and then round-tripped using ``STANDARD`` -
only the :class:`~bson.binary.Binary` subtype is changed.**
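
The second round-trip can be followed byte by byte with a small standard-library
model. The sketch below is a simplification under stated assumptions, not
PyMongo's encoder: the ``swap_java``, ``encode`` and ``decode`` helpers are
hypothetical and only mimic the PyMongo<4 behavior described in this section,
where a subtype 4 field is always decoded without reordering::

  from uuid import UUID

  def swap_java(data):
      # Reverse bytes 0-7 and bytes 8-15, as JAVA_LEGACY does.
      return data[7::-1] + data[:7:-1]

  def encode(value, representation):
      # Return a (subtype, bytes) pair for the given representation.
      if representation == 'standard':
          return 4, value.bytes
      if representation == 'javaLegacy':
          return 3, swap_java(value.bytes)
      raise ValueError(representation)

  def decode(field, representation):
      subtype, data = field
      # Only subtype 3 data is reordered on the way out.
      if subtype == 3 and representation == 'javaLegacy':
          return UUID(bytes=swap_java(data))
      return UUID(bytes=data)

  input_uuid = UUID('00112233-4455-6677-8899-aabbccddeeff')
  stored = encode(input_uuid, 'standard')

  # Reading with javaLegacy still yields the original UUID...
  doc_value = decode(stored, 'javaLegacy')

  # ...but writing it back changes both the subtype and the bytes.
  rewritten = encode(doc_value, 'javaLegacy')
  print(stored[0], stored[1].hex())
  print(rewritten[0], rewritten[1].hex())

  # A standard reader now sees a different UUID.
  print(decode(rewritten, 'standard'))

In this model only the final write alters the stored field; the read that
precedes it looks correct, which is why the change is easy to miss.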

.. note:: Starting in PyMongo 4.0, these issues will be resolved as
   the ``STANDARD`` representation will decode Binary subtype 3 fields as
   :class:`~bson.binary.Binary` objects of subtype 3 (instead of
   :class:`uuid.UUID`), and each of the ``*_LEGACY`` representations will
   decode Binary subtype 4 fields to :class:`~bson.binary.Binary` objects of
   subtype 4 (instead of :class:`uuid.UUID`).

.. _configuring-uuid-representation:

Configuring a UUID Representation
---------------------------------

Users can work around the problems described above by configuring their
applications with the appropriate :class:`~bson.binary.UuidRepresentation`.
Configuring the representation modifies PyMongo's behavior while
encoding :class:`uuid.UUID` objects to BSON and decoding
Binary subtype 3 and 4 fields from BSON.

Applications can set the UUID representation in one of the following ways:

#. At the ``MongoClient`` level using the ``uuidRepresentation`` URI option,
   e.g.::

     client = MongoClient("mongodb://a:27107/?uuidRepresentation=javaLegacy")

   Valid values are:

   .. list-table::
      :header-rows: 1

      * - Value
        - UUID Representation

      * - ``pythonLegacy``
        - :ref:`python-legacy-representation-details`

      * - ``javaLegacy``
        - :ref:`java-legacy-representation-details`

      * - ``csharpLegacy``
        - :ref:`csharp-legacy-representation-details`

      * - ``standard``
        - :ref:`standard-representation-details`

      * - ``unspecified``
        - :ref:`unspecified-representation-details`

#. At the ``MongoClient`` level using the ``uuidRepresentation`` kwarg
   option, which accepts the same string values, e.g.::

     client = MongoClient(uuidRepresentation='pythonLegacy')

#. At the ``Database`` or ``Collection`` level by supplying a suitable
   :class:`~bson.codec_options.CodecOptions` instance, e.g.::

     from bson.binary import UuidRepresentation
     from bson.codec_options import CodecOptions
     csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
     java_opts = CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY)

     # Get database/collection from client with csharpLegacy UUID representation
     csharp_database = client.get_database('csharp_db', codec_options=csharp_opts)
     csharp_collection = client.testdb.get_collection('csharp_coll', codec_options=csharp_opts)

     # Get database/collection from existing database/collection with javaLegacy UUID representation
     java_database = csharp_database.with_options(codec_options=java_opts)
     java_collection = csharp_collection.with_options(codec_options=java_opts)

Supported UUID Representations
------------------------------

.. list-table::
   :header-rows: 1

   * - UUID Representation
     - Default?
     - Encode :class:`uuid.UUID` to
     - Decode :class:`~bson.binary.Binary` subtype 4 to
     - Decode :class:`~bson.binary.Binary` subtype 3 to

   * - :ref:`python-legacy-representation-details`
     - Yes, in PyMongo>=2.9,<4
     - :class:`~bson.binary.Binary` subtype 3 with standard byte-order
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 4 in PyMongo>=4
     - :class:`uuid.UUID`

   * - :ref:`java-legacy-representation-details`
     - No
     - :class:`~bson.binary.Binary` subtype 3 with Java legacy byte-order
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 4 in PyMongo>=4
     - :class:`uuid.UUID`

   * - :ref:`csharp-legacy-representation-details`
     - No
     - :class:`~bson.binary.Binary` subtype 3 with C# legacy byte-order
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 4 in PyMongo>=4
     - :class:`uuid.UUID`

   * - :ref:`standard-representation-details`
     - No
     - :class:`~bson.binary.Binary` subtype 4
     - :class:`uuid.UUID`
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 3 in PyMongo>=4

   * - :ref:`unspecified-representation-details`
     - Yes, in PyMongo>=4
     - Raise :exc:`ValueError`
     - :class:`~bson.binary.Binary` subtype 4
     - :class:`~bson.binary.Binary` subtype 3

We now detail the behavior and use-case for each supported UUID
representation.

.. _python-legacy-representation-details:

``PYTHON_LEGACY``
^^^^^^^^^^^^^^^^^

.. attention:: This UUID representation should be used when reading UUIDs
   generated by existing applications that use the Python driver
   but **don't** explicitly set a UUID representation.

.. attention:: :data:`~bson.binary.UuidRepresentation.PYTHON_LEGACY`
   has been the default UUID representation since PyMongo 2.9.

The :data:`~bson.binary.UuidRepresentation.PYTHON_LEGACY` representation
corresponds to the legacy representation of UUIDs used by PyMongo. This
representation conforms to
`RFC 4122 Section 4.1.2 <https://tools.ietf.org/html/rfc4122#section-4.1.2>`_.

The following example illustrates the use of this representation::

  from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
  from bson.binary import UuidRepresentation
  from uuid import uuid4

  # No configured UUID representation
  collection = client.python_legacy.get_collection('test', codec_options=DEFAULT_CODEC_OPTIONS)

  # Using UuidRepresentation.PYTHON_LEGACY
  pylegacy_opts = CodecOptions(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
  pylegacy_collection = client.python_legacy.get_collection('test', codec_options=pylegacy_opts)

  # UUIDs written by PyMongo with no UuidRepresentation configured can be queried using PYTHON_LEGACY
  uuid_1 = uuid4()
  collection.insert_one({'uuid': uuid_1})
  document = pylegacy_collection.find_one({'uuid': uuid_1})

  # UUIDs written using PYTHON_LEGACY can be read by PyMongo with no UuidRepresentation configured
  uuid_2 = uuid4()
  pylegacy_collection.insert_one({'uuid': uuid_2})
  document = collection.find_one({'uuid': uuid_2})

``PYTHON_LEGACY`` encodes native :class:`uuid.UUID` objects to
:class:`~bson.binary.Binary` subtype 3 objects, preserving the same
byte-order as :attr:`~uuid.UUID.bytes`::

  from bson.binary import Binary

  document = collection.find_one({'uuid': Binary(uuid_2.bytes, subtype=3)})
  assert document['uuid'] == uuid_2

.. _java-legacy-representation-details:

``JAVA_LEGACY``
^^^^^^^^^^^^^^^

.. attention:: This UUID representation should be used when reading UUIDs
   written to MongoDB by legacy applications (i.e. applications that don't
   use the ``STANDARD`` representation) using the Java driver.

The :data:`~bson.binary.UuidRepresentation.JAVA_LEGACY` representation
corresponds to the legacy representation of UUIDs used by the MongoDB Java
Driver.

.. note:: The ``JAVA_LEGACY`` representation reverses the order of bytes 0-7
   and of bytes 8-15.

As an example, consider the same UUID described in :ref:`example-legacy-uuid`.
Let us assume that an application used the Java driver without an explicitly
specified UUID representation to insert the example UUID
``00112233-4455-6677-8899-aabbccddeeff`` into MongoDB. If we try to read this
value using PyMongo with no UUID representation specified, we end up with an
entirely different UUID::

  UUID('77665544-3322-1100-ffee-ddccbbaa9988')

However, if we explicitly set the representation to
:data:`~bson.binary.UuidRepresentation.JAVA_LEGACY`, we get the correct result::

  UUID('00112233-4455-6677-8899-aabbccddeeff')

PyMongo uses the specified UUID representation to reorder the BSON bytes and
load them correctly. ``JAVA_LEGACY`` encodes native :class:`uuid.UUID` objects
to :class:`~bson.binary.Binary` subtype 3 objects, while performing the same
byte-reordering as the legacy Java driver's UUID to BSON encoder.
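
One way to picture the ``JAVA_LEGACY`` layout is as the two 64-bit halves of
the UUID, each written in little-endian byte-order. The sketch below checks
that reading with the standard library only; it is an interpretation offered
for illustration, not code taken from either driver, and
``java_legacy_layout`` is a hypothetical helper::

  import struct
  from uuid import UUID

  def java_legacy_layout(u):
      # Split the 128-bit value into its most and least significant halves.
      high = u.int >> 64
      low = u.int & 0xFFFFFFFFFFFFFFFF
      # '<QQ' packs two unsigned 64-bit integers in little-endian order.
      return struct.pack('<QQ', high, low)

  example = UUID('00112233-4455-6677-8899-aabbccddeeff')
  layout = java_legacy_layout(example)

  # Same result as reversing bytes 0-7 and bytes 8-15 of the canonical form.
  assert layout == example.bytes[7::-1] + example.bytes[:7:-1]

  # Interpreting the bytes without reordering yields a different UUID.
  print(UUID(bytes=layout))

  # Unpacking the two halves again recovers the original value.
  high, low = struct.unpack('<QQ', layout)
  print(UUID(int=(high << 64) | low))

The two printed values correspond to the unconfigured read and the
``JAVA_LEGACY`` read shown in the example above.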

.. _csharp-legacy-representation-details:

``CSHARP_LEGACY``
^^^^^^^^^^^^^^^^^

.. attention:: This UUID representation should be used when reading UUIDs
   written to MongoDB by legacy applications (i.e. applications that don't
   use the ``STANDARD`` representation) using the C# driver.

The :data:`~bson.binary.UuidRepresentation.CSHARP_LEGACY` representation
corresponds to the legacy representation of UUIDs used by the MongoDB C#
Driver.

.. note:: The ``CSHARP_LEGACY`` representation reverses the order of bytes 0-3,
   bytes 4-5, and bytes 6-7.

As an example, consider the same UUID described in :ref:`example-legacy-uuid`.
Let us assume that an application used the C# driver without an explicitly
specified UUID representation to insert the example UUID
``00112233-4455-6677-8899-aabbccddeeff`` into MongoDB. If we try to read this
value using PyMongo with no UUID representation specified, we end up with an
entirely different UUID::

  UUID('33221100-5544-7766-8899-aabbccddeeff')

However, if we explicitly set the representation to
:data:`~bson.binary.UuidRepresentation.CSHARP_LEGACY`, we get the correct result::

  UUID('00112233-4455-6677-8899-aabbccddeeff')

PyMongo uses the specified UUID representation to reorder the BSON bytes and
load them correctly. ``CSHARP_LEGACY`` encodes native :class:`uuid.UUID`
objects to :class:`~bson.binary.Binary` subtype 3 objects, while performing
the same byte-reordering as the legacy C# driver's UUID to BSON encoder.

.. _standard-representation-details:

``STANDARD``
^^^^^^^^^^^^

.. attention:: This UUID representation should be used by new applications
   that have never stored UUIDs in MongoDB.

The :data:`~bson.binary.UuidRepresentation.STANDARD` representation
enables cross-language compatibility by ensuring the same byte-ordering
when encoding UUIDs from all drivers. UUIDs written by a driver with this
representation configured will be handled correctly by every other driver,
provided that it is also configured with the ``STANDARD`` representation.

``STANDARD`` encodes native :class:`uuid.UUID` objects to
:class:`~bson.binary.Binary` subtype 4 objects.

.. _unspecified-representation-details:

``UNSPECIFIED``
^^^^^^^^^^^^^^^

.. attention:: Starting in PyMongo 4.0,
   :data:`~bson.binary.UuidRepresentation.UNSPECIFIED` will be the default
   UUID representation used by PyMongo.

The :data:`~bson.binary.UuidRepresentation.UNSPECIFIED` representation
prevents the incorrect interpretation of UUID bytes by stopping short of
automatically converting UUID fields in BSON to native UUID types. Loading
a UUID when using this representation returns a :class:`~bson.binary.Binary`
object instead. If required, users can coerce the decoded
:class:`~bson.binary.Binary` objects into native UUIDs using the
:meth:`~bson.binary.Binary.as_uuid` method and specifying the appropriate
representation format. The following example shows
what this might look like for a UUID stored by the C# driver::

  from bson.codec_options import CodecOptions
  from bson.binary import Binary, UuidRepresentation
  from uuid import uuid4

  # Using UuidRepresentation.CSHARP_LEGACY
  csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)

  # Store a legacy C#-formatted UUID
  input_uuid = uuid4()
  collection = client.testdb.get_collection('test', codec_options=csharp_opts)
  collection.insert_one({'_id': 'foo', 'uuid': input_uuid})

  # Using UuidRepresentation.UNSPECIFIED
  unspec_opts = CodecOptions(uuid_representation=UuidRepresentation.UNSPECIFIED)
  unspec_collection = client.testdb.get_collection('test', codec_options=unspec_opts)

  # UUID fields are decoded as Binary when UuidRepresentation.UNSPECIFIED is configured
  document = unspec_collection.find_one({'_id': 'foo'})
  decoded_field = document['uuid']
  assert isinstance(decoded_field, Binary)

  # Binary.as_uuid() can be used to coerce the decoded value to a native UUID
  decoded_uuid = decoded_field.as_uuid(UuidRepresentation.CSHARP_LEGACY)
  assert decoded_uuid == input_uuid

Native :class:`uuid.UUID` objects cannot be encoded directly to
:class:`~bson.binary.Binary` when the UUID representation is ``UNSPECIFIED``,
and attempting to do so results in an exception::

  unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()})
  Traceback (most recent call last):
  ...
  ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured. See the documentation for UuidRepresentation for more information.

Instead, applications using :data:`~bson.binary.UuidRepresentation.UNSPECIFIED`
must explicitly coerce a native UUID using the
:meth:`~bson.binary.Binary.from_uuid` method::

  explicit_binary = Binary.from_uuid(uuid4(), UuidRepresentation.PYTHON_LEGACY)
  unspec_collection.insert_one({'_id': 'bar', 'uuid': explicit_binary})
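
Because the conversion is explicit, the caller is responsible for choosing the
representation that matches the writer. The following sketch, which assumes
PyMongo 3.11 or later and needs no server, shows that picking a different
representation for the same Binary subtype does not raise an error but
silently yields another UUID::

  from uuid import UUID
  from bson.binary import Binary, UuidRepresentation

  value = UUID('00112233-4455-6677-8899-aabbccddeeff')

  # Encode explicitly using the C# legacy layout (Binary subtype 3).
  stored = Binary.from_uuid(value, UuidRepresentation.CSHARP_LEGACY)
  print(stored.subtype, bytes(stored).hex())

  # Decoding with the matching representation restores the original UUID.
  assert stored.as_uuid(UuidRepresentation.CSHARP_LEGACY) == value

  # Decoding with another subtype 3 representation succeeds, but the bytes
  # are interpreted in the wrong order.
  mismatched = stored.as_uuid(UuidRepresentation.PYTHON_LEGACY)
  print(mismatched)

The mismatched value is the same reordered UUID that appears in the
:ref:`csharp-legacy-representation-details` example.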