.. _handling-uuid-data-example:

Handling UUID Data
==================

PyMongo ships with built-in support for dealing with UUID types.
It is straightforward to store native :class:`uuid.UUID` objects
to MongoDB and retrieve them as native :class:`uuid.UUID` objects::

  from pymongo import MongoClient
  from uuid import uuid4

  # use the 'standard' representation for cross-language compatibility.
  client = MongoClient(uuidRepresentation='standard')
  collection = client.get_database('uuid_db').get_collection('uuid_coll')

  # remove all documents from collection
  collection.delete_many({})

  # create a native uuid object
  uuid_obj = uuid4()

  # save the native uuid object to MongoDB
  collection.insert_one({'uuid': uuid_obj})

  # retrieve the stored uuid object from MongoDB
  document = collection.find_one({})

  # check that the retrieved UUID matches the inserted UUID
  assert document['uuid'] == uuid_obj

Native :class:`uuid.UUID` objects can also be used as part of MongoDB
queries::

  # find_one returns a single document (find would return a cursor)
  document = collection.find_one({'uuid': uuid_obj})
  assert document['uuid'] == uuid_obj

The above examples illustrate the simplest of use cases: one where the
UUID is generated by, and used in, the same application. However,
the situation can be significantly more complex when dealing with a MongoDB
deployment that contains UUIDs created by other drivers, because the Java and
C# drivers have historically encoded UUIDs using a byte-order that is different
from the one used by PyMongo. Applications that require interoperability across
these drivers must specify the appropriate
:class:`~bson.binary.UuidRepresentation`.

In the following sections, we describe how drivers have historically differed
in their encoding of UUIDs, and how applications can use the
:class:`~bson.binary.UuidRepresentation` configuration option to maintain
cross-language compatibility.

.. attention:: New applications that do not share a MongoDB deployment with
   any other application and that have never stored UUIDs in MongoDB
   should use the ``standard`` UUID representation for cross-language
   compatibility. See :ref:`configuring-uuid-representation` for details
   on how to configure the :class:`~bson.binary.UuidRepresentation`.

.. _example-legacy-uuid:

Legacy Handling of UUID Data
----------------------------

Historically, MongoDB Drivers have used different byte-ordering
while serializing UUID types to :class:`~bson.binary.Binary`.
Consider, for instance, a UUID with the following canonical textual
representation::

  00112233-4455-6677-8899-aabbccddeeff

This UUID would historically be serialized by the Python driver as::

  00112233-4455-6677-8899-aabbccddeeff

The same UUID would historically be serialized by the C# driver as::

  33221100-5544-7766-8899-aabbccddeeff

Finally, the same UUID would historically be serialized by the Java driver as::

  77665544-3322-1100-ffee-ddccbbaa9988

.. note:: For in-depth information about the byte-order historically
   used by different drivers, see the `Handling of Native UUID Types
   Specification
   <https://github.com/mongodb/specifications/blob/master/source/uuid.rst>`_.

This difference in the byte-order of UUIDs encoded by different drivers can
result in highly unintuitive behavior in some scenarios. We detail two such
scenarios in the next sections.
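
The three byte layouts listed earlier in this section can be reproduced with
the standard library alone. The following is a minimal sketch, not PyMongo's
implementation: ``legacy_layouts`` is a hypothetical helper introduced here
for illustration. It assumes that the C# layout is the one exposed by
``uuid.UUID.bytes_le`` and that the Java layout reverses bytes 0-7 and
bytes 8-15, as this document describes.

```python
from uuid import UUID


def legacy_layouts(value: UUID) -> dict:
    """Return the 16 bytes each legacy driver would store for ``value``.

    Hypothetical helper for illustration only; not part of PyMongo.
    """
    raw = value.bytes  # RFC 4122 (big-endian) byte order
    return {
        # Python legacy: the bytes are stored unchanged.
        "python": raw,
        # C# legacy: bytes 0-3, 4-5 and 6-7 are each reversed, which is
        # the layout that uuid.UUID.bytes_le exposes.
        "csharp": value.bytes_le,
        # Java legacy: bytes 0-7 and bytes 8-15 are each reversed.
        "java": raw[:8][::-1] + raw[8:][::-1],
    }


example = UUID("00112233-4455-6677-8899-aabbccddeeff")
for driver, stored in legacy_layouts(example).items():
    # Reinterpret the stored bytes as if they were in standard order.
    print(driver, UUID(bytes=stored))
# python 00112233-4455-6677-8899-aabbccddeeff
# csharp 33221100-5544-7766-8899-aabbccddeeff
# java 77665544-3322-1100-ffee-ddccbbaa9988
```

Reading the stored bytes back in standard order, as the loop does, is what
produces the mismatched UUIDs shown above.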

Scenario 1: Applications Share a MongoDB Deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Consider the following situation:

* Application ``C`` written in C# generates a UUID and uses it as the ``_id``
  of a document that it proceeds to insert into the ``uuid_test`` collection of
  the ``example_db`` database. Let's assume that the canonical textual
  representation of the generated UUID is::

    00112233-4455-6677-8899-aabbccddeeff

* Application ``P`` written in Python attempts to ``find`` the document
  written by application ``C`` in the following manner::

    from uuid import UUID
    collection = client.example_db.uuid_test
    result = collection.find_one({'_id': UUID('00112233-4455-6677-8899-aabbccddeeff')})

  In this instance, ``result`` will never be the document that
  was inserted by application ``C`` in the previous step. This is because of
  the different byte-order used by the C# driver for representing UUIDs as
  BSON Binary. The following query, on the other hand, will successfully find
  this document::

    result = collection.find_one({'_id': UUID('33221100-5544-7766-8899-aabbccddeeff')})

This example demonstrates how the differing byte-order used by different
drivers can hamper interoperability. To work around this problem, users should
configure their ``MongoClient`` with the appropriate
:class:`~bson.binary.UuidRepresentation` (in this case, ``client`` in application
``P`` can be configured to use the
:data:`~bson.binary.UuidRepresentation.CSHARP_LEGACY` representation to
avoid the unintuitive behavior) as described in
:ref:`configuring-uuid-representation`.
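
To see why the second query in this scenario matches, it helps to compute the
byte-swapped UUID directly. The sketch below uses only the standard library;
``csharp_legacy_lookup_key`` is a hypothetical name introduced for
illustration and is not a PyMongo API. Configuring
:data:`~bson.binary.UuidRepresentation.CSHARP_LEGACY` remains the approach
this document recommends.

```python
from uuid import UUID


def csharp_legacy_lookup_key(value: UUID) -> UUID:
    """Return the UUID whose standard-order bytes equal the bytes that a
    legacy C# driver stored for ``value``.

    Hypothetical helper for illustration only; not part of PyMongo.
    """
    # bytes_le reverses bytes 0-3, 4-5 and 6-7, like the legacy C# encoder.
    return UUID(bytes=value.bytes_le)


target = UUID("00112233-4455-6677-8899-aabbccddeeff")
key = csharp_legacy_lookup_key(target)
print(key)  # 33221100-5544-7766-8899-aabbccddeeff

# The swap is its own inverse, so applying it twice restores the original.
assert csharp_legacy_lookup_key(key) == target
```

Because the swap is its own inverse, the same function also recovers the
intended UUID from one that was decoded with the wrong byte order.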

Scenario 2: Round-Tripping UUIDs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the following examples, we see how using a misconfigured
:class:`~bson.binary.UuidRepresentation` can cause an application
to inadvertently change the :class:`~bson.binary.Binary` subtype, and in some
cases, the bytes of the :class:`~bson.binary.Binary` field itself when
round-tripping documents containing UUIDs.

Consider the following situation::

  from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
  from bson.binary import Binary, UuidRepresentation
  from uuid import uuid4

  # Using UuidRepresentation.PYTHON_LEGACY stores a Binary subtype-3 UUID
  python_opts = CodecOptions(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
  input_uuid = uuid4()
  collection = client.testdb.get_collection('test', codec_options=python_opts)
  collection.insert_one({'_id': 'foo', 'uuid': input_uuid})
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)})['_id'] == 'foo'

  # Retrieving this document using UuidRepresentation.STANDARD returns a native UUID
  std_opts = CodecOptions(uuid_representation=UuidRepresentation.STANDARD)
  std_collection = client.testdb.get_collection('test', codec_options=std_opts)
  doc = std_collection.find_one({'_id': 'foo'})
  assert doc['uuid'] == input_uuid

  # Round-tripping the retrieved document silently changes the Binary subtype to 4
  std_collection.replace_one({'_id': 'foo'}, doc)
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
  round_tripped_doc = collection.find_one({'uuid': Binary(input_uuid.bytes, 4)})
  assert doc == round_tripped_doc


In this example, round-tripping the document using the incorrect
:class:`~bson.binary.UuidRepresentation` (``STANDARD`` instead of
``PYTHON_LEGACY``) changes the :class:`~bson.binary.Binary` subtype as a
side-effect.
**Note that this can also happen when the situation is reversed -
i.e. when the original document is written using ``STANDARD`` representation
and then round-tripped using the ``PYTHON_LEGACY`` representation.**

In the next example, we see the consequences of incorrectly using a
representation that modifies byte-order (``CSHARP_LEGACY`` or ``JAVA_LEGACY``)
when round-tripping documents::

  from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
  from bson.binary import Binary, UuidRepresentation
  from uuid import uuid4

  # Using UuidRepresentation.STANDARD stores a Binary subtype-4 UUID
  std_opts = CodecOptions(uuid_representation=UuidRepresentation.STANDARD)
  input_uuid = uuid4()
  collection = client.testdb.get_collection('test', codec_options=std_opts)
  collection.insert_one({'_id': 'baz', 'uuid': input_uuid})
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)})['_id'] == 'baz'

  # Retrieving this document using UuidRepresentation.JAVA_LEGACY returns a native UUID
  # without modifying the UUID byte-order
  java_opts = CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY)
  java_collection = client.testdb.get_collection('test', codec_options=java_opts)
  doc = java_collection.find_one({'_id': 'baz'})
  assert doc['uuid'] == input_uuid

  # Round-tripping the retrieved document silently changes the Binary bytes and subtype
  java_collection.replace_one({'_id': 'baz'}, doc)
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
  assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)}) is None
  round_tripped_doc = collection.find_one({'_id': 'baz'})
  assert round_tripped_doc['uuid'] == Binary(input_uuid.bytes, 3).as_uuid(UuidRepresentation.JAVA_LEGACY)


In this case, using the incorrect :class:`~bson.binary.UuidRepresentation`
(``JAVA_LEGACY`` instead of ``STANDARD``) changes the
:class:`~bson.binary.Binary` bytes and subtype as a side-effect.
**Note that this happens when any representation that
manipulates byte-order (``CSHARP_LEGACY`` or ``JAVA_LEGACY``) is incorrectly
used to round-trip UUIDs written with ``STANDARD``. When the situation is
reversed - i.e. when the original document is written using ``CSHARP_LEGACY``
or ``JAVA_LEGACY`` and then round-tripped using ``STANDARD`` -
only the :class:`~bson.binary.Binary` subtype is changed.**

.. note:: Starting in PyMongo 4.0, these issues will be resolved as
   the ``STANDARD`` representation will decode Binary subtype 3 fields as
   :class:`~bson.binary.Binary` objects of subtype 3 (instead of
   :class:`uuid.UUID`), and each of the ``*_LEGACY`` representations will
   decode Binary subtype 4 fields to :class:`~bson.binary.Binary` objects of
   subtype 4 (instead of :class:`uuid.UUID`).

.. _configuring-uuid-representation:

Configuring a UUID Representation
---------------------------------

Users can work around the problems described above by configuring their
applications with the appropriate :class:`~bson.binary.UuidRepresentation`.
Configuring the representation modifies PyMongo's behavior while
encoding :class:`uuid.UUID` objects to BSON and decoding
Binary subtype 3 and 4 fields from BSON.

Applications can set the UUID representation in one of the following ways:

#. At the ``MongoClient`` level using the ``uuidRepresentation`` URI option,
   e.g.::

     client = MongoClient("mongodb://a:27107/?uuidRepresentation=javaLegacy")

   Valid values are:

   .. list-table::
      :header-rows: 1

      * - Value
        - UUID Representation

      * - ``pythonLegacy``
        - :ref:`python-legacy-representation-details`

      * - ``javaLegacy``
        - :ref:`java-legacy-representation-details`

      * - ``csharpLegacy``
        - :ref:`csharp-legacy-representation-details`

      * - ``standard``
        - :ref:`standard-representation-details`

      * - ``unspecified``
        - :ref:`unspecified-representation-details`

#. At the ``MongoClient`` level using the ``uuidRepresentation`` kwarg
   option, e.g.::

     # the kwarg takes the same string values as the URI option above
     client = MongoClient(uuidRepresentation='pythonLegacy')

#. At the ``Database`` or ``Collection`` level by supplying a suitable
   :class:`~bson.codec_options.CodecOptions` instance, e.g.::

     from bson.binary import UuidRepresentation
     from bson.codec_options import CodecOptions
     csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
     java_opts = CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY)

     # Get database/collection from client with csharpLegacy UUID representation
     csharp_database = client.get_database('csharp_db', codec_options=csharp_opts)
     csharp_collection = client.testdb.get_collection('csharp_coll', codec_options=csharp_opts)

     # Get database/collection from existing database/collection with javaLegacy UUID representation
     java_database = csharp_database.with_options(codec_options=java_opts)
     java_collection = csharp_collection.with_options(codec_options=java_opts)

Supported UUID Representations
------------------------------

.. list-table::
   :header-rows: 1

   * - UUID Representation
     - Default?
     - Encode :class:`uuid.UUID` to
     - Decode :class:`~bson.binary.Binary` subtype 4 to
     - Decode :class:`~bson.binary.Binary` subtype 3 to

   * - :ref:`python-legacy-representation-details`
     - Yes, in PyMongo>=2.9,<4
     - :class:`~bson.binary.Binary` subtype 3 with standard byte-order
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 4 in PyMongo>=4
     - :class:`uuid.UUID`

   * - :ref:`java-legacy-representation-details`
     - No
     - :class:`~bson.binary.Binary` subtype 3 with Java legacy byte-order
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 4 in PyMongo>=4
     - :class:`uuid.UUID`

   * - :ref:`csharp-legacy-representation-details`
     - No
     - :class:`~bson.binary.Binary` subtype 3 with C# legacy byte-order
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 4 in PyMongo>=4
     - :class:`uuid.UUID`

   * - :ref:`standard-representation-details`
     - No
     - :class:`~bson.binary.Binary` subtype 4
     - :class:`uuid.UUID`
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 3 in PyMongo>=4

   * - :ref:`unspecified-representation-details`
     - Yes, in PyMongo>=4
     - Raise :exc:`ValueError`
     - :class:`~bson.binary.Binary` subtype 4
     - :class:`uuid.UUID` in PyMongo<4; :class:`~bson.binary.Binary` subtype 3 in PyMongo>=4

We now detail the behavior and use-case for each supported UUID
representation.

.. _python-legacy-representation-details:

``PYTHON_LEGACY``
^^^^^^^^^^^^^^^^^

.. attention:: This UUID representation should be used when reading UUIDs
   generated by existing applications that use the Python driver
   but **don't** explicitly set a UUID representation.

.. attention:: :data:`~bson.binary.UuidRepresentation.PYTHON_LEGACY`
   has been the default UUID representation since PyMongo 2.9.

The :data:`~bson.binary.UuidRepresentation.PYTHON_LEGACY` representation
corresponds to the legacy representation of UUIDs used by PyMongo. This
representation conforms with
`RFC 4122 Section 4.1.2 <https://tools.ietf.org/html/rfc4122#section-4.1.2>`_.

The following example illustrates the use of this representation::

  from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
  from bson.binary import UuidRepresentation
  from uuid import uuid4

  # No configured UUID representation
  collection = client.python_legacy.get_collection('test', codec_options=DEFAULT_CODEC_OPTIONS)

  # Using UuidRepresentation.PYTHON_LEGACY
  pylegacy_opts = CodecOptions(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
  pylegacy_collection = client.python_legacy.get_collection('test', codec_options=pylegacy_opts)

  # UUIDs written by PyMongo with no UuidRepresentation configured can be queried using PYTHON_LEGACY
  uuid_1 = uuid4()
  collection.insert_one({'uuid': uuid_1})
  document = pylegacy_collection.find_one({'uuid': uuid_1})

  # UUIDs written using PYTHON_LEGACY can be read by PyMongo with no UuidRepresentation configured
  uuid_2 = uuid4()
  pylegacy_collection.insert_one({'uuid': uuid_2})
  document = collection.find_one({'uuid': uuid_2})

``PYTHON_LEGACY`` encodes native :class:`uuid.UUID` objects to
:class:`~bson.binary.Binary` subtype 3 objects, preserving the same
byte-order as :attr:`~uuid.UUID.bytes`::

  from bson.binary import Binary

  document = collection.find_one({'uuid': Binary(uuid_2.bytes, subtype=3)})
  assert document['uuid'] == uuid_2

.. _java-legacy-representation-details:

``JAVA_LEGACY``
^^^^^^^^^^^^^^^

.. attention:: This UUID representation should be used when reading UUIDs
   written to MongoDB by legacy applications (i.e. applications that don't
   use the ``STANDARD`` representation) using the Java driver.

The :data:`~bson.binary.UuidRepresentation.JAVA_LEGACY` representation
corresponds to the legacy representation of UUIDs used by the MongoDB Java
Driver.

.. note:: The ``JAVA_LEGACY`` representation reverses the order of bytes 0-7,
   and bytes 8-15.

As an example, consider the same UUID described in :ref:`example-legacy-uuid`.
Let us assume that an application used the Java driver without an explicitly
specified UUID representation to insert the example UUID
``00112233-4455-6677-8899-aabbccddeeff`` into MongoDB. If we try to read this
value using PyMongo with no UUID representation specified, we end up with an
entirely different UUID::

  UUID('77665544-3322-1100-ffee-ddccbbaa9988')

However, if we explicitly set the representation to
:data:`~bson.binary.UuidRepresentation.JAVA_LEGACY`, we get the correct result::

  UUID('00112233-4455-6677-8899-aabbccddeeff')

PyMongo uses the specified UUID representation to reorder the BSON bytes and
load them correctly. ``JAVA_LEGACY`` encodes native :class:`uuid.UUID` objects
to :class:`~bson.binary.Binary` subtype 3 objects, while performing the same
byte-reordering as the legacy Java driver's UUID to BSON encoder.

.. _csharp-legacy-representation-details:

``CSHARP_LEGACY``
^^^^^^^^^^^^^^^^^

.. attention:: This UUID representation should be used when reading UUIDs
   written to MongoDB by legacy applications (i.e. applications that don't
   use the ``STANDARD`` representation) using the C# driver.

The :data:`~bson.binary.UuidRepresentation.CSHARP_LEGACY` representation
corresponds to the legacy representation of UUIDs used by the MongoDB C#
Driver.

.. note:: The ``CSHARP_LEGACY`` representation reverses the order of bytes 0-3,
   bytes 4-5, and bytes 6-7.

As an example, consider the same UUID described in :ref:`example-legacy-uuid`.
Let us assume that an application used the C# driver without an explicitly
specified UUID representation to insert the example UUID
``00112233-4455-6677-8899-aabbccddeeff`` into MongoDB. If we try to read this
value using PyMongo with no UUID representation specified, we end up with an
entirely different UUID::

  UUID('33221100-5544-7766-8899-aabbccddeeff')

However, if we explicitly set the representation to
:data:`~bson.binary.UuidRepresentation.CSHARP_LEGACY`, we get the correct result::

  UUID('00112233-4455-6677-8899-aabbccddeeff')

PyMongo uses the specified UUID representation to reorder the BSON bytes and
load them correctly. ``CSHARP_LEGACY`` encodes native :class:`uuid.UUID`
objects to :class:`~bson.binary.Binary` subtype 3 objects, while performing
the same byte-reordering as the legacy C# driver's UUID to BSON encoder.

.. _standard-representation-details:

``STANDARD``
^^^^^^^^^^^^

.. attention:: This UUID representation should be used by new applications
   that have never stored UUIDs in MongoDB.

The :data:`~bson.binary.UuidRepresentation.STANDARD` representation
enables cross-language compatibility by ensuring the same byte-ordering
when encoding UUIDs from all drivers. UUIDs written by a driver with this
representation configured will be handled correctly by every other driver,
provided it is also configured with the ``STANDARD`` representation.

``STANDARD`` encodes native :class:`uuid.UUID` objects to
:class:`~bson.binary.Binary` subtype 4 objects.

.. _unspecified-representation-details:

``UNSPECIFIED``
^^^^^^^^^^^^^^^

.. attention:: Starting in PyMongo 4.0,
   :data:`~bson.binary.UuidRepresentation.UNSPECIFIED` will be the default
   UUID representation used by PyMongo.

The :data:`~bson.binary.UuidRepresentation.UNSPECIFIED` representation
prevents the incorrect interpretation of UUID bytes by stopping short of
automatically converting UUID fields in BSON to native UUID types. Loading
a UUID when using this representation returns a :class:`~bson.binary.Binary`
object instead. If required, users can coerce the decoded
:class:`~bson.binary.Binary` objects into native UUIDs using the
:meth:`~bson.binary.Binary.as_uuid` method and specifying the appropriate
representation format. The following example shows
what this might look like for a UUID stored by the C# driver::

  from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
  from bson.binary import Binary, UuidRepresentation
  from uuid import uuid4

  # Using UuidRepresentation.CSHARP_LEGACY
  csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)

  # Store a legacy C#-formatted UUID
  input_uuid = uuid4()
  collection = client.testdb.get_collection('test', codec_options=csharp_opts)
  collection.insert_one({'_id': 'foo', 'uuid': input_uuid})

  # Using UuidRepresentation.UNSPECIFIED
  unspec_opts = CodecOptions(uuid_representation=UuidRepresentation.UNSPECIFIED)
  unspec_collection = client.testdb.get_collection('test', codec_options=unspec_opts)

  # UUID fields are decoded as Binary when UuidRepresentation.UNSPECIFIED is configured
  document = unspec_collection.find_one({'_id': 'foo'})
  decoded_field = document['uuid']
  assert isinstance(decoded_field, Binary)

  # Binary.as_uuid() can be used to coerce the decoded value to a native UUID
  decoded_uuid = decoded_field.as_uuid(UuidRepresentation.CSHARP_LEGACY)
  assert decoded_uuid == input_uuid

Native :class:`uuid.UUID` objects cannot directly be encoded to
:class:`~bson.binary.Binary` when the UUID representation is ``UNSPECIFIED``
and attempting to do so will result in an exception::

  unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()})
  Traceback (most recent call last):
  ...
  ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured. See the documentation for UuidRepresentation for more information.

Instead, applications using :data:`~bson.binary.UuidRepresentation.UNSPECIFIED`
must explicitly coerce a native UUID using the
:meth:`~bson.binary.Binary.from_uuid` method::

  explicit_binary = Binary.from_uuid(uuid4(), UuidRepresentation.PYTHON_LEGACY)
  unspec_collection.insert_one({'_id': 'bar', 'uuid': explicit_binary})