High Availability and PyMongo
=============================

PyMongo makes it easy to write highly available applications whether
you use a `single replica set <http://dochub.mongodb.org/core/rs>`_
or a `large sharded cluster
<http://www.mongodb.org/display/DOCS/Sharding+Introduction>`_.

Connecting to a Replica Set
---------------------------

PyMongo makes working with `replica sets
<http://dochub.mongodb.org/core/rs>`_ easy. Here we'll launch a new
replica set and show how to handle both initialization and normal
connections with PyMongo.

.. mongodoc:: rs

Starting a Replica Set
~~~~~~~~~~~~~~~~~~~~~~

The main `replica set documentation
<http://dochub.mongodb.org/core/rs>`_ contains extensive information
about setting up a new replica set or migrating an existing MongoDB
setup; be sure to check that out. Here, we'll just do the bare minimum
to get a three-node replica set set up locally.

.. warning:: Replica sets should always use multiple nodes in
   production - putting all set members on the same physical node is
   only recommended for testing and development.

We start three ``mongod`` processes, each on a different port and with
a different dbpath, but all using the same replica set name "foo".

.. code-block:: bash

  $ mkdir -p /data/db0 /data/db1 /data/db2
  $ mongod --port 27017 --dbpath /data/db0 --replSet foo

.. code-block:: bash

  $ mongod --port 27018 --dbpath /data/db1 --replSet foo

.. code-block:: bash

  $ mongod --port 27019 --dbpath /data/db2 --replSet foo

Initializing the Set
~~~~~~~~~~~~~~~~~~~~

At this point all of our nodes are up and running, but the set has yet
to be initialized. Until the set is initialized no node will become
the primary, and things are essentially "offline".
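The three ``mongod`` invocations above differ only in port and data
directory. As a minimal sketch, a small helper can generate them for a local
test set. The helper name ``mongod_commands`` and its parameters are
hypothetical and are not part of PyMongo or MongoDB; it only builds strings
and does not start any process.

```python
def mongod_commands(set_name, base_port=27017, count=3, data_root="/data"):
    """Build mongod command lines for a local, test-only replica set.

    Hypothetical helper: member i gets port base_port + i and the
    dbpath <data_root>/db<i>, mirroring the commands shown above.
    """
    return [
        "mongod --port {port} --dbpath {root}/db{i} --replSet {name}".format(
            port=base_port + i, root=data_root, i=i, name=set_name)
        for i in range(count)
    ]


# Print one command per member of the set named "foo".
for command in mongod_commands("foo"):
    print(command)
```

Each printed line is meant to be run in its own terminal, after creating the
data directories.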

To initialize the set we need to connect to a single node and run the
initiate command::

  >>> from pymongo import MongoClient
  >>> c = MongoClient('localhost', 27017)

.. note:: We could have connected to any of the other nodes instead,
   but only the node we initiate from is allowed to contain any
   initial data.

After connecting, we run the initiate command to get things started::

  >>> config = {'_id': 'foo', 'members': [
  ...     {'_id': 0, 'host': 'localhost:27017'},
  ...     {'_id': 1, 'host': 'localhost:27018'},
  ...     {'_id': 2, 'host': 'localhost:27019'}]}
  >>> c.admin.command("replSetInitiate", config)
  {'ok': 1.0, ...}

The three ``mongod`` servers we started earlier will now coordinate
and come online as a replica set.

Connecting to a Replica Set
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The initial connection as made above is a special case for an
uninitialized replica set. Normally we'll want to connect
differently. A connection to a replica set can be made using the
:meth:`~pymongo.mongo_client.MongoClient` constructor, specifying
one or more members of the set, along with the replica set name. Any of
the following connects to the replica set we just created::

  >>> MongoClient('localhost', replicaset='foo')
  MongoClient(host=['localhost:27017'], replicaset='foo', ...)
  >>> MongoClient('localhost:27018', replicaset='foo')
  MongoClient(['localhost:27018'], replicaset='foo', ...)
  >>> MongoClient('localhost', 27019, replicaset='foo')
  MongoClient(['localhost:27019'], replicaset='foo', ...)
  >>> MongoClient('mongodb://localhost:27017,localhost:27018/?replicaSet=foo')
  MongoClient(['localhost:27017', 'localhost:27018'], replicaset='foo', ...)

The addresses passed to :meth:`~pymongo.mongo_client.MongoClient` are called
the *seeds*.
As long as at least one of the seeds is online, MongoClient
discovers all the members in the replica set, and determines which is the
current primary and which are secondaries or arbiters. Each seed must be the
address of a single mongod. Multihomed and round robin DNS addresses are
**not** supported.

The :class:`~pymongo.mongo_client.MongoClient` constructor is non-blocking:
the constructor returns immediately while the client connects to the replica
set using background threads. Note how, if you create a client and immediately
print the string representation of its
:attr:`~pymongo.mongo_client.MongoClient.nodes` attribute, the list may be
empty initially. If you wait a moment, MongoClient discovers the whole replica
set::

  >>> from time import sleep
  >>> c = MongoClient(replicaset='foo'); print(c.nodes); sleep(0.1); print(c.nodes)
  frozenset([])
  frozenset([(u'localhost', 27019), (u'localhost', 27017), (u'localhost', 27018)])

You need not wait for replica set discovery in your application, however.
If you need to do any operation with a MongoClient, such as a
:meth:`~pymongo.collection.Collection.find` or an
:meth:`~pymongo.collection.Collection.insert_one`, the client waits to discover
a suitable member before it attempts the operation.

Handling Failover
~~~~~~~~~~~~~~~~~

When a failover occurs, PyMongo will automatically attempt to find the
new primary node and perform subsequent operations on that node. This
can't happen completely transparently, however. Here we'll perform an
example failover to illustrate how everything behaves.
First, we'll
connect to the replica set and perform a couple of basic operations::

  >>> db = MongoClient("localhost", replicaSet='foo').test
  >>> db.test.insert_one({"x": 1}).inserted_id
  ObjectId('...')
  >>> db.test.find_one()
  {u'x': 1, u'_id': ObjectId('...')}

By checking the host and port, we can see that we're connected to
*localhost:27017*, which is the current primary::

  >>> db.client.address
  ('localhost', 27017)

Now let's bring down that node and see what happens when we run our
query again::

  >>> db.test.find_one()
  Traceback (most recent call last):
  pymongo.errors.AutoReconnect: ...

We get an :class:`~pymongo.errors.AutoReconnect` exception. This means
that the driver was not able to connect to the old primary (which
makes sense, as we killed the server), but that it will attempt to
automatically reconnect on subsequent operations. When this exception
is raised our application code needs to decide whether to retry the
operation or to simply continue, accepting the fact that the operation
might have failed.

On subsequent attempts to run the query we might continue to see this
exception. Eventually, however, the replica set will fail over and
elect a new primary (this should take no more than a couple of seconds in
general). At that point the driver will connect to the new primary and
the operation will succeed::

  >>> db.test.find_one()
  {u'x': 1, u'_id': ObjectId('...')}
  >>> db.client.address
  ('localhost', 27018)

Bring the former primary back up. It will rejoin the set as a secondary.
Now we can move to the next section: distributing reads to secondaries.

.. _secondary-reads:

Secondary Reads
~~~~~~~~~~~~~~~

By default an instance of MongoClient sends queries to
the primary member of the replica set.
To use secondaries for queries
we have to change the read preference::

  >>> client = MongoClient(
  ...     'localhost:27017',
  ...     replicaSet='foo',
  ...     readPreference='secondaryPreferred')
  >>> client.read_preference
  SecondaryPreferred(tag_sets=None)

Now all queries will be sent to the secondary members of the set. If there are
no secondary members the primary will be used as a fallback. If you have
queries you would prefer to never send to the primary you can specify that
using the ``secondary`` read preference.

By default the read preference of a :class:`~pymongo.database.Database` is
inherited from its MongoClient, and the read preference of a
:class:`~pymongo.collection.Collection` is inherited from its Database. To use
a different read preference use the
:meth:`~pymongo.mongo_client.MongoClient.get_database` method, or the
:meth:`~pymongo.database.Database.get_collection` method::

  >>> from pymongo import ReadPreference
  >>> client.read_preference
  SecondaryPreferred(tag_sets=None)
  >>> db = client.get_database('test', read_preference=ReadPreference.SECONDARY)
  >>> db.read_preference
  Secondary(tag_sets=None)
  >>> coll = db.get_collection('test', read_preference=ReadPreference.PRIMARY)
  >>> coll.read_preference
  Primary()

You can also change the read preference of an existing
:class:`~pymongo.collection.Collection` with the
:meth:`~pymongo.collection.Collection.with_options` method::

  >>> coll2 = coll.with_options(read_preference=ReadPreference.NEAREST)
  >>> coll.read_preference
  Primary()
  >>> coll2.read_preference
  Nearest(tag_sets=None)

Note that since most database commands can only be sent to the primary of a
replica set, the :meth:`~pymongo.database.Database.command` method does not obey
the Database's :attr:`~pymongo.database.Database.read_preference`, but you can
pass an explicit read preference to the method::

  >>> db.command('dbstats', read_preference=ReadPreference.NEAREST)
  {...}

Reads are configured using three options: **read preference**, **tag sets**,
and **local threshold**.

**Read preference**:

Read preference is configured using one of the classes from
:mod:`~pymongo.read_preferences` (:class:`~pymongo.read_preferences.Primary`,
:class:`~pymongo.read_preferences.PrimaryPreferred`,
:class:`~pymongo.read_preferences.Secondary`,
:class:`~pymongo.read_preferences.SecondaryPreferred`, or
:class:`~pymongo.read_preferences.Nearest`). For convenience, we also provide
:class:`~pymongo.read_preferences.ReadPreference` with the following
attributes:

- ``PRIMARY``: Read from the primary. This is the default read preference,
  and provides the strongest consistency. If no primary is available, raise
  :class:`~pymongo.errors.AutoReconnect`.

- ``PRIMARY_PREFERRED``: Read from the primary if available, otherwise read
  from a secondary.

- ``SECONDARY``: Read from a secondary. If no matching secondary is available,
  raise :class:`~pymongo.errors.AutoReconnect`.

- ``SECONDARY_PREFERRED``: Read from a secondary if available, otherwise
  from the primary.

- ``NEAREST``: Read from any available member.

**Tag sets**:

Replica-set members can be `tagged
<http://www.mongodb.org/display/DOCS/Data+Center+Awareness>`_ according to any
criteria you choose. By default, PyMongo ignores tags when
choosing a member to read from, but your read preference can be configured with
a ``tag_sets`` parameter. ``tag_sets`` must be a list of dictionaries, each
dict providing tag values that the replica set member must match.
PyMongo tries each set of tags in turn until it finds a set of
tags with at least one matching member.
For example, to prefer reads from the
New York data center, but fall back to the San Francisco data center, tag your
replica set members according to their location and configure the read
preference like so::

  >>> from pymongo.read_preferences import Secondary
  >>> db = client.get_database(
  ...     'test', read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}]))
  >>> db.read_preference
  Secondary(tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}])

MongoClient tries to find secondaries in New York, then San Francisco,
and raises :class:`~pymongo.errors.AutoReconnect` if none are available. As an
additional fallback, specify a final, empty tag set, ``{}``, which means "read
from any member that matches the mode, ignoring tags."

See :mod:`~pymongo.read_preferences` for more information.

.. _distributes reads to secondaries:

**Local threshold**:

If multiple members match the read preference and tag sets, PyMongo reads
from among the nearest members, chosen according to ping time. By default,
only members whose ping times are within 15 milliseconds of the nearest
are used for queries. You can choose to distribute reads among members with
higher latencies by setting ``localThresholdMS`` to a larger
number::

  >>> client = MongoClient(
  ...     replicaSet='repl0',
  ...     readPreference='secondaryPreferred',
  ...     localThresholdMS=35)

In this case, PyMongo distributes reads among matching members within 35
milliseconds of the closest member's ping time.

.. note:: ``localThresholdMS`` is ignored when talking to a
  replica set *through* a mongos. The equivalent is the localThreshold_ command
  line option.

.. _localThreshold: http://docs.mongodb.org/manual/reference/mongos/#cmdoption--localThreshold

.. _health-monitoring:

Health Monitoring
'''''''''''''''''

When MongoClient is initialized it launches background threads to
monitor the replica set for changes in:

* Health: detect when a member goes down or comes up, or if a different member
  becomes primary
* Configuration: detect when members are added or removed, and detect changes
  in members' tags
* Latency: track a moving average of each member's ping time

Replica-set monitoring ensures queries are continually routed to the proper
members as the state of the replica set changes.

.. _mongos-load-balancing:

mongos Load Balancing
---------------------

An instance of :class:`~pymongo.mongo_client.MongoClient` can be configured
with a list of addresses of mongos servers:

  >>> client = MongoClient('mongodb://host1,host2,host3')

Each member of the list must be a single mongos server. Multihomed and round
robin DNS addresses are **not** supported. The client continuously
monitors all the mongoses' availability, and its network latency to each.

PyMongo distributes operations evenly among the set of mongoses within its
``localThresholdMS`` (similar to how it `distributes reads to secondaries`_
in a replica set). By default the threshold is 15 ms.

The lowest-latency server, and all servers with latencies no more than
``localThresholdMS`` beyond the lowest-latency server's, receive
operations equally. For example, if we have three mongoses:

 - host1: 20 ms
 - host2: 35 ms
 - host3: 40 ms

By default the ``localThresholdMS`` is 15 ms, so PyMongo uses host1 and host2
evenly. It uses host1 because its network latency to the driver is shortest. It
uses host2 because its latency is within 15 ms of the lowest-latency server's.
But it excludes host3: host3 is 20 ms beyond the lowest-latency server.
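The selection rule described above can be sketched in a few lines of plain
Python. This is only an illustration of the arithmetic, not PyMongo's actual
implementation; the function name ``servers_within_threshold`` and the
latency dictionary are assumptions made for the example.

```python
def servers_within_threshold(latencies_ms, local_threshold_ms=15):
    """Return the hosts eligible to receive operations, sorted by name.

    Illustrative only (not PyMongo's code): a host is eligible when its
    latency is no more than local_threshold_ms above the lowest latency.
    """
    if not latencies_ms:
        return []
    fastest = min(latencies_ms.values())
    return sorted(
        host for host, latency in latencies_ms.items()
        if latency - fastest <= local_threshold_ms)


latencies = {"host1": 20, "host2": 35, "host3": 40}
print(servers_within_threshold(latencies))      # ['host1', 'host2']
print(servers_within_threshold(latencies, 30))  # ['host1', 'host2', 'host3']
```

With the default threshold host3 is left out because it is 20 ms slower than
host1; widening the threshold to 30 ms makes all three hosts eligible.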

If we set ``localThresholdMS`` to 30 ms all servers are within the threshold:

  >>> client = MongoClient('mongodb://host1,host2,host3/?localThresholdMS=30')

.. warning:: Do **not** connect PyMongo to a pool of mongos instances through a
  load balancer. A single socket connection must always be routed to the same
  mongos instance for proper cursor support.