Migrating to the new client-based S3 API
========================================

Versions of smart_open prior to 5.0.0 used the boto3 `resource API`_ for communicating with S3.
This API was easy to integrate for smart_open developers, but this came at a cost: it was not thread- or multiprocess-safe.
Furthermore, as smart_open supported more and more options, the transport parameter list grew, making it less maintainable.

Starting with version 5.0.0, smart_open uses the `client API`_ instead of the resource API.
Functionally, very little changes for the smart_open user.
The only difference is in passing transport parameters to the S3 backend.

More specifically, the following S3 transport parameters are no longer supported:

- ``multipart_upload_kwargs``
- ``object_kwargs``
- ``resource``
- ``resource_kwargs``
- ``session``
- ``singlepart_upload_kwargs``

**If you weren't using the above parameters, nothing changes for you.**

However, if you were using any of the above, then you need to adjust your code.
Below are some quick recipes.

If you were previously passing ``session``, then construct an S3 client from the session and pass that instead.
For example, before:

.. code-block:: python

    smart_open.open('s3://bucket/key', transport_params={'session': session})

After:

.. code-block:: python

    smart_open.open('s3://bucket/key', transport_params={'client': session.client('s3')})

If you were passing ``resource``, then replace the resource with a client, and pass that instead.
For example, before:

.. code-block:: python

    resource = session.resource('s3', **resource_kwargs)
    smart_open.open('s3://bucket/key', transport_params={'resource': resource})

After:

.. code-block:: python

    client = session.client('s3')
    smart_open.open('s3://bucket/key', transport_params={'client': client})

If you were passing any of the ``*_kwargs`` parameters, you will need to include them in ``client_kwargs``, keeping in mind the following transformations.

============================  ======================================  ====================================
Parameter name                Resource API method                     Client API function
============================  ======================================  ====================================
``multipart_upload_kwargs``   `S3.Object.initiate_multipart_upload`_  `S3.Client.create_multipart_upload`_
``object_kwargs``             `S3.Object.get`_                        `S3.Client.get_object`_
``resource_kwargs``           S3.resource                             `S3.client`_
``singlepart_upload_kwargs``  `S3.Object.put`_                        `S3.Client.put_object`_
============================  ======================================  ====================================

Most of the above is self-explanatory, with the exception of ``resource_kwargs``.
These were previously used mostly for passing a custom endpoint URL.

The ``client_kwargs`` dict can thus contain the following members:

- ``S3.Client``: initializer parameters, e.g. those to pass directly to the ``boto3.client`` function, such as ``endpoint_url``.
- ``S3.Client.create_multipart_upload``
- ``S3.Client.get_object``
- ``S3.Client.put_object``

Here's a before-and-after example for connecting to a custom endpoint. Before:

.. code-block:: python

    import boto3
    from smart_open import open

    session = boto3.Session(profile_name='digitalocean')
    resource_kwargs = {'endpoint_url': 'https://ams3.digitaloceanspaces.com'}
    tp = {'session': session, 'resource_kwargs': resource_kwargs}
    with open('s3://bucket/key.txt', 'wb', transport_params=tp) as fout:
        fout.write(b'here we stand')

After:

.. code-block:: python

    import boto3
    from smart_open import open

    session = boto3.Session(profile_name='digitalocean')
    client = session.client('s3', endpoint_url='https://ams3.digitaloceanspaces.com')
    with open('s3://bucket/key.txt', 'wb', transport_params={'client': client}) as fout:
        fout.write(b'here we stand')

See `README <README.rst>`_ and `HOWTO <howto.md>`_ for more examples.

.. _resource API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#service-resource
.. _S3.Object.initiate_multipart_upload: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.initiate_multipart_upload
.. _S3.Object.get: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.ObjectSummary.get
.. _S3.Object.put: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.ObjectSummary.put

.. _client API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#client
.. _S3.Client: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#client
.. _S3.Client.create_multipart_upload: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.create_multipart_upload
.. _S3.Client.get_object: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object
.. _S3.Client.put_object: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_object

Migrating to the new dependency management subsystem
====================================================

Smart_open has grown over the years to cover a lot of different storages, each with a different set of library dependencies.
Not everybody needs *all* of them, so to make each smart_open installation leaner and faster, version 3.0.0 introduced a new, backward-incompatible installation method:

* smart_open < 3.0.0: All dependencies were installed by default. No way to select just a subset during installation.
* smart_open >= 3.0.0: No dependencies installed by default. Install the ones you need with e.g. ``pip install smart_open[s3]`` (only AWS), or ``smart_open[all]`` (install everything = same behaviour as < 3.0.0; use this for backward compatibility).

You can read more about the motivation and internal discussions for this change `here <https://github.com/RaRe-Technologies/smart_open/issues/443>`_.

Migrating to the new ``open`` function
======================================

Since 1.8.1, there is a ``smart_open.open`` function that replaces ``smart_open.smart_open``.
The new function offers several advantages over the old one:

- 100% compatible with the built-in ``open`` function (aka ``io.open``): it accepts all
  the parameters that the built-in ``open`` accepts.
- The default open mode is now "r", the same as for the built-in ``open``.
  The default for the old ``smart_open.smart_open`` function used to be "rb".
- Fully documented keyword parameters (try ``help("smart_open.open")``)

The instructions below will help you migrate to the new function painlessly.

First, update your imports:

.. code-block:: python

    >>> from smart_open import smart_open  # before
    >>> from smart_open import open  # after

In general, ``smart_open`` uses ``io.open`` directly, where possible, so if your
code already uses ``open`` for local file I/O, then it will continue to work.
If you want to continue using the built-in ``open`` function for e.g. debugging,
then you can ``import smart_open`` and use ``smart_open.open``.


**The default read mode is now "r" (read text).**
If your code was implicitly relying on the default mode being "rb" (read
binary), you'll need to update it and pass "rb" explicitly.

Before:

.. code-block:: python

    >>> import smart_open
    >>> smart_open.smart_open('s3://commoncrawl/robots.txt').read(32)  # 'rb' used to be the default
    b'User-Agent: *\nDisallow: /'

After:

.. code-block:: python

    >>> import smart_open
    >>> smart_open.open('s3://commoncrawl/robots.txt', 'rb').read(32)
    b'User-Agent: *\nDisallow: /'

The ``ignore_extension`` keyword parameter is now called ``ignore_ext``.
It behaves identically otherwise.

The most significant change is in the handling of keyword parameters for the
transport layer, e.g. HTTP, S3, etc. The old function accepted these directly:

.. code-block:: python

    >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
    >>> session = boto3.Session(profile_name='smart_open')
    >>> smart_open.smart_open(url, 'r', session=session).read(32)
    'first line\nsecond line\nthird lin'

The new function accepts a ``transport_params`` keyword argument. It's a dict.
Put your transport parameters in that dictionary.

.. code-block:: python

    >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
    >>> params = {'session': boto3.Session(profile_name='smart_open')}
    >>> open(url, 'r', transport_params=params).read(32)
    'first line\nsecond line\nthird lin'

Renamed parameters:

- ``s3_upload`` -> ``multipart_upload_kwargs``
- ``s3_session`` -> ``session``

Removed parameters:

- ``profile_name``

**The profile_name parameter has been removed.**
Pass an entire ``boto3.Session`` object instead.

Before:

.. code-block:: python

    >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
    >>> smart_open.smart_open(url, 'r', profile_name='smart_open').read(32)
    'first line\nsecond line\nthird lin'

After:

.. code-block:: python

    >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
    >>> params = {'session': boto3.Session(profile_name='smart_open')}
    >>> open(url, 'r', transport_params=params).read(32)
    'first line\nsecond line\nthird lin'

See ``help("smart_open.open")`` for the full list of acceptable parameter names,
or view the help online `here <https://github.com/RaRe-Technologies/smart_open/blob/master/help.txt>`__.

If you pass an invalid parameter name, the ``smart_open.open`` function will warn you about it.
Keep an eye on your logs for WARNING messages from ``smart_open``.
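
One way to keep an eye on those warnings during a migration is to collect them in memory and inspect them in a test.
The sketch below is a minimal illustration, not part of smart_open. It assumes the library logs through the standard ``logging`` module under logger names that start with ``smart_open``.
The ``ListHandler`` class, the ``smart_open.example`` logger name, and the warning text are hypothetical stand-ins, used so that the snippet runs without any S3 access.

.. code-block:: python

    import logging


    class ListHandler(logging.Handler):
        """Hypothetical helper: keep WARNING-and-above records in a list."""

        def __init__(self):
            super().__init__(level=logging.WARNING)
            self.records = []

        def emit(self, record):
            self.records.append(record)


    # Attach the handler to the parent logger. Records from child loggers
    # (assumed to be named like "smart_open.<module>") propagate up to it.
    handler = ListHandler()
    logging.getLogger('smart_open').addHandler(handler)

    # Stand-in for a warning the library might emit; the logger name and
    # the message text are made up for this example.
    logging.getLogger('smart_open.example').warning(
        'ignoring unsupported parameter: %r', 'sesion',
    )

    messages = [record.getMessage() for record in handler.records]

In a real migration, you would replace the stand-in ``warning`` call with your own ``smart_open.open`` calls and fail the test if ``messages`` is not empty.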