Migrating to the new client-based S3 API
========================================

Versions of smart_open prior to 5.0.0 used the boto3 `resource API`_ for communicating with S3.
This API was easy for smart_open developers to integrate, but it came at a cost: it was not thread- or multiprocess-safe.
Furthermore, as smart_open supported more and more options, the transport parameter list grew, making it less maintainable.

Starting with version 5.0.0, smart_open uses the `client API`_ instead of the resource API.
Functionally, very little changes for the smart_open user.
The only difference is in how transport parameters are passed to the S3 backend.

More specifically, the following S3 transport parameters are no longer supported:

- ``multipart_upload_kwargs``
- ``object_kwargs``
- ``resource``
- ``resource_kwargs``
- ``session``
- ``singlepart_upload_kwargs``

**If you weren't using the above parameters, nothing changes for you.**
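
If you are not sure whether your code uses any of them, one way to check is to inspect the ``transport_params`` dicts it builds.
The helper below is a minimal, hypothetical sketch (``find_removed_params`` and ``REMOVED_S3_PARAMS`` are illustrative names, not part of smart_open): it simply intersects a dict's keys with the list above.

.. code-block:: python

    # Hypothetical helper, not part of smart_open: report S3 transport
    # parameters that smart_open >= 5.0.0 no longer accepts.
    REMOVED_S3_PARAMS = frozenset({
        'multipart_upload_kwargs',
        'object_kwargs',
        'resource',
        'resource_kwargs',
        'session',
        'singlepart_upload_kwargs',
    })


    def find_removed_params(transport_params):
        """Return the removed parameter names present in ``transport_params``, sorted."""
        return sorted(REMOVED_S3_PARAMS.intersection(transport_params))


    print(find_removed_params({'session': None, 'buffer_size': 1024}))  # ['session']

An empty result means the dict can be passed to smart_open 5.x unchanged.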

However, if you were using any of the above, then you need to adjust your code.
Here are some quick recipes.

If you were previously passing ``session``, then construct an S3 client from the session and pass it as ``client`` instead.
For example, before:

.. code-block:: python

    smart_open.open('s3://bucket/key', transport_params={'session': session})

After:

.. code-block:: python

    smart_open.open('s3://bucket/key', transport_params={'client': session.client('s3')})

If you were passing ``resource``, then replace the resource with a client, and pass that instead.
For example, before:

.. code-block:: python

    resource = session.resource('s3', **resource_kwargs)
    smart_open.open('s3://bucket/key', transport_params={'resource': resource})

After:

.. code-block:: python

    client = session.client('s3', **resource_kwargs)
    smart_open.open('s3://bucket/key', transport_params={'client': client})

If you were passing any of the ``*_kwargs`` parameters, you will need to move them into ``client_kwargs``, using the following mapping.

============================ ====================================== ====================================
Parameter name               Resource API method                    Client API function
============================ ====================================== ====================================
``multipart_upload_kwargs``  `S3.Object.initiate_multipart_upload`_ `S3.Client.create_multipart_upload`_
``object_kwargs``            `S3.Object.get`_                       `S3.Client.get_object`_
``resource_kwargs``          S3.resource                            `S3.Client`_
``singlepart_upload_kwargs`` `S3.Object.put`_                       `S3.Client.put_object`_
============================ ====================================== ====================================
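
The mapping in this table is mechanical, so it can be scripted.
Below is a minimal sketch of a hypothetical ``to_client_kwargs`` helper (``to_client_kwargs`` and ``OLD_TO_NEW`` are illustrative names, not part of smart_open).
It moves each old ``*_kwargs`` dict under the ``client_kwargs`` key named in the last column, merges with any ``client_kwargs`` already present, and leaves ``session`` and ``resource`` untouched, since those must be replaced by a ``client`` object by hand.

.. code-block:: python

    # Hypothetical migration helper, not part of smart_open: old parameter name
    # -> client_kwargs key, following the table above.
    OLD_TO_NEW = {
        'multipart_upload_kwargs': 'S3.Client.create_multipart_upload',
        'object_kwargs': 'S3.Client.get_object',
        'resource_kwargs': 'S3.Client',
        'singlepart_upload_kwargs': 'S3.Client.put_object',
    }


    def to_client_kwargs(old_params):
        """Return a copy of ``old_params`` with the *_kwargs dicts moved into client_kwargs.

        Other parameters (including session and resource) are copied unchanged.
        """
        new_params = {k: v for k, v in old_params.items() if k not in OLD_TO_NEW}
        client_kwargs = dict(new_params.pop('client_kwargs', {}))
        for old_name, new_key in OLD_TO_NEW.items():
            if old_name in old_params:
                client_kwargs[new_key] = dict(old_params[old_name])
        if client_kwargs:
            new_params['client_kwargs'] = client_kwargs
        return new_params


    old_params = {
        'resource_kwargs': {'endpoint_url': 'https://ams3.digitaloceanspaces.com'},
        'multipart_upload_kwargs': {'ServerSideEncryption': 'AES256'},
    }
    print(to_client_kwargs(old_params))

If an old key and an existing ``client_kwargs`` entry collide, this sketch lets the old value win; adjust that choice if your code needs the opposite.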

Most of the above is self-explanatory, with the exception of ``resource_kwargs``.
These were previously used mostly for passing a custom endpoint URL; they now go under the ``S3.Client`` key.

The ``client_kwargs`` dict can thus contain the following keys, each mapping to a dict of keyword arguments:

- ``S3.Client``: initializer parameters, i.e. those to pass directly to the ``boto3.client`` function, such as ``endpoint_url``.
- ``S3.Client.create_multipart_upload``
- ``S3.Client.get_object``
- ``S3.Client.put_object``
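
Putting this together, a ``transport_params`` dict that points the S3 backend at a custom endpoint and requests server-side encryption for both upload paths might look like the sketch below.
The endpoint URL comes from the example in this guide; the encryption settings are illustrative assumptions.
The ``kwargs_for`` function is hypothetical: it only illustrates the idea that each client call receives the dict stored under its fully qualified name, and is not smart_open's actual code.

.. code-block:: python

    # The endpoint URL is the one used in the DigitalOcean example in this guide;
    # the ServerSideEncryption settings are illustrative assumptions.
    client_kwargs = {
        # Used to construct the client, i.e. boto3.client('s3', **...), when no
        # client is passed in transport_params (our reading of smart_open 5.x).
        'S3.Client': {'endpoint_url': 'https://ams3.digitaloceanspaces.com'},
        # Extra keyword arguments for every call to the named client method.
        'S3.Client.create_multipart_upload': {'ServerSideEncryption': 'AES256'},
        'S3.Client.put_object': {'ServerSideEncryption': 'AES256'},
    }
    transport_params = {'client_kwargs': client_kwargs}


    def kwargs_for(client_kwargs, method_name):
        """Hypothetical lookup: extra keyword arguments for one S3.Client method."""
        return client_kwargs.get('S3.Client.' + method_name, {})


    print(kwargs_for(client_kwargs, 'put_object'))  # {'ServerSideEncryption': 'AES256'}
    print(kwargs_for(client_kwargs, 'get_object'))  # {}

Methods without an entry simply get no extra arguments.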

Here's a before-and-after example for connecting to a custom endpoint. Before:

.. code-block:: python

    import boto3
    from smart_open import open

    session = boto3.Session(profile_name='digitalocean')
    resource_kwargs = {'endpoint_url': 'https://ams3.digitaloceanspaces.com'}
    params = {'session': session, 'resource_kwargs': resource_kwargs}
    with open('s3://bucket/key.txt', 'wb', transport_params=params) as fout:
        fout.write(b'here we stand')

After:

.. code-block:: python

    import boto3
    from smart_open import open

    session = boto3.Session(profile_name='digitalocean')
    client = session.client('s3', endpoint_url='https://ams3.digitaloceanspaces.com')
    with open('s3://bucket/key.txt', 'wb', transport_params={'client': client}) as fout:
        fout.write(b'here we stand')

See `README <README.rst>`_ and `HOWTO <howto.md>`_ for more examples.

.. _resource API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#service-resource
.. _S3.Object.initiate_multipart_upload: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.initiate_multipart_upload
.. _S3.Object.get: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.get
.. _S3.Object.put: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.put

.. _client API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#client
.. _S3.Client: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#client
.. _S3.Client.create_multipart_upload: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.create_multipart_upload
.. _S3.Client.get_object: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object
.. _S3.Client.put_object: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_object

Migrating to the new dependency management subsystem
====================================================

smart_open has grown over the years to cover many different storage backends, each with its own set of library dependencies. Not everybody needs *all* of them, so to make each smart_open installation leaner and faster, version 3.0.0 introduced a new, backward-incompatible installation method:

* smart_open < 3.0.0: All dependencies were installed by default. There was no way to select just a subset during installation.
* smart_open >= 3.0.0: No dependencies are installed by default. Install the ones you need with e.g. ``pip install smart_open[s3]`` (only AWS), or ``pip install smart_open[all]`` (install everything, the same behaviour as < 3.0.0; use this for backward compatibility).
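
To see which optional backends are usable in an existing environment, one option is to probe for the third-party packages they rely on.
The sketch below is hypothetical (``EXTRA_TO_MODULE``, ``module_available`` and ``installed_backends`` are illustrative names), and the extra-to-module mapping is an assumption based on the packages we believe each extra pulls in, e.g. ``boto3`` for ``[s3]``; check the ``setup.py`` of your smart_open version before relying on it.

.. code-block:: python

    import importlib.util

    # Assumed mapping from smart_open extras to one module each extra installs.
    # Verify it against the setup.py of the smart_open version you use.
    EXTRA_TO_MODULE = {
        's3': 'boto3',
        'gcs': 'google.cloud.storage',
        'azure': 'azure.storage.blob',
        'http': 'requests',
    }


    def module_available(name):
        """Return True if ``name`` can be found on the import path.

        For dotted names, find_spec imports the parent packages first and raises
        ImportError if one of them (e.g. ``google``) is missing.
        """
        try:
            return importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            return False


    def installed_backends():
        """Map each extra to whether its (assumed) key module is importable."""
        return {extra: module_available(module) for extra, module in EXTRA_TO_MODULE.items()}


    for extra, available in installed_backends().items():
        print(f"smart_open[{extra}]: {'available' if available else 'missing'}")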

You can read more about the motivation and internal discussions for this change `here <https://github.com/RaRe-Technologies/smart_open/issues/443>`_.

Migrating to the new ``open`` function
======================================

Since 1.8.1, there is a ``smart_open.open`` function that replaces ``smart_open.smart_open``.
The new function offers several advantages over the old one:

- 100% compatible with the built-in ``open`` function (aka ``io.open``): it accepts all
  the parameters that the built-in ``open`` accepts.
- The default open mode is now "r", the same as for the built-in ``open``.
  The default for the old ``smart_open.smart_open`` function used to be "rb".
- Fully documented keyword parameters (try ``help("smart_open.open")``).

The instructions below will help you migrate to the new function painlessly.

First, update your imports:

.. code-block:: python

  >>> from smart_open import smart_open  # before
  >>> from smart_open import open  # after

In general, ``smart_open.open`` uses ``io.open`` directly where possible, so if your
code already uses ``open`` for local file I/O, it will continue to work after the new import.
If you want to continue using the built-in ``open`` function for e.g. debugging,
then you can ``import smart_open`` and use ``smart_open.open``.

**The default read mode is now "r" (read text).**
If your code was implicitly relying on the default mode being "rb" (read
binary), you'll need to update it and pass "rb" explicitly.

Before:

.. code-block:: python

  >>> import smart_open
  >>> smart_open.smart_open('s3://commoncrawl/robots.txt').read(32)  # 'rb' used to be the default
  b'User-Agent: *\nDisallow: /'

After:

.. code-block:: python

  >>> import smart_open
  >>> smart_open.open('s3://commoncrawl/robots.txt', 'rb').read(32)
  b'User-Agent: *\nDisallow: /'

The ``ignore_extension`` keyword parameter is now called ``ignore_ext``.
It behaves identically otherwise.

The most significant change is in the handling of keyword parameters for the
transport layer, e.g. HTTP, S3, etc. The old function accepted these directly:

.. code-block:: python

  >>> import boto3
  >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
  >>> session = boto3.Session(profile_name='smart_open')
  >>> smart_open.smart_open(url, 'r', session=session).read(32)
  'first line\nsecond line\nthird lin'

The new function accepts a ``transport_params`` keyword argument: a dict
that holds your transport parameters.

.. code-block:: python

  >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
  >>> params = {'session': boto3.Session(profile_name='smart_open')}
  >>> open(url, 'r', transport_params=params).read(32)
  'first line\nsecond line\nthird lin'

Renamed parameters:

- ``s3_upload`` -> ``multipart_upload_kwargs``
- ``s3_session`` -> ``session``
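
If many call sites pass transport keywords directly to the old function, a small shim can convert them in one place.
The sketch below is hypothetical (``split_legacy_kwargs``, ``OPEN_KWARGS`` and ``RENAMES`` are illustrative names, not part of smart_open) and targets the pre-5.0.0 parameter names used in this section.
It assumes that every keyword other than the built-in ``open`` parameters and ``ignore_ext`` is a transport parameter, applies the renames listed above plus ``ignore_extension`` -> ``ignore_ext``, and rejects ``profile_name``, which has been removed.

.. code-block:: python

    # Hypothetical shim, not part of smart_open.
    # Keywords assumed to belong to open() itself rather than to the transport.
    OPEN_KWARGS = {'buffering', 'encoding', 'errors', 'newline', 'closefd', 'opener', 'ignore_ext'}
    RENAMES = {
        'ignore_extension': 'ignore_ext',
        's3_upload': 'multipart_upload_kwargs',
        's3_session': 'session',
    }


    def split_legacy_kwargs(**kwargs):
        """Split old smart_open.smart_open keywords into (open_kwargs, transport_params)."""
        if 'profile_name' in kwargs:
            raise ValueError('profile_name was removed: pass a boto3.Session as "session" instead')
        open_kwargs, transport_params = {}, {}
        for name, value in kwargs.items():
            name = RENAMES.get(name, name)
            if name in OPEN_KWARGS:
                open_kwargs[name] = value
            else:
                transport_params[name] = value
        return open_kwargs, transport_params


    open_kwargs, params = split_legacy_kwargs(ignore_extension=True, s3_upload={'ACL': 'private'})
    print(open_kwargs)  # {'ignore_ext': True}
    print(params)       # {'multipart_upload_kwargs': {'ACL': 'private'}}

The two dicts can then be combined as ``open(url, mode, transport_params=params, **open_kwargs)``.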

Removed parameters:

- ``profile_name``

**The profile_name parameter has been removed.**
Pass an entire ``boto3.Session`` object instead.

Before:

.. code-block:: python

  >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
  >>> smart_open.smart_open(url, 'r', profile_name='smart_open').read(32)
  'first line\nsecond line\nthird lin'

After:

.. code-block:: python

  >>> url = 's3://smart-open-py37-benchmark-results/test.txt'
  >>> params = {'session': boto3.Session(profile_name='smart_open')}
  >>> open(url, 'r', transport_params=params).read(32)
  'first line\nsecond line\nthird lin'

See ``help("smart_open.open")`` for the full list of acceptable parameter names,
or view the help online `here <https://github.com/RaRe-Technologies/smart_open/blob/master/help.txt>`__.

If you pass an invalid parameter name, the ``smart_open.open`` function will warn you about it.
Keep an eye on your logs for WARNING messages from ``smart_open``.
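
In a test suite, for example, one way to catch such warnings is to attach a handler to the ``smart_open`` logger.
This sketch assumes smart_open's modules log through loggers named after the modules (children of ``smart_open``, such as ``smart_open.smart_open_lib``), so their records propagate to that logger; ``ListHandler`` is a hypothetical name, and the warning at the end is simulated rather than produced by smart_open.

.. code-block:: python

    import logging


    class ListHandler(logging.Handler):
        """Keep WARNING-and-above records in memory, e.g. for assertions in tests."""

        def __init__(self):
            super().__init__(level=logging.WARNING)
            self.records = []

        def emit(self, record):
            self.records.append(record)


    handler = ListHandler()
    # Attaching to "smart_open" also catches records from child loggers such as
    # "smart_open.smart_open_lib", because records propagate up the hierarchy.
    logging.getLogger('smart_open').addHandler(handler)

    # Simulated warning standing in for one that smart_open would emit.
    logging.getLogger('smart_open.smart_open_lib').warning('simulated warning about an unknown parameter')

    for record in handler.records:
        print(record.name, record.levelname, record.getMessage())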