.. Places parent toc into the sidebar

:parenttoc: True

Parallelism, resource management, and configuration
===================================================

.. _parallelism:

Parallelism
-----------

Some scikit-learn estimators and utilities can parallelize costly operations
using multiple CPU cores. This parallelism is achieved in two ways:

- via the `joblib <https://joblib.readthedocs.io/en/latest/>`_ library. In
  this case the number of threads or processes can be controlled with the
  ``n_jobs`` parameter.
- via OpenMP, used in C or Cython code.

In addition, some of the NumPy routines that are used internally by
scikit-learn may also be parallelized if NumPy is installed with specific
numerical libraries such as MKL, OpenBLAS, or BLIS.

We describe these three scenarios in the following subsections.

Joblib-based parallelism
........................

When the underlying implementation uses joblib, the number of workers
(threads or processes) that are spawned in parallel can be controlled via the
``n_jobs`` parameter.

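Under the hood, estimators hand such work to joblib's ``Parallel``/``delayed``
API. Here is a minimal sketch of that API, with a toy ``square`` function
standing in for real estimator work (this is not scikit-learn's actual
internal code):

```python
from joblib import Parallel, delayed

def square(x):
    return x * x

# n_jobs=2 asks joblib for two workers; with the default loky backend
# these are separate processes. Results come back in submission order.
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(5))
print(results)  # [0, 1, 4, 9, 16]
```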
.. note::

    Where (and how) parallelization happens in the estimators is currently
    poorly documented. Please help us by improving our docs and tackling
    `issue 14228 <https://github.com/scikit-learn/scikit-learn/issues/14228>`_!

Joblib supports both multi-processing and multi-threading. Whether joblib
spawns threads or processes depends on the **backend** that it's using.

Scikit-learn generally relies on the ``loky`` backend, which is joblib's
default backend. Loky is a multi-processing backend. When doing
multi-processing, in order to avoid duplicating the memory in each process
(which isn't reasonable with big datasets), joblib will create a `memmap
<https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html>`_
that all processes can share, when the data is bigger than 1MB.

In some specific cases (when the code that is run in parallel releases the
GIL), scikit-learn will indicate to ``joblib`` that a multi-threading
backend is preferable.

As a user, you may control the backend that joblib will use (regardless of
what scikit-learn recommends) by using a context manager::

    from joblib import parallel_backend

    with parallel_backend('threading', n_jobs=2):
        # Your scikit-learn code here
        ...

Please refer to `joblib's docs
<https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism>`_
for more details.

In practice, whether parallelism helps runtime depends on many factors. It is
usually a good idea to experiment rather than assuming that increasing the
number of workers is always a good thing. In some cases it can be highly
detrimental to performance to run multiple copies of some estimators or
functions in parallel (see oversubscription below).

OpenMP-based parallelism
........................

OpenMP is used to parallelize code written in Cython or C, relying on
multi-threading exclusively. By default (and unless joblib is trying to
avoid oversubscription), the implementation will use as many threads as
possible.

You can control the exact number of threads that are used via the
``OMP_NUM_THREADS`` environment variable:

.. prompt:: bash $

    OMP_NUM_THREADS=4 python my_script.py

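The same limit can be set from Python, provided it happens before
scikit-learn's compiled extensions are first imported (once the OpenMP
runtime is initialized, the variable has no further effect). A sketch:

```python
import os

# Must run before scikit-learn (and its compiled extensions) is first
# imported, otherwise the OpenMP runtime may already have picked its
# thread count.
os.environ["OMP_NUM_THREADS"] = "4"
```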
Parallel NumPy routines from numerical libraries
................................................

Scikit-learn relies heavily on NumPy and SciPy, which internally call
multi-threaded linear algebra routines implemented in libraries such as MKL,
OpenBLAS or BLIS.

The number of threads used by the OpenBLAS, MKL or BLIS libraries can be set
via the ``OPENBLAS_NUM_THREADS``, ``MKL_NUM_THREADS``, and
``BLIS_NUM_THREADS`` environment variables.

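As with ``OMP_NUM_THREADS``, these variables only take effect if they are set
before NumPy first loads its linear algebra backend. A sketch that caps every
BLAS implementation at a single thread:

```python
import os

# Cap each BLAS implementation at one thread. Must run before NumPy is
# first imported, otherwise the threadpools are already initialized.
for var in ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS", "BLIS_NUM_THREADS"):
    os.environ[var] = "1"
```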
Please note that scikit-learn has no direct control over these
implementations: scikit-learn solely relies on NumPy and SciPy.

.. note::
    At the time of writing (2019), NumPy and SciPy packages distributed on
    pypi.org (used by ``pip``) and on the conda-forge channel are linked
    with OpenBLAS, while conda packages shipped on the "defaults" channel
    from anaconda.org are linked by default with MKL.


Oversubscription: spawning too many threads
...........................................

It is generally recommended to avoid using significantly more processes or
threads than the number of CPUs on a machine. Oversubscription happens when
a program is running too many threads at the same time.

Suppose you have a machine with 8 CPUs. Consider a case where you're running
a :class:`~sklearn.model_selection.GridSearchCV` (parallelized with joblib)
with ``n_jobs=8`` over a
:class:`~sklearn.ensemble.HistGradientBoostingClassifier` (parallelized with
OpenMP). Each instance of
:class:`~sklearn.ensemble.HistGradientBoostingClassifier` will spawn 8 threads
(since you have 8 CPUs). That's a total of ``8 * 8 = 64`` threads, which
leads to oversubscription of physical CPU resources and to scheduling
overhead.

Oversubscription can arise in the exact same fashion with parallelized
routines from MKL, OpenBLAS or BLIS that are nested in joblib calls.

Starting from ``joblib >= 0.14``, when the ``loky`` backend is used (which
is the default), joblib will tell its child **processes** to limit the
number of threads they can use, so as to avoid oversubscription. In practice
the heuristic that joblib uses is to tell the processes to use ``max_threads
= n_cpus // n_jobs``, via their corresponding environment variable. Back to
our example from above, since the joblib backend of
:class:`~sklearn.model_selection.GridSearchCV` is ``loky``, each process will
only be able to use 1 thread instead of 8, thus mitigating the
oversubscription issue.

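The heuristic for the example above can be sketched as follows (illustrative
arithmetic only, not joblib's actual code):

```python
# Illustrative arithmetic only, not joblib's actual implementation.
n_cpus = 8                               # the machine from the example above
n_jobs = 8                               # workers requested from joblib
max_threads = max(1, n_cpus // n_jobs)   # thread budget given to each worker
print(max_threads)  # 1: each loky worker gets a single thread
```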
Note that:

- Manually setting one of the environment variables (``OMP_NUM_THREADS``,
  ``MKL_NUM_THREADS``, ``OPENBLAS_NUM_THREADS``, or ``BLIS_NUM_THREADS``)
  will take precedence over what joblib tries to do. The total number of
  threads will be ``n_jobs * <LIB>_NUM_THREADS``. Note that setting this
  limit will also impact your computations in the main process, which will
  only use ``<LIB>_NUM_THREADS``. Joblib exposes a context manager for
  finer control over the number of threads in its workers (see joblib docs
  linked below).
- Joblib is currently unable to avoid oversubscription in a
  multi-threading context. It can only do so with the ``loky`` backend
  (which spawns processes).

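One such knob is the ``inner_max_num_threads`` argument of joblib's
``parallel_backend`` context manager (assuming ``joblib >= 0.14``), which caps
the threadpools inside each worker. A sketch, with a toy ``work`` function:

```python
from joblib import Parallel, delayed, parallel_backend

def work(x):
    return x + 1

# Run 2 loky workers, each limited to 1 thread for its nested
# BLAS/OpenMP threadpools, avoiding oversubscription.
with parallel_backend("loky", n_jobs=2, inner_max_num_threads=1):
    out = Parallel()(delayed(work)(i) for i in range(3))
print(out)  # [1, 2, 3]
```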
You will find additional details about joblib's mitigation of
oversubscription in the `joblib documentation
<https://joblib.readthedocs.io/en/latest/parallel.html#avoiding-over-subscription-of-cpu-ressources>`_.


Configuration switches
----------------------

Python runtime
..............

:func:`sklearn.set_config` controls the following behaviors:

:assume_finite:

    used to skip validation, which enables faster computations but may
    lead to segmentation faults if the data contains NaNs.

:working_memory:

    the optimal size (in MB) of temporary arrays used by some algorithms.

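For example, using :func:`sklearn.config_context` to scope the change to a
block (``sklearn.get_config`` returns the currently active settings):

```python
import sklearn

# Skip finiteness validation and allow up to 512 MB of working memory,
# but only inside this block; settings revert on exit.
with sklearn.config_context(assume_finite=True, working_memory=512):
    cfg = sklearn.get_config()
    print(cfg["assume_finite"], cfg["working_memory"])  # True 512
```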
.. _environment_variable:

Environment variables
......................

These environment variables should be set before importing scikit-learn.

:SKLEARN_SITE_JOBLIB:

    When this environment variable is set to a non-zero value,
    scikit-learn uses the site joblib rather than its vendored version.
    Consequently, joblib must be installed for scikit-learn to run.
    Note that using the site joblib is at your own risk: the versions of
    scikit-learn and joblib need to be compatible. Currently, joblib 0.11+
    is supported. In addition, dumps from joblib.Memory might be incompatible,
    and you might lose some caches and have to redownload some datasets.

    .. deprecated:: 0.21

       As of version 0.21 this parameter has no effect, vendored joblib was
       removed and site joblib is always used.

:SKLEARN_ASSUME_FINITE:

    Sets the default value for the `assume_finite` argument of
    :func:`sklearn.set_config`.

:SKLEARN_WORKING_MEMORY:

    Sets the default value for the `working_memory` argument of
    :func:`sklearn.set_config`.

:SKLEARN_SEED:

    Sets the seed of the global random generator when running the tests,
    for reproducibility.

:SKLEARN_SKIP_NETWORK_TESTS:

    When this environment variable is set to a non-zero value, the tests
    that need network access are skipped. When this environment variable is
    not set, network tests are run.

:SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES:

    When this environment variable is set to a non-zero value, the Cython
    compiler directive ``boundscheck`` is set to `True`. This is useful for
    finding segfaults.