.. Places parent toc into the sidebar

:parenttoc: True

Parallelism, resource management, and configuration
===================================================

.. _parallelism:

Parallelism
-----------

Some scikit-learn estimators and utilities can parallelize costly operations
using multiple CPU cores, thanks to the following components:

- via the `joblib <https://joblib.readthedocs.io/en/latest/>`_ library. In
  this case the number of threads or processes can be controlled with the
  ``n_jobs`` parameter.
- via OpenMP, used in C or Cython code.

In addition, some of the numpy routines that are used internally by
scikit-learn may also be parallelized if numpy is installed with specific
numerical libraries such as MKL, OpenBLAS, or BLIS.

We describe these 3 scenarios in the following subsections.

Joblib-based parallelism
........................

When the underlying implementation uses joblib, the number of workers
(threads or processes) that are spawned in parallel can be controlled via the
``n_jobs`` parameter.

.. note::

    Where (and how) parallelization happens in the estimators is currently
    poorly documented. Please help us by improving our docs and tackling
    `issue 14228 <https://github.com/scikit-learn/scikit-learn/issues/14228>`_!

Joblib supports both multi-processing and multi-threading. Whether joblib
spawns threads or processes depends on the **backend** it is using.

Scikit-learn generally relies on the ``loky`` backend, which is joblib's
default backend. Loky is a multi-processing backend. When doing
multi-processing, in order to avoid duplicating the memory in each process
(which is not reasonable with big datasets), joblib will create a `memmap
<https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html>`_
that all processes can share when the data is bigger than 1MB.
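The ``n_jobs`` interface described above can be sketched with joblib
directly (a minimal illustration, assuming joblib is installed; ``square``
is a hypothetical stand-in for a real costly operation):

```python
from joblib import Parallel, delayed


def square(x):
    # A stand-in for a costly, independent unit of work.
    return x * x


# Dispatch the calls to 2 workers. With the default ``loky`` backend these
# workers are separate processes; with ``backend="threading"`` they would
# be threads sharing memory (and the GIL).
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(5))
print(results)  # [0, 1, 4, 9, 16]
```

Scikit-learn estimators that accept ``n_jobs`` use this same mechanism
internally, so the parameter maps directly onto the number of joblib workers.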
In some specific cases (when the code that is run in parallel releases the
GIL), scikit-learn will indicate to ``joblib`` that a multi-threading
backend is preferable.

As a user, you may control the backend that joblib will use (regardless of
what scikit-learn recommends) by using a context manager::

    from joblib import parallel_backend

    with parallel_backend('threading', n_jobs=2):
        # Your scikit-learn code here

Please refer to the `joblib docs
<https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism>`_
for more details.

In practice, whether parallelism helps improve runtime depends on many
factors. It is usually a good idea to experiment rather than assuming that
increasing the number of workers is always a good thing. In some cases it
can be highly detrimental to performance to run multiple copies of some
estimators or functions in parallel (see oversubscription below).

OpenMP-based parallelism
........................

OpenMP is used to parallelize code written in Cython or C, relying on
multi-threading exclusively. By default (and unless joblib is trying to
avoid oversubscription), the implementation will use as many threads as
possible.

You can control the exact number of threads that are used via the
``OMP_NUM_THREADS`` environment variable:

.. prompt:: bash $

    OMP_NUM_THREADS=4 python my_script.py

Parallel NumPy routines from numerical libraries
................................................

Scikit-learn relies heavily on NumPy and SciPy, which internally call
multi-threaded linear algebra routines implemented in libraries such as MKL,
OpenBLAS or BLIS.

The number of threads used by the OpenBLAS, MKL or BLIS libraries can be set
via the ``MKL_NUM_THREADS``, ``OPENBLAS_NUM_THREADS``, and
``BLIS_NUM_THREADS`` environment variables.
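Instead of exporting them on the command line, these variables can also be
set from Python, as long as this happens before the numerical libraries are
first imported (a sketch; the value ``"2"`` is an arbitrary choice):

```python
import os

# BLAS implementations read these variables when their thread pool is first
# initialized, so they must be set before numpy (or scikit-learn) is
# imported anywhere in the process.
os.environ["MKL_NUM_THREADS"] = "2"
os.environ["OPENBLAS_NUM_THREADS"] = "2"
os.environ["BLIS_NUM_THREADS"] = "2"

import numpy as np  # the linear algebra routines now use at most 2 threads
```

Setting the variables in the shell (as in the ``OMP_NUM_THREADS`` example
above) achieves the same effect and avoids any import-order pitfalls.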
Please note that scikit-learn has no direct control over these
implementations. Scikit-learn solely relies on NumPy and SciPy.

.. note::

    At the time of writing (2019), NumPy and SciPy packages distributed on
    pypi.org (used by ``pip``) and on the conda-forge channel are linked
    with OpenBLAS, while conda packages shipped on the "defaults" channel
    from anaconda.org are linked by default with MKL.


Oversubscription: spawning too many threads
...........................................

It is generally recommended to avoid using significantly more processes or
threads than the number of CPUs on a machine. Oversubscription happens when
a program is running too many threads at the same time.

Suppose you have a machine with 8 CPUs. Consider a case where you're running
a :class:`~sklearn.model_selection.GridSearchCV` (parallelized with joblib)
with ``n_jobs=8`` over a
:class:`~sklearn.ensemble.HistGradientBoostingClassifier` (parallelized with
OpenMP). Each instance of
:class:`~sklearn.ensemble.HistGradientBoostingClassifier` will spawn 8 threads
(since you have 8 CPUs). That's a total of ``8 * 8 = 64`` threads, which
leads to oversubscription of physical CPU resources and to scheduling
overhead.

Oversubscription can arise in the exact same fashion with parallelized
routines from MKL, OpenBLAS or BLIS that are nested in joblib calls.

Starting from ``joblib >= 0.14``, when the ``loky`` backend is used (which
is the default), joblib will tell its child **processes** to limit the
number of threads they can use, so as to avoid oversubscription. In practice
the heuristic that joblib uses is to tell the processes to use
``max_threads = n_cpus // n_jobs``, via their corresponding environment
variable.
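Plugging the numbers of the 8-CPU example above into this heuristic gives
(plain arithmetic, shown for concreteness):

```python
n_cpus = 8   # CPUs on the hypothetical machine from the example above
n_jobs = 8   # workers requested by GridSearchCV

# joblib's heuristic: each child process is told, via the relevant
# environment variable (OMP_NUM_THREADS, MKL_NUM_THREADS, ...), to use at
# most this many threads.
max_threads = n_cpus // n_jobs
print(max_threads)           # 1 thread per process
print(n_jobs * max_threads)  # 8 threads in total, instead of 64
```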
Back to our example from above, since the joblib backend of
:class:`~sklearn.model_selection.GridSearchCV` is ``loky``, each process will
only be able to use 1 thread instead of 8, thus mitigating the
oversubscription issue.

Note that:

- Manually setting one of the environment variables (``OMP_NUM_THREADS``,
  ``MKL_NUM_THREADS``, ``OPENBLAS_NUM_THREADS``, or ``BLIS_NUM_THREADS``)
  will take precedence over what joblib tries to do. The total number of
  threads will then be ``n_jobs * <LIB>_NUM_THREADS``. Note that setting this
  limit will also impact your computations in the main process, which will
  only use ``<LIB>_NUM_THREADS``. Joblib exposes a context manager for
  finer control over the number of threads in its workers (see the joblib
  docs linked below).
- Joblib is currently unable to avoid oversubscription in a
  multi-threading context. It can only do so with the ``loky`` backend
  (which spawns processes).

You will find additional details about joblib mitigation of oversubscription
in the `joblib documentation
<https://joblib.readthedocs.io/en/latest/parallel.html#avoiding-over-subscription-of-cpu-ressources>`_.


Configuration switches
----------------------

Python runtime
..............

:func:`sklearn.set_config` controls the following behaviors:

:assume_finite:

    used to skip validation, which enables faster computations but may
    lead to segmentation faults if the data contains NaNs.

:working_memory:

    the optimal size of temporary arrays used by some algorithms.

.. _environment_variable:

Environment variables
.....................

These environment variables should be set before importing scikit-learn.

:SKLEARN_SITE_JOBLIB:

    When this environment variable is set to a non zero value,
    scikit-learn uses the site joblib rather than its vendored version.
    Consequently, joblib must be installed for scikit-learn to run.
    Note that using the site joblib is at your own risk: the versions of
    scikit-learn and joblib need to be compatible. Currently, joblib 0.11+
    is supported. In addition, dumps from joblib.Memory might be incompatible,
    and you might lose some caches and have to redownload some datasets.

    .. deprecated:: 0.21

        As of version 0.21 this parameter has no effect, vendored joblib was
        removed and site joblib is always used.

:SKLEARN_ASSUME_FINITE:

    Sets the default value for the `assume_finite` argument of
    :func:`sklearn.set_config`.

:SKLEARN_WORKING_MEMORY:

    Sets the default value for the `working_memory` argument of
    :func:`sklearn.set_config`.

:SKLEARN_SEED:

    Sets the seed of the global random generator when running the tests,
    for reproducibility.

:SKLEARN_SKIP_NETWORK_TESTS:

    When this environment variable is set to a non zero value, the tests
    that need network access are skipped. When this environment variable is
    not set, the network tests are skipped as well.

:SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES:

    When this environment variable is set to a non zero value, the Cython
    compiler directive `boundscheck` is set to `True`. This is useful for
    finding segfaults.
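The ``SKLEARN_ASSUME_FINITE`` and ``SKLEARN_WORKING_MEMORY`` variables only
set defaults; the same switches can be flipped at runtime with
:func:`sklearn.set_config` and :func:`sklearn.config_context` (a minimal
sketch; the value ``512`` MiB is an arbitrary choice):

```python
import sklearn

# Equivalent, at runtime, to exporting SKLEARN_ASSUME_FINITE and
# SKLEARN_WORKING_MEMORY before starting the interpreter.
sklearn.set_config(assume_finite=True, working_memory=512)
print(sklearn.get_config()["assume_finite"])  # True

# config_context scopes the change to a single block of code.
with sklearn.config_context(assume_finite=False):
    print(sklearn.get_config()["assume_finite"])  # False

print(sklearn.get_config()["assume_finite"])  # True again
```

Using :func:`sklearn.config_context` is usually preferable in library code,
since it restores the previous configuration on exit even if an exception
is raised.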