.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _changes_1_0_2:

Version 1.0.2
=============

**December 2021**

- |Fix| :class:`cluster.Birch`,
  :class:`feature_selection.RFECV`, :class:`ensemble.RandomForestRegressor`,
  :class:`ensemble.RandomForestClassifier`,
  :class:`ensemble.GradientBoostingRegressor`, and
  :class:`ensemble.GradientBoostingClassifier` no longer raise a warning when
  fitted on a pandas DataFrame. :pr:`21578` by `Thomas Fan`_.

Changelog
---------

:mod:`sklearn.cluster`
......................

- |Fix| Fixed an infinite loop in :class:`cluster.SpectralClustering` by
  moving an iteration counter from try to except.
  :pr:`21271` by :user:`Tyler Martin <martintb>`.

:mod:`sklearn.datasets`
.......................

- |Fix| :func:`datasets.fetch_openml` is now thread safe. Data is first
  downloaded to a temporary subfolder and then renamed.
  :pr:`21833` by :user:`Siavash Rezazadeh <siavrez>`.

:mod:`sklearn.decomposition`
............................

- |Fix| Fixed the constraint on the objective function of
  :class:`decomposition.DictionaryLearning`,
  :class:`decomposition.MiniBatchDictionaryLearning`, :class:`decomposition.SparsePCA`
  and :class:`decomposition.MiniBatchSparsePCA` to be convex and match the referenced
  article. :pr:`19210` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.ensemble`
.......................

- |Fix| :class:`ensemble.RandomForestClassifier`,
  :class:`ensemble.RandomForestRegressor`,
  :class:`ensemble.ExtraTreesClassifier`, :class:`ensemble.ExtraTreesRegressor`,
  and :class:`ensemble.RandomTreesEmbedding` now raise a ``ValueError`` when
  ``bootstrap=False`` and ``max_samples`` is not ``None``.
  :pr:`21295` by :user:`Haoyin Xu <PSSF23>`.

- |Fix| Fixed a bug in :class:`ensemble.GradientBoostingClassifier` where the
  exponential loss was computing the positive gradient instead of the
  negative one.
  :pr:`22050` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.feature_selection`
................................

- |Fix| Fixed :class:`feature_selection.SelectFromModel` by improving support
  for base estimators that do not set `feature_names_in_`. :pr:`21991` by
  `Thomas Fan`_.

:mod:`sklearn.impute`
.....................

- |Fix| Fix a bug in :class:`linear_model.RidgeClassifierCV` where the method
  `predict` was performing an `argmax` on the scores obtained from
  `decision_function` instead of returning the multilabel indicator matrix.
  :pr:`19869` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.linear_model`
...........................

- |Fix| :class:`linear_model.LassoLarsIC` now correctly computes AIC
  and BIC. An error is now raised when `n_features > n_samples` and
  when the noise variance is not provided.
  :pr:`21481` by :user:`Guillaume Lemaitre <glemaitre>` and
  :user:`Andrés Babino <ababino>`.

:mod:`sklearn.manifold`
.......................

- |Fix| Fixed an unnecessary error when fitting :class:`manifold.Isomap` with a
  precomputed dense distance matrix where the neighbors graph has multiple
  disconnected components. :pr:`21915` by `Tom Dupre la Tour`_.

:mod:`sklearn.metrics`
......................

- |Fix| All :class:`sklearn.metrics.DistanceMetric` subclasses now correctly support
  read-only buffer attributes.
  This fixes a regression introduced in 1.0.0 with respect to 0.24.2.
  :pr:`21694` by :user:`Julien Jerphanion <jjerphan>`.

- |Fix| :class:`sklearn.metrics.MinkowskiDistance` now accepts a weight
  parameter that makes it possible to write code that behaves consistently both
  with scipy 1.8 and earlier versions.
  In turn, this means that all
  neighbors-based estimators (except those that use `algorithm="kd_tree"`) now
  accept a weight parameter with `metric="minkowski"` to yield results that
  are always consistent with `scipy.spatial.distance.cdist`.
  :pr:`21741` by :user:`Olivier Grisel <ogrisel>`.

:mod:`sklearn.neighbors`
........................

- |Fix| :class:`neighbors.KDTree` and :class:`neighbors.BallTree` correctly support
  read-only buffer attributes. :pr:`21845` by `Thomas Fan`_.

:mod:`sklearn.preprocessing`
............................

- |Fix| Fixes a compatibility bug with NumPy 1.22 in :class:`preprocessing.OneHotEncoder`.
  :pr:`21517` by `Thomas Fan`_.

:mod:`sklearn.tree`
...................

- |Fix| Prevents :func:`tree.plot_tree` from drawing out of the boundary of
  the figure. :pr:`21917` by `Thomas Fan`_.

- |Fix| Support loading pickles of decision tree models when the pickle has
  been generated on a platform with a different bitness. A typical example is
  to train and pickle the model on a 64-bit machine and load the model on a
  32-bit machine for prediction. :pr:`21552` by :user:`Loïc Estève <lesteve>`.

:mod:`sklearn.utils`
....................

- |Fix| :func:`utils.estimator_html_repr` now escapes all the estimator
  descriptions in the generated HTML. :pr:`21493` by
  :user:`Aurélien Geron <ageron>`.

.. _changes_1_0_1:

Version 1.0.1
=============

**October 2021**

Changelog
---------

Fixed models
------------

- |Fix| Non-fit methods in the following classes do not raise a UserWarning
  when fitted on DataFrames with valid feature names:
  :class:`covariance.EllipticEnvelope`, :class:`ensemble.IsolationForest`,
  :class:`ensemble.AdaBoostClassifier`, :class:`neighbors.KNeighborsClassifier`,
  :class:`neighbors.KNeighborsRegressor`,
  :class:`neighbors.RadiusNeighborsClassifier`,
  :class:`neighbors.RadiusNeighborsRegressor`. :pr:`21199` by `Thomas Fan`_.

:mod:`sklearn.calibration`
..........................

- |Fix| Fixed :class:`calibration.CalibratedClassifierCV` to take into account
  `sample_weight` when computing the base estimator prediction when
  `ensemble=False`.
  :pr:`20638` by :user:`Julien Bohné <JulienB-78>`.

- |Fix| Fixed a bug in :class:`calibration.CalibratedClassifierCV` with
  `method="sigmoid"` that was ignoring the `sample_weight` when computing
  the Bayesian priors.
  :pr:`21179` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.cluster`
......................

- |Fix| Fixed a bug in :class:`cluster.KMeans`, ensuring reproducibility and equivalence
  between sparse and dense input. :pr:`21195`
  by :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.ensemble`
.......................

- |Fix| Fixed a bug that could produce a segfault in rare cases for
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor`.
  :pr:`21130` by :user:`Christian Lorentzen <lorentzenchr>`.

:mod:`sklearn.gaussian_process`
...............................

- |Fix| Compute `y_std` properly with multi-target in
  :class:`sklearn.gaussian_process.GaussianProcessRegressor` allowing
  proper normalization in multi-target settings.
  :pr:`20761` by :user:`Patrick de C. T. R. Ferreira <patrickctrf>`.

:mod:`sklearn.feature_extraction`
.................................

- |Efficiency| Fixed an efficiency regression introduced in version 1.0.0 in the
  `transform` method of :class:`feature_extraction.text.CountVectorizer` which no
  longer checks for uppercase characters in the provided vocabulary. :pr:`21251`
  by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed a bug in :class:`feature_extraction.text.CountVectorizer` and
  :class:`feature_extraction.text.TfidfVectorizer` by raising an
  error when `min_df` or `max_df` are floating-point numbers greater than 1.
  :pr:`20752` by :user:`Alek Lefebvre <AlekLefebvre>`.

:mod:`sklearn.linear_model`
...........................

- |Fix| Improves stability of :class:`linear_model.LassoLars` for different
  versions of OpenBLAS. :pr:`21340` by `Thomas Fan`_.

- |Fix| :class:`linear_model.LogisticRegression` now raises a better error
  message when the solver does not support sparse matrices with int64 indices.
  :pr:`21093` by `Tom Dupre la Tour`_.

:mod:`sklearn.neighbors`
........................

- |Fix| :class:`neighbors.KNeighborsClassifier`,
  :class:`neighbors.KNeighborsRegressor`,
  :class:`neighbors.RadiusNeighborsClassifier`,
  :class:`neighbors.RadiusNeighborsRegressor` with `metric="precomputed"` raise
  an error for `bsr` and `dok` sparse matrices in methods: `fit`, `kneighbors`
  and `radius_neighbors`, due to handling of explicit zeros in `bsr` and `dok`
  :term:`sparse graph` formats. :pr:`21199` by `Thomas Fan`_.

:mod:`sklearn.pipeline`
.......................

- |Fix| :meth:`pipeline.Pipeline.get_feature_names_out` correctly passes feature
  names out from one step of a pipeline to the next. :pr:`21351` by
  `Thomas Fan`_.

:mod:`sklearn.svm`
..................

- |Fix| :class:`svm.SVC` and :class:`svm.SVR` check for an inconsistency
  in their internal representation and raise an error instead of segfaulting.
  This fix also resolves
  `CVE-2020-28975 <https://nvd.nist.gov/vuln/detail/CVE-2020-28975>`__.
  :pr:`21336` by `Thomas Fan`_.

:mod:`sklearn.utils`
....................

- |Enhancement| :func:`utils.validation._check_sample_weight` can perform a
  non-negativity check on the sample weights. It can be turned on
  using the `only_non_negative` bool parameter.
  Estimators that check for non-negative weights are updated:
  :class:`linear_model.LinearRegression` (here the previous
  error message was misleading),
  :class:`ensemble.AdaBoostClassifier`,
  :class:`ensemble.AdaBoostRegressor`,
  :class:`neighbors.KernelDensity`.
  :pr:`20880` by :user:`Guillaume Lemaitre <glemaitre>`
  and :user:`András Simon <simonandras>`.

- |Fix| Fixed a bug in :func:`~sklearn.utils.metaestimators.if_delegate_has_method`
  where the underlying check for an attribute did not work with NumPy arrays.
  :pr:`21145` by :user:`Zahlii <Zahlii>`.

Miscellaneous
.............

- |Fix| Fitting an estimator on a dataset without feature names, after it was
  previously fitted on a dataset with feature names, no longer keeps the old
  feature names stored in the `feature_names_in_` attribute. :pr:`21389` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

.. _changes_1_0:

Version 1.0.0
=============

**September 2021**

For a short description of the main highlights of the release, please
refer to
:ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_0_0.py`.

.. include:: changelog_legend.inc

Minimal dependencies
--------------------

Version 1.0.0 of scikit-learn requires Python 3.7+, numpy 1.14.6+ and
scipy 1.1.0+. The optional minimal dependency is matplotlib 2.2.2+.
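
As a rough illustration of the minimums listed for version 1.0.0, the sketch
below compares version strings against them using only the standard library.
The ``MINIMUMS`` table and the ``release_tuple`` and ``meets_minimum`` helpers
are hypothetical names introduced for this example; they are not part of
scikit-learn, and pre-release suffixes are simply ignored.

.. code-block:: python

    import re
    import sys

    # Minimum versions quoted in the release notes (table name is hypothetical).
    MINIMUMS = {"numpy": "1.14.6", "scipy": "1.1.0", "matplotlib": "2.2.2"}


    def release_tuple(version):
        """Return the first three numeric components of a version string."""
        numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
        return tuple(numbers + [0] * (3 - len(numbers)))


    def meets_minimum(installed, minimum):
        """Compare versions numerically, so that 1.9 sorts before 1.14."""
        return release_tuple(installed) >= release_tuple(minimum)


    python_ok = sys.version_info >= (3, 7)
    numpy_ok = meets_minimum("1.22.0rc1", MINIMUMS["numpy"])
    old_numpy_ok = meets_minimum("1.9.3", MINIMUMS["numpy"])

Comparing integer tuples rather than raw strings matters here: as strings,
``"1.9.3"`` would sort after ``"1.14.6"``.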

Enforcing keyword-only arguments
--------------------------------

In an effort to promote clear and non-ambiguous use of the library, most
constructor and function parameters must now be passed as keyword arguments
(i.e. using the `param=value` syntax) instead of positionally. If a keyword-only
parameter is used as positional, a `TypeError` is now raised.
:issue:`15005` :pr:`20002` by `Joel Nothman`_, `Adrin Jalali`_, `Thomas Fan`_,
`Nicolas Hug`_, and `Tom Dupre la Tour`_. See `SLEP009
<https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep009/proposal.html>`_
for more details.

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- |Fix| :class:`manifold.TSNE` now avoids numerical underflow issues during
  affinity matrix computation.

- |Fix| :class:`manifold.Isomap` now connects disconnected components of the
  neighbors graph along some minimum distance pairs, instead of changing
  every infinite distance to zero.

- |Fix| The splitting criterion of :class:`tree.DecisionTreeClassifier` and
  :class:`tree.DecisionTreeRegressor` can be impacted by a fix in the handling
  of rounding errors. Previously some extra spurious splits could occur.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we
cannot assure that this list is complete.)


Changelog
---------

..
    Entries should be grouped by module (in alphabetic order) and prefixed with
    one of the labels: |MajorFeature|, |Feature|, |Efficiency|, |Enhancement|,
    |Fix| or |API| (see whats_new.rst for descriptions).
    Entries should be ordered by those labels (e.g.
    |Fix| after |Efficiency|).
    Changes not specific to a module should be listed under *Multiple Modules*
    or *Miscellaneous*.
    Entries should end with:
    :pr:`123456` by :user:`Joe Bloggs <joeongithub>`.
    where 123456 is the *pull request* number, not the issue number.

- |API| The option for using the squared error via ``loss`` and
  ``criterion`` parameters was made more consistent. The preferred way is by
  setting the value to `"squared_error"`. Old option names are still valid,
  produce the same models, but are deprecated and will be removed in version
  1.2.
  :pr:`19310` by :user:`Christian Lorentzen <lorentzenchr>`.

  - For :class:`ensemble.ExtraTreesRegressor`, `criterion="mse"` is deprecated,
    use `"squared_error"` instead which is now the default.

  - For :class:`ensemble.GradientBoostingRegressor`, `loss="ls"` is deprecated,
    use `"squared_error"` instead which is now the default.

  - For :class:`ensemble.RandomForestRegressor`, `criterion="mse"` is deprecated,
    use `"squared_error"` instead which is now the default.

  - For :class:`ensemble.HistGradientBoostingRegressor`, `loss="least_squares"`
    is deprecated, use `"squared_error"` instead which is now the default.

  - For :class:`linear_model.RANSACRegressor`, `loss="squared_loss"` is
    deprecated, use `"squared_error"` instead.

  - For :class:`linear_model.SGDRegressor`, `loss="squared_loss"` is
    deprecated, use `"squared_error"` instead which is now the default.

  - For :class:`tree.DecisionTreeRegressor`, `criterion="mse"` is deprecated,
    use `"squared_error"` instead which is now the default.

  - For :class:`tree.ExtraTreeRegressor`, `criterion="mse"` is deprecated,
    use `"squared_error"` instead which is now the default.

- |API| The option for using the absolute error via ``loss`` and
  ``criterion`` parameters was made more consistent.
  The preferred way is by
  setting the value to `"absolute_error"`. Old option names are still valid,
  produce the same models, but are deprecated and will be removed in version
  1.2.
  :pr:`19733` by :user:`Christian Lorentzen <lorentzenchr>`.

  - For :class:`ensemble.ExtraTreesRegressor`, `criterion="mae"` is deprecated,
    use `"absolute_error"` instead.

  - For :class:`ensemble.GradientBoostingRegressor`, `loss="lad"` is deprecated,
    use `"absolute_error"` instead.

  - For :class:`ensemble.RandomForestRegressor`, `criterion="mae"` is deprecated,
    use `"absolute_error"` instead.

  - For :class:`ensemble.HistGradientBoostingRegressor`,
    `loss="least_absolute_deviation"` is deprecated, use `"absolute_error"`
    instead.

  - For :class:`linear_model.RANSACRegressor`, `loss="absolute_loss"` is
    deprecated, use `"absolute_error"` instead which is now the default.

  - For :class:`tree.DecisionTreeRegressor`, `criterion="mae"` is deprecated,
    use `"absolute_error"` instead.

  - For :class:`tree.ExtraTreeRegressor`, `criterion="mae"` is deprecated,
    use `"absolute_error"` instead.

- |API| `np.matrix` usage is deprecated in 1.0 and will raise a `TypeError` in
  1.2. :pr:`20165` by `Thomas Fan`_.

- |API| :term:`get_feature_names_out` has been added to the transformer API
  to get the names of the output features. :term:`get_feature_names` has in
  turn been deprecated. :pr:`18444` by `Thomas Fan`_.

- |API| All estimators store `feature_names_in_` when fitted on pandas DataFrames.
  These feature names are compared to names seen in non-`fit` methods, e.g.
  `transform`, and a `FutureWarning` is raised if they are not consistent.
  These ``FutureWarning`` s will become ``ValueError`` s in 1.2. :pr:`18010` by
  `Thomas Fan`_.

:mod:`sklearn.base`
...................

- |Fix| :func:`config_context` is now threadsafe. :pr:`18736` by `Thomas Fan`_.
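
One common way to make a configuration context thread safe is to keep the
settings in thread-local storage, so that an override made in one thread is
invisible to the others. The sketch below shows that idea with the standard
library only. It is an illustration under that assumption, not scikit-learn's
code: the function names merely mirror the public ``get_config`` and
``config_context``, while ``_DEFAULTS``, ``_local``, ``worker`` and ``seen``
are hypothetical.

.. code-block:: python

    import threading
    from contextlib import contextmanager

    # Hypothetical default settings for this sketch.
    _DEFAULTS = {"assume_finite": False}
    _local = threading.local()


    def get_config():
        """Return the configuration dict owned by the calling thread."""
        if not hasattr(_local, "config"):
            _local.config = dict(_DEFAULTS)
        return _local.config


    @contextmanager
    def config_context(**overrides):
        """Temporarily override settings for the calling thread only."""
        config = get_config()
        saved = dict(config)
        config.update(overrides)
        try:
            yield
        finally:
            # Restore the previous settings even if the body raised.
            config.clear()
            config.update(saved)


    seen = {}
    barrier = threading.Barrier(2)


    def worker(name, value):
        with config_context(assume_finite=value):
            barrier.wait()  # both threads are inside their own context here
            seen[name] = get_config()["assume_finite"]


    threads = [
        threading.Thread(target=worker, args=("a", True)),
        threading.Thread(target=worker, args=("b", False)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

Each worker reads back the value it set itself, even though both contexts are
active at the same time, and the main thread keeps its defaults.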

:mod:`sklearn.calibration`
..........................

- |Feature| :class:`calibration.CalibrationDisplay` added to plot
  calibration curves. :pr:`17443` by :user:`Lucy Liu <lucyleeow>`.

- |Fix| The ``predict`` and ``predict_proba`` methods of
  :class:`calibration.CalibratedClassifierCV` can now properly be used on
  prefitted pipelines. :pr:`19641` by :user:`Alek Lefebvre <AlekLefebvre>`.

- |Fix| Fixed an error when using a :class:`ensemble.VotingClassifier`
  as `base_estimator` in :class:`calibration.CalibratedClassifierCV`.
  :pr:`20087` by :user:`Clément Fauchereau <clement-f>`.


:mod:`sklearn.cluster`
......................

- |Efficiency| The ``"k-means++"`` initialization of :class:`cluster.KMeans`
  and :class:`cluster.MiniBatchKMeans` is now faster, especially in multicore
  settings. :pr:`19002` by :user:`Jon Crall <Erotemic>` and :user:`Jérémie du
  Boisberranger <jeremiedbb>`.

- |Efficiency| :class:`cluster.KMeans` with `algorithm='elkan'` is now faster
  in multicore settings. :pr:`19052` by
  :user:`Yusuke Nagasaka <YusukeNagasaka>`.

- |Efficiency| :class:`cluster.MiniBatchKMeans` is now faster in multicore
  settings. :pr:`17622` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Efficiency| :class:`cluster.OPTICS` can now cache the output of the
  computation of the tree, using the `memory` parameter. :pr:`19024` by
  :user:`Frankie Robertson <frankier>`.

- |Enhancement| The `predict` and `fit_predict` methods of
  :class:`cluster.AffinityPropagation` now accept sparse data type for input
  data.
  :pr:`20117` by :user:`Venkatachalam Natchiappan <venkyyuvy>`.

- |Fix| Fixed a bug in :class:`cluster.MiniBatchKMeans` where the sample
  weights were partially ignored when the input is sparse. :pr:`17622` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Improved convergence detection based on center change in
  :class:`cluster.MiniBatchKMeans` which was almost never achievable.
  :pr:`17622` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| :class:`cluster.AgglomerativeClustering` now supports readonly
  memory-mapped datasets.
  :pr:`19883` by :user:`Julien Jerphanion <jjerphan>`.

- |Fix| :class:`cluster.AgglomerativeClustering` correctly connects components
  when connectivity and affinity are both precomputed and the number
  of connected components is greater than 1. :pr:`20597` by
  `Thomas Fan`_.

- |Fix| :class:`cluster.FeatureAgglomeration` does not accept a ``**params`` kwarg in
  the ``fit`` function anymore, resulting in a more concise error message. :pr:`20899`
  by :user:`Adam Li <adam2392>`.

- |Fix| Fixed a bug in :class:`cluster.KMeans`, ensuring reproducibility and equivalence
  between sparse and dense input. :pr:`20200`
  by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |API| :class:`cluster.Birch` attributes, `fit_` and `partial_fit_`, are
  deprecated and will be removed in 1.2. :pr:`19297` by `Thomas Fan`_.

- |API| The default value for the `batch_size` parameter of
  :class:`cluster.MiniBatchKMeans` was changed from 100 to 1024 for
  efficiency reasons. The `n_iter_` attribute of
  :class:`cluster.MiniBatchKMeans` now reports the number of started epochs and
  the `n_steps_` attribute reports the number of mini batches processed.
  :pr:`17622` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |API| :func:`cluster.spectral_clustering` raises an improved error when passed
  a `np.matrix`. :pr:`20560` by `Thomas Fan`_.

:mod:`sklearn.compose`
......................

- |Enhancement| :class:`compose.ColumnTransformer` now records the output
  of each transformer in `output_indices_`. :pr:`18393` by
  :user:`Luca Bittarello <lbittarello>`.

- |Enhancement| :class:`compose.ColumnTransformer` now allows DataFrame input to
  have its columns appear in a changed order in `transform`. Further, columns that
  are dropped will not be required in transform, and additional columns will be
  ignored if `remainder='drop'`. :pr:`19263` by `Thomas Fan`_.

- |Enhancement| Adds `**predict_params` keyword argument to
  :meth:`compose.TransformedTargetRegressor.predict` that passes keyword
  arguments to the regressor.
  :pr:`19244` by :user:`Ricardo <ricardojnf>`.

- |Fix| :meth:`compose.ColumnTransformer.get_feature_names` supports
  non-string feature names returned by any of its transformers. However, note
  that ``get_feature_names`` is deprecated, use ``get_feature_names_out``
  instead. :pr:`18459` by :user:`Albert Villanova del Moral <albertvillanova>`
  and :user:`Alonso Silva Allende <alonsosilvaallende>`.

- |Fix| :class:`compose.TransformedTargetRegressor` now takes nD targets with
  an adequate transformer.
  :pr:`18898` by :user:`Oras Phongpanagnam <panangam>`.

- |API| Adds `verbose_feature_names_out` to :class:`compose.ColumnTransformer`.
  This flag controls the prefixing of feature names out in
  :term:`get_feature_names_out`. :pr:`18444` and :pr:`21080` by `Thomas Fan`_.

:mod:`sklearn.covariance`
.........................

- |Fix| Adds arrays check to :func:`covariance.ledoit_wolf` and
  :func:`covariance.ledoit_wolf_shrinkage`. :pr:`20416` by :user:`Hugo Defois
  <defoishugo>`.

- |API| Deprecates the following keys in `cv_results_`: `'mean_score'`,
  `'std_score'`, and `'split(k)_score'` in favor of `'mean_test_score'`,
  `'std_test_score'`, and `'split(k)_test_score'`. :pr:`20583` by `Thomas Fan`_.

:mod:`sklearn.datasets`
.......................

- |Enhancement| :func:`datasets.fetch_openml` now supports categories with
  missing values when returning a pandas DataFrame.
  :pr:`19365` by
  `Thomas Fan`_ and :user:`Amanda Dsouza <amy12xx>` and
  :user:`EL-ATEIF Sara <elateifsara>`.

- |Enhancement| :func:`datasets.fetch_kddcup99` raises a better message
  when the cached file is invalid. :pr:`19669` by `Thomas Fan`_.

- |Enhancement| Replace usages of ``__file__`` related to resource file I/O
  with ``importlib.resources`` to avoid the assumption that these resource
  files (e.g. ``iris.csv``) already exist on a filesystem, and by extension
  to enable compatibility with tools such as ``PyOxidizer``.
  :pr:`20297` by :user:`Jack Liu <jackzyliu>`.

- |Fix| Shorten data file names in the openml tests to better support
  installing on Windows and its default 260 character limit on file names.
  :pr:`20209` by `Thomas Fan`_.

- |Fix| :func:`datasets.fetch_kddcup99` returns dataframes when
  `return_X_y=True` and `as_frame=True`. :pr:`19011` by `Thomas Fan`_.

- |API| Deprecates :func:`datasets.load_boston` in 1.0 and it will be removed
  in 1.2. Alternative code snippets to load similar datasets are provided.
  Please refer to the docstring of the function for details.
  :pr:`20729` by `Guillaume Lemaitre`_.


:mod:`sklearn.decomposition`
............................

- |Enhancement| Added a new approximate solver (randomized SVD, available with
  `eigen_solver='randomized'`) to :class:`decomposition.KernelPCA`. This
  significantly accelerates computation when the number of samples is much
  larger than the desired number of components.
  :pr:`12069` by :user:`Sylvain Marié <smarie>`.

- |Fix| Fixes incorrect multiple data-conversion warnings when clustering
  boolean data. :pr:`19046` by :user:`Surya Prakash <jdsurya>`.

- |Fix| Fixed :func:`decomposition.dict_learning`, used by
  :class:`decomposition.DictionaryLearning`, to ensure determinism of the
  output. Achieved by flipping signs of the SVD output which is used to
  initialize the code.
  :pr:`18433` by :user:`Bruno Charron <brcharron>`.

- |Fix| Fixed a bug in :class:`decomposition.MiniBatchDictionaryLearning`,
  :class:`decomposition.MiniBatchSparsePCA` and
  :func:`decomposition.dict_learning_online` where the update of the dictionary
  was incorrect. :pr:`19198` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed a bug in :class:`decomposition.DictionaryLearning`,
  :class:`decomposition.SparsePCA`,
  :class:`decomposition.MiniBatchDictionaryLearning`,
  :class:`decomposition.MiniBatchSparsePCA`,
  :func:`decomposition.dict_learning` and
  :func:`decomposition.dict_learning_online` where the restart of unused atoms
  during the dictionary update was not working as expected. :pr:`19198` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |API| In :class:`decomposition.DictionaryLearning`,
  :class:`decomposition.MiniBatchDictionaryLearning`,
  :func:`decomposition.dict_learning` and
  :func:`decomposition.dict_learning_online`, `transform_alpha` will be equal
  to `alpha` instead of 1.0 by default starting from version 1.2. :pr:`19159` by
  :user:`Benoît Malézieux <bmalezieux>`.

- |API| Rename variable names in :class:`decomposition.KernelPCA` to improve
  readability. `lambdas_` and `alphas_` are renamed to `eigenvalues_`
  and `eigenvectors_`, respectively. `lambdas_` and `alphas_` are
  deprecated and will be removed in 1.2.
  :pr:`19908` by :user:`Kei Ishikawa <kstoneriv3>`.

- |API| The `alpha` and `regularization` parameters of :class:`decomposition.NMF` and
  :func:`decomposition.non_negative_factorization` are deprecated and will be removed
  in 1.2. Use the new parameters `alpha_W` and `alpha_H` instead. :pr:`20512` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.dummy`
....................

- |API| Attribute `n_features_in_` in :class:`dummy.DummyClassifier` and
  :class:`dummy.DummyRegressor` is deprecated and will be removed in 1.2.
  :pr:`20960` by `Thomas Fan`_.

:mod:`sklearn.ensemble`
.......................

- |Enhancement| :class:`~sklearn.ensemble.HistGradientBoostingClassifier` and
  :class:`~sklearn.ensemble.HistGradientBoostingRegressor` take cgroups quotas
  into account when deciding the number of threads used by OpenMP. This
  avoids performance problems caused by over-subscription when using those
  classes in a Docker container for instance. :pr:`20477`
  by `Thomas Fan`_.

- |Enhancement| :class:`~sklearn.ensemble.HistGradientBoostingClassifier` and
  :class:`~sklearn.ensemble.HistGradientBoostingRegressor` are no longer
  experimental. They are now considered stable and are subject to the same
  deprecation cycles as all other estimators. :pr:`19799` by `Nicolas Hug`_.

- |Enhancement| Improve the HTML rendering of the
  :class:`ensemble.StackingClassifier` and :class:`ensemble.StackingRegressor`.
  :pr:`19564` by `Thomas Fan`_.

- |Enhancement| Added Poisson criterion to
  :class:`ensemble.RandomForestRegressor`. :pr:`19836` by :user:`Brian Sun
  <bsun94>`.

- |Fix| Do not allow computing the out-of-bag (OOB) score in
  :class:`ensemble.RandomForestClassifier` and
  :class:`ensemble.ExtraTreesClassifier` with multiclass-multioutput target
  since scikit-learn does not provide any metric supporting this type of
  target. Additional private refactoring was performed.
  :pr:`19162` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| Improve numerical precision for weight boosting in
  :class:`ensemble.AdaBoostClassifier` and :class:`ensemble.AdaBoostRegressor`
  to avoid underflows.
  :pr:`10096` by :user:`Fenil Suchak <fenilsuchak>`.

- |Fix| Fixed the range of the argument ``max_samples`` to be ``(0.0, 1.0]``
  in :class:`ensemble.RandomForestClassifier` and
  :class:`ensemble.RandomForestRegressor`, where `max_samples=1.0` is
  interpreted as using all `n_samples` for bootstrapping. :pr:`20159` by
  :user:`murata-yu`.

- |Fix| Fixed a bug in :class:`ensemble.AdaBoostClassifier` and
  :class:`ensemble.AdaBoostRegressor` where the `sample_weight` parameter
  got overwritten during `fit`.
  :pr:`20534` by :user:`Guillaume Lemaitre <glemaitre>`.

- |API| Removes `tol=None` option in
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor`. Please use `tol=0` for
  the same behavior. :pr:`19296` by `Thomas Fan`_.

:mod:`sklearn.feature_extraction`
.................................

- |Fix| Fixed a bug in :class:`feature_extraction.text.HashingVectorizer`
  where some input strings would result in negative indices in the transformed
  data. :pr:`19035` by :user:`Liu Yu <ly648499246>`.

- |Fix| Fixed a bug in :class:`feature_extraction.DictVectorizer` by raising an
  error with unsupported value type.
  :pr:`19520` by :user:`Jeff Zhao <kamiyaa>`.

- |Fix| Fixed a bug in :func:`feature_extraction.image.img_to_graph`
  and :func:`feature_extraction.image.grid_to_graph` where singleton connected
  components were not handled properly, resulting in a wrong vertex indexing.
  :pr:`18964` by `Bertrand Thirion`_.

- |Fix| Raise a warning in :class:`feature_extraction.text.CountVectorizer`
  with `lowercase=True` when there are vocabulary entries with uppercase
  characters to avoid silent misses in the resulting feature vectors.
  :pr:`19401` by :user:`Zito Relova <zitorelova>`.
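
The "silent misses" mentioned in the last entry can be reproduced with a toy
counter: once documents are lowercased, a vocabulary entry that contains an
uppercase character can never match a token. The sketch below is a simplified
whitespace tokenizer written for illustration, not scikit-learn's
implementation, and ``count_terms`` is a hypothetical helper.

.. code-block:: python

    import warnings


    def count_terms(documents, vocabulary, lowercase=True):
        """Count vocabulary terms per document with a whitespace tokenizer."""
        if lowercase and any(term != term.lower() for term in vocabulary):
            warnings.warn(
                "Vocabulary contains uppercase characters while lowercase=True; "
                "those entries can never match a lowercased token.",
                UserWarning,
            )
        rows = []
        for document in documents:
            tokens = document.lower().split() if lowercase else document.split()
            rows.append([tokens.count(term) for term in vocabulary])
        return rows


    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        rows = count_terms(["Python python"], ["Python", "python"])

Here the column for ``"Python"`` stays at zero while ``"python"`` is counted
twice, and exactly one warning is recorded to flag the mismatch.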

:mod:`sklearn.feature_selection`
................................

- |Feature| :func:`feature_selection.r_regression` computes Pearson's R
  correlation coefficients between the features and the target.
  :pr:`17169` by :user:`Dmytro Lituiev <DSLituiev>`
  and :user:`Julien Jerphanion <jjerphan>`.

- |Enhancement| :meth:`feature_selection.RFE.fit` accepts additional estimator
  parameters that are passed directly to the estimator's `fit` method.
  :pr:`20380` by :user:`Iván Pulido <ijpulidos>`, :user:`Felipe Bidu <fbidu>`,
  :user:`Gil Rutter <g-rutter>`, and :user:`Adrin Jalali <adrinjalali>`.

- |Fix| Fix a bug in :func:`isotonic.isotonic_regression` where the
  `sample_weight` passed by a user was overwritten during ``fit``.
  :pr:`20515` by :user:`Carsten Allefeld <allefeld>`.

- |Fix| Change :class:`feature_selection.SequentialFeatureSelector` to
  allow for unsupervised modelling so that the `fit` signature need not
  do any `y` validation and allow for `y=None`.
  :pr:`19568` by :user:`Shyam Desai <ShyamDesai>`.

- |API| Raises an error in :class:`feature_selection.VarianceThreshold`
  when the variance threshold is negative.
  :pr:`20207` by :user:`Tomohiro Endo <europeanplaice>`.

- |API| Deprecates `grid_scores_` in favor of split scores in `cv_results_` in
  :class:`feature_selection.RFECV`. `grid_scores_` will be removed in
  version 1.2.
  :pr:`20161` by :user:`Shuhei Kayawari <wowry>` and :user:`arka204`.

:mod:`sklearn.inspection`
.........................

- |Enhancement| Add `max_samples` parameter in
  :func:`inspection.permutation_importance`. It allows drawing a subset of the
  samples to compute the permutation importance. This is useful to keep the
  method tractable when evaluating feature importance on large datasets.
  :pr:`20431` by :user:`Oliver Pfaffel <o1iv3r>`.

- |Enhancement| Add kwargs to format ICE and PD lines separately in partial
  dependence plots :func:`inspection.plot_partial_dependence` and
  :meth:`inspection.PartialDependenceDisplay.plot`. :pr:`19428` by :user:`Mehdi
  Hamoumi <mhham>`.

- |Fix| Allow multiple scorers as input to
  :func:`inspection.permutation_importance`. :pr:`19411` by :user:`Simona
  Maggio <simonamaggio>`.

- |API| :class:`inspection.PartialDependenceDisplay` exposes a class method:
  :func:`~inspection.PartialDependenceDisplay.from_estimator`.
  :func:`inspection.plot_partial_dependence` is deprecated in favor of the
  class method and will be removed in 1.2. :pr:`20959` by `Thomas Fan`_.

:mod:`sklearn.kernel_approximation`
...................................

- |Fix| Fix a bug in :class:`kernel_approximation.Nystroem`
  where the attribute `component_indices_` did not correspond to the subset of
  sample indices used to generate the approximated kernel. :pr:`20554` by
  :user:`Xiangyin Kong <kxytim>`.

:mod:`sklearn.linear_model`
...........................

- |Feature| Added :class:`linear_model.QuantileRegressor` which implements
  linear quantile regression with an L1 penalty.
  :pr:`9978` by :user:`David Dale <avidale>` and
  :user:`Christian Lorentzen <lorentzenchr>`.

- |Feature| The new :class:`linear_model.SGDOneClassSVM` provides an SGD
  implementation of the linear One-Class SVM. Combined with kernel
  approximation techniques, this implementation approximates the solution of
  a kernelized One-Class SVM while benefitting from a linear
  complexity in the number of samples.
  :pr:`10027` by :user:`Albert Thomas <albertcthomas>`.

- |Feature| Added `sample_weight` parameter to
  :class:`linear_model.LassoCV` and :class:`linear_model.ElasticNetCV`.
  :pr:`16449` by :user:`Christian Lorentzen <lorentzenchr>`.

- |Feature| Added new solver `lbfgs` (available with `solver="lbfgs"`)
  and `positive` argument to :class:`linear_model.Ridge`. When `positive` is
  set to `True`, the solver forces the coefficients to be positive (only
  supported by `lbfgs`). :pr:`20231` by :user:`Toshihiro Nakae <tnakae>`.

- |Efficiency| The implementation of :class:`linear_model.LogisticRegression`
  has been optimised for dense matrices when using `solver='newton-cg'` and
  `multi_class!='multinomial'`.
  :pr:`19571` by :user:`Julien Jerphanion <jjerphan>`.

- |Enhancement| The `fit` method preserves dtype for numpy.float32 in
  :class:`linear_model.Lars`, :class:`linear_model.LassoLars`,
  :class:`linear_model.LarsCV` and
  :class:`linear_model.LassoLarsCV`. :pr:`20155` by :user:`Takeshi Oura
  <takoika>`.

- |Enhancement| Validate user-supplied gram matrix passed to linear models
  via the `precompute` argument. :pr:`19004` by :user:`Adam Midvidy <amidvidy>`.

- |Fix| :meth:`linear_model.ElasticNet.fit` no longer modifies `sample_weight`
  in place. :pr:`19055` by `Thomas Fan`_.

- |Fix| :class:`linear_model.Lasso` and :class:`linear_model.ElasticNet` no
  longer report a `dual_gap_` that does not correspond to their objective.
  :pr:`19172` by :user:`Mathurin Massias <mathurinm>`.

- |Fix| `sample_weight` is now fully taken into account in linear models
  when `normalize=True` for both feature centering and feature
  scaling.
  :pr:`19426` by :user:`Alexandre Gramfort <agramfort>` and
  :user:`Maria Telenczuk <maikia>`.

- |Fix| Points with residuals equal to ``residual_threshold`` are now considered
  as inliers for :class:`linear_model.RANSACRegressor`. This allows fitting
  a model perfectly on some datasets when `residual_threshold=0`.
  :pr:`19499` by :user:`Gregory Strubel <gregorystrubel>`.

- |Fix| Sample weight invariance for :class:`linear_model.Ridge` was fixed in
  :pr:`19616` by :user:`Olivier Grisel <ogrisel>` and :user:`Christian Lorentzen
  <lorentzenchr>`.

- |Fix| The dictionary `params` in :func:`linear_model.enet_path` and
  :func:`linear_model.lasso_path` should only contain parameters of the
  coordinate descent solver. Otherwise, an error will be raised.
  :pr:`19391` by :user:`Shao Yang Hong <hongshaoyang>`.

- |API| Raise a warning in :class:`linear_model.RANSACRegressor` that from
  version 1.2, `min_samples` needs to be set explicitly for models other than
  :class:`linear_model.LinearRegression`. :pr:`19390` by :user:`Shao Yang Hong
  <hongshaoyang>`.

- |API| The parameter ``normalize`` of :class:`linear_model.LinearRegression`
  is deprecated and will be removed in 1.2. Motivation for this deprecation:
  the ``normalize`` parameter had no effect if ``fit_intercept`` was set
  to False and therefore was deemed confusing. The behavior of the deprecated
  ``LinearModel(normalize=True)`` can be reproduced with a
  :class:`~sklearn.pipeline.Pipeline` with ``LinearModel`` (where
  ``LinearModel`` is :class:`~linear_model.LinearRegression`,
  :class:`~linear_model.Ridge`, :class:`~linear_model.RidgeClassifier`,
  :class:`~linear_model.RidgeCV` or :class:`~linear_model.RidgeClassifierCV`)
  as follows: ``make_pipeline(StandardScaler(with_mean=False),
  LinearModel())``. The ``normalize`` parameter in
  :class:`~linear_model.LinearRegression` was deprecated in :pr:`17743` by
  :user:`Maria Telenczuk <maikia>` and :user:`Alexandre Gramfort <agramfort>`.
  Same for :class:`~linear_model.Ridge`,
  :class:`~linear_model.RidgeClassifier`, :class:`~linear_model.RidgeCV`, and
  :class:`~linear_model.RidgeClassifierCV`, in: :pr:`17772` by :user:`Maria
  Telenczuk <maikia>` and :user:`Alexandre Gramfort <agramfort>`.
  Same for
  :class:`~linear_model.BayesianRidge`, :class:`~linear_model.ARDRegression`
  in: :pr:`17746` by :user:`Maria Telenczuk <maikia>`. Same for
  :class:`~linear_model.Lasso`, :class:`~linear_model.LassoCV`,
  :class:`~linear_model.ElasticNet`, :class:`~linear_model.ElasticNetCV`,
  :class:`~linear_model.MultiTaskLasso`,
  :class:`~linear_model.MultiTaskLassoCV`,
  :class:`~linear_model.MultiTaskElasticNet`,
  :class:`~linear_model.MultiTaskElasticNetCV`, in: :pr:`17785` by :user:`Maria
  Telenczuk <maikia>` and :user:`Alexandre Gramfort <agramfort>`.

- |API| The ``normalize`` parameter of
  :class:`~linear_model.OrthogonalMatchingPursuit` and
  :class:`~linear_model.OrthogonalMatchingPursuitCV` will default to False in
  1.2 and will be removed in 1.4. :pr:`17750` by :user:`Maria Telenczuk
  <maikia>` and :user:`Alexandre Gramfort <agramfort>`. Same for
  :class:`~linear_model.Lars`, :class:`~linear_model.LarsCV`,
  :class:`~linear_model.LassoLars`, :class:`~linear_model.LassoLarsCV`, and
  :class:`~linear_model.LassoLarsIC`, in :pr:`17769` by :user:`Maria Telenczuk
  <maikia>` and :user:`Alexandre Gramfort <agramfort>`.

- |API| Keyword validation has moved from `__init__` and `set_params` to `fit`
  for the following estimators, to conform to scikit-learn's conventions:
  :class:`~linear_model.SGDClassifier`,
  :class:`~linear_model.SGDRegressor`,
  :class:`~linear_model.SGDOneClassSVM`,
  :class:`~linear_model.PassiveAggressiveClassifier`, and
  :class:`~linear_model.PassiveAggressiveRegressor`.
  :pr:`20683` by `Guillaume Lemaitre`_.

:mod:`sklearn.manifold`
.......................

- |Enhancement| Implement `'auto'` heuristic for the `learning_rate` in
  :class:`manifold.TSNE`. It will become default in 1.2. The default
  initialization will change to `pca` in 1.2. PCA initialization will
  be scaled to have standard deviation 1e-4 in 1.2.
  :pr:`19491` by :user:`Dmitry Kobak <dkobak>`.

- |Fix| Change numerical precision to prevent underflow issues
  during affinity matrix computation for :class:`manifold.TSNE`.
  :pr:`19472` by :user:`Dmitry Kobak <dkobak>`.

- |Fix| :class:`manifold.Isomap` now uses `scipy.sparse.csgraph.shortest_path`
  to compute the graph shortest path. It also connects disconnected components
  of the neighbors graph along some minimum distance pairs, instead of changing
  every infinite distance to zero. :pr:`20531` by `Roman Yurchak`_ and `Tom
  Dupre la Tour`_.

- |Fix| Decrease the numerical default tolerance in the lobpcg call
  in :func:`manifold.spectral_embedding` to prevent numerical instability.
  :pr:`21194` by :user:`Andrew Knyazev <lobpcg>`.

:mod:`sklearn.metrics`
......................

- |Feature| :func:`metrics.mean_pinball_loss` exposes the pinball loss for
  quantile regression. :pr:`19415` by :user:`Xavier Dupré <sdpython>`
  and :user:`Olivier Grisel <ogrisel>`.

- |Feature| :func:`metrics.d2_tweedie_score` calculates the D^2 regression
  score for Tweedie deviances with power parameter ``power``. This is a
  generalization of the `r2_score` and can be interpreted as the percentage of
  Tweedie deviance explained.
  :pr:`17036` by :user:`Christian Lorentzen <lorentzenchr>`.

- |Feature| :func:`metrics.mean_squared_log_error` now supports
  `squared=False`.
  :pr:`20326` by :user:`Uttam kumar <helper-uttam>`.

- |Efficiency| Improved speed of :func:`metrics.confusion_matrix` when labels
  are integral.
  :pr:`9843` by :user:`Jon Crall <Erotemic>`.

- |Enhancement| :func:`metrics.hinge_loss` now raises an error when
  ``pred_decision`` is 1d in a multiclass classification problem or when the
  ``pred_decision`` parameter is not consistent with the ``labels`` parameter.
  :pr:`19643` by :user:`Pierre Attard <PierreAttard>`.

- |Fix| :meth:`metrics.ConfusionMatrixDisplay.plot` uses the correct max
  for colormap. :pr:`19784` by `Thomas Fan`_.

- |Fix| Samples with zero `sample_weight` values do not affect the results
  from :func:`metrics.det_curve`, :func:`metrics.precision_recall_curve`
  and :func:`metrics.roc_curve`.
  :pr:`18328` by :user:`Albert Villanova del Moral <albertvillanova>` and
  :user:`Alonso Silva Allende <alonsosilvaallende>`.

- |Fix| Avoid overflow in :func:`metrics.cluster.adjusted_rand_score` with
  large amounts of data. :pr:`20312` by :user:`Divyanshu Deoli
  <divyanshudeoli>`.

- |API| :class:`metrics.ConfusionMatrixDisplay` exposes two class methods
  :func:`~metrics.ConfusionMatrixDisplay.from_estimator` and
  :func:`~metrics.ConfusionMatrixDisplay.from_predictions` that make it
  possible to create a confusion matrix plot using an estimator or the
  predictions.
  :func:`metrics.plot_confusion_matrix` is deprecated in favor of these two
  class methods and will be removed in 1.2.
  :pr:`18543` by `Guillaume Lemaitre`_.

- |API| :class:`metrics.PrecisionRecallDisplay` exposes two class methods
  :func:`~metrics.PrecisionRecallDisplay.from_estimator` and
  :func:`~metrics.PrecisionRecallDisplay.from_predictions` that make it
  possible to create a precision-recall curve using an estimator or the
  predictions.
  :func:`metrics.plot_precision_recall_curve` is deprecated in favor of these
  two class methods and will be removed in 1.2.
  :pr:`20552` by `Guillaume Lemaitre`_.

- |API| :class:`metrics.DetCurveDisplay` exposes two class methods
  :func:`~metrics.DetCurveDisplay.from_estimator` and
  :func:`~metrics.DetCurveDisplay.from_predictions` that make it possible to
  create a DET curve plot using an estimator or the predictions.
  :func:`metrics.plot_det_curve` is deprecated in favor of these two
  class methods and will be removed in 1.2.
  :pr:`19278` by `Guillaume Lemaitre`_.
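
As an illustration of the pinball loss that :func:`metrics.mean_pinball_loss`
exposes for quantile regression, here is a minimal sketch in plain Python. It
is not scikit-learn's implementation: the helper name ``pinball_loss_sketch``
and its signature are assumptions made for this example, and only the textbook
definition of the loss is used.

```python
def pinball_loss_sketch(y_true, y_pred, alpha=0.5):
    """Average pinball loss for quantile level `alpha` (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-predictions (diff > 0) are weighted by alpha,
        # over-predictions (diff < 0) by 1 - alpha.
        total += alpha * max(diff, 0.0) + (1.0 - alpha) * max(-diff, 0.0)
    return total / len(y_true)


y_true = [1.0, 2.0, 3.0]
# One prediction is too low by 1: cheap for a low quantile such as alpha=0.1.
under = pinball_loss_sketch(y_true, [0.0, 2.0, 3.0], alpha=0.1)
# One prediction is too high by 1: penalized more heavily for alpha=0.1.
over = pinball_loss_sketch(y_true, [1.0, 2.0, 4.0], alpha=0.1)
print(under, over)
```

With ``alpha=0.1`` the over-prediction costs nine times as much as the
under-prediction of the same size, which is what pushes a model trained on
this loss towards the 10th percentile.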

:mod:`sklearn.mixture`
......................

- |Fix| Ensure that the best parameters are set appropriately
  in the case of divergence for :class:`mixture.GaussianMixture` and
  :class:`mixture.BayesianGaussianMixture`.
  :pr:`20030` by :user:`Tingshan Liu <tliu68>` and
  :user:`Benjamin Pedigo <bdpedigo>`.

:mod:`sklearn.model_selection`
..............................

- |Feature| Added :class:`model_selection.StratifiedGroupKFold`, which combines
  :class:`model_selection.StratifiedKFold` and
  :class:`model_selection.GroupKFold`, providing an ability to split data
  preserving the distribution of classes in each split while keeping each
  group within a single split.
  :pr:`18649` by :user:`Leandro Hermida <hermidalc>` and
  :user:`Rodion Martynov <marrodion>`.

- |Enhancement| Warn only once in the main process for per-split fit failures
  in cross-validation. :pr:`20619` by :user:`Loïc Estève <lesteve>`.

- |Enhancement| The :class:`model_selection.BaseShuffleSplit` base class is
  now public. :pr:`20056` by :user:`pabloduque0`.

- |Fix| Avoid premature overflow in :func:`model_selection.train_test_split`.
  :pr:`20904` by :user:`Tomasz Jakubek <t-jakubek>`.

:mod:`sklearn.naive_bayes`
..........................

- |Fix| The `fit` and `partial_fit` methods of the discrete naive Bayes
  classifiers (:class:`naive_bayes.BernoulliNB`,
  :class:`naive_bayes.CategoricalNB`, :class:`naive_bayes.ComplementNB`,
  and :class:`naive_bayes.MultinomialNB`) now correctly handle the degenerate
  case of a single class in the training set.
  :pr:`18925` by :user:`David Poznik <dpoznik>`.

- |API| The attribute ``sigma_`` is now deprecated in
  :class:`naive_bayes.GaussianNB` and will be removed in 1.2.
  Use ``var_`` instead.
  :pr:`18842` by :user:`Hong Shao Yang <hongshaoyang>`.

:mod:`sklearn.neighbors`
........................

- |Enhancement| The worst-case time complexity of creating
  :class:`neighbors.KDTree` and :class:`neighbors.BallTree` has been improved
  from :math:`\mathcal{O}(n^2)` to :math:`\mathcal{O}(n)`.
  :pr:`19473` by :user:`jiefangxuanyan <jiefangxuanyan>` and
  :user:`Julien Jerphanion <jjerphan>`.

- |Fix| :class:`neighbors.DistanceMetric` subclasses now support readonly
  memory-mapped datasets. :pr:`19883` by :user:`Julien Jerphanion <jjerphan>`.

- |Fix| :class:`neighbors.NearestNeighbors`, :class:`neighbors.KNeighborsClassifier`,
  :class:`neighbors.RadiusNeighborsClassifier`, :class:`neighbors.KNeighborsRegressor`
  and :class:`neighbors.RadiusNeighborsRegressor` no longer validate `weights` in
  `__init__` and validate it in `fit` instead. :pr:`20072` by
  :user:`Juan Carlos Alfaro Jiménez <alfaro96>`.

- |API| The parameter `kwargs` of :class:`neighbors.RadiusNeighborsClassifier` is
  deprecated and will be removed in 1.2.
  :pr:`20842` by :user:`Juan Martín Loyola <jmloyola>`.

:mod:`sklearn.neural_network`
.............................

- |Fix| :class:`neural_network.MLPClassifier` and
  :class:`neural_network.MLPRegressor` now correctly support continued training
  when loading from a pickled file. :pr:`19631` by `Thomas Fan`_.

:mod:`sklearn.pipeline`
.......................

- |API| The `predict_proba` and `predict_log_proba` methods of
  :class:`pipeline.Pipeline` now support passing prediction kwargs to the final
  estimator. :pr:`19790` by :user:`Christopher Flynn <crflynn>`.

:mod:`sklearn.preprocessing`
............................

- |Feature| The new :class:`preprocessing.SplineTransformer` is a feature
  preprocessing tool for the generation of B-splines, parametrized by the
  polynomial ``degree`` of the splines, number of knots ``n_knots`` and knot
  positioning strategy ``knots``.
  :pr:`18368` by :user:`Christian Lorentzen <lorentzenchr>`.
  :class:`preprocessing.SplineTransformer` also supports periodic
  splines via the ``extrapolation`` argument.
  :pr:`19483` by :user:`Malte Londschien <mlondschien>`.
  :class:`preprocessing.SplineTransformer` supports sample weights for
  knot position strategy ``"quantile"``.
  :pr:`20526` by :user:`Malte Londschien <mlondschien>`.

- |Feature| :class:`preprocessing.OrdinalEncoder` supports passing through
  missing values by default. :pr:`19069` by `Thomas Fan`_.

- |Feature| :class:`preprocessing.OneHotEncoder` now supports
  `handle_unknown='ignore'` and dropping categories. :pr:`19041` by
  `Thomas Fan`_.

- |Feature| :class:`preprocessing.PolynomialFeatures` now supports passing
  a tuple to `degree`, i.e. `degree=(min_degree, max_degree)`.
  :pr:`20250` by :user:`Christian Lorentzen <lorentzenchr>`.

- |Efficiency| :class:`preprocessing.StandardScaler` is faster and more memory
  efficient. :pr:`20652` by `Thomas Fan`_.

- |Efficiency| Changed ``algorithm`` argument for :class:`cluster.KMeans` in
  :class:`preprocessing.KBinsDiscretizer` from ``auto`` to ``full``.
  :pr:`19934` by :user:`Gleb Levitskiy <GLevV>`.

- |Efficiency| The implementation of `fit` for
  :class:`preprocessing.PolynomialFeatures` transformer is now faster. This is
  especially noticeable on large sparse input. :pr:`19734` by :user:`Fred
  Robinson <frrad>`.

- |Fix| The :func:`preprocessing.StandardScaler.inverse_transform` method
  now raises an error when the input data is 1D. :pr:`19752` by
  :user:`Zhehao Liu <Max1993Liu>`.

- |Fix| :func:`preprocessing.scale`, :class:`preprocessing.StandardScaler`
  and similar scalers detect near-constant features to avoid scaling them to
  very large values.
  This problem happens in particular when using a scaler on
  sparse data with a constant column with sample weights, in which case
  centering is typically disabled. :pr:`19527` by :user:`Olivier Grisel
  <ogrisel>` and :user:`Maria Telenczuk <maikia>` and :pr:`19788` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| :meth:`preprocessing.StandardScaler.inverse_transform` now
  correctly handles integer dtypes. :pr:`19356` by :user:`makoeppel`.

- |Fix| :meth:`preprocessing.OrdinalEncoder.inverse_transform` does not
  support sparse matrices and now raises the appropriate error message.
  :pr:`19879` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| The `fit` method of :class:`preprocessing.OrdinalEncoder` no longer
  raises an error when `handle_unknown='ignore'` and unknown categories are
  given to `fit`.
  :pr:`19906` by :user:`Zhehao Liu <MaxwellLZH>`.

- |Fix| Fix a regression in :class:`preprocessing.OrdinalEncoder` where large
  Python numerics would raise an error due to overflow when cast to a C type
  (`np.float64` or `np.int64`).
  :pr:`20727` by `Guillaume Lemaitre`_.

- |Fix| :class:`preprocessing.FunctionTransformer` does not set `n_features_in_`
  based on the input to `inverse_transform`. :pr:`20961` by `Thomas Fan`_.

- |API| The `n_input_features_` attribute of
  :class:`preprocessing.PolynomialFeatures` is deprecated in favor of
  `n_features_in_` and will be removed in 1.2. :pr:`20240` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.svm`
..................

- |API| The parameter `**params` of :func:`svm.OneClassSVM.fit` is
  deprecated and will be removed in 1.2.
  :pr:`20843` by :user:`Juan Martín Loyola <jmloyola>`.

:mod:`sklearn.tree`
...................

- |Enhancement| Add `fontname` argument in :func:`tree.export_graphviz`
  for non-English characters.
  :pr:`18959` by :user:`Zero <Zeroto521>`
  and :user:`wstates <wstates>`.

- |Fix| Improves compatibility of :func:`tree.plot_tree` with high DPI screens.
  :pr:`20023` by `Thomas Fan`_.

- |Fix| Fixed a bug in :class:`tree.DecisionTreeClassifier` and
  :class:`tree.DecisionTreeRegressor` where a node could be split when it
  should not have been, due to incorrect handling of rounding errors.
  :pr:`19336` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |API| The `n_features_` attribute of :class:`tree.DecisionTreeClassifier`,
  :class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeClassifier` and
  :class:`tree.ExtraTreeRegressor` is deprecated in favor of `n_features_in_`
  and will be removed in 1.2. :pr:`20272` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.utils`
....................

- |Enhancement| Deprecated the default value `random_state=0` in
  :func:`~sklearn.utils.extmath.randomized_svd`. Starting in 1.2,
  the default value of `random_state` will be set to `None`.
  :pr:`19459` by :user:`Cindy Bezuidenhout <cinbez>` and
  :user:`Clifford Akai-Nettey <cliffordEmmanuel>`.

- |Enhancement| Added helper decorator :func:`utils.metaestimators.available_if`
  to provide flexibility in metaestimators making methods available or
  unavailable on the basis of state, in a more readable way.
  :pr:`19948` by `Joel Nothman`_.

- |Enhancement| :func:`utils.validation.check_is_fitted` now uses
  ``__sklearn_is_fitted__`` if available, instead of checking for attributes
  ending with an underscore. This also makes :class:`pipeline.Pipeline` and
  :class:`preprocessing.FunctionTransformer` pass
  ``check_is_fitted(estimator)``. :pr:`20657` by `Adrin Jalali`_.

- |Fix| Fixed a bug in :func:`utils.sparsefuncs.mean_variance_axis` where the
  precision of the computed variance was very poor when the real variance was
  exactly zero. :pr:`19766` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| The docstrings of properties that are decorated with
  :func:`utils.deprecated` are now properly wrapped. :pr:`20385` by `Thomas
  Fan`_.

- |Fix| :func:`utils.stats._weighted_percentile` now correctly ignores
  zero-weighted observations smaller than the smallest observation with
  positive weight for ``percentile=0``. Affected classes are
  :class:`dummy.DummyRegressor` for ``quantile=0`` and
  :class:`ensemble.HuberLossFunction` for ``alpha=0``.
  :pr:`20528` by :user:`Malte Londschien <mlondschien>`.

- |Fix| :func:`utils._safe_indexing` explicitly takes a dataframe copy when
  integer indices are provided, which avoids raising a warning from Pandas.
  This warning was previously raised in resampling utilities and functions
  using those utilities (e.g. :func:`model_selection.train_test_split`,
  :func:`model_selection.cross_validate`,
  :func:`model_selection.cross_val_score`,
  :func:`model_selection.cross_val_predict`).
  :pr:`20673` by :user:`Joris Van den Bossche <jorisvandenbossche>`.

- |Fix| Fix a regression in :func:`utils.is_scalar_nan` where large Python
  numbers would raise an error due to overflow in C types (`np.float64` or
  `np.int64`).
  :pr:`20727` by `Guillaume Lemaitre`_.

- |Fix| Support for `np.matrix` is deprecated in
  :func:`~sklearn.utils.check_array` in 1.0 and will raise a `TypeError` in
  1.2. :pr:`20165` by `Thomas Fan`_.

- |API| :func:`utils._testing.assert_warns` and
  :func:`utils._testing.assert_warns_message` are deprecated in 1.0 and will
  be removed in 1.2. Use the `pytest.warns` context manager instead.
  Note that
  these functions were not documented and were not part of the public API.
  :pr:`20521` by :user:`Olivier Grisel <ogrisel>`.

- |API| Fixed several bugs in :func:`utils.graph.graph_shortest_path`, which is
  now deprecated. Use `scipy.sparse.csgraph.shortest_path` instead. :pr:`20531`
  by `Tom Dupre la Tour`_.

Code and Documentation Contributors
-----------------------------------

Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 0.24, including:

Abdulelah S. Al Mesfer, Abhinav Gupta, Abhishek Gupta, Adam J. Stewart, Adam
Li, Adam Midvidy, adijohar, Aditya Kumawat, Adrian Garcia Badaracco, Adrian
Sadłocha, Adrin Jalali, Agamemnon Krasoulis, AJ Druck, Albert Thomas, Albert
Villanova del Moral, Alberto Mario Ceballos-Arroyo, Alberto Rubiales, Alek
Lefebvre, Alessia Marcolini, Alexandr Fonari, Alihan Zihna, Aline Ribeiro de
Almeida, almeidayoel, Amanda, Amanda Dsouza, Amol Deshmukh, amrcode, Ana
Pessoa, Anavelyz, András Simon, Andreas Mueller, Andrew Delong, Andrew Knyazev,
Angus L'Herrou, Arisa, Arth, Arturo Amor, Ashish, Ashvith Shetty, Atsushi
Nukariya, Aurélien Geron, Avi Gupta, Ayush Singh, baam, BaptBillard, Benjamin
Pedigo, Bertrand Thirion, Bharat Raghunathan, bmalezieux, Brian Rice, Brian
Sun, Bruno Charron, Bryan Chen, bumblebee, caherrera-meli, Carsten Allefeld,
CeeThinwa, Chiara Marmo, Chitteti Srinath Reddy, chrissobel, Christian
Lorentzen, Christian Ritter, Christopher Yeh, christopherlim98, Christos
Aridas, Chuliang Xiao, Clément Fauchereau, cliffordEmmanuel, Conner Shen,
Connor Tann, David Dale, David Katz, David Poznik, Dimitri Papadopoulos
Orfanos, Divyanshu Deoli, dmallia17, Dmitry Kobak, DS_anas, Eduardo Jardim,
EdwinWenink, EL-ATEIF Sara, Eleni Markou, Eric Fiegel, Eric Larson, Eric
Ndirangu, EricEllwanger, Erich Schubert, Estefania Barreto-Ojeda, eyast,
Ezri-Mudde,
Fatos Morina, Federico Luna, Felipe Rodrigues, Felix Glushchenkov,
Felix Hafner, Fenil Suchak, flyingdutchman23, Flynn, Fortune Uwha, FPGAwesome,
Francois Berenger, Frankie Robertson, Frans Larsson, Frederick Robinson, Gabor
Kertesz, Gabriel S Vicente, Gabriel Stefanini Vicente, Gael Varoquaux, Gauthier
I, genvalen, Geoffrey Thomas, geroldcsendes, Giancarlo Pablo, Gleb Levitskiy,
Glen, glennfrutiz, Glòria Macià Muñoz, gregorystrubel, groceryheist, Guillaume
Lemaitre, guiweber, Haidar Almubarak, Hans Moritz Günther, Haoyin Xu, Harris
Mirza, Harry Wei, Harutaka Kawamura, Hassan Alsawadi, Helder Geovane Gomes de
Lima, Himanshu Kumar, Hugo DEFOIS, Igor Ilic, Ikko Ashimine, iofall, Isaack
Mungui, Ishaan Bhat, Ishan Kumar, Ishan Mishra, Iván Pulido, iwhalvic, J
Alexander, Jack Liu, jalexand3r, James Alan Preiss, James Budarz, James Lamb,
Jannik, Jauhar, Jeff Hale, Jeff Zhao, Jennifer Maldonado, Jenny Vo, Jérémie du
Boisberranger, Jesse Lima, Jianzhu Guo, Jirka Borovec, jnboehm, Joel Nothman,
JohanWork, John Paton, Jon Crall, Jon Haitz Legarreta Gorroño, Jonathan
Schneider, Jorge Loayza, Joris Van den Bossche, José Manuel Nápoles Duarte,
Juan Carlos Alfaro Jiménez, Juan Martin Loyola, Julien Jerphanion, Julio
Batista Silva, julyrashchenko, JVM, Kadatatlu Kishore, Karen Palacio, katotten,
Kaushik Roy Chowdhury, Kei Ishikawa, Ken4git, KimAYoung, kmatt10, kobaski,
Kot271828, Kranthi Sedamaki, krumetoft, Kunj, KurumeYuta, kxytim, lacrosse91,
LalliAcqua, Laveen Bagai, Leonardo Rocco, Leonardo Uieda, Leopoldo Corona, Loic
Esteve, LSturtew, Luca Bittarello, Luccas Quadros, LucieClair, Lucy Jiménez,
Lucy Liu, Luiz Eduardo Amaral, ly648499246, Mabu Manaileng, MaggieChege,
makoeppel, mandjevant, Manimaran, Marco Gorelli, Maren Westermann, Maria
Telenczuk, Mariangela, marielaraj, Martin Hirzel, Mateo Noreña, Mathieu
Blondel, Mathis Batoul, mathurinm, Matthew Calcote, Maxime Prieur,
Maxwell,
Mehdi Hamoumi, Mehmet Ali Özer, melemo2, Miao Cai, Michal Karbownik,
michalkrawczyk, milana2, millawell, Mitzi, mlant, mlondschien, Mohamed Haseeb,
Mohamed Khoualed, Mr. Leu, MrinalTyagi, Muhammad Jarir Kanji, murata-yu, N,
Nadim Kawwa, Nanshan Li, naozin555, nastegiano, Nate Parsons, Neal Fultz, Nic
Annau, Nico Stefani, Nicolas Hug, Nicolas Miller, Nigel Bosch, Niket Jain,
Nikita Titov, Nikolay Kondratyev, Nodar Okroshiashvili, Norbert Preining,
novaya, Ogbonna Chibuike Stephen, OGordon100, Oliver Pfaffel, Olivier Grisel,
Oras Phongpanangam, Pablo Duque, Pablo Ibieta-Jimenez, partev, Patric Lacouth,
Patrick Ferreira, Paul, Paulo S. Costa, Paweł Olszewski, pelennor, Peter Dye,
Pierre-Yves Le Borgne, PierreAttard, Pinky, Pramod Anantharam, PranayAnchuri,
Prince Canuma, puhuk, putschblos, qdeffense, RamyaNP, Randall Boyes,
ranjanikrishnan, Ray Bell, Rene Jean Corneille, Reshama Shaikh, ricardojnf,
RichardScottOZ, Rishabh, Rodion Martynov, Rohan Paul, Roman Lutz, Roman
Yurchak, Ross Barnowski, Samuel Brice, Sandy Khosasi, Sean Benhur J, Sebastian
Flores, Sebastian Pölsterl, Shao Yang Hong, shinehide, shinnar, shivamgargsya,
Shooter23, Shuhei Kayawari, Shyam Desai, siavrez, simonamaggio, Sina
Tootoonian, solosilence, spikebh, sply88, Steve Stagg, Steven Kolawole, Surya
Prakash, Sven Eschlbeck, Swapnil Jha, swpease, Sylvain Marié, t-jakubek,
t-kusanagi, Takeshi Oura, Tamires Santana, Terence Honles, TFiFiE, Thomas A
Caswell, Thomas J.
Fan, Tim Gates, Tim Vink, TimotheeMathieu, Timothy Wolodzko,
tliu68, Tobias Uhmann, Tom Dupré la Tour, tom1092, Tomás Moreyra, Tomás Ronald
Hughes, Tommaso Di Noto, Tomohiro Endo, TONY GEORGE, Toshihiro NAKAE, tsuga,
Tyler Martin, Uttam kumar, vadim-ushtanit, Vangelis Gkiastas, Venkatachalam N,
Vikas Vishwakarma, Vikrant khedkar, Vilém Zouhar, Vinicius Rios Fuck, Vladimir
Chernyy, Vlasovets, waijean, Whidou, xavier dupré, Xiao Yuan, xiaoyuchai, Yar
Khine Phyo, Yasmeen Alsaedy, yoch, Yosuke KOBAYASHI, Yu Feng, YusukeNagasaka,
yzhenman, Zeel B Patel, Zero, ZeyuSun, ZhaoweiWang, Zito, Zito Relova, Zhao Feng