.. currentmodule:: sklearn
In Development
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
|Efficiency| :class:`linear_model.LogisticRegression` and :class:`linear_model.LogisticRegressionCV` now have much better convergence for solvers "lbfgs" and "newton-cg". Both solvers can now reach much higher precision for the coefficients depending on the specified tol. Additionally, lbfgs can make better use of tol, i.e., stop sooner or reach higher precision. :pr:`26721` by :user:`Christian Lorentzen <lorentzenchr>`.
Note
The lbfgs is the default solver, so this change might effect many models.
This change also means that with this new version of scikit-learn, the resulting coefficients coef_ and intercept_ of your models will change for these two solvers (when fit on the same data again). The amount of change depends on the specified tol, for small values you will get more precise results.
- |Enhancement| All estimators now recognizes the column names from any dataframe that adopts the DataFrame Interchange Protocol. Dataframes that return a correct representation through np.asarray(df) is expected to work with our estimators and functions. :pr:`26464` by `Thomas Fan`_.
- |Fix| Fixed a bug in most estimators and functions where setting a parameter to a large integer would cause a TypeError. :pr:`26648` by :user:`Naoise Holohan <naoise-h>`.
The following models now support metadata routing in one or more or their methods. Refer to the :ref:`Metadata Routing User Guide <metadata_routing>` for more details.
- |Feature| :class:`LarsCV` and :class:`LassoLarsCV` now support metadata routing in their fit method and route metadata to the CV splitter. :pr:`27538` by :user:`Omar Salman <OmarManzoor>`.
- |Feature| :class:`multiclass.OneVsRestClassifier`,
:class:`multiclass.OneVsOneClassifier` and
:class:`multiclass.OutputCodeClassifier` now support metadata routing in
their
fit
andpartial_fit
, and route metadata to the underlying estimator'sfit
andpartial_fit
. :pr:`27308` by :user:`Stefanie Senger <StefanieSenger>`. - |Feature| :class:`pipeline.Pipeline` now supports metadata routing according to :ref:`metadata routing user guide <metadata_routing>`. :pr:`26789` by `Adrin Jalali`_.
- |Feature| :func:`~model_selection.cross_validate`, :func:`~model_selection.cross_val_score`, and :func:`~model_selection.cross_val_predict` now support metadata routing. The metadata are routed to the estimator's fit, the scorer, and the CV splitter's split. The metadata is accepted via the new params parameter. fit_params is deprecated and will be removed in version 1.6. groups parameter is also not accepted as a separate argument when metadata routing is enabled and should be passed via the params parameter. :pr:`26896` by `Adrin Jalali`_.
- |Feature| :class:`~model_selection.GridSearchCV`,
:class:`~model_selection.RandomizedSearchCV`,
:class:`~model_selection.HalvingGridSearchCV`, and
:class:`~model_selection.HalvingRandomSearchCV` now support metadata routing
in their
fit
andscore
, and route metadata to the underlying estimator'sfit
, the CV splitter, and the scorer. :pr:`27058` by `Adrin Jalali`_. - |Feature| :class:`~compose.ColumnTransformer` now supports metadata routing according to :ref:`metadata routing user guide <metadata_routing>`. :pr:`27005` by `Adrin Jalali`_.
- |Feature| :class:`linear_model.LogisticRegressionCV` now supports
metadata routing. :meth:`linear_model.LogisticRegressionCV.fit` now
accepts
**params
which are passed to the underlying splitter and scorer. :meth:`linear_model.LogisticRegressionCV.score` now accepts**score_params
which are passed to the underlying scorer. :pr:`26525` by :user:`Omar Salman <OmarManzoor>`. - |Feature| :class:`linear_model.OrthogonalMatchingPursuitCV` now supports
metadata routing. Its fit now accepts
**fit_params
, which are passed to the underlying splitter. :pr:`27500` by :user:`Stefanie Senger <StefanieSenger>`. - |Fix| All meta-estimators for which metadata routing is not yet implemented now raise a NotImplementedError on get_metadata_routing and on fit if metadata routing is enabled and any metadata is passed to them. :pr:`27389` by `Adrin Jalali`_.
- |Feature| :class:`ElasticNetCV`, :class:`LassoCV`, :class:`MultiTaskElasticNetCV` and :class:`MultiTaskLassoCV` now support metadata routing and route metadata to the CV splitter. :pr:`27478` by :user:`Omar Salman <OmarManzoor>`.
Several estimators are now supporting SciPy sparse arrays. The following functions and classes are impacted:
Functions:
- :func:`cluster.compute_optics_graph` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :func:`cluster.kmeans_plusplus` in :pr:`27179` by :user:`Nurseit Kamchyev <Bncer>`;
- :func:`decomposition.non_negative_factorization` in :pr:`27100` by :user:`Isaac Virshup <ivirshup>`;
- :func:`feature_selection.f_regression` in :pr:`27239` by :user:`Yaroslav Korobko <Tialo>`;
- :func:`feature_selection.r_regression` in :pr:`27239` by :user:`Yaroslav Korobko <Tialo>`;
- :func:`manifold.trustworthiness` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :func:`metrics.pairwise_distances` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :func:`metrics.pairwise_distances_chunked` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :func:`metrics.pairwise.pairwise_kernels` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :func:`sklearn.utils.multiclass.type_of_target` in :pr:`27274` by :user:`Yao Xiao <Charlie-XIAO>`.
Classes:
- :class:`cluster.HDBSCAN` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`cluster.KMeans` in :pr:`27179` by :user:`Nurseit Kamchyev <Bncer>`;
- :class:`cluster.MiniBatchKMeans` in :pr:`27179` by :user:`Nurseit Kamchyev <Bncer>`;
- :class:`cluster.OPTICS` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`decomposition.NMF` in :pr:`27100` by :user:`Isaac Virshup <ivirshup>`;
- :class:`decomposition.MiniBatchNMF` in :pr:`27100` by :user:`Isaac Virshup <ivirshup>`;
- :class:`feature_extraction.text.TfidfTransformer` in :pr:`27219` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`cluster.Isomap` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`manifold.TSNE` in :pr:`27250` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`impute.SimpleImputer` in :pr:`27277` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`impute.IterativeImputer` in :pr:`27277` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`impute.KNNImputer` in :pr:`27277` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`kernel_approximation.PolynomialCountSketch` in :pr:`27301` by :user:`Lohit SundaramahaLingam <lohitslohit>`;
- :class:`neural_network.BernoulliRBM` in :pr:`27252` by :user:`Yao Xiao <Charlie-XIAO>`;
- :class:`preprocessing.PolynomialFeatures` in :pr:`27166` by :user:`Mohit Joshi <work-mohit>`.
- |Enhancement| :meth:`base.ClusterMixin.fit_predict` and
:meth:`base.OutlierMixin.fit_predict` now accept
**kwargs
which are passed to thefit
method of the estimator. :pr:`26506` by `Adrin Jalali`_. - |Enhancement| :meth:`base.TransformerMixin.fit_transform` and
:meth:`base.OutlierMixin.fit_predict` now raise a warning if
transform
/predict
consume metadata, but no customfit_transform
/fit_predict
is defined in the class inheriting from them correspondingly. :pr:`26831` by `Adrin Jalali`_. - |Enhancement| :func:`base.clone` now supports dict as input and creates a copy. :pr:`26786` by `Adrin Jalali`_.
- |API|:func:`~utils.metadata_routing.process_routing` now has a different signature. The first two (the object and the method) are positional only, and all metadata are passed as keyword arguments. :pr:`26909` by `Adrin Jalali`_.
- |Enhancement| The internal objective and gradient of the sigmoid method of :class:`calibration.CalibratedClassifierCV` have been replaced by the private loss module. :pr:`27185` by :user:`Omar Salman <OmarManzoor>`.
- |API| : kdtree and balltree values are now deprecated and are renamed as kd_tree and ball_tree respectively for the algorithm parameter of :class:`cluster.HDBSCAN` ensuring consistency in naming convention. kdtree and balltree values will be removed in 1.6. :pr:`26744` by :user:`Shreesha Kumar Bhat <Shreesha3112>`.
- |API| |FIX| :class:`~compose.ColumnTransformer` now replaces "passthrough"
with a corresponding :class:`~preprocessing.FunctionTransformer` in the
fitted
transformers_
attribute. :pr:`27204` by `Adrin Jalali`_.
- |Enhancement| :func:`datasets.make_sparse_spd_matrix` now uses a more memory- efficient sparse layout. It also accepts a new keyword sparse_format that allows specifying the output format of the sparse matrix. By default sparse_format=None, which returns a dense numpy ndarray as before. :pr:`27438` by :user:`Yao Xiao <Charlie-XIAO>`.
- |Enhancement| An "auto" option was added to the n_components parameter of :func:`decomposition.non_negative_factorization`, :class:`decomposition.NMF` and :class:`decomposition.MiniBatchNMF` to automatically infer the number of components from W or H shapes when using a custom initialization. The default value of this parameter will change from None to auto in version 1.6. :pr:`26634` by :user:`Alexandre Landeau <AlexL>` and :user:`Alexandre Vigny <avigny>`.
- |Enhancement| :class:`decomposition.PCA` now supports the Array API for the full and randomized solvers (with QR power iterations). See :ref:`array_api` for more details. :pr:`26315` and :pr:`27098` by :user:`Mateusz Sokół <mtsokol>`, :user:`Olivier Grisel <ogrisel>` and :user:`Edoardo Abati <EdAbati>`.
- |MajorFeature| :class:`ensemble.RandomForestClassifier` and :class:`ensemble.RandomForestRegressor` support missing values when the criterion is gini, entropy, or log_loss, for classification or squared_error, friedman_mse, or poisson for regression. :pr:`26391` by `Thomas Fan`_.
- |Feature| :class:`ensemble.RandomForestClassifier`, :class:`ensemble.RandomForestRegressor`, :class:`ensemble.ExtraTreesClassifier` and :class:`ensemble.ExtraTreesRegressor` now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. Missing values in the train data and multi-output targets are not supported. :pr:`13649` by :user:`Samuel Ronsin <samronsin>`, initiated by :user:`Patrick O'Reilly <pat-oreilly>`.
- |Efficiency| :class:`ensemble.GradientBoostingClassifier` is faster, for binary and in particular for multiclass problems thanks to the private loss function module. :pr:`26278` by :user:`Christian Lorentzen <lorentzenchr>`.
- |Efficiency| Improves runtime and memory usage for :class:`ensemble.GradientBoostingClassifier` and :class:`ensemble.GradientBoostingRegressor` when trained on sparse data. :pr:`26957` by `Thomas Fan`_.
- |API| In :class:`ensemble.AdaBoostClassifier`, the algorithm argument SAMME.R was deprecated and will be removed in 1.6. :pr:`26830` by :user:`Stefanie Senger <StefanieSenger>`.
- |Enhancement| :class:`inspection.DecisionBoundaryDisplay` now accepts a parameter class_of_interest to select the class of interest when plotting the response provided by response_method="predict_proba" or response_method="decision_function". It allows to plot the decision boundary for both binary and multiclass classifiers. :pr:`27291` by :user:`Guillaume Lemaitre <glemaitre>`.
- |API| :class:`inspection.DecisionBoundaryDisplay` raise an AttributeError instead of a ValueError when an estimator does not implement the requested response method. :pr:`27291` by :user:`Guillaume Lemaitre <glemaitre>`.
|Efficiency| :class:`linear_model.LogisticRegression` and :class:`linear_model.LogisticRegressionCV` now have much better convergence for solvers "lbfgs" and "newton-cg". Both solvers can now reach much higher precision for the coefficients depending on the specified tol. Additionally, lbfgs can make better use of tol, i.e., stop sooner or reach higher precision. This is accomplished by better scaling of the objective function, i.e., using average per sample losses instead of sum of per sample losses. :pr:`26721` by :user:`Christian Lorentzen <lorentzenchr>`.
Note
This change also means that with this new version of scikit-learn, the resulting coefficients coef_ and intercept_ of your models will change for these two solvers (when fit on the same data again). The amount of change depends on the specified tol, for small values you will get more precise results.
|Efficiency| :class:`linear_model.LogisticRegression` and :class:`linear_model.LogisticRegressionCV` with solver "newton-cg" can now be considerably faster for some data and parameter settings. This is accomplished by a better line search convergence check for negligible loss improvements that takes into account gradient information. :pr:`26721` by :user:`Christian Lorentzen <lorentzenchr>`.
|Efficiency| Solver "newton-cg" in :class:`linear_model.LogisticRegression` and :class:`linear_model.LogisticRegressionCV` uses a little less memory. The effect is proportional to the number of coefficients (n_features * n_classes). :pr:`27417` by :user:`Christian Lorentzen <lorentzenchr>`.
- |Efficiency| Computing pairwise distances via :class:`metrics.DistanceMetric` for CSR × CSR, Dense × CSR, and CSR × Dense datasets is now 1.5x faster. :pr:`26765` by :user:`Meekail Zain <micky774>`
- |Efficiency| Computing distances via :class:`metrics.DistanceMetric` for CSR × CSR, Dense × CSR, and CSR × Dense now uses ~50% less memory, and outputs distances in the same dtype as the provided data. :pr:`27006` by :user:`Meekail Zain <micky774>`
- |Enhancement| Improve the rendering of the plot obtained with the :class:`metrics.PrecisionRecallDisplay` and :class:`metrics.RocCurveDisplay` classes. the x- and y-axis limits are set to [0, 1] and the aspect ratio between both axis is set to be 1 to get a square plot. :pr:`26366` by :user:`Mojdeh Rastgoo <mrastgoo>`.
- |Enhancement| Added neg_root_mean_squared_log_error_scorer as scorer :pr:`26734` by :user:`Alejandro Martin Gil <101AlexMartin>`.
- |Enhancement| :func:`sklearn.metrics.accuracy_score` and :func:`sklearn.metrics.zero_one_loss` now support Array API compatible inputs. :pr:`27137` by :user:`Edoardo Abati <EdAbati>`.
- |API| Deprecated needs_threshold and needs_proba from :func:`metrics.make_scorer`. These parameters will be removed in version 1.6. Instead, use response_method that accepts "predict", "predict_proba" or "decision_function" or a list of such values. needs_proba=True is equivalent to response_method="predict_proba" and needs_threshold=True is equivalent to response_method=("decision_function", "predict_proba"). :pr:`26840` by :user:`Guillaume Lemaitre <glemaitre>`.
- |API| The squared parameter of :func:`metrics.mean_squared_error` and :func:`metrics.mean_squared_log_error` is deprecated and will be removed in 1.6. Use the new functions :func:`metrics.root_mean_squared_error` and :func:`metrics.root_mean_squared_log_error` instead. :pr:`26734` by :user:`Alejandro Martin Gil <101AlexMartin>`.
- |Fix| :func:`metrics.make_scorer` now raises an error when using a regressor on a scorer requesting a non-thresholded decision function (from decision_function or predict_proba). Such scorer are specific to classification. :pr:`26840` by :user:`Guillaume Lemaitre <glemaitre>`.
- |Enhancement| :func:`sklearn.model_selection.train_test_split` now supports Array API compatible inputs. :pr:`26855` by `Tim Head`_.
- |Fix| :class:`model_selection.GridSearchCV`, :class:`model_selection.RandomizedSearchCV`, and :class:`model_selection.HalvingGridSearchCV` now don't change the given object in the parameter grid if it's an estimator. :pr:`26786` by `Adrin Jalali`_.
- |Efficiency| :meth:`sklearn.neighbors.KNeighborsRegressor.predict` and :meth:`sklearn.neighbors.KNeighborsClassifier.predict_proba` now efficiently support pairs of dense and sparse datasets. :pr:`27018` by :user:`Julien Jerphanion <jjerphan>`.
- |API| :class:`neighbors.KNeighborsRegressor` now accepts :class:`metrics.DistanceMetric` objects directly via the metric keyword argument allowing for the use of accelerated third-party :class:`metrics.DistanceMetric` objects. :pr:`26267` by :user:`Meekail Zain <micky774>`.
- |Efficiency| The performance of :meth:`neighbors.RadiusNeighborsClassifier.predict` and of :meth:`neighbors.RadiusNeighborsClassifier.predict_proba` has been improved when radius is large and algorithm="brute" with non-Euclidean metrics. :pr:`26828` by :user:`Omar Salman <OmarManzoor>`.
- |MajorFeature| :class:`preprocessing.MinMaxScaler` and :class:`preprocessing.MaxAbsScaler` now support the Array API. Array API support is considered experimental and might evolve without being subject to our usual rolling deprecation cycle policy. See :ref:`array_api` for more details. :pr:`26243` by `Tim Head`_ and :pr:`27110` by :user:`Edoardo Abati <EdAbati>`.
- |Efficiency| :class:`preprocessing.OrdinalEncoder` avoids calculating missing indices twice to improve efficiency. :pr:`27017` by :user:`Xuefeng Xu <xuefeng-xu>`.
- |Enhancement| Improves warnings in :class:`preprocessing.FunctionTransformer` when func returns a pandas dataframe and the output is configured to be pandas. :pr:`26944` by `Thomas Fan`_.
- |Enhancement| :class:`preprocessing.TargetEncoder` now supports target_type 'multiclass'. :pr:`26674` by :user:`Lucy Liu <lucyleeow>`.
- |Feature| :class:`tree.DecisionTreeClassifier`, :class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeClassifier` and :class:`tree.ExtraTreeRegressor` now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. Missing values in the train data and multi-output targets are not supported. :pr:`13649` by :user:`Samuel Ronsin <samronsin>`, initiated by :user:`Patrick O'Reilly <pat-oreilly>`.
- |Enhancement| :func:`sklearn.utils.estimator_html_repr` dynamically adapts diagram colors based on the browser's prefers-color-scheme, providing improved adaptability to dark mode environments. :pr:`26862` by :user:`Andrew Goh Yisheng <9y5>`, `Thomas Fan`_, `Adrin Jalali`_.
- |Enhancement| :class:`~utils.metadata_routing.MetadataRequest` and
:class:`~utils.metadata_routing.MetadataRouter` now have a
consumes
method which can be used to check whether a given set of parameters would be consumed. :pr:`26831` by `Adrin Jalali`_. - |Fix| :func:`sklearn.utils.check_array` should accept both matrix and array from the sparse SciPy module. The previous implementation would fail if copy=True by calling specific NumPy np.may_share_memory that does not work with SciPy sparse array and does not return the correct result for SciPy sparse matrix. :pr:`27336` by :user:`Guillaume Lemaitre <glemaitre>`.
- |API| :func:`sklearn.extmath.log_logistic` is deprecated and will be removed in 1.6. Use -np.logaddexp(0, -x) instead. :pr:`27544` by :user:`Christian Lorentzen <lorentzenchr>`.
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.3, including:
TODO: update at the time of the release.