sklearn
In Development
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

- linear_model.LogisticRegression and linear_model.LogisticRegressionCV now have much better convergence for the solvers "lbfgs" and "newton-cg". Both solvers can now reach much higher precision for the coefficients, depending on the specified tol. Additionally, lbfgs can make better use of tol, i.e., stop sooner or reach higher precision. 26721 by Christian Lorentzen <lorentzenchr>.

  Note: lbfgs is the default solver, so this change might affect many models. It also means that with this new version of scikit-learn, the resulting coefficients coef_ and intercept_ of your models will change for these two solvers (when fit on the same data again). The amount of change depends on the specified tol; for small values you will get more precise results.

- All estimators now recognize the column names from any dataframe that adopts the DataFrame Interchange Protocol. Dataframes that return a correct representation through np.asarray(df) are expected to work with our estimators and functions. 26464 by Thomas Fan.

- Fixed a bug in most estimators and functions where setting a parameter to a large integer would cause a TypeError. 26648 by Naoise Holohan <naoise-h>.
The following models now support metadata routing in one or more of their methods. Refer to the Metadata Routing User Guide <metadata_routing> for more details.

- LarsCV and LassoLarsCV now support metadata routing in their fit method and route metadata to the CV splitter. 27538 by Omar Salman <OmarManzoor>.
- multiclass.OneVsRestClassifier, multiclass.OneVsOneClassifier and multiclass.OutputCodeClassifier now support metadata routing in their fit and partial_fit, and route metadata to the underlying estimator's fit and partial_fit. 27308 by Stefanie Senger <StefanieSenger>.
- pipeline.Pipeline now supports metadata routing according to the metadata routing user guide <metadata_routing>. 26789 by Adrin Jalali.
- ~model_selection.cross_validate, ~model_selection.cross_val_score, and ~model_selection.cross_val_predict now support metadata routing. The metadata are routed to the estimator's fit, the scorer, and the CV splitter's split. The metadata is accepted via the new params parameter. fit_params is deprecated and will be removed in version 1.6. The groups parameter is also no longer accepted as a separate argument when metadata routing is enabled and should be passed via the params parameter. 26896 by Adrin Jalali.
- ~model_selection.GridSearchCV, ~model_selection.RandomizedSearchCV, ~model_selection.HalvingGridSearchCV, and ~model_selection.HalvingRandomSearchCV now support metadata routing in their fit and score, and route metadata to the underlying estimator's fit, the CV splitter, and the scorer. 27058 by Adrin Jalali.
- ~compose.ColumnTransformer now supports metadata routing according to the metadata routing user guide <metadata_routing>. 27005 by Adrin Jalali.
- linear_model.LogisticRegressionCV now supports metadata routing. linear_model.LogisticRegressionCV.fit now accepts **params, which are passed to the underlying splitter and scorer. linear_model.LogisticRegressionCV.score now accepts **score_params, which are passed to the underlying scorer. 26525 by Omar Salman <OmarManzoor>.
- linear_model.OrthogonalMatchingPursuitCV now supports metadata routing. Its fit now accepts **fit_params, which are passed to the underlying splitter. 27500 by Stefanie Senger <StefanieSenger>.
- All meta-estimators for which metadata routing is not yet implemented now raise a NotImplementedError on get_metadata_routing, and on fit if metadata routing is enabled and any metadata is passed to them. 27389 by Adrin Jalali.
- ElasticNetCV, LassoCV, MultiTaskElasticNetCV and MultiTaskLassoCV now support metadata routing and route metadata to the CV splitter. 27478 by Omar Salman <OmarManzoor>.
Several estimators now support SciPy sparse arrays. The following functions and classes are impacted:

Functions:

- cluster.compute_optics_graph in 27250 by Yao Xiao <Charlie-XIAO>;
- cluster.kmeans_plusplus in 27179 by Nurseit Kamchyev <Bncer>;
- decomposition.non_negative_factorization in 27100 by Isaac Virshup <ivirshup>;
- feature_selection.f_regression in 27239 by Yaroslav Korobko <Tialo>;
- feature_selection.r_regression in 27239 by Yaroslav Korobko <Tialo>;
- manifold.trustworthiness in 27250 by Yao Xiao <Charlie-XIAO>;
- metrics.pairwise_distances in 27250 by Yao Xiao <Charlie-XIAO>;
- metrics.pairwise_distances_chunked in 27250 by Yao Xiao <Charlie-XIAO>;
- metrics.pairwise.pairwise_kernels in 27250 by Yao Xiao <Charlie-XIAO>;
- sklearn.utils.multiclass.type_of_target in 27274 by Yao Xiao <Charlie-XIAO>.

Classes:

- cluster.HDBSCAN in 27250 by Yao Xiao <Charlie-XIAO>;
- cluster.KMeans in 27179 by Nurseit Kamchyev <Bncer>;
- cluster.MiniBatchKMeans in 27179 by Nurseit Kamchyev <Bncer>;
- cluster.OPTICS in 27250 by Yao Xiao <Charlie-XIAO>;
- decomposition.NMF in 27100 by Isaac Virshup <ivirshup>;
- decomposition.MiniBatchNMF in 27100 by Isaac Virshup <ivirshup>;
- feature_extraction.text.TfidfTransformer in 27219 by Yao Xiao <Charlie-XIAO>;
- manifold.Isomap in 27250 by Yao Xiao <Charlie-XIAO>;
- manifold.TSNE in 27250 by Yao Xiao <Charlie-XIAO>;
- impute.SimpleImputer in 27277 by Yao Xiao <Charlie-XIAO>;
- impute.IterativeImputer in 27277 by Yao Xiao <Charlie-XIAO>;
- impute.KNNImputer in 27277 by Yao Xiao <Charlie-XIAO>;
- kernel_approximation.PolynomialCountSketch in 27301 by Lohit SundaramahaLingam <lohitslohit>;
- neural_network.BernoulliRBM in 27252 by Yao Xiao <Charlie-XIAO>;
- preprocessing.PolynomialFeatures in 27166 by Mohit Joshi <work-mohit>.
- base.ClusterMixin.fit_predict and base.OutlierMixin.fit_predict now accept **kwargs, which are passed to the fit method of the estimator. 26506 by Adrin Jalali.
- base.TransformerMixin.fit_transform and base.OutlierMixin.fit_predict now raise a warning if transform/predict consume metadata but no custom fit_transform/fit_predict is defined in the inheriting class. 26831 by Adrin Jalali.
- base.clone now supports dict as input and creates a copy. 26786 by Adrin Jalali.
- ~utils.metadata_routing.process_routing now has a different signature. The first two arguments (the object and the method) are positional-only, and all metadata are passed as keyword arguments. 26909 by Adrin Jalali.
- The internal objective and gradient of the sigmoid method of calibration.CalibratedClassifierCV have been replaced by the private loss module. 27185 by Omar Salman <OmarManzoor>.
- The kdtree and balltree values of the algorithm parameter of cluster.HDBSCAN are deprecated and renamed to kd_tree and ball_tree respectively, ensuring consistency in naming conventions. The kdtree and balltree values will be removed in 1.6. 26744 by Shreesha Kumar Bhat <Shreesha3112>.
- ~compose.ColumnTransformer now replaces "passthrough" with a corresponding ~preprocessing.FunctionTransformer in the fitted transformers_ attribute. 27204 by Adrin Jalali.
- datasets.make_sparse_spd_matrix now uses a more memory-efficient sparse layout. It also accepts a new keyword sparse_format that allows specifying the output format of the sparse matrix. By default sparse_format=None, which returns a dense numpy ndarray as before. 27438 by Yao Xiao <Charlie-XIAO>.
- An "auto" option was added to the n_components parameter of decomposition.non_negative_factorization, decomposition.NMF and decomposition.MiniBatchNMF to automatically infer the number of components from the shapes of W or H when using a custom initialization. The default value of this parameter will change from None to "auto" in version 1.6. 26634 by Alexandre Landeau <AlexL> and Alexandre Vigny <avigny>.
- decomposition.PCA now supports the Array API for the full and randomized solvers (with QR power iterations). See array_api for more details. 26315 and 27098 by Mateusz Sokół <mtsokol>, Olivier Grisel <ogrisel> and Edoardo Abati <EdAbati>.
- ensemble.RandomForestClassifier and ensemble.RandomForestRegressor support missing values when the criterion is gini, entropy, or log_loss for classification, or squared_error, friedman_mse, or poisson for regression. 26391 by Thomas Fan.
- ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. Missing values in the training data and multi-output targets are not supported. 13649 by Samuel Ronsin <samronsin>, initiated by Patrick O'Reilly <pat-oreilly>.
- ensemble.GradientBoostingClassifier is faster, for binary and in particular for multiclass problems, thanks to the private loss function module. 26278 by Christian Lorentzen <lorentzenchr>.
- Improved runtime and memory usage for ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor when trained on sparse data. 26957 by Thomas Fan.
- In ensemble.AdaBoostClassifier, the algorithm argument SAMME.R was deprecated and will be removed in 1.6. 26830 by Stefanie Senger <StefanieSenger>.
- inspection.DecisionBoundaryDisplay now accepts a parameter class_of_interest to select the class of interest when plotting the response provided by response_method="predict_proba" or response_method="decision_function". This allows plotting the decision boundary for both binary and multiclass classifiers. 27291 by Guillaume Lemaitre <glemaitre>.
- inspection.DecisionBoundaryDisplay now raises an AttributeError instead of a ValueError when an estimator does not implement the requested response method. 27291 by Guillaume Lemaitre <glemaitre>.
- linear_model.LogisticRegression and linear_model.LogisticRegressionCV now have much better convergence for the solvers "lbfgs" and "newton-cg". Both solvers can now reach much higher precision for the coefficients, depending on the specified tol. Additionally, lbfgs can make better use of tol, i.e., stop sooner or reach higher precision. This is accomplished by better scaling of the objective function, i.e., using average per-sample losses instead of the sum of per-sample losses. 26721 by Christian Lorentzen <lorentzenchr>.

  Note: this change also means that with this new version of scikit-learn, the resulting coefficients coef_ and intercept_ of your models will change for these two solvers (when fit on the same data again). The amount of change depends on the specified tol; for small values you will get more precise results.

- linear_model.LogisticRegression and linear_model.LogisticRegressionCV with solver "newton-cg" can now be considerably faster for some data and parameter settings. This is accomplished by a better line-search convergence check for negligible loss improvements that takes gradient information into account. 26721 by Christian Lorentzen <lorentzenchr>.
- The solver "newton-cg" in linear_model.LogisticRegression and linear_model.LogisticRegressionCV now uses a little less memory. The effect is proportional to the number of coefficients (n_features * n_classes). 27417 by Christian Lorentzen <lorentzenchr>.
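A quick sketch of the role of tol described above (this runs on any recent scikit-learn; only the attainable precision changed in this release): a tighter tolerance yields more precise coefficients, a looser one stops earlier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Same model, two stopping tolerances for the lbfgs solver.
coarse = LogisticRegression(solver="lbfgs", tol=1e-2).fit(X, y)
fine = LogisticRegression(solver="lbfgs", tol=1e-10, max_iter=10_000).fit(X, y)

# The coefficients agree only approximately: tol controls the precision.
print(np.max(np.abs(coarse.coef_ - fine.coef_)))
```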
- Computing pairwise distances via metrics.DistanceMetric for CSR × CSR, Dense × CSR, and CSR × Dense datasets is now 1.5x faster. 26765 by Meekail Zain <micky774>.
- Computing distances via metrics.DistanceMetric for CSR × CSR, Dense × CSR, and CSR × Dense now uses ~50% less memory, and outputs distances in the same dtype as the provided data. 27006 by Meekail Zain <micky774>.
- Improved the rendering of the plots obtained with the metrics.PrecisionRecallDisplay and metrics.RocCurveDisplay classes. The x- and y-axis limits are set to [0, 1] and the aspect ratio between the two axes is set to 1, to get a square plot. 26366 by Mojdeh Rastgoo <mrastgoo>.
- Added neg_root_mean_squared_log_error_scorer as a scorer. 26734 by Alejandro Martin Gil <101AlexMartin>.
- sklearn.metrics.accuracy_score and sklearn.metrics.zero_one_loss now support Array API compatible inputs. 27137 by Edoardo Abati <EdAbati>.
- Deprecated needs_threshold and needs_proba from metrics.make_scorer. These parameters will be removed in version 1.6. Instead, use response_method, which accepts "predict", "predict_proba" or "decision_function", or a list of such values. needs_proba=True is equivalent to response_method="predict_proba", and needs_threshold=True is equivalent to response_method=("decision_function", "predict_proba"). 26840 by Guillaume Lemaitre <glemaitre>.
- The squared parameter of metrics.mean_squared_error and metrics.mean_squared_log_error is deprecated and will be removed in 1.6. Use the new functions metrics.root_mean_squared_error and metrics.root_mean_squared_log_error instead. 26734 by Alejandro Martin Gil <101AlexMartin>.
- metrics.make_scorer now raises an error when using a regressor on a scorer requesting a non-thresholded decision function (from decision_function or predict_proba). Such scorers are specific to classification. 26840 by Guillaume Lemaitre <glemaitre>.
- sklearn.model_selection.train_test_split now supports Array API compatible inputs. 26855 by Tim Head.
- model_selection.GridSearchCV, model_selection.RandomizedSearchCV, and model_selection.HalvingGridSearchCV no longer change the given object in the parameter grid if it is an estimator. 26786 by Adrin Jalali.
- sklearn.neighbors.KNeighborsRegressor.predict and sklearn.neighbors.KNeighborsClassifier.predict_proba now efficiently support pairs of dense and sparse datasets. 27018 by Julien Jerphanion <jjerphan>.
- neighbors.KNeighborsRegressor now accepts metrics.DistanceMetric objects directly via the metric keyword argument, allowing for the use of accelerated third-party metrics.DistanceMetric objects. 26267 by Meekail Zain <micky774>.
- The performance of neighbors.RadiusNeighborsClassifier.predict and of neighbors.RadiusNeighborsClassifier.predict_proba has been improved when radius is large and algorithm="brute" with non-Euclidean metrics. 26828 by Omar Salman <OmarManzoor>.
- preprocessing.MinMaxScaler and preprocessing.MaxAbsScaler now support the Array API. Array API support is considered experimental and might evolve without being subject to our usual rolling deprecation cycle policy. See array_api for more details. 26243 by Tim Head and 27110 by Edoardo Abati <EdAbati>.
- preprocessing.OrdinalEncoder avoids calculating missing indices twice to improve efficiency. 27017 by Xuefeng Xu <xuefeng-xu>.
- Improved warnings in preprocessing.FunctionTransformer when func returns a pandas dataframe and the output is configured to be pandas. 26944 by Thomas Fan.
- preprocessing.TargetEncoder now supports target_type 'multiclass'. 26674 by Lucy Liu <lucyleeow>.
- tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. Missing values in the training data and multi-output targets are not supported. 13649 by Samuel Ronsin <samronsin>, initiated by Patrick O'Reilly <pat-oreilly>.
- sklearn.utils.estimator_html_repr dynamically adapts diagram colors based on the browser's prefers-color-scheme, providing improved adaptability to dark mode environments. 26862 by Andrew Goh Yisheng <9y5>, Thomas Fan and Adrin Jalali.
- ~utils.metadata_routing.MetadataRequest and ~utils.metadata_routing.MetadataRouter now have a consumes method, which can be used to check whether a given set of parameters would be consumed. 26831 by Adrin Jalali.
- sklearn.utils.check_array now accepts both matrix and array from the sparse SciPy module. The previous implementation would fail with copy=True because it called np.may_share_memory, which does not work with SciPy sparse arrays and does not return the correct result for SciPy sparse matrices. 27336 by Guillaume Lemaitre <glemaitre>.
- sklearn.extmath.log_logistic is deprecated and will be removed in 1.6. Use -np.logaddexp(0, -x) instead. 27544 by Christian Lorentzen <lorentzenchr>.
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.3, including:
TODO: update at the time of the release.