DOC Fix dropdown-related warnings #27418

Merged 3 commits on Sep 20, 2023
120 changes: 65 additions & 55 deletions doc/modules/compose.rst
@@ -54,9 +54,8 @@ The last estimator may be any type (transformer, classifier, etc.).
Usage
-----

Build a pipeline
................

The :class:`Pipeline` is built using a list of ``(key, value)`` pairs, where
the ``key`` is a string containing the name you want to give this step and ``value``
@@ -70,6 +69,10 @@ is an estimator object::
>>> pipe
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])

|details-start|
**Shorthand version using :func:`make_pipeline`**
|details-split|

The utility function :func:`make_pipeline` is a shorthand
for constructing pipelines;
it takes a variable number of estimators and returns a pipeline,
@@ -81,14 +84,26 @@ filling in the names automatically::

|details-end|
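As a runnable sketch of the shorthand described above (the step names are filled in automatically from the lowercased estimator class names):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# make_pipeline derives each step name from its estimator's class name
pipe = make_pipeline(PCA(), SVC())
print([name for name, _ in pipe.steps])  # ['pca', 'svc']
```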

Access pipeline steps
.....................

The estimators of a pipeline are stored as a list in the ``steps`` attribute.
A sub-pipeline can be extracted using the slicing notation commonly used
for Python sequences such as lists or strings (although only a step of 1 is
permitted). This is convenient for performing only some of the transformations
(or their inverse):

>>> pipe[:1]
Pipeline(steps=[('reduce_dim', PCA())])
>>> pipe[-1:]
Pipeline(steps=[('clf', SVC())])

|details-start|
**Accessing a step by name or position**
|details-split|


A specific step can also be accessed by index or name by indexing (with ``[idx]``) the
pipeline::

>>> pipe.steps[0]
('reduce_dim', PCA())
@@ -97,36 +112,61 @@ Pipeline::
>>> pipe['reduce_dim']
PCA()

`Pipeline`'s `named_steps` attribute allows accessing steps by name with tab
completion in interactive environments::

>>> pipe.named_steps.reduce_dim is pipe['reduce_dim']
True

|details-end|

Tracking feature names in a pipeline
....................................

To enable model inspection, :class:`~sklearn.pipeline.Pipeline` has a
``get_feature_names_out()`` method, just like all transformers. You can use
pipeline slicing to get the feature names going into each step::

>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectKBest
>>> iris = load_iris()
>>> pipe = Pipeline(steps=[
... ('select', SelectKBest(k=2)),
... ('clf', LogisticRegression())])
>>> pipe.fit(iris.data, iris.target)
Pipeline(steps=[('select', SelectKBest(...)), ('clf', LogisticRegression(...))])
>>> pipe[:-1].get_feature_names_out()
array(['x2', 'x3'], ...)

|details-start|
**Customize feature names**
|details-split|

You can also provide custom feature names for the input data using
``get_feature_names_out``::

>>> pipe[:-1].get_feature_names_out(iris.feature_names)
array(['petal length (cm)', 'petal width (cm)'], ...)

|details-end|

.. _pipeline_nested_parameters:

Access to nested parameters
...........................

It is common to adjust the parameters of an estimator within a pipeline. Such a
parameter is nested because it belongs to a particular sub-step. Parameters of
the estimators in the pipeline are accessible using the
``<estimator>__<parameter>`` syntax::

>>> pipe.set_params(clf__C=10)
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC(C=10))])
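The same double-underscore path works for reading a nested parameter back via ``get_params`` (a minimal sketch):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])
pipe.set_params(clf__C=10)

# get_params flattens nested parameters under the same <step>__<param> keys
print(pipe.get_params()["clf__C"])  # 10
```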

|details-start|
**When does it matter?**
|details-split|

This is particularly important for doing grid searches::

>>> from sklearn.model_selection import GridSearchCV
@@ -143,36 +183,11 @@ ignored by setting them to ``'passthrough'``::
... clf__C=[0.1, 10, 100])
>>> grid_search = GridSearchCV(pipe, param_grid=param_grid)
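Putting it together, fitting the grid search tunes the nested parameters end to end (a sketch on the iris dataset; the grid is shrunk here to keep the run fast):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])
param_grid = dict(reduce_dim__n_components=[2, 3], clf__C=[0.1, 10])
grid_search = GridSearchCV(pipe, param_grid=param_grid)
grid_search.fit(X, y)

# best_params_ reports the winning values under the nested keys
print(sorted(grid_search.best_params_))  # ['clf__C', 'reduce_dim__n_components']
```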


.. topic:: See Also:

* :ref:`composite_grid_search`
|details-end|

.. topic:: Examples:

@@ -184,11 +199,6 @@ You can also provide custom feature names for the input data using
* :ref:`sphx_glr_auto_examples_compose_plot_compare_reduction.py`
* :ref:`sphx_glr_auto_examples_miscellaneous_plot_pipeline_display.py`


.. _pipeline_cache:

9 changes: 3 additions & 6 deletions doc/modules/feature_extraction.rst
@@ -225,7 +225,7 @@ it is advisable to use a power of two as the ``n_features`` parameter;
otherwise the features will not be mapped evenly to the columns.

.. topic:: References:

* `MurmurHash3 <https://github.com/aappleby/smhasher>`_.

|details-end|
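For example, with :class:`FeatureHasher` (a sketch; ``n_features=2**10`` follows the power-of-two advice above):

```python
from sklearn.feature_extraction import FeatureHasher

# A power-of-two n_features lets hashed indices map evenly onto the columns
hasher = FeatureHasher(n_features=2**10, input_type="string")
X = hasher.transform([["dog", "cat", "dog"], ["bird"]])
print(X.shape)  # (2, 1024)
```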
@@ -398,9 +398,8 @@ last document::

.. _stop_words:

Using stop words
----------------

Stop words are words like "and", "the", "him", which are presumed to be
uninformative in representing the content of a text, and which may be
@@ -431,8 +430,6 @@ identify and warn about some kinds of inconsistencies.
In *Proc. Workshop for NLP Open Source Software*.

.. _tfidf:

Tf–idf term weighting