
Commit

DOC Improve visibility of warning message on example "Pitfalls in the interpretation of coefficients of linear models" (#25441)
ArturoAmorQ authored and adrinjalali committed Jan 24, 2023
1 parent 2b3f385 commit e9f9da9
Showing 2 changed files with 26 additions and 27 deletions.
examples/inspection/plot_causal_interpretation.py (5 changes: 3 additions & 2 deletions)
@@ -124,8 +124,7 @@
 ax = coef.plot.barh()
 ax.set_xlabel("Coefficient values")
 ax.set_title("Coefficients of the linear regression including the ability features")
-plt.tight_layout()
-plt.show()
+_ = plt.tight_layout()

 # %%
 # Income prediction with partial observations
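Both hunks in this file adjust the same plotting pattern: a horizontal bar chart of fitted coefficients, with the last Matplotlib call assigned to `_` so that its return value does not show up in the rendered gallery output. A minimal, self-contained sketch of that pattern follows; the data, model, and construction of `coef` are illustrative assumptions, not the example's actual code.

# Illustrative sketch only: bar chart of linear-regression coefficients.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = pd.DataFrame(
    rng.normal(size=(200, 3)), columns=["experience", "education", "age"]
)
y = X.to_numpy() @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=200)

model = LinearRegression().fit(X, y)
coef = pd.Series(model.coef_, index=X.columns)

ax = coef.plot.barh()
ax.set_xlabel("Coefficient values")
_ = ax.set_title("Coefficients of a toy linear regression")
# Assigning to `_` keeps the repr of the last call out of the rendered output;
# sphinx-gallery captures the open figure without an explicit plt.show().
_ = plt.tight_layout()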
@@ -158,6 +157,8 @@
 ax = coef.plot.barh()
 ax.set_xlabel("Coefficient values")
 _ = ax.set_title("Coefficients of the linear regression excluding the ability feature")
+plt.tight_layout()
+plt.show()

 # %%
 # To compensate for the omitted variable, the model inflates the coefficient of
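The truncated comment above ("the model inflates the coefficient of ...") refers to omitted-variable bias, which this example studies with a simulated "ability" confounder. A minimal sketch of the effect, with names and coefficients chosen purely for illustration rather than taken from the example:

# Illustrative sketch of omitted-variable bias: dropping a confounder
# ("ability") inflates the coefficient of a correlated observed feature.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n = 10_000
ability = rng.normal(size=n)
education = 2 * ability + rng.normal(size=n)  # education correlates with ability
wage = 3 * education + 5 * ability + rng.normal(size=n)

X_full = pd.DataFrame({"education": education, "ability": ability})
X_partial = X_full[["education"]]

print(LinearRegression().fit(X_full, wage).coef_)     # education coefficient near 3
print(LinearRegression().fit(X_partial, wage).coef_)  # inflated to roughly 5

Because education and ability are correlated, the partial model credits education with part of ability's effect, which is exactly the inflation the comment describes.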
examples/inspection/plot_linear_model_coefficient_interpretation.py (48 changes: 23 additions & 25 deletions)
@@ -3,25 +3,35 @@
 Common pitfalls in the interpretation of coefficients of linear models
 ======================================================================

-In linear models, the target value is modeled as
-a linear combination of the features (see the :ref:`linear_model` User Guide
-section for a description of a set of linear models available in
-scikit-learn).
-
-Coefficients in multiple linear models represent the relationship between the
-given feature, :math:`X_i` and the target, :math:`y`, assuming that all the
-other features remain constant (`conditional dependence
-<https://en.wikipedia.org/wiki/Conditional_dependence>`_).
-
-This is different from plotting :math:`X_i` versus :math:`y` and fitting a
-linear relationship: in that case all possible values of the other features are
-taken into account in the estimation (marginal dependence).
+In linear models, the target value is modeled as a linear combination of the
+features (see the :ref:`linear_model` User Guide section for a description of a
+set of linear models available in scikit-learn). Coefficients in multiple linear
+models represent the relationship between the given feature, :math:`X_i` and the
+target, :math:`y`, assuming that all the other features remain constant
+(`conditional dependence
+<https://en.wikipedia.org/wiki/Conditional_dependence>`_). This is different
+from plotting :math:`X_i` versus :math:`y` and fitting a linear relationship: in
+that case all possible values of the other features are taken into account in
+the estimation (marginal dependence).

 This example will provide some hints in interpreting coefficient in linear
 models, pointing at problems that arise when either the linear model is not
 appropriate to describe the dataset, or when features are correlated.

+.. note::
+
+    Keep in mind that the features :math:`X` and the outcome :math:`y` are in
+    general the result of a data generating process that is unknown to us.
+    Machine learning models are trained to approximate the unobserved
+    mathematical function that links :math:`X` to :math:`y` from sample data. As
+    a result, any interpretation made about a model may not necessarily
+    generalize to the true data generating process. This is especially true when
+    the model is of bad quality or when the sample data is not representative of
+    the population.
+
-We will use data from the `"Current Population Survey"
-<https://www.openml.org/d/534>`_ from 1985 to predict
-wage as a function of various features such as experience, age, or education.
+We will use data from the `"Current Population Survey"
+<https://www.openml.org/d/534>`_ from 1985 to predict wage as a function of
+various features such as experience, age, or education.

 .. contents::
    :local:
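To make the conditional-versus-marginal distinction in the rewritten docstring concrete, here is a small sketch with coefficients and correlation chosen only for illustration: the coefficient of `x1` in a multiple regression holds `x2` fixed, whereas the slope of a univariate fit on `x1` also absorbs what `x2` contributes through their correlation.

# Illustrative sketch: conditional vs. marginal dependence.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # x2 is correlated with x1
y = 1.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

conditional = LinearRegression().fit(np.column_stack([x1, x2]), y)
marginal = LinearRegression().fit(x1.reshape(-1, 1), y)

print(conditional.coef_)  # close to [1.0, 2.0]
print(marginal.coef_)     # close to [2.6], that is 1.0 + 2.0 * 0.8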
@@ -729,18 +739,6 @@
 # See the :ref:`sphx_glr_auto_examples_inspection_plot_causal_interpretation.py`
 # for a simulated case of ability OVB.
 #
-# Warning: data and model quality
-# -------------------------------
-#
-# Keep in mind that the outcome `y` and features `X` are the product
-# of a data generating process that is hidden from us. Machine
-# learning models are trained to approximate the unobserved
-# mathematical function that links `X` to `y` from sample data. As a
-# result, any interpretation made about a model may not necessarily
-# generalize to the true data generating process. This is especially
-# true when the model is of bad quality or when the sample data is
-# not representative of the population.
-#
 # Lessons learned
 # ---------------
 #
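The removed "Warning: data and model quality" section survives as the note added near the top of the docstring. Its practical upshot is to check predictive quality before reading meaning into coefficients; one common way to do that is sketched below on synthetic data (an assumed setup, not code from the example).

# Illustrative sketch: check cross-validated predictive quality before
# interpreting the coefficients of a fitted linear model.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f} +/- {scores.std():.2f}")
# A low or unstable score warns that coefficient-based interpretations are
# unlikely to generalize to the true data generating process.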
