
Commit

DOC Improve visibility of warning message on example "Pitfalls in the interpretation of coefficients of linear models" (#25441)
ArturoAmorQ authored and adrinjalali committed Jan 24, 2023
1 parent 2b3f385 commit e9f9da9
Showing 2 changed files with 26 additions and 27 deletions.
examples/inspection/plot_causal_interpretation.py (5 changes: 3 additions & 2 deletions)
@@ -124,8 +124,7 @@
 ax = coef.plot.barh()
 ax.set_xlabel("Coefficient values")
 ax.set_title("Coefficients of the linear regression including the ability features")
-plt.tight_layout()
-plt.show()
+_ = plt.tight_layout()

 # %%
 # Income prediction with partial observations
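Both hunks in this file adjust the same plotting pattern: a horizontal bar chart of fitted coefficients, with the last Matplotlib call assigned to `_` so that its return value does not show up in the rendered gallery output. A minimal, self-contained sketch of that pattern follows; the data, model, and construction of `coef` are illustrative assumptions, not the example's actual code.

# Illustrative sketch only: bar chart of linear-regression coefficients.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = pd.DataFrame(
    rng.normal(size=(200, 3)), columns=["experience", "education", "age"]
)
y = X.to_numpy() @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=200)

model = LinearRegression().fit(X, y)
coef = pd.Series(model.coef_, index=X.columns)

ax = coef.plot.barh()
ax.set_xlabel("Coefficient values")
_ = ax.set_title("Coefficients of a toy linear regression")
# Assigning to `_` keeps the repr of the last call out of the rendered output;
# sphinx-gallery captures the open figure without an explicit plt.show().
_ = plt.tight_layout()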
@@ -158,6 +157,8 @@
 ax = coef.plot.barh()
 ax.set_xlabel("Coefficient values")
 _ = ax.set_title("Coefficients of the linear regression excluding the ability feature")
+plt.tight_layout()
+plt.show()

 # %%
 # To compensate for the omitted variable, the model inflates the coefficient of
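The truncated comment above ("the model inflates the coefficient of ...") refers to omitted-variable bias, which this example studies with a simulated "ability" confounder. A minimal sketch of the effect, with names and coefficients chosen purely for illustration rather than taken from the example:

# Illustrative sketch of omitted-variable bias: dropping a confounder
# ("ability") inflates the coefficient of a correlated observed feature.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n = 10_000
ability = rng.normal(size=n)
education = 2 * ability + rng.normal(size=n)  # education correlates with ability
wage = 3 * education + 5 * ability + rng.normal(size=n)

X_full = pd.DataFrame({"education": education, "ability": ability})
X_partial = X_full[["education"]]

print(LinearRegression().fit(X_full, wage).coef_)     # education coefficient near 3
print(LinearRegression().fit(X_partial, wage).coef_)  # inflated to roughly 5

Because education and ability are correlated, the partial model credits education with part of ability's effect, which is exactly the inflation the comment describes.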
examples/inspection/plot_linear_model_coefficient_interpretation.py (48 changes: 23 additions & 25 deletions)
@@ -3,25 +3,35 @@
 Common pitfalls in the interpretation of coefficients of linear models
 ======================================================================

-In linear models, the target value is modeled as
-a linear combination of the features (see the :ref:`linear_model` User Guide
-section for a description of a set of linear models available in
-scikit-learn).
-
-Coefficients in multiple linear models represent the relationship between the
-given feature, :math:`X_i` and the target, :math:`y`, assuming that all the
-other features remain constant (`conditional dependence
-<https://en.wikipedia.org/wiki/Conditional_dependence>`_).
-
-This is different from plotting :math:`X_i` versus :math:`y` and fitting a
-linear relationship: in that case all possible values of the other features are
-taken into account in the estimation (marginal dependence).
+In linear models, the target value is modeled as a linear combination of the
+features (see the :ref:`linear_model` User Guide section for a description of a
+set of linear models available in scikit-learn). Coefficients in multiple linear
+models represent the relationship between the given feature, :math:`X_i` and the
+target, :math:`y`, assuming that all the other features remain constant
+(`conditional dependence
+<https://en.wikipedia.org/wiki/Conditional_dependence>`_). This is different
+from plotting :math:`X_i` versus :math:`y` and fitting a linear relationship: in
+that case all possible values of the other features are taken into account in
+the estimation (marginal dependence).

 This example will provide some hints in interpreting coefficient in linear
 models, pointing at problems that arise when either the linear model is not
 appropriate to describe the dataset, or when features are correlated.

+.. note::
+
+    Keep in mind that the features :math:`X` and the outcome :math:`y` are in
+    general the result of a data generating process that is unknown to us.
+    Machine learning models are trained to approximate the unobserved
+    mathematical function that links :math:`X` to :math:`y` from sample data. As
+    a result, any interpretation made about a model may not necessarily
+    generalize to the true data generating process. This is especially true when
+    the model is of bad quality or when the sample data is not representative of
+    the population.
+
-We will use data from the `"Current Population Survey"
-<https://www.openml.org/d/534>`_ from 1985 to predict
-wage as a function of various features such as experience, age, or education.
+We will use data from the `"Current Population Survey"
+<https://www.openml.org/d/534>`_ from 1985 to predict wage as a function of
+various features such as experience, age, or education.

 .. contents::
    :local:
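To make the conditional-versus-marginal distinction in the rewritten docstring concrete, here is a small sketch with coefficients and correlation chosen only for illustration: the coefficient of `x1` in a multiple regression holds `x2` fixed, whereas the slope of a univariate fit on `x1` also absorbs what `x2` contributes through their correlation.

# Illustrative sketch: conditional vs. marginal dependence.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # x2 is correlated with x1
y = 1.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

conditional = LinearRegression().fit(np.column_stack([x1, x2]), y)
marginal = LinearRegression().fit(x1.reshape(-1, 1), y)

print(conditional.coef_)  # close to [1.0, 2.0]
print(marginal.coef_)     # close to [2.6], that is 1.0 + 2.0 * 0.8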
@@ -729,18 +739,6 @@
 # See the :ref:`sphx_glr_auto_examples_inspection_plot_causal_interpretation.py`
 # for a simulated case of ability OVB.
 #
-# Warning: data and model quality
-# -------------------------------
-#
-# Keep in mind that the outcome `y` and features `X` are the product
-# of a data generating process that is hidden from us. Machine
-# learning models are trained to approximate the unobserved
-# mathematical function that links `X` to `y` from sample data. As a
-# result, any interpretation made about a model may not necessarily
-# generalize to the true data generating process. This is especially
-# true when the model is of bad quality or when the sample data is
-# not representative of the population.
-#
 # Lessons learned
 # ---------------
 #
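The removed "Warning: data and model quality" section survives as the note added near the top of the docstring. Its practical upshot is to check predictive quality before reading meaning into coefficients; one common way to do that is sketched below on synthetic data (an assumed setup, not code from the example).

# Illustrative sketch: check cross-validated predictive quality before
# interpreting the coefficients of a fitted linear model.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f} +/- {scores.std():.2f}")
# A low or unstable score warns that coefficient-based interpretations are
# unlikely to generalize to the true data generating process.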
