
[Bug]: constant valued line plot has strange/unexpected y-axis #28229

Closed
fkiraly opened this issue May 15, 2024 · 14 comments

Comments

@fkiraly

fkiraly commented May 15, 2024

Bug summary

When plotting a line plot with near-constant values, the y axis has a very small range when compared with the magnitude of the actual values.

A very similar issue was reported in #5657, which was closed as resolved - but it might not be resolved, given the example below?

Code for reproduction

from matplotlib.pyplot import plot
import numpy as np

x = np.linspace(-0.49, 0.49, 10)
y = [
    0.9999999999999997,
    1.0,
    1.0000000000000002,
    1.0000000000000002,
    1.0,
    0.9999999999999999,
    1.0,
    0.9999999999999996,
    0.999999999999999,
    1.0000000000000036,
]

plot(x, y)

Actual outcome

[image: resulting plot with a very small y-axis range]

Expected outcome

Similar to the following, with a y range from 0.94 to 1.06:

from matplotlib.pyplot import plot
import numpy as np

x = np.linspace(-0.49, 0.49, 10)
y = np.repeat(1, 10)

plot(x, y)

[image: resulting plot with y range from 0.94 to 1.06]

Additional information

No response

Operating system

No response

Matplotlib Version

3.8.0

Matplotlib Backend

No response

Python version

3.11

Jupyter version

No response

Installation

pip

@tacaswell
Member

See https://matplotlib.org/stable/gallery/ticks/scalarformatter.html#sphx-glr-gallery-ticks-scalarformatter-py and https://matplotlib.org/stable/api/ticker_api.html#matplotlib.ticker.ScalarFormatter.set_useOffset , https://matplotlib.org/stable/api/ticker_api.html#matplotlib.ticker.ScalarFormatter.set_powerlimits

This is expected behavior to try and make the numbers on the axis readable. In this case you can turn off both behaviors via

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(layout='constrained')

x = np.linspace(-0.49, 0.49, 10)
y = [
    0.9999999999999997,
    1.0,
    1.0000000000000002,
    1.0000000000000002,
    1.0,
    0.9999999999999999,
    1.0,
    0.9999999999999996,
    0.999999999999999,
    1.0000000000000036,
]
ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)
ax.plot(x, y)
plt.show()

but it is unreadable in its own way (and if you do not use constrained or tight layout, the y tick labels will be clipped):

[image: resulting plot with full-precision y tick labels]

I think this should be closed with no action because this is a particularly pathological case. Both behaviors make sense in one context or the other, but there is no clear winner, and the current behavior has been incumbent for most of the lifetime of the library (I suspect https://matplotlib.org/stable/users/prev_whats_new/whats_new_2.0.0.html#improved-offset-text-choice is the change that changed the behavior of #5657), so it is unlikely that we are going to change it.
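
For reference, the two formatter settings above can equivalently be applied in a single call via Axes.ticklabel_format; a minimal sketch, with illustrative near-constant data:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(layout='constrained')
ax.plot(np.linspace(-0.49, 0.49, 10), 1 + 1e-15 * np.arange(10))
# style='plain' disables scientific notation, useOffset=False disables the offset text
ax.ticklabel_format(axis='y', style='plain', useOffset=False)
plt.show()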

@fkiraly If you disagree please comment and we can re-open.

@tacaswell closed this as not planned on May 15, 2024
@fkiraly
Author

fkiraly commented May 15, 2024

I see, thanks for the explanation, @tacaswell.

Even with your explanation, I disagree for now that there is no bug or issue, because of the inconsistency in behaviour between the two code examples in "actual" and "expected". Why:

  • the "expected" plot is what you get if all the numbers are exactly 1.0 (float or int). This has range 0.94 to 1.06.
  • the "actual" differs from 1.0 only by minuscule amounts, yet leads to significantly different behaviour.

Given your explanation, I would expect the example in "expected" also to behave like "actual" - i.e., the bug lies in the inconsistency, and the plots in expected and actual should be switched.

In particular, given your explanation, it seems the code snippet in "expected" is the buggy one, then.

@tacaswell
Member

The behavior is dependent on the range of the data (a short sketch demonstrating the three regimes follows this list):

  • if the variation is "big" things just work (case not shown)
  • if the variation is "small", then we get the "actual" behaviour (where the formatters do weird things); in this case the variation is probably small enough that it is triggering the non-singular code paths, which may be the actual bug here
  • if the variation is "zero", then any hope of trying to guess good limits correctly is out the window and we fall back to a 10% range (+/-5%) around the only value we have (as it gives ticks that look "reasonable" around your value)
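
A minimal sketch demonstrating the three regimes (the data values are made up for illustration; the printed limits show how the autoscaled y range changes):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-0.49, 0.49, 10)
cases = {
    "big": 1 + 0.1 * np.sin(np.arange(10)),  # clearly visible variation
    "small": 1 + 1e-15 * np.arange(10),      # variation near float precision
    "zero": np.ones(10),                     # exactly constant
}

fig, axs = plt.subplots(ncols=3, layout='constrained')
for ax, (name, y) in zip(axs, cases.items()):
    ax.plot(x, y)
    ax.set_title(name)
    print(name, ax.get_ylim())  # inspect the autoscaled y limits in each regime
plt.show()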

Explicitly setting the ylim to ax.set_ylim(1-1e-14, 1+1e-14) looks better at first glance, but there are some issues, e.g. the spines do not actually meet at the corners. I suspect our range expansion logic in the auto-limit code is trying to stay in a regime where we can actually do math across the full range of the axis limits.

The "actual" data is all down near the limits of float precision (it looks like all but the last two values are within +/- 2 represent-able float values of 1). Maybe in some cases we would want to treat a range of < 100 eps as "all the same", but that would not really solve this problem just move it to a different threshold. It also does not make sense to auto-limit singular values to +/- the smallest range we can represent. Hence I think we are stuck in a bind where there is this abrupt change, but it is the least bad option (as "all being the same" and "all not being the same" is a discontinuous change).

Plotting ax.plot(x, 1-y) works much better (I suspect we are getting into sub-normals so there is more space to do math?).
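
A minimal sketch of that workaround, using the data from the issue (with y converted to an array so the subtraction broadcasts):

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-0.49, 0.49, 10)
y = np.array([
    0.9999999999999997, 1.0, 1.0000000000000002, 1.0000000000000002, 1.0,
    0.9999999999999999, 1.0, 0.9999999999999996, 0.999999999999999,
    1.0000000000000036,
])

fig, ax = plt.subplots(layout='constrained')
ax.plot(x, 1 - y)  # plot the deviation from 1 instead of the raw values
ax.set_ylabel('1 - y')
plt.show()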


I'm still skeptical that we can do anything differently here, am I missing something @fkiraly ?

@tacaswell reopened this on May 15, 2024
@jklymak
Member

jklymak commented May 15, 2024

We cannot start making assumptions about user data, so it is not clear to me what heuristic would give us the "expected" version without wiping out the plot for someone who was looking at a very small signal centered on 1.0. I think we are erring on the side of giving a usable plot, and if the user thinks it is wrong they can either normalize their data or set the limits manually.

@fkiraly
Author

fkiraly commented May 16, 2024

In my opinion, the "discontinuous case distinction" is the root of evil here and should be considered a bug.

That is, it is possible that there are only minor changes to the data, yet the display "jumps" between the conventions of the "small" and "zero" cases. This is impossible for the user to control, because it is not properly documented when one or the other default applies, and it is impossible to "trust" the visual output in some corner cases, as it may jump back and forth between the two conventions.

The "direct" solution is making the case treatment continuous, or constant even - as a subcase. What I mean with this: small changes in the data should never lead to big changes in the display setup, only gradual/small changes.

Examples of continuous case treatment:

  • "all case zero" - in "zero" and "small" case, always display like "zero" case currently
  • "all case small" - in "zero" and "small" case, always display like "small" case currently
  • "interpolate" - there are multiple options here. For instance, you could do as follows. I assume you estimate a variation range by max/min or robustified versions thereof, e.g., outlier cutting. So, you could do: if variation range < 5%, use the "zero" convention. If variation range >= 5%, use the "large" convention.
    • in this solution, if you mentally imagine any curve with growing amplitude, the display y range will start constant at plusminus 5%, and expand continuously and linearly, once the amplitude reaches the threshold value.
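
A rough, hypothetical sketch of that heuristic as a standalone function (this implements the proposal above, not matplotlib's actual autoscaling; the function name, threshold, and margin values are illustrative assumptions):

import numpy as np

def proposed_ylim(y, threshold=0.05, margin=0.05):
    """Hypothetical limit heuristic sketching the proposal above."""
    y = np.asarray(y, dtype=float)
    lo, hi = y.min(), y.max()
    center = 0.5 * (lo + hi)
    span = hi - lo
    # relative variation; fall back to the absolute span if the center is zero
    rel = span / abs(center) if center != 0 else span
    if rel < threshold:
        # "zero" convention: a fixed +/-5% window around the center
        pad = margin * (abs(center) if center != 0 else 1.0)
        return center - pad, center + pad
    # "large" convention: the data range plus standard 5% margins
    return lo - margin * span, hi + margin * span

# e.g. proposed_ylim([1.0] * 10) -> (0.95, 1.05), i.e. the "zero" convention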

@fkiraly
Author

fkiraly commented May 16, 2024

If I may also ask a small question, more "helpdesk" style - this may be a workaround in my use case, but would not resolve the more general issue.

What is the most idiomatic way to plot a non-negative graph in a way that ensures that 0 is always the bottom of the y-axis, while the top is left free to scale with standard axis range heuristics?

@timhoffm
Member

What is the most idiomatic way to plot a non-negative graph in a way that ensures that 0 is always the bottom of the y-axis, while the top is left free to scale with standard axis range heuristics?

ax.set_ylim(0, None)
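
A minimal usage sketch (the data is made up for illustration):

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 10, 100)
ax.plot(x, np.abs(np.sin(x)))  # some non-negative data (illustrative)
ax.set_ylim(0, None)           # pin the bottom at 0; None keeps the autoscaled top
plt.show()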

@fkiraly
Author

fkiraly commented May 16, 2024

Well, that's very convenient. I did not know you could pass None.

@jklymak
Member

jklymak commented May 16, 2024

ax.set_ylim(bottom=0) is also acceptable.

That is, it is possible that there are only minor changes to the data, and the display "jumps" between display conventions in the "small" and "zero" cases.

I don't think people run into the two cases in practice. Either your data is enforced to be constant, or your data has numerical noise and/or a small signal.

Feel free to suggest an algorithm, but I strongly suspect it is not possible to both make something that is always pleasing and also doesn't hide some types of signals. Matplotlib strives to err on the side of showing all the data and the signal versus aesthetics, and we assume users can adjust the x and y limits to tell their own story. For instance I just came back from an intensive field course where I made many plots. I think all of them had manual adjustment to the limits.

@fkiraly
Author

fkiraly commented May 16, 2024

I don't think people run into the two cases in practice. Either your data is enforced to be constant, or your data has numerical noise and/or a small signal.

Well, with a package as widely used as matplotlib, this claim is imho a bit dodgy. More positively phrased: with a wide enough user base, every public feature will be relied on, and that also extends to combinations of features such as the above.

Specifically, I have a use case where it is not "either/or": I am producing panel plots which may contain cases of either type, or sequences of plots that transition gradually from the "constant" case over "near-constant with minimal noise" to "varies a lot".

This is a common occurrence, as I am dealing with probability distributions, where constant means uniform - a common case, just as non-constant or almost-constant is.

Feel free to suggest an algorithm, but I strongly suspect it is not possible to both make something that is always pleasing and also doesn't hide some types of signals

My suggestion is as above, and it is based on the opinion that the plot I see for the near-constant signal is hard to read and borders on uninformative. There is no signal that is "revealed", it's just funny axis behaviour.

My preferred behaviour would be to always have

ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)

as @tacaswell suggested. I can live with this for my use case - but why should this not always be the default?

@timhoffm
Member

timhoffm commented May 17, 2024

The "direct" solution is making the case treatment continuous,

  • "interpolate" - there are multiple options here. For instance, you could do as follows. I assume you estimate a variation range by max/min or robustified versions thereof, e.g., outlier cutting. So, you could do: if variation range < 5%, use the "zero" convention. If variation range >= 5%, use the "large" convention.

    • in this solution, if you mentally imagine any curve with growing amplitude, the display y range will start constant at plusminus 5%, and expand continuously and linearly, once the amplitude reaches the threshold value.

While you could do this, it's not feasible in practice: this would mean that any data with <5% variation would be plotted with a 5% margin, which means that as soon as your data has ~<0.05% variation, it appears as a constant line on the chosen scale.

For any reasonably sized data (*), we want to show it scaled to the maximal available space. Given that, there is no way to make the transition to a 5% scale in the all-zero case continuous. Changing the 5% scale is not an option either, for backward-compatibility reasons (and it would also not be very helpful to show constant data on an eps-scale).

(*) The limit is somewhere around a scale of ~1e-12. For smaller values, you get discretization artefacts, which we've chosen not to draw by default: you divide the scale into ~1000 px visually, so the data distance of one pixel at a 1e-12 scale is ~1e-15, which is only slightly above the double-precision ulp. But irrespective of whether we limit scaling here or not, you don't get continuity from any small scale to the 5% scale.
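
For illustration, a quick check of the footnote's arithmetic (the ~1000 px figure is the footnote's own ballpark assumption):

import numpy as np

scale = 1e-12               # axis span from the footnote
pixels = 1000               # assumed visual resolution
per_pixel = scale / pixels  # data distance covered by one pixel
ulp = np.finfo(float).eps   # double-precision ulp near 1 (~2.22e-16)

print(per_pixel)        # 1e-15
print(ulp)              # ~2.22e-16
print(per_pixel / ulp)  # one pixel spans only a handful of ulps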

@fkiraly
Author

fkiraly commented May 17, 2024

While you could do this, it's not feasible in practice: this would mean that any data with <5% variation would be plotted with a 5% margin, which means that as soon as your data has ~<0.05% variation, it appears as a constant line on the chosen scale.

I agree, that's why I think this is "weaker" than my preferred solution, as stated above:

My preferred behaviour would be to always have

ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)

Assuming I understand the effect correctly, see the plot in @tacaswell's post here:
#28229 (comment)

This will lead to a margin that is proportional to the variation for small variation, without centering the y axis at zero by subtracting the average.

@tacaswell
Member

ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)

only affects the formatting of the tick labels, not what the limits are.
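
A minimal sketch illustrating this point (the near-constant data is made up; the printed limits come out identical with and without the formatter settings):

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-0.49, 0.49, 10)
y = 1 + 1e-15 * np.arange(10)  # illustrative near-constant data

fig, (ax1, ax2) = plt.subplots(ncols=2, layout='constrained')
ax1.plot(x, y)

ax2.yaxis.get_major_formatter().set_scientific(False)
ax2.yaxis.get_major_formatter().set_useOffset(False)
ax2.plot(x, y)

# Same autoscaled limits on both axes; only the tick label text differs.
print(ax1.get_ylim())
print(ax2.get_ylim())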

@tacaswell
Member

tacaswell commented May 28, 2024

I am going to close this again, as the consensus is that we cannot do anything to avoid this discrete change (either the data is singular or it is not, or we treat it as singular or not), and the default behavior of ensuring relatively short labels is both defensible and long-standing.

This is also only an issue with autoscaling, where we try to infer behavior from a very limited set of information; if the user explicitly gives us any bounds, we always respect those. To some level, autoscaling is always best effort, and it will fail in some cases where there is reasonable ambiguity about what "right" is, but we have to do something and cannot avoid guessing.

[edited because I hit post too soon]

@timhoffm closed this as not planned on May 28, 2024