
[Bug]: constant valued line plot has strange/unexpected y-axis #28229

Closed
fkiraly opened this issue May 15, 2024 · 14 comments

Comments

@fkiraly

fkiraly commented May 15, 2024

Bug summary

When plotting a line plot with near-constant values, the y axis has a very small range when compared with the magnitude of the actual values.

A very similar issue was reported in #5657, which was closed as resolved - but it might not be resolved, given the example below?

Code for reproduction

from matplotlib.pyplot import plot
import numpy as np

x = np.linspace(-0.49, 0.49, 10)
y = [
    0.9999999999999997,
    1.0,
    1.0000000000000002,
    1.0000000000000002,
    1.0,
    0.9999999999999999,
    1.0,
    0.9999999999999996,
    0.999999999999999,
    1.0000000000000036,
]

plot(x, y)

Actual outcome

[image: resulting plot with a very small y-axis range]

Expected outcome

Similar to the following, with a y range from 0.94 to 1.06:

from matplotlib.pyplot import plot
import numpy as np

x = np.linspace(-0.49, 0.49, 10)
y = np.repeat(1, 10)

plot(x, y)

[image: resulting plot with y range from 0.94 to 1.06]

Additional information

No response

Operating system

No response

Matplotlib Version

3.8.0

Matplotlib Backend

No response

Python version

3.11

Jupyter version

No response

Installation

pip

@tacaswell
Member

See https://matplotlib.org/stable/gallery/ticks/scalarformatter.html#sphx-glr-gallery-ticks-scalarformatter-py and https://matplotlib.org/stable/api/ticker_api.html#matplotlib.ticker.ScalarFormatter.set_useOffset , https://matplotlib.org/stable/api/ticker_api.html#matplotlib.ticker.ScalarFormatter.set_powerlimits

This is expected behavior to try and make the numbers on the axis readable. In this case you can turn off both behaviors via

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(layout='constrained')

x = np.linspace(-0.49, 0.49, 10)
y = [
    0.9999999999999997,
    1.0,
    1.0000000000000002,
    1.0000000000000002,
    1.0,
    0.9999999999999999,
    1.0,
    0.9999999999999996,
    0.999999999999999,
    1.0000000000000036,
]
ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)
ax.plot(x, y)
plt.show()

but it is unreadable in its own way (and if you do not use constrained or tight layout, the y tick labels will be clipped):

[image: resulting plot with full-precision y tick labels]

I think this should be closed with no action because this is a particularly pathological case. Both behaviors make sense in one context or the other, but there is no clear winner, and the current behavior has been incumbent for most of the lifetime of the library (I suspect https://matplotlib.org/stable/users/prev_whats_new/whats_new_2.0.0.html#improved-offset-text-choice is the change that changed the behavior of #5657), so it is unlikely that we are going to change it.
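
For reference, the two formatter settings above can equivalently be applied in a single call via Axes.ticklabel_format; a minimal sketch, with illustrative near-constant data:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(layout='constrained')
ax.plot(np.linspace(-0.49, 0.49, 10), 1 + 1e-15 * np.arange(10))
# style='plain' disables scientific notation, useOffset=False disables the offset text
ax.ticklabel_format(axis='y', style='plain', useOffset=False)
plt.show()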

@fkiraly If you disagree please comment and we can re-open.

@tacaswell closed this as not planned on May 15, 2024
@fkiraly
Author

fkiraly commented May 15, 2024

I see, thanks for the explanation, @tacaswell.

Even with your explanation, I disagree for now that there is no bug or issue, because of the inconsistency in behaviour between the two code examples in "actual" and "expected". Why:

  • the "expected" plot is what you get if all the numbers are exactly 1.0 (float or int). This has range 0.94 to 1.06.
  • the "actual" differs from 1.0 only by minuscule amounts, yet leads to significantly different behaviour.

Given your explanation, I would expect the example in "expected" also to behave like "actual" - i.e., the bug lies in the inconsistency, and the plots in expected and actual should be switched.

In particular, given your explanation, it seems the code snippet in "expected" is the buggy one, then.

@tacaswell
Member

The behavior is dependent on the range of the data (a short sketch demonstrating the three regimes follows this list):

  • if the variation is "big" things just work (case not shown)
  • if the variation is "small", then we get the "actual" behaviour (where the formatters do weird things); in this case the variation is probably small enough that it is triggering the non-singular code paths, which may be the actual bug here
  • if the variation is "zero", then any hope of trying to guess good limits correctly is out the window and we fall back to a 10% range (+/-5%) around the only value we have (as it gives ticks that look "reasonable" around your value)
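
A minimal sketch demonstrating the three regimes (the data values are made up for illustration; the printed limits show how the autoscaled y range changes):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-0.49, 0.49, 10)
cases = {
    "big": 1 + 0.1 * np.sin(np.arange(10)),  # clearly visible variation
    "small": 1 + 1e-15 * np.arange(10),      # variation near float precision
    "zero": np.ones(10),                     # exactly constant
}

fig, axs = plt.subplots(ncols=3, layout='constrained')
for ax, (name, y) in zip(axs, cases.items()):
    ax.plot(x, y)
    ax.set_title(name)
    print(name, ax.get_ylim())  # inspect the autoscaled y limits in each regime
plt.show()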

Explicitly setting the ylim to ax.set_ylim(1-1e-14, 1+1e-14) looks better at first glance, but there are some issues, e.g. the spines do not actually meet at the corners. I suspect our range expansion logic in the auto-limit code is trying to stay in a regime where we can actually do math across the full range of the axis limits.

The "actual" data is all down near the limits of float precision (it looks like all but the last two values are within +/- 2 represent-able float values of 1). Maybe in some cases we would want to treat a range of < 100 eps as "all the same", but that would not really solve this problem just move it to a different threshold. It also does not make sense to auto-limit singular values to +/- the smallest range we can represent. Hence I think we are stuck in a bind where there is this abrupt change, but it is the least bad option (as "all being the same" and "all not being the same" is a discontinuous change).

Plotting ax.plot(x, 1-y) works much better (I suspect we are getting into sub-normals so there is more space to do math?).
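
A minimal sketch of that workaround, using the data from the issue (with y converted to an array so the subtraction broadcasts):

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-0.49, 0.49, 10)
y = np.array([
    0.9999999999999997, 1.0, 1.0000000000000002, 1.0000000000000002, 1.0,
    0.9999999999999999, 1.0, 0.9999999999999996, 0.999999999999999,
    1.0000000000000036,
])

fig, ax = plt.subplots(layout='constrained')
ax.plot(x, 1 - y)  # plot the deviation from 1 instead of the raw values
ax.set_ylabel('1 - y')
plt.show()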


I'm still skeptical that we can do anything differently here, am I missing something @fkiraly ?

@tacaswell reopened this on May 15, 2024
@jklymak
Member

jklymak commented May 15, 2024

We cannot start making assumptions about user data, so it is not clear to me what heuristic would give us the "expected" version without wiping out the plot for someone who was looking at a very small signal centered on 1.0. I think we are erring on the side of giving a usable plot, and if the user thinks it is wrong they can either normalize their data or set the limits manually.

@fkiraly
Author

fkiraly commented May 16, 2024

In my opinion, the "discontinuous case distinction" is the root of evil here and should be considered a bug.

That is, it is possible that there are only minor changes to the data, yet the display "jumps" between the conventions of the "small" and "zero" cases. This is impossible for the user to control, because it is not properly documented when one or the other default applies, and it is impossible to "trust" the visual output in some corner cases, as it may jump back and forth between the two conventions.

The "direct" solution is making the case treatment continuous, or constant even - as a subcase. What I mean with this: small changes in the data should never lead to big changes in the display setup, only gradual/small changes.

Examples of continuous case treatment:

  • "all case zero" - in "zero" and "small" case, always display like "zero" case currently
  • "all case small" - in "zero" and "small" case, always display like "small" case currently
  • "interpolate" - there are multiple options here. For instance, you could do as follows. I assume you estimate a variation range by max/min or robustified versions thereof, e.g., outlier cutting. So, you could do: if variation range < 5%, use the "zero" convention. If variation range >= 5%, use the "large" convention.
    • in this solution, if you mentally imagine any curve with growing amplitude, the display y range will start constant at plusminus 5%, and expand continuously and linearly, once the amplitude reaches the threshold value.
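
A rough, hypothetical sketch of that heuristic as a standalone function (this implements the proposal above, not matplotlib's actual autoscaling; the function name, threshold, and margin values are illustrative assumptions):

import numpy as np

def proposed_ylim(y, threshold=0.05, margin=0.05):
    """Hypothetical limit heuristic sketching the proposal above."""
    y = np.asarray(y, dtype=float)
    lo, hi = y.min(), y.max()
    center = 0.5 * (lo + hi)
    span = hi - lo
    # relative variation; fall back to the absolute span if the center is zero
    rel = span / abs(center) if center != 0 else span
    if rel < threshold:
        # "zero" convention: a fixed +/-5% window around the center
        pad = margin * (abs(center) if center != 0 else 1.0)
        return center - pad, center + pad
    # "large" convention: the data range plus standard 5% margins
    return lo - margin * span, hi + margin * span

# e.g. proposed_ylim([1.0] * 10) -> (0.95, 1.05), i.e. the "zero" convention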

@fkiraly
Author

fkiraly commented May 16, 2024

If I may also ask a small question, more "helpdesk" style - this may be a workaround in my use case, but would not resolve the more general issue.

What is the most idiomatic way to plot a non-negative graph in a way that ensures that 0 is always the bottom of the y-axis, while the top is left free to scale with standard axis range heuristics?

@timhoffm
Member

What is the most idiomatic way to plot a non-negative graph in a way that ensures that 0 is always the bottom of the y-axis, while the top is left free to scale with standard axis range heuristics?

ax.set_ylim(0, None)
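
A minimal usage sketch (the data is made up for illustration):

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 10, 100)
ax.plot(x, np.abs(np.sin(x)))  # some non-negative data (illustrative)
ax.set_ylim(0, None)           # pin the bottom at 0; None keeps the autoscaled top
plt.show()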

@fkiraly
Author

fkiraly commented May 16, 2024

Well, that's very convenient. I did not know you could pass None.

@jklymak
Member

jklymak commented May 16, 2024

ax.set_ylim(bottom=0) is also acceptable.

That is, it is possible that there are only minor changes to the data, and the display "jumps" between display conventions in the "small" and "zero" cases.

I don't think people run into the two cases in practice. Either your data is enforced to be constant, or your data has numerical noise and/or a small signal.

Feel free to suggest an algorithm, but I strongly suspect it is not possible to both make something that is always pleasing and also doesn't hide some types of signals. Matplotlib strives to err on the side of showing all the data and the signal versus aesthetics, and we assume users can adjust the x and y limits to tell their own story. For instance I just came back from an intensive field course where I made many plots. I think all of them had manual adjustment to the limits.

@fkiraly
Author

fkiraly commented May 16, 2024

I don't think people run into the two cases in practice. Either your data is enforced to be constant, or your data has numerical noise and/or a small signal.

Well, with a package as widely used as matplotlib, this claim is imho a bit dodgy. More positively phrased: with a wide enough user base, every public feature will be relied on, and that also extends to combinations of features such as the above.

Specifically, I have a use case where it is not "either/or": I am producing panel plots which may contain cases of either type, or sequences of plots that transition gradually from the "constant" case over "near-constant with minimal noise" to "varies a lot".

This is a common occurrence, as I am dealing with probability distributions, where constant means uniform - a common case, just as non-constant or almost-constant is.

Feel free to suggest an algorithm, but I strongly suspect it is not possible to both make something that is always pleasing and also doesn't hide some types of signals

My suggestion is as above, and it is based on the opinion that the plot I see for the near-constant signal is hard to read and borders on uninformative. There is no signal that is "revealed", it's just funny axis behaviour.

My preferred behaviour would be to always have

ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)

as @tacaswell suggested. I can live with this for my use case - but why should this not always be the default?

@timhoffm
Member

timhoffm commented May 17, 2024

The "direct" solution is making the case treatment continuous,

  • "interpolate" - there are multiple options here. For instance, you could do as follows. I assume you estimate a variation range by max/min or robustified versions thereof, e.g., outlier cutting. So, you could do: if variation range < 5%, use the "zero" convention. If variation range >= 5%, use the "large" convention.

    • in this solution, if you mentally imagine any curve with growing amplitude, the display y range will start constant at plusminus 5%, and expand continuously and linearly, once the amplitude reaches the threshold value.

While you could do this, it's not feasible in practice: this would mean that any data with <5% variation would be plotted with a 5% margin, which means that as soon as your data has ~<0.05% variation, it appears as a constant line on the chosen scale.

For any reasonably sized data (*), we want to show it scaled to the maximal available space. Given that, there is no way to make the transition to a 5% scale in the all-zero case continuous. Changing the 5% scale is not an option either, for backward-compatibility reasons (and it would also not be very helpful to show constant data on an eps-scale).

(*) The limit is somewhere around a scale of ~1e-12. For smaller values, you get discretization artefacts, which we've chosen not to draw by default: you divide the scale into ~1000 px visually, so the data distance of one pixel at a 1e-12 scale is ~1e-15, which is only slightly above the double-precision ulp. But irrespective of whether we limit scaling here or not, you don't get continuity from any small scale to the 5% scale.
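
For illustration, a quick check of the footnote's arithmetic (the ~1000 px figure is the footnote's own ballpark assumption):

import numpy as np

scale = 1e-12               # axis span from the footnote
pixels = 1000               # assumed visual resolution
per_pixel = scale / pixels  # data distance covered by one pixel
ulp = np.finfo(float).eps   # double-precision ulp near 1 (~2.22e-16)

print(per_pixel)        # 1e-15
print(ulp)              # ~2.22e-16
print(per_pixel / ulp)  # one pixel spans only a handful of ulps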

@fkiraly
Author

fkiraly commented May 17, 2024

While you could do this, it's not feasible in practice: this would mean that any data with <5% variation would be plotted with a 5% margin, which means that as soon as your data has ~<0.05% variation, it appears as a constant line on the chosen scale.

I agree, that's why I think this is "weaker" than my preferred solution, as stated above:

My preferred behaviour would be to always have

ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)

Assuming I understand the effect correctly, see the plot in @tacaswell's post here:
#28229 (comment)

This will lead to a margin that is proportional to the variation for small variation, without centering the y axis at zero by subtracting the average.

@tacaswell
Member

ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)

only affects the formatting of the tick labels, not what the limits are.
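
A minimal sketch illustrating this point (the near-constant data is made up; the printed limits come out identical with and without the formatter settings):

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-0.49, 0.49, 10)
y = 1 + 1e-15 * np.arange(10)  # illustrative near-constant data

fig, (ax1, ax2) = plt.subplots(ncols=2, layout='constrained')
ax1.plot(x, y)

ax2.yaxis.get_major_formatter().set_scientific(False)
ax2.yaxis.get_major_formatter().set_useOffset(False)
ax2.plot(x, y)

# Same autoscaled limits on both axes; only the tick label text differs.
print(ax1.get_ylim())
print(ax2.get_ylim())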

@tacaswell
Member

tacaswell commented May 28, 2024

I am going to close this again, as the consensus is that we cannot do anything to avoid this discrete change (either the data is singular or it is not, or we treat it as singular or not), and the default behavior of ensuring relatively short labels is both defensible and long-standing.

This is also only an issue with autoscaling, where we try to infer behavior from a very limited set of information; if the user explicitly gives us any bounds, we always respect those. To some level, autoscaling is always best effort, and it will fail in some cases where there is reasonable ambiguity about what "right" is, but we have to do something and cannot avoid guessing.

[edited because I hit post too soon]

@timhoffm closed this as not planned on May 28, 2024