namedtuple are not rendered in the variable browser as expected #1477

Closed
WSF-SEO-AM opened this issue Dec 1, 2023 · 16 comments
Labels
enhancement New feature or request

Comments

@WSF-SEO-AM

I've created a collections.namedtuple to store the results of a report; it is used to create 'Row' records that are appended to a list.

While everything works fine from a logic perspective, I noticed VS Code no longer shows the repr from the second item onward. It used to, but I can't tell when that stopped.

I can replicate this with VS Code 1.83 and the Insiders build (v1.85). To rule out an error in my code, I also ran it in PyCharm, where it works just fine.

A minimal reproducible example below:

import collections

class Test:
    def __init__(self, raw):
        self.dimensions = ['A']
        self.columns = ['X', 'Y', 'Z']
        # The Row type needs fields for the dimensions as well as the columns.
        self.row = collections.namedtuple("Row", self.dimensions + self.columns)
        self.rows = []
        self.append(raw)

    def append(self, raw):
        for row in raw:
            row = row.copy()
            # Pair each dimension name with the values popped from "keys".
            dimensions = dict(zip(self.dimensions, row.pop("keys", [])))
            res = self.row(**dimensions, **row)
            self.rows.append(res)

myrawdata = ({'keys': ['https://www.url.eu/'], 'X': 0, 'Y': 2, 'Z': 0},
             {'keys': ['https://www.url.com/'], 'X': 1, 'Y': 2, 'Z': 3})
x = Test(myrawdata)
x.rows

[Row(page='https://ww...sition=18), <Row, len() = 5>]

@eleanorjboyd eleanorjboyd transferred this issue from microsoft/vscode-python Dec 4, 2023
@int19h (Collaborator) commented Dec 4, 2023

This code doesn't run for me - append is missing self, and beyond that, construction of the namedtuple with the supplied arguments fails. Could you clarify it?

Having said that, the particular output that you quote - <Row, len() = 5> - is triggered when an iterable is deemed "too long to display". You can see the logic used to determine that here:

# Determines whether an iterable exceeds the limits set in
# maxlimits, and is therefore unsafe to repr().
def _is_long_iter(self, obj, level=0):
    try:
        # Strings have their own limits (and do not nest). Because
        # they don't have __iter__ in 2.x, this check goes before
        # the next one.
        if isinstance(obj, self.string_types):
            return len(obj) > self.maxstring_inner

        # If it's not an iterable (and not a string), it's fine.
        if not hasattr(obj, '__iter__'):
            return False

        # If it's not an instance of these collection types then it
        # is fine. Note: this is a fix for
        # https://github.com/Microsoft/ptvsd/issues/406
        if not isinstance(obj, self.long_iter_types):
            return False

        # Iterable is its own iterator - this is a one-off iterable
        # like generator or enumerate(). We can't really count that,
        # but repr() for these should not include any elements anyway,
        # so we can treat it the same as non-iterables.
        if obj is iter(obj):
            return False

        # range reprs fine regardless of length.
        if isinstance(obj, range):
            return False

        # numpy and scipy collections (ndarray etc) have
        # self-truncating repr, so they're always safe.
        try:
            module = type(obj).__module__.partition('.')[0]
            if module in ('numpy', 'scipy'):
                return False
        except Exception:
            pass

        # Iterables that nest too deep are considered long.
        if level >= len(self.maxcollection):
            return True

        # It is too long if the length exceeds the limit, or any
        # of its elements are long iterables.
        if hasattr(obj, '__len__'):
            try:
                size = len(obj)
            except Exception:
                size = None
            if size is not None and size > self.maxcollection[level]:
                return True
            return any((self._is_long_iter(item, level + 1) for item in obj))  # noqa
        return any(i > self.maxcollection[level] or self._is_long_iter(item, level + 1) for i, item in enumerate(obj))  # noqa
    except Exception:
        # If anything breaks, assume the worst case.
        return True

So I suspect that the actual URL in your data is running afoul of the string limit for collection items - this is maxstring_inner in the same file, and it is set to 30.
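
To make that concrete, here's a quick check of the 30-char rule against strings shaped like the ones in the repro (the long URL is invented for illustration; the rule itself is the first test in _is_long_iter above):

maxstring_inner = 30  # current default for strings nested inside collections

url_short = 'https://www.url.eu/'                 # 19 chars
url_long = 'https://www.url.eu/some/longer/path'  # 35 chars

print(len(url_short) > maxstring_inner)  # False: repr is shown in full
print(len(url_long) > maxstring_inner)   # True: the containing Row is
                                         # collapsed to the <Row, len() = N> form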

This isn't configurable via launch.json, unfortunately, though you can of course just edit your local copy of debugpy. There's also an undocumented launch.json setting, debugpyAdapterPath, that, if present, is used to launch the debug adapter in lieu of the one bundled with VS Code - this way you can use a custom fork that won't be touched by vscode-python updates.
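
A launch configuration using it might look like this (a sketch only: the adapter path is a placeholder for wherever your patched checkout lives, and the remaining fields are an ordinary Python launch configuration):

{
    "name": "Python: Current File (patched debugpy)",
    "type": "python",
    "request": "launch",
    "program": "${file}",
    "debugpyAdapterPath": "/path/to/fork/debugpy/src/debugpy/adapter"
}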

@int19h (Collaborator) commented Dec 4, 2023

And I think these should be made configurable going forward. We're unlikely to do so for the current major version of debugpy, but given that a major rewrite is in the works, I suggest filing a feature request to configure everything that you think would be useful to configure, and then I'll add that under #1448

@int19h int19h added the enhancement New feature or request label Dec 4, 2023
@WSF-SEO-AM (Author)

Thanks @int19h. I wrote that code on the fly, so excuse me that it didn't run at first.
But I guess you answered the question: there are too many items, and they run past the limit, which is the case more often than not.

I'm not sure a configurable option in the settings file is best here. The debugger, and Python tooling in general, is already complex and rich in settings. To me, it should either show everything or not.
Ideally, we should have the same behaviour as PyCharm, where I do get my expected visual representation.

If this fits the debugpy direction, I will file a feature request. I don't want to waste each other's time if it does not.

@int19h (Collaborator) commented Dec 5, 2023

Ideally, we should have the same behaviour as PyCharm, where I do get my expected visual representation.

To clarify, do you mean that for this tuple it shouldn't be trying to exclude anything at all?

The problem with showing everything in the most general case is that once you get a local variable containing, say, a multi-megabyte JSON string, or a collection of a few million items, it effectively makes the debug session useless: it gets stuck trying to repr() the value, send the result over the wire, and render it. And since repr() isn't cancelable, we can't implement cancellation generically in a way that doesn't risk breaking things even worse (although we could do so for builtin collections, since we repr them ourselves - I'll add that to the list of things for consideration!).

But we can 1) raise the default limits, and 2) do a better job of deciding what to exclude when we do exclude. And, of course, if the limit is configurable, you can just set it to a very high value to make it effectively unlimited, if you know for sure that you're not going to be dealing with that kind of large data in your debug sessions, or if you're okay with having to force-stop your debug session if it ever gets stuck inside a repr. It should be safe by default, but not a straitjacket.

As far as direction goes, on this particular subject we're not married to any particular approach. There are technical limitations on what we can do, but beyond that I would like to determine what works best for users, and make it sufficiently configurable to cover all the disparate use cases. The only catch, aside from dealing with the possibility of extremely large values, is that whatever repr is used, if it is not trimmed, it should be possible to eval() it to produce the original value; otherwise you wouldn't be able to edit variable values by editing the text, which I think is desirable for small collections.
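
A quick illustration of that round-trip property with small builtin collections (the trimmed example below is hypothetical):

x = ([1, 2], {'a': 3}, 'foo')
assert eval(repr(x)) == x  # an untrimmed repr reproduces the value

# A trimmed repr does not round-trip: eval("[1, 2, ...]") yields
# [1, 2, Ellipsis], not the original list, so trimmed values
# couldn't be edited as text.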

@andreamoro

I've not extensively tested PyCharm against a million-row resultset or megabytes of data, but I was indeed put off by the fact that during my testing with 30 lines I was hardly getting anything.
I believe there is a middle ground where "all" can be shown?

@int19h (Collaborator) commented Dec 5, 2023

Absolutely! The current values are definitely overly conservative, which shouldn't be surprising given that they were picked something like 13 years ago - in the context of the typical hardware of that time, and with VS as the client - by the extremely scientific process of "let me make a guess". It's been so long that I can't remember all the details, but IIRC another intent was to produce values that usually fit within the typical width of the Variables pane (again, in VS, and given typical screen sizes and resolutions back then), on the basis that it can always be expanded to look at individual items. If actual use shows that these limits are awkward, and if people prefer to get a long value that can be scrolled, they should be revised.

Keep in mind, though, that the scenarios here involve more than just a tuple. For example, consider something like this:

x = ([1, 2, 3, ..., 100], "foo")

If we use a large limit for that inner list, then in the Variables pane you will not be able to see "foo" at a glance, because it will be all the way to the right, so you'll have to scroll every time to check what it is. IIRC the logic behind this was that non-collection items are generally more interesting than nested collections, so the latter should be collapsed more aggressively, to give a view that provides a good overall idea of the value without unduly biasing any particular item. That's why the max length is different on different levels:

maxcollection = (15, 10)
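
For illustration only (a toy sketch, not debugpy's actual rendering code), a per-level truncating repr shows the effect of those limits on the example above:

maxcollection = (15, 10)

def trimmed_repr(obj, level=0):
    # Truncate lists/tuples to the per-level limit, marking dropped items.
    if isinstance(obj, (list, tuple)) and level < len(maxcollection):
        limit = maxcollection[level]
        items = [trimmed_repr(item, level + 1) for item in list(obj)[:limit]]
        if len(obj) > limit:
            items.append('...')
        open_, close = ('[', ']') if isinstance(obj, list) else ('(', ')')
        return open_ + ', '.join(items) + close
    return repr(obj)

x = (list(range(1, 101)), "foo")
print(trimmed_repr(x))
# ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...], 'foo') - "foo" stays visible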

Similarly, if there's a tuple where the first item is a string followed by a bunch of other stuff, showing the full length of the string would effectively hide the other items from view. Thus the limits are also drastically different for strings:

maxstring_outer = 2 ** 16
maxstring_inner = 30

With that in mind, what do you think sensible limits for collection items would look like, such that they'd work well not only for this particular case, but for other cases you can anticipate in real world data?

@int19h (Collaborator) commented Dec 5, 2023

@zljubisic I'd also love to hear your opinion on this.

@andreamoro

You made a point for designing the variable explorer around plugins. I think I've seen something like that once; I'll have to check.

I guess that, considering the strong cases where extended iteration makes it impossible to see anything, I would at least want the very first row displayed in full (or perhaps every Xth one, which yes, could be a config option).
After all, the role of the variable explorer is to facilitate inspection before deep-diving with command-line queries.

@zljubisic

@zljubisic I'd also love to hear your opinion on this.

Firstly, @WSF-SEO-AM reported non-functional code. I tried to make it work, but with no success. @WSF-SEO-AM, you are trying to pop the key "keys", which doesn't exist. Then you are trying to create res from more fields than X, Y, and Z.
I don't get it.

Apart from this, it is really bad practice to create the rows attribute inside the append method that you call in the Test constructor.

Can @WSF-SEO-AM update the example to something that works, so that we can continue the discussion?

@int19h (Collaborator) commented Dec 6, 2023

@zljubisic I think the specific issue here is clear enough. I'd like to tackle the bigger problem, which is that, broadly speaking, the values displayed in the Variables pane for collections often aren't helpful for their intended purpose (which is to provide a useful condensation of the information at a glance). This is not quite identical to the debug console issues we've been discussing, but it is very closely related (the main difference, really, is the typical size of the "output window").

@int19h (Collaborator) commented Dec 6, 2023

You made a point for designing the variable explorer around plugins. I think I've seen something like that once; I'll have to check.

pydevd actually has a plugin system for this already, and the vendored version that we ship in debugpy should be able to pick up any installed plugins. Take a look here.

However, I'm not aware of any libraries actually using it in practice, even though it has been around for a very long time now (predating debugpy). It's effectively only used to special-case popular libraries like numpy and pandas in pydevd itself.
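
For reference, a minimal provider sketched from pydevd's extension API looks roughly like this (from memory, so treat the details as approximate and check pydevd's pydevd_extension_api module for the exact contract; the library and type names here are invented):

# Lives in a module under the pydevd_plugins.extensions namespace
# package, e.g. pydevd_plugins/extensions/pydevd_plugin_mylib.py,
# where pydevd discovers it at import time.
from _pydevd_bundle.pydevd_extension_api import StrPresentationProvider

class MyLibStrProvider(StrPresentationProvider):
    def can_provide(self, type_object, type_name):
        # Claim only the (hypothetical) mylib.Thing type.
        return type_name == 'mylib.Thing'

    def get_str(self, val):
        # The string shown for the value in the Variables pane.
        return 'Thing(%s)' % val.summary()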

This is not surprising, given that, from the perspective of a library's authors, it is an API specific to one - even if popular - Python debugger implementation. It basically reverses the problem: now, instead of the debugger having to support each library separately, the library has to support each debugger separately. The way I see it, ideally debugpy shouldn't have to special-case numpy, but numpy also shouldn't have to special-case debugpy. So I would say this calls not just for plugins, but for an API that is a well-defined, standard way of writing them - one that strives to be debugger-agnostic while allowing for debugger-specific extensions, similar to DAP itself. Ideally in the form of a PEP, so that builtins and the stdlib can also use it.
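
Purely as a hypothetical illustration of what such a debugger-agnostic protocol might look like (nothing like this exists today; the dunder name is invented):

class Report:
    def __debug_repr__(self, budget):
        # A debugger passes a character budget; the object decides how
        # to condense itself instead of the debugger guessing.
        return f'<Report rows={len(self.rows)} cols={len(self.columns)}>'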

But that is a long-term goal that would take a while to achieve. In the meantime, we're likely to use debugpy as a platform to prototype said API, which would also serve as a customization point. Still, this leaves open the question of what the actual implementation of any given plugin would look like. Named tuples are a standard library data type, so debugpy is going to provide the default implementation for them for the foreseeable future. I want to make sure that this implementation is "good enough" for most people, so that they don't need to bother customizing it (beyond maybe tweaking a few configuration knobs).

@andreamoro

I can't really say how good this would be for the masses. Even a plugin might be an overarching yet overly complex approach at this stage.
After all, I started from a basic need: to see a couple of lines' content in the variable explorer. Or perhaps reuse what is already available for the Interactive window and Jupyter cells?
https://devblogs.microsoft.com/python/python-in-visual-studio-code-april-2019-release/

@zljubisic

We are trying to (re)invent the wheel here.
Everything was solved a long time ago in PyCharm.
@WSF-SEO-AM do you see any problem here?

[screenshot: PyCharm's variable view]

@int19h What do you say?

@andreamoro

No, I can see only the expected results.

@int19h (Collaborator) commented Dec 7, 2023

That screenshot does not contain any examples of values that are too big to fit, which is what this issue is fundamentally about. Sure, by rearranging the UI you can make more space, but that doesn't really solve the problem - at some point you'll get a tuple with a 200-char URL in it; what then?

As I noted earlier, the current limits are obviously too small and will be revised upwards. But, again, that doesn't solve the problem, just makes it less acute.

@zljubisic

Pavel, stop resisting. :)

[screenshot]

int19h added a commit to int19h/debugpy that referenced this issue Dec 14, 2023
…er as expected

Significantly increase safe repr limits for strings and collections.
int19h closed this as completed in e9a39f7 on Dec 15, 2023
int19h added a commit to int19h/debugpy that referenced this issue Jan 11, 2024
…er as expected

Significantly increase safe repr limits for strings and collections.
int19h added a commit to int19h/debugpy that referenced this issue Jan 30, 2024
…er as expected

Significantly increase safe repr limits for strings and collections.