Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submodule.iter_items sometimes attaches second empty iterator to its StopIteration #1779

Closed
EliahKagan opened this issue Dec 22, 2023 · 2 comments · Fixed by #1780
Closed

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Dec 22, 2023

This is minor bug, and the main benefit of fixing it may be an improvement in code clarity rather than the behavioral difference. I think it makes sense to fix this in the same pull request that fixes #1775, and I've opened #1780 for that, but I think it's clearer to document it here than in the pull request.

Submodule.iter_items is a generator function as intended, because yield appears in its scope:

yield sm
# END for each section

Sometimes when one writes a function that returns a generator object, it is a regular function rather than a generator function, because the function returns a generator expression or because it calls another function that returns a generator object. That technique is sometimes used to fail fast on errors, since calling a generator function does not immediately run its code, but only returns a generator object, and the code runs when that generator object is iterated. However, that is not done here. That it is not appears intentional and reasonable, and even if not, changing it would probably break backwards compatibility.

However, it contains this code, at the top:

try:
pc = repo.commit(parent_commit) # Parent commit instance
parser = cls._config_parser(repo, pc, read_only=True)
except (IOError, BadName):
return iter([])
# END handle empty iterator

The intention there is to make it so that calling iter_items returns an empty iterator when either of those two exceptions is raised. It succeeds at doing that, but its success is unrelated to using iter([]) as the operand of return. Returning from a generator function--like falling off the end, which is more common--causes there to be no more items to iterate through, so a next call in which this happens raises StopIteration.

The operand of return in a generator function is neither returned to the caller--a generator object is returned--nor returned when next is called on that generator object. Instead, it becomes the value attribute of the StopIteration exception raised on the next call to indicate there are no more values. This feature of generators is rarely used, and the common ways of using generators, such as a for loop, ignore it. (When yield from is used as an expression, it evaluates to that value, which allows generators to get status information from their subgenerators, in the rare case their subgenerators return a meaningful value. It can also be accessed directly by explicitly catching StopIteration, binding the exception object to a variable, and accessing its value attribute. That's about it.)

I suppose it might make sense for Submodule.iter_items to have its returned generator object signal, in some way, that it is empty due to OSError (IOError and OSError are the same in Python 3) or BadName. But I think that having it attach a second empty iterator object to the StopIteration exception is clearly not intended to do so.

The statement was originally a bare return. The iter([]) operand was added in 82b131c (#1282). It came in along with numerous additions and improvements in type hinting, and it looks like it was mistakenly added with the idea it might be needed for make type-checking succeed. However, I don't believe the major type checkers were as stable back then as they are today, so perhaps it was instead added as a workaround for a bug in a static type checker.

Interestingly, this is conceptually related to #1755, and I would've fixed it there, had I been aware of it at that time. The logic in the body Submodule.iter_items usually implicitly "returns" None by falling off the end (and None is bound as the StopIteration exception's value attribute), which is its intended behavior, but sometimes it inadvertently "returns" a list_iterator object instead.

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Dec 22, 2023
Returning an explicit value from a generator function causes that
value to be bound to the `value` attribute of the StopIteration
exception. This is available as the result of "yield from" when it
is used as an expression; or by explicitly catching StopIteration,
binding the StopIteration exception to a variable, and accessing
the attribute. This feature of generators is rarely used.

The `return iter([])` statement in Submodule.iter_items uses this
feature, causing the resulting StopIteration exception object to
have a `value` attribute that refers to a separate second iterator
that also yields no values (gitpython-developers#1779).

From context, this behavior is clearly not the goal; a bare return
statement should be used here (which has the same effect except for
the `value` attribute of the StopIteration exception). The code had
used a bare return prior to 82b131c (gitpython-developers#1282), when `return` was
changed to `return iter([])`. That was part of a change that added
numerous type annotations. It looks like it was either a mistake,
or possibly an attempt to work around an old bug in a static type
checker.

This commit extends the test_iter_items_from_invalid_hash test to
assert that the `value` attribute of the StopIteration is its usual
default value of None. This commit only extends the test; it does
not fix the bug.
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Dec 22, 2023
This fixes the minor bug where a separate empty iterator was bound
to the StopIteration exception raised as a result of returning from
the generator function (gitpython-developers#1779).

This change does not cause what exceptions are raised from
GitPython code in any situations, nor how many items any iterators
yield.
@Byron
Copy link
Member

Byron commented Dec 23, 2023

Thanks for investigating this! I agree that its certainly better to return nothing choose an idiomatic end-of-iterator instead, assuming that type-checkers don't need this kind of hint anymore.

@EliahKagan
Copy link
Contributor Author

I suspect no type checker did need it, I just can't rule that out since I know type checkers were less stable and less widely used at that time (and I don't know what all the type checkers in use with GitPython were, though mypy was surely one of them). It would definitely be a bug for a type checker to be happier with it with the iter([]) operand, though that of course doesn't mean it wasn't so. I'd say that, ideally, a type checker should have even warned about it.

The output of generator functions is usually typed Iterator[YieldType], because that it is a generator is most often an implementation detail. The most specific abstract type is Generator[YieldType, SendType, ReturnType]. In practice, that tends to be used in the cases where it's least directly useful to human readers but needed for a type checker to understand the resulting function after decoration, such as when using @contextlib.contextmanager to turn a generator function into a factory for context manager objects. Anyway, if one infers a type of the form Generator[YieldType, SendType, ReturnType], then ReturnType would have been list_iterator | None, but if that were intended then the bare return should've been return None (this is the connection to #1755). Type checkers don't always detect that sort of thing. It's one of those things that's not technically a type error but that a type checker could detect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants