Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable checking against mypy with minimal set of fixes #1402

Closed
wants to merge 11 commits into from

Conversation

oprypin
Copy link
Contributor

@oprypin oprypin commented Nov 2, 2023

4 significant checks are disabled:

  • strict_optional - check that an object might be None on access
  • warn_no_return - requiring a return
  • assignment - any assignment of a variable of a wrong type
  • var-annotated - a type annotation is missing

4 significant checks are disabled:
* `strict_optional` - check that an object might be `None` on access
* `warn_no_return` - requiring a `return`
* `assignment` - any assignment of a variable of a wrong type
* `var-annotated` - a type annotation is missing
@waylan
Copy link
Member

waylan commented Nov 2, 2023

I'm sorry, this is is still way to much rewriting of code. I will not under any circumstances change the way I write code in a dynamically typed language to suite a type checker. The checker should only be verifying that existing annotations are correct.

The only edits which are acceptable are edits to the annotations. If you can't make that work, then mypy is not the right tool.

I see three ways forward:

  1. Configure mypy to do what I want (seems unlikely).
  2. Find a different tool which does what I want.
  3. Do nothing.

markdown/treeprocessors.py Outdated Show resolved Hide resolved
@@ -274,6 +261,27 @@ def get_stash(m):
return util.INLINE_PLACEHOLDER_RE.sub(get_stash, text)


class LegacyPattern(_BasePattern):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have very serious objections to the changes to this file. We have API changes. Under no circumstances am I interested in making API changes for type checking. If that means it's not possible to move forward with mypy, then we don't move forward with mypy.

To be clear, this specific line of code is not any more or less relevant than any other. However, the changes are rather sweeping and it was not possible to easily highlight and comment on all relevant lines. I'm speaking about most of the changes to this file in general.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no actual change to the API, all usages can remain the same.

But this is still indeed the worst change here. I spent a lot of time thinking of this but it's the best I came up with. We could also solicit ideas regarding this.

I'll comment further on this soon.

Copy link
Contributor Author

@oprypin oprypin Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me explain what the problem here is.

This is the general explanation:
https://stackoverflow.com/questions/6034662/python-method-overriding-does-signature-matter

For the particular situation at hand:
class InlineProcessor(Pattern): violates the Liskov Substitution Principle.

And let's just quote directly from mypy's output:

Signature of "handleMatch" incompatible with supertype "Pattern"  [override]
     Superclass:
         def handleMatch(self, m: Match[str]) -> Element | str
     Subclass:
         def handleMatch(self, m: Match[str], data: str) -> tuple[Element | str | None, int | None, int | None]

The particular violation happens regardless of mypy, it's an actual run-time problem if users expect it to work like is widely accepted:

If you write:

if isinstance(foo, Pattern):
    foo.handleMatch(single_argument)

it must work, because every instance of Pattern must have a single-argument handleMatch method, because it's written so right there!

An instance of InlineProcessor is an instance of Pattern, and every instance of it violates this principle.

More specific code example:

>>> import re
>>> from markdown.inlinepatterns import EscapeInlineProcessor, Pattern

>>> foo = Pattern('a')
>>> foo.handleMatch(re.search('a', 'a'))  # OK

>>> foo = EscapeInlineProcessor('a')
>>> isinstance(foo, Pattern)
True
>>> foo.handleMatch(re.search('a', 'a'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: EscapeInlineProcessor.handleMatch() missing 1 required positional argument: 'data'

The correct hierarchy would be this:

  graph TD;
      BasePattern["BasePattern<br>defines helper methods only"]-->Pattern["Pattern<br>abstract def handleMatch(m):"];
      BasePattern-->InlineProcessor["InlineProcessor<br>abstract def handleMatch(m, data):"];
      Pattern-->SimpleTextPattern["SimpleTextPattern<br>def handleMatch(m):"]
      Pattern-->SimpleTagPattern["SimpleTagPattern<br>def handleMatch(m):"]
      InlineProcessor-->EscapeInlineProcessor["EscapeInlineProcessor<br>def handleMatch(m, data):"]
      InlineProcessor-->AsteriskProcessor["AsteriskProcessor<br>def handleMatch(m, data):"]

But the current hierarchy is:

  graph TD;
      Pattern["Pattern<br>defines helper methods<br>abstract def handleMatch(m):"];
      Pattern-->InlineProcessor["InlineProcessor<br>abstract def handleMatch(m, data):"];
      Pattern-->SimpleTextPattern["SimpleTextPattern<br>def handleMatch(m):"]
      Pattern-->SimpleTagPattern["SimpleTagPattern<br>def handleMatch(m):"]
      InlineProcessor-->EscapeInlineProcessor["EscapeInlineProcessor<br>def handleMatch(m, data):"]
      InlineProcessor-->AsteriskProcessor["AsteriskProcessor<br>def handleMatch(m, data):"]

As such, the entire hierarchy of InlineProcessor is said to have improper overrides.

What my change accomplishes is to construct the correct type hierarchy to mypy and to all typechecking users going forward.
However at run time I am keeping everything just like it was. LegacyPattern is just a new alias to Pattern

Copy link
Contributor Author

@oprypin oprypin Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See a code search of occurrences in the wild: currently every subclass of InlineProcessor needs to add an ignore comment for this override.

https://github.com/search?q=%2FInlineProcessor%5C%29%2F+%2FhandleMatch.%2Bignore%2F+NOT+path%3Azerver&type=code

Of course at the moment that's not this repository's fault (as it doesn't have py.typed) but just because the type stubs are written in a way that directly mirrors it:
https://github.com/python/typeshed/blob/9ff393558a6f4fedc086bc9dc4386dbe529a3331/stubs/Markdown/markdown/inlinepatterns.pyi#L48

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also this issue report (which I found through the above code search because it's directly inspired by this case). Which if accepted would resolve this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An equivalent change to the external stubs would be this:
oprypin/typeshed@43593a4
Maybe this more condensed view is clearer to understand.

Copy link
Member

@waylan waylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just went through and noted (nearly) every objection I have to these changes.

@@ -175,7 +175,7 @@ def test(self, parent: etree.Element, block: str) -> bool:
return block.startswith(' '*self.tab_length) and \
not self.parser.state.isstate('detabbed') and \
(parent.tag in self.ITEM_TYPES or
(len(parent) and parent[-1] is not None and
(len(parent) > 0 and parent[-1] is not None and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be completely unnecessary. The code works just fine and any tool which prevents me from writing code that way is unwanted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code can actually sometimes return an instance of int rather than bool. If we have to, I'll follow the usual over-the-top process of proving to you that this is a bug and writing a test.

markdown/core.py Outdated
@@ -85,7 +85,7 @@ class Markdown:
callable which accepts an [`Element`][xml.etree.ElementTree.Element] and returns a `str`.
"""

def __init__(self, **kwargs):
def __init__(self, **kwargs) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addition adds no value and I will not tolerate being forced to add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding any annotation indicates "please type-check the body of this function". The cool thing about it is that if you add a new function and not write any type annotations on it, mypy will not complain whatsoever. It is a good thing to have this tool. So this un-nuanced take on this is unproductive.

Here I'm adding this -> None because this function contains some type annotations that are being wasted otherwise. To not have problems with this, I'll have to move the annotations to class level. Let me know if that works better for your preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when does an __init__ method return anything other than None? That's my point. Adding this annotation adds no value in that it tells me nothing I didn't already know. I should not be required to add this annotation every time I define and __init__ function.

You seem to object based on the fact that eliminating the annotation causes the tool to not run other checks on the line. That is a problem with the tool. If the tool can't do that, then fix the tool or use a different tool. Don't force me to waste time and effort adding otherwise pointless annotations.

Copy link
Contributor Author

@oprypin oprypin Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding -> None annotations to __init__ is useless, I totally agree on that. However you ignore the point that I am actually arguing for. You want the tool to not do anything unless you call for it. As such, not adding any annotations accomplishes that. But when you do want to go ahead and tell the tool "please type-check this function body", this is the designated way to do that. You are not forced, and I demonstrated that with the current state of this pull request.

Separately on your last point: the tool does not require this annotation, you can simply configure it to check all function bodies. If you want it to check only some function bodies, this is how to indicate it.

markdown/core.py Outdated
Comment on lines 448 to 449
html = html.encode(encoding, "xmlcharrefreplace")
sys.stdout.buffer.write(html)
html_bytes = html.encode(encoding, "xmlcharrefreplace")
sys.stdout.buffer.write(html_bytes)
Copy link
Member

@waylan waylan Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I have stated repeatedly, I embrace the fact that Python is dynamically typed and allows reuse of a variable. Any tool which disallows this is unacceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's use an ignore comment instead

Yeah it's totally a limitation of mypy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then lets find a tool which doesn't have this limitation. Quit wasting my time with this nonsense.

markdown/extensions/md_in_html.py Outdated Show resolved Hide resolved
pattern += (md,)
pattern = SubstituteTextPattern(*pattern)
for ind, pattern_args in enumerate(patterns):
pattern = SubstituteTextPattern(*pattern_args, md)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again the dynamic nature of Python is being taken away here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a limitation of mypy.
That being said, this change is just a mordernization that could've been done anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear. if the code had been written that way originally when it was submitted in a PR, I would have accepted It. But I would be willing to accept the existing code also (as I did). I don't have an objection to either way, but I do have an objection to a tool forcing one way on me and not allowing the other. That is not a tool I am interested in using.

Comment on lines 78 to 79
html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[i])
raw: str = self.md.htmlStash.rawHtmlBlocks[i]
html = self.stash_to_string(raw)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, so mypy doesn't allow his pattern (funct1(funct2(value)))? That's ridiculous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that it doesn't allow nested calls, it's that there is another problem:

Argument 1 to stash_to_string of RawHtmlPostprocessor has incompatible type str | Element; expected str

self.md.htmlStash.rawHtmlBlocks[i] is type-annotated as str | Element
-which is correct because md_in_html can indeed put an Element there:

self.cleandoc.append(self.md.htmlStash.store(element))

However, stash_to_string is guarded by a str type annotation because indeed for an Element it will just produce garbage such as "<Element 'a' at 0x7f55e83a8ea0>", and mypy rightly warns about it.

So once again there is a potential bug that mypy warns us about.
Upon looking into this though, I concluded that maybe there is no actual bug because RawHtmlPostprocessor consistently only puts strings into htmlStash.

For now turned it into an ignore comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon looking into this though, I concluded that maybe there is no actual bug because RawHtmlPostprocessor consistently only puts strings into htmlStash.

Exactly. Again, this is a waste of time and not helpful at all. Unless maybe this is telling us that we need to better annotate the stash to indicate that it only accepts strings. Would doing that resolve the issue? If so, then that would be how to address this, not this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you did not read my comment very attentively because I explicitly say that there is in fact code that puts Elements into the stash.

Comment on lines 274 to 279
new_style = isinstance(pattern, inlinepatterns.InlineProcessor)
if isinstance(pattern, inlinepatterns.InlineProcessor):
new_style = True
new_pattern = pattern
else: # pragma: no cover
new_style = False
legacy_pattern = pattern
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this (and all the associated changes in this file that follow) is removing the dynamic nature of python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's just ignore-comment everything instead, I don't know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cool thing to do here would be to refactor this method into 2 methods. Despite some code duplication, it's actually less error-prone going forward, and even provides a clearer deprecation for old-style patterns. (Also only one # pragma: no cover comment now, rather than per-case ones 😊

This is demonstrated here: oprypin@1c047f0?diff=split

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's just ignore-comment everything instead, I don't know

Actually, as per my other comment, not don't just ignore-comment everything.

A cool thing to do here would be to refactor this method into 2 methods.

Perhaps, but that should be addresses separately.

@waylan
Copy link
Member

waylan commented Nov 3, 2023

So, it seems that if you can configure mypy to not require any of the code changes included here, then it would be something we can use.

However, I will note that simply commenting lines of code may not be the best solution in some cases. For example. there is little difference in def __init__(self) -> None: and def __init__(self): # ignore. I would expect to be able to do def __init__(self):. And the same goes for every time we use Python's dynamic nature. There shouldn't need to be a comment on every such line. A global config option would be preferable. If mypy can't do that, then its not the right tool for the job.

@facelessuser
Copy link
Collaborator

Yes, it seems that the projects goals and this PR's goals are a bit at odds. It sounds like Python Markdown only cares about checking that types are defined, but type checking itself is not desired.

I will add one point though. There seems to be some disdain for explicitly defining a function has no return, but if that is disabled, it seems you have no actual way of checking if a return is required and missed? Maybe I'm wrong about this?

@waylan
Copy link
Member

waylan commented Nov 3, 2023

I will add one point though. There seems to be some disdain for explicitly defining a function has no return, but if that is disabled, it seems you have no actual way of checking if a return is required and missed? Maybe I'm wrong about this?

I'm okay with that. The only thing that should be checked is that existing annotations are correct. Anything that is not annotated should be ignored. Seems like a simple concept. I don't understand why that is so hard... except that perhaps it is because the goal of the type checker is to enforce static typing, which is something I have repeatedly said I am not interested in.

Comment on lines +88 to +95
tab_length: int
ESCAPED_CHARS: list[str]
block_level_elements: list[str]
registeredExtensions: list[Extension]
stripTopLevelTags: bool
references: dict[str, tuple[str, str]]
htmlStash: util.HtmlStash

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are instance attributes, not class attributes. Will you please stop changing the code to fit a tool which I have repeatedly stated does not fit our needs.

I am closing this issue as you are clearly either not understanding my direction or intentionally ignoring it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the correct way to annotate instance attributes according to all tools and documentation

@waylan waylan closed this Nov 3, 2023
@oprypin
Copy link
Contributor Author

oprypin commented Nov 3, 2023

The only thing that should be checked is that existing annotations are correct.

In which way, just purely conceptually, do you think this is supposed to be accomplished? What is "correct" in isolation?

The only known way to do this is to check code that uses annotated members. If something doesn't match up then it shows an error, and you need to check if it's the code that is wrong or the annotation that is wrong.

@waylan waylan added the rejected The pull request is rejected for the stated reasons. label Nov 3, 2023
@facelessuser
Copy link
Collaborator

I'm okay with that. The only thing that should be checked is that existing annotations are correct. Anything that is not annotated should be ignored. Seems like a simple concept. I don't understand why that is so hard... except that perhaps it is because the goal of the type checker is to enforce static typing, which is something I have repeatedly said I am not interested in.

That's fair, and it is possible I've misinterpreted some interaction. So yeah, I'm good with that.

@waylan
Copy link
Member

waylan commented Nov 3, 2023

The only known way to do this is to check code that uses annotated members. If something doesn't match up then it shows an error, and you need to check if it's the code that is wrong or the annotation that is wrong.

Yes, when its annotated that is reasonable. And when it is not annotated, it should be ignored.

In other words, suppose we have to functions, one of which is annotated and one which is not. The tool would see a line of code which calls the annotated function, It would see the annotation and confirm that the return value matches up. However, when it sees the unannotated function is would not confirm anything about the return value as there is nothing to confirm. That is what I expect to happen.

In the first case, the assumption would then be that perhaps the annotation is wrong. However, we could check for a potential bug in the code and fix that instead if need be. In the second case, we are alerted to nothing as it has been for many years.

And no, I don't want to have to add an ignore comment to every one of those lines. The tool should just ignore them either by default or via a global setting.

@oprypin
Copy link
Contributor Author

oprypin commented Nov 3, 2023

I think mypy in the current config matches what you describe - it validates only things that are annotated.

Which of the comment threads don't match this? There was an example where something clearly annotated as str | Element is being passed to something clearly annotated as str. How is that not a correct finding that either the annotation is wrong or the code can have a problem?

Although a big snag might be in the fact that the whole standard library is annotated, so usages (inside annotated functions) are checked against it as well.

And there was one consistently mentioned limitation of mypy - that it can't cope with reassigning a different-typed value to the same variable name, and it should be able to. I have replied with "limitation" to those threads. But it's a conceptually different problem. And if we do look at it separately, perhaps this pull request can be said to have only this as a blocker.

@waylan
Copy link
Member

waylan commented Nov 3, 2023

Which of the comment threads don't match this? There was an example where something clearly annotated as str | Element is being passed to something clearly annotated as str. How is that not a correct finding that either the annotation is wrong or the code can have a problem?

I've never suggested that that was the case. In fact, if the issue is that the annotation was wrong, I have made no objection. However, I have consistently stated that if the code is wrong, then each individual instance should be addressed as a separate bug, not included wholesale in this large PR. This PR (or one like it) should not be changing code. only annotations. And yet, you have repeatedly ignored that request.

And there was one consistently mentioned limitation of mypy - that it can't cope with reassigning a different-typed value to the same variable name, and it should be able to. I have replied with "limitation" to those threads. But it's a conceptually different problem. And if we do look at it separately, perhaps this pull request can be said to have only this as a blocker.

And a very big blocker is it, I have stated repeatedly that I will not change this lib to be statically typed, which mypy requires. In other words, I will not change the code to appease a tool which is designed to enforce something we are not enforcing. And it is unreasonable to expect us to ignore-comment every single dynamically typed object in our code line-by-line. If mypy had a global option to not enforce that, then it wouldn't be a tool which enforced static types. Therefore, it seems that if does not fit our needs.

@Python-Markdown Python-Markdown locked as resolved and limited conversation to collaborators Nov 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
rejected The pull request is rejected for the stated reasons.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants