Enable checking against mypy with minimal set of fixes #1402

oprypin · 2023-11-02T16:42:04Z

4 significant checks are disabled:

strict_optional - check that an object might be None on access
warn_no_return - requiring a return
assignment - any assignment of a variable of a wrong type
var-annotated - a type annotation is missing

4 significant checks are disabled: * `strict_optional` - check that an object might be `None` on access * `warn_no_return` - requiring a `return` * `assignment` - any assignment of a variable of a wrong type * `var-annotated` - a type annotation is missing

waylan · 2023-11-02T17:27:58Z

I'm sorry, this is is still way to much rewriting of code. I will not under any circumstances change the way I write code in a dynamically typed language to suite a type checker. The checker should only be verifying that existing annotations are correct.

The only edits which are acceptable are edits to the annotations. If you can't make that work, then mypy is not the right tool.

I see three ways forward:

Configure mypy to do what I want (seems unlikely).
Find a different tool which does what I want.
Do nothing.

markdown/treeprocessors.py

waylan · 2023-11-03T14:29:40Z

markdown/inlinepatterns.py

@@ -274,6 +261,27 @@ def get_stash(m):
        return util.INLINE_PLACEHOLDER_RE.sub(get_stash, text)


+class LegacyPattern(_BasePattern):


I have very serious objections to the changes to this file. We have API changes. Under no circumstances am I interested in making API changes for type checking. If that means it's not possible to move forward with mypy, then we don't move forward with mypy.

To be clear, this specific line of code is not any more or less relevant than any other. However, the changes are rather sweeping and it was not possible to easily highlight and comment on all relevant lines. I'm speaking about most of the changes to this file in general.

There is no actual change to the API, all usages can remain the same.

But this is still indeed the worst change here. I spent a lot of time thinking of this but it's the best I came up with. We could also solicit ideas regarding this.

I'll comment further on this soon.

Let me explain what the problem here is.

This is the general explanation:
https://stackoverflow.com/questions/6034662/python-method-overriding-does-signature-matter

For the particular situation at hand:
class InlineProcessor(Pattern): violates the Liskov Substitution Principle.

And let's just quote directly from mypy's output:

Signature of "handleMatch" incompatible with supertype "Pattern" [override] Superclass: def handleMatch(self, m: Match[str]) -> Element | str Subclass: def handleMatch(self, m: Match[str], data: str) -> tuple[Element | str | None, int | None, int | None]

The particular violation happens regardless of mypy, it's an actual run-time problem if users expect it to work like is widely accepted:

If you write:

if isinstance(foo, Pattern): foo.handleMatch(single_argument)

it must work, because every instance of Pattern must have a single-argument handleMatch method, because it's written so right there!

An instance of InlineProcessor is an instance of Pattern, and every instance of it violates this principle.

More specific code example:

>>> import re >>> from markdown.inlinepatterns import EscapeInlineProcessor, Pattern >>> foo = Pattern('a') >>> foo.handleMatch(re.search('a', 'a')) # OK >>> foo = EscapeInlineProcessor('a') >>> isinstance(foo, Pattern) True >>> foo.handleMatch(re.search('a', 'a')) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: EscapeInlineProcessor.handleMatch() missing 1 required positional argument: 'data'

The correct hierarchy would be this:

graph TD; BasePattern["BasePattern defines helper methods only"]-->Pattern["Pattern abstract def handleMatch(m):"]; BasePattern-->InlineProcessor["InlineProcessor abstract def handleMatch(m, data):"]; Pattern-->SimpleTextPattern["SimpleTextPattern def handleMatch(m):"] Pattern-->SimpleTagPattern["SimpleTagPattern def handleMatch(m):"] InlineProcessor-->EscapeInlineProcessor["EscapeInlineProcessor def handleMatch(m, data):"] InlineProcessor-->AsteriskProcessor["AsteriskProcessor def handleMatch(m, data):"]

But the current hierarchy is:

graph TD; Pattern["Pattern defines helper methods abstract def handleMatch(m):"]; Pattern-->InlineProcessor["InlineProcessor abstract def handleMatch(m, data):"]; Pattern-->SimpleTextPattern["SimpleTextPattern def handleMatch(m):"] Pattern-->SimpleTagPattern["SimpleTagPattern def handleMatch(m):"] InlineProcessor-->EscapeInlineProcessor["EscapeInlineProcessor def handleMatch(m, data):"] InlineProcessor-->AsteriskProcessor["AsteriskProcessor def handleMatch(m, data):"]

As such, the entire hierarchy of InlineProcessor is said to have improper overrides.

What my change accomplishes is to construct the correct type hierarchy to mypy and to all typechecking users going forward.
However at run time I am keeping everything just like it was. LegacyPattern is just a new alias to Pattern

See a code search of occurrences in the wild: currently every subclass of InlineProcessor needs to add an ignore comment for this override.

https://github.com/search?q=%2FInlineProcessor%5C%29%2F+%2FhandleMatch.%2Bignore%2F+NOT+path%3Azerver&type=code

Of course at the moment that's not this repository's fault (as it doesn't have py.typed) but just because the type stubs are written in a way that directly mirrors it:
https://github.com/python/typeshed/blob/9ff393558a6f4fedc086bc9dc4386dbe529a3331/stubs/Markdown/markdown/inlinepatterns.pyi#L48

There's also this issue report (which I found through the above code search because it's directly inspired by this case). Which if accepted would resolve this

type: ignore for “incompatible with supertype” error in subclass should suffice for subsubclasses too python/mypy#10197

An equivalent change to the external stubs would be this:
oprypin/typeshed@43593a4
Maybe this more condensed view is clearer to understand.

waylan

I just went through and noted (nearly) every objection I have to these changes.

waylan · 2023-11-03T14:31:45Z

markdown/blockprocessors.py

@@ -175,7 +175,7 @@ def test(self, parent: etree.Element, block: str) -> bool:
        return block.startswith(' '*self.tab_length) and \
            not self.parser.state.isstate('detabbed') and \
            (parent.tag in self.ITEM_TYPES or
-                (len(parent) and parent[-1] is not None and
+                (len(parent) > 0 and parent[-1] is not None and


This change should be completely unnecessary. The code works just fine and any tool which prevents me from writing code that way is unwanted.

The code can actually sometimes return an instance of int rather than bool. If we have to, I'll follow the usual over-the-top process of proving to you that this is a bug and writing a test.

waylan · 2023-11-03T14:32:15Z

markdown/core.py

@@ -85,7 +85,7 @@ class Markdown:
    callable which accepts an [`Element`][xml.etree.ElementTree.Element] and returns a `str`.
    """

-    def __init__(self, **kwargs):
+    def __init__(self, **kwargs) -> None:


This addition adds no value and I will not tolerate being forced to add it.

Adding any annotation indicates "please type-check the body of this function". The cool thing about it is that if you add a new function and not write any type annotations on it, mypy will not complain whatsoever. It is a good thing to have this tool. So this un-nuanced take on this is unproductive.

Here I'm adding this -> None because this function contains some type annotations that are being wasted otherwise. To not have problems with this, I'll have to move the annotations to class level. Let me know if that works better for your preference.

But when does an __init__ method return anything other than None? That's my point. Adding this annotation adds no value in that it tells me nothing I didn't already know. I should not be required to add this annotation every time I define and __init__ function.

You seem to object based on the fact that eliminating the annotation causes the tool to not run other checks on the line. That is a problem with the tool. If the tool can't do that, then fix the tool or use a different tool. Don't force me to waste time and effort adding otherwise pointless annotations.

Adding -> None annotations to __init__ is useless, I totally agree on that. However you ignore the point that I am actually arguing for. You want the tool to not do anything unless you call for it. As such, not adding any annotations accomplishes that. But when you do want to go ahead and tell the tool "please type-check this function body", this is the designated way to do that. You are not forced, and I demonstrated that with the current state of this pull request.

Separately on your last point: the tool does not require this annotation, you can simply configure it to check all function bodies. If you want it to check only some function bodies, this is how to indicate it.

waylan · 2023-11-03T14:33:07Z

markdown/core.py

-            html = html.encode(encoding, "xmlcharrefreplace")
-            sys.stdout.buffer.write(html)
+            html_bytes = html.encode(encoding, "xmlcharrefreplace")
+            sys.stdout.buffer.write(html_bytes)


As I have stated repeatedly, I embrace the fact that Python is dynamically typed and allows reuse of a variable. Any tool which disallows this is unacceptable.

OK let's use an ignore comment instead

Yeah it's totally a limitation of mypy

Then lets find a tool which doesn't have this limitation. Quit wasting my time with this nonsense.

markdown/extensions/md_in_html.py

waylan · 2023-11-03T14:35:36Z

markdown/extensions/smarty.py

-            pattern += (md,)
-            pattern = SubstituteTextPattern(*pattern)
+        for ind, pattern_args in enumerate(patterns):
+            pattern = SubstituteTextPattern(*pattern_args, md)


Again the dynamic nature of Python is being taken away here.

This is indeed a limitation of mypy.
That being said, this change is just a mordernization that could've been done anyway.

To be clear. if the code had been written that way originally when it was submitted in a PR, I would have accepted It. But I would be willing to accept the existing code also (as I did). I don't have an objection to either way, but I do have an objection to a tool forcing one way on me and not allowing the other. That is not a tool I am interested in using.

waylan · 2023-11-03T14:38:14Z

markdown/postprocessors.py

-            html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[i])
+            raw: str = self.md.htmlStash.rawHtmlBlocks[i]
+            html = self.stash_to_string(raw)


Wait, so mypy doesn't allow his pattern (funct1(funct2(value)))? That's ridiculous.

It's not that it doesn't allow nested calls, it's that there is another problem:

Argument 1 to stash_to_string of RawHtmlPostprocessor has incompatible type str | Element; expected str

self.md.htmlStash.rawHtmlBlocks[i] is type-annotated as str | Element
-which is correct because md_in_html can indeed put an Element there:

markdown/markdown/extensions/md_in_html.py

Line 166 in a63e6f3

self.cleandoc.append(self.md.htmlStash.store(element))

However, stash_to_string is guarded by a str type annotation because indeed for an Element it will just produce garbage such as "<Element 'a' at 0x7f55e83a8ea0>", and mypy rightly warns about it.

So once again there is a potential bug that mypy warns us about.
Upon looking into this though, I concluded that maybe there is no actual bug because RawHtmlPostprocessor consistently only puts strings into htmlStash.

For now turned it into an ignore comment.

Upon looking into this though, I concluded that maybe there is no actual bug because RawHtmlPostprocessor consistently only puts strings into htmlStash.

Exactly. Again, this is a waste of time and not helpful at all. Unless maybe this is telling us that we need to better annotate the stash to indicate that it only accepts strings. Would doing that resolve the issue? If so, then that would be how to address this, not this.

I think you did not read my comment very attentively because I explicitly say that there is in fact code that puts Elements into the stash.

waylan · 2023-11-03T14:41:06Z

markdown/treeprocessors.py

-        new_style = isinstance(pattern, inlinepatterns.InlineProcessor)
+        if isinstance(pattern, inlinepatterns.InlineProcessor):
+            new_style = True
+            new_pattern = pattern
+        else:  # pragma: no cover
+            new_style = False
+            legacy_pattern = pattern


Again this (and all the associated changes in this file that follow) is removing the dynamic nature of python.

OK let's just ignore-comment everything instead, I don't know

A cool thing to do here would be to refactor this method into 2 methods. Despite some code duplication, it's actually less error-prone going forward, and even provides a clearer deprecation for old-style patterns. (Also only one # pragma: no cover comment now, rather than per-case ones 😊

This is demonstrated here: oprypin@1c047f0?diff=split

OK let's just ignore-comment everything instead, I don't know

Actually, as per my other comment, not don't just ignore-comment everything.

A cool thing to do here would be to refactor this method into 2 methods.

Perhaps, but that should be addresses separately.

waylan · 2023-11-03T14:53:13Z

So, it seems that if you can configure mypy to not require any of the code changes included here, then it would be something we can use.

However, I will note that simply commenting lines of code may not be the best solution in some cases. For example. there is little difference in def __init__(self) -> None: and def __init__(self): # ignore. I would expect to be able to do def __init__(self):. And the same goes for every time we use Python's dynamic nature. There shouldn't need to be a comment on every such line. A global config option would be preferable. If mypy can't do that, then its not the right tool for the job.

facelessuser · 2023-11-03T15:14:17Z

Yes, it seems that the projects goals and this PR's goals are a bit at odds. It sounds like Python Markdown only cares about checking that types are defined, but type checking itself is not desired.

I will add one point though. There seems to be some disdain for explicitly defining a function has no return, but if that is disabled, it seems you have no actual way of checking if a return is required and missed? Maybe I'm wrong about this?

waylan · 2023-11-03T17:36:03Z

I will add one point though. There seems to be some disdain for explicitly defining a function has no return, but if that is disabled, it seems you have no actual way of checking if a return is required and missed? Maybe I'm wrong about this?

I'm okay with that. The only thing that should be checked is that existing annotations are correct. Anything that is not annotated should be ignored. Seems like a simple concept. I don't understand why that is so hard... except that perhaps it is because the goal of the type checker is to enforce static typing, which is something I have repeatedly said I am not interested in.

waylan · 2023-11-03T17:41:20Z

markdown/core.py

+    tab_length: int
+    ESCAPED_CHARS: list[str]
+    block_level_elements: list[str]
+    registeredExtensions: list[Extension]
+    stripTopLevelTags: bool
+    references: dict[str, tuple[str, str]]
+    htmlStash: util.HtmlStash
+


These are instance attributes, not class attributes. Will you please stop changing the code to fit a tool which I have repeatedly stated does not fit our needs.

I am closing this issue as you are clearly either not understanding my direction or intentionally ignoring it.

This is the correct way to annotate instance attributes according to all tools and documentation

oprypin · 2023-11-03T17:41:47Z

The only thing that should be checked is that existing annotations are correct.

In which way, just purely conceptually, do you think this is supposed to be accomplished? What is "correct" in isolation?

The only known way to do this is to check code that uses annotated members. If something doesn't match up then it shows an error, and you need to check if it's the code that is wrong or the annotation that is wrong.

facelessuser · 2023-11-03T17:43:40Z

I'm okay with that. The only thing that should be checked is that existing annotations are correct. Anything that is not annotated should be ignored. Seems like a simple concept. I don't understand why that is so hard... except that perhaps it is because the goal of the type checker is to enforce static typing, which is something I have repeatedly said I am not interested in.

That's fair, and it is possible I've misinterpreted some interaction. So yeah, I'm good with that.

waylan · 2023-11-03T17:54:01Z

The only known way to do this is to check code that uses annotated members. If something doesn't match up then it shows an error, and you need to check if it's the code that is wrong or the annotation that is wrong.

Yes, when its annotated that is reasonable. And when it is not annotated, it should be ignored.

In other words, suppose we have to functions, one of which is annotated and one which is not. The tool would see a line of code which calls the annotated function, It would see the annotation and confirm that the return value matches up. However, when it sees the unannotated function is would not confirm anything about the return value as there is nothing to confirm. That is what I expect to happen.

In the first case, the assumption would then be that perhaps the annotation is wrong. However, we could check for a potential bug in the code and fix that instead if need be. In the second case, we are alerted to nothing as it has been for many years.

And no, I don't want to have to add an ignore comment to every one of those lines. The tool should just ignore them either by default or via a global setting.

oprypin · 2023-11-03T18:10:39Z

I think mypy in the current config matches what you describe - it validates only things that are annotated.

Which of the comment threads don't match this? There was an example where something clearly annotated as str | Element is being passed to something clearly annotated as str. How is that not a correct finding that either the annotation is wrong or the code can have a problem?

Although a big snag might be in the fact that the whole standard library is annotated, so usages (inside annotated functions) are checked against it as well.

And there was one consistently mentioned limitation of mypy - that it can't cope with reassigning a different-typed value to the same variable name, and it should be able to. I have replied with "limitation" to those threads. But it's a conceptually different problem. And if we do look at it separately, perhaps this pull request can be said to have only this as a blocker.

waylan · 2023-11-03T18:47:22Z

Which of the comment threads don't match this? There was an example where something clearly annotated as str | Element is being passed to something clearly annotated as str. How is that not a correct finding that either the annotation is wrong or the code can have a problem?

I've never suggested that that was the case. In fact, if the issue is that the annotation was wrong, I have made no objection. However, I have consistently stated that if the code is wrong, then each individual instance should be addressed as a separate bug, not included wholesale in this large PR. This PR (or one like it) should not be changing code. only annotations. And yet, you have repeatedly ignored that request.

And there was one consistently mentioned limitation of mypy - that it can't cope with reassigning a different-typed value to the same variable name, and it should be able to. I have replied with "limitation" to those threads. But it's a conceptually different problem. And if we do look at it separately, perhaps this pull request can be said to have only this as a blocker.

And a very big blocker is it, I have stated repeatedly that I will not change this lib to be statically typed, which mypy requires. In other words, I will not change the code to appease a tool which is designed to enforce something we are not enforcing. And it is unreasonable to expect us to ignore-comment every single dynamically typed object in our code line-by-line. If mypy had a global option to not enforce that, then it wouldn't be a tool which enforced static types. Therefore, it seems that if does not fit our needs.

This was referenced Nov 2, 2023

Language Server does not find extension attribute: Cannot access member "Meta" for type "Markdown" Member "Meta" is unknown #1383

Open

Fix all mypy typechecking errors, add a lot of type annotations #1399

Closed

Merge remote-tracking branch 'origin/master' into annot2

b71ac4d

oprypin added 4 commits November 2, 2023 18:27

This branch has no coverage

dc41e0d

Fixup

ab34074

Further reduce changes

944d89b

Merge remote-tracking branch 'origin/master' into annot2

3c6cebf

waylan reviewed Nov 2, 2023

View reviewed changes

markdown/treeprocessors.py Outdated Show resolved Hide resolved

oprypin added 3 commits November 2, 2023 19:55

Remove another check

7dc2f9d

Not adding py.typed yet

0dca660

Merge remote-tracking branch 'origin/master' into annot2

50612bd

waylan reviewed Nov 3, 2023

View reviewed changes

Remove all -> None annotations

75cb5d7

Ignore some problems away

389cceb

waylan reviewed Nov 3, 2023

View reviewed changes

waylan closed this Nov 3, 2023

waylan added the rejected The pull request is rejected for the stated reasons. label Nov 3, 2023

oprypin mentioned this pull request Nov 3, 2023

Markdown: add fake base class to Pattern to fix overrides python/typeshed#10970

Open

Python-Markdown locked as resolved and limited conversation to collaborators Nov 3, 2023

		@@ -274,6 +261,27 @@ def get_stash(m):
		return util.INLINE_PLACEHOLDER_RE.sub(get_stash, text)


		class LegacyPattern(_BasePattern):

Enable checking against mypy with minimal set of fixes #1402

Enable checking against mypy with minimal set of fixes #1402

Conversation

oprypin commented Nov 2, 2023

waylan commented Nov 2, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oprypin Nov 3, 2023 • edited

Choose a reason for hiding this comment

oprypin Nov 3, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

waylan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oprypin Nov 3, 2023 • edited

Choose a reason for hiding this comment

waylan Nov 3, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

waylan commented Nov 3, 2023

facelessuser commented Nov 3, 2023

waylan commented Nov 3, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oprypin commented Nov 3, 2023

facelessuser commented Nov 3, 2023

waylan commented Nov 3, 2023

oprypin commented Nov 3, 2023

waylan commented Nov 3, 2023

waylan commented Nov 2, 2023 •

edited

oprypin Nov 3, 2023 •

edited

oprypin Nov 3, 2023 •

edited

oprypin Nov 3, 2023 •

edited

waylan Nov 3, 2023 •

edited