Add lexer for MediaWiki Wikitext #2373

Merged
merged 13 commits into from Apr 5, 2023

diskdance
Contributor

This PR implements #827.

Because Wikitext is a very extensible and inconsistent language, it is practically impossible to lex without knowing the configuration of the specific MediaWiki installation. This lexer only catches common syntax and the tags provided by common extensions.

Contributor

@jeanas jeanas left a comment

I've reviewed this until "wikilinks" only for now.

In general it looks good, congrats on the extensive tests. Please fix the cases of catastrophic backtracking, though.
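
For readers unfamiliar with the term, here is a minimal sketch of what catastrophic backtracking looks like with Python's `re` module. Both patterns are hypothetical and chosen only for illustration; neither is taken from the lexer in this PR. They accept the same strings, but the first one can split a run of `a` characters in many different ways, so a failing match forces the engine to try all of them.

```python
import re
import time

# Hypothetical patterns for illustration only; neither comes from this PR.
AMBIGUOUS = re.compile(r'(a+)+b')  # nested quantifiers: many ways to split a run of 'a'
LINEAR = re.compile(r'a+b')        # accepts the same strings, but only one way to split


def seconds_to_fail(pattern, n):
    """Time how long `pattern` takes to reject a string of n 'a' characters."""
    text = 'a' * n  # no trailing 'b', so the match has to fail
    start = time.perf_counter()
    assert pattern.match(text) is None
    return time.perf_counter() - start


if __name__ == '__main__':
    # Keep n small for the ambiguous pattern: the cost grows very quickly.
    for n in (10, 14, 18):
        print(n, seconds_to_fail(AMBIGUOUS, n), seconds_to_fail(LINEAR, n))
```

The interesting input is the one that does not match, which is the situation a highlighter runs into on half-typed or invalid markup.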

It is actually a "transclusion modifier", but it should be OK to treat
it as a parser function.
@diskdance diskdance requested a review from jeanas March 15, 2023 15:02
@diskdance
Contributor Author

diskdance commented Mar 28, 2023

@jeanas Could you please give this another review? I have updated it according to the review, and my concerns were left above.

Contributor

@jeanas jeanas left a comment

Generally LGTM, but I really suggest making separate states (see my comment about redirects) for very long regexes such as the ones for pre/nowiki/math/chem/etc. It's pretty exhausting to review long regexes for catastrophic backtracking (we don't have tooling for it yet), and using an extra state makes it immediately clear that backtracking doesn't occur (plus it's generally more readable IMHO).
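
A rough sketch of the "extra state" approach, assuming Pygments' `RegexLexer`. The class `NowikiSketchLexer`, its state name, and its token choices are hypothetical and are not the code merged in this PR. Instead of one long regex that matches the whole `<nowiki>...</nowiki>` element, the opening tag pushes a dedicated state, and every rule inside that state is short enough to check by eye.

```python
from pygments.lexer import RegexLexer, bygroups
from pygments.token import Name, Punctuation, Text


class NowikiSketchLexer(RegexLexer):
    """Hypothetical toy lexer for illustration; not the lexer from this PR."""
    name = 'NowikiSketch'

    tokens = {
        'root': [
            # Opening tag: enter a dedicated state rather than matching the
            # whole element with a single long regex.
            (r'(<)(nowiki)(>)',
             bygroups(Punctuation, Name.Tag, Punctuation), 'nowiki'),
            (r'[^<]+', Text),
            (r'<', Text),
        ],
        'nowiki': [
            # Closing tag: return to the previous state.
            (r'(</)(nowiki)(>)',
             bygroups(Punctuation, Name.Tag, Punctuation), '#pop'),
            # Each rule consumes a simple run of characters, so there is
            # only one way for it to match and nothing to backtrack over.
            (r'[^<]+', Text),
            (r'<', Text),
        ],
    }


if __name__ == '__main__':
    for token_type, value in NowikiSketchLexer().get_tokens(
            "a<nowiki>'''b'''</nowiki>c"):
        print(token_type, repr(value))
```

With this layout an unterminated `<nowiki>` simply leaves the lexer in the `nowiki` state until the end of the input, which is the kind of "slightly less correct but never hangs" trade-off discussed in this thread.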

@diskdance diskdance requested a review from jeanas April 1, 2023 09:24
This workaround produces slightly different tokens without any visible
difference. I have manually reviewed them to ensure I didn't introduce any bugs.
@diskdance
Contributor Author

@jeanas Done. Could you please check whether the catastrophic backtracking issue is solved?

Contributor

@jeanas jeanas left a comment

Thanks, this looks much better. I agree it's a bit sad to make the lexer a little less correct, but it's quite annoying if invalid constructs (while editing, for example) cause the engine / editor / whatever is using Pygments to hang. Hopefully we'll switch from re to regex, which has ++ and the like to avoid this.

@jeanas
Contributor

jeanas commented Apr 2, 2023

Uh-oh, we have a problem in the CI.

thatch/regexlint#50

I have to see if it's possible to make regexlint ignore these regexes.

I have no idea why it is complaining about (?s) usage, perhaps it's
bugged, but I still fixed it by using [\s\S].
I forgot about this. It still has some usage.
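
As a small illustration of the substitution described in the commit message above: in Python's `re`, the class `[\s\S]` matches any character including newlines, so it can stand in for `.` under the inline `(?s)` (DOTALL) flag. The `<pre>` pattern below is a hypothetical example, not one of the regexes from the lexer.

```python
import re

# Hypothetical sample input; the patterns are for illustration only.
text = "<pre>line 1\nline 2</pre>"

# '.' with the inline DOTALL flag matches newlines too.
with_flag = re.search(r'(?s)<pre>(.*?)</pre>', text)

# '[\s\S]' (whitespace or non-whitespace) matches any character without a flag.
with_class = re.search(r'<pre>([\s\S]*?)</pre>', text)

# Without either, '.' stops at the newline and the search finds nothing.
without_flag = re.search(r'<pre>(.*?)</pre>', text)

if __name__ == '__main__':
    print(with_flag.group(1) == with_class.group(1))
    print(without_flag)
```
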
@diskdance
Contributor Author

@jeanas The linter problem should be addressed now.

@jeanas
Contributor

jeanas commented Apr 5, 2023

Ah, thanks, I feared it would be too verbose, but looks fine.

@jeanas jeanas merged commit eaca690 into pygments:master Apr 5, 2023
15 checks passed
@jeanas
Contributor

jeanas commented Apr 5, 2023

Thank you for your work!

@In1quity

In1quity commented Apr 6, 2023

Thank you for your work!

Hello! Can you tell me approximately when the next release (with this lexer) is planned? :)

@Anteru
Collaborator

Anteru commented Apr 6, 2023

As soon as I get to it. With a bit of luck, this weekend or next weekend. We're in reasonably good shape.

@In1quity

In1quity commented Apr 6, 2023

Super! I'm very glad to hear it :) Thank you!

@Anteru Anteru added this to the 2.15.0 milestone Apr 10, 2023
@Anteru Anteru added the A-lexing area: changes to individual lexers label Apr 10, 2023
@diskdance diskdance deleted the wikitext branch April 30, 2023 10:44