Add lexer for MediaWiki Wikitext #2373
Conversation
I've only reviewed this up to "wikilinks" for now.
In general it looks good, congrats on the extensive tests. Please fix the cases of catastrophic backtracking, though.
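To illustrate the backtracking problem being discussed, here is a minimal sketch (the patterns are illustrative, not taken from the PR itself):

```python
import re

# A classic catastrophically backtracking shape: nested quantifiers
# over the same characters. On a non-matching input, `re` tries
# exponentially many ways to split the run of 'a's between the two
# quantifiers before giving up.
bad = re.compile(r'(a+)+b')

# Kept deliberately short: each extra 'a' roughly doubles the work.
assert bad.match('a' * 15 + 'c') is None

# A linear-time pattern that accepts the same strings:
safe = re.compile(r'a+b')
assert safe.match('a' * 15 + 'c') is None
assert bad.match('aaab') and safe.match('aaab')
```

With a few dozen `a`s instead of 15, the first failing match would take minutes; in a lexer, that means a hang on malformed input.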
It is actually a "transclusion modifier", but it should be OK to treat it as a parser function.
@jeanas Could you please give this another review? I have updated it according to the review and left my remaining concerns above.
Generally LGTM, but I really suggest making separate states (see my comment about redirects) for very long regexes such as the ones for pre/nowiki/math/chem/etc. It's pretty exhausting to review long regexes for catastrophic backtracking (we don't have tooling for that yet), and using an extra state makes it immediately clear that the backtracking doesn't occur (plus it's generally more readable, IMHO).
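A hypothetical miniature of this suggestion (not the PR's actual lexer): instead of matching a whole `<nowiki>...</nowiki>` construct with one long regex, push a dedicated state on the opening tag. Every rule is then short and obviously linear, so catastrophic backtracking cannot occur, and the state's purpose is clear at a glance.

```python
from pygments.lexer import RegexLexer
from pygments.token import Comment, Name, Text

class MiniWikitextLexer(RegexLexer):
    """Sketch of the separate-state approach for <nowiki> blocks."""
    name = 'MiniWikitext (sketch)'
    tokens = {
        'root': [
            (r'<nowiki>', Name.Tag, 'nowiki'),   # enter the nowiki state
            (r'[^<]+', Text),
            (r'<', Text),                        # a lone '<' is plain text
        ],
        'nowiki': [
            (r'</nowiki>', Name.Tag, '#pop'),    # leave the state
            (r'[^<]+', Comment.Preproc),         # verbatim content
            (r'<', Comment.Preproc),
        ],
    }
```

Tokenizing `a<nowiki>x</nowiki>` with this yields the tag tokens around verbatim content, and no single regex ever spans the whole construct.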
This workaround produces slightly different tokens without any visible difference. I have manually reviewed them to ensure I didn't introduce any bugs.
@jeanas Done. Could you please check whether the catastrophic backtracking issue is solved?
Thanks, this looks much better. I agree it's a bit sad to make the lexer a little less correct, but it's quite annoying if invalid constructs (while editing, for example) cause the engine / editor / whatever is using Pygments to hang. Hopefully we'll switch from `re` to `regex`, which has `++` and the like to avoid this.
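For context, possessive quantifiers (`a++` in the `regex` module, and available in the standard `re` since Python 3.11) never give back characters once matched, which rules out this kind of backtracking by construction. A portable sketch emulating the hypothetical pattern `a++ab`:

```python
import re

run = re.compile(r'a+')

def possessive_a_then(s, rest):
    """Emulate 'a++' followed by `rest`: match the run of 'a's
    greedily, then require `rest` with no backtracking into the run."""
    m = run.match(s)
    end = m.end() if m else 0
    return s[end:].startswith(rest)

# Backtracking 'a+ab' matches 'aaab' by giving one 'a' back...
assert re.match(r'a+ab', 'aaab') is not None
# ...but the possessive version cannot: 'a++' keeps all three 'a's.
assert possessive_a_then('aaab', 'ab') is False
```

Because a possessive run either matches or fails once, the exponential retry space of nested quantifiers simply never arises.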
Uh-oh, we have a problem in the CI. I have to see if it's possible to make regexlint ignore these regexes.
I have no idea why it is complaining about `(?s)` usage; perhaps it's bugged. I fixed it anyway by using `[\s\S]` instead.
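The two spellings are interchangeable: `(?s)` turns on DOTALL so that `.` matches newlines, while `[\s\S]` matches any character unconditionally, with no flag involved. A quick check (the `<pre>` tags and sample text here are illustrative):

```python
import re

dotall = re.compile(r'(?s)<pre>.*?</pre>')       # '.' matches '\n' too
flagless = re.compile(r'<pre>[\s\S]*?</pre>')    # same spans, no flag

text = 'x<pre>line one\nline two</pre>y'
assert dotall.search(text).group() == flagless.search(text).group()

# Without DOTALL, '.' stops at the newline and the match fails:
assert re.search(r'<pre>.*?</pre>', text) is None
```

Swapping in `[\s\S]` is therefore purely cosmetic, which is why it pacifies the linter without changing the lexer's behavior.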
I forgot about this; it still had some usages of `(?s)`.
@jeanas The linter problem should be addressed now.
Ah, thanks, I feared it would be too verbose, but looks fine.
Thank you for your work!
Hello! Can you tell me approximately when the next release (with this lexer) is planned? :)
As soon as I get to it. With a bit of luck, this weekend or next weekend. We're in reasonably good shape.
Super! I'm very glad to hear it :) Thank you!
This PR implements #827.
Because Wikitext is a highly extensible and inconsistent language, it is practically impossible to lex without knowing the configuration of a specific MediaWiki installation. This lexer only catches common syntax and the tags provided by common extensions.