Major performance issue when parsing a long list of reference links #996
Comments
Now I read a bit more of the code and I'm not sure I understand the spec correctly 🤔

> markdown-it/lib/rules_block/reference.mjs, line 166 (at d07d585)
It never makes sense to put […]. In any case, if you have a solution that massively improves performance, and all tests still pass, it's worth opening a PR.
@rlidwka but is it about interrupting those tags, or being interrupted BY those tags? I'm confused by the comment in the code:

This means that […]. This also means that […].
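The spec point at issue can be illustrated with a small CommonMark sample (behavior as I read the spec; worth checking against the reference implementation):

```markdown
A paragraph of text.
[foo]: /url
```

Per the CommonMark spec, a link reference definition cannot interrupt a paragraph, so the `[foo]: /url` line above is parsed as paragraph text rather than as a definition. The converse question, which constructs may cut a definition short, is what the terminator table in `parser_block.js` encodes.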
Thanks! Created PR here: #998
This is not just a performance issue, this is an algorithmic complexity issue.

```
[ref1]: url 'title'
[ref2]: url
[ref3]: url
[ref4]: url
...
[ref10000]: url
```

Where does the first reference end? Keep in mind that it could look like this, which is one big reference:

```
[ref1]: url '
[ref2]: url
[ref3]: url
[ref4]: url
...
[ref10000]: url
'
```

Currently, reference 1 is parsed from lines 1-10000, reference 2 is parsed from lines 2-10000, etc., and this entire block of text has to be extracted beforehand (to remove indents, leading […]). Do you see […]?

The CommonMark reference implementation probably doesn't have this problem, because they can strip indents on the fly (I guess), but here it would mean a big rewrite (possibly it can't even be done without losing modularity). Would be interesting to know if you have any ideas regarding that.
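The quadratic behavior described above can be sketched with a toy model (plain JavaScript, not markdown-it's actual parser): if every candidate start line rescans toward the end of input looking for a closing title quote, N reference lines cost O(N²) line visits, while a bounded lookahead keeps the cost linear.

```javascript
// Toy model of the quadratic rescan (NOT markdown-it's real code).
// Each "reference" starts at some line; a quoted title may span many
// lines. The naive strategy rescans toward EOF for every start line.

function countNaiveScans(lines) {
  let work = 0;
  for (let start = 0; start < lines.length; start++) {
    // Scan forward looking for a line that closes a quoted title.
    for (let i = start; i < lines.length; i++) {
      work++;
      if (lines[i].endsWith("'")) break; // closing quote found
    }
  }
  return work;
}

function countBoundedScans(lines, maxLookahead) {
  let work = 0;
  for (let start = 0; start < lines.length; start++) {
    const end = Math.min(lines.length, start + maxLookahead);
    for (let i = start; i < end; i++) {
      work++;
      if (lines[i].endsWith("'")) break;
    }
  }
  return work;
}

// 10,000 reference lines, none of which closes a quote:
const lines = Array.from({ length: 10000 }, (_, i) => `[ref${i + 1}]: url`);

console.log(countNaiveScans(lines));       // N*(N+1)/2 = 50,005,000 line visits
console.log(countBoundedScans(lines, 64)); // <= N*64 = 640,000 line visits
```

The toy numbers are the point: the naive scan does tens of millions of line visits on input this size, which matches the "reference 1 is parsed from lines 1-10000, reference 2 from lines 2-10000" description above.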
Maybe we could restrict the max line count for refs to a reasonable number? As far as I remember, we have some hard limits in emphasis. All those hard limits could be exposed as options.
@rlidwka yes, I already figured that out. I started digging into the algorithm and I have some progress already, but nothing that I can share yet.

This is a great idea. If everyone is happy with that (and I can't come up with anything better), I will adjust my PR to use the limit.
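A hedged sketch of what such a cap might look like inside the reference rule. The constant name, the default value, and the `maxReferenceLines` option are all assumptions for illustration, not actual markdown-it API:

```javascript
// Hypothetical cap on how many lines a single reference definition may
// span. Neither DEFAULT_MAX_REF_LINES nor maxReferenceLines is real
// markdown-it API; this only illustrates the proposed shape of the limit.
const DEFAULT_MAX_REF_LINES = 64; // assumed default, would be tunable

function clampEndLine(startLine, endLine, options = {}) {
  const max = options.maxReferenceLines ?? DEFAULT_MAX_REF_LINES;
  // Never scan more than `max` lines past the start of the definition.
  return Math.min(endLine, startLine + max);
}

console.log(clampEndLine(0, 10000)); // 64
console.log(clampEndLine(0, 10000, { maxReferenceLines: 1000 })); // 1000
console.log(clampEndLine(5, 10));    // 10 (already within the cap)
```

Exposing the cap as an option, as suggested above, keeps pathological inputs bounded while letting users with unusually long titles raise the limit.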
We noticed a major performance problem when parsing a long list of references similar to this benchmark: https://github.com/markdown-it/markdown-it/blob/master/benchmark/samples/block-ref-list.md
In our case we have a list of 1000+ references.
The root cause seems to be this termination logic:
> markdown-it/lib/rules_block/reference.mjs, lines 29-51 (at d07d585)
Removing this logic doesn't break any tests and improves the speed of parsing our long list by 30x 🙀
I tried to find some similar problems and found this thread: #54
I believe this table is incorrect but I'm not sure:
> markdown-it/lib/parser_block.js, lines 13-25 (at d72c68b)
From the CommonMark spec I can't see that a reference can be terminated by other rules; actually it's the other way around: a reference can terminate some of the rules. Am I correct?

I tried modifying the code above to the variant below, and all the tests are passing while performance is still fast:
Could someone check if my understanding is correct? I would be happy to open a PR.
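To make the terminator-table question concrete, here is a toy model of how markdown-it's `Ruler` "alt" lists work. The rule names echo `parser_block.js`, but the functions are stubs, the alt lists here are illustrative rather than the real table, and the real `Ruler.getRules` returns rule functions rather than names:

```javascript
// Toy model of markdown-it's Ruler "alt" mechanism (stub rule entries,
// illustrative alt lists -- not the actual table from parser_block.js).
// Each entry is [name, ruleFn, altChains]; a rule listing 'reference'
// in its alt chains declares it may interrupt a reference definition.
function getRules(rules, chainName) {
  // Simplified Ruler.getRules: collect every rule whose alt list
  // names the given chain.
  return rules
    .filter(([, , alt]) => (alt || []).includes(chainName))
    .map(([name]) => name);
}

// "Before": several rules declare they may interrupt a reference.
const before = [
  ['fence',      'stubFn', ['paragraph', 'reference']],
  ['blockquote', 'stubFn', ['paragraph', 'reference']],
  ['heading',    'stubFn', ['paragraph', 'reference']],
  ['paragraph',  'stubFn'],
];

// "After" (the change discussed above): nothing may terminate a
// reference, so the per-line terminator scan in reference.mjs has
// nothing to check.
const after = before.map(([name, fn, alt]) =>
  [name, fn, (alt || []).filter((a) => a !== 'reference')]);

console.log(getRules(before, 'reference')); // ['fence', 'blockquote', 'heading']
console.log(getRules(after, 'reference'));  // []
```

If the CommonMark reading above is right, emptying the `'reference'` chain is exactly the kind of change that would make the terminator loop in `reference.mjs` a no-op, which would explain both the passing tests and the speedup.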