Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow attr_list quoted values to contain curly braces #1414

Merged
merged 9 commits into from Mar 12, 2024

Conversation

oprypin
Copy link
Contributor

@oprypin oprypin commented Nov 11, 2023

How it worked before:

  • Extract the content without allowing any } in it, and require that it ends with } - for block elements anchored to the end of the line, otherwise not.
  • Parse the content in more detail. No edge cases with } can arise. If parsing is interrupted by some unrecognized token, discard the rest of the string.

How it works now:

  • Extract the content and allow } in it, and require that it ends with } - for block elements it's anchored to the end of the line, otherwise not.
  • Parse the content in more detail. Allow } only within the quoted parts, otherwise interrupt parsing like for any other unrecognized token.
    If parsing is interrupted, there is remaining unrecognized text. Ideally perhaps we would bail out at this point entirely (and not recognize it as an attr_list), but to preserve historic behavior, any extra text before } is just discarded.
    If there is an extra } in the remaining text:
    • For block elements: that must mean that the attr_list syntax did not in fact terminate at the end of the line but earlier. So, bail out and do not register any attributes and do not change the original text.
    • For inline elements: that must mean that we just overmatched a bit, but that's OK, we just assign attrs as normal and put the extra text back into the string. As mentioned, any extra text before } is just discarded.

How it worked before:

  * Extract the content without allowing any `}` in it, and require that it ends with `}` - for block elements anchored to the end of the line, otherwise not.
  * Parse the content in more detail. No edge cases with `}` can arise. If parsing is interrupted by some unrecognized token, discard the rest of the string.

How it works now:

  * Extract the content *and allow* `}` in it, and require that it ends with `}` - for block elements it's anchored to the end of the line, otherwise not.
  * Parse the content in more detail. Allow `}` only within the quoted parts, otherwise interrupt parsing like for any other unrecognized token.
    If parsing is interrupted, there is remaining unrecognized text. Ideally perhaps we would bail out at this point entirely (and not recognize it as an attr_list), but to preserve historic behavior, any extra text before `}` is just discarded.
    If there is an extra `}` in the remaining text:
      * For block elements: that must mean that the attr_list syntax did not in fact terminate at the end of the line but earlier. So, bail out and do not register any attributes and do not change the original text.
      * For inline elements: that must mean that we just overmatched a bit, but that's OK, we just assign attrs as normal and put the extra text back into the string. As mentioned, any extra text *before* `}` is just discarded.
@oprypin
Copy link
Contributor Author

oprypin commented Nov 11, 2023

I'm quite happy and confident with this change, and actually it preserves prior behavior very well in cases that were already recognized as an attr_list. (As evidenced by not needing to change anything in existing test cases).
But maybe we should give it time in case someone finds more edge cases.

Copy link
Member

@waylan waylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we added this to the Contributing Guide, but I can't find it now. In any event, it has been our policy for some time that whenever we need to add/edit tests, that they be moved to the new testing framework rather then be added to or edited in the legacy framework. In other words, in this case, the new test should be in test_syntax/extensions/test_attr_list.py

Otherwise, generally speaking I think this is a reasonable approach. However, I have not considered the specifics closely yet. I will come back to this this later (probably in a week or more as I am unavailable the rest of this week).

@oprypin
Copy link
Contributor Author

oprypin commented Nov 13, 2023

I think the "legacy" approach is way more developer-friendly. I don't want to develop a new test file. If that's a blocker, I hope someone can help out with a commit based on this one.

@waylan
Copy link
Member

waylan commented Nov 13, 2023

The test file already exists. I even pointed to it, but here it is again test_syntax/extensions/test_attr_list.py. You just need to add a test case. And yes, this would be a blocker.

@waylan waylan added the requires-changes Awaiting updates after a review. label Jan 12, 2024
@waylan
Copy link
Member

waylan commented Mar 8, 2024

In this comment I stated:

if someone were to submit a PR which consistently altered the attr_list behavior across all elements to allow curly braces, I would be wiling to give it consideration.

"All elements" includes fenced code blocks (which is clear if you read the proceeding paragraph). However, fenced code blocks have not been addressed here at all. Note that I have added a relevant test, which is currently failing. That test, and potentially some other similar (yet to be created) tests need to be passing before the stated requirements are met.

@oprypin
Copy link
Contributor Author

oprypin commented Mar 8, 2024

Thanks for the test. OK let me quickly check that out.

@oprypin
Copy link
Contributor Author

oprypin commented Mar 8, 2024

Alright, done!

Note:
Out of all the newly added tests, only these 3 tests don't pass with preexisting code:
testFencedCodeCurlyInAttrs
test_curly_in_double_quote
test_curly_in_single_quote

For all others, preexisting behavior is preserved.

@waylan waylan merged commit 3d8afc6 into Python-Markdown:master Mar 12, 2024
17 checks passed
@waylan waylan added approved The pull request is ready to be merged. and removed requires-changes Awaiting updates after a review. labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved The pull request is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

attr_list breaks if the value contains braces, requires HTML-encoding them
2 participants