Any text with \r (including with an escaped \) is removed by parser #1634

Closed
QuinnStraus opened this issue Dec 10, 2024 · 5 comments · Fixed by #1642 or #1644
Labels
bug Something isn't working

Comments

@QuinnStraus
QuinnStraus commented Dec 10, 2024

Expected Behavior

In a text document, an escaped backslash before an r character should not cause the r character to be removed. This is necessary for, e.g., LaTeX, where commands like \right) are common.

Actual Behavior

Every \r is removed from the HTML document, even if the raw string is written as "\\r".
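To make the distinction concrete, here is a minimal sketch in plain JavaScript (no parser involved; the variable names are only for illustration) showing that "\r" and "\\right" are different strings at the language level:

```javascript
// Plain JavaScript string literals; no HTML parser is involved here.
const carriageReturn = "\r";    // one character: U+000D (carriage return)
const latexCommand = "\\right"; // six characters: a backslash followed by "right"

console.log(carriageReturn.length);        // 1
console.log(carriageReturn.charCodeAt(0)); // 13
console.log(latexCommand.length);          // 6
console.log(latexCommand.includes("\r"));  // false: no carriage return present
```

So a string written as "\\right" never contains a carriage return, and the expectation is that it passes through the parser unchanged.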

This seems to have been caused by the fix to #864

Steps to Reproduce

Create an HTML document with \\r inside the HTML string and run it through the HTML parser.

Reproducible Demo

https://stackblitz.com/edit/html-react-parser-typescript-w9j4u9vu?file=src%2Findex.tsx

Environment

  • Version: 5.2.0
  • Browser: Chrome
  • OS: macOS

Keywords

@QuinnStraus QuinnStraus added the bug Something isn't working label Dec 10, 2024
@deadlyhifi
deadlyhifi commented Dec 12, 2024

I have just encountered the same issue, which has broken equations containing \right. The workaround at the moment is to roll back to version 5.1.18.

In LaTeX, \left and \right are used for delimiters that have to change size dynamically depending on the content.

👍 Nicely explained and reproduced @QuinnStraus.

@remarkablemark
Owner

Thanks for opening this issue! It's related to remarkablemark/html-dom-parser#902.

I wonder what's the best way to fix this without adding more complexity. Do you think introducing another option makes sense? E.g.:

parse(html, {
  escapeCarriageReturn: true, // defaults to false
});

@QuinnStraus
Author

QuinnStraus commented Dec 16, 2024

I don't think there is necessarily a conflict here. In the raw string, the LaTeX \right has an escaped backslash (\\right), whereas the original issue was about the carriage return character \r, so the two cases can be told apart.

After looking at the pull request, it seems like it:

  1. Replaces all \r with \\r.
  2. Runs the DOM parsing (which presumably removes all \r).
  3. Replaces all \\r with \r again.

However, the final step also replaces every \\r that was already in the original document with the carriage return \r, which leads to the issue.
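The three steps can be modeled in a few lines. This is only a sketch of my reading of the pull request, not the library's actual code: the function names are made up, and the DOM parse is modeled as a no-op so the effect of the two replacements is visible in isolation.

```javascript
// Hypothetical model of the round trip described above; not the library's actual code.
function escapeCarriageReturns(html) {
  // Step 1: turn each real carriage return into the two characters "\" and "r".
  return html.replace(/\r/g, "\\r");
}

function unescapeCarriageReturns(html) {
  // Step 3: turn every backslash + "r" pair back into a real carriage return.
  return html.replace(/\\r/g, "\r");
}

function roundTrip(html) {
  const escaped = escapeCarriageReturns(html);
  const parsed = escaped; // Step 2 stand-in: the DOM parse is modeled as a no-op.
  return unescapeCarriageReturns(parsed);
}

const input = "\\left( x \\right)"; // LaTeX source with literal backslashes
const output = roundTrip(input);

console.log(JSON.stringify(input));  // "\\left( x \\right)"
console.log(JSON.stringify(output)); // "\\left( x \right)"
```

The input contains no carriage return at all, yet the output does: the backslash and r of \right were a pre-existing match for the step 3 pattern, so they were collapsed into a single \r.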

I think a fix would be to replace \r with a string that is unlikely to appear in the document, such as __CAR_RETURN_(random symbols)__, so that the reverse replacement does not touch anything else.
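A rough sketch of what I mean, with the same caveats: the function names and the exact token format are assumptions for illustration, the DOM parse is again a no-op stand-in, and this is not necessarily how the fix should be implemented in the library.

```javascript
// Hypothetical sketch of the placeholder idea; names and token format are assumptions.
function makePlaceholder(html) {
  let token;
  do {
    token = `__CAR_RETURN_${Math.random().toString(36).slice(2)}__`;
  } while (html.includes(token)); // regenerate if the token already occurs in the input
  return token;
}

function roundTripWithPlaceholder(html) {
  const token = makePlaceholder(html);
  const escaped = html.split("\r").join(token); // protect real carriage returns
  const parsed = escaped; // stand-in for the DOM parse
  return parsed.split(token).join("\r"); // restore only what was protected
}

const input = "line one\r\n\\left( x \\right)";
const output = roundTripWithPlaceholder(input);

console.log(output === input); // true
```

Because the reverse step only looks for the generated token, a literal \\right in the document is left alone while real carriage returns are still restored.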

@remarkablemark
Owner

Thanks for your help on this issue! Can you verify that the bug has been fixed in:

@QuinnStraus
Author

Does seem to be working now! Thanks for your quick fix!
