Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$ is not line-ending aware #3943

Closed
sbromberger opened this issue Sep 22, 2022 · 7 comments
Closed

$ is not line-ending aware #3943

sbromberger opened this issue Sep 22, 2022 · 7 comments
Labels
A-helix-term Area: Helix term improvements C-bug Category: This is a bug upstream

Comments

@sbromberger
Copy link
Contributor

sbromberger commented Sep 22, 2022

Summary

In vim, $ represents the end of the line regardless of how the line ending is encoded (specifically for this issue, CRLF vs LF). In helix, $ appears to map only to \n.

Having $ map to the end-of-line sequence (\r\n for CRLF encodings) would obviate the need to understand the line encoding of the file being edited, so that searches for things at the end of the line can be consistent.

Reproduction Steps

  1. Create a file with CRLF line endings and a comma as the last character of a line.
  2. Search for "comma followed by line ending" which is typically /,$ - it will not be found.
  3. Search for /,\r$ - it will be found.

Repeating for a file with LF line endings, step 2 will succeed, and step 3 will fail.

Platform

Linux

Terminal Emulator

blink.sh / mosh

Helix Version

22.08.1-121-g8e363178

@sbromberger sbromberger added the C-bug Category: This is a bug label Sep 22, 2022
@kirawi
Copy link
Member

kirawi commented Sep 22, 2022

rust-lang/regex#244

According to the upstream issue, it looks like using \r?$ is the best workaround.

@kirawi kirawi added A-helix-term Area: Helix term improvements upstream labels Sep 22, 2022
@sbromberger
Copy link
Contributor Author

sbromberger commented Sep 22, 2022

Related question: is helix beholden to whatever the rust regex package decides? That is, helix could interpret $ as \r?$ internally, and that would "fix" helix.

@dead10ck
Copy link
Member

I totally empathize with the issue here, and I absolutely agree \r\n should be seen as an eol. However, I shudder to start implementing subtle changes to the regex language in use, which amounts to starting a new dialect.

But also, that issue has been open since 2016, so who knows how long it will take to get fixed?

I'm really torn on this one.

@sbromberger
Copy link
Contributor Author

sbromberger commented Sep 22, 2022

I get it - but consider the argument that helix is using the regex library a bit differently than most developers use a regex library. That is, file-based regex parsing is different from stream- or data-based parsing in a few ways, line termination among them*.

Fundamentally they're the same thing, but an argument could be made that the implementations/usages diverge enough that it might not be completely out of line to make some changes that benefit Helix's file-based use of regex more than other uses.

Just food for thought. I'm torn too but this particular issue caused me a couple hours of grief. (It was only because I saw the CRLF indicator in the statusline that caused me to search for a solution. That, and a question posed on Matrix.)

*Edited to clarify: the main difference is that line endings are almost always significant in file-based regex parsing, since eol is typically the "record terminator", but are only sometimes significant in stream/data-based parsing.

@BurntSushi
Copy link

I'm expecting $ (and ^) to be capable of being CRLF aware once rust-lang/regex#656 lands. I don't have a specific timeline, but I'm hoping for the next few months.

If y'all wouldn't mind reviewing my summary here to make sure it looks right to you, that'd be great: rust-lang/regex#244 (comment)

Also note that ripgrep has its own work-around for this issue (which I hope to remove once this lands in the regex engine), and it essentially works by rewriting $ as (?:\r??$). See: https://docs.rs/grep-regex/latest/grep_regex/struct.RegexMatcherBuilder.html#method.crlf

@pascalkuthe
Copy link
Member

This has been fixed upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-helix-term Area: Helix term improvements C-bug Category: This is a bug upstream
Projects
None yet
Development

No branches or pull requests

5 participants