Don't count form feed character (\f or ^L) as newline, optionally #2609

Closed
skangas opened this issue Nov 18, 2022 · 3 comments

Comments

@skangas
Collaborator

skangas commented Nov 18, 2022

In Emacs, the form feed \f or ^L (Ctrl+L) character, 0xC in ASCII, is not considered to start a new line. Wikipedia explains:

The form feed character is sometimes used in plain text files of source code as a delimiter for a page break, or as marker for sections of code. Some editors, in particular emacs and vi, have built-in commands to page up/down on the form feed character. This convention is predominantly used in Lisp code, and is also seen in C and Python source code. GNU Coding Standards require such form feeds in C.[2] Editors like Vim and Emacs understand such sections and have shortcuts for moving among them.

In Emacs, a file containing just the three characters \n\f\n is considered to have two (2) lines: the first is \n and the second is \f\n. But codespell counts this as a file with three (3) lines.

Recipe to reproduce:

$ echo -ne "foo\n\f\nte\n" > /tmp/foo.txt
$ codespell /tmp/foo.txt
/tmp/foo.txt:4: te ==> the, be, we, to

If I open /tmp/foo.txt in Emacs, and try to jump to line 4, I end up on the empty line at the end of the file. I do not go to the line containing the typo, which in Emacs is line 3. This obviously gets worse the more \n\f\n there are in a file: every one means we land further and further from the actual typo.
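
For illustration, the same off-by-one can be reproduced in plain Python. This is only a sketch of one way such a discrepancy can arise, not a claim about how codespell reads files internally. Python's str.splitlines() treats \f as a line boundary, while splitting on \n alone matches what Emacs shows. The helper names below are hypothetical.

```python
# Same content as the recipe above: "foo", a form feed on its own line, then "te".
SAMPLE = "foo\n\f\nte\n"


def line_number_splitlines(text, word):
    """Line number of `word` when \\f (and other separators) count as line breaks."""
    # str.splitlines() splits on \n, \r, \v, \f and several Unicode separators.
    for number, line in enumerate(text.splitlines(), start=1):
        if word in line:
            return number
    return None


def line_number_lf_only(text, word):
    """Line number of `word` when only \\n ends a line (the Emacs view)."""
    for number, line in enumerate(text.split("\n"), start=1):
        if word in line:
            return number
    return None


if __name__ == "__main__":
    print(line_number_splitlines(SAMPLE, "te"))  # 4
    print(line_number_lf_only(SAMPLE, "te"))     # 3
```

Every additional \n\f\n in the input widens the gap between the two numbers by one, which matches the drift described above.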

Would it be possible to add an option to treat \f in the way that is expected by Emacs? I'm not sure what it should be called, but something like --form-feed-no-newline or --emacs-form-feed perhaps.

This would help tremendously when using codespell on Emacs Lisp source code. There are many, many Lisp files which contain the character sequence \n\f\n.

Thanks.

@DimitriPapadopoulos
Collaborator

DimitriPapadopoulos commented Nov 19, 2022

It looks like more understands this too, by default:

$ cat foo.txt 
foo


te
$ 
$ more foo.txt 
foo
^L
te
$ 

GNOME editors do as well, by default:
Geany

Perhaps it should be the default in codespell too.

@skangas
Collaborator Author

skangas commented Nov 19, 2022

Yes, indeed it could/should be the default.

However, when I re-run the above using codespell from the master branch I do get the expected result:

/tmp/foo.txt:3: te ==> the, be, we, to

This is surprising to me, as the version I tried with previously was the latest released 2.2.2 on Debian. I can't see any relevant changes in codespell itself in the intervening period. Perhaps some bug was fixed in a dependency that was not yet updated in Debian?

Are you seeing the same as me when running current master?

@DimitriPapadopoulos
Collaborator

Fixed by #2378.
