Replace codecs.open with open #2378

Merged (1 commit) on Oct 29, 2022.

`codespell_lib/_codespell.py`: 15 changes (7 additions, 8 deletions)
```diff
@@ -17,7 +17,6 @@
 """
 
 import argparse
-import codecs
 import configparser
 import fnmatch
 import os
```
```diff
@@ -172,7 +171,7 @@ def open(self, filename):
 
     def open_with_chardet(self, filename):
         self.encdetector.reset()
-        with codecs.open(filename, 'rb') as f:
+        with open(filename, 'rb') as f:
```
Collaborator:

👍 This one should have been `open()` in the first place, since we do not pass the `encoding` argument.
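A quick way to check this: when `encoding` is `None`, `codecs.open()` hands the call to the built-in `open()` and returns that file object unchanged. The sketch below opens the same temporary file both ways in `'rb'` mode and compares the object type and the bytes read. The file name and contents are made up for the demo.

```python
import codecs
import io
import os
import tempfile
import warnings

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical demo file
    with open(path, "wb") as f:
        f.write(b"teh\r\nrecieve\n")

    with warnings.catch_warnings():
        # codecs.open() is deprecated in recent Python versions; silence that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        # No encoding given: codecs.open() returns the built-in file object as-is.
        with codecs.open(path, "rb") as f:
            via_codecs = (type(f), f.read())

    with open(path, "rb") as f:
        via_builtin = (type(f), f.read())

print(via_codecs == via_builtin)            # True
print(via_builtin[0] is io.BufferedReader)  # True
```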

```diff
             for line in f:
                 self.encdetector.feed(line)
                 if self.encdetector.done:
@@ -181,7 +180,7 @@ def open_with_chardet(self, filename):
         encoding = self.encdetector.result['encoding']
 
         try:
-            f = codecs.open(filename, 'r', encoding=encoding)
+            f = open(filename, encoding=encoding, newline='')
         except UnicodeDecodeError:
             print("ERROR: Could not detect encoding: %s" % filename,
                   file=sys.stderr)
```
```diff
@@ -205,7 +204,7 @@ def open_with_internal(self, filename):
             elif not self.quiet_level & QuietLevels.ENCODING:
                 print("WARNING: Trying next encoding %s"
                       % encoding, file=sys.stderr)
-            with codecs.open(filename, 'r', encoding=encoding) as f:
+            with open(filename, encoding=encoding, newline='') as f:
                 try:
                     lines = f.readlines()
                 except UnicodeDecodeError:
```
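The encoding fallback in this hunk can be sketched on its own with the new `open()` call. The helper name `read_with_fallback` and its default encoding list are hypothetical and chosen for the demo. This is not codespell's actual `FileOpener` code. In this sketch, `'iso-8859-1'` maps every byte to a character, so it serves as a last resort that always decodes.

```python
import os
import tempfile


def read_with_fallback(filename, encodings=("utf-8", "iso-8859-1")):
    """Hypothetical helper: return (lines, encoding) for the first encoding that decodes.

    newline='' keeps each line's original ending untranslated.
    """
    for encoding in encodings:
        with open(filename, encoding=encoding, newline='') as f:
            try:
                lines = f.readlines()
            except UnicodeDecodeError:
                continue  # decoding failed: try the next encoding
        return lines, encoding
    raise ValueError(f"could not decode {filename!r} with any of {encodings}")


with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "utf8.txt")
    latin1_path = os.path.join(tmp, "latin1.txt")
    with open(utf8_path, "wb") as f:
        f.write(b"ok\r\nfine\n")
    with open(latin1_path, "wb") as f:
        f.write(b"caf\xe9\r\n")  # 0xE9 followed by '\r' is not valid UTF-8

    utf8_result = read_with_fallback(utf8_path)
    latin1_result = read_with_fallback(latin1_path)

print(utf8_result)       # (['ok\r\n', 'fine\n'], 'utf-8')
print(latin1_result[1])  # iso-8859-1
```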
```diff
@@ -463,19 +462,19 @@ def parse_ignore_words_option(ignore_words_option):
 
 
 def build_exclude_hashes(filename, exclude_lines):
-    with codecs.open(filename, mode='r', encoding='utf-8') as f:
+    with open(filename, encoding='utf-8') as f:
         for line in f:
             exclude_lines.add(line)
 
 
 def build_ignore_words(filename, ignore_words):
-    with codecs.open(filename, mode='r', encoding='utf-8') as f:
+    with open(filename, encoding='utf-8') as f:
         for line in f:
             ignore_words.add(line.strip())
 
 
 def build_dict(filename, misspellings, ignore_words):
-    with codecs.open(filename, mode='r', encoding='utf-8') as f:
+    with open(filename, encoding='utf-8') as f:
         for line in f:
             [key, data] = line.split('->')
             # TODO for now, convert both to lower. Someday we can maybe add
```
```diff
@@ -767,7 +766,7 @@ def parse_file(filename, colors, summary, misspellings, exclude_lines,
                 print("%sFIXED:%s %s"
                       % (colors.FWORD, colors.DISABLE, filename),
                       file=sys.stderr)
-            with codecs.open(filename, 'w', encoding=encoding) as f:
+            with open(filename, 'w', encoding=encoding, newline='') as f:
```
Collaborator:

I had to look this up in the `codecs.open` documentation:

> Note: If encoding is not None, then the underlying encoded files are always opened in binary mode. No automatic conversion of `'\n'` is done on reading and writing. The mode argument may be any binary mode acceptable to the built-in `open()` function; the `'b'` is automatically added.
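Here is the quoted note in action. When an `encoding` is given, `codecs.open()` adds `'b'` to the mode, so the underlying file is binary and `'\r\n'` line endings reach the caller unchanged. The file name and contents are made up for the demo. Reading `f.mode` relies on the returned wrapper forwarding unknown attributes to the raw file object.

```python
import codecs
import os
import tempfile
import warnings

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf.txt")  # hypothetical demo file
    with open(path, "wb") as f:
        f.write(b"first\r\nsecond\n")

    with warnings.catch_warnings():
        # codecs.open() is deprecated in recent Python versions; silence that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        # Mode 'r' plus an encoding: the file is actually opened as 'rb'.
        with codecs.open(path, "r", encoding="utf-8") as f:
            underlying_mode = f.mode  # forwarded to the underlying binary file
            lines = f.readlines()

print(underlying_mode)  # rb
print(lines)            # ['first\r\n', 'second\n']
```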

I wonder whether strictly equivalent behavior would require changing the mode from `'w'` to `'wb'` instead of adding `newline=''`. But then, from the `open()` documentation:

> encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever `locale.getpreferredencoding()` returns), but any text encoding supported by Python can be used.

Therefore, text mode `'w'` is required.
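Python enforces this restriction when `open()` is called. The built-in `open()` rejects an `encoding` argument in binary mode and accepts it in text mode `'w'`. The check below uses a throwaway temporary path.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")  # throwaway demo path
    try:
        open(path, "wb", encoding="utf-8")
    except ValueError as exc:
        binary_error = str(exc)
    else:
        binary_error = None

    # Text mode is the only way to combine the built-in open() with an encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write("ok\n")
    text_mode_ok = os.path.exists(path)

print(binary_error)  # binary mode doesn't take an encoding argument
print(text_mode_ok)  # True
```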

But then `FileOpener` used to open files with `codecs.open(filename, 'r', encoding=encoding)`:

- `f = codecs.open(filename, 'r', encoding=encoding)`
- `with codecs.open(filename, 'r', encoding=encoding) as f:`

With `codecs`, all files were read in binary mode and written back in binary mode, which preserves newlines. Without `codecs`, how do we preserve them? Maybe I am missing something, but I believe more work is needed to preserve newlines.
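A naive replacement would have this problem. With the default `newline=None`, text-mode `open()` translates `'\r\n'` to `'\n'` when reading and writes each `'\n'` back as `os.linesep`. A file with mixed or non-native line endings therefore does not survive the round trip byte for byte. The file contents below are made up for the demo.

```python
import os
import tempfile

original = b"first\r\nsecond\n"  # mixed line endings

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")  # hypothetical demo file
    with open(path, "wb") as f:
        f.write(original)

    # Default newline=None on read: universal newlines, every ending becomes '\n'.
    with open(path, encoding="utf-8") as f:
        default_lines = f.readlines()

    # Default newline=None on write: each '\n' is written as os.linesep.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(default_lines)
    with open(path, "rb") as f:
        rewritten = f.read()

print(default_lines)          # ['first\n', 'second\n']
print(rewritten == original)  # False
```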

Collaborator:

I think I get it:

1. Files are now read with `open(filename, encoding=encoding, newline='')` instead of `codecs.open(filename, 'r', encoding=encoding)`, so newlines are preserved when reading. At the same time, the effective mode changes from `'rb'` (which `codecs.open()` forces whenever an encoding is given) to `'r'`, as `open()` requires when passing `encoding`.
2. Files are now written with `open(filename, 'w', encoding=encoding, newline='')` instead of `codecs.open(filename, 'w', encoding=encoding)`, so newlines are preserved when writing. At the same time, the effective mode changes from `'wb'` to `'w'`, as `open()` requires when passing `encoding`.
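Both points can be checked directly. In this sketch, reading with `newline=''` returns each line with its original ending, and writing with `newline=''` emits `'\n'` untranslated, so the file round-trips byte for byte. The file contents are made up. For this input the lines also match what `codecs.open()` returned. However, `codecs` splits lines with `str.splitlines()`, which recognizes a few extra separators, so the match is only verified for this sample.

```python
import codecs
import os
import tempfile
import warnings

original = b"first\r\nsecond\nthird\r\n"  # mixed line endings

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")  # hypothetical demo file
    with open(path, "wb") as f:
        f.write(original)

    # Read: newline='' still splits on '\n', '\r' and '\r\n' but keeps them untranslated.
    with open(path, encoding="utf-8", newline="") as f:
        lines = f.readlines()

    # Old behavior, for comparison (codecs.open() is deprecated in recent Pythons).
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, "r", encoding="utf-8") as f:
            codecs_lines = f.readlines()

    # Write: newline='' writes '\n' as-is instead of translating it to os.linesep.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.writelines(lines)
    with open(path, "rb") as f:
        rewritten = f.read()

print(lines)                  # ['first\r\n', 'second\n', 'third\r\n']
print(lines == codecs_lines)  # True
print(rewritten == original)  # True
```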

Perhaps this should be explained in the commit message?

```diff
                 f.writelines(lines)
     return bad_count
 
```