[C API] Add an efficient public PyUnicodeWriter API #119182

Open
vstinner opened this issue May 19, 2024 · 28 comments
Labels
topic-C-API type-feature A feature request or enhancement

Comments

@vstinner
Member

vstinner commented May 19, 2024

Feature or enhancement

Creating a Python string object in an efficient way is complicated. Python has a private _PyUnicodeWriter API, which is already used by these projects:

Affected projects (5):

  • Cython (3.0.9)
  • asyncpg (0.29.0)
  • catboost (1.2.3)
  • frozendict (2.4.0)
  • immutables (0.20)

I propose making the API public to promote it and to help C extension maintainers write more efficient code to create Python string objects.

API:

typedef struct PyUnicodeWriter PyUnicodeWriter;

PyAPI_FUNC(PyUnicodeWriter*) PyUnicodeWriter_Create(void);
PyAPI_FUNC(void) PyUnicodeWriter_Discard(PyUnicodeWriter *writer);
PyAPI_FUNC(PyObject*) PyUnicodeWriter_Finish(PyUnicodeWriter *writer);

PyAPI_FUNC(void) PyUnicodeWriter_SetOverallocate(
    PyUnicodeWriter *writer,
    int overallocate);

PyAPI_FUNC(int) PyUnicodeWriter_WriteChar(
    PyUnicodeWriter *writer,
    Py_UCS4 ch);
PyAPI_FUNC(int) PyUnicodeWriter_WriteUTF8(
    PyUnicodeWriter *writer,
    const char *str,  // decoded from UTF-8
    Py_ssize_t len);  // use strlen() if len < 0
PyAPI_FUNC(int) PyUnicodeWriter_Format(
    PyUnicodeWriter *writer,
    const char *format,
    ...);

// Write str(obj)
PyAPI_FUNC(int) PyUnicodeWriter_WriteStr(
    PyUnicodeWriter *writer,
    PyObject *obj);

// Write repr(obj)
PyAPI_FUNC(int) PyUnicodeWriter_WriteRepr(
    PyUnicodeWriter *writer,
    PyObject *obj);

// Write str[start:end]
PyAPI_FUNC(int) PyUnicodeWriter_WriteSubstring(
    PyUnicodeWriter *writer,
    PyObject *str,
    Py_ssize_t start,
    Py_ssize_t end);

The internal writer buffer is overallocated by default. PyUnicodeWriter_Finish() truncates the buffer to the exact size if the buffer was overallocated.

Overallocation reduces the quadratic cost of repeatedly resizing the buffer when adding short strings in a loop. Use PyUnicodeWriter_SetOverallocate(writer, 0) to disable overallocation just before the last write.
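To make the overallocation idea concrete, here is a minimal sketch in plain C. It is not the CPython implementation: `buf_t`, `buf_write()` and `buf_finish()` are hypothetical names, and the +25% growth factor is an assumption chosen for illustration. It only shows the general technique of reserving extra room on growth and trimming to the exact size at the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; NOT the CPython writer. */
typedef struct {
    char *data;
    size_t len;        /* bytes in use */
    size_t cap;        /* bytes allocated */
    int overallocate;  /* non-zero: reserve extra room when growing */
    int reallocs;      /* how many times the buffer had to grow */
} buf_t;

static void buf_init(buf_t *b, int overallocate)
{
    assert(b != NULL);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
    b->overallocate = overallocate;
    b->reallocs = 0;
}

/* Append n bytes. Returns 0 on success, -1 on allocation failure. */
static int buf_write(buf_t *b, const char *s, size_t n)
{
    if (n == 0) {
        return 0;
    }
    size_t need = b->len + n;
    if (need > b->cap) {
        size_t newcap = need;
        if (b->overallocate) {
            newcap += newcap / 4;   /* assumed growth factor: +25% */
        }
        char *p = realloc(b->data, newcap);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->cap = newcap;
        b->reallocs++;
    }
    memcpy(b->data + b->len, s, n);
    b->len = need;
    return 0;
}

/* Trim the allocation to the exact size, as a "finish" step would. */
static int buf_finish(buf_t *b)
{
    if (b->len > 0 && b->cap > b->len) {
        char *p = realloc(b->data, b->len);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->cap = b->len;
    }
    return 0;
}

static void buf_free(buf_t *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

With `overallocate` set to 0, every write that extends the buffer triggers a reallocation. With it set to 1, the number of reallocations grows roughly with the logarithm of the final size.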

The writer takes care of the internal buffer kind: Py_UCS1 (latin1), Py_UCS2 (BMP) or Py_UCS4 (full Unicode Character Set). It also implements an optimization if a single write is made using PyUnicodeWriter_WriteStr(): it returns the string unchanged without any copy.
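As a rough illustration of what "buffer kind" means, the sketch below picks the narrowest storage width for a given maximum code point. The function names are hypothetical and the code does not come from CPython. It only mirrors the three ranges named above (latin1, BMP, full Unicode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per character needed to store code points up to maxchar. */
static int kind_for_maxchar(uint32_t maxchar)
{
    if (maxchar <= 0xFF) {
        return 1;   /* UCS1 (latin1) */
    }
    if (maxchar <= 0xFFFF) {
        return 2;   /* UCS2 (BMP) */
    }
    return 4;       /* UCS4 (full Unicode range) */
}

/* Kind needed to store a whole sequence of code points. */
static int kind_for_text(const uint32_t *chars, size_t n)
{
    assert(chars != NULL || n == 0);
    uint32_t maxchar = 0;
    for (size_t i = 0; i < n; i++) {
        if (chars[i] > maxchar) {
            maxchar = chars[i];
        }
    }
    return kind_for_maxchar(maxchar);
}
```

A writer that starts with a 1-byte buffer and later receives a wider character has to convert what it already holds to the wider kind. Hiding that step is presumably part of what "takes care of" means here.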


Example of usage (simplified code from Python/unionobject.c):

static PyObject *
union_repr(PyObject *self)
{
    unionobject *alias = (unionobject *)self;
    Py_ssize_t len = PyTuple_GET_SIZE(alias->args);

    PyUnicodeWriter *writer = PyUnicodeWriter_Create();
    if (writer == NULL) {
        return NULL;
    }

    for (Py_ssize_t i = 0; i < len; i++) {
        if (i > 0 && PyUnicodeWriter_WriteUTF8(writer, " | ", 3) < 0) {
            goto error;
        }
        PyObject *p = PyTuple_GET_ITEM(alias->args, i);
        if (PyUnicodeWriter_WriteRepr(writer, p) < 0) {
            goto error;
        }
    }
    return PyUnicodeWriter_Finish(writer);

error:
    PyUnicodeWriter_Discard(writer);
    return NULL;
}


@vstinner vstinner added type-feature A feature request or enhancement topic-C-API labels May 19, 2024
vstinner added a commit to vstinner/cpython that referenced this issue May 19, 2024
Move the private _PyUnicodeWriter API to the internal C API.
@vstinner
Member Author

Benchmark using:

bench_concat: Mean +- std dev: 2.07 us +- 0.03 us
bench_writer: Mean +- std dev: 894 ns +- 13 ns

PyUnicodeWriter is 2.3x faster than PyUnicode_Concat()+PyUnicode_Append().

The difference comes from overallocation: if I add PyUnicodeWriter_SetOverallocate(writer, 0); after PyUnicodeWriter_Create(), PyUnicodeWriter has the same performance as PyUnicode_Concat()+PyUnicode_Append(). Overallocation avoids the str += str quadratic complexity (well, at least, it reduces the complexity).
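The complexity argument can be checked with a small cost model. The sketch below is an illustration only, not the benchmark code: it counts how many characters get copied when each append builds a new string, versus appending into a buffer with headroom. The function names and the +25% headroom are assumptions, and the second function models the worst case where every growth moves the existing content.

```c
#include <assert.h>
#include <stddef.h>

/* Characters copied when every append creates a brand-new string
   (the str += str pattern): the old content is copied each time. */
static size_t copied_concat(size_t n_appends, size_t piece_len)
{
    size_t total = 0;
    size_t len = 0;
    for (size_t i = 0; i < n_appends; i++) {
        total += len + piece_len;   /* old content + new piece */
        len += piece_len;
    }
    return total;
}

/* Characters copied with an overallocated buffer, assuming the worst
   case where each growth moves the existing content to a new block. */
static size_t copied_overalloc(size_t n_appends, size_t piece_len)
{
    size_t total = 0;
    size_t len = 0;
    size_t cap = 0;
    for (size_t i = 0; i < n_appends; i++) {
        size_t need = len + piece_len;
        if (need > cap) {
            total += len;            /* move the existing content */
            cap = need + need / 4;   /* assumed +25% headroom */
        }
        total += piece_len;          /* write the new piece */
        len = need;
        assert(cap >= len);
    }
    return total;
}
```

For 1000 one-character appends, the first model copies 500500 characters (the sum 1 + 2 + ... + 1000), while the second stays within a small constant multiple of 1000.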

The PyUnicodeWriter API makes overallocation easy to use.

cc @serhiy-storchaka

@vstinner
Member Author

By the way, PyPy provides __pypy__.builders.StringBuilder for "Fast String Concatenation": https://doc.pypy.org/en/latest/__pypy__-module.html#fast-string-concatenation to work around the str += str quadratic complexity.

$ pypy3.9 
>>>> import __pypy__
>>>> b=__pypy__.builders.StringBuilder()
>>>> b.append('x')
>>>> b.append('=')
>>>> b.append('value')
>>>> b.build()
'x=value'

@vstinner
Member Author

Article about this performance problem in Python: https://lwn.net/Articles/816415/

@gvanrossum
Member

Curious if this warrants a further API PyUnicodeWriter_WriteStr(writer, obj) which appends repr(obj) (just as WriteStr(writer, obj) can be seen to append str(obj)), and eventually the development of a new type slot that writes the repr or str of an object to a writer rather than returning a string object. (And maybe even an "WriteAscii" to write ascii(obj)) and WriteFormat to do something with formats. :-)

I know, I know, hyper-generalization, yet this is what the Union example is screaming for... I suppose we can add those later.

How long has the internal writer API existed?

Would these be in the Stable ABI / Limited API from the start? (API-wise these look stable.)

@vstinner
Member Author

Curious if this warrants a further API PyUnicodeWriter_WriteStr(writer, obj) which appends repr(obj)

I suppose that you mean PyUnicodeWriter_WriteRepr().

Curious if this warrants a further API PyUnicodeWriter_WriteStr(writer, obj) which appends repr(obj) (just as WriteStr(writer, obj) can be seen to append str(obj)), and eventually the development of a new type slot that writes the repr or str of an object to a writer rather than returning a string object. (And maybe even an "WriteAscii" to write ascii(obj)) and WriteFormat to do something with formats. :-)

There is already a collection of helper functions accepting a writer and I find this really cool. It's not "slot-based", since each function has many formatting options.

extern int _PyLong_FormatWriter(
    _PyUnicodeWriter *writer,
    PyObject *obj,
    int base,
    int alternate);

extern int _PyLong_FormatAdvancedWriter(
    _PyUnicodeWriter *writer,
    PyObject *obj,
    PyObject *format_spec,
    Py_ssize_t start,
    Py_ssize_t end);

extern int _PyFloat_FormatAdvancedWriter(
    _PyUnicodeWriter *writer,
    PyObject *obj,
    PyObject *format_spec,
    Py_ssize_t start,
    Py_ssize_t end);

extern int _PyComplex_FormatAdvancedWriter(
    _PyUnicodeWriter *writer,
    PyObject *obj,
    PyObject *format_spec,
    Py_ssize_t start,
    Py_ssize_t end);

extern int _PyUnicode_FormatAdvancedWriter(
    _PyUnicodeWriter *writer,
    PyObject *obj,
    PyObject *format_spec,
    Py_ssize_t start,
    Py_ssize_t end);

extern Py_ssize_t _PyUnicode_InsertThousandsGrouping(
    _PyUnicodeWriter *writer,
    Py_ssize_t n_buffer,
    PyObject *digits,
    Py_ssize_t d_pos,
    Py_ssize_t n_digits,
    Py_ssize_t min_width,
    const char *grouping,
    PyObject *thousands_sep,
    Py_UCS4 *maxchar);

These functions avoid memory copies. For example, _PyLong_FormatWriter() writes digits directly into the writer buffer, without needing a temporary buffer.
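The "no temporary buffer" idea can be sketched in plain C as follows. This is not the code of _PyLong_FormatWriter(); `count_digits()` and `write_digits()` are hypothetical helpers that show the general technique: compute the number of digits first, reserve that much room in the destination, then fill the digits in place from right to left.

```c
#include <assert.h>
#include <stddef.h>

/* Number of decimal digits needed for n (at least 1). */
static size_t count_digits(unsigned long n)
{
    size_t digits = 1;
    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return digits;
}

/* Write the decimal digits of n into dest starting at pos.
   The caller must have reserved count_digits(n) characters there.
   Returns the position just after the last digit written. */
static size_t write_digits(char *dest, size_t pos, unsigned long n)
{
    assert(dest != NULL);
    size_t end = pos + count_digits(n);
    size_t i = end;
    /* Fill from the last digit backwards, directly in the destination. */
    do {
        dest[--i] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    return end;
}
```

Because the width is known up front, the digits land in their final position and nothing has to be copied or reversed afterwards.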

How long has the internal writer API existed?

12 years: I added it in 2012.

commit 202fdca133ce8f5b0c37cca1353070e0721c688d
Author: Victor Stinner <victor.stinner@gmail.com>
Date:   Mon May 7 12:47:02 2012 +0200

    Close #14716: str.format() now uses the new "unicode writer" API instead of the
    PyAccu API. For example, it makes str.format() from 25% to 30% faster on Linux.

I wrote this API to fix the major performance regression that followed the implementation of PEP 393 – Flexible String Representation. After my optimization work, many string operations on Unicode objects became faster than Python 2 operations on bytes, especially when handling only ASCII characters, which is the most common case. I mostly optimized str.format() and str % args, which are powerful but complex.

In 2016, I wrote an article about the two "writer" APIs that I wrote to optimize: https://vstinner.github.io/pybyteswriter.html

Would these be in the Stable ABI / Limited API from the start? (API-wise these look stable.)

I would prefer to not add it to the limited C API directly, but wait one Python version to see how it goes.

@gvanrossum
Member

(Yes, I meant WriteRepr.) I like these other helpers -- can we just add them all to the public API? Or are there issues with any of them?

@vstinner
Member Author

vstinner commented May 20, 2024

(Yes, I meant WriteRepr.) I like these other helpers -- can we just add them all to the public API? Or are there issues with any of them?

I added the following function which should fit most of these use cases:

PyAPI_FUNC(int) PyUnicodeWriter_FromFormat(
    PyUnicodeWriter *writer,
    const char *format,
    ...);

Example to write repr(obj):

PyUnicodeWriter_FromFormat(writer, "%R", obj);

Example to write str(obj):

PyUnicodeWriter_FromFormat(writer, "%S", obj);

It's the same format as PyUnicode_FromFormat(). Example:

PyUnicodeWriter_FromFormat(writer, "Hello %s, %i.", "Python", 123);

@encukou
Member

encukou commented May 21, 2024

Thank you, this looks very useful!

I see that PyUnicodeWriter_Finish frees the writer. That's great; it allows optimizations we can also use in other writers/builders in the future. (Those should have a consistent API.)
One thing to note is that PyUnicodeWriter_Finish should free the writer even when an error occurs.
Maybe PyUnicodeWriter_Free should be named e.g. PyUnicodeWriter_Discard to emphasize that you should only call it if you didn't Finish.
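The ownership rule described here (Finish consumes the writer, Discard is only for the path where you did not Finish) can be sketched with a toy writer in plain C. `strwriter` and its functions are hypothetical stand-ins, not the proposed API. They only illustrate the calling convention.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy writer producing a NUL-terminated C string. */
typedef struct {
    char *buf;
    size_t len;
} strwriter;

static strwriter *strwriter_create(void)
{
    return calloc(1, sizeof(strwriter));
}

/* Append a C string. Returns 0 on success, -1 on allocation failure. */
static int strwriter_write(strwriter *w, const char *s)
{
    assert(w != NULL && s != NULL);
    size_t n = strlen(s);
    char *p = realloc(w->buf, w->len + n + 1);
    if (p == NULL) {
        return -1;
    }
    memcpy(p + w->len, s, n + 1);   /* copies the trailing NUL too */
    w->buf = p;
    w->len += n;
    return 0;
}

/* Throw the writer away without producing a result.
   Only call this if strwriter_finish() was not called. */
static void strwriter_discard(strwriter *w)
{
    if (w != NULL) {
        free(w->buf);
        free(w);
    }
}

/* Hand the result to the caller and free the writer itself.
   The writer is freed whether or not a result can be returned. */
static char *strwriter_finish(strwriter *w)
{
    assert(w != NULL);
    char *result = w->buf;
    if (result == NULL) {
        result = calloc(1, 1);   /* empty string, or NULL on failure */
    }
    free(w);
    return result;
}
```

The caller frees the returned string with free() and never touches the writer again after strwriter_finish(), so each code path ends with exactly one of Finish or Discard.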

The va_arg function is problematic for non-C languages, but it's possible to get the functionality with other functions – especially if we add a number-writing helper, so I'm OK with adding it.

The proposed API is nice and minimal. My bet about what users will ask for next goes to PyUnicodeWriter_WriteUTF8String (for IO) & PyUnicodeWriter_WriteUTF16String (for Windows or Java interop).

Name bikeshedding:

  • PyUnicodeWriter_WriteUCS4Char rather than PyUnicodeWriter_WriteChar -- character is an overloaded term, let's be specific.
  • PyUnicodeWriter_WriteFormat (or WriteFromFormat?) rather than PyUnicodeWriter_FromFormat -- it's writing, not creating a writer.

I see the PR hides underscored API that some existing projects use. I thought we weren't doing that any more.

@vstinner
Member Author

PyUnicodeWriter_WriteUCS4Char rather than PyUnicodeWriter_WriteChar -- character is an overloaded term, let's be specific.

"WriteChar" name comes from PyUnicode_ReadChar() and PyUnicode_WriteChar() names. I don't think that mentioning UCS4 is useful.

PyUnicodeWriter_WriteFormat (or WriteFromFormat?) rather than PyUnicodeWriter_FromFormat -- it's writing, not creating a writer.

I would prefer just "PyUnicodeWriter_Format()". I prefer to not support str.format() which is more a "Python API" than a C API. It's less convenient to use in C. If we don't support str.format(), "PyUnicodeWriter_Format()" is fine for the "PyUnicode_FromFormat()" variant.

@encukou
Member

encukou commented May 21, 2024

Yeah, PyUnicodeWriter_Format sounds good. It avoids the PyX_FromY scheme we use for constructing new objects.

I think that using unqualified Char for a UCS4 codepoint was a mistake we shouldn't continue, but I'm happy to be outvoted on that.

vstinner added a commit to vstinner/cpython that referenced this issue May 21, 2024
Move the private _PyUnicodeWriter API to the internal C API.
@vstinner
Member Author

The proposed API is nice and minimal. My bet about what users will ask for next goes to PyUnicodeWriter_WriteUTF8String (for IO) & PyUnicodeWriter_WriteUTF16String (for Windows or Java interop).

I propose to add PyUnicodeWriter_WriteString() which decodes from UTF-8 (in strict mode).

PyUnicodeWriter_WriteASCIIString() has an undefined behavior if the string contains non-ASCII characters. Maybe it should be removed in favor of PyUnicodeWriter_WriteString() which is safer (well defined behavior for non-ASCII characters: decode them from UTF-8).

vstinner added a commit to vstinner/cpython that referenced this issue May 22, 2024
Add unicode_decode_utf8_writer() to write directly characters into a
_PyUnicodeWriter writer. Optimize PyUnicode_FromFormat() by using the
new unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
vstinner added a commit to vstinner/cpython that referenced this issue May 22, 2024
Add unicode_decode_utf8_writer() to write directly characters into a
_PyUnicodeWriter writer: avoid the creation of a temporary string.
Optimize PyUnicode_FromFormat() by using the new
unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
@serhiy-storchaka
Member

The main problem with the current private _PyUnicodeWriter C API is that it requires allocating the _PyUnicodeWriter value on the stack, but its layout is an implementation detail, and exposing such API would prevent future changes. The proposed new C API allocates the data in dynamic memory, which makes it more portable and future proof. But this can add additional overhead. Also, if we use dynamic memory, why not make PyUnicodeWriter a subclass of PyObject? Then Py_DECREF could be used to destroy it, we could store multiple writers in a collection, and we can even provide Python interface for it.

@vstinner
Member Author

The proposed new C API allocates the data in dynamic memory, which makes it more portable and future proof. But this can add additional overhead.

I ran benchmarks and using the proposed public API remains interesting in terms of performance: see benchmarks below.

Also, if we use dynamic memory, why not make PyUnicodeWriter a subclass of PyObject? Then Py_DECREF could be used to destroy it, we could store multiple writers in a collection, and we can even provide Python interface for it.

Adding a Python API is appealing, but I prefer to restrict this discussion to a C API and only discuss later the idea of exposing it at the Python level.

For the C API, I don't think that Py_DECREF() semantics and inheriting from PyObject are really worth it.

@vstinner
Member Author

I renamed functions:

  • PyUnicodeWriter_WriteString() => PyUnicodeWriter_WriteUTF8(): API with const char *str.
  • PyUnicodeWriter_WriteStr() => PyUnicodeWriter_WriteString(): API with PyObject *str.
  • PyUnicodeWriter_FromFormat() => PyUnicodeWriter_Format().

@vstinner
Member Author

@encukou:

I see the PR hides underscored API that some existing projects use. I thought we weren't doing that any more.

Right, I would like to hide/remove the internal API from the public C API in Python 3.14 while adding the new public C API. The private _PyUnicodeWriter API exposes the _PyUnicodeWriter structure (members). Its API is more complicated and more error-prone.

I prepared a PR for pythoncapi-compat to check that it's possible to implement the new API on Python 3.6-3.13: python/pythoncapi-compat#95

vstinner added a commit to vstinner/cpython that referenced this issue May 23, 2024
Move the private _PyUnicodeWriter API to the internal C API.
@serhiy-storchaka
Member

There is some confusion with names. The String suffix usually means the C string (const char *) argument. Str is only used in PyObject_Str() which is the C analogue of the str() function.

So, for consistency we should use PyUnicodeWriter_WriteString() for writing the C string. This left us with the question what to do with Python strings. PyUnicodeWriter_WriteStr() implies that str() is called on the argument. Even if we add such an API, it is also worth having a more restricted function which fails if a non-string is passed by accident.

@vstinner
Member Author

This left us with the question what to do with Python strings.

We can refer to them as "Unicode", such as: PyUnicodeWriter_WriteUnicode(). Even if the Python type is called "str", in C, it's the PyUnicodeObject: https://docs.python.org/dev/c-api/unicode.html

@serhiy-storchaka
Member

Unfortunately Unicode as a suffix was used in the legacy C API related to Py_UNICODE *. Currently it is only used in the C API to "unicode-escape" and "raw-unicode-escape", so we could restore it with a new meaning, but it will be the first case of using it in this role.

Perhaps we can just omit any suffix and use PyUnicodeWriter_Write()?

@vstinner
Member Author

About bikeshedding. PyPy provides __pypy__.builders.StringBuilder with append() and build() methods. Do you think that "String Builder" with append and build methods API (names) makes more sense than "Unicode Writer" with write and finish methods?

@encukou
Member

encukou commented May 23, 2024

I'd prefer a PyUnicodeWriter_WriteStr that calls PyObject_Str() (which should be cheap for actual PyUnicode objects). It would pair well with the proposed future additions, PyUnicodeWriter_WriteRepr & PyUnicodeWriter_WriteAscii :)

PyUnicodeWriter_WriteSubstring can still take PyUnicode only.

PyUnicodeWriter_WriteUTF8 is a good name. Do you want to support zero-terminated strings (e.g. by passing -1 as the length)?

@vstinner
Member Author

I'd prefer a PyUnicodeWriter_WriteStr that calls PyObject_Str() (which should be cheap for actual PyUnicode objects). It would pair well with the proposed future additions, PyUnicodeWriter_WriteRepr & PyUnicodeWriter_WriteAscii :)

As written previously, you can already use:

  • PyUnicodeWriter_Format(writer, "%S", obj) to write str(obj).
  • PyUnicodeWriter_Format(writer, "%R", obj) to write repr(obj).
  • PyUnicodeWriter_Format(writer, "%A", obj) to write ascii(obj).

Currently, there is no optimization for these code paths. It's the same as creating a temporary string, writing the string, and deleting the string. It's just a convenient API for that. Later we can imagine further optimizations.

The proposed PyUnicodeWriter_WriteStr() / PyUnicodeWriter_WriteString() (not sure about the name) fails with TypeError if the argument is not a Python str object. I'm only looking for a good name for the function. I don't want to call PyObject_Str(). The API is really designed for performance. It should do as little work as possible and have a straightforward API.

If later we consider that a new function would be added, I would prefer PyUnicodeWriter_WriteObjectStr() name for str(obj).

PyUnicodeWriter_WriteUTF8 is a good name. Do you want to support zero-terminated strings (e.g. by passing -1 as the length)?

I didn't write the API documentation yet. It's already supported, passing -1 already calls strlen().
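The length convention is simple enough to sketch in plain C. `resolve_len()` is a hypothetical helper, with `ptrdiff_t` standing in for Py_ssize_t. It only illustrates the "negative length means NUL-terminated" rule and is not taken from the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the number of bytes to read from str.
   A negative len means "str is NUL-terminated, measure it". */
static size_t resolve_len(const char *str, ptrdiff_t len)
{
    assert(str != NULL);
    if (len < 0) {
        return strlen(str);
    }
    return (size_t)len;
}
```

An explicit length lets the caller write a prefix of a larger buffer or data that is not NUL-terminated, while the negative-length form keeps the common string-literal case short.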

@vstinner
Member Author

PyUnicodeWriter_WriteSubstring can still take PyUnicode only.

Right, it raises TypeError if the argument is not a Python str object. Same as PyUnicodeWriter_WriteString().

@encukou
Member

encukou commented May 23, 2024

IMO, the best name is PyUnicodeWriter_WriteStr, except it's a bit ambiguous -- people might expect it to call str(). We can solve the ambiguity by simply making it do that, as a convenience to the user. It won't affect performance in any meaningful way.

@vstinner
Member Author

IMO, the best name is PyUnicodeWriter_WriteStr, except it's a bit ambiguous -- people might expect it to call str().

I don't see why users would expect that. I don't know of any existing API with a similar name which calls str(); only PyObject_Str() calls it. If it's ambiguous, we can make it explicit in the documentation.

It won't affect performance in any meaningful way.

It's not about performance, but the API. I want a function to only write a string, and nothing else.

@gvanrossum
Member

It's not about performance, but the API. I want a function to only write a string, and nothing else.

But why? This API feels more like print() (which implicitly calls str() if needed) or f-string interpolation (which does something similar) and less like TextIO.write() (which insists on a str instance). I like this convenience.

@vstinner
Member Author

My issue is that my proposed API is based on an existing implementation which has been around for 12 years. It's not easy for me to think "out of the box" to design a new, better API, but that's why I opened this discussion :-) To get other opinions to help me design a more usable API.

If the majority prefers calling str(), ok, let's switch to that for PyUnicodeWriter_WriteStr().

I checked the Python code base: there are a few places using repr() with a writer: dict, list, tuple, union, context, token, etc. So I propose to also add PyUnicodeWriter_WriteRepr().

@vstinner
Member Author

Update:

  • Rename PyUnicodeWriter_Free() to PyUnicodeWriter_Discard().
  • Rename PyUnicodeWriter_WriteString() to PyUnicodeWriter_WriteStr(): the function now calls str(obj).
  • Add PyUnicodeWriter_WriteRepr(): call repr(obj).

@vstinner
Member Author

I opened an issue for the C API Working Group: capi-workgroup/decisions#27

vstinner added a commit to vstinner/cpython that referenced this issue May 24, 2024
Move the private _PyUnicodeWriter API to the internal C API.
vstinner added a commit to vstinner/cpython that referenced this issue May 24, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Jun 5, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Jun 7, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Jun 7, 2024
PyUnicode_FromFormat() now decodes the format string from UTF-8 with
the "replace" error handler, instead of decoding it from ASCII.

Remove unused 'consumed' parameter of unicode_decode_utf8_writer().
vstinner added a commit to vstinner/cpython that referenced this issue Jun 10, 2024
PyUnicode_FromFormat() now decodes the "%s" format argument from
UTF-8 with the "strict" error handler, instead of the "replace" error
handler.

Remove the unused 'consumed' parameter of
unicode_decode_utf8_writer().

4 participants