gh-119182: Add PyUnicodeWriter C API #119184

vstinner · 2024-05-19T16:05:22Z

Issue: [C API] Add an efficient public PyUnicodeWriter API #119182

Remove PyUnicodeWriter_SetOverallocate().

vstinner · 2024-06-07T20:21:24Z

@erlend-aasland @serhiy-storchaka @encukou: Do you want to review this change?

Doc/c-api/unicode.rst

erlend-aasland · 2024-06-09T20:04:40Z

Doc/c-api/unicode.rst

+   *str* must be Python :class:`str` object. *start* must be greater than or
+   equal to 0, and less than or equal to *end*. *end* must be less than or
+   equal to *str* length.


Nit: I prefer to use semantic line breaks (SemBr) for paragraphs like this.

Suggested change:

-   *str* must be Python :class:`str` object. *start* must be greater than or
-   equal to 0, and less than or equal to *end*. *end* must be less than or
-   equal to *str* length.
+   *str* must be Python :class:`str` object.
+   *start* must be greater than or equal to 0,
+   and less than or equal to *end*.
+   *end* must be less than or equal to *str* length.

TIL that this is called SemBr!

Breaking at every comma may be too much, but I prefer to break at sentence boundaries.
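The argument constraints quoted in this thread amount to a single range check. A minimal sketch in plain C, assuming a hypothetical helper named `substring_range_ok()` and a bare length in place of a real `str` object (neither comes from the PR):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PR: returns true when
 * 0 <= start <= end <= length, i.e. when [start, end) is a valid
 * slice of a string that holds `length` characters. */
bool
substring_range_ok(ptrdiff_t length, ptrdiff_t start, ptrdiff_t end)
{
    return 0 <= start && start <= end && end <= length;
}
```

An empty slice (`start == end`) passes the check, which matches the "less than or equal to" wording in the documentation text.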

Doc/c-api/unicode.rst

Objects/unicodeobject.c

Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>

Objects/unicodeobject.c

Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>

vstinner · 2024-06-10T08:14:41Z

@erlend-aasland: I addressed your review. Would you mind reviewing the updated PR?

Doc/c-api/unicode.rst

Objects/unicodeobject.c

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

Doc/c-api/unicode.rst

Include/cpython/unicodeobject.h

serhiy-storchaka · 2024-06-10T08:38:00Z

Objects/unicodeobject.c

+                               Py_ssize_t start, Py_ssize_t end)
+{
+    if (!PyUnicode_Check(str)) {
+        PyErr_Format(PyExc_TypeError, "expect str, not %T", str);


Would it not be better to raise SystemError, to distinguish programming errors from data-driven errors? In all current use cases of _PyUnicodeWriter_WriteSubstring() it is impossible to pass wrong arguments, so if this happens, it is a programming error.

TypeError is for when the user passes the wrong type. Previously, we used SystemError more often. But the new trend (C API Working Group recommendations) seems to be to accept a generic PyObject* and raise TypeError if the function gets the wrong type.

Recent example: PyLong_GetSign() raises TypeError, not SystemError.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka

Apart from the use of TypeError and ValueError instead of SystemError, LGTM.

But I do not insist if other core developers prefer TypeError and ValueError. I just think that SystemError is more appropriate for programming errors in using the C API.

malemburg

A more general question in light of the free-threading patches: are these writer APIs thread-safe?

Doc/c-api/unicode.rst

vstinner · 2024-06-10T15:05:01Z

A more general question in light of the free-threading patches: are these writer APIs thread-safe?

I did nothing to make them safe, so I would say that they are not thread-safe. You must not share a writer between two threads. Use one writer per thread.

vstinner · 2024-06-10T16:04:49Z

@malemburg @serhiy-storchaka: I modified the implementation to make write operations atomic. Either the whole string is written, or "nothing is written".

In the doc, I wrote "On error, leave the writer unchanged". In practice, it's more subtle: the internal buffer can be enlarged, or its kind (UCS1, UCS2 or UCS4) can change. But from the API consumer's point of view, it's as if nothing was written.

I added two unit tests to show that it's still possible to use a writer after an error.

The hack is just to save and restore writer->pos internally in the two functions that are not atomic: WriteUTF8() and WriteFormat().
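The save/restore idea described here can be illustrated with a toy model. The sketch below is not the CPython implementation: `toy_writer`, `toy_write_char()` and `toy_write_ascii()` are hypothetical names, and a fixed ASCII-only buffer stands in for the real writer. It only shows how remembering the position on entry makes a multi-step write look all-or-nothing to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the real writer: a fixed buffer plus a write position. */
typedef struct {
    char buf[64];
    size_t pos;
} toy_writer;

/* Append one byte. Fails with -1 when the buffer is full
 * or when the byte is not ASCII. */
int
toy_write_char(toy_writer *w, unsigned char ch)
{
    if (ch >= 0x80 || w->pos >= sizeof(w->buf)) {
        return -1;
    }
    w->buf[w->pos++] = (char)ch;
    return 0;
}

/* Append the whole string or nothing: remember pos on entry and
 * restore it if any byte fails part-way through. */
int
toy_write_ascii(toy_writer *w, const char *s)
{
    size_t saved_pos = w->pos;
    for (; *s != '\0'; s++) {
        if (toy_write_char(w, (unsigned char)*s) < 0) {
            w->pos = saved_pos;  /* roll back the partial write */
            return -1;
        }
    }
    return 0;
}
```

As in the comment above, bytes past `pos` may have been overwritten by the failed call, but the caller only observes `pos`, so the writer remains usable after an error.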

Doc/c-api/unicode.rst

I withdraw my approval because writing cannot be atomic.

bedevere-app bot mentioned this pull request May 19, 2024

[C API] Add an efficient public PyUnicodeWriter API #119182

Open

vstinner added the topic-C-API label May 19, 2024

vstinner force-pushed the WIP_unicode_writer branch from 8de16a1 to 4d18300 Compare May 21, 2024 18:49

vstinner mentioned this pull request May 22, 2024

WIP: Add PyUnicodeWriter API python/pythoncapi-compat#95

Draft

vstinner force-pushed the WIP_unicode_writer branch from 4d18300 to 308d608 Compare May 23, 2024 12:08

vstinner mentioned this pull request May 24, 2024

Add PyUnicodeWriter API capi-workgroup/decisions#27

Open

vstinner force-pushed the WIP_unicode_writer branch 2 times, most recently from 49a46ac to 14e739b Compare May 24, 2024 07:19

vstinner force-pushed the WIP_unicode_writer branch from 14e739b to fd7432e Compare June 5, 2024 14:54

vstinner added the skip news label Jun 5, 2024

vstinner marked this pull request as ready for review June 7, 2024 19:29

bedevere-app bot added the awaiting core review label Jun 7, 2024

vstinner added 2 commits June 7, 2024 21:29

pythongh-119182: Add PyUnicodeWriter C API

3c4da2e

PyUnicodeWriter_Create() expects a length

b12f085

Remove PyUnicodeWriter_SetOverallocate().

vstinner force-pushed the WIP_unicode_writer branch from fd7432e to b12f085 Compare June 7, 2024 19:33

vstinner added 2 commits June 7, 2024 21:34

Rename str to repr

175c239

Add documentation

99fa2cb

vstinner removed the skip news label Jun 7, 2024

serhiy-storchaka self-requested a review June 7, 2024 20:45

erlend-aasland reviewed Jun 9, 2024

vstinner and others added 2 commits June 10, 2024 10:02

Apply suggestions from code review

e3e15f0

Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>

Apply suggestions from code review

1dbb5df

Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>

erlend-aasland reviewed Jun 10, 2024

Objects/unicodeobject.c (outdated)

vstinner and others added 3 commits June 10, 2024 10:10

Update the doc

8f02e33

Add dots in Changelog

a1d0ab0

Update Objects/unicodeobject.c

e6195b7

Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>

serhiy-storchaka reviewed Jun 10, 2024

Doc/c-api/unicode.rst (outdated)

Objects/unicodeobject.c (outdated)

Update Objects/unicodeobject.c

4865d43

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka reviewed Jun 10, 2024

Apply suggestions from code review

79b7c09

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka reviewed Jun 10, 2024

serhiy-storchaka previously approved these changes Jun 10, 2024

bedevere-app bot added awaiting merge and removed awaiting core review labels Jun 10, 2024

malemburg reviewed Jun 10, 2024

Doc/c-api/unicode.rst

Doc/c-api/unicode.rst (outdated)

Make the API atomic

db02dae

serhiy-storchaka reviewed Jun 10, 2024

Doc/c-api/unicode.rst

serhiy-storchaka self-requested a review June 10, 2024 16:14

bedevere-app bot added awaiting review and removed awaiting merge labels Jun 10, 2024

serhiy-storchaka approved these changes Jun 10, 2024

bedevere-app bot added awaiting merge and removed awaiting review labels Jun 10, 2024

Fix typo

10343b0
