Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-119182: Add PyUnicodeWriter C API #119184

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented May 19, 2024

@vstinner vstinner removed the skip news label Jun 7, 2024
@vstinner
Copy link
Member Author

vstinner commented Jun 7, 2024

@erlend-aasland @serhiy-storchaka @encukou: Do you want to review this change?

@serhiy-storchaka serhiy-storchaka self-requested a review June 7, 2024 20:45
Doc/c-api/unicode.rst Show resolved Hide resolved
Doc/c-api/unicode.rst Outdated Show resolved Hide resolved
Doc/c-api/unicode.rst Outdated Show resolved Hide resolved
Doc/c-api/unicode.rst Show resolved Hide resolved
Doc/c-api/unicode.rst Outdated Show resolved Hide resolved
Comment on lines +1568 to +1570
*str* must be Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to *str* length.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; I prefer to use SemBr for paragraphs like this.

Suggested change
*str* must be Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to *str* length.
*str* must be Python :class:`str` object.
*start* must be greater than or equal to 0,
and less than or equal to *end*.
*end* must be less than or equal to *str* length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL that this is called SemBr!

Breaking on comma may be too much, but I prefer to break at the sentence boundary.

Doc/c-api/unicode.rst Outdated Show resolved Hide resolved
Doc/c-api/unicode.rst Outdated Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
vstinner and others added 2 commits June 10, 2024 10:02
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
vstinner and others added 3 commits June 10, 2024 10:10
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
@vstinner
Copy link
Member Author

@erlend-aasland: I addressed your review. Would you mind to review the updated PR?

Doc/c-api/unicode.rst Outdated Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Doc/c-api/unicode.rst Outdated Show resolved Hide resolved
Include/cpython/unicodeobject.h Outdated Show resolved Hide resolved
Py_ssize_t start, Py_ssize_t end)
{
if (!PyUnicode_Check(str)) {
PyErr_Format(PyExc_TypeError, "expect str, not %T", str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not be better to raise SystemError to distinguish programming errors from from data driven errors? In all current use cases of _PyUnicodeWriter_WriteSubstring() it is impossible to pass wrong arguments, and if this happens, it is a programming error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeError is when the user pass the wrong type. Previously, we used SystemError more often. But it seems like the new trend (C API Working Group recommendations) is more to accept a generic PyObject* and raises TypeError if the function gets the wrong type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recent example: PyLong_GetSign() raises TypeError, not SystemError.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except using TypeError and ValueError instead of SystemError, LGTM.

But I do not insist if other core developers prefer TypeError and ValueError. I just think that SystemError is more appropriate for programming errors in using the C API.

Copy link
Member

@malemburg malemburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more general question in the light of the free threading patches: Are these writer APIs thread-safe ?

Doc/c-api/unicode.rst Show resolved Hide resolved
Doc/c-api/unicode.rst Outdated Show resolved Hide resolved
@vstinner
Copy link
Member Author

A more general question in the light of the free threading patches: Are these writer APIs thread-safe ?

I did nothing to make them safe, so I would say that they are not thread safe. You must not share a writer between two threads. Use one writer per thread.

@vstinner
Copy link
Member Author

@malemburg @serhiy-storchaka: I modified the implementation to make write operations atomic. Either the whole string is written, or "nothing is written".

In the doc, I wrote "On error, leave the writer unchanged". In practice, it's more subtle, the internal buffer can be enlarged, or its kind (UCS1, UCS2 or UCS4) can change. But from the API consumer point of view, it's as if nothing was written.

I added two unit tests to show that it's still possible to use a writer after an error.

The hack is just to save/restore writer->pos internally in the 2 functions which are not atomic: WriteUTF8() and WriteFormat().

@serhiy-storchaka serhiy-storchaka self-requested a review June 10, 2024 16:14
@serhiy-storchaka serhiy-storchaka dismissed their stale review June 10, 2024 16:15

I withdraw my approve because writing cannot be atomic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants