-
-
Notifications
You must be signed in to change notification settings - Fork 29.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-119182: Add PyUnicodeWriter C API #119184
base: main
Are you sure you want to change the base?
Conversation
vstinner
commented
May 19, 2024
•
edited
edited
- Issue: [C API] Add an efficient public PyUnicodeWriter API #119182
8de16a1
to
4d18300
Compare
4d18300
to
308d608
Compare
49a46ac
to
14e739b
Compare
14e739b
to
fd7432e
Compare
Remove PyUnicodeWriter_SetOverallocate().
fd7432e
to
b12f085
Compare
@erlend-aasland @serhiy-storchaka @encukou: Do you want to review this change? |
*str* must be Python :class:`str` object. *start* must be greater than or | ||
equal to 0, and less than or equal to *end*. *end* must be less than or | ||
equal to *str* length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; I prefer to use SemBr for paragraphs like this.
*str* must be Python :class:`str` object. *start* must be greater than or | |
equal to 0, and less than or equal to *end*. *end* must be less than or | |
equal to *str* length. | |
*str* must be Python :class:`str` object. | |
*start* must be greater than or equal to 0, | |
and less than or equal to *end*. | |
*end* must be less than or equal to *str* length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL that this is called SemBr!
Breaking on comma may be too much, but I prefer to break at the sentence boundary.
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
@erlend-aasland: I addressed your review. Would you mind to review the updated PR? |
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Py_ssize_t start, Py_ssize_t end) | ||
{ | ||
if (!PyUnicode_Check(str)) { | ||
PyErr_Format(PyExc_TypeError, "expect str, not %T", str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would not be better to raise SystemError to distinguish programming errors from from data driven errors? In all current use cases of _PyUnicodeWriter_WriteSubstring()
it is impossible to pass wrong arguments, and if this happens, it is a programming error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeError is when the user pass the wrong type. Previously, we used SystemError more often. But it seems like the new trend (C API Working Group recommendations) is more to accept a generic PyObject*
and raises TypeError if the function gets the wrong type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent example: PyLong_GetSign() raises TypeError, not SystemError.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except using TypeError and ValueError instead of SystemError, LGTM.
But I do not insist if other core developers prefer TypeError and ValueError. I just think that SystemError is more appropriate for programming errors in using the C API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more general question in the light of the free threading patches: Are these writer APIs thread-safe ?
I did nothing to make them safe, so I would say that they are not thread safe. You must not share a writer between two threads. Use one writer per thread. |
@malemburg @serhiy-storchaka: I modified the implementation to make write operations atomic. Either the whole string is written, or "nothing is written". In the doc, I wrote "On error, leave the writer unchanged". In practice, it's more subtle, the internal buffer can be enlarged, or its kind (UCS1, UCS2 or UCS4) can change. But from the API consumer point of view, it's as if nothing was written. I added two unit tests to show that it's still possible to use a writer after an error. The hack is just to save/restore |
I withdraw my approve because writing cannot be atomic.