Tracking Issue for ASCII trim functions on byte slices #94035

dbrgn · 2022-02-15T23:06:07Z

Feature gate: #![feature(byte_slice_trim_ascii)]

This is a tracking issue for ASCII trim functions on byte slices.

Public API

The feature adds three new methods to byte slices ([u8]):

const fn trim_ascii_start(&self) -> &[u8]: Remove leading ASCII whitespace
const fn trim_ascii_end(&self) -> &[u8]: Remove trailing ASCII whitespace
const fn trim_ascii(&self) -> &[u8]: Remove leading and trailing ASCII whitespace

For deciding what bytes to treat as whitespace, u8::is_ascii_whitespace is used. See the linked docs for more details.

Examples:

assert_eq!(b" \t hello world\n".trim_ascii_start(), b"hello world\n");
assert_eq!(b"\r hello world\n ".trim_ascii_end(), b"\r hello world");
assert_eq!(b"\r hello world\n ".trim_ascii(), b"hello world");

Steps / History

Implementation: core: Implement ASCII trim functions on byte slices #93686
Final comment period (FCP)
Stabilization PR

Unresolved Questions

Should the methods be made const by using a pattern matching approach instead of indexing?
Should corresponding methods be added to str as well?

The text was updated successfully, but these errors were encountered:

nagisa · 2022-02-16T15:47:43Z

If we're adding ascii variants to [u8], we should also add equivalent operations to str at the same time. I feel this is pretty important because otherwise the APIs made available by libstd can come across like a grab bag of random stuff.

It seems reasonable enough to want to trim ASCII whitespace from a &str too (there's already split_ascii_whitespace), and so today one would resort to trim_matches or with this implementation they could convert between [u8] and str. The need to do latter can come across as a particularly frustrating API surface omission.

naming bikeshed: might trim_ascii_start and trim_ascii_end be more consistent with other functions already in libstd? There's some prior art in naming of str::split_ascii_whitespace vs str::split_whitespace, {char,u8}::is_ascii_* vs {char,u8}::is_* etc.

dbrgn · 2022-02-16T16:06:37Z

If we're adding ascii variants to [u8], we should also add equivalent operations to str at the same time. I feel this is pretty important because otherwise the APIs made available by libstd can come across like a grab bag of random stuff.

Since str is validated unicode, trim_start and trim_end could be used, which uses the unicode definition of whitespace. On the other hand, we have char::is_ascii_whitespace, so it should be fairly easy to add these functions to string slices as well. However, I don't really see a use case for this. Why would someone use trim_start_ascii over trim_start?

naming bikeshed: might trim_ascii_start and trim_ascii_end be more consistent with other functions already in libstd? There's some prior art in naming of str::split_ascii_whitespace vs str::split_whitespace, {char,u8}::is_ascii_* vs {char,u8}::is_* etc.

Yes, you have a point there! There's also str::to_lowercase and str::to_ascii_lowercase. I will update the proposal and implementation PR.

…iplett core: Implement ASCII trim functions on byte slices Hi `@rust-lang/libs!` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead. This PR adds three new methods to byte slices: - `trim_ascii_start` - `trim_ascii_end` - `trim_ascii` I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose. As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun. Tracking issue: rust-lang#94035

…iplett core: Implement ASCII trim functions on byte slices Hi ``@rust-lang/libs!`` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead. This PR adds three new methods to byte slices: - `trim_ascii_start` - `trim_ascii_end` - `trim_ascii` I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose. As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun. Tracking issue: rust-lang#94035

…iplett core: Implement ASCII trim functions on byte slices Hi ```@rust-lang/libs!``` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead. This PR adds three new methods to byte slices: - `trim_ascii_start` - `trim_ascii_end` - `trim_ascii` I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose. As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun. Tracking issue: rust-lang#94035

…iplett core: Implement ASCII trim functions on byte slices Hi ````@rust-lang/libs!```` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead. This PR adds three new methods to byte slices: - `trim_ascii_start` - `trim_ascii_end` - `trim_ascii` I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose. As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun. Tracking issue: rust-lang#94035

…iplett core: Implement ASCII trim functions on byte slices Hi `````@rust-lang/libs!````` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead. This PR adds three new methods to byte slices: - `trim_ascii_start` - `trim_ascii_end` - `trim_ascii` I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose. As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun. Tracking issue: rust-lang#94035

…iplett core: Implement ASCII trim functions on byte slices Hi ``````@rust-lang/libs!`````` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead. This PR adds three new methods to byte slices: - `trim_ascii_start` - `trim_ascii_end` - `trim_ascii` I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose. As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun. Tracking issue: rust-lang#94035

…iplett core: Implement ASCII trim functions on byte slices Hi ```````@rust-lang/libs!``````` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead. This PR adds three new methods to byte slices: - `trim_ascii_start` - `trim_ascii_end` - `trim_ascii` I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose. As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun. Tracking issue: rust-lang#94035

…iplett core: Implement ASCII trim functions on byte slices Hi ````````@rust-lang/libs!```````` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead. This PR adds three new methods to byte slices: - `trim_ascii_start` - `trim_ascii_end` - `trim_ascii` I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose. As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun. Tracking issue: rust-lang#94035

Kage-Yami · 2022-06-19T07:35:28Z

On the other hand, we have char::is_ascii_whitespace, so it should be fairly easy to add these functions to string slices as well. However, I don't really see a use case for this. Why would someone use trim_start_ascii over trim_start?

Naively (without knowing the internals at all), I could think of "performance" as a reason.

That being said, does this unresolved question really need to block this from being stabilised? After all, it's specifically about &[u8], not slices in general. If we can agree to that, then what's stopping this from proceeding to FCP? (I'm having trouble finding documentation for the process for tracking issue FCPs, could only find something for RFCs...)

BurntSushi · 2022-09-27T12:17:24Z

Both byte slices and string slices can have both Unicode-aware trimming and ASCII-aware trimming. Unicode support needn't be limited to str and ASCII support needn't be limited to [u8].

With that said, I think adding these methods starts us down the path of treating byte slices as byte strings. I think I am in general in favor of that, but I do think we want to proceed deliberately here. It might make sense to block this on a higher level policy question of 1) do we want byte slices to be byte strings? and 2) if so, what do we want to add? (2) is particularly relevant to make sure we end up with good names I think.

golddranks · 2022-10-10T06:09:01Z

I have a use case where I have to trim a bunch of b'\0's from the end of a byte slice. To me, trim_matches-like methods for slices in all generality would be warranted. I wonder whether people agree with that?

kornelski · 2023-05-10T16:17:41Z

I think these would be useful (including also on str). I often find myself writing mini-parsers that deal with identifiers and syntaxes that are definitely ASCII-only. I don't want to pay code size increase for proper Unicode processing. I also don't want to create APIs or data formats that unnecessarily allow complexity of all of Unicode whitespace.

c410-f3r · 2023-07-31T22:47:47Z

What is the status of this feature?

dbrgn · 2023-08-25T16:31:12Z

With that said, I think adding these methods starts us down the path of treating byte slices as byte strings. I think I am in general in favor of that, but I do think we want to proceed deliberately here.
(...)
do we want byte slices to be byte strings?

I don't think that this a policy decision like this is needed. Byte slices may contain byte strings, or something else. Of course trimming ASCII whitespace results in nonsensical results, if the slices aren't actually ASCII strings, but that's all.

Note that there are already other slice methods that treat bytes as potential strings, for example:

What would be needed to move this proposal into FCP?

okaneco · 2023-12-16T04:11:17Z

I filed a libs ACP, which was accepted, for extending these functions to &str rust-lang/libs-team#313

Implementation PR #118523

Add trim_ascii_start, trim_ascii_end, and trim_ascii functions to &str for trimming ASCII whitespace

Add #[inline] to [u8] trim_ascii functions

I was instructed to use this tracking issue.

Add ASCII whitespace trimming functions to `&str` - Add `trim_ascii_start`, `trim_ascii_end`, and `trim_ascii` functions to `&str` for trimming ASCII whitespace - Add `#[inline]` to `[u8]` `trim_ascii` functions These functions are feature-gated by `#![feature(byte_slice_trim_ascii)]` rust-lang#94035

Rollup merge of rust-lang#118523 - okaneco:trim_ascii, r=Mark-Simulacrum Add ASCII whitespace trimming functions to `&str` - Add `trim_ascii_start`, `trim_ascii_end`, and `trim_ascii` functions to `&str` for trimming ASCII whitespace - Add `#[inline]` to `[u8]` `trim_ascii` functions These functions are feature-gated by `#![feature(byte_slice_trim_ascii)]` rust-lang#94035

okaneco · 2024-02-07T17:20:07Z

Stabilization Report

Implementation History

Original PR is implemented core: Implement ASCII trim functions on byte slices #93686
Libs ACP is proposed and accepted for adding the trimming functions to &str ACP: Add ASCII whitespace trimming functions to &str libs-team#313
PR that extends the trimming functions to &str is merged Add ASCII whitespace trimming functions to &str #118523

API Summary

The following six functions would be stabilized: three functions on &[u8] and three functions on &str.
u8::is_ascii_whitespace is used to determine whitespace.

Remove leading and trailing ASCII whitespace
- const fn trim_ascii(&self) -> &[u8]
- const fn trim_ascii(&self) -> &str
Remove trailing ASCII whitespace
- const fn trim_ascii_end(&self) -> &[u8]
- const fn trim_ascii_end(&self) -> &str
Remove leading ASCII whitespace
- const fn trim_ascii_start(&self) -> &[u8]
- const fn trim_ascii_start(&self) -> &str

Examples:

// &[u8]
assert_eq!(b"\r hello world\n ".trim_ascii(), b"hello world");
assert_eq!(b"\r hello world\n ".trim_ascii_end(), b"\r hello world");
assert_eq!(b" \t hello world\n".trim_ascii_start(), b"hello world\n");
// &str
assert_eq!("\r hello world\n ".trim_ascii(), "hello world");
assert_eq!("\r hello world\n ".trim_ascii_end(), "\r hello world");
assert_eq!(" \t hello world\n".trim_ascii_start(), "hello world\n");

Possibly Unresolved Questions

Question: Does a policy decision need to be made about treating byte slices as byte strings? What methods should be added?

With that said, I think adding these methods starts us down the path of treating byte slices as byte strings. I think I am in general in favor of that, but I do think we want to proceed deliberately here. It might make sense to block this on a higher level policy question of 1) do we want byte slices to be byte strings? and 2) if so, what do we want to add? (2) is particularly relevant to make sure we end up with good names I think.

#94035 (comment)

Response:

I don't think that this a policy decision like this is needed. Byte slices may contain byte strings, or something else. Of course trimming ASCII whitespace results in nonsensical results, if the slices aren't actually ASCII strings, but that's all.

Note that there are already other slice methods that treat bytes as potential strings, for example:

escape_ascii

is_ascii

eq_ignore_ascii_case

make_ascii_uppercase

make_ascii_lowercase

#94035 (comment)

For more context, escape_ascii was stabilized in 1.60 and the other four methods were stabilized in 1.23.

If this is adequate to start FCP, I can make the stabilization PR.

vacuus · 2024-04-28T16:30:07Z

Seems like this is more than ready for stabilization.

jhpratt · 2024-04-28T20:34:26Z

Nominating for T-libs-api, as there seem to be no blockers to stabilization, with a report already being written.

BurntSushi · 2024-04-29T11:20:39Z

This overall seems reasonable to me. We've clearly already started down the path of "&[u8] is sometimes a byte string." I also think we should continue to that path.

@rfcbot fcp merge

rfcbot · 2024-04-29T11:20:41Z

Team member @BurntSushi has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot · 2024-04-29T13:42:50Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2024-05-09T13:43:16Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

Rollup merge of rust-lang#124928 - okaneco:trim_ascii, r=workingjubilee Stabilize `byte_slice_trim_ascii` for `&[u8]`/`&str` Remove feature from documentation examples Update intra-doc link for `u8::is_ascii_whitespace` on `&[u8]` functions Closes rust-lang#94035 FCP has successfully completed rust-lang#94035 (comment)

jonhoo · 2024-05-12T07:07:49Z

🎉 Is this worth an explicit relnotes tag?

jhpratt · 2024-05-12T07:16:00Z

@jonhoo Definitely; I've added it.

Stabilize `byte_slice_trim_ascii` for `&[u8]`/`&str` Remove feature from documentation examples Update intra-doc link for `u8::is_ascii_whitespace` on `&[u8]` functions Closes #94035 FCP has successfully completed rust-lang/rust#94035 (comment)

dbrgn added C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Feb 15, 2022

dbrgn mentioned this issue Feb 15, 2022

core: Implement ASCII trim functions on byte slices #93686

Merged

Kage-Yami mentioned this issue Jun 19, 2022

Trim methods on slices rust-lang/rfcs#2547

Open

workingjubilee added the A-unicode Area: Unicode label Jul 22, 2023

okaneco mentioned this issue Dec 2, 2023

Add ASCII whitespace trimming functions to &str #118523

Merged

okaneco mentioned this issue Dec 9, 2023

ACP: Add ASCII whitespace trimming functions to &str rust-lang/libs-team#313

Closed

cpu mentioned this issue Feb 28, 2024

Trim leading contiguous whitespace, fix nightly clippy warns rustls/pemfile#41

Merged

jhpratt added the I-libs-api-nominated The issue / PR has been nominated for discussion during a libs-api team meeting. label Apr 28, 2024

m-ou-se removed the I-libs-api-nominated The issue / PR has been nominated for discussion during a libs-api team meeting. label Apr 30, 2024

Sammo98 mentioned this issue May 3, 2024

Add trim options to !command chain LucasPickering/slumber#197

Merged

3 tasks

rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels May 9, 2024

rfcbot added the to-announce Announce this issue on triage meeting label May 9, 2024

okaneco mentioned this issue May 9, 2024

Stabilize byte_slice_trim_ascii for &[u8]/&str #124928

Merged

apiraino removed the to-announce Announce this issue on triage meeting label May 10, 2024

bors closed this as completed in 03ff775 May 11, 2024

jhpratt added the relnotes Marks issues that should be documented in the release notes of the next release. label May 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking Issue for ASCII trim functions on byte slices #94035

Tracking Issue for ASCII trim functions on byte slices #94035

dbrgn commented Feb 15, 2022 •

edited

nagisa commented Feb 16, 2022 •

edited

dbrgn commented Feb 16, 2022

Kage-Yami commented Jun 19, 2022 •

edited

BurntSushi commented Sep 27, 2022

golddranks commented Oct 10, 2022 •

edited

kornelski commented May 10, 2023 •

edited

c410-f3r commented Jul 31, 2023

dbrgn commented Aug 25, 2023

okaneco commented Dec 16, 2023 •

edited

okaneco commented Feb 7, 2024

vacuus commented Apr 28, 2024

jhpratt commented Apr 28, 2024

BurntSushi commented Apr 29, 2024

rfcbot commented Apr 29, 2024 •

edited by m-ou-se

rfcbot commented Apr 29, 2024

rfcbot commented May 9, 2024

jonhoo commented May 12, 2024

jhpratt commented May 12, 2024

Tracking Issue for ASCII trim functions on byte slices #94035

Tracking Issue for ASCII trim functions on byte slices #94035

Comments

dbrgn commented Feb 15, 2022 • edited

Public API

Steps / History

Unresolved Questions

nagisa commented Feb 16, 2022 • edited

dbrgn commented Feb 16, 2022

Kage-Yami commented Jun 19, 2022 • edited

BurntSushi commented Sep 27, 2022

golddranks commented Oct 10, 2022 • edited

kornelski commented May 10, 2023 • edited

c410-f3r commented Jul 31, 2023

dbrgn commented Aug 25, 2023

okaneco commented Dec 16, 2023 • edited

okaneco commented Feb 7, 2024

Stabilization Report

Implementation History

API Summary

Possibly Unresolved Questions

vacuus commented Apr 28, 2024

jhpratt commented Apr 28, 2024

BurntSushi commented Apr 29, 2024

rfcbot commented Apr 29, 2024 • edited by m-ou-se

rfcbot commented Apr 29, 2024

rfcbot commented May 9, 2024

jonhoo commented May 12, 2024

jhpratt commented May 12, 2024

dbrgn commented Feb 15, 2022 •

edited

nagisa commented Feb 16, 2022 •

edited

Kage-Yami commented Jun 19, 2022 •

edited

golddranks commented Oct 10, 2022 •

edited

kornelski commented May 10, 2023 •

edited

okaneco commented Dec 16, 2023 •

edited

rfcbot commented Apr 29, 2024 •

edited by m-ou-se