Trim methods on slices #2547

Open
SoniEx2 opened this issue Sep 22, 2018 · 14 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@SoniEx2

SoniEx2 commented Sep 22, 2018

/// Trims this slice from the left.
fn trim_left_matches<F: Fn(&T) -> bool>(&self, f: F) -> &[T] {
    let mut res = self;
    // Pass the element by reference so it is not moved out of the slice.
    while !res.is_empty() && f(&res[0]) {
        res = &res[1..];
    }
    res
}

/// Trims this slice from the right.
fn trim_right_matches<F: Fn(&T) -> bool>(&self, f: F) -> &[T] {
    let mut res = self;
    // Same idea, but inspect and drop the last element each time.
    while !res.is_empty() && f(&res[res.len() - 1]) {
        res = &res[..res.len() - 1];
    }
    res
}

(and so on)

Basically, this turns &["", "", "", "foo", ""] into &["foo", ""], &["", "foo", "", "", ""] into &["", "foo"], etc., depending on which method you call.
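For readers who want to try the proposal outside of an `impl` block, here is a minimal sketch written as free functions. The function names and the `FnMut(&T) -> bool` predicate shape are assumptions made for illustration; nothing here is an existing std API.

```rust
/// Drops leading elements for which `pred` returns true.
/// Hypothetical free-function stand-in for the proposed slice method.
pub fn trim_start_matches<T, F>(slice: &[T], mut pred: F) -> &[T]
where
    F: FnMut(&T) -> bool,
{
    let mut res = slice;
    // `split_first` yields None on an empty slice, which ends the loop.
    while let Some((first, rest)) = res.split_first() {
        if !pred(first) {
            break;
        }
        res = rest;
    }
    res
}

/// Drops trailing elements for which `pred` returns true.
/// Hypothetical free-function stand-in for the proposed slice method.
pub fn trim_end_matches<T, F>(slice: &[T], mut pred: F) -> &[T]
where
    F: FnMut(&T) -> bool,
{
    let mut res = slice;
    // `split_last` mirrors `split_first` from the other end.
    while let Some((last, rest)) = res.split_last() {
        if !pred(last) {
            break;
        }
        res = rest;
    }
    res
}
```

Calling `trim_start_matches(&["", "", "", "foo", ""][..], |s| s.is_empty())` gives the `["foo", ""]` result described above, without allocating.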

@burdges

burdges commented Sep 22, 2018

These sound like noise, since you can implement them trivially with code like s.split_at_mut(s.len() - s.iter().rev().filter(|x| x.len()==0).count()).0.

@SoniEx2
Author

SoniEx2 commented Sep 23, 2018

That's even less readable. :/

Noise is having the same code snippet over and over again and not having it in a well-documented standalone function.

Also, I don't think that code of yours actually works. You probably meant to use take_while and split_at.
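A sketch of what the `take_while` plus `split_at` variant might look like, assuming the goal is to trim trailing empty strings. The function name is made up for illustration. The difference from `filter` is that `filter` counts every empty element, including ones in the middle of the slice, while `take_while` stops at the first non-empty element seen from the back.

```rust
/// Trims trailing empty strings: count them with `take_while` on the
/// reversed iterator, then cut the slice once with `split_at`.
/// The name `trim_trailing_empty` is hypothetical.
pub fn trim_trailing_empty<'a, 'b>(s: &'a [&'b str]) -> &'a [&'b str] {
    // Only the run of empty strings at the very end is counted.
    let trailing = s.iter().rev().take_while(|x| x.is_empty()).count();
    // `trailing <= s.len()`, so the subtraction cannot underflow.
    s.split_at(s.len() - trailing).0
}
```

With the input `["", "foo", "", ""]` this keeps `["", "foo"]`; a `filter`-based count would be 3 and would cut off `"foo"` as well.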

@Lonami

Lonami commented Oct 9, 2018

We have .skip_while() and .take_while() for iterators. Aren't those enough?

.iter().skip_while(|x| x == "").take_while(|x| x != "")

@SoniEx2
Author

SoniEx2 commented Oct 9, 2018

No - that doesn't work like a trim method.

And they're not analogous to the string methods.
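To make the difference concrete, here is a small sketch comparing the iterator chain with a trim. Both function names are hypothetical. The chain stops at the first interior empty element and produces an iterator (collected into a `Vec` here so it can be inspected), whereas a trim keeps interior elements and returns a subslice of the input.

```rust
/// The `skip_while` / `take_while` chain, collected for inspection.
/// Hypothetical helper name.
pub fn skip_take<'a>(items: &[&'a str]) -> Vec<&'a str> {
    items
        .iter()
        .copied()
        .skip_while(|x| x.is_empty())
        .take_while(|x| !x.is_empty())
        .collect()
}

/// A trim of empty strings from both ends, returning a borrowed subslice.
/// Hypothetical helper name.
pub fn trim_empty<'s, 'a>(items: &'s [&'a str]) -> &'s [&'a str] {
    // Index of the first non-empty element, or `len` if there is none.
    let start = items
        .iter()
        .position(|x| !x.is_empty())
        .unwrap_or(items.len());
    // One past the last non-empty element; falls back to `start` if none.
    let end = items
        .iter()
        .rposition(|x| !x.is_empty())
        .map_or(start, |i| i + 1);
    &items[start..end]
}
```

For `["", "a", "", "b", ""]`, `skip_take` yields only `["a"]`, while `trim_empty` yields `["a", "", "b"]`.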

@Centril added the T-libs-api label Oct 15, 2018
@Aloso

Aloso commented Nov 30, 2018

@SoniEx2 Can you give an example where this would be useful (except for Strings)?

@SoniEx2
Author

SoniEx2 commented Nov 30, 2018

When you have a slice and don't want to allocate.

@josephlr

josephlr commented May 5, 2020

I ended up needing something like this for c-string parsing. I have a sequence of bytes and want to return the prefix containing the c-string data (not including the null terminator).

But then I realized, you can use split to do this:

fn trim_c_string(s: &[u8]) -> &[u8] {
    s.split(|&b| b == 0).next().unwrap_or(&[])
}

However, unlike the naive loop implementation, this implementation cannot eliminate the bounds check:

pub fn fast_trim_c_string(s: &[u8]) -> &[u8] {
    for i in 0..s.len() {
        if s[i] == 0 {
            return s.split_at(i).0;
        }
    }
    s
}
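A third way to write the same function, shown only as a sketch, is to find the terminator with `Iterator::position` and slice once. The function name is hypothetical, and whether the compiler removes the bounds check for this form has not been measured here.

```rust
/// Returns the bytes before the first NUL, or the whole slice if there is none.
/// Hypothetical variant of the functions above; codegen was not inspected.
pub fn trim_c_string_position(s: &[u8]) -> &[u8] {
    match s.iter().position(|&b| b == 0) {
        // `i` is a valid index, so `..i` is always in range.
        Some(i) => &s[..i],
        None => s,
    }
}
```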

@serid

serid commented May 14, 2020

It's nice to have trim methods on str, but in the project I am working on right now I use &[char] slices instead of &str, because I need indexed access to characters and slicing of strings, which &str does not support since it's UTF-8. It is disturbing that str has a .trim() method and a generic [T] slice does not. It would be really nice if this issue were resolved, all the more so since it is that easy to implement.

A sample implementation looks like this though I am sure it is suboptimal.

fn trim<P>(&self, mut predicate: P) -> &[T]
where
    P: FnMut(&T) -> bool,
{
    let mut left = 0;
    let mut right = self.len();

    let mut iter = self.iter();

    while let Some(e) = iter.next() {
        if predicate(e) {
            left += 1
        } else {
            break;
        }
    }

    while let Some(e) = iter.next_back() {
        if predicate(e) {
            right -= 1
        } else {
            break;
        }
    }

    &self[left..right]
}
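Since the function above takes `&self`, it only compiles inside an `impl` block. One way to try it on stable today is an extension trait, sketched below. `SliceTrimExt` and `trim_matches` are hypothetical names, and this version uses `position` / `rposition` rather than the manual loops.

```rust
/// Hypothetical extension trait adding a predicate-based trim to slices.
pub trait SliceTrimExt<T> {
    fn trim_matches<P: FnMut(&T) -> bool>(&self, predicate: P) -> &[T];
}

impl<T> SliceTrimExt<T> for [T] {
    fn trim_matches<P: FnMut(&T) -> bool>(&self, mut predicate: P) -> &[T] {
        // First element that should be kept, or `len` if all are trimmed.
        let start = self
            .iter()
            .position(|e| !predicate(e))
            .unwrap_or(self.len());
        // Search only the remainder so the end can never precede the start.
        let rest = &self[start..];
        let end = rest
            .iter()
            .rposition(|e| !predicate(e))
            .map_or(0, |i| i + 1);
        &rest[..end]
    }
}
```

With the trait in scope, a `Vec<char>` built from `"  hi there "` can be trimmed with `v.trim_matches(|c: &char| c.is_whitespace())`, which leaves the characters of `"hi there"`.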

@burdges

burdges commented May 14, 2020

We prefer split_* methods for slices, so as to retain access to underlying subslices, so I still think trim_* methods add noise. We could discuss some split_change(f) that does split_inclusive(|x| changed(f(x))) where

let mut previous = true;
let mut changed = |x: bool| if previous == x { false } else { previous = x; true };

so trim is split_change(f).skip(1).next().unwrap_or(&[]). We're maybe better off adding roughly this changed state machine somewhere like core::iter though, not sure.

@SoniEx2
Author

SoniEx2 commented May 14, 2020

perhaps a more useful trim would use Default::default() to remove things.
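A minimal sketch of that idea, assuming "remove things" means trimming elements equal to `T::default()` from both ends. The name `trim_default` and the `Default + PartialEq` bound are assumptions for illustration.

```rust
/// Trims elements equal to `T::default()` from both ends of the slice.
/// Hypothetical helper; not an existing std API.
pub fn trim_default<T: Default + PartialEq>(slice: &[T]) -> &[T] {
    let default = T::default();
    // First element that differs from the default, or `len` if none does.
    let start = slice
        .iter()
        .position(|e| *e != default)
        .unwrap_or(slice.len());
    // Search only the remainder so the end can never precede the start.
    let rest = &slice[start..];
    let end = rest
        .iter()
        .rposition(|e| *e != default)
        .map_or(0, |i| i + 1);
    &rest[..end]
}
```

For `&str` the default is `""` and for integers it is `0`, so the same function covers both the empty-string case from the original post and zero-padded numeric data.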

@serid you should really be using &[&str] instead of &[char] because &[char] is useless.

@golddranks

golddranks commented May 14, 2020

For information, there is a new-ish unstable API, split_inclusive, on slices that would help with implementing such a thing. (The normal split API doesn't include the "split marker" in either of the resulting subslices. More info here: rust-lang/rust#67330.) However, I neglected to make a tracking issue, so there isn't a direct path toward stabilization at the moment. I'll try to scrape together some time to create a tracking issue next weekend!
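The difference between the two APIs can be sketched as follows. To my knowledge `split_inclusive` on slices has since been stabilized (Rust 1.51), so this should build on a current stable toolchain; the helper name `split_demo` is made up for illustration.

```rust
/// Splits `data` on zero bytes with both APIs and returns the pieces.
/// Hypothetical helper for comparing `split` and `split_inclusive`.
pub fn split_demo(data: &[u8]) -> (Vec<&[u8]>, Vec<&[u8]>) {
    // `split` drops the matched element from the output.
    let plain = data.split(|&b| b == 0).collect();
    // `split_inclusive` keeps it as the last element of the preceding piece.
    let inclusive = data.split_inclusive(|&b| b == 0).collect();
    (plain, inclusive)
}
```

For `[1, 2, 0, 3]`, `split` yields `[1, 2]` and `[3]`, while `split_inclusive` yields `[1, 2, 0]` and `[3]`.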

@golddranks

rust-lang/rust#72360 Tracking issue created.

@Lucretiel

I have a use case: I'm currently working on updates to BufWriter, specifically its implementation of write_vectored, as part of rust-lang/rust#78551. write_vectored takes an &[IoSlice], and it'd be very useful to be able to trim empty slices from both ends. This would allow me to forward the trimmed list of slices to the inner write_vectored method, and also to specifically specialize the case where we received exactly 1 non-empty slice. These cases aren't served by iterator methods, because I need to transform slices into smaller slices to process and forward as necessary.
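A rough sketch of that use case, assuming a free function rather than a slice method. `trim_empty_bufs` is a hypothetical name and is not taken from the BufWriter change itself; it relies only on `IoSlice` dereferencing to `[u8]`.

```rust
use std::io::IoSlice;

/// Trims empty buffers from both ends of a list of `IoSlice`s, returning a
/// subslice that can be forwarded as-is. Hypothetical helper name.
pub fn trim_empty_bufs<'s, 'a>(bufs: &'s [IoSlice<'a>]) -> &'s [IoSlice<'a>] {
    // `IoSlice` derefs to `[u8]`, so `is_empty` checks the byte length.
    let start = bufs
        .iter()
        .position(|b| !b.is_empty())
        .unwrap_or(bufs.len());
    // Search only the remainder so the end can never precede the start.
    let rest = &bufs[start..];
    let end = rest
        .iter()
        .rposition(|b| !b.is_empty())
        .map_or(0, |i| i + 1);
    &rest[..end]
}
```

The result borrows from the input, so a caller could check `trimmed.len() == 1` to take a single-buffer fast path, or pass `trimmed` straight to an inner `write_vectored`.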

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Feb 18, 2022
…iplett

core: Implement ASCII trim functions on byte slices

Hi `@rust-lang/libs!` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead.

This PR adds three new methods to byte slices:

- `trim_ascii_start`
- `trim_ascii_end`
- `trim_ascii`

I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's rust-lang/rfcs#2547 ("Trim methods on slices"), but that has a different purpose.

As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun.

Tracking issue: rust-lang#94035
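For illustration, the behaviour of the three methods named in that commit message can be approximated with free functions built on `u8::is_ascii_whitespace`. This is a sketch under that assumption, not the code from the PR, and the function names simply mirror the method names.

```rust
/// Drops leading ASCII whitespace bytes. Sketch only, not the std source.
pub fn trim_ascii_start(mut bytes: &[u8]) -> &[u8] {
    // Peel one byte off the front for as long as it is ASCII whitespace.
    while let [first, rest @ ..] = bytes {
        if first.is_ascii_whitespace() {
            bytes = rest;
        } else {
            break;
        }
    }
    bytes
}

/// Drops trailing ASCII whitespace bytes. Sketch only, not the std source.
pub fn trim_ascii_end(mut bytes: &[u8]) -> &[u8] {
    // Same idea from the back of the slice.
    while let [rest @ .., last] = bytes {
        if last.is_ascii_whitespace() {
            bytes = rest;
        } else {
            break;
        }
    }
    bytes
}

/// Drops ASCII whitespace from both ends.
pub fn trim_ascii(bytes: &[u8]) -> &[u8] {
    trim_ascii_end(trim_ascii_start(bytes))
}
```

Because these work on raw bytes, they also accept input that is not valid UTF-8, for example `trim_ascii(b"  \xff\x00 ")`.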
@Kage-Yami

Kage-Yami commented Jun 19, 2022

Perhaps a different use-case for str::trim_*-alike methods for &[u8]... in my case, I have files (out of my control) that are mostly-ASCII-serialized structs from some unknown language/library which I'm parsing with nom, and I'm trimming leading whitespace from the input at every step as an easy way to ignore insignificant whitespace.

However, these files are only mostly ASCII - in some "fields", they contain straight binary, so I can't treat the entire file I'm reading as valid UTF-8 (this being the reason for using &[u8] over &str).

That being said, I did find rust-lang/rust#94035, which covers the same thing as this issue, just restricted to ASCII specifically. In my case, that would be good enough. These methods are currently available in nightly: https://doc.rust-lang.org/std/primitive.slice.html#method.trim_ascii


11 participants