Add next_array and collect_array #560
A possible enhancement might be to return

trait FromArray<T, const N: usize> {
    fn from_array(array: [T; N]) -> Self;
}
impl<T, const N: usize> FromArray<T, N> for [T; N] { /* .. */ }
impl<T, const N: usize> FromArray<Option<T>, N> for Option<[T; N]> { /* .. */ }
impl<T, E, const N: usize> FromArray<Result<T, E>, N> for Result<[T; N], E> { /* .. */ }

In fact, I think this is highly useful because it allows things like

let ints = line.split_whitespace().map(|n| n.parse());
if let Ok([x, y, z]) = ints.collect_array() {
    ...
}

This would be completely in line with
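To make the FromArray idea concrete, here is a minimal sketch of the three impls listed above. The method bodies are assumptions, since the comment elides them as `/* .. */`. This version stays in safe code by going through a temporary Vec, which a real implementation would probably want to avoid.

```rust
use std::convert::TryFrom;

trait FromArray<T, const N: usize> {
    fn from_array(array: [T; N]) -> Self;
}

// Plain arrays pass through unchanged.
impl<T, const N: usize> FromArray<T, N> for [T; N] {
    fn from_array(array: [T; N]) -> Self {
        array
    }
}

// An array of options becomes `Some` only if every element is `Some`.
impl<T, const N: usize> FromArray<Option<T>, N> for Option<[T; N]> {
    fn from_array(array: [Option<T>; N]) -> Self {
        let items = IntoIterator::into_iter(array).collect::<Option<Vec<T>>>()?;
        <[T; N]>::try_from(items).ok()
    }
}

// An array of results yields the first error, if there is one.
impl<T, E, const N: usize> FromArray<Result<T, E>, N> for Result<[T; N], E> {
    fn from_array(array: [Result<T, E>; N]) -> Self {
        let items = IntoIterator::into_iter(array).collect::<Result<Vec<T>, E>>()?;
        match <[T; N]>::try_from(items) {
            Ok(array) => Ok(array),
            Err(_) => unreachable!("collecting N items yields a Vec of length N"),
        }
    }
}
```

With these impls, the target type selects the behaviour: annotating the result as `Option<[i32; 2]>` or `Result<[i32; 2], E>` picks the matching impl.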
So I have a working implementation of the above idea here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9dba690b0dfc362971635e21647a4c19. It makes this compile:

fn main() {
    let line = "32 -12 24";
    let nums = line.split_whitespace().map(|n| n.parse::<i32>());
    if let Some(Ok([x, y, z])) = nums.collect_array() {
        println!("x: {} y: {} z: {}", x, y, z);
    }
}

It would change the interface to:

trait ArrayCollectible<T>: Sized {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self>;
}
trait Itertools: Iterator {
    fn collect_array<A>(self) -> Option<A>
    where
        Self: Sized,
        A: ArrayCollectible<Self::Item>;
}

where
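The ArrayCollectible interface above can be sketched in safe Rust as follows. This is not the linked playground implementation: the impl bodies, the hypothetical extension trait `CollectArray` (standing in for `Itertools`), and the helper `parse_three` are assumptions made for illustration. In particular, this sketch assumes that an iterator yielding more or fewer than N items produces `None`, and it collects through a Vec rather than building the array in place.

```rust
use std::convert::TryFrom;
use std::num::ParseIntError;

trait ArrayCollectible<T>: Sized {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self>;
}

// Exactly N items are required (assumption of this sketch).
impl<T, const N: usize> ArrayCollectible<T> for [T; N] {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self> {
        let items: Vec<T> = iterable.into_iter().collect();
        <[T; N]>::try_from(items).ok()
    }
}

// The first `Err` is reported as `Some(Err(..))`; a length mismatch is `None`.
impl<T, E, const N: usize> ArrayCollectible<Result<T, E>> for Result<[T; N], E> {
    fn array_from_iter<I: IntoIterator<Item = Result<T, E>>>(iterable: I) -> Option<Self> {
        match iterable.into_iter().collect::<Result<Vec<T>, E>>() {
            Ok(items) => <[T; N]>::try_from(items).ok().map(Ok),
            Err(e) => Some(Err(e)),
        }
    }
}

// Hypothetical extension trait playing the role of `Itertools`.
trait CollectArray: Iterator + Sized {
    fn collect_array<A: ArrayCollectible<Self::Item>>(self) -> Option<A> {
        A::array_from_iter(self)
    }
}
impl<I: Iterator> CollectArray for I {}

// Parses exactly three whitespace-separated integers.
fn parse_three(line: &str) -> Option<[i32; 3]> {
    let nums = line.split_whitespace().map(|n| n.parse::<i32>());
    // The explicit annotation keeps type inference simple in this sketch.
    let collected: Option<Result<[i32; 3], ParseIntError>> = nums.collect_array();
    match collected {
        Some(Ok(array)) => Some(array),
        _ => None,
    }
}
```

Calling `parse_three("32 -12 24")` returns `Some([32, -12, 24])`, while a line with a non-numeric token or the wrong number of tokens returns `None`.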
Hi there! Thanks for this. I particularly like that you thought about a way of enabling const-generic stuff without raising the minimum required Rust version (even if I would imagine something else due to having an aversion against depending on other crates too much).
There has been some discussion recently about basically supporting not only tuples, but also arrays. I just want to make sure that we do not lose input from these discussions when actually settling on your solution:
- implement arrays and next_array #549
- Array combinations #546
- Const generics iterator candidates #547
On top of that, I think there are some changes in there that are not directly related to this issue. If you'd like to have them merged, could you possibly factor them out into separate PRs/commits?
src/next_array.rs
@@ -0,0 +1,80 @@
use core::mem::MaybeUninit;
I think there was some discussion about building arrays:
@phimuemue Any update on this?
I appreciate your effort, but unfortunately nothing substantial from my side: I changed my mind regarding
@phimuemue Just for posterity's sake,
@phimuemue Just checking in what the status is, I feel very strongly about the usefulness of
Note that if you want I'll also mention
This is a very useful feature. Today there was a thread on Reddit where the author basically asks if there's a crate that provides
@Expurple
I sometimes think about adding Another option I just saw: Crates can offer "nightly-only experimental API" (see https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html#method.first_chunk for an example) - maybe this would help some users. I personally would lean towards
I'm definitely not opposed to the idea but the EDIT: EDIT: Well I have some. With (My idea would be that @scottmcm Small discussion about temporarily adding
For I can allocate some time to this next week.
@jswrenn Please don't forget that we are discussing this on a PR that already has a working implementation without adding dependencies...
@orlp, thanks, I had forgotten that this was a PR and not an issue when I made my reply. Still, we're talking about adding some extremely subtle unsafe code to Itertools. I'd like us to take extreme care to avoid accidentally introducing UB. A PR adding
If you can update this PR to do those things, I can see a path forward to merging it.
Thanks for this PR! I like the ArrayBuilder abstraction quite a bit. As I mentioned, this will need additional documentation and testing before it can be merged. See the recent safety comments in my other project, zerocopy, for a sense of the paranoia rigor I'd like these safety comments to take.
@jswrenn I will be busy the upcoming week but I'm willing to bring this up to standards after that. If before then you could decide on whether or not to bump the MSRV to 1.51 I could include that in the rewrite.
@orlp Think you might have time to revisit this soon? :-)
@jswrenn Yes, I do. Have you reached a decision yet on the MSRV issue?
I think it's time to set the MSRV to 1.51, I'm pretty sure @jswrenn and @phimuemue will agree.
Absolutely. I'd even be fine going up to 1.55, which is nearly three years old. In my other crates, I've found that to be the lowest MSRV that users actually need.
After quickly going through release notes, I may have missed something but I only noted two things 1.55 has over 1.51 that I considered to be of potential use for us: EDIT: 1.51 has those things over 1.43.1:
const-generics
Out of curiosity and slightly off-topic: What's a real reason to not update to stable Rust? Does it ever remove support for some platform or raise the system requirements dramatically? Or, put alternatively: Are there situations where someone could use cutting-edge
Yes: Libraries that depend on itertools, but set a MSRV lower than stable. They are, of course, welcome to use an older, MSRV-compatible version of itertools, but we currently don't backport bugfixes to older versions.
Rust occasionally does remove support for platforms; e.g.: https://blog.rust-lang.org/2022/08/01/Increasing-glibc-kernel-requirements.html (The above post suggests that, conservatively, we could increase our MSRV to 1.63 without causing major problems for users. Maybe that's a good target MSRV for now?)
After skimming all release notes up to 1.78.0, I noted this:
I'm not particularly waiting for anything in there (maybe Unless we need
@Philippe-Cholet I feel
I have force-pushed with a new cleaner implementation, based on 1.55 for
Thanks! This looks mostly good to me, though I've left some pedantic suggestions.
Can you also bump the MSRV in Cargo.toml and do a quick check for newly-resolvable clippy lints:
Lines 18 to 19 in ad5cc96
# When bumping, please resolve all `#[allow(clippy::*)]` that are newly resolvable.
rust-version = "1.43.1"
src/next_array.rs
pub fn push(&mut self, value: T) {
    // We maintain the invariant here that arr[..len] is initialized.
    // Indexing with self.len also ensures self.len < N, and thus <= N after
    // the increment.
    self.arr[self.len] = MaybeUninit::new(value);
    self.len += 1;
}
Documenting the panic here is important in cases where a panic may cause invariants to be invalidated. That doesn't happen yet in our code, but could occur in future uses of ArrayBuilder.
/// Pushes `value` onto the end of the array list.
///
/// # Panics
///
/// This panics if `self.len() >= N`.
pub fn push(&mut self, value: T) {
    // PANICS: This will panic if `self.len >= N`.
    // SAFETY: Initializing an element of `self.arr` cannot violate its
    // safety invariant. Even if this line panics, we have not created any
    // intermediate invalid state.
    self.arr[self.len] = MaybeUninit::new(value);
    // SAFETY: By invariant on `self.arr`, all elements at indices
    // `0..self.len` are valid. Due to the above initialization, the element
    // at `self.len` is now also valid. Consequently, all elements at
    // indices `0..(self.len + 1)` are valid, and `self.len` can be safely
    // incremented without violating `self.arr`'s invariant.
    self.len += 1;
}
I really feel this is overly verbose and ultimately reduces readability of the code. In my opinion, sometimes less is more.
In my opinion a safety comment should guide a critical thinker towards the right thought process of seeing why code is correct, but should not provide a 'proof-by-nodding-along' experience which is rather dangerous.
I share that opinion, except in the case of documenting unsafe code. This PR is the first substantial addition of unsafe code to itertools, and I'd like us to set our standard for unsafe code very high. I also suspect that ArrayBuilder may turn out to be a major internal tool as we expand our support for const-generic adaptors.
@jswrenn Of course I think the standard should be very high. And for that reason I think the comments should be pristine, and not overly verbose blocks of text that just make it harder to review the code as correct. I sincerely mean here that less is more, the shorter unsafe comments are better at preventing unsoundness in my opinion.
Think about it like this, lines of code that do things are signal, comments that do not help you understand the code are noise. In this example in your suggestion there are two lines of code, and nine lines of comments.
If you as a reader can reach the same understanding of the code with the same effort from a three-line comment as a nine-line comment (which I believe you can), then that means six lines were unnecessary and thus noise, giving a signal-to-noise ratio of 25%. I really prefer the signal-to-noise ratio to be as high as possible in unsafe code, allocating maximum headspace and visual area to the actual bits that matter.
We'll have to agree to disagree; you are unlikely to convince me otherwise.
Making these comments has been an essential step in vetting this PR, and these comments will continue to aid future contributors and reviewers in making and evaluating changes to this code.
For example, I only noticed the (temporary) invariant violation issue in ArrayBuilder::take because I attempted to write detailed safety comments for this PR. I also see that you've accepted my suggested reordering, but not my suggested comments. Now, the effect achieved by reordering is totally implicit. We could easily regress on that issue.
I don't think the temporary invariant violation was ever an issue, just a preferred order for simplicity. That said, ultimately this is just style/documentation, so let's agree to disagree and move on. Can you push those comments/changes you consider to be not merely suggestions but essential directly to the branch?
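For readers following the invariant discussion, here is a minimal sketch of an ArrayBuilder-like type. It is not the code under review: the struct layout and `push` follow the snippets quoted in this thread, while `new`, `take`, the `Drop` impl, and the free function `next_array` are assumptions written for illustration. The invariant is that `arr[..len]` is initialized and `len <= N`.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Hypothetical stand-in for the PR's `ArrayBuilder`.
/// Invariant: `arr[..len]` is initialized and `len <= N`.
struct ArrayBuilder<T, const N: usize> {
    arr: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> ArrayBuilder<T, N> {
    fn new() -> Self {
        // An array of `MaybeUninit<T>` requires no initialization.
        Self { arr: [(); N].map(|_| MaybeUninit::uninit()), len: 0 }
    }

    /// Panics if the builder already holds `N` elements.
    fn push(&mut self, value: T) {
        // The bounds check panics before any state is modified.
        self.arr[self.len] = MaybeUninit::new(value);
        self.len += 1;
    }

    /// Returns the array if all `N` slots are filled, else `None`.
    fn take(&mut self) -> Option<[T; N]> {
        if self.len != N {
            return None;
        }
        // Reset `len` first, so the builder never claims moved-out elements.
        self.len = 0;
        let arr = mem::replace(&mut self.arr, [(); N].map(|_| MaybeUninit::uninit()));
        // SAFETY: all N elements were initialized, and `MaybeUninit<T>` has
        // the same layout as `T`. `arr` itself has no destructor to run.
        Some(unsafe { ptr::read(arr.as_ptr().cast::<[T; N]>()) })
    }
}

impl<T, const N: usize> Drop for ArrayBuilder<T, N> {
    fn drop(&mut self) {
        // SAFETY: by the invariant, `arr[..len]` is initialized.
        unsafe {
            let first = self.arr.as_mut_ptr().cast::<T>();
            ptr::drop_in_place(ptr::slice_from_raw_parts_mut(first, self.len));
        }
    }
}

/// Pulls the next `N` items, or returns `None` if the iterator runs short.
fn next_array<I: Iterator, const N: usize>(it: &mut I) -> Option<[I::Item; N]> {
    let mut builder = ArrayBuilder::new();
    for _ in 0..N {
        builder.push(it.next()?);
    }
    builder.take()
}
```

If the iterator runs out early, the `?` returns from `next_array` and the `Drop` impl releases the elements pushed so far, which is why `len` must always describe exactly the initialized prefix.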
src/next_array.rs
unsafe {
    // SAFETY: arr[..len] is initialized, so must be dropped.
    // First we create a pointer to this initialized slice, then drop
    // that slice in-place. The cast from *mut MaybeUninit<T> to *mut T
    // is always sound by the layout guarantees of MaybeUninit.
    let ptr_to_first: *mut MaybeUninit<T> = self.arr.as_mut_ptr();
    let ptr_to_slice = ptr::slice_from_raw_parts_mut(ptr_to_first.cast::<T>(), self.len);
    ptr::drop_in_place(ptr_to_slice);
}
Since our MSRV is now >= 1.60 we can leverage MaybeUninit::assume_init_drop. I know this is contrary to @scottmcm's suggestion here, but it entirely avoids working with raw pointers.
// LEMMA: The elements of `init` reference the valid elements of
// `self.arr`.
//
// PROOF: `slice::split_at_mut(mid)` produces a pair of slices, the
// first of which contains the elements at the indices `0..mid`. By
// invariant on `self.arr`, the elements of `self.arr` at indices
// `0..self.len` are valid. Assuming that `slice::split_at_mut` is
// correctly implemented, the slice `init` will only reference the
// valid elements of `self.arr`.
let (init, _) = self.arr.split_at_mut(self.len);
// AXIOM: We assume that `slice::into_iter` and the subsequent
// `for_each` are implemented correctly, and will yield each and every
// element of `init`.
init.into_iter().for_each(|elt| {
    // SAFETY: `elt` references a valid `T`, because by the above AXIOM,
    // `elt` is an element of `init`, and by the above LEMMA, the
    // elements of `init` reference the valid elements of `self.arr`.
    unsafe { elt.assume_init_drop() }
});
This actually has the wrong semantics compared to e.g. Vec<T> I realized, as it will leak values if a Drop impl panics. ptr::drop_in_place will forcibly keep dropping elements in that scenario, causing an abort if another Drop panics.
Also, again I find this overly verbose, ultimately reducing readability of the code.
> Also, again I find this overly verbose, ultimately reducing readability of the code
I think it is likely that we have different preferences for how unsafe code is written and documented.
Abstractions like the one we're adding here are a notorious source of soundness issues in the Rust ecosystem (see the RUSTSEC advisories for smallvec, stackvector, and stack, for a sampler). Itertools has not yet had any soundness issues, and I'd like to keep it that way.
I am deeply reluctant to add any unsafe code to itertools. If we are to do so, I will require that it meets the standards I set for the other unsafe crates I maintain. The safety comments I'm proposing are not just suggestions; they're prerequisites of merging this PR.
The argument about a semantic difference is interesting. Leaking is not a soundness issue, and we don't make any promises to match the drop behavior of Vec under such conditions. Nonetheless, I'll grant that minimizing leaking is preferable. I'll have to give this one more thought.
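The semantic difference raised in this thread can be observed with a small experiment. The sketch below is not from the PR: the `Noisy` type and both helper functions are hypothetical, and it assumes the default `panic = "unwind"` setting. Each helper drops three elements, the first of which panics in its destructor, and reports how many destructors ran.

```rust
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical element type: counts destructor runs, optionally panicking.
struct Noisy<'a> {
    counter: &'a AtomicUsize,
    panics: bool,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.counter.fetch_add(1, Ordering::SeqCst);
        if self.panics {
            panic!("drop failed");
        }
    }
}

/// Three initialized elements; only the first one panics when dropped.
fn make<'a>(counter: &'a AtomicUsize) -> [MaybeUninit<Noisy<'a>>; 3] {
    [
        MaybeUninit::new(Noisy { counter, panics: true }),
        MaybeUninit::new(Noisy { counter, panics: false }),
        MaybeUninit::new(Noisy { counter, panics: false }),
    ]
}

/// Drops element by element; a panic ends the loop and leaks the rest.
fn drops_with_loop() -> usize {
    let counter = AtomicUsize::new(0);
    let mut arr = make(&counter);
    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        for elt in arr.iter_mut() {
            // SAFETY: every element was initialized by `make` and is
            // dropped at most once.
            unsafe { elt.assume_init_drop() };
        }
    }));
    counter.load(Ordering::SeqCst)
}

/// Drops the whole slice in place; the remaining elements are still
/// dropped while the first panic unwinds.
fn drops_with_drop_in_place() -> usize {
    let counter = AtomicUsize::new(0);
    let mut arr = make(&counter);
    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        let first = arr.as_mut_ptr().cast::<Noisy<'_>>();
        // SAFETY: all `arr.len()` elements are initialized, and nothing
        // touches them after this call.
        unsafe { ptr::drop_in_place(ptr::slice_from_raw_parts_mut(first, arr.len())) };
    }));
    counter.load(Ordering::SeqCst)
}
```

Under these assumptions the loop variant runs one destructor and leaks the other two elements, while the `ptr::drop_in_place` variant runs all three. Neither outcome is unsound; the difference is only in how much is leaked.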
for _ in 0..N {
    builder.push(it.next()?);
}
Now that these builder methods are safe, we can go back to calling .take, which can have performance benefits in some cases:
it.take(N).for_each(|elt| builder.push(elt));
I actually believe take here is more likely to negatively affect performance; either way the performance is bad due to the optimizer really not liking this code: rust-lang/rust#126000.
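The two fill strategies debated here behave the same from the caller's point of view, which a safe stand-in can illustrate. In this sketch a Vec replaces the builder, and both function names are hypothetical. It assumes the iterator is only borrowed, so the `take` variant goes through `by_ref()`. It says nothing about which variant optimizes better.

```rust
use std::convert::TryFrom;

/// Explicit loop: `?` returns early as soon as the iterator runs dry.
fn next_array_loop<I: Iterator, const N: usize>(it: &mut I) -> Option<[I::Item; N]> {
    let mut buf = Vec::with_capacity(N);
    for _ in 0..N {
        buf.push(it.next()?);
    }
    <[I::Item; N]>::try_from(buf).ok()
}

/// `take` variant: a short iterator simply leaves the buffer under-filled,
/// and the final length check turns that into `None`.
fn next_array_take<I: Iterator, const N: usize>(it: &mut I) -> Option<[I::Item; N]> {
    let mut buf = Vec::with_capacity(N);
    it.by_ref().take(N).for_each(|elt| buf.push(elt));
    <[I::Item; N]>::try_from(buf).ok()
}
```

Both variants consume the same items from the underlying iterator, including in the short case, where the leftover items are pulled and then discarded.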
@jswrenn I actually wanted to suggest bumping the MSRV to whatever version you decide on in a separate PR (with whatever TODOs / clippy lints it resolves). Then we'd merge that one first rather than tacking it onto this one. I have committed some of your suggestions, although others I disagree with, see above.
With this pull request I add two new functions to the Itertools trait: next_array and collect_array. These behave exactly like next_tuple and collect_tuple, however they return arrays instead. Since these functions require min_const_generics, I added a tiny build script that checks if Rust's version is 1.51 or higher, and if so sets the has_min_const_generics config variable. This means that Itertools does not suddenly require 1.51 or higher, only these two functions do.

In order to facilitate this I did have to bump the minimum required Rust version to 1.34 from the (documented) 1.32, since Rust 1.32 and 1.33 have trouble parsing the file even if stuff is conditionally compiled. However, this should not result in any (new) breakage, because Itertools actually already requires Rust 1.34 for 9+ months, since 83c0f04 uses saturating_pow which wasn't stabilized until 1.34.

As for rationale, I think these functions are useful, especially for pattern matching and parsing. I don't think there's a high probability they get added to the standard library either, so that's why I directly make a pull request here. When/if TryFromIterator stabilizes we can simplify the implementation, but even then I believe these functions remain a good addition, similar to how collect_vec is nice to have despite .collect::<Vec<_>> existing.
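The version check performed by the build script described above could look roughly like this. It is a sketch, not the PR's build script: the helper names are hypothetical, and it only covers the decision logic. A real `build.rs` would obtain the text by running the compiler with `--version` and would print the returned directive from its `main` function.

```rust
/// Extracts `(major, minor)` from text such as "rustc 1.51.0 (...)".
fn parse_rustc_version(output: &str) -> Option<(u32, u32)> {
    // The second whitespace-separated token is the version number.
    let version = output.split_whitespace().nth(1)?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Returns the Cargo directive enabling the cfg flag on Rust 1.51 or newer.
fn cfg_directive(output: &str) -> Option<&'static str> {
    let (major, minor) = parse_rustc_version(output)?;
    if (major, minor) >= (1, 51) {
        Some("cargo:rustc-cfg=has_min_const_generics")
    } else {
        None
    }
}
```

Code gated with `#[cfg(has_min_const_generics)]` is then compiled only when the directive was emitted, which is how the two new functions can stay invisible on older compilers.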