
Adds triu and tril methods directly to ArrayBase #1386

Merged — 4 commits merged into rust-ndarray:master on May 27, 2024

Conversation

@akern40 (Collaborator) commented May 16, 2024

The implementation seems to work, but unsure about whether this should be a free function vs bound directly to ArrayBase vs a trait. Closes #1323.

@akern40 changed the title from "Adds triu and tril methods directly to ArrayBase to fix #1323" to "Adds triu and tril methods directly to ArrayBase" on May 18, 2024
@akern40 (Collaborator, Author) commented May 18, 2024

Hold off on any merge for a moment; tril panics when given a 0D array. Working on a fix.

@akern40 (Collaborator, Author) commented May 18, 2024

Ok, fixed.

Not to move conversations, but two thoughts on the (now resolved) discussion about order of the input array:

  1. Would there be appetite for {zeros|ones|full}_like public functions? That way users wouldn't have to write the rather convoluted self.raw_dim().set_f(is_layout_f(&self.dim, &self.strides)) that I'm doing. Happy to open a tracking issue and put in the PRs for that.
  2. I did a quick benchmark of tri[ul]'s performance on c-order vs f-order arrays, and it's not great; results are below. As expected, the slowdown is most apparent at larger array sizes. The via_t lines refer to the quick-and-dirty workaround x_tril = x.t().triu(0).t().to_owned().
running 3 tests (64 x 64)
test triu_c       ... bench:       1,681 ns/iter (+/- 45)
test triu_f       ... bench:       2,011 ns/iter (+/- 32)
test triu_f_via_t ... bench:       2,133 ns/iter (+/- 29)

running 3 tests (2048 x 2048)
test triu_c       ... bench:   2,418,649 ns/iter (+/- 320,729)
test triu_f       ... bench:  22,184,991 ns/iter (+/- 4,532,682)
test triu_f_via_t ... bench:   4,259,493 ns/iter (+/- 218,849)

@nilgoyette (Collaborator) commented:

1. Would there be appetite for `{zeros|ones|full}_like` public functions? That way users don't have to do the rather-convoluted `self.raw_dim().set_f(is_layout_f(&self.dim, &self.strides))` that I'm doing. Happy to make a tracking issue and put in the PRs for that.

Well, I know that I'm interested in those functions, but it may not be what @bluss had in mind. The more methods we have, the harder it is to find them.

As for the benchmarks, that result is expected: those larger arrays don't fit in the cache, and the loop iterates in the wrong order for f-order data. Seeing those results, I think adding a condition on the memory order is worth it. Some people (like me) often work in f-order :)
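The cache argument comes down to stride arithmetic. A stdlib-only sketch (hypothetical helper functions, not ndarray internals) of where the same (i, j) element lands in a flat buffer under each layout:

```rust
// Hypothetical sketch: flat offset of element (i, j) in c-order
// (row-major) vs f-order (column-major) storage.
fn index_c(i: usize, j: usize, ncols: usize) -> usize {
    i * ncols + j
}

fn index_f(i: usize, j: usize, nrows: usize) -> usize {
    j * nrows + i
}

fn main() {
    // Walking along a row (j -> j + 1) is a unit stride in c-order...
    assert_eq!(index_c(5, 1, 2048) - index_c(5, 0, 2048), 1);
    // ...but a 2048-element jump in f-order, so row-wise code touches a
    // distant cache line on every access once the matrix outgrows the cache.
    assert_eq!(index_f(5, 1, 2048) - index_f(5, 0, 2048), 2048);
    println!("stride check passed");
}
```

This is why the f-order benchmark degrades only at the 2048 × 2048 size: the 64 × 64 case fits in cache regardless of traversal order.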

@akern40 (Collaborator, Author) commented May 20, 2024

Well, I know that I'm interested in those functions, but it may not be what @bluss had in mind. The more methods we have, the harder it is to find them.

This is true; I don't want to bloat the API. That being said, I think some (difficult and time-consuming and I don't mean to imply it's trivial) documentation work could make the docs easier to navigate and alleviate concerns about more functions.

Some people (like me) often work in f-order :)

Me too :) too many column-vector algorithms / math. I was interested in seeing how well NumPy does on the transposed version and I guess they didn't write one that dispatches on order:

In [1]: import numpy as np

In [2]: x = np.ones((2048, 2048))

In [3]: %timeit np.triu(x)
4.34 ms ± 53.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: x = np.ones((2048, 2048), order="F")

In [5]: %timeit np.triu(x)
19 ms ± 557 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

(Fun to see that ndarray is faster than NumPy here, though! Although that extra bit of time could just be Python overhead rather than raw NumPy performance.)
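For reference, the rule np.triu applies — zero everything below the k-th diagonal, with no dispatch on memory order — can be sketched in pure Python (a hypothetical list-of-lists stand-in, not NumPy's actual implementation):

```python
def triu_sketch(matrix, k=0):
    """Pure-Python sketch of np.triu semantics on a list of lists:
    keep element (i, j) iff j - i >= k, otherwise zero it."""
    return [
        [x if j - i >= k else 0 for j, x in enumerate(row)]
        for i, row in enumerate(matrix)
    ]

print(triu_sketch([[1, 2], [3, 4]]))        # main diagonal kept
print(triu_sketch([[1, 2], [3, 4]], k=1))   # strictly above the diagonal
```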

If we want to create a specialization, I think we can do it via the transpose-transpose method I mentioned above, with a slight tweak to avoid the copy: (EDITED)

pub fn triu(&self, k: isize) -> Array<A, D> {
    // For f-order arrays, swap the last two axes so iteration stays
    // contiguous, compute tril with a negated k, then swap back.
    match self.ndim() > 1 && is_layout_f(&self.dim, &self.strides) {
        true => {
            let n = self.ndim();
            let mut x = self.view();
            x.swap_axes(n - 2, n - 1);
            let mut tril = x.tril(-k).into_shared();
            tril.swap_axes(n - 2, n - 1);
            tril.into_owned()
        },
        // Otherwise copy each row's upper-triangular slice into a
        // zero-initialized result.
        false => {
            let mut res = Array::zeros(self.raw_dim());
            Zip::indexed(self.rows())
                .and(res.rows_mut())
                .for_each(|i, src, mut dst| {
                    let row_num = i.into_dimension().last_elem();
                    let lower = max(row_num as isize + k, 0);
                    dst.slice_mut(s![lower..]).assign(&src.slice(s![lower..]));
                });

            res
        }
    }
}

which gets us the following performance: (EDITED)

running 3 tests (64 x 64)
test triu_c       ... bench:       1,709 ns/iter (+/- 58)
test triu_f       ... bench:       1,723 ns/iter (+/- 43)

running 3 tests (2048 x 2048)
test triu_c       ... bench:   2,429,621 ns/iter (+/- 171,843)
test triu_f       ... bench:   2,484,196 ns/iter (+/- 159,421)

If this gets some thumbs up I'll add it to the PR (and maybe squash at this point to reduce commits?)

Glad I got to explore the _owned methods. It seems like a non-copy _owned should be possible on mutable views, but that's an improvement for another day / PR.
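As a sanity check on the specialization above: the identity it relies on, triu(x, k) = tril(xᵀ, -k)ᵀ, holds for any k. A stdlib-only sketch with hypothetical Vec<Vec<i64>> helpers (not the ndarray implementation) to verify it:

```rust
// Hypothetical helpers checking triu(x, k) == transpose(tril(transpose(x), -k)).
fn transpose(m: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let (r, c) = (m.len(), m[0].len());
    (0..c).map(|j| (0..r).map(|i| m[i][j]).collect()).collect()
}

// Keep entries on/above the k-th diagonal: j - i >= k.
fn triu(m: &[Vec<i64>], k: isize) -> Vec<Vec<i64>> {
    m.iter().enumerate()
        .map(|(i, row)| row.iter().enumerate()
            .map(|(j, &x)| if j as isize - i as isize >= k { x } else { 0 })
            .collect())
        .collect()
}

// Keep entries on/below the k-th diagonal: j - i <= k.
fn tril(m: &[Vec<i64>], k: isize) -> Vec<Vec<i64>> {
    m.iter().enumerate()
        .map(|(i, row)| row.iter().enumerate()
            .map(|(j, &x)| if j as isize - i as isize <= k { x } else { 0 })
            .collect())
        .collect()
}

fn main() {
    let x: Vec<Vec<i64>> =
        (0..3).map(|i| (0..3).map(|j| i * 3 + j + 1).collect()).collect();
    for k in -2..=2 {
        assert_eq!(triu(&x, k), transpose(&tril(&transpose(&x), -k)));
    }
    println!("identity holds");
}
```

The sign flip on k is the subtle part: transposing maps diagonal offset j - i to its negation, so an upper-triangular cut at k becomes a lower-triangular cut at -k.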

@akern40 akern40 force-pushed the triangular branch 2 times, most recently from 7d24c54 to 6316fbe Compare May 22, 2024 03:13
Includes branched implementations for f- and c-order arrays.
@nilgoyette (Collaborator) commented:

Thank you for your contribution @akern40

@nilgoyette nilgoyette added this pull request to the merge queue May 27, 2024
Merged via the queue into rust-ndarray:master with commit 72b0d09 May 27, 2024
10 checks passed
zackmdavis added a commit to zackmdavis/Radiance that referenced this pull request Aug 3, 2024
They added `tril` and `triu` in rust-ndarray/ndarray#1386, but the
maintainers are busy, and there hasn't been an ndarray release since
July 2022 (despite active development?). I could write my own tril (or
copy–paste it), or ... we could run tip of master?! It is written that
Y.O.L.O.
@zackmdavis commented:

Thanks @akern40!!

To anyone waiting to use this before 0.16 is released: remember that you can run master by changing the version specification in your Cargo.toml to {git = "https://github.com/rust-ndarray/ndarray"}. While that doesn't work with the ndarray-rand 0.14 from crates.io, you can change the ndarray-rand dependency to also be {git = "https://github.com/rust-ndarray/ndarray"}, and then it works. (ndarray-rand lives in the same repo; Cargo is smart enough to find it.)
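Concretely, the override described above looks something like this in Cargo.toml (a sketch; you may want to pin a specific rev for reproducible builds):

```toml
[dependencies]
ndarray = { git = "https://github.com/rust-ndarray/ndarray" }
ndarray-rand = { git = "https://github.com/rust-ndarray/ndarray" }
```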

@akern40 akern40 deleted the triangular branch September 29, 2024 00:44

Successfully merging this pull request may close these issues.

Implement numpy's triu and tril methods