Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

edwards: improve the performance of Add, MixedAdd and IsOnCurve #441

Merged
merged 4 commits into from
Sep 5, 2023

Conversation

jsign
Copy link
Contributor

@jsign jsign commented Sep 4, 2023

Description

This PR improves the performance if Edwards-related curves in methods: Add, MixedAdd and IsOnCuve. The crux of the problem is two indirect allocations produced by GetEdwardsCurve(...) call.

I’m pretty sure that, at least for MixedAdd, this is a regression since we didn’t have this allocation some version ago. While fixing it, I noticed this was also happening today for Add and IsOnCuve, so I’m unsure if those are regressions.

The benchmarks also surface a performance problem I’m not fixing in this PR to avoid convoluting topics. I’ll give a brief explanation in the benchmark section below.

Type of change

  • Performance improvement

How has this been tested?

This PR doesn’t change the logic or add new border cases, so existing tests cover correct behavior.

How has this been benchmarked?

The three mentioned methods (Add, MixedAdd and IsOnCurve) didn’t have benchmarks for Edwards curves, so that’s potentially the reason this was not detected before. I’ve created new benchmarks for each of them to measure the effect of this change.

Running these benchmarks before and after this PR:

$ benchstat pre.pprof post.pprof                                                             
name                    old time/op    new time/op    delta
MixedAdd/Projective-16     488ns ± 3%     218ns ± 0%   -55.23%  (p=0.000 n=10+8)
MixedAdd/Extended-16       221ns ± 2%     224ns ± 5%      ~     (p=0.148 n=10+10)
Add/Affine-16             4.86µs ± 1%    4.63µs ± 3%    -4.75%  (p=0.000 n=10+10)
Add/Projective-16          501ns ± 2%     238ns ± 1%   -52.45%  (p=0.000 n=10+8)
Add/Extended-16           3.70µs ± 1%    3.68µs ± 3%      ~     (p=0.255 n=10+10)
IsOnCurve/positive-16      357ns ± 3%     107ns ± 4%   -70.03%  (p=0.000 n=9+10)
IsOnCurve/negative-16      351ns ± 3%     108ns ± 4%   -69.12%  (p=0.000 n=9+10)

name                    old alloc/op   new alloc/op   delta
MixedAdd/Projective-16      128B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
MixedAdd/Extended-16       0.00B          0.00B           ~     (all equal)
Add/Affine-16               128B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
Add/Projective-16           128B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
Add/Extended-16            0.00B          0.00B           ~     (all equal)
IsOnCurve/positive-16       128B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
IsOnCurve/negative-16       128B ± 0%        0B       -100.00%  (p=0.000 n=10+10)

name                    old allocs/op  new allocs/op  delta
MixedAdd/Projective-16      2.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
MixedAdd/Extended-16        0.00           0.00           ~     (all equal)
Add/Affine-16               2.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Add/Projective-16           2.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
Add/Extended-16             0.00           0.00           ~     (all equal)
IsOnCurve/positive-16       2.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
IsOnCurve/negative-16       2.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)

Note: you can run the “before” by checking out commit 53784cadc54d39c1d1 where I’ve included the benchmarks for master without the performance improvements.
Note-2: I'm only showing differences for Bandersnatch; but the same would apply to other edwards.

In summary: 2x speedup in computation, and removing both existing allocations.

Now, notice the Add/Extended benchmark. This result is unchanged, which makes sense since it didn’t call GetEdwardsCurve(...) in the implementation, but the performance is odd since it should be faster than projective. I know the reason, so I’d recommend seeing this draft PR and maybe continuing the discussion there. :)

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign marked this pull request as ready for review September 4, 2023 14:41
@yelhousni yelhousni self-requested a review September 4, 2023 19:28
@yelhousni yelhousni self-assigned this Sep 4, 2023
@yelhousni yelhousni added the perf label Sep 4, 2023
Copy link
Collaborator

@yelhousni yelhousni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @jsign ! This looks good to me. We have not optimised a lot twisted Edwards as they are only used in tests on the gnark side. If you have more optimisations, keep them coming our way! Thanks!

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants