
Add missing atomic operators, refactor atomic operators, move atomic operators to detail namespace. #14962

Merged: 13 commits into rapidsai:branch-24.04 on Mar 12, 2024

Conversation

@bdice (Contributor) commented Feb 3, 2024

Description

This PR does a thorough refactoring of device_atomics.cuh.

  • I moved all atomic-related functions to cudf::detail:: (making this an API-breaking change, but most likely a low-impact break)
  • I added all missing operators for natively supported types to atomicAdd, atomicMin, atomicMax, etc. as discussed in [BUG] Sum and multiply aggregations promote unsigned input types to a signed output #10149 and Revert sum/product aggregation to always produce int64_t type #14907.
  • I kept atomicAdd rather than cudf::detail::atomic_add in locations where a native CUDA overload exists, and likewise for the min/max/CAS operations. Aggregations are the only place where we use the special overloads. In many cases we were already calling the native CUDA function rather than our special overloads, so this retains the previous behavior and avoids including the extra headers that add an unnecessary layer of wrapping around natively supported overloads (see the first sketch after this list).
  • I enabled native 2-byte CAS operations (on unsigned short int), which eliminate the do-while loop and the extra alignment-checking logic (see the second sketch after this list).
    • The CUDA docs don't state this explicitly, but some forum posts claim the 2-byte overload is only supported on compute capability 7.0+. RAPIDS now requires 7.0 as a lower bound, so I'm not concerned as long as builds and tests pass.
  • I improved and cleaned up the documentation, and moved some code around so that the operators appear in a logical order.
  • I assessed the existing tests and it looks like all the types are being covered. I'm not sure if there is a good way to enforce that certain types (like uint64_t) are passing through native atomicAdd calls.
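
Below are two quick sketches referenced in the list above. First, a minimal sketch of the native-overload dispatch: cudf::detail::atomic_add is the only identifier taken from this PR, and the surrounding code is illustrative rather than libcudf's actual implementation.

```cuda
#include <cstdint>

namespace cudf::detail {

// Native path: CUDA provides an atomicAdd overload for unsigned long long,
// so a 64-bit unsigned add forwards straight to the intrinsic with no extra
// wrapping and no extra headers.
__device__ inline uint64_t atomic_add(uint64_t* address, uint64_t update_value)
{
  return atomicAdd(reinterpret_cast<unsigned long long*>(address),
                   static_cast<unsigned long long>(update_value));
}

// Types without a native overload (the reason the templated detail header
// exists) would instead be emulated, e.g. with a compare-and-swap loop on a
// wider word; that fallback is omitted here.

}  // namespace cudf::detail
```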
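
Second, a hedged sketch of the 2-byte CAS path: with the unsigned short int overload of atomicCAS (compute capability 7.0 and higher), a 16-bit compare-and-swap is a single intrinsic call. The function name below is illustrative, not libcudf's API.

```cuda
#include <cstdint>

// Illustrative only: a 2-byte CAS built directly on the native
// unsigned short int atomicCAS (compute capability 7.0+). There is no
// do-while loop over a containing 32-bit word and no alignment or masking
// logic of the kind the 4-byte emulation required.
__device__ inline int16_t atomic_cas_16bit(int16_t* address, int16_t compare, int16_t update_value)
{
  auto* address_as_u16 = reinterpret_cast<unsigned short int*>(address);
  auto const old_value = atomicCAS(address_as_u16,
                                   static_cast<unsigned short int>(compare),
                                   static_cast<unsigned short int>(update_value));
  return static_cast<int16_t>(old_value);
}
```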

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions bot added the libcudf label Feb 3, 2024
@bdice changed the title from "Refactor atomic operators, move to detail namespace." to "Add missing atomic operators, and refactor/move atomic operators to detail namespace." Feb 3, 2024
@karthikeyann (Contributor) commented:

> I changed all uses of atomicAdd to use cudf::detail::atomic_add. I'd like to retain the previous behavior.

I support retaining the previous behaviour. Otherwise, it introduces another templated header into the include dependencies of other files. The templated atomicAdd exists to support types that CUDA does not handle natively; it isn't required for those other files.
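
For illustration of the include-dependency point (a hypothetical kernel, not code from this PR): a file whose accumulator type already has a native CUDA overload can call atomicAdd directly and never needs the templated detail header.

```cuda
// Hypothetical example: int has a native atomicAdd overload, so this file
// does not need to include the cudf detail atomics header at all.
__global__ void count_positive(int const* data, int n, int* counter)
{
  auto const i = static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x);
  if (i < n && data[i] > 0) { atomicAdd(counter, 1); }
}
```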

@bdice changed the title from "Add missing atomic operators, and refactor/move atomic operators to detail namespace." to "Add missing atomic operators, refactor atomic operators, move atomic operators to detail namespace." Feb 4, 2024
@bdice self-assigned this Feb 4, 2024
@bdice added the improvement and breaking labels Feb 4, 2024
@GregoryKimball (Contributor) commented:

@SurajAralihalli FYI

@bdice marked this pull request as ready for review March 9, 2024 01:27
@bdice requested a review from a team as a code owner March 9, 2024 01:27
@GregoryKimball (Contributor) commented:

@SurajAralihalli Would you like to share a review on this change?

@bdice (Contributor, Author) commented Mar 12, 2024

/merge

@rapids-bot bot merged commit 155405b into rapidsai:branch-24.04 Mar 12, 2024
73 checks passed
Labels: breaking (Breaking change), improvement (Improvement / enhancement to an existing function), libcudf (Affects libcudf (C++/CUDA) code)

5 participants