
[BUG] Sum and multiply aggregations promote unsigned input types to a signed output #10149

Open
jlowe opened this issue Jan 27, 2022 · 7 comments
Assignees
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

@jlowe
Member

jlowe commented Jan 27, 2022

Describe the bug
When performing aggregations, the output types are often upscaled to help combat overflow situations. For example, performing a sum aggregation on an INT32 column will produce an INT64 result. However, performing a sum aggregation on a UINT32 column produces an INT64 result rather than a UINT64 result.

Steps/Code to reproduce bug
Perform a sum aggregation with an input column of UINT32 and note that the result is INT64. Here's a snippet of a session doing this with the cudf Java API in the Spark REPL shell:

scala> import ai.rapids.cudf._
import ai.rapids.cudf._

scala> val t = new Table(ColumnVector.fromInts(0), ColumnVector.fromUnsignedInts(0))
t: ai.rapids.cudf.Table = Table{columns=[ColumnVector{rows=1, type=INT32, nullCount=Optional[0], offHeap=(ID: 5 7feec1b5ac90)}, ColumnVector{rows=1, type=UINT32, nullCount=Optional[0], offHeap=(ID: 9 7feec1b5a280)}], cudfTable=140663428850592, rows=1}

scala> t.groupBy(0).aggregate(GroupByAggregation.sum().onColumn(1))
res0: ai.rapids.cudf.Table = Table{columns=[ColumnVector{rows=1, type=INT32, nullCount=Optional.empty, offHeap=(ID: 10 7ff0b7039860)}, ColumnVector{rows=1, type=INT64, nullCount=Optional.empty, offHeap=(ID: 11 7ff0b7039760)}], cudfTable=140671839344304, rows=1}

Expected behavior
Unsigned input types should be promoted to unsigned output types for any aggregation where the sign of the result cannot change for unsigned inputs (e.g., sum and multiply).

Additional context
See @jrhemstad's comment at #10102 (comment)

@jlowe jlowe added bug Something isn't working Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. labels Jan 27, 2022
@github-actions github-actions bot added this to Needs prioritizing in Bug Squashing Jan 27, 2022
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@GregoryKimball GregoryKimball added Spark Functionality that helps Spark RAPIDS and removed Needs Triage Need team to review and classify labels Jun 26, 2022
@GregoryKimball
Contributor

From #10102 (comment):

I think the machinery in question was added before unsigned support and then was just never updated. It should be updated to use uint64 for unsigned integer types:

// Summing/Multiplying integers of any type, always use int64_t accumulator
template <typename Source, aggregation::Kind k>
struct target_type_impl<
  Source,
  k,
  std::enable_if_t<std::is_integral<Source>::value && is_sum_product_agg(k)>> {
  using type = int64_t;
};

could be updated to:

// Summing/Multiplying integers: int64_t accumulator for signed input,
// uint64_t accumulator for unsigned input
template <typename Source, aggregation::Kind k>
struct target_type_impl<
  Source,
  k,
  std::enable_if_t<std::is_integral<Source>::value && is_sum_product_agg(k)>> {
  using type = std::conditional_t<std::is_signed_v<Source>, int64_t, uint64_t>;
};

@GregoryKimball GregoryKimball added the good first issue Good for newcomers label Jun 30, 2022
rapids-bot bot pushed a commit that referenced this issue Jan 23, 2024
… Unsigned Output for Sum and Multiply (#14679)

During aggregation, output types are modified to prevent overflow. Presently, summing INT32 yields INT64, but summing UINT32 still results in INT64 instead of UINT64. This pull request resolves issue #10149 to ensure the correct output type is used when summing or multiplying integers.

Authors:
  - Suraj Aralihalli (https://github.com/SurajAralihalli)
  - Karthikeyan (https://github.com/karthikeyann)
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Shruti Shivakumar (https://github.com/shrshi)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #14679
@GregoryKimball
Contributor

GregoryKimball commented Feb 1, 2024

The work in #14679 to address this issue ended up needing to be reverted in #14907 due to a performance regression reported in #14886.

In addition to adding back the changes in #14679, we also need to:

  • add specializations in device_atomics.cuh for uint32_t and uint64_t as per this comment
  • for clarity, update the namespace for atomicAdd in aggregation.cuh to specify the cudf::detail:: namespace
  • also for clarity, extend the cudf::detail:: namespace updates to atomicMul, atomicMin, and atomicMax in aggregation.cuh

@bdice
Contributor

bdice commented Feb 3, 2024

I started an experiment in this direction before I re-read this issue and realized @SurajAralihalli was assigned here. With apologies to @SurajAralihalli, I think I have a good start on the atomics refactoring in #14962. I would like to get that PR merged first, because it should be a standalone improvement, and then we can revisit the changes that were originally reverted.

@karthikeyann
Contributor

The revert can be undone after merging #14962. I tested a similar fix while debugging this issue with @SurajAralihalli.
However, thorough testing of other types in similar scenarios is required to ensure other bugs are not hidden (perhaps test chrono types and decimal types).

@SurajAralihalli
Contributor

> I started an experiment in this direction before I re-read this issue and realized @SurajAralihalli was assigned here. With apologies to @SurajAralihalli, I think I have a good start on the atomics refactoring in #14962. I would like to get that PR merged first, because it should be a standalone improvement, and then we can revisit the changes that were originally reverted.

Thanks @bdice for letting me know!

rapids-bot bot pushed a commit that referenced this issue Mar 12, 2024
…operators to detail namespace. (#14962)

This PR does a thorough refactoring of `device_atomics.cuh`.

- I moved all atomic-related functions to `cudf::detail::` (making this an API-breaking change, but most likely a low-impact break)
- I added all missing operators for natively supported types to `atomicAdd`, `atomicMin`, `atomicMax`, etc. as discussed in #10149 and #14907.
  - This should prevent fallback to the `atomicCAS` path for types that are natively supported for those atomic operators, which we suspect as the root cause of the performance regression in #14886.
- I kept `atomicAdd` rather than `cudf::detail::atomic_add` in locations where a native CUDA overload exists, and the same for min/max/CAS operations. Aggregations are the only place where we use the special overloads. We were previously calling the native CUDA function rather than our special overloads in many cases, so I retained the previous behavior. This avoids including the additional headers that implement an unnecessary level of wrapping for natively supported overloads.
- I enabled native 2-byte CAS operations (on `unsigned short int`) that eliminate the do-while loop and extra alignment-checking logic
  - The CUDA docs don't state this, but some forum posts claim this is only supported by compute capability 7.0+. We now have 7.0 as a lower bound for RAPIDS so I'm not concerned by this as long as builds/tests pass.
- I improved/cleaned the documentation and moved around some code so that the operators were in a logical order.
- I assessed the existing tests and it looks like all the types are being covered. I'm not sure if there is a good way to enforce that certain types (like `uint64_t`) are passing through native `atomicAdd` calls.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Suraj Aralihalli (https://github.com/SurajAralihalli)

URL: #14962