GH-39565: [C++] Do not concatenate chunked values of fixed-width types to run "array_take" #41700

Draft
wants to merge 28 commits into
base: main

felipecrv
Contributor

@felipecrv felipecrv commented May 17, 2024

Rationale for this change

Concatenating a chunked array into a single array before running the array_take kernels is very inefficient and can lead to out-of-memory crashes. See also #25822.

What changes are included in this PR?

  • Improvements to the dispatching logic of TakeMetaFunction("take") so that "array_take" can have a chunked_exec kernel for some types
  • Implementation of kernels for "array_take" that can receive a ChunkedArray as values and produce an output without concatenating these chunks
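
To illustrate the idea behind the second bullet, here is a minimal sketch of a take over chunked values that does not concatenate the chunks first. It is not the code in this PR: `Chunks`, `ChunkOffsets`, and `TakeFromChunks` are hypothetical names, plain `std::vector`s stand in for Arrow's `ChunkedArray`, and nulls are ignored. Each logical index is resolved to a (chunk, position in chunk) pair by a binary search over cumulative chunk offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chunked array of a fixed-width type:
// each inner vector plays the role of one chunk.
using Chunks = std::vector<std::vector<int32_t>>;

// Cumulative offsets: offsets[i] is the logical index of the first element
// of chunk i, and offsets.back() is the total logical length.
std::vector<int64_t> ChunkOffsets(const Chunks& chunks) {
  std::vector<int64_t> offsets;
  offsets.reserve(chunks.size() + 1);
  int64_t total = 0;
  offsets.push_back(total);
  for (const auto& chunk : chunks) {
    total += static_cast<int64_t>(chunk.size());
    offsets.push_back(total);
  }
  return offsets;
}

// Gathers the value at every logical index without copying all chunks into
// one contiguous buffer. Only the output buffer is allocated.
std::vector<int32_t> TakeFromChunks(const Chunks& chunks,
                                    const std::vector<int64_t>& indices) {
  const std::vector<int64_t> offsets = ChunkOffsets(chunks);
  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (int64_t index : indices) {
    assert(index >= 0 && index < offsets.back());
    // upper_bound finds the first offset greater than index; the chunk that
    // contains index is the one just before it. Empty chunks are skipped
    // naturally because their start and end offsets are equal.
    auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
    const auto chunk_index = static_cast<std::size_t>(it - offsets.begin()) - 1;
    const auto index_in_chunk =
        static_cast<std::size_t>(index - offsets[chunk_index]);
    out.push_back(chunks[chunk_index][index_in_chunk]);
  }
  return out;
}
```

In this sketch the extra memory is proportional to the number of indices plus the number of chunks, rather than to the total length of the values, which is the property the rationale above is after.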

Are these changes tested?

By existing tests. Some tests were added in previous PRs that introduced some of the infrastructure to support this.

…pending nulls to a FixedSizeListBuilder"

This reverts commit b31590c.
TakeCAA will be used from TakeCCC soon.
…::chunked_exec

Before this commit, only the "take" meta function could handle CA
parameters.
@felipecrv felipecrv changed the title from 'GH-39565: [C++]' to 'GH-39565: [C++] Do not concatenate ChunkedArray values to run "array_take"' May 17, 2024
@felipecrv felipecrv changed the title from 'GH-39565: [C++] Do not concatenate ChunkedArray values to run "array_take"' to 'GH-39565: [C++] Do not concatenate chunked values of fixed-width types to run "array_take"' May 17, 2024
Comment on lines +794 to +795
// XXX: this loop can use TakeCAA once it can handle ChunkedArray
// without concatenating first
Contributor Author

I'm working on this one.

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label May 17, 2024
@github-actions github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label May 17, 2024
@mapleFU
Member

mapleFU commented May 17, 2024

May I ask an unrelated question: when should we use assert and when should we use DCHECK? I think they are likely to be the same.

@felipecrv
Contributor Author

May I ask an unrelated question: when should we use assert and when should we use DCHECK? I think they are likely to be the same.

We call assert in headers because we don't want to pay the cost of including logging.h everywhere. Think of assert as a lighter-weight debug check. But if you see an assert in a .cc file, tell me to change it to DCHECK*.
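
A minimal sketch of that convention, under stated assumptions: `IndexInChunk` and `RemainingInChunk` are hypothetical functions, and `SKETCH_DCHECK_GE` is a hypothetical stand-in defined here only so the example is self-contained. It is not Arrow's `DCHECK_GE`; in a real .cc file the macro would come from the logging header instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// --- header-style code: depends only on <cassert> ---
// Returns the position of `index` relative to the start of its chunk.
inline int64_t IndexInChunk(int64_t index, int64_t chunk_start) {
  assert(index >= chunk_start);  // cheap debug check, no logging header needed
  return index - chunk_start;
}

// --- .cc-style code: a heavier logging header is acceptable here ---
// Hypothetical stand-in for a DCHECK-style macro: active in debug builds,
// compiled out when NDEBUG is defined.
#ifdef NDEBUG
#define SKETCH_DCHECK_GE(a, b) ((void)0)
#else
#define SKETCH_DCHECK_GE(a, b)                                          \
  do {                                                                  \
    if (!((a) >= (b))) {                                                \
      std::fprintf(stderr, "%s:%d: Check failed: %s >= %s\n", __FILE__, \
                   __LINE__, #a, #b);                                   \
      std::abort();                                                     \
    }                                                                   \
  } while (0)
#endif

// Returns how many elements remain in a chunk after `index_in_chunk`.
int64_t RemainingInChunk(int64_t chunk_length, int64_t index_in_chunk) {
  SKETCH_DCHECK_GE(chunk_length, index_in_chunk);
  return chunk_length - index_in_chunk;
}
```

Both checks disappear in release builds; the difference the reply points at is only which header each one pulls in.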
