
[C++] Crash at TempVectorStack alloc when using Hashing32::HashBatch independently #40431

Closed
ZhangHuiGui opened this issue Mar 9, 2024 · 5 comments

@ZhangHuiGui
Collaborator

Describe the bug, including details regarding any error messages, version, and platform.

This issue is similar to #40007, but they are different.
I want to use the Hashing32::HashBatch API to produce a hash array for a batch. Although Hashing32 and Hashing64 are used in the join-related code, they can also be used independently.

Consider the code below:

  auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
  const int batch_len = arr->length();
  arrow::compute::ExecBatch exec_batch({arr}, batch_len);
  auto ctx = arrow::compute::default_exec_context();
  arrow::util::TempVectorStack stack;
  ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t))); // I allocated only the stack size I thought I needed.

  std::vector<uint32_t> hashes(batch_len);
  std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
  ASSERT_OK(arrow::compute::Hashing32::HashBatch(
      exec_batch, hashes.data(), temp_column_arrays,
      ctx->cpu_info()->hardware_flags(), &stack, 0, batch_len));

The crash stack in HashBatch is:

arrow::compute::Hashing32::HashBatch
  arrow::compute::Hashing32::HashMultiColumn
      arrow::util::TempVectorHolder<unsigned int>::TempVectorHolder
        arrow::util::TempVectorStack::alloc
          ARROW_DCHECK(top_ <= buffer_size_); // top_=4176, buffer_size_=160

The cause is the following code:

constexpr uint32_t max_batch_size = util::MiniBatch::kMiniBatchLength;
auto hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, max_batch_size);

The holder uses max_batch_size (which is 1024) as its num_elements, which is far larger than the temp stack's initial buffer_size (the DCHECK above shows top_ = 4176 vs. buffer_size_ = 160).

I know that HashBatch is currently only used in hash-join and related code. For joins, the input has already been sliced at the upper level, ensuring that each input batch size is at most kMiniBatchLength and that the stack is large enough.

But it can also be used independently. So maybe we could use num_rows rather than util::MiniBatch::kMiniBatchLength in the HashBatch-related APIs? A rough sketch of the idea is below.
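
Something like this, just as a sketch (alloc_size is a hypothetical name; the real change would be inside HashMultiColumn() in key_hash.cc):

// Sketch only: size the temp vectors by the actual row count, capped at
// the mini-batch length, instead of always reserving kMiniBatchLength
// elements. num_rows is the row count HashMultiColumn() already receives.
const uint32_t alloc_size = static_cast<uint32_t>(
    std::min<int64_t>(num_rows, util::MiniBatch::kMiniBatchLength));
auto hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, alloc_size);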

Component(s)

C++

@ZhangHuiGui
Collaborator Author

cc @kou, PTAL at this issue.

@kou
Member

kou commented Mar 9, 2024

Can you provide a buildable C++ code that reproduces this problem?

@ZhangHuiGui
Collaborator Author

ZhangHuiGui commented Mar 9, 2024

> Can you provide a buildable C++ code that reproduces this problem?

Of course.

#include <arrow/compute/exec.h>
#include <arrow/compute/key_hash.h>
#include <arrow/compute/light_array.h>
#include <arrow/compute/util.h>
#include <arrow/testing/gtest_util.h>

#include <iostream>
#include <vector>

#include "gtest/gtest.h"

TEST(HashBatch, BasicTest) {
  auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
  const int batch_len = arr->length();
  arrow::compute::ExecBatch exec_batch({arr}, batch_len);
  auto ctx = arrow::compute::default_exec_context();
  arrow::util::TempVectorStack stack;
  ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)));

  std::vector<uint32_t> hashes(batch_len);
  std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
  ASSERT_OK(arrow::compute::Hashing32::HashBatch(
      exec_batch, hashes.data(), temp_column_arrays,
      ctx->cpu_info()->hardware_flags(), &stack, 0, batch_len));

  for (int i = 0; i < batch_len; i++) {
    std::cout << hashes[i] << " ";
  }
}

cc @kou, do you think this problem needs to be solved?

@kou
Member

kou commented Mar 11, 2024

Sorry. I missed this.

Thanks. I was able to run the code:

diff --git a/cpp/src/arrow/compute/key_hash_test.cc b/cpp/src/arrow/compute/key_hash_test.cc
index c998df7169..ccfddaa645 100644
--- a/cpp/src/arrow/compute/key_hash_test.cc
+++ b/cpp/src/arrow/compute/key_hash_test.cc
@@ -311,5 +311,24 @@ TEST(VectorHash, FixedLengthTailByteSafety) {
   HashFixedLengthFrom(/*key_length=*/19, /*num_rows=*/64, /*start_row=*/63);
 }
 
+TEST(HashBatch, BasicTest) {
+  auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
+  const int batch_len = arr->length();
+  arrow::compute::ExecBatch exec_batch({arr}, batch_len);
+  auto ctx = arrow::compute::default_exec_context();
+  arrow::util::TempVectorStack stack;
+  ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)));
+
+  std::vector<uint32_t> hashes(batch_len);
+  std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
+  ASSERT_OK(arrow::compute::Hashing32::HashBatch(
+      exec_batch, hashes.data(), temp_column_arrays,
+      ctx->cpu_info()->hardware_flags(), &stack, 0, batch_len));
+
+  for (int i = 0; i < batch_len; i++) {
+    std::cout << hashes[i] << " ";
+  }
+}
+
 }  // namespace compute
 }  // namespace arrow

In general, allocating only the required size is preferred, so using num_rows rather than util::MiniBatch::kMiniBatchLength in the HashBatch-related APIs may be better. (Sorry, I can't determine right now whether num_rows is the correct size for it.)

But it seems that stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)) isn't enough for this case, because HashMultiColumn() needs at least 3 allocations:

auto hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, max_batch_size);
uint32_t* hash_temp = hash_temp_buf.mutable_data();
auto null_indices_buf = util::TempVectorHolder<uint16_t>(ctx->stack, max_batch_size);
uint16_t* null_indices = null_indices_buf.mutable_data();
int num_null_indices;
auto null_hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, max_batch_size);
uint32_t* null_hash_temp = null_hash_temp_buf.mutable_data();

And each allocation also needs 16 bytes of metadata:

int64_t new_top = top_ + PaddedAllocationSize(num_bytes) + 2 * sizeof(uint64_t);
// Stack overflow check (see GH-39582).
// XXX cannot return a regular Status because most consumers do not either.
ARROW_CHECK_LE(new_top, buffer_size_) << "TempVectorStack::alloc overflow";
*data = buffer_->mutable_data() + top_ + sizeof(uint64_t);
// We set 8 bytes before the beginning of the allocated range and
// 8 bytes after the end to check for stack overflow (which would
// result in those known bytes being corrupted).
reinterpret_cast<uint64_t*>(buffer_->mutable_data() + top_)[0] = kGuard1;
reinterpret_cast<uint64_t*>(buffer_->mutable_data() + new_top)[-1] = kGuard2;

Anyway, could you try this?
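
For example, something like this might be enough (just a sketch; the 64 bytes per allocation is my guess at the padding PaddedAllocationSize() adds, so treat it as an upper-bound estimate):

// Rough upper bound for the three temp vectors in HashMultiColumn():
// each holds up to kMiniBatchLength elements of at most 4 bytes, plus
// 2 * sizeof(uint64_t) guard bytes and some alignment padding per
// allocation. With a guessed 64 bytes of padding, each allocation is
// 4096 + 64 + 16 = 4176 bytes, matching the top_ value in the DCHECK above.
constexpr int64_t kNumAllocs = 3;
const int64_t stack_size =
    kNumAllocs * (arrow::util::MiniBatch::kMiniBatchLength * sizeof(uint32_t) +
                  2 * sizeof(uint64_t) + 64);
ASSERT_OK(stack.Init(ctx->memory_pool(), stack_size));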

pitrou added a commit that referenced this issue Apr 2, 2024
…ternal for prevent using by users (#40484)

### Rationale for this change

These files expose implementation details and APIs that are not meant for third-party use. This PR explicitly marks them internal, which also avoids having them installed.

### Are these changes tested?

By existing builds and tests.

### Are there any user-facing changes?

No, except hiding some header files that were not supposed to be included externally.

* GitHub Issue: #40431

Lead-authored-by: ZhangHuiGui <hugo.zhang@openpie.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
pitrou added this to the 16.0.0 milestone Apr 2, 2024
@pitrou
Member

pitrou commented Apr 2, 2024

Issue resolved by pull request #40484

pitrou closed this as completed Apr 2, 2024
tolleybot pushed a commit to tmct/arrow that referenced this issue May 2, 2024
… to internal for prevent using by users (apache#40484)

vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
… to internal for prevent using by users (apache#40484)