
[C++] Crash at TempVectorStack alloc when using Hashing32::HashBatch independently #40431

Closed
ZhangHuiGui opened this issue Mar 9, 2024 · 5 comments

@ZhangHuiGui
Collaborator

Describe the bug, including details regarding any error messages, version, and platform.

This issue is similar to #40007, but they are different.
I want to use the Hashing32::HashBatch API to produce a hash array for a batch. Although Hashing32 and Hashing64 are used in the join-related code, they can also be used independently.

Consider the code below:

  auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
  const int batch_len = arr->length();
  arrow::compute::ExecBatch exec_batch({arr}, batch_len);
  auto ctx = arrow::compute::default_exec_context();
  arrow::util::TempVectorStack stack;
  ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t))); // I allocated only the stack size I thought I needed.

  std::vector<uint32_t> hashes(batch_len);
  std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
  ASSERT_OK(arrow::compute::Hashing32::HashBatch(
      exec_batch, hashes.data(), temp_column_arrays,
      ctx->cpu_info()->hardware_flags(), &stack, 0, batch_len));

The crash stack in HashBatch is:

arrow::compute::Hashing32::HashBatch
  arrow::compute::Hashing32::HashMultiColumn
      arrow::util::TempVectorHolder<unsigned int>::TempVectorHolder
        arrow::util::TempVectorStack::alloc
          ARROW_DCHECK(top_ <= buffer_size_); // top_=4176, buffer_size_=160

The cause is the following code:

constexpr uint32_t max_batch_size = util::MiniBatch::kMiniBatchLength;
auto hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, max_batch_size);

The holder uses max_batch_size (which is 1024) as its num_elements, which is far larger than the temp stack's initial buffer_size (the DCHECK above shows top_ = 4176 vs. buffer_size_ = 160).

I know that HashBatch is currently only used in hash-join and related code. For joins, the input has already been sliced at the upper level, ensuring that each input batch size is at most kMiniBatchLength and that the stack is large enough.

But it can also be used independently. So maybe we could use num_rows rather than util::MiniBatch::kMiniBatchLength in the HashBatch-related APIs? A rough sketch of the idea is below.
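
Something like this, just as a sketch (alloc_size is a hypothetical name; the real change would be inside HashMultiColumn() in key_hash.cc):

// Sketch only: size the temp vectors by the actual row count, capped at
// the mini-batch length, instead of always reserving kMiniBatchLength
// elements. num_rows is the row count HashMultiColumn() already receives.
const uint32_t alloc_size = static_cast<uint32_t>(
    std::min<int64_t>(num_rows, util::MiniBatch::kMiniBatchLength));
auto hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, alloc_size);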

Component(s)

C++

@ZhangHuiGui
Collaborator Author

cc @kou, PTAL at this issue.

@kou
Member

kou commented Mar 9, 2024

Can you provide a buildable C++ code that reproduces this problem?

@ZhangHuiGui
Collaborator Author

ZhangHuiGui commented Mar 9, 2024

> Can you provide a buildable C++ code that reproduces this problem?

Of course.

#include <arrow/compute/exec.h>
#include <arrow/compute/key_hash.h>
#include <arrow/compute/light_array.h>
#include <arrow/compute/util.h>
#include <arrow/testing/gtest_util.h>

#include <iostream>
#include <vector>

#include "gtest/gtest.h"

TEST(HashBatch, BasicTest) {
  auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
  const int batch_len = arr->length();
  arrow::compute::ExecBatch exec_batch({arr}, batch_len);
  auto ctx = arrow::compute::default_exec_context();
  arrow::util::TempVectorStack stack;
  ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)));

  std::vector<uint32_t> hashes(batch_len);
  std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
  ASSERT_OK(arrow::compute::Hashing32::HashBatch(
      exec_batch, hashes.data(), temp_column_arrays,
      ctx->cpu_info()->hardware_flags(), &stack, 0, batch_len));

  for (int i = 0; i < batch_len; i++) {
    std::cout << hashes[i] << " ";
  }
}

cc @kou, do you think this problem needs to be solved?

@kou
Member

kou commented Mar 11, 2024

Sorry. I missed this.

Thanks. I was able to run the code:

diff --git a/cpp/src/arrow/compute/key_hash_test.cc b/cpp/src/arrow/compute/key_hash_test.cc
index c998df7169..ccfddaa645 100644
--- a/cpp/src/arrow/compute/key_hash_test.cc
+++ b/cpp/src/arrow/compute/key_hash_test.cc
@@ -311,5 +311,24 @@ TEST(VectorHash, FixedLengthTailByteSafety) {
   HashFixedLengthFrom(/*key_length=*/19, /*num_rows=*/64, /*start_row=*/63);
 }
 
+TEST(HashBatch, BasicTest) {
+  auto arr = arrow::ArrayFromJSON(arrow::int32(), "[9,2,6]");
+  const int batch_len = arr->length();
+  arrow::compute::ExecBatch exec_batch({arr}, batch_len);
+  auto ctx = arrow::compute::default_exec_context();
+  arrow::util::TempVectorStack stack;
+  ASSERT_OK(stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)));
+
+  std::vector<uint32_t> hashes(batch_len);
+  std::vector<arrow::compute::KeyColumnArray> temp_column_arrays;
+  ASSERT_OK(arrow::compute::Hashing32::HashBatch(
+      exec_batch, hashes.data(), temp_column_arrays,
+      ctx->cpu_info()->hardware_flags(), &stack, 0, batch_len));
+
+  for (int i = 0; i < batch_len; i++) {
+    std::cout << hashes[i] << " ";
+  }
+}
+
 }  // namespace compute
 }  // namespace arrow

In general, allocating only the required size is preferred, so using num_rows rather than util::MiniBatch::kMiniBatchLength in the HashBatch-related APIs may be better. (Sorry, I can't determine right now whether num_rows is the correct size for it.)

But it seems that stack.Init(ctx->memory_pool(), batch_len * sizeof(uint32_t)) isn't enough for this case, because HashMultiColumn() needs at least 3 allocations:

auto hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, max_batch_size);
uint32_t* hash_temp = hash_temp_buf.mutable_data();
auto null_indices_buf = util::TempVectorHolder<uint16_t>(ctx->stack, max_batch_size);
uint16_t* null_indices = null_indices_buf.mutable_data();
int num_null_indices;
auto null_hash_temp_buf = util::TempVectorHolder<uint32_t>(ctx->stack, max_batch_size);
uint32_t* null_hash_temp = null_hash_temp_buf.mutable_data();

And each allocation also needs 16 bytes of metadata:

int64_t new_top = top_ + PaddedAllocationSize(num_bytes) + 2 * sizeof(uint64_t);
// Stack overflow check (see GH-39582).
// XXX cannot return a regular Status because most consumers do not either.
ARROW_CHECK_LE(new_top, buffer_size_) << "TempVectorStack::alloc overflow";
*data = buffer_->mutable_data() + top_ + sizeof(uint64_t);
// We set 8 bytes before the beginning of the allocated range and
// 8 bytes after the end to check for stack overflow (which would
// result in those known bytes being corrupted).
reinterpret_cast<uint64_t*>(buffer_->mutable_data() + top_)[0] = kGuard1;
reinterpret_cast<uint64_t*>(buffer_->mutable_data() + new_top)[-1] = kGuard2;

Anyway, could you try this?
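
For example, something like this might be enough (just a sketch; the 64 bytes per allocation is my guess at the padding PaddedAllocationSize() adds, so treat it as an upper-bound estimate):

// Rough upper bound for the three temp vectors in HashMultiColumn():
// each holds up to kMiniBatchLength elements of at most 4 bytes, plus
// 2 * sizeof(uint64_t) guard bytes and some alignment padding per
// allocation. With a guessed 64 bytes of padding, each allocation is
// 4096 + 64 + 16 = 4176 bytes, matching the top_ value in the DCHECK above.
constexpr int64_t kNumAllocs = 3;
const int64_t stack_size =
    kNumAllocs * (arrow::util::MiniBatch::kMiniBatchLength * sizeof(uint32_t) +
                  2 * sizeof(uint64_t) + 64);
ASSERT_OK(stack.Init(ctx->memory_pool(), stack_size));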

pitrou added a commit that referenced this issue Apr 2, 2024
…ternal for prevent using by users (#40484)

### Rationale for this change

These files expose implementation details and APIs that are not meant for third-party use. This PR explicitly marks them internal, which also avoids having them installed.

### Are these changes tested?

By existing builds and tests.

### Are there any user-facing changes?

No, except hiding some header files that were not supposed to be included externally.

* GitHub Issue: #40431

Lead-authored-by: ZhangHuiGui <hugo.zhang@openpie.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
pitrou added this to the 16.0.0 milestone Apr 2, 2024
@pitrou
Member

pitrou commented Apr 2, 2024

Issue resolved by pull request #40484

pitrou closed this as completed Apr 2, 2024
tolleybot pushed a commit to tmct/arrow that referenced this issue May 2, 2024
… to internal for prevent using by users (apache#40484)

vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
… to internal for prevent using by users (apache#40484)