[C++][Acero] Acero's shared (per-thread) temp vector stack usage may overflow #41334

zanmato1984 · 2024-04-22T12:38:32Z

Describe the enhancement requested

Currently the size of Acero's temp stack is hardcoded in source:

arrow/cpp/src/arrow/acero/query_context.cc

Line 56 in 25bb627

memory_pool(), 32 * util::MiniBatch::kMiniBatchLength * sizeof(uint64_t)));

We've observed issues of hang/crash caused by this stack overflow, to name two: #39582, #39951 . And the hardcoded stack size has been once bumped, along with a more explicit stack overflow error, to solve them: #40007. But as long as the stack size is still hardcoded (and not insanely big), it won't be surprising if the overflow happens again for a more complex plan. And once that happened, one wouldn't be able to mitigate this issue without patching the source code (with a bigger stack size) and building the binary from scratch.

~~I think an env var might be well suitable to tune this stack size at runtime, to help the user quick mitigate the overflow.~~

But if we get lucky, overflow may still happen - a concrete case: https://github.com/apache/arrow/pull/41335/files#diff-6c3d49ae137b8279afb2610f6d16010618e451b4da2b4279637cf44175b0cd71R3206-R3254

Component(s)

C++

…te overflow (#41335) ### Rationale for this change The risk of temp vector stack overflow still exists as described in #41334 . Many people have agreed on a per-node basis approach: > 1) it doesn't introduce more performance penalty than shared stack; 2) it can mitigate the overflow in a natural way, i.e., expanding the stack size linear to the number of nodes; 3) it requires no more complexity to the existing stack implementation. The full (but long) story is also revealed in the subsequent discussion of this PR. Feel free to scroll down. ### What changes are included in this PR? 1. Change the current shared (per-thread) temp vector stack usage to per-node basis. 2. Make the stack size required by each stack user more explicit. ### Are these changes tested? UT included. ### Are there any user-facing changes? None. * GitHub Issue: #41334 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>

pitrou · 2024-05-14T17:10:29Z

Issue resolved by pull request 41335
#41335

…Init called in hash_join_benchmark (#41716) ### Rationale for this change My local compilation parameters will include the compilation of some basic benchmarks. I discovered this compilation problem today. It seems that #41334 of `QueryContext::Init` is not synchronized to `hash_join_benchmark.cc`, and CI has not found this problem. . ### What changes are included in this PR? Remove the first arg . ### Are these changes tested? Needn't ### Are there any user-facing changes? No * GitHub Issue: #41720 Lead-authored-by: ZhangHuiGui <2689496754@qq.com> Co-authored-by: ZhangHuiGui <hugo.zhang@openpie.com> Signed-off-by: Antoine Pitrou <antoine@python.org>

…mitigate overflow (apache#41335) ### Rationale for this change The risk of temp vector stack overflow still exists as described in apache#41334 . Many people have agreed on a per-node basis approach: > 1) it doesn't introduce more performance penalty than shared stack; 2) it can mitigate the overflow in a natural way, i.e., expanding the stack size linear to the number of nodes; 3) it requires no more complexity to the existing stack implementation. The full (but long) story is also revealed in the subsequent discussion of this PR. Feel free to scroll down. ### What changes are included in this PR? 1. Change the current shared (per-thread) temp vector stack usage to per-node basis. 2. Make the stack size required by each stack user more explicit. ### Are these changes tested? UT included. ### Are there any user-facing changes? None. * GitHub Issue: apache#41334 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>

…text::Init called in hash_join_benchmark (apache#41716) ### Rationale for this change My local compilation parameters will include the compilation of some basic benchmarks. I discovered this compilation problem today. It seems that apache#41334 of `QueryContext::Init` is not synchronized to `hash_join_benchmark.cc`, and CI has not found this problem. . ### What changes are included in this PR? Remove the first arg . ### Are these changes tested? Needn't ### Are there any user-facing changes? No * GitHub Issue: apache#41720 Lead-authored-by: ZhangHuiGui <2689496754@qq.com> Co-authored-by: ZhangHuiGui <hugo.zhang@openpie.com> Signed-off-by: Antoine Pitrou <antoine@python.org>

…mitigate overflow (apache#41335) ### Rationale for this change The risk of temp vector stack overflow still exists as described in apache#41334 . Many people have agreed on a per-node basis approach: > 1) it doesn't introduce more performance penalty than shared stack; 2) it can mitigate the overflow in a natural way, i.e., expanding the stack size linear to the number of nodes; 3) it requires no more complexity to the existing stack implementation. The full (but long) story is also revealed in the subsequent discussion of this PR. Feel free to scroll down. ### What changes are included in this PR? 1. Change the current shared (per-thread) temp vector stack usage to per-node basis. 2. Make the stack size required by each stack user more explicit. ### Are these changes tested? UT included. ### Are there any user-facing changes? None. * GitHub Issue: apache#41334 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>

zanmato1984 added the Type: enhancement label Apr 22, 2024

github-actions bot added the Component: C++ label Apr 22, 2024

zanmato1984 mentioned this issue Apr 22, 2024

GH-41334: [C++][Acero] Use per-node basis temp vector stack to mitigate overflow #41335

Merged

github-actions bot assigned zanmato1984 Apr 22, 2024

zanmato1984 changed the title ~~[C++][Acero] Use env var to tune Acero's temp stack size~~ [C++][Acero] Acero's shared (per-thread) temp vector stack usage may overflow May 8, 2024

pitrou added this to the 17.0.0 milestone May 14, 2024

pitrou closed this as completed May 14, 2024

This was referenced May 19, 2024

GH-41720: [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark #41716

Merged

[C++][Acero] An useless parameter for QueryContext::Init called in hash_join_benchmark #41720

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[C++][Acero] Acero's shared (per-thread) temp vector stack usage may overflow #41334

[C++][Acero] Acero's shared (per-thread) temp vector stack usage may overflow #41334

zanmato1984 commented Apr 22, 2024 •

edited

pitrou commented May 14, 2024

[C++][Acero] Acero's shared (per-thread) temp vector stack usage may overflow #41334

[C++][Acero] Acero's shared (per-thread) temp vector stack usage may overflow #41334

Comments

zanmato1984 commented Apr 22, 2024 • edited

Describe the enhancement requested

Component(s)

pitrou commented May 14, 2024

zanmato1984 commented Apr 22, 2024 •

edited