Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Acero] Acero's shared (per-thread) temp vector stack usage may overflow #41334

Closed
zanmato1984 opened this issue Apr 22, 2024 · 1 comment

Comments

@zanmato1984
Copy link
Collaborator

zanmato1984 commented Apr 22, 2024

Describe the enhancement requested

Currently the size of Acero's temp stack is hardcoded in source:

memory_pool(), 32 * util::MiniBatch::kMiniBatchLength * sizeof(uint64_t)));

We've observed issues of hang/crash caused by this stack overflow, to name two: #39582, #39951 . And the hardcoded stack size has been once bumped, along with a more explicit stack overflow error, to solve them: #40007. But as long as the stack size is still hardcoded (and not insanely big), it won't be surprising if the overflow happens again for a more complex plan. And once that happened, one wouldn't be able to mitigate this issue without patching the source code (with a bigger stack size) and building the binary from scratch.

I think an env var might be well suitable to tune this stack size at runtime, to help the user quick mitigate the overflow.

But if we get lucky, overflow may still happen - a concrete case: https://github.com/apache/arrow/pull/41335/files#diff-6c3d49ae137b8279afb2610f6d16010618e451b4da2b4279637cf44175b0cd71R3206-R3254

Component(s)

C++

@zanmato1984 zanmato1984 changed the title [C++][Acero] Use env var to tune Acero's temp stack size [C++][Acero] Acero's shared (per-thread) temp vector stack usage may overflow May 8, 2024
pitrou pushed a commit that referenced this issue May 14, 2024
…te overflow (#41335)

### Rationale for this change

The risk of temp vector stack overflow still exists as described in #41334 . Many people have agreed on a per-node basis approach:
> 1) it doesn't introduce more performance penalty than shared stack; 2) it can mitigate the overflow in a natural way, i.e., expanding the stack size linear to the number of nodes; 3) it requires no more complexity to the existing stack implementation.

The full (but long) story is also revealed in the subsequent discussion of this PR. Feel free to scroll down.

### What changes are included in this PR?

1. Change the current shared (per-thread) temp vector stack usage to per-node basis.
2. Make the stack size required by each stack user more explicit.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

None.

* GitHub Issue: #41334

Authored-by: Ruoxi Sun <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 17.0.0 milestone May 14, 2024
@pitrou
Copy link
Member

pitrou commented May 14, 2024

Issue resolved by pull request 41335
#41335

@pitrou pitrou closed this as completed May 14, 2024
pitrou pushed a commit that referenced this issue May 23, 2024
…Init called in hash_join_benchmark (#41716)

### Rationale for this change
My local compilation parameters will include the compilation of some basic benchmarks. I discovered this compilation problem today. It seems that #41334 of `QueryContext::Init` is not synchronized to `hash_join_benchmark.cc`, and CI has not found this problem. .

### What changes are included in this PR?
Remove the first arg .

### Are these changes tested?
Needn't

### Are there any user-facing changes?
No

* GitHub Issue: #41720

Lead-authored-by: ZhangHuiGui <2689496754@qq.com>
Co-authored-by: ZhangHuiGui <hugo.zhang@openpie.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
…mitigate overflow (apache#41335)

### Rationale for this change

The risk of temp vector stack overflow still exists as described in apache#41334 . Many people have agreed on a per-node basis approach:
> 1) it doesn't introduce more performance penalty than shared stack; 2) it can mitigate the overflow in a natural way, i.e., expanding the stack size linear to the number of nodes; 3) it requires no more complexity to the existing stack implementation.

The full (but long) story is also revealed in the subsequent discussion of this PR. Feel free to scroll down.

### What changes are included in this PR?

1. Change the current shared (per-thread) temp vector stack usage to per-node basis.
2. Make the stack size required by each stack user more explicit.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

None.

* GitHub Issue: apache#41334

Authored-by: Ruoxi Sun <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
…text::Init called in hash_join_benchmark (apache#41716)

### Rationale for this change
My local compilation parameters will include the compilation of some basic benchmarks. I discovered this compilation problem today. It seems that apache#41334 of `QueryContext::Init` is not synchronized to `hash_join_benchmark.cc`, and CI has not found this problem. .

### What changes are included in this PR?
Remove the first arg .

### Are these changes tested?
Needn't

### Are there any user-facing changes?
No

* GitHub Issue: apache#41720

Lead-authored-by: ZhangHuiGui <2689496754@qq.com>
Co-authored-by: ZhangHuiGui <hugo.zhang@openpie.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue May 29, 2024
…mitigate overflow (apache#41335)

### Rationale for this change

The risk of temp vector stack overflow still exists as described in apache#41334 . Many people have agreed on a per-node basis approach:
> 1) it doesn't introduce more performance penalty than shared stack; 2) it can mitigate the overflow in a natural way, i.e., expanding the stack size linear to the number of nodes; 3) it requires no more complexity to the existing stack implementation.

The full (but long) story is also revealed in the subsequent discussion of this PR. Feel free to scroll down.

### What changes are included in this PR?

1. Change the current shared (per-thread) temp vector stack usage to per-node basis.
2. Make the stack size required by each stack user more explicit.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

None.

* GitHub Issue: apache#41334

Authored-by: Ruoxi Sun <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants