Bus error (core dumped) #71233
Comments
/sig node |
I also saw this error with the postgres:10.6 image after an upgrade from 1.9.3 to 1.11.1. |
Seeing this error with postgres:9.4 on Ubuntu 18.04, on a Kubernetes cluster with vSphere volume provisioning |
I believe I hit the same issue (postgres works through docker run, but not k8s). The issue I hit was that huge pages were enabled, but they were not working through k8s, and Postgres wouldn't fall back properly to not using huge pages. I think there are several possible solutions to the problem:
|
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
It looks like the issue here is that the file /sys/fs/cgroup/hugetlb/kubepods/hugetlb.2MB.limit_in_bytes is set to 0 on the worker nodes. This prevents pods from allocating huge pages. Setting this limit to a higher value solves it (this has to be done on every worker node).
|
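One way to confirm this on a node is to read the limit file and check whether it holds zero. The following is only a sketch: `parse_limit_bytes` and `read_limit_file` are hypothetical helper names, and it assumes the cgroup v1 layout quoted above, where the file contains a single decimal byte count.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parses the unsigned decimal number at the start of text (for example "0\n").
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 * Hypothetical helper, not part of any Kubernetes or kernel API. */
static int parse_limit_bytes(const char *text, unsigned long long *out)
{
    assert(text != NULL && out != NULL);

    /* Reject anything that does not start with a digit (e.g. "max" or "-1"). */
    if (text[0] < '0' || text[0] > '9')
        return -1;

    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;

    *out = value;
    return 0;
}

/* Reads the first line of a cgroup-style control file and parses it.
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
static int read_limit_file(const char *path, unsigned long long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    if (line == NULL)
        return -1;

    return parse_limit_bytes(buf, out);
}
```

Calling `read_limit_file("/sys/fs/cgroup/hugetlb/kubepods/hugetlb.2MB.limit_in_bytes", &limit)` on a worker node and getting `limit == 0` would match the situation described above.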
I've investigated the crash. The culprit is that kubelet doesn't update
The code to reproduce the crash:

    #define _GNU_SOURCE /* for MAP_HUGETLB, MAP_POPULATE, MADV_DONTFORK */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define HUGEPAGE_FILE_DIR "/dev/hugepages/test.XXXXXX"
    #define HUGEPAGE_FILE_LEN (sizeof(HUGEPAGE_FILE_DIR))
    #define LENGTH 2097152 /* one 2 MiB huge page */

    int main(void) {
        void *addr = NULL;
        int ret = 0;
        int hpg_fd;
        char hpg_fname[HUGEPAGE_FILE_LEN];

        /* Create a temporary file on the hugetlbfs mount and unlink it right away. */
        snprintf(hpg_fname, sizeof(hpg_fname), "%s", HUGEPAGE_FILE_DIR);
        hpg_fd = mkstemp(hpg_fname);
        if (hpg_fd < 0) {
            printf("Can't create file in /dev/hugepages/\n");
            goto end;
        }
        unlink(hpg_fname);

        /* With MAP_ANONYMOUS the kernel ignores the fd argument. */
        addr = mmap(NULL,
                    LENGTH,
                    PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_HUGETLB,
                    hpg_fd,
                    0);
        if (MAP_FAILED == addr) {
            printf("Can't mmap\n");
            close(hpg_fd);
            goto end;
        }
        ret = madvise(addr, LENGTH, MADV_DONTFORK);
        if (0 != ret) {
            munmap(addr, LENGTH);
            printf("Can't madvise\n");
            close(hpg_fd);
            goto end;
        }
        printf("Test it\n");
        /* Read from the huge page mapping. */
        int test = *(int *)addr;
        printf("Test value is %d\n", test);
    end:
        printf("End\n");
        return 0;
    }

Test pod's spec:

    apiVersion: v1
    kind: Pod
    metadata:
      name: hptest
      labels:
        app: hptest
    spec:
      containers:
      - name: hptest-container
        image: nginx
        imagePullPolicy: IfNotPresent
        securityContext:
          capabilities:
            add: ["IPC_LOCK"]
        command: ['sh', '-c', 'echo Hello Kubernetes! && sleep 7200']
        resources:
          limits:
            memory: "2G"
            cpu: 2
            hugepages-2Mi: "200Mi"
          requests:
            memory: "2G"
            cpu: 2
            hugepages-2Mi: "200Mi"
        volumeMounts:
        - name: hugepage
          mountPath: /dev/hugepages
        - name: home
          mountPath: /hugepage_test
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
      - name: home
        hostPath:
          path: /home/user/hugepage_test

Also, the precondition is that huge pages must be disabled when kubelet starts. A possible workaround is to restart kubelet after every change to the huge pages configuration. |
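To decide whether a node is in this state, and so whether a kubelet restart might be needed, one could compare the host's huge page pool with the kubepods cgroup limit mentioned earlier in the thread. The sketch below assumes the usual `/proc/meminfo` text format (`HugePages_Total:` as a page count, `Hugepagesize:` in kB). `meminfo_value` and `hugepage_limit_looks_stale` are hypothetical names chosen for illustration, and the "pool is non-empty but the limit is 0" rule is an assumption drawn from the comments above, not kubelet's own logic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Looks for a line starting with "<key>:" in meminfo-formatted text and
 * parses the unsigned number that follows. Returns 0 on success, -1 if the
 * key is missing or has no numeric value. Hypothetical helper. */
static int meminfo_value(const char *text, const char *key,
                         unsigned long long *out)
{
    assert(text != NULL && key != NULL && out != NULL);

    size_t klen = strlen(key);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            /* %llu skips the padding spaces after the colon. */
            return sscanf(line + klen + 1, "%llu", out) == 1 ? 0 : -1;
        }
        const char *nl = strchr(line, '\n');
        line = (nl != NULL) ? nl + 1 : NULL;
    }
    return -1;
}

/* Returns 1 when the host has huge pages configured but the cgroup limit is
 * still 0, which is the situation described in this thread; 0 otherwise.
 * The rule itself is an assumption made for this sketch. */
static int hugepage_limit_looks_stale(unsigned long long pool_pages,
                                      unsigned long long page_size_kb,
                                      unsigned long long cgroup_limit_bytes)
{
    unsigned long long pool_bytes = pool_pages * page_size_kb * 1024ULL;
    return pool_bytes > 0 && cgroup_limit_bytes == 0;
}
```

In practice the text would come from reading `/proc/meminfo` and the limit from the `hugetlb.2MB.limit_in_bytes` file on the node. A result of 1 would suggest trying the kubelet restart workaround.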
/assign |
/assign |
@zouyee ping |
/remove-kind support |
/triage accepted Reproducer and diagnosis in #71233 (comment) /help |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
not stale |
Same issue here with k8s 1.23.17 and pgsql 13.8.0-debian-11-r26. Does this mean that huge_pages for pgsql is currently unavailable in k8s? |
Going to be fixed in runc 1.1.10 (see opencontainers/runc#3859, opencontainers/runc#4077) |
As this was fixed via runc 1.1.10, #121739 closed this bug and the fix is available in 1.29+. /close |
@ehashman: Closing this issue. |
What happened:
I'm getting
Bus error (core dumped)
on some images after updating from 1.9 to 1.12. At first I thought it was my fault, so I completely removed Kubernetes from all nodes and reinstalled it.
But that didn't help. I then suspected a problem with shared memory and tried mounting some volumes at /dev/shm/, but that didn't help either. On plain Docker on the same host everything works fine. Here are some of the images I've had the issue with:
postgres:9.6.5 - but I guess it's not a problem with this version (docker-library/postgres#451)
gitlab:gitlab/gitlab-ce:10.3.3-ce.0 - error at the same place as in postgres, on initdb.
richarvey/nginx-php-fpm - some images based on this one work fine, but some do not.
webdevops/php-nginx:alpine-php7 - almost like the previous one, but this one restarts automatically, and (omg) it started on the 150th try:
wordpress - problem when executing a PHP script:
And here is part of the content of this file:
What you expected to happen:
I want to get rid of the "core dumped" error, so that things work as they did on Kubernetes v1.9.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
kubectl version:
Bare metal, sudo lshw output:
uname -a: Linux md1 4.4.0-104-lowlatency #127-Ubuntu SMP PREEMPT Mon Dec 11 13:07:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
kubeadm
/kind bug
/sig app
/sig release