Bus error (core dumped) #71233

Closed
ZzEeKkAa opened this issue Nov 19, 2018 · 39 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ZzEeKkAa

ZzEeKkAa commented Nov 19, 2018

What happened:

I'm getting Bus error (core dumped) on some images after updating from 1.9 to 1.12.
At first I thought it was my fault, so I completely removed Kubernetes from all nodes and reinstalled it.
But that didn't help. I then suspected a problem with shared memory and tried mounting volumes at /dev/shm/, but that didn't help either. With plain Docker on the same host everything works fine. Here are some of the images I have the issue with:
postgres:9.6.5 - but I guess it's not a problem with this version (docker-library/postgres#451)

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
 The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
 Data page checksums are disabled.
 fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
Bus error (core dumped)
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
running bootstrap script ...

gitlab: gitlab/gitlab-ce:10.3.3-ce.0 - error at the same place as in postgres, on initdb.
richarvey/nginx-php-fpm - some images based on this one work fine, but others do not.
webdevops/php-nginx:alpine-php7 - almost like the previous one, but this one restarts automatically, and (omg) it started on the 150th try:

.....
2018-11-19 21:44:23,484 INFO exited: php-fpmd (terminated by SIGBUS (core dumped); not expected)
2018-11-19 21:44:24,486 INFO spawned: 'php-fpmd' with pid 348
-> Executing /opt/docker/bin/service.d/php-fpm.d//10-init.sh
2018-11-19 21:44:24,494 INFO success: php-fpmd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Setting php-fpm user to application
2018-11-19 21:44:24,683 INFO exited: php-fpmd (terminated by SIGBUS (core dumped); not expected)
2018-11-19 21:44:25,685 INFO spawned: 'php-fpmd' with pid 354
-> Executing /opt/docker/bin/service.d/php-fpm.d//10-init.sh
2018-11-19 21:44:25,695 INFO success: php-fpmd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Setting php-fpm user to application
.....

2018-11-19 21:46:28,206 INFO exited: php-fpmd (terminated by SIGBUS (core dumped); not expected)
2018-11-19 21:46:29,209 INFO spawned: 'php-fpmd' with pid 948
-> Executing /opt/docker/bin/service.d/php-fpm.d//10-init.sh
2018-11-19 21:46:29,220 INFO success: php-fpmd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Setting php-fpm user to application
2018-11-19 21:46:29,417 INFO exited: php-fpmd (terminated by SIGBUS (core dumped); not expected)
2018-11-19 21:46:30,418 INFO spawned: 'php-fpmd' with pid 953
-> Executing /opt/docker/bin/service.d/php-fpm.d//10-init.sh
2018-11-19 21:46:30,423 INFO success: php-fpmd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Setting php-fpm user to application
[19-Nov-2018 21:46:30] NOTICE: fpm is running, pid 953
[19-Nov-2018 21:46:30] NOTICE: ready to handle connections

wordpress - problem when executing a PHP script:

/usr/local/bin/docker-entrypoint.sh: line 242:   181 Bus error               (core dumped) TERM=dumb php --  <<'EOPHP'

And here is part of the content of this file:

                TERM=dumb php -- <<'EOPHP'
<?php
// database might not exist, so let's try creating it (just to be safe)

$stderr = fopen('php://stderr', 'w');

// https://codex.wordpress.org/Editing_wp-config.php#MySQL_Alternate_Port
//   "hostname:port"
// https://codex.wordpress.org/Editing_wp-config.php#MySQL_Sockets_or_Pipes
//   "hostname:unix-socket-path"
list($host, $socket) = explode(':', getenv('WORDPRESS_DB_HOST'), 2);
$port = 0;
if (is_numeric($socket)) {
        $port = (int) $socket;
        $socket = null;
}
$user = getenv('WORDPRESS_DB_USER');
$pass = getenv('WORDPRESS_DB_PASSWORD');
$dbName = getenv('WORDPRESS_DB_NAME');

$maxTries = 10;
do {
        $mysql = new mysqli($host, $user, $pass, '', $port, $socket);
        if ($mysql->connect_error) {
                fwrite($stderr, "\n" . 'MySQL Connection Error: (' . $mysql->connect_errno . ') ' . $mysql->connect_error . "\n");
                --$maxTries;
                if ($maxTries <= 0) {
                        exit(1);
                }
                sleep(3);
        }
} while ($mysql->connect_error);

if (!$mysql->query('CREATE DATABASE IF NOT EXISTS `' . $mysql->real_escape_string($dbName) . '`')) {
        fwrite($stderr, "\n" . 'MySQL "CREATE DATABASE" Error: ' . $mysql->error . "\n");
        $mysql->close();
        exit(1);
}

$mysql->close();
EOPHP
        fi

        # now that we're definitely done writing configuration, let's clear out the relevant envrionment variables (so that stray "phpinfo()" calls don't leak secrets from our code)
        for e in "${envs[@]}"; do
                unset "$e"
        done
fi

exec "$@"
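For reference, the exit code 135 in the initdb output and the SIGBUS reported for php-fpmd look like the same kind of failure: shells conventionally report death by signal as 128 plus the signal number, and SIGBUS is 7 on Linux x86_64. A minimal sketch for decoding such codes (the helper name is made up for illustration):

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name for a shell-style exit code (128 + signo), or None."""
    if code <= 128:
        return None  # ordinary exit status, not a signal death
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


if __name__ == "__main__":
    # The code reported by initdb above; the result depends on the platform's
    # signal numbering.
    print(signal_from_exit_code(135))
```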

What you expected to happen:

I want to get rid of the core dumped error, as it was on Kubernetes v1.9.

How to reproduce it (as minimally and precisely as possible):

apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: v1
kind: Pod
metadata:
  name: postgresql
  namespace: test
  labels:
    app: postgresql
spec:
  nodeSelector:
    kubernetes.io/hostname: md2
  containers:
  - name: postgres
    image: postgres:9.6.5
    ports:
    - containerPort: 5432
      hostPort: 5432
    volumeMounts:
    - mountPath: /dev/shm
      name: dshm
  volumes:
  - name: dshm
#    hostPath:
#      path: /dev/shm
    emptyDir:
      medium: Memory

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Bare metal:
sudo lshw output:

    description: Rack Mount Chassis
    product: ProLiant DL20 Gen9 (823556-B21)
    vendor: HP
    serial: CZ274504G1
    width: 64 bits
    capabilities: smbios-2.8 dmi-2.8 vsyscall32
    configuration: boot=normal chassis=rackmount family=ProLiant sku=823556-B21 uuid=38323335-3536-435A-3237-343530344731
  *-core
       description: Motherboard
       product: ProLiant DL20 Gen9
       vendor: HP
       physical id: 0
       serial: CZ274504G1
     *-cache:0
          description: L1 cache
          physical id: 0
          slot: L1-Cache
          size: 256KiB
          capacity: 256KiB
          capabilities: synchronous internal write-back unified
          configuration: level=1
     *-cache:1
          description: L2 cache
          physical id: 1
          slot: L2-Cache
          size: 1MiB
          capacity: 1MiB
          capabilities: synchronous internal varies unified
          configuration: level=2
     *-cache:2
          description: L3 cache
          physical id: 2
          slot: L3-Cache
          size: 8MiB
          capacity: 8MiB
          capabilities: synchronous internal varies unified
          configuration: level=3
     *-cpu
          description: CPU
          product: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
          vendor: Intel Corp.
          physical id: 3
          bus info: cpu@0
          version: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
          serial: To Be Filled By O.E.M.
          slot: Proc 1
          size: 940MHz
          capacity: 3900MHz
          width: 64 bits
          clock: 100MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp cpufreq
          configuration: cores=4 enabledcores=4 threads=8
     *-firmware
          description: BIOS
          vendor: HP
          physical id: 4
          version: U22
          date: 10/02/2017
          size: 64KiB
          capacity: 15MiB
          capabilities: pci pnp upgrade shadowing escd cdboot bootselect edd int13floppy360 int13floppy1200 int13floppy720 int5printscreen int9keyboard int14serial int17printer int10video acpi usb biosbootspecification netboot uefi
     *-memory
          description: System Memory
          physical id: 6
          slot: System board or motherboard
          size: 64GiB
        *-bank:0
             description: DIMM Synchronous 2133 MHz (0.5 ns)
             product: NOT AVAILABLE
             vendor: UNKNOWN
             physical id: 0
             slot: PROC 1 DIMM 1
             size: 16GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
        *-bank:1
             description: DIMM Synchronous 2133 MHz (0.5 ns)
             product: NOT AVAILABLE
             vendor: UNKNOWN
             physical id: 1
             slot: PROC 1 DIMM 2
             size: 16GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
        *-bank:2
             description: DIMM Synchronous 2133 MHz (0.5 ns)
             product: NOT AVAILABLE
             vendor: UNKNOWN
             physical id: 2
             slot: PROC 1 DIMM 3
             size: 16GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
        *-bank:3
             description: DIMM Synchronous 2133 MHz (0.5 ns)
             product: NOT AVAILABLE
             vendor: UNKNOWN
             physical id: 3
             slot: PROC 1 DIMM 4
             size: 16GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
     *-pci
          description: Host bridge
          product: Intel Corporation
          vendor: Intel Corporation
          physical id: 100
          bus info: pci@0000:00:00.0
          version: 05
          width: 32 bits
          clock: 33MHz
        *-usb
             description: USB controller
             product: Sunrise Point-H USB 3.0 xHCI Controller
             vendor: Intel Corporation
             physical id: 14
             bus info: pci@0000:00:14.0
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi xhci bus_master cap_list
             configuration: driver=xhci_hcd latency=0
             resources: iomemory:2f0-2ef irq:27 memory:2ffff00000-2ffff0ffff
           *-usbhost:0
                product: xHCI Host Controller
                vendor: Linux 4.4.0-104-lowlatency xhci-hcd
                physical id: 0
                bus info: usb@3
                logical name: usb3
                version: 4.04
                capabilities: usb-3.00
                configuration: driver=hub slots=6 speed=5000Mbit/s
           *-usbhost:1
                product: xHCI Host Controller
                vendor: Linux 4.4.0-104-lowlatency xhci-hcd
                physical id: 1
                bus info: usb@2
                logical name: usb2
                version: 4.04
                capabilities: usb-2.00
                configuration: driver=hub slots=12 speed=480Mbit/s
              *-usb
                   description: USB hub
                   product: Hub
                   vendor: Standard Microsystems Corp.
                   physical id: 3
                   bus info: usb@2:3
                   version: 8.01
                   capabilities: usb-2.00
                   configuration: driver=hub maxpower=2mA slots=2 speed=480Mbit/s
        *-communication UNCLAIMED
             description: Communication controller
             product: Sunrise Point-H CSME HECI #1
             vendor: Intel Corporation
             physical id: 16
             bus info: pci@0000:00:16.0
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list
             configuration: latency=0
             resources: iomemory:2f0-2ef memory:2ffff11000-2ffff11fff
        *-storage
             description: SATA controller
             product: Sunrise Point-H SATA controller [AHCI mode]
             vendor: Intel Corporation
             physical id: 17
             bus info: pci@0000:00:17.0
             version: 31
             width: 32 bits
             clock: 66MHz
             capabilities: storage msi pm ahci_1.0 bus_master cap_list
             configuration: driver=ahci latency=0
             resources: irq:28 memory:92c80000-92c87fff memory:92c8c000-92c8c0ff ioport:2040(size=8) ioport:2048(size=4) ioport:2020(size=32) memory:92c00000-92c7ffff
        *-pci:0
             description: PCI bridge
             product: Sunrise Point-H PCI Express Root Port #9
             vendor: Intel Corporation
             physical id: 1d
             bus info: pci@0000:00:1d.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:25 ioport:1000(size=4096) memory:90000000-92afffff
           *-generic:0 UNCLAIMED
                description: System peripheral
                product: Integrated Lights-Out Standard Slave Instrumentation & System Support
                vendor: Hewlett-Packard Company
                physical id: 0
                bus info: pci@0000:01:00.0
                version: 06
                width: 32 bits
                clock: 33MHz
                capabilities: pm msi pciexpress bus_master cap_list
                configuration: latency=0
                resources: ioport:1200(size=256) memory:92a8d000-92a8d1ff ioport:1100(size=256)
           *-display UNCLAIMED
                description: VGA compatible controller
                product: MGA G200EH
                vendor: Matrox Electronics Systems Ltd.
                physical id: 0.1
                bus info: pci@0000:01:00.1
                version: 01
                width: 32 bits
                clock: 33MHz
                capabilities: pm msi pciexpress vga_controller bus_master cap_list
                configuration: latency=0
                resources: memory:91000000-91ffffff memory:92a88000-92a8bfff memory:92000000-927fffff
           *-generic:1
                description: System peripheral
                product: Integrated Lights-Out Standard Management Processor Support and Messaging
                vendor: Hewlett-Packard Company
                physical id: 0.2
                bus info: pci@0000:01:00.2
                version: 06
                width: 32 bits
                clock: 33MHz
                capabilities: pm msi pciexpress bus_master cap_list
                configuration: driver=hpilo latency=0
                resources: irq:17 ioport:1000(size=256) memory:92a8c000-92a8c0ff memory:92900000-929fffff memory:92a00000-92a7ffff memory:92a80000-92a87fff memory:92800000-928fffff
           *-usb
                description: USB controller
                product: Integrated Lights-Out Standard Virtual USB Controller
                vendor: Hewlett-Packard Company
                physical id: 0.4
                bus info: pci@0000:01:00.4
                version: 03
                width: 32 bits
                clock: 33MHz
                capabilities: msi pciexpress pm uhci bus_master cap_list
                configuration: driver=uhci_hcd latency=0
                resources: irq:17 ioport:1300(size=32)
              *-usbhost
                   product: UHCI Host Controller
                   vendor: Linux 4.4.0-104-lowlatency uhci_hcd
                   physical id: 1
                   bus info: usb@1
                   logical name: usb1
                   version: 4.04
                   capabilities: usb-1.10
                   configuration: driver=hub slots=2 speed=12Mbit/s
        *-pci:1
             description: PCI bridge
             product: Sunrise Point-H PCI Express Root Port #11
             vendor: Intel Corporation
             physical id: 1d.2
             bus info: pci@0000:00:1d.2
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:26 memory:fe800000-fe8fffff ioport:92b00000(size=1048576)
           *-network:0
                description: Ethernet interface
                product: NetXtreme BCM5720 Gigabit Ethernet PCIe
                vendor: Broadcom Corporation
                physical id: 0
                bus info: pci@0000:02:00.0
                logical name: eno1
                version: 00
                serial: ec:eb:b8:5d:5a:e8
                size: 1Gbit/s
                capacity: 1Gbit/s
                width: 64 bits
                clock: 33MHz
                capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
                configuration: autonegotiation=on broadcast=yes driver=tg3 driverversion=3.137 duplex=full firmware=5720-v1.39 NCSI v1.4.16.0 ip=89.184.66.47 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
                resources: irq:18 memory:92b30000-92b3ffff memory:92b40000-92b4ffff memory:92b50000-92b5ffff memory:fe800000-fe83ffff
           *-network:1 DISABLED
                description: Ethernet interface
                product: NetXtreme BCM5720 Gigabit Ethernet PCIe
                vendor: Broadcom Corporation
                physical id: 0.1
                bus info: pci@0000:02:00.1
                logical name: eno2
                version: 00
                serial: ec:eb:b8:5d:5a:e9
                capacity: 1Gbit/s
                width: 64 bits
                clock: 33MHz
                capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
                configuration: autonegotiation=on broadcast=yes driver=tg3 driverversion=3.137 firmware=5720-v1.39 NCSI v1.4.16.0 latency=0 link=no multicast=yes port=twisted pair
                resources: irq:19 memory:92b00000-92b0ffff memory:92b10000-92b1ffff memory:92b20000-92b2ffff memory:fe840000-fe87ffff
        *-isa
             description: ISA bridge
             product: Sunrise Point-H LPC Controller
             vendor: Intel Corporation
             physical id: 1f
             bus info: pci@0000:00:1f.0
             version: 31
             width: 32 bits
             clock: 33MHz
             capabilities: isa bus_master
             configuration: latency=0
        *-memory UNCLAIMED
             description: Memory controller
             product: Sunrise Point-H PMC
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@0000:00:1f.2
             version: 31
             width: 32 bits
             clock: 33MHz (30.3ns)
             capabilities: bus_master
             configuration: latency=0
             resources: memory:92c88000-92c8bfff
        *-serial
             description: SMBus
             product: Sunrise Point-H SMBus
             vendor: Intel Corporation
             physical id: 1f.4
             bus info: pci@0000:00:1f.4
             version: 31
             width: 64 bits
             clock: 33MHz
             configuration: driver=i801_smbus latency=0
             resources: iomemory:2f0-2ef irq:16 memory:2ffff10000-2ffff100ff ioport:efa0(size=32)
     *-scsi:0
          physical id: 5
          logical name: scsi0
          capabilities: emulated
        *-disk
             description: ATA Disk
             product: ST1000DM010-2EP1
             vendor: Seagate
             physical id: 0.0.0
             bus info: scsi@0:0.0.0
             logical name: /dev/sda
             version: CC43
             serial: Z9A8H9QD
             size: 931GiB (1TB)
             capabilities: partitioned partitioned:dos
             configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 signature=1bde66a3
           *-volume:0
                description: EXT4 volume
                vendor: Linux
                physical id: 1
                bus info: scsi@0:0.0.0,1
                logical name: /dev/sda1
                version: 1.0
                serial: c519e927-efc2-457b-a2b3-e9936253909d
                size: 237MiB
                capacity: 237MiB
                capabilities: primary bootable multi journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
                configuration: created=2017-12-29 18:59:23 filesystem=ext4 modified=2017-12-29 18:59:23 state=clean
           *-volume:1
                description: Linux swap volume
                physical id: 2
                bus info: scsi@0:0.0.0,2
                logical name: /dev/sda2
                version: 1
                serial: d3552ee3-1cf8-4af9-ab61-9d391485a57f
                size: 119GiB
                capacity: 119GiB
                capabilities: primary multi swap initialized
                configuration: filesystem=swap pagesize=4096
           *-volume:2
                description: EXT4 volume
                vendor: Linux
                physical id: 3
                bus info: scsi@0:0.0.0,3
                logical name: /dev/sda3
                version: 1.0
                serial: d34dfb91-afe5-4074-b5ce-56fd14cf7830
                size: 372GiB
                capacity: 372GiB
                capabilities: primary multi journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
                configuration: created=2017-12-29 19:00:47 filesystem=ext4 modified=2017-12-29 19:00:47 state=clean
           *-volume:3
                description: EXT4 volume
                vendor: Linux
                physical id: 4
                bus info: scsi@0:0.0.0,4
                logical name: /dev/sda4
                version: 1.0
                serial: 0d7301bc-6882-4770-8a97-6065e504e193
                size: 439GiB
                capacity: 439GiB
                capabilities: primary multi journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
                configuration: created=2017-12-29 19:00:49 filesystem=ext4 modified=2017-12-29 19:00:49 state=clean
     *-scsi:1
          physical id: 7
          logical name: scsi1
          capabilities: emulated
        *-disk
             description: ATA Disk
             product: ST1000DM010-2EP1
             vendor: Seagate
             physical id: 0.0.0
             bus info: scsi@1:0.0.0
             logical name: /dev/sdb
             version: CC43
             serial: Z9A8D6CD
             size: 931GiB (1TB)
             capabilities: partitioned partitioned:dos
             configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 signature=e01a79e7
           *-volume:0
                description: EXT4 volume
                vendor: Linux
                physical id: 1
                bus info: scsi@1:0.0.0,1
                logical name: /dev/sdb1
                version: 1.0
                serial: 2af99ccb-f00f-4c0f-8eaa-ede0c4e2f769
                size: 237MiB
                capacity: 237MiB
                capabilities: primary bootable multi journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
                configuration: created=2017-12-29 18:59:23 filesystem=ext4 modified=2017-12-29 18:59:23 state=clean
           *-volume:1
                description: Linux swap volume
                physical id: 2
                bus info: scsi@1:0.0.0,2
                logical name: /dev/sdb2
                version: 1
                serial: 7e221fb2-312c-4af3-b7c5-5c1e26c1d3ad
                size: 119GiB
                capacity: 119GiB
                capabilities: primary multi swap initialized
                configuration: filesystem=swap pagesize=4096
           *-volume:2
                description: EXT4 volume
                vendor: Linux
                physical id: 3
                bus info: scsi@1:0.0.0,3
                logical name: /dev/sdb3
                version: 1.0
                serial: bf7c2da4-0621-46b4-9ffc-d93a446ff606
                size: 372GiB
                capacity: 372GiB
                capabilities: primary multi journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
                configuration: created=2017-12-29 19:02:59 filesystem=ext4 modified=2017-12-29 19:02:59 state=clean
           *-volume:3
                description: EXT4 volume
                vendor: Linux
                physical id: 4
                bus info: scsi@1:0.0.0,4
                logical name: /dev/sdb4
                version: 1.0
                serial: ca144958-7752-48b4-b33a-05fac0c3dd68
                size: 439GiB
                capacity: 439GiB
                capabilities: primary multi journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
                configuration: created=2017-12-29 19:03:01 filesystem=ext4 modified=2017-12-29 19:03:01 state=clean
     *-scsi:2
          physical id: 8
          logical name: scsi4
          capabilities: emulated
        *-disk
             description: ATA Disk
             product: Samsung SSD 850
             physical id: 0.0.0
             bus info: scsi@4:0.0.0
             logical name: /dev/sdc
             version: 4B6Q
             serial: S39KNX0J745113J
             size: 238GiB (256GB)
             capabilities: partitioned partitioned:dos
             configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=2dfc3098
           *-volume
                description: EXT4 volume
                vendor: Linux
                physical id: 1
                bus info: scsi@4:0.0.0,1
                logical name: /dev/sdc1
                logical name: /db/ssd
                version: 1.0
                serial: 85e87f84-1b0b-42f9-8782-f8d09df568c2
                size: 238GiB
                capacity: 238GiB
                capabilities: primary journaled extended_attributes large_files huge_files dir_nlink recover extents ext4 ext2 initialized
                configuration: created=2017-12-29 18:59:33 filesystem=ext4 lastmountpoint=/db/ssd modified=2018-11-17 02:23:09 mount.fstype=ext4 mount.options=rw,relatime,data=ordered mounted=2018-11-17 02:23:09 state=mounted
  *-power UNCLAIMED
       description: Power Supply 1
       vendor: HP
       physical id: 1
       capacity: 32768mWh
  *-network
       description: Ethernet interface
       physical id: 2
       logical name: flannel.1
       serial: f6:18:0a:c3:f2:77
       capabilities: ethernet physical
       configuration: broadcast=yes driver=vxlan driverversion=0.1 ip=10.244.0.0 link=yes multicast=yes

  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
    Linux md1 4.4.0-104-lowlatency #127-Ubuntu SMP PREEMPT Mon Dec 11 13:07:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
    kubeadm
  • Others:

/kind bug
/sig app
/sig release

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. sig/release Categorizes an issue or PR as relevant to SIG Release. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 19, 2018
@liggitt
Member

liggitt commented Nov 23, 2018

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Nov 23, 2018
@liggitt liggitt added kind/support Categorizes issue or PR as a support question. and removed sig/release Categorizes an issue or PR as relevant to SIG Release. labels Nov 23, 2018
@areed
Contributor

areed commented Nov 28, 2018

I also saw this error with the postgres:10.6 image after an upgrade from 1.9.3 to 1.11.1.

@lwouis

lwouis commented Dec 10, 2018

Seeing this error with postgres:9.4 on Ubuntu 18.04, on a Kubernetes cluster with vSphere volume provisioning.

@nbartos

nbartos commented Dec 14, 2018

I believe I hit the same issue (postgres works through docker run, but not k8s). The issue I hit was that huge pages were enabled, but they were not working through k8s, and Postgres wouldn't fall back properly to not using huge pages. I think there are several possible solutions to the problem:

  1. Modify the Docker image to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before initdb is run (this is what I did).
  2. Turn off huge page support on the system (vm.nr_hugepages = 0 in /etc/sysctl.conf).
  3. Fix Postgres's fallback mechanism when huge_pages = try is set (the default).
  4. Modify the k8s manifest to enable huge page support (https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/).
  5. Modify k8s to show that huge pages are not supported on the system, when they are not enabled for a specific container.
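
To check whether a node falls into this case, one can look at the huge page counters the kernel exposes in /proc/meminfo. A rough sketch, assuming a Linux host; the function name is hypothetical:

```python
def parse_hugepages(meminfo_text):
    """Extract the HugePages_* counters and Hugepagesize (kB) from /proc/meminfo text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and (key.startswith("HugePages_") or key == "Hugepagesize"):
            info[key] = int(rest.split()[0])
    return info


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            counters = parse_hugepages(f.read())
    except OSError:
        counters = {}  # not a Linux host, nothing to report
    # A non-zero HugePages_Total means huge pages are preallocated on the host.
    print(counters)
```

If HugePages_Total is 0, huge pages are not preallocated on that host, so this particular explanation probably does not apply there.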

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2019
@ZzEeKkAa
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 25, 2019
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 23, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 23, 2019
@ErMaVi

ErMaVi commented Jul 29, 2019

It looks like the issue is that /sys/fs/cgroup/hugetlb/kubepods/hugetlb.2MB.limit_in_bytes is set to 0 on the worker nodes, which prevents pods from allocating huge pages. Raising this limit solves it, but it has to be done on every worker node:

echo 9223372036854771712 | sudo tee /sys/fs/cgroup/hugetlb/kubepods/hugetlb.2MB.limit_in_bytes
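
The effect of that limit can be sketched as a small check. The C helper below is only an illustration: the function name and the 2 MiB page size constant are assumptions, and it presumes the cgroup v1 file holds a single decimal byte count. It reports whether a given limit value leaves room for at least one huge page.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define HUGE_PAGE_2MB (2ULL * 1024 * 1024)

/* Hypothetical helper: `contents` is the text read from
 * hugetlb.2MB.limit_in_bytes. Returns 1 if the limit allows at least
 * one 2 MiB page, 0 if it does not, and -1 if the text is not a number. */
int hugetlb_limit_allows_page(const char *contents)
{
	assert(contents != NULL);
	char *end = NULL;

	errno = 0;
	unsigned long long limit = strtoull(contents, &end, 10);
	if (end == contents || errno == ERANGE)
		return -1;
	return limit >= HUGE_PAGE_2MB ? 1 : 0;
}
```

With the value 0 reported above the helper returns 0, and with the large value written by the echo command it returns 1.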

@rojkov

rojkov commented Aug 16, 2019

I've investigated the crash. The culprit is that kubelet doesn't update /sys/fs/cgroup/hugetlb/kubepods/hugetlb.2MB.limit_in_bytes upon Node Status Update, which happens every 5 minutes by default, yet it does update the node's resources correctly after hugepages are enabled on the host. As a result, a workload that uses hugepages can be scheduled on a node whose root cgroup still has misconfigured limits. One possible fix is to add a call to the Cgroup manager's Update() to the Node Status Update procedure.

The code to reproduce the crash:

#define _GNU_SOURCE /* for MAP_HUGETLB, MAP_POPULATE and MADV_DONTFORK */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define HUGEPAGE_FILE_DIR "/dev/hugepages/test.XXXXXX"
#define HUGEPAGE_FILE_LEN (sizeof(HUGEPAGE_FILE_DIR))
#define LENGTH 2097152 /* one 2 MiB huge page */

int main(void) {
	void *addr = NULL;
	int ret = 0;
	int status = EXIT_FAILURE;
	int hpg_fd;
	char hpg_fname[HUGEPAGE_FILE_LEN];

	/* Create and immediately unlink a temporary file on the hugetlbfs mount. */
	snprintf(hpg_fname, sizeof(hpg_fname), "%s", HUGEPAGE_FILE_DIR);
	hpg_fd = mkstemp(hpg_fname);
	if (hpg_fd < 0) {
		printf("Can't create file in /dev/hugepages/\n");
		goto end;
	}
	unlink(hpg_fname);
	/* Note: with MAP_ANONYMOUS the kernel ignores hpg_fd. */
	addr = mmap(NULL,
			LENGTH,
			PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_HUGETLB,
			hpg_fd,
			0);
	if (MAP_FAILED == addr) {
		printf("Can't mmap\n");
		close(hpg_fd);
		goto end;
	}

	ret = madvise(addr, LENGTH, MADV_DONTFORK);
	if (0 != ret) {
		munmap(addr, LENGTH);
		printf("Can't madvise\n");
		close(hpg_fd);
		goto end;
	}

	/* Reading the mapping is expected to raise the bus error on an affected node. */
	printf("Test it\n");
	int test = *(int *)addr;
	printf("Test value is %d\n", test);
	status = EXIT_SUCCESS;

end:
	printf("End\n");
	return status;
}

Test pod's spec:

apiVersion: v1
kind: Pod
metadata:
  name: hptest
  labels:
    app: hptest
spec:
  containers:
  - name: hptest-container
    image: nginx
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    command: ['sh', '-c', 'echo Hello Kubernetes! && sleep 7200']
    resources:
      limits:
        memory: "2G"
        cpu: 2
        hugepages-2Mi: "200Mi"
      requests:
        memory: "2G"
        cpu: 2
        hugepages-2Mi: "200Mi"
    volumeMounts:
      - name: hugepage
        mountPath: /dev/hugepages
      - name: home
        mountPath: /hugepage_test
  volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages
    - name: home
      hostPath:
        path: /home/user/hugepage_test

Run the reproducer inside the pod:

$ kubectl exec -ti hptest -- /hugepage_test/a.out

A precondition is that hugepages must be disabled when kubelet starts.

A possible workaround is to restart kubelet after every change to the hugepages configuration.
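
Whether a node is in the state described above can be approximated by comparing the host's hugepage pool with the root cgroup limit. Below is a rough sketch in C, assuming that an up-to-date limit is at least as large as the pool (number of pages times page size). The function name is invented for illustration and is not part of kubelet.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical check: returns 1 if the host has huge pages configured
 * but the kubepods hugetlb limit is too small to cover them, which
 * suggests the cgroup was not refreshed after the pool changed. */
int hugetlb_limit_is_stale(unsigned long long host_pages,
			   unsigned long long page_size,
			   unsigned long long cgroup_limit)
{
	assert(page_size > 0);
	if (host_pages == 0)
		return 0; /* nothing to cover, so the limit cannot be stale */
	if (host_pages > ULLONG_MAX / page_size)
		return 1; /* pool size overflows; no limit could cover it */
	return cgroup_limit < host_pages * page_size;
}
```

Here host_pages would come from /proc/sys/vm/nr_hugepages and cgroup_limit from /sys/fs/cgroup/hugetlb/kubepods/hugetlb.2MB.limit_in_bytes. A return value of 1 points at the situation where restarting kubelet is the workaround.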

@zouyee
Member

zouyee commented Aug 17, 2019

/assign

@zouyee zouyee removed their assignment Aug 20, 2019
@rojkov

rojkov commented Aug 21, 2019

/assign

@rojkov

rojkov commented Sep 18, 2019

@zouyee ping

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 18, 2022
@ehashman
Member

/remove-kind support

@k8s-ci-robot k8s-ci-robot removed the kind/support Categorizes issue or PR as a support question. label Apr 18, 2022
@ehashman ehashman added this to Triage in SIG Node Bugs Apr 18, 2022
@ehashman
Member

/triage accepted

Reproducer and diagnosis in #71233 (comment)

/help

@k8s-ci-robot
Contributor

@ehashman:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 20, 2022
@ehashman ehashman moved this from Triage to Triaged in SIG Node Bugs Apr 20, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2022
@mythi
Contributor

mythi commented Jul 20, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 18, 2022
@bstrdsmkr

not stale

@kwenzh

kwenzh commented Aug 28, 2023

Same issue on k8s 1.23.17 with pgsql 13.8.0-debian-11-r26.

Does this mean that huge_pages for pgsql is currently unavailable in k8s?

@kolyshkin
Contributor

Going to be fixed in runc 1.1.10 (see opencontainers/runc#3859, opencontainers/runc#4077)

@ehashman
Member

As this was fixed via runc 1.1.10, #121739 closed this bug and the fix is available in 1.29+.

/close

@k8s-ci-robot
Contributor

@ehashman: Closing this issue.

SIG Node Bugs automation moved this from Triaged to Done Apr 24, 2024