[BUG] spark-operator:v1beta2-1.4.3-3.5.0 crashes on start #1987

Closed
1 task done
Aransh opened this issue Apr 17, 2024 · 7 comments · Fixed by #2039
Labels
kind/bug Something isn't working

Comments

@Aransh
Contributor

Aransh commented Apr 17, 2024

  • ✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

Steps to reproduce the behavior:
Using the newest version of the chart (1.2.7) with the image spark-operator:v1beta2-1.4.3-3.5.0 makes the operator pods crash immediately on start.
With the exact same configuration but image version v1beta2-1.4.2-3.5.0, there is no crash.
Values YAML:

replicaCount: 3
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: spark-operator
serviceAccounts:
  spark:
    name: "spark-apps"
  sparkoperator:
    name: "operator-spark"
sparkJobNamespaces: ["spark-apps"]
webhook:
  enable: true
podMonitor:
  enable: true

Expected behavior

The operator pods should start without crashing.

Actual behavior

The operator pods crash immediately on start.

Terminal Output Screenshot(s)

Full log:

++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ echo 0
0
+ echo 0
0
+ echo root:x:0:0:root:/root:/bin/bash
root:x:0:0:root:/root:/bin/bash
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator -v=2 -logtostderr -namespace=spark-apps -enable-ui-service=true -ingress-url-format= -controller-threads=10 -resync-interval=30 -enable-batch-scheduler=false -label-selector-filter= -enable-metrics=true -metrics-labels=app_type -metrics-port=10254 -metrics-endpoint=/metrics -metrics-prefix= -enable-webhook=true -webhook-svc-namespace=operator-spark -webhook-port=8080 -webhook-timeout=30 -webhook-svc-name=spark-operator-devops-playground-webhook -webhook-config-name=spark-operator-devops-playground-webhook-config -webhook-namespace-selector= -enable-resource-quota-enforcement=false -leader-election=true -leader-election-lock-namespace=operator-spark -leader-election-lock-name=spark-operator-lock
F0417 10:59:07.333005 10 main.go:146] Lock identity is empty

goroutine 1 [running]:
github.com/golang/glog.Fatal(...)
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:664
main.main()
/workspace/main.go:146 +0x1418

SIGABRT: abort
PC=0x40708e m=5 sigcode=18446744073709551610

goroutine 1 gp=0xc0000061c0 m=5 mp=0xc00007f808 [running, locked to thread]:
runtime/internal/syscall.Syscall6()
/usr/local/go/src/runtime/internal/syscall/asm_linux_amd64.s:36 +0xe fp=0xc0000c9a88 sp=0xc0000c9a80 pc=0x40708e
syscall.RawSyscall6(0xc00033a088?, 0xc000124270?, 0xc0005a2260?, 0x2be4440?, 0x548220?, 0x2be44d8?, 0xc0000c9af0?)
/usr/local/go/src/runtime/internal/syscall/syscall_linux.go:38 +0xd fp=0xc0000c9ad0 sp=0xc0000c9a88 pc=0x40706d
syscall.RawSyscall(0x2be44d8?, 0x0?, 0xc0000c9b70?, 0xc0000c9b50?)
/usr/local/go/src/syscall/syscall_linux.go:62 +0x15 fp=0xc0000c9b18 sp=0xc0000c9ad0 pc=0x48a8f5
syscall.Tgkill(0xba?, 0x0?, 0x0?)
/usr/local/go/src/syscall/zsyscall_linux_amd64.go:894 +0x25 fp=0xc0000c9b48 sp=0xc0000c9b18 pc=0x488aa5
github.com/golang/glog.abortProcess()
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog_file_linux.go:35 +0x87 fp=0xc0000c9b90 sp=0xc0000c9b48 pc=0x548387
github.com/golang/glog.ctxfatalf({0x0?, 0x0?}, 0xc0004f1170?, {0x1b8e1cb?, 0x411d65?}, {0xc0004f1170?, 0x185ba80?, 0xc000596601?})
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:647 +0x6a fp=0xc0000c9bf8 sp=0xc0000c9b90 pc=0x54606a
github.com/golang/glog.fatalf(...)
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:657
github.com/golang/glog.FatalDepth(0x1, {0xc0004f1170, 0x1, 0x1})
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:670 +0x57 fp=0xc0000c9c48 sp=0xc0000c9bf8 pc=0x5461f7
github.com/golang/glog.Fatal(...)
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:664
main.main()
/workspace/main.go:146 +0x1418 fp=0xc0000c9f50 sp=0xc0000c9c48 pc=0x172efb8
runtime.main()
/usr/local/go/src/runtime/proc.go:271 +0x29d fp=0xc0000c9fe0 sp=0xc0000c9f50 pc=0x4404fd
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000c9fe8 sp=0xc0000c9fe0 pc=0x473721

goroutine 2 gp=0xc000006700 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x44094e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:408
runtime.forcegchelper()
/usr/local/go/src/runtime/proc.go:326 +0xb3 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x4407b3
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x473721
created by runtime.init.6 in goroutine 1
/usr/local/go/src/runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000006c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x44094e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:408
runtime.bgsweep(0xc000058070)
/usr/local/go/src/runtime/mgcsweep.go:318 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x42a2bf
runtime.gcenable.gowrap1()
/usr/local/go/src/runtime/mgc.go:203 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x41ebc5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x473721
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000006e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x1e0abb8?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x44094e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:408
runtime.(*scavengerState).park(0x2be48a0)
/usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x427c69
runtime.bgscavenge(0xc000058070)
/usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x428219
runtime.gcenable.gowrap2()
/usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x41eb65
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x473721
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:204 +0xa5

goroutine 17 gp=0xc000102380 m=nil [finalizer wait]:
runtime.gopark(0xc000084660?, 0x42713c?, 0x80?, 0x7f?, 0x550011?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084620 sp=0xc000084600 pc=0x44094e
runtime.runfinq()
/usr/local/go/src/runtime/mfinal.go:194 +0x107 fp=0xc0000847e0 sp=0xc000084620 pc=0x41dc07
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x473721
created by runtime.createfing in goroutine 1
/usr/local/go/src/runtime/mfinal.go:164 +0x3d

goroutine 18 gp=0xc000103880 m=nil [select]:
runtime.gopark(0xc000080780?, 0x2?, 0x40?, 0x6?, 0xc000080774?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080618 sp=0xc0000805f8 pc=0x44094e
runtime.selectgo(0xc000080780, 0xc000080770, 0x0?, 0x0, 0x0?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000080738 sp=0xc000080618 pc=0x451e65
github.com/golang/glog.(*fileSink).flushDaemon(0x2be44d8)
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog_file.go:351 +0xb9 fp=0xc0000807c8 sp=0xc000080738 pc=0x547df9
github.com/golang/glog.init.1.gowrap1()
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog_file.go:166 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x546e85
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x473721
created by github.com/golang/glog.init.1 in goroutine 1
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog_file.go:166 +0x126

goroutine 34 gp=0xc000268c40 m=nil [GC worker (idle)]:
runtime.gopark(0xc000080fa8?, 0x40a20b?, 0x17?, 0x96?, 0x1?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080f50 sp=0xc000080f30 pc=0x44094e
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000080fe0 sp=0xc000080f50 pc=0x420ca5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x473721
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 35 gp=0xc0003416c0 m=nil [GC worker (idle)]:
runtime.gopark(0x557d01c66dd1?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0004b0750 sp=0xc0004b0730 pc=0x44094e
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0004b07e0 sp=0xc0004b0750 pc=0x420ca5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004b07e8 sp=0xc0004b07e0 pc=0x473721
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 49 gp=0xc000500000 m=nil [GC worker (idle)]:
runtime.gopark(0x557d01c4abd0?, 0xc0000560a0?, 0x1a?, 0xa?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0004ac750 sp=0xc0004ac730 pc=0x44094e
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0004ac7e0 sp=0xc0004ac750 pc=0x420ca5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004ac7e8 sp=0xc0004ac7e0 pc=0x473721
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 50 gp=0xc0005001c0 m=nil [GC worker (idle)]:
runtime.gopark(0x557d01c6fcb5?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0004acf50 sp=0xc0004acf30 pc=0x44094e
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0004acfe0 sp=0xc0004acf50 pc=0x420ca5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004acfe8 sp=0xc0004acfe0 pc=0x473721
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 20 gp=0xc000007a40 m=nil [select, locked to thread]:
runtime.gopark(0xc0004affa8?, 0x2?, 0xe9?, 0xb?, 0xc0004aff94?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0004afe38 sp=0xc0004afe18 pc=0x44094e
runtime.selectgo(0xc0004affa8, 0xc0004aff90, 0x0?, 0x0, 0x0?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc0004aff58 sp=0xc0004afe38 pc=0x451e65
runtime.ensureSigM.func1()
/usr/local/go/src/runtime/signal_unix.go:1034 +0x19f fp=0xc0004affe0 sp=0xc0004aff58 pc=0x46aadf
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004affe8 sp=0xc0004affe0 pc=0x473721
created by runtime.ensureSigM in goroutine 1
/usr/local/go/src/runtime/signal_unix.go:1017 +0xc8

goroutine 5 gp=0xc000500380 m=7 mp=0xc0000bc008 [syscall]:
runtime.notetsleepg(0x2c472a0, 0xffffffffffffffff)
/usr/local/go/src/runtime/lock_futex.go:246 +0x29 fp=0xc000082fa0 sp=0xc000082f78 pc=0x410389
os/signal.signal_recv()
/usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc000082fc0 sp=0xc000082fa0 pc=0x46ffe9
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc000082fe0 sp=0xc000082fc0 pc=0x515d73
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x473721
created by os/signal.Notify.func1.1 in goroutine 1
/usr/local/go/src/os/signal/signal.go:151 +0x1f

goroutine 6 gp=0xc000500540 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0000835d0 sp=0xc0000835b0 pc=0x44094e
runtime.chanrecv(0xc00059a120, 0xc000083718, 0x1)
/usr/local/go/src/runtime/chan.go:583 +0x3bf fp=0xc000083648 sp=0xc0000835d0 pc=0x40a71f
runtime.chanrecv2(0x0?, 0x0?)
/usr/local/go/src/runtime/chan.go:447 +0x12 fp=0xc000083670 sp=0xc000083648 pc=0x40a352
k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop(0xc00059ccd0)
/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/watch/mux.go:268 +0x66 fp=0xc0000837c8 sp=0xc000083670 pc=0x8f6b66
k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster.gowrap1()
/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/watch/mux.go:93 +0x25 fp=0xc0000837e0 sp=0xc0000837c8 pc=0x8f5ce5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x473721
created by k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster in goroutine 1
/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/watch/mux.go:93 +0x125

rax 0x0
rbx 0xa
rcx 0x40708e
rdx 0x6
rdi 0xa
rsi 0xe
rbp 0xc0000c9ac0
rsp 0xc0000c9a80
r8 0x0
r9 0x0
r10 0x0
r11 0x216
r12 0x0
r13 0x0
r14 0xc0000061c0
r15 0x3ffffffffff
rip 0x40708e
rflags 0x216
cs 0x33
fs 0x0
gs 0x0

Environment & Versions

  • Spark Operator App version: v1beta2-1.4.3-3.5.0
  • Helm Chart Version: 1.2.7
  • Kubernetes Version: 1.27
  • Apache Spark version:

Additional context

Aransh added the kind/bug label on Apr 17, 2024
@vara-bonthu
Contributor

We just released a new image update with important registry fixes. Check it out:

Image tag: https://github.com/kubeflow/spark-operator/tree/v1beta2-1.4.5-3.5.0
Helm chart: https://github.com/kubeflow/spark-operator/releases/tag/spark-operator-chart-1.2.14

Please give it a try and let us know if you encounter any issues. We're working on a new Kubeflow Spark Operator release, and your testing will help make it stable! Feel free to share feedback on the Kubeflow Spark Operator channel.

@Aransh
Contributor Author

Aransh commented Apr 28, 2024

@vara-bonthu I just tested docker.io/kubeflow/spark-operator:v1beta2-1.4.5-3.5.0 and I'm seeing the exact same issue:

++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ echo 0
0
0
+ echo 0
+ echo root:x:0:0:root:/root:/bin/bash
root:x:0:0:root:/root:/bin/bash
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator -v=2 -logtostderr -namespace=spark-apps -enable-ui-service=true -ingress-url-format= -controller-threads=10 -resync-interval=30 -enable-batch-scheduler=false -label-selector-filter= -enable-metrics=true -metrics-labels=app_type -metrics-port=10254 -metrics-endpoint=/metrics -metrics-prefix= -enable-webhook=true -webhook-svc-namespace=operator-spark -webhook-port=8080 -webhook-timeout=30 -webhook-svc-name=spark-operator-devops-playground-webhook -webhook-config-name=spark-operator-devops-playground-webhook-config -webhook-namespace-selector= -enable-resource-quota-enforcement=false -leader-election=true -leader-election-lock-namespace=operator-spark -leader-election-lock-name=spark-operator-lock
F0428 09:50:15.367621 10 main.go:146] Lock identity is empty
goroutine 1 [running]:
github.com/golang/glog.Fatal(...)
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:664
main.main()
/workspace/main.go:146 +0x1418
SIGABRT: abort
PC=0x40708e m=3 sigcode=18446744073709551610
goroutine 1 gp=0xc0000061c0 m=3 mp=0xc00007f008 [running, locked to thread]:
runtime/internal/syscall.Syscall6()
/usr/local/go/src/runtime/internal/syscall/asm_linux_amd64.s:36 +0xe fp=0xc000023a88 sp=0xc000023a80 pc=0x40708e
syscall.RawSyscall6(0xc000012048?, 0xc000000030?, 0xc0001c0060?, 0x2be5440?, 0x548220?, 0x2be54d8?, 0xc000023af0?)
/usr/local/go/src/runtime/internal/syscall/syscall_linux.go:38 +0xd fp=0xc000023ad0 sp=0xc000023a88 pc=0x40706d
syscall.RawSyscall(0x2be54d8?, 0x0?, 0xc000023b70?, 0xc000023b50?)
/usr/local/go/src/syscall/syscall_linux.go:62 +0x15 fp=0xc000023b18 sp=0xc000023ad0 pc=0x48a8f5
syscall.Tgkill(0xba?, 0x0?, 0x0?)
/usr/local/go/src/syscall/zsyscall_linux_amd64.go:894 +0x25 fp=0xc000023b48 sp=0xc000023b18 pc=0x488aa5
github.com/golang/glog.abortProcess()
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog_file_linux.go:35 +0x87 fp=0xc000023b90 sp=0xc000023b48 pc=0x548387
github.com/golang/glog.ctxfatalf({0x0?, 0x0?}, 0xc0004460a0?, {0x1b8f1eb?, 0x411d65?}, {0xc0004460a0?, 0x185ca80?, 0xc00016a001?})
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:647 +0x6a fp=0xc000023bf8 sp=0xc000023b90 pc=0x54606a
github.com/golang/glog.fatalf(...)
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:657
github.com/golang/glog.FatalDepth(0x1, {0xc0004460a0, 0x1, 0x1})
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:670 +0x57 fp=0xc000023c48 sp=0xc000023bf8 pc=0x5461f7
github.com/golang/glog.Fatal(...)
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog.go:664
main.main()
/workspace/main.go:146 +0x1418 fp=0xc000023f50 sp=0xc000023c48 pc=0x172f418
runtime.main()
/usr/local/go/src/runtime/proc.go:271 +0x29d fp=0xc000023fe0 sp=0xc000023f50 pc=0x4404fd
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000023fe8 sp=0xc000023fe0 pc=0x473721
goroutine 2 gp=0xc000006700 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x44094e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:408
runtime.forcegchelper()
/usr/local/go/src/runtime/proc.go:326 +0xb3 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x4407b3
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x473721
created by runtime.init.6 in goroutine 1
/usr/local/go/src/runtime/proc.go:314 +0x1a
goroutine 3 gp=0xc000006c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x44094e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:408
runtime.bgsweep(0xc000058070)
/usr/local/go/src/runtime/mgcsweep.go:318 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x42a2bf
runtime.gcenable.gowrap1()
/usr/local/go/src/runtime/mgc.go:203 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x41ebc5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x473721
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:203 +0x66
goroutine 4 gp=0xc000006e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x1e0bc58?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x44094e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:408
runtime.(*scavengerState).park(0x2be58a0)
/usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x427c69
runtime.bgscavenge(0xc000058070)
/usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x428219
runtime.gcenable.gowrap2()
/usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x41eb65
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x473721
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:204 +0xa5
goroutine 17 gp=0xc0001b0000 m=nil [finalizer wait]:
runtime.gopark(0xc000084660?, 0x42713c?, 0x80?, 0x8f?, 0x550011?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084620 sp=0xc000084600 pc=0x44094e
runtime.runfinq()
/usr/local/go/src/runtime/mfinal.go:194 +0x107 fp=0xc0000847e0 sp=0xc000084620 pc=0x41dc07
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x473721
created by runtime.createfing in goroutine 1
/usr/local/go/src/runtime/mfinal.go:164 +0x3d
goroutine 18 gp=0xc0001b1500 m=nil [select]:
runtime.gopark(0xc000080780?, 0x2?, 0x40?, 0x6?, 0xc000080774?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080618 sp=0xc0000805f8 pc=0x44094e
runtime.selectgo(0xc000080780, 0xc000080770, 0x0?, 0x0, 0x0?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000080738 sp=0xc000080618 pc=0x451e65
github.com/golang/glog.(*fileSink).flushDaemon(0x2be54d8)
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog_file.go:351 +0xb9 fp=0xc0000807c8 sp=0xc000080738 pc=0x547df9
github.com/golang/glog.init.1.gowrap1()
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog_file.go:166 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x546e85
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x473721
created by github.com/golang/glog.init.1 in goroutine 1
/go/pkg/mod/github.com/golang/glog@v1.2.1/glog_file.go:166 +0x126
goroutine 33 gp=0xc0002ea8c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1c89a40?, 0xc00015ec20?, 0x1a?, 0xa?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080f50 sp=0xc000080f30 pc=0x44094e
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000080fe0 sp=0xc000080f50 pc=0x420ca5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x473721
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1234 +0x1c
goroutine 21 gp=0xc0002eae00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000081750 sp=0xc000081730 pc=0x44094e
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000817e0 sp=0xc000081750 pc=0x420ca5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x473721
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1234 +0x1c
goroutine 5 gp=0xc0000076c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000086750 sp=0xc000086730 pc=0x44094e
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000867e0 sp=0xc000086750 pc=0x420ca5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x473721
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1234 +0x1c
goroutine 22 gp=0xc0002eafc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000081f50 sp=0xc000081f30 pc=0x44094e
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000081fe0 sp=0xc000081f50 pc=0x420ca5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x473721
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1234 +0x1c
goroutine 23 gp=0xc0002eb880 m=nil [select, locked to thread]:
runtime.gopark(0xc000557fa8?, 0x2?, 0xe9?, 0xb?, 0xc000557f94?)
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000557e38 sp=0xc000557e18 pc=0x44094e
runtime.selectgo(0xc000557fa8, 0xc000557f90, 0x0?, 0x0, 0x0?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000557f58 sp=0xc000557e38 pc=0x451e65
runtime.ensureSigM.func1()
/usr/local/go/src/runtime/signal_unix.go:1034 +0x19f fp=0xc000557fe0 sp=0xc000557f58 pc=0x46aadf
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000557fe8 sp=0xc000557fe0 pc=0x473721
created by runtime.ensureSigM in goroutine 1
/usr/local/go/src/runtime/signal_unix.go:1017 +0xc8
goroutine 6 gp=0xc000007880 m=6 mp=0xc00007f808 [syscall]:
runtime.notetsleepg(0x2c482a0, 0xffffffffffffffff)
/usr/local/go/src/runtime/lock_futex.go:246 +0x29 fp=0xc000553fa0 sp=0xc000553f78 pc=0x410389
os/signal.signal_recv()
/usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc000553fc0 sp=0xc000553fa0 pc=0x46ffe9
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc000553fe0 sp=0xc000553fc0 pc=0x515d73
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000553fe8 sp=0xc000553fe0 pc=0x473721
created by os/signal.Notify.func1.1 in goroutine 1
/usr/local/go/src/os/signal/signal.go:151 +0x1f
goroutine 7 gp=0xc000007a40 m=nil [runnable]:
k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster.gowrap1()
/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/watch/mux.go:93 fp=0xc0005547e0 sp=0xc0005547d8 pc=0x8f5cc0
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005547e8 sp=0xc0005547e0 pc=0x473721
created by k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster in goroutine 1
/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/watch/mux.go:93 +0x125
rax 0x0
rbx 0xa
rcx 0x40708e
rdx 0x6
rdi 0xa
rsi 0xd
rbp 0xc000023ac0
rsp 0xc000023a80
r8 0x0
r9 0x0
r10 0x0
r11 0x216
r12 0x0
r13 0x0
r14 0xc0000061c0
r15 0xffffffff800074
rip 0x40708e
rflags 0x216
cs 0x33
fs 0x0
gs 0x0

@MJFND

MJFND commented May 2, 2024

I recently moved from Spark 3.4 to 3.5 and it worked for me.

spark-operator: v1beta2-1.4.2-3.5.0
helm: 1.2.5
k8s: 1.27

I faced the same issue as above, and had to make sure my internal library and Docker were pointing to the same Spark 3.5.0 version.

@YanivKunda

@MJFND The problem we encountered is with operator version 1.4.3 (you used operator version 1.4.2, which also works for us). Also, the Spark version is not relevant here, since it is the operator container itself that fails to start.

@Aransh
Contributor Author

Aransh commented May 9, 2024

Just an update: this issue also occurs with image tag v1beta2-1.4.6-3.5.0.

@YanivKunda

Revisiting this to see what actually went wrong. From the crash logs, the issue seems to originate in the leader election process:

F0428 09:50:15.367621 10 main.go:146] Lock identity is empty

Could this be related to the K8s changes made in #1983?
I see they were made after #1968, which aligns the APIs to 1.29.3.
Maybe this is a K8s API backward-compatibility issue? (We are using K8s 1.28.9.)
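
To illustrate the failure mode: the message suggests that the leader-election lock was set up with an empty identity string, so the process refuses to take part in the election and exits. The sketch below is not the operator's code and not the fix from #2039. It is a minimal, stdlib-only illustration of how a caller might resolve a non-empty identity before building the lock. The helper `lockIdentity` and the `POD_NAME` environment variable are assumptions made for this example (for instance, a value injected through the Downward API).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errEmptyIdentity mirrors the message seen in the crash log.
var errEmptyIdentity = errors.New("Lock identity is empty")

// lockIdentity is a hypothetical helper: it prefers an explicitly
// supplied identity and falls back to the hostname, failing early
// if neither yields a non-empty string.
func lockIdentity(explicit string, hostname func() (string, error)) (string, error) {
	if explicit != "" {
		return explicit, nil
	}
	h, err := hostname()
	if err != nil {
		return "", fmt.Errorf("resolving hostname: %w", err)
	}
	if h == "" {
		return "", errEmptyIdentity
	}
	return h, nil
}

func main() {
	// POD_NAME is an assumed variable name, not something the chart
	// is known to set; inside a pod the hostname normally equals the
	// pod name, so the fallback is usually unique per replica.
	id, err := lockIdentity(os.Getenv("POD_NAME"), os.Hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("leader election identity:", id)
}
```

With `replicaCount: 3` as in the values above, each replica needs a distinct identity, which is why a per-pod value such as the pod name or hostname is the usual choice.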

@Aransh
Contributor Author

Aransh commented Jun 2, 2024

Seems to work now, thanks @Aakcht!
