Terraform fatal error: concurrent map read and map write #33333
Comments
Thanks for reporting this, @tgraskemper! From a shallow look at the affected code this seems like it's a long-standing concurrency bug that we've not noticed so far because it's relatively hard to hit: it would require using multiple Thankfully I don't think we need to be able to reliably reproduce it to fix it, because the cause and solution seem reasonably clear at first glance: the This bug is actually upstream in Thanks again!
Thanks again for reporting this, @tgraskemper! This is closed now because I've merged a fix that will be included in the forthcoming v1.6 series. Unfortunately I had to abort my attempt to directly backport it into the v1.5 series because the upstream library (terraform-svchost) is using dependabot to ratchet all of its dependencies to latest, and one of its dependencies is one we're intentionally holding back in Terraform v1.5 to avoid prematurely releasing some changes to configuration language behavior. Unfortunately I've not yet found a suitable way to achieve a similar effect by locking inside Terraform itself, and so I've not yet found a way to patch this problem satisfactorily for Terraform v1.5. If all else fails then this fix will go into v1.6, but since v1.5 is new that will still be several months away. Based on the stack trace it seems like the race here is between Thanks!
Thanks for the quick action on this @apparentlymart. We really appreciate the effort put towards getting it resolved. I'll try to answer your question about how we're using So our
And so on, with many projects having 10+ of these As for the backend configuration, an example of this would be like
Hopefully that helps. If we have to wait until 1.6, and there's a high degree of confidence this is resolved there, then we'll wait for it. This is a relatively minor bug that is really just an inconvenience when it happens, and it's still quite rare.
Thanks for that extra information, @tgraskemper! One thing that seems a little strange here is that it seems like all of your instances of the This made me think that the problem might be somewhere different than I initially concluded, and indeed digging deeper I found another oddity I missed on the first read, and this one is actually inside this codebase and so might be easier to fix in v1.5: One of the new additions in v1.4 was to support a special "magic" hostname called Unfortunately it achieves that by unconditionally writing an alias hostname into the terraform/internal/cloud/backend.go Line 290 in 1e6b2d0
The fix I already made for v1.6 is still a valid solution to this because it adds mutex guards to the call to write in the alias, but it does look like there's a plausible surgical fix we could include in v1.5 too: adding a mutex guard to the terraform/internal/cloud/backend.go Lines 207 to 222 in 1e6b2d0
(We'd also need to fix the equivalent call inside the deprecated

The v1.6 fix is more general in that it makes all of the map writes inside the discovery system concurrency-safe; the surgical fix I'm describing here will only help with the call to alias

I'm working elsewhere today so I won't be able to look at this immediately and so I'm going to de-assign myself for now in case someone else wants to pick this up. However, I'll try to return to this in the not-too-distant future and prepare a PR targeting just the v1.5 branch if nobody else beats me to it.

Thanks again!

I think there's also a secondary problem here which I want to note, but I don't think it actually matters in practice right now due to the order of operations: The remote backend unconditionally sets

I don't think this is a problem in practice because the

@brandonc would you like to make a separate issue for that somewhere? I don't want to make this issue represent it because I want to do the more surgical fix to mutex the lock registration here, but I think we should probably try to find a less hazardous design for this in the long run so that we don't get caught out by this hidden interaction in future work.
@apparentlymart Interesting, I didn't consider that code outside of Terraform might be configuring the backend at the same time! I'll get to work on a fix as you described in the near future and consider an alternative design. Maybe it would be a good idea to move explicit aliasing into backend configuration callers so that the actual alias can conveniently remain in svchost. Thanks for the detailed summary.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Terraform Version
Debug Output
data.terraform_remote_state.dns: Reading...
fatal error: concurrent map read and map write
goroutine 3724 [running]:
github.com/hashicorp/terraform-svchost/disco.(*Disco).CredentialsForHost(0x140008b1c80, {0x14006b24c60, 0x10})
github.com/hashicorp/terraform-svchost@v0.1.0/disco/disco.go:113 +0x4c
github.com/hashicorp/terraform/internal/backend/remote.(*Remote).token(0x14008690790)
github.com/hashicorp/terraform/internal/backend/remote/backend.go:505 +0x48
github.com/hashicorp/terraform/internal/backend/remote.(*Remote).Configure(0x14008690790, {{{0x1072475c0?, 0x14002862850?}}, {0x106e41c00?, 0x140066ee9f0?}})
github.com/hashicorp/terraform/internal/backend/remote/backend.go:287 +0x618
github.com/hashicorp/terraform/internal/builtin/providers/terraform.dataSourceRemoteStateRead({{{0x1072475c0?, 0x1400259a3a0?}}, {0x106e41c00?, 0x140062e84b0?}})
github.com/hashicorp/terraform/internal/builtin/providers/terraform/data_source_state.go:109 +0x168
github.com/hashicorp/terraform/internal/builtin/providers/terraform.(*Provider).ReadDataSource(0x0?, {{0x14000051578, 0x16}, {{{0x1072475c0, 0x1400259a3a0}}, {0x106e41c00, 0x140062e84b0}}, {{{0x107247518, 0x10864c738}}, {0x0, ...}}})
github.com/hashicorp/terraform/internal/builtin/providers/terraform/provider.go:77 +0x150
github.com/hashicorp/terraform/internal/terraform.(*NodeAbstractResourceInstance).readDataSource(0x14000f13c20, {0x10725cf98, 0x14000a96ee0}, {{{0x1072475c0?, 0x14001a7a3a0?}}, {0x106e41c00?, 0x140083ffa10?}})
github.com/hashicorp/terraform/internal/terraform/node_resource_abstract_instance.go:1434 +0xdac
github.com/hashicorp/terraform/internal/terraform.(*NodeAbstractResourceInstance).planDataSource(0x14000f13c20, {0x10725cf98, 0x14000a96ee0}, 0x9?, 0x0)
github.com/hashicorp/terraform/internal/terraform/node_resource_abstract_instance.go:1654 +0x1014
github.com/hashicorp/terraform/internal/terraform.(*NodePlannableResourceInstance).dataResourceExecute(0x1400185d100, {0x10725cf98, 0x14000a96ee0})
github.com/hashicorp/terraform/internal/terraform/node_resource_plan_instance.go:90 +0x2b8
github.com/hashicorp/terraform/internal/terraform.(*NodePlannableResourceInstance).Execute(0x0?, {0x10725cf98?, 0x14000a96ee0?}, 0xb0?)
github.com/hashicorp/terraform/internal/terraform/node_resource_plan_instance.go:62 +0x8c
github.com/hashicorp/terraform/internal/terraform.(*ContextGraphWalker).Execute(0x14002fc1a40, {0x10725cf98, 0x14000a96ee0}, {0x132c78178, 0x1400185d100})
github.com/hashicorp/terraform/internal/terraform/graph_walk_context.go:136 +0xa8
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk.func1({0x1071c8c80, 0x1400185d100})
github.com/hashicorp/terraform/internal/terraform/graph.go:75 +0x238
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0x140007e2840, {0x1071c8c80, 0x1400185d100}, 0x1400743e640)
github.com/hashicorp/terraform/internal/dag/walk.go:381 +0x2dc
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
github.com/hashicorp/terraform/internal/dag/walk.go:304 +0xb7c
goroutine 1 [select]:
github.com/hashicorp/terraform/internal/command.(*Meta).RunOperation(0x140002ec700, {0x130686a20?, 0x140009cd290?}, 0x140002a2360)
github.com/hashicorp/terraform/internal/command/meta.go:440 +0x154
github.com/hashicorp/terraform/internal/command.(*PlanCommand).Run(0x140002ec700, {0x1400012a060, 0x2, 0x2})
github.com/hashicorp/terraform/internal/command/plan.go:96 +0x688
github.com/mitchellh/cli.(*CLI).Run(0x140001688c0)
github.com/mitchellh/cli@v1.1.5/cli.go:262 +0x4a8
main.realMain()
github.com/hashicorp/terraform/main.go:315 +0x1408
main.main()
github.com/hashicorp/terraform/main.go:58 +0x1c
goroutine 18 [select]:
go.opencensus.io/stats/view.(*worker).start(0x14000184f00)
go.opencensus.io@v0.23.0/stats/view/worker.go:276 +0x88
created by go.opencensus.io/stats/view.init.0
go.opencensus.io@v0.23.0/stats/view/worker.go:34 +0xb0
goroutine 24 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x0?)
k8s.io/klog/v2@v2.30.0/klog.go:1181 +0x5c
created by k8s.io/klog/v2.init.0
k8s.io/klog/v2@v2.30.0/klog.go:420 +0x150
goroutine 49 [syscall]:
os/signal.signal_recv()
runtime/sigqueue.go:149 +0x2c
os/signal.loop()
os/signal/signal_unix.go:23 +0x1c
created by os/signal.Notify.func1.1
os/signal/signal.go:151 +0x2c
goroutine 50 [chan receive]:
main.makeShutdownCh.func1()
github.com/hashicorp/terraform/commands.go:437 +0x30
created by main.makeShutdownCh
github.com/hashicorp/terraform/commands.go:435 +0xec
goroutine 8 [IO wait]:
internal/poll.runtime_pollWait(0x1304f9050, 0x72)
runtime/netpoll.go:306 +0xa0
internal/poll.(*pollDesc).wait(0x140009f2200?, 0x14000a5a000?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x28
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140009f2200, {0x14000a5a000, 0x5500, 0x5500})
internal/poll/fd_unix.go:167 +0x200
net.(*netFD).Read(0x140009f2200, {0x14000a5a000?, 0x140009fd878?, 0x104cdca2c?})
net/fd_posix.go:55 +0x28
net.(*conn).Read(0x14000736088, {0x14000a5a000?, 0x19?, 0xf1e?})
net/net.go:183 +0x34
crypto/tls.(*atLeastReader).Read(0x14000471458, {0x14000a5a000?, 0x14000471458?, 0x0?})
crypto/tls/conn.go:788 +0x40
bytes.(*Buffer).ReadFrom(0x140008d0290, {0x107221220, 0x14000471458})
bytes/buffer.go:202 +0x90
crypto/tls.(*Conn).readFromUntil(0x140008d0000, {0x1303fffc8?, 0x14000736088}, 0xf2b?)
crypto/tls/conn.go:810 +0xd4
crypto/tls.(*Conn).readRecordOrCCS(0x140008d0000, 0x0)
crypto/tls/conn.go:617 +0xd8
crypto/tls.(*Conn).readRecord(...)
crypto/tls/conn.go:583
crypto/tls.(*Conn).Read(0x140008d0000, {0x14000b8b000, 0x1000, 0x104d73788?})
crypto/tls/conn.go:1316 +0x178
bufio.(*Reader).Read(0x1400064d2c0, {0x14000b822e0, 0x9, 0x104d73134?})
bufio/bufio.go:237 +0x1e0
io.ReadAtLeast({0x107220500, 0x1400064d2c0}, {0x14000b822e0, 0x9, 0x9}, 0x9)
io/io.go:332 +0xa0
io.ReadFull(...)
io/io.go:351
net/http.http2readFrameHeader({0x14000b822e0?, 0x9?, 0x140008b1e30?}, {0x107220500?, 0x1400064d2c0?})
net/http/h2_bundle.go:1567 +0x58
net/http.(*http2Framer).ReadFrame(0x14000b822a0)
net/http/h2_bundle.go:1831 +0x84
net/http.(*http2clientConnReadLoop).run(0x140009fdf88)
net/http/h2_bundle.go:9187 +0xfc
net/http.(*http2ClientConn).readLoop(0x140008ce180)
net/http/h2_bundle.go:9082 +0x5c
created by net/http.(*http2Transport).newClientConn
net/http/h2_bundle.go:7779 +0xad0
goroutine 2403 [semacquire]:
sync.runtime_Semacquire(0x0?)
runtime/sema.go:62 +0x2c
sync.(*WaitGroup).Wait(0x14006726210)
sync/waitgroup.go:116 +0x78
github.com/hashicorp/terraform/internal/dag.(*Walker).Wait(0x140067261e0)
github.com/hashicorp/terraform/internal/dag/walk.go:118 +0x34
github.com/hashicorp/terraform/internal/dag.(*AcyclicGraph).Walk(0x14002fc1a40?, 0x140020784e0)
github.com/hashicorp/terraform/internal/dag/dag.go:165 +0x70
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk(0x140034044c0, {0x1072466e0, 0x14002fc1a40})
github.com/hashicorp/terraform/internal/terraform/graph.go:134 +0xbc
github.com/hashicorp/terraform/internal/terraform.(*Graph).Walk(...)
github.com/hashicorp/terraform/internal/terraform/graph.go:35
github.com/hashicorp/terraform/internal/terraform.(*Context).walk(0x0?, 0x0?, 0x0?, 0x14003849b68?)
github.com/hashicorp/terraform/internal/terraform/context_walk.go:47 +0xa8
github.com/hashicorp/terraform/internal/terraform.(*Context).planWalk(0x106e3c3e0?, 0x14000b82380, 0x14000881500?, 0x14000ad1840)
github.com/hashicorp/terraform/internal/terraform/context_plan.go:533 +0x2c0
github.com/hashicorp/terraform/internal/terraform.(*Context).plan(0x0?, 0x0?, 0x0?, 0x1400a36fd40?)
github.com/hashicorp/terraform/internal/terraform/context_plan.go:287 +0x24
github.com/hashicorp/terraform/internal/terraform.(*Context).Plan(0x10661d84b?, 0x14000b82380, 0x14000d2be00, 0x14000ad1840)
github.com/hashicorp/terraform/internal/terraform/context_plan.go:176 +0x4bc
github.com/hashicorp/terraform/internal/backend/local.(*Local).opPlan.func2()
github.com/hashicorp/terraform/internal/backend/local/backend_plan.go:86 +0xac
created by github.com/hashicorp/terraform/internal/backend/local.(*Local).opPlan
github.com/hashicorp/terraform/internal/backend/local/backend_plan.go:82 +0x460
goroutine 3296 [select]:
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0x140067261e0, {0x107031e60, 0x14008c7dc40}, 0x14008158280)
github.com/hashicorp/terraform/internal/dag/walk.go:335 +0x120
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
github.com/hashicorp/terraform/internal/dag/walk.go:304 +0xb7c
goroutine 3949 [chan send]:
github.com/hashicorp/terraform/internal/terraform.Semaphore.Acquire(...)
github.com/hashicorp/terraform/internal/terraform/util.go:25
github.com/hashicorp/terraform/internal/terraform.(*ContextGraphWalker).Execute(0x14002fc1a40, {0x10725cf98, 0x14000a96ee0}, {0x132670028, 0x14000c9a5a0})
github.com/hashicorp/terraform/internal/terraform/graph_walk_context.go:133 +0x50
github.com/hashicorp/terraform/internal/terraform.(*Graph).walk.func1({0x106ff6f60, 0x14000c9a5a0})
github.com/hashicorp/terraform/internal/terraform/graph.go:75 +0x238
github.com/hashicorp/terraform/internal/dag.(*Walker).walkVertex(0x14000c9a600, {0x106ff6f60, 0x14000c9a5a0}, 0x14004387d40)
github.com/hashicorp/terraform/internal/dag/walk.go:381 +0x2dc
created by github.com/hashicorp/terraform/internal/dag.(*Walker).Update
github.com/hashicorp/terraform/internal/dag/walk.go:304 +0xb7c
goroutine 3994 [select]:
github.com/hashicorp/terraform/internal/command/views.(*UiHook).stillApplying(0x14000a8aea0, {{0x14004b9ad60, 0x1f}, {0x0, 0x0}, {0x0, 0x0}, 0x4, {0x0, 0xedbf8f220, ...}, ...})
github.com/hashicorp/terraform/internal/command/views/hook_ui.go:147 +0x1c8
created by github.com/hashicorp/terraform/internal/command/views.(*UiHook).PreApply
github.com/hashicorp/terraform/internal/command/views/hook_ui.go:138 +0x618
goroutine 39 [IO wait]:
internal/poll.runtime_pollWait(0x1304f9140, 0x72)
runtime/netpoll.go:306 +0xa0
internal/poll.(*pollDesc).wait(0x140009f2080?, 0x140001a0800?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x28
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140009f2080, {0x140001a0800, 0x1800, 0x1800})
internal/poll/fd_unix.go:167 +0x200
net.(*netFD).Read(0x140009f2080, {0x140001a0800?, 0x14000465878?, 0x104cdca2c?})
net/fd_posix.go:55 +0x28
net.(*conn).Read(0x1400000e1a8, {0x140001a0800?, 0x19?, 0x1652?})
net/net.go:183 +0x34
crypto/tls.(*atLeastReader).Read(0x140004704c8, {0x140001a0800?, 0x140004704c8?, 0x0?})
crypto/tls/conn.go:788 +0x40
bytes.(*Buffer).ReadFrom(0x14000300290, {0x107221220, 0x140004704c8})
bytes/buffer.go:202 +0x90
crypto/tls.(*Conn).readFromUntil(0x14000300000, {0x1303fffc8?, 0x1400000e1a8}, 0x165f?)
crypto/tls/conn.go:810 +0xd4
crypto/tls.(*Conn).readRecordOrCCS(0x14000300000, 0x0)
crypto/tls/conn.go:617 +0xd8
crypto/tls.(*Conn).readRecord(...)
crypto/tls/conn.go:583
crypto/tls.(*Conn).Read(0x14000300000, {0x14000a22000, 0x1000, 0x104d73788?})
crypto/tls/conn.go:1316 +0x178
bufio.(*Reader).Read(0x140004075c0, {0x14000a0a660, 0x9, 0x104d73134?})
bufio/bufio.go:237 +0x1e0
io.ReadAtLeast({0x107220500, 0x140004075c0}, {0x14000a0a660, 0x9, 0x9}, 0x9)
io/io.go:332 +0xa0
io.ReadFull(...)
io/io.go:351
net/http.http2readFrameHeader({0x14000a0a660?, 0x9?, 0x140004496b0?}, {0x107220500?, 0x140004075c0?})
net/http/h2_bundle.go:1567 +0x58
net/http.(*http2Framer).ReadFrame(0x14000a0a620)
net/http/h2_bundle.go:1831 +0x84
net/http.(*http2clientConnReadLoop).run(0x14000465f88)
net/http/h2_bundle.go:9187 +0xfc
net/http.(*http2ClientConn).readLoop(0x1400044a180)
net/http/h2_bundle.go:9082 +0x5c
created by net/http.(*http2Transport).newClientConn
net/http/h2_bundle.go:7779 +0xad0
Expected Behavior
Terraform should not crash with a fatal error and leave the state locked so that we must force-unlock it.
Actual Behavior
Terraform crashes with a fatal error and leaves the state locked.
Steps to Reproduce
terraform plan or terraform apply on a workspace with multiple resources and terraform_remote_state calls from Terraform Cloud.
Additional Context
Difficult to reproduce: it appears random, but it started in 1.4.x. Force-unlocking the state and running the plan/apply again is typically successful. It occurs across multiple users using the same process, but we are only aware of it on ARM64 macOS Ventura. It may also affect Intel macOS systems, but our team has not confirmed that. The stack trace provided is partial because the full trace is very long.
Looking through history we're also seeing
fatal error: concurrent map writes
which looks very similar to the error posted above, at a different point in the run of a plan/apply.
References
No response