Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix websocket/yamux/drpc goleak failure on Windows #317

Closed
bryphe-coder opened this issue Feb 17, 2022 · 5 comments · Fixed by #329 or #337
Closed

Fix websocket/yamux/drpc goleak failure on Windows #317

bryphe-coder opened this issue Feb 17, 2022 · 5 comments · Fixed by #329 or #337
Labels
api Area: HTTP API
Milestone

Comments

@bryphe-coder
Copy link
Contributor

bryphe-coder commented Feb 17, 2022

There is failure that occurs somewhat regularly in CI - the test pass, but goleak reports a failure.

Full Failure
coverage: 74.1% of statements
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 88026 in state select, with nhooyr.io/websocket.(*Conn).writeFrame.func1 on top of the stack:
goroutine 88026 [select]:
nhooyr.io/websocket.(*Conn).writeFrame.func1()
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/write.go:276 +0x129
nhooyr.io/websocket.(*Conn).writeFrame(0xc0025b6000, {0x181dee0, 0xc000873580}, 0x1, 0x0, 0x2, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/write.go:318 +0xb48
nhooyr.io/websocket.(*Conn).write(0xc0025b6000, {0x181dee0, 0xc000873580}, 0x1, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/write.go:129 +0x2d2
nhooyr.io/websocket.(*Conn).Write(0xc0001391a0, {0x181dee0, 0xc000873580}, 0xc00021b268, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/write.go:42 +0x8e
nhooyr.io/websocket.(*netConn).Write(0xc000139080, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/netconn.go:84 +0xb0
github.com/hashicorp/yamux.(*Session).send(0xc001718f20)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:[44](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:44)3 +0x234
created by github.com/hashicorp/yamux.newSession
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:114 +0x8d1

 Goroutine 88025 in state select, with nhooyr.io/websocket.(*mu).lock on top of the stack:
goroutine 88025 [select]:
nhooyr.io/websocket.(*mu).lock(0xc0041b4fe0, {0x181df50, 0xc003405b60})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/conn_notjs.go:238 +0x17f
nhooyr.io/websocket.(*Conn).writeFrame(0xc0025b6000, {0x181df50, 0xc003405b60}, 0x1, 0x0, 0x8, {0xc001508855, 0xb, 0xb})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/write.go:2[45](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:45) +0xd2
nhooyr.io/websocket.(*Conn).writeControl(0xc0025b6158, {0x181df18, 0xc0000a0198}, 0xa8cf4c, {0xc001508855, 0xb, 0xb})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/write.go:236 +0x105
nhooyr.io/websocket.(*Conn).writeClose(0xc0025b6000, 0x3f0, {0xc001508827, 0x9})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/close_notjs.go:76 +0x21c
nhooyr.io/websocket.(*Conn).handleControl(0xc0025b6000, {0x181dee0, 0xc0008735c0}, {0x1, 0x0, 0x0, 0x0, 0x8, 0xb, 0x1, ...})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/read.go:295 +0x4f4
nhooyr.io/websocket.(*Conn).readLoop(0xc0025b6000, {0x181dee0, 0xc0008735c0})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/read.go:163 +0x206
nhooyr.io/websocket.(*Conn).reader(0xc0025b6000, {0x181dee0, 0xc0008735c0})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/read.go:315 +0x245
nhooyr.io/websocket.(*Conn).Reader(...)
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/read.go:30
nhooyr.io/websocket.(*netConn).Read(0xc000139080, {0xc00093e000, 0x1000, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/netconn.go:100 +0x167
bufio.(*Reader).Read(0xc0001390e0, {0xc00144c9e4, 0xc, 0xc})
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/bufio/bufio.go:227 +0x4db
io.ReadAtLeast({0x180ae00, 0xc0001390e0}, {0xc00144c9e4, 0xc, 0xc}, 0xc)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/io/io.go:328 +0xde
io.ReadFull(...)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/io/io.go:347
github.com/hashicorp/yamux.(*Session).recvLoop(0xc001718f20)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:483 +0x136
github.com/hashicorp/yamux.(*Session).recv(0x0)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:[46](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:46)2 +0x2f
created by github.com/hashicorp/yamux.newSession
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:113 +0x865

 Goroutine 88707 in state select, with storj.io/drpc/drpcmanager.(*Manager).manageStream on top of the stack:
goroutine 88707 [select]:
storj.io/drpc/drpcmanager.(*Manager).manageStream(0xc001576b60, {0x181df88, 0xc001dc5950}, 0xc0009a2d80)
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcmanager/manager.go:289 +0x1b7
storj.io/drpc/drpcmanager.(*Manager).manageStreams(0xc001576b60)
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcmanager/manager.go:276 +0xfa
created by storj.io/drpc/drpcmanager.NewWithOptions
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcmanager/manager.go:104 +0x7d1

 Goroutine 88027 in state select, with github.com/hashicorp/yamux.(*Session).keepalive on top of the stack:
goroutine 88027 [select]:
github.com/hashicorp/yamux.(*Session).keepalive(0xc001718f20)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:338 +0x129
created by github.com/hashicorp/yamux.newSession
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:116 +0x95a

 Goroutine 88024 in state select, with nhooyr.io/websocket.(*Conn).timeoutLoop on top of the stack:
goroutine 88024 [select]:
nhooyr.io/websocket.(*Conn).timeoutLoop(0xc0025b6000)
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/conn_notjs.go:153 +0x23d
created by nhooyr.io/websocket.newConn
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/websocket@v1.8.7/conn_notjs.go:114 +0xc65

 Goroutine 88028 in state chan receive, with storj.io/drpc/drpcserver.(*Server).Serve.func1 on top of the stack:
goroutine 88028 [chan receive]:
storj.io/drpc/drpcserver.(*Server).Serve.func1({0x181e5e0, 0xc001777890})
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcserver/server.go:80 +0x51
storj.io/drpc/drpcctx.(*Tracker).track(0xc001777890, 0xc0014c3620)
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcctx/transport.go:52 +0x3e
created by storj.io/drpc/drpcctx.(*Tracker).Run
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcctx/transport.go:[47](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:47) +0xf1

 Goroutine 88688 in state select, with github.com/hashicorp/yamux.(*Session).waitForSendErr on top of the stack:
goroutine 88688 [select]:
github.com/hashicorp/yamux.(*Session).waitForSendErr(0xc001718f20, {0xc001cb5fd4, 0xc, 0xc}, {0xc000a6b000, 0x4, 0x1000}, 0xc0034051a0)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:385 +0x532
github.com/hashicorp/yamux.(*Stream).write(0xc001b4c410, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/stream.go:202 +0x474
github.com/hashicorp/yamux.(*Stream).Write(0xc001b4c410, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/stream.go:158 +0x18d
storj.io/drpc/drpcwire.(*Writer).Flush(0xc0018f9d40)
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcwire/writer.go:85 +0xba
storj.io/drpc/drpcstream.(*Stream).rawFlushLocked(0xc0009a2d80)
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcstream/stream.go:352 +0x176
storj.io/drpc/drpcstream.(*Stream).MsgSend(0xc0009a2d80, {0x15950c0, 0xc0018f9dc0}, {0x18177e0, 0x1dc8e68})
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcstream/stream.go:404 +0x2b2
storj.io/drpc/drpcmux.(*Mux).HandleRPC(0xc000a41918, {0x1823f38, 0xc0009a2d80}, {0xc0005b6420, 0x29})
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcmux/handle_rpc.go:38 +0x57f
storj.io/drpc/drpcserver.(*Server).handleRPC(0xc000139200, 0x181df88, {0xc0005b6420, 0x29})
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcserver/server.go:122 +0x72
storj.io/drpc/drpcserver.(*Server).ServeOne(0xc000139200, {0x181e5e0, 0xc001777890}, {0x16245952598, 0xc001b4c410})
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcserver/server.go:66 +0x2db
storj.io/drpc/drpcserver.(*Server).Serve.func2({0x181e5e0, 0xc001777890})
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcserver/server.go:112 +0x77
storj.io/drpc/drpcctx.(*Tracker).track(0xc001777890, 0xc00004fac0)
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcctx/transport.go:52 +0x3e
created by storj.io/drpc/drpcctx.(*Tracker).Run
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcctx/transport.go:47 +0xf1

 Goroutine 88706 in state select, with github.com/hashicorp/yamux.(*Stream).Read on top of the stack:
goroutine 88706 [select]:
github.com/hashicorp/yamux.(*Stream).Read(0xc001b4c410, {0xc000114000, 0x1000, 0x0})
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/stream.go:141 +0x4bc
storj.io/drpc/drpcwire.(*Reader).ReadPacketUsing(0xc000081d60, {0xc0005b63f0, 0x0, 0x30})
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcwire/reader.go:68 +0x8f6
storj.io/drpc/drpcmanager.(*Manager).manageReader(0xc001576b60)
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcmanager/manager.go:199 +0x185
created by storj.io/drpc/drpcmanager.NewWithOptions
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcmanager/manager.go:103 +0x765

 Goroutine 88022 in state select, with github.com/hashicorp/yamux.(*Session).AcceptStream on top of the stack:
goroutine 88022 [select]:
github.com/hashicorp/yamux.(*Session).AcceptStream(0xc001718f20)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:237 +0x109
github.com/hashicorp/yamux.(*Session).Accept(0xc001777890)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/yamux@v0.0.0-20211028200310-0bc27b27de87/session.go:227 +0x31
storj.io/drpc/drpcserver.(*Server).Serve(0xc000139200, {0x181df88, 0xc001777680}, {0x18182e0, 0xc001718f20})
	C:/Users/runneradmin/go/pkg/mod/storj.io/drpc@v0.0.29/drpcserver/server.go:85 +0x326
github.com/coder/coder/coderd.(*api).provisionerDaemonsServe(0xc000561d00, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	D:/a/coder/coder/coderd/provisionerdaemons.go:100 +0x1185
net/http.HandlerFunc.ServeHTTP(0xc0025e5d00, {0x18186d0, 0xc001f090a0}, 0xc00144c98a)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc001c94240, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:442 +0x498
net/http.HandlerFunc.ServeHTTP(0xc0025e5cf0, {0x18186d0, 0xc001f090a0}, 0xa720dc)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc001c94240, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:71 +0x937
github.com/go-chi/chi/v5.(*Mux).Mount.func1({0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:314 +0x516
net/http.HandlerFunc.ServeHTTP(0xc0000989c0, {0x18186d0, 0xc001f090a0}, 0xc0033efca0)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc00077a2a0, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:442 +0x498
net/http.HandlerFunc.ServeHTTP(0xc0025e5610, {0x18186d0, 0xc001f090a0}, 0xa720dc)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc00077a2a0, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:71 +0x937
github.com/go-chi/chi/v5.(*Mux).Mount.func1({0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:314 +0x516
net/http.HandlerFunc.ServeHTTP(0xc0000989e0, {0x18186d0, 0xc001f090a0}, 0xc00045b174)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc00077a240, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:442 +0x498
net/http.HandlerFunc.ServeHTTP(0xc0025e5d30, {0x18186d0, 0xc001f090a0}, 0x1d51b60)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc00077a240, {0x18186d0, 0xc001f090a0}, 0xc0007d8400)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:88 +0x8b7
net/http.serverHandler.ServeHTTP({0xc0017775c0}, {0x18186d0, 0xc001f090a0}, 0xc0007d8400)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2879 +0x89b
net/http.(*conn).serve(0xc0008337c0, {0x181df88, 0xc001c8df50})
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:1930 +0x12e5
created by net/http.(*Server).Serve
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:3034 +0x8[48](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:48)

Link to example manifestation: https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:44

I suspect we're failing to close a connection or server - it seems like this is stemming from

(*api).provisionerDaemonsServe(0xc000561d00, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)

and an issue where a connection is staying open between the provisioner <-> coderd host.

Interestingly, I've only observed this particular failure on Windows runs.

@kylecarbs
Copy link
Member

Pretty sure it's this:

https://github.com/nhooyr/websocket/blob/e4c3b0f8168d619c279822ab882b8f15717041af/close_notjs.go#L76-L80

It's using a context.Background when writing the close message, which seems incorrect. I've pinged nhooyr.

@kylecarbs
Copy link
Member

Just confirmed with nhooyr that's it! Working on a fix now.

kylecarbs added a commit that referenced this issue Feb 18, 2022
Closes #317. We depended on the context canceling the yamux connection,
but this isn't a sync operation. Explicitly calling close ensures the
handler waits for yamux to complete before exit.
@kylecarbs
Copy link
Member

Turns out that is a leak but isn't the cause of our problem. Not synchronously closing yamux caused this leak.

kylecarbs added a commit that referenced this issue Feb 18, 2022
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
kylecarbs added a commit that referenced this issue Feb 18, 2022
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
kylecarbs added a commit that referenced this issue Feb 18, 2022
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
kylecarbs added a commit that referenced this issue Feb 19, 2022
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
kylecarbs added a commit that referenced this issue Feb 19, 2022
* fix: Leaking yamux session after HTTP handler is closed

Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.

* Switch to windows-2022 to improve decompression

* Invalidate cache on matrix OS
kylecarbs added a commit that referenced this issue Feb 19, 2022
* feat: Add workspace agent for SSH

This adds the initial agent that supports TTY
and execution over SSH. It functions across MacOS,
Windows, and Linux.

This does not handle the coderd interaction yet,
but does setup a simple path forward.

* Fix pty tests on Windows

* Fix log race

* Lock around dial error to fix log output

* Fix context return early

* fix: Leaking yamux session after HTTP handler is closed

Closes #317. We depended on the context canceling the yamux connection,
but this isn't a sync operation. Explicitly calling close ensures the
handler waits for yamux to complete before exit.

* Lock around close return

* Force failure with log

* Fix failed handler

* Upgrade dep

* Fix defer inside loops

* Fix context cancel for HTTP requests

* Fix resize
@kylecarbs kylecarbs reopened this Feb 20, 2022
@kylecarbs
Copy link
Member

Fixing the yamux portion helped, but nhooyr/websocket still has it's leak.

https://github.com/coder/coder/runs/5265928205?check_suite_focus=true#step:7:44

@kylecarbs
Copy link
Member

nhooyr/websocket#330

@misskniss misskniss added this to the V2 Alpha milestone May 3, 2022
@misskniss misskniss added the api Area: HTTP API label Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment