New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: Support fsnotify reloading of certs #6415
server: Support fsnotify reloading of certs #6415
Conversation
cba4dea
to
849372f
Compare
I was interested in this issue so read over your solution and it seems quite well done! one question about a comment |
849372f
to
ccceb16
Compare
Thanks for the review @tjons! 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlieegan3 it's a bit difficult to understand the individual changes. Would it be a good idea to create separate PRs unless these changes are related to each other in some way?
ccceb16
to
a2fbe36
Compare
Hey @ashutosh-narkar, I guess that if we need to break this down the parts might be:
They aren't really co-dependent, but I'd rather work to make sure that the changes are easier to understand. I have had a go with some changes to the impl and comments in a621e2b. Hope that helps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of thoughts/questions.
054a125
to
1b45886
Compare
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
22529a1
to
fa5395c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @johanfylling - think I've got them all now
d1ada1f
to
39f6a8e
Compare
39f6a8e
to
1ec66e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
@@ -4852,6 +4863,747 @@ func TestDistributedTracingEnabled(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestCertPoolReloading(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add another round to this where we change the client CA again; picking up a change nothing->something might be different to picking up a change something->something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, I can get back to this next week and hopefully we can get this in! 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d08398b should do the trick, lmk if that looks good to you.
Reload certs, keys and optionally the CA cert pool when they change on disk. Signed-off-by: Charlie Egan <charlie@styra.com>
Changes to improve readability. Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
This updates the CA reloading test to check that both switching the CA cert, and placing the CA cert in the first place both behave correctly. Signed-off-by: Charlie Egan <charlie@styra.com>
1ec66e3
to
d08398b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@charlieegan3 please squash your commits and merge. Thanks! |
This was missed in open-policy-agent#6415 https://github.com/open-policy-agent/opa/actions/runs/7194437460/job/19595009978?pr=6476 We have a warning of this data race here: ``` ================== WARNING: DATA RACE Write at 0x00c009b562e8 by goroutine 1815: github.com/open-policy-agent/opa/server.(*Server).reloadTLSConfig() /src/server/certs.go:65 +0x608 github.com/open-policy-agent/opa/server.(*Server).getListener.(*Server).certLoopNotify.func2() /src/server/certs.go:174 +0x434 github.com/open-policy-agent/opa/server.TestCertPoolReloading.func1() /src/server/server_test.go:5105 +0x4f github.com/open-policy-agent/opa/server.TestCertPoolReloading.func2() /src/server/server_test.go:5108 +0x41 Previous read at 0x00c009b562e8 by goroutine 1824: github.com/open-policy-agent/opa/server.(*Server).getListenerForHTTPSServer.func1() /src/server/server.go:649 +0x145 crypto/tls.(*Conn).readClientHello() /usr/local/go/src/crypto/tls/handshake_server.go:149 +0x97d crypto/tls.(*Conn).serverHandshake() /usr/local/go/src/crypto/tls/handshake_server.go:42 +0x64 crypto/tls.(*Conn).serverHandshake-fm() <autogenerated>:1 +0x47 crypto/tls.(*Conn).handshakeContext() /usr/local/go/src/crypto/tls/conn.go:1552 +0x615 crypto/tls.(*Conn).HandshakeContext() /usr/local/go/src/crypto/tls/conn.go:1492 +0x16b8 net/http.(*conn).serve() /usr/local/go/src/net/http/server.go:1891 +0x16c0 net/http.(*Server).Serve.func3() /usr/local/go/src/net/http/server.go:3086 +0x4f ``` Signed-off-by: Charlie Egan <charlie@styra.com>
This was missed in open-policy-agent#6415 https://github.com/open-policy-agent/opa/actions/runs/7194437460/job/19595009978?pr=6476 We have a warning of this data race here: ``` ================== WARNING: DATA RACE Write at 0x00c009b562e8 by goroutine 1815: github.com/open-policy-agent/opa/server.(*Server).reloadTLSConfig() /src/server/certs.go:65 +0x608 github.com/open-policy-agent/opa/server.(*Server).getListener.(*Server).certLoopNotify.func2() /src/server/certs.go:174 +0x434 github.com/open-policy-agent/opa/server.TestCertPoolReloading.func1() /src/server/server_test.go:5105 +0x4f github.com/open-policy-agent/opa/server.TestCertPoolReloading.func2() /src/server/server_test.go:5108 +0x41 Previous read at 0x00c009b562e8 by goroutine 1824: github.com/open-policy-agent/opa/server.(*Server).getListenerForHTTPSServer.func1() /src/server/server.go:649 +0x145 crypto/tls.(*Conn).readClientHello() /usr/local/go/src/crypto/tls/handshake_server.go:149 +0x97d crypto/tls.(*Conn).serverHandshake() /usr/local/go/src/crypto/tls/handshake_server.go:42 +0x64 crypto/tls.(*Conn).serverHandshake-fm() <autogenerated>:1 +0x47 crypto/tls.(*Conn).handshakeContext() /usr/local/go/src/crypto/tls/conn.go:1552 +0x615 crypto/tls.(*Conn).HandshakeContext() /usr/local/go/src/crypto/tls/conn.go:1492 +0x16b8 net/http.(*conn).serve() /usr/local/go/src/net/http/server.go:1891 +0x16c0 net/http.(*Server).Serve.func3() /usr/local/go/src/net/http/server.go:3086 +0x4f ``` Signed-off-by: Charlie Egan <charlie@styra.com>
This was missed in #6415 https://github.com/open-policy-agent/opa/actions/runs/7194437460/job/19595009978?pr=6476 We have a warning of this data race here: ``` ================== WARNING: DATA RACE Write at 0x00c009b562e8 by goroutine 1815: github.com/open-policy-agent/opa/server.(*Server).reloadTLSConfig() /src/server/certs.go:65 +0x608 github.com/open-policy-agent/opa/server.(*Server).getListener.(*Server).certLoopNotify.func2() /src/server/certs.go:174 +0x434 github.com/open-policy-agent/opa/server.TestCertPoolReloading.func1() /src/server/server_test.go:5105 +0x4f github.com/open-policy-agent/opa/server.TestCertPoolReloading.func2() /src/server/server_test.go:5108 +0x41 Previous read at 0x00c009b562e8 by goroutine 1824: github.com/open-policy-agent/opa/server.(*Server).getListenerForHTTPSServer.func1() /src/server/server.go:649 +0x145 crypto/tls.(*Conn).readClientHello() /usr/local/go/src/crypto/tls/handshake_server.go:149 +0x97d crypto/tls.(*Conn).serverHandshake() /usr/local/go/src/crypto/tls/handshake_server.go:42 +0x64 crypto/tls.(*Conn).serverHandshake-fm() <autogenerated>:1 +0x47 crypto/tls.(*Conn).handshakeContext() /usr/local/go/src/crypto/tls/conn.go:1552 +0x615 crypto/tls.(*Conn).HandshakeContext() /usr/local/go/src/crypto/tls/conn.go:1492 +0x16b8 net/http.(*conn).serve() /usr/local/go/src/net/http/server.go:1891 +0x16c0 net/http.(*Server).Serve.func3() /usr/local/go/src/net/http/server.go:3086 +0x4f ``` Signed-off-by: Charlie Egan <charlie@styra.com>
|
||
for evt := range watcher.Events { | ||
removalMask := fsnotify.Remove | fsnotify.Rename | ||
mask := fsnotify.Create | fsnotify.Write | removalMask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are aware that this will trigger partial writes:
// The pathname was written to; this does *not* mean the write has finished,
// and a write can be followed by more writes.
Write
One other thing that I figured out when I did https://pkg.go.dev/github.com/zalando/skipper/secrets#SecretPaths some years ago is that in Kubernetes secret mounts you have symlinks and at least inotify does not work for symlinks as expected. A symlink does not change, if only the content of the targeted file change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for this. Since a partial write would result in OPA failing to reload the file (as it'd be invalid in reloadCertificateKeyPair
), the main concern here is missing a write, is that correct?
Were you able to mitigate this in skipper without polling? Perhaps we could have OPA fallback to a polling behaviour when the file in question is a symlink?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do just polling every minute or 30s (something like this). Keep it simple. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I'll need to do some digging by the sounds of things!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, no worries I just wanted to hint that it might not work as expected in all environments.
Fixes #5788
This PR makes the following changes:
WithTLSConfig
option on the server to support the setting of a cert, key and ca cert at the same time. Previously, it was only possible to set the cert and key, in addition to a polling duration withWithCertificatePaths
.TestFSNotifyCertReloading
and the interval reloading inTestIntervalCertReloading
. These tests are really verbose, and the cert fixture creation might be best if DRYed up some.WithCertificate
andWithCertPool
runtime options as well as the polling behaviour if the interval is set incertLoopPolling