Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: Support fsnotify reloading of certs #6415

Merged

Conversation

charlieegan3
Copy link
Contributor

Fixes #5788

This PR makes the following changes:

  • Adds a new WithTLSConfig option on the server to support the setting of a cert, key and ca cert at the same time. Previously, it was only possible to set the cert and key, in addition to a polling duration with WithCertificatePaths.
  • Adds a test for this in TestFSNotifyCertReloading and the interval reloading in TestIntervalCertReloading. These tests are really verbose, and the cert fixture creation might be best if DRYed up some.
  • Makes it possible to reload the CA certs when the pool file changes on disk. Previously, this was static post init.
  • Retains the WithCertificate and WithCertPool runtime options as well as the polling behaviour if the interval is set in certLoopPolling

server/certs.go Outdated Show resolved Hide resolved
@tjons
Copy link
Contributor

tjons commented Nov 17, 2023

I was interested in this issue so read over your solution and it seems quite well done! one question about a comment

@charlieegan3
Copy link
Contributor Author

Thanks for the review @tjons! 🙂

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlieegan3 it's a bit difficult to understand the individual changes. Would it be a good idea to create separate PRs unless these changes are related to each other in some way?

@charlieegan3
Copy link
Contributor Author

Hey @ashutosh-narkar, I guess that if we need to break this down the parts might be:

  • Adding WithTLSConfig, to support the setting of a ca pool file path on the server that it can then refresh alongside existing polling update of certs and keys.
  • adding the fsnotify based implementation of the reloading of these files.

They aren't really co-dependent, but I'd rather work to make sure that the changes are easier to understand. I have had a go with some changes to the impl and comments in a621e2b. Hope that helps.

Copy link
Contributor

@johanfylling johanfylling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of thoughts/questions.

server/certs.go Outdated Show resolved Hide resolved
server/certs.go Outdated Show resolved Hide resolved
server/certs.go Show resolved Hide resolved
server/server.go Show resolved Hide resolved
server/server_test.go Show resolved Hide resolved
@charlieegan3 charlieegan3 force-pushed the fsnotify-reloading-certs branch 2 times, most recently from 054a125 to 1b45886 Compare November 30, 2023 10:12
Copy link

netlify bot commented Nov 30, 2023

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 054a125
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/65685fe9a6f08a0008cb31ab
😎 Deploy Preview https://deploy-preview-6415--openpolicyagent.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

netlify bot commented Nov 30, 2023

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit d08398b
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/657871006490240008aa9395
😎 Deploy Preview https://deploy-preview-6415--openpolicyagent.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Contributor Author

@charlieegan3 charlieegan3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @johanfylling - think I've got them all now

server/certs.go Outdated Show resolved Hide resolved
server/certs.go Show resolved Hide resolved
server/certs.go Outdated Show resolved Hide resolved
server/server.go Show resolved Hide resolved
server/server_test.go Show resolved Hide resolved
@charlieegan3 charlieegan3 force-pushed the fsnotify-reloading-certs branch 2 times, most recently from d1ada1f to 39f6a8e Compare November 30, 2023 17:48
johanfylling
johanfylling previously approved these changes Dec 7, 2023
Copy link
Contributor

@johanfylling johanfylling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@@ -4852,6 +4863,747 @@ func TestDistributedTracingEnabled(t *testing.T) {
}
}

func TestCertPoolReloading(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add another round to this where we change the client CA again; picking up a change nothing->something might be different to picking up a change something->something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, I can get back to this next week and hopefully we can get this in! 🎉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d08398b should do the trick, lmk if that looks good to you.

Reload certs, keys and optionally the CA cert pool when they change on
disk.

Signed-off-by: Charlie Egan <charlie@styra.com>
Changes to improve readability.

Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
This updates the CA reloading test to check that both switching the CA
cert, and placing the CA cert in the first place both behave correctly.

Signed-off-by: Charlie Egan <charlie@styra.com>
Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ashutosh-narkar
Copy link
Member

@charlieegan3 please squash your commits and merge. Thanks!

@charlieegan3 charlieegan3 merged commit a307ec4 into open-policy-agent:main Dec 13, 2023
24 checks passed
@charlieegan3 charlieegan3 deleted the fsnotify-reloading-certs branch December 13, 2023 08:53
charlieegan3 added a commit to charlieegan3/opa that referenced this pull request Dec 13, 2023
This was missed in open-policy-agent#6415

https://github.com/open-policy-agent/opa/actions/runs/7194437460/job/19595009978?pr=6476

We have a warning of this data race here:

```
==================
WARNING: DATA RACE
Write at 0x00c009b562e8 by goroutine 1815:
  github.com/open-policy-agent/opa/server.(*Server).reloadTLSConfig()
      /src/server/certs.go:65 +0x608
  github.com/open-policy-agent/opa/server.(*Server).getListener.(*Server).certLoopNotify.func2()
      /src/server/certs.go:174 +0x434
  github.com/open-policy-agent/opa/server.TestCertPoolReloading.func1()
      /src/server/server_test.go:5105 +0x4f
  github.com/open-policy-agent/opa/server.TestCertPoolReloading.func2()
      /src/server/server_test.go:5108 +0x41

Previous read at 0x00c009b562e8 by goroutine 1824:
  github.com/open-policy-agent/opa/server.(*Server).getListenerForHTTPSServer.func1()
      /src/server/server.go:649 +0x145
  crypto/tls.(*Conn).readClientHello()
      /usr/local/go/src/crypto/tls/handshake_server.go:149 +0x97d
  crypto/tls.(*Conn).serverHandshake()
      /usr/local/go/src/crypto/tls/handshake_server.go:42 +0x64
  crypto/tls.(*Conn).serverHandshake-fm()
      <autogenerated>:1 +0x47
  crypto/tls.(*Conn).handshakeContext()
      /usr/local/go/src/crypto/tls/conn.go:1552 +0x615
  crypto/tls.(*Conn).HandshakeContext()
      /usr/local/go/src/crypto/tls/conn.go:1492 +0x16b8
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1891 +0x16c0
  net/http.(*Server).Serve.func3()
      /usr/local/go/src/net/http/server.go:3086 +0x4f
```

Signed-off-by: Charlie Egan <charlie@styra.com>
ashutosh-narkar pushed a commit to charlieegan3/opa that referenced this pull request Dec 13, 2023
This was missed in open-policy-agent#6415

https://github.com/open-policy-agent/opa/actions/runs/7194437460/job/19595009978?pr=6476

We have a warning of this data race here:

```
==================
WARNING: DATA RACE
Write at 0x00c009b562e8 by goroutine 1815:
  github.com/open-policy-agent/opa/server.(*Server).reloadTLSConfig()
      /src/server/certs.go:65 +0x608
  github.com/open-policy-agent/opa/server.(*Server).getListener.(*Server).certLoopNotify.func2()
      /src/server/certs.go:174 +0x434
  github.com/open-policy-agent/opa/server.TestCertPoolReloading.func1()
      /src/server/server_test.go:5105 +0x4f
  github.com/open-policy-agent/opa/server.TestCertPoolReloading.func2()
      /src/server/server_test.go:5108 +0x41

Previous read at 0x00c009b562e8 by goroutine 1824:
  github.com/open-policy-agent/opa/server.(*Server).getListenerForHTTPSServer.func1()
      /src/server/server.go:649 +0x145
  crypto/tls.(*Conn).readClientHello()
      /usr/local/go/src/crypto/tls/handshake_server.go:149 +0x97d
  crypto/tls.(*Conn).serverHandshake()
      /usr/local/go/src/crypto/tls/handshake_server.go:42 +0x64
  crypto/tls.(*Conn).serverHandshake-fm()
      <autogenerated>:1 +0x47
  crypto/tls.(*Conn).handshakeContext()
      /usr/local/go/src/crypto/tls/conn.go:1552 +0x615
  crypto/tls.(*Conn).HandshakeContext()
      /usr/local/go/src/crypto/tls/conn.go:1492 +0x16b8
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1891 +0x16c0
  net/http.(*Server).Serve.func3()
      /usr/local/go/src/net/http/server.go:3086 +0x4f
```

Signed-off-by: Charlie Egan <charlie@styra.com>
ashutosh-narkar pushed a commit that referenced this pull request Dec 13, 2023
This was missed in #6415

https://github.com/open-policy-agent/opa/actions/runs/7194437460/job/19595009978?pr=6476

We have a warning of this data race here:

```
==================
WARNING: DATA RACE
Write at 0x00c009b562e8 by goroutine 1815:
  github.com/open-policy-agent/opa/server.(*Server).reloadTLSConfig()
      /src/server/certs.go:65 +0x608
  github.com/open-policy-agent/opa/server.(*Server).getListener.(*Server).certLoopNotify.func2()
      /src/server/certs.go:174 +0x434
  github.com/open-policy-agent/opa/server.TestCertPoolReloading.func1()
      /src/server/server_test.go:5105 +0x4f
  github.com/open-policy-agent/opa/server.TestCertPoolReloading.func2()
      /src/server/server_test.go:5108 +0x41

Previous read at 0x00c009b562e8 by goroutine 1824:
  github.com/open-policy-agent/opa/server.(*Server).getListenerForHTTPSServer.func1()
      /src/server/server.go:649 +0x145
  crypto/tls.(*Conn).readClientHello()
      /usr/local/go/src/crypto/tls/handshake_server.go:149 +0x97d
  crypto/tls.(*Conn).serverHandshake()
      /usr/local/go/src/crypto/tls/handshake_server.go:42 +0x64
  crypto/tls.(*Conn).serverHandshake-fm()
      <autogenerated>:1 +0x47
  crypto/tls.(*Conn).handshakeContext()
      /usr/local/go/src/crypto/tls/conn.go:1552 +0x615
  crypto/tls.(*Conn).HandshakeContext()
      /usr/local/go/src/crypto/tls/conn.go:1492 +0x16b8
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1891 +0x16c0
  net/http.(*Server).Serve.func3()
      /usr/local/go/src/net/http/server.go:3086 +0x4f
```

Signed-off-by: Charlie Egan <charlie@styra.com>

for evt := range watcher.Events {
removalMask := fsnotify.Remove | fsnotify.Rename
mask := fsnotify.Create | fsnotify.Write | removalMask
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are aware that this will trigger partial writes:

        // The pathname was written to; this does *not* mean the write has finished,
        // and a write can be followed by more writes.
        Write

One other thing that I figured out when I did https://pkg.go.dev/github.com/zalando/skipper/secrets#SecretPaths some years ago is that in Kubernetes secret mounts you have symlinks and at least inotify does not work for symlinks as expected. A symlink does not change, if only the content of the targeted file change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for this. Since a partial write would result in OPA failing to reload the file (as it'd be invalid in reloadCertificateKeyPair), the main concern here is missing a write, is that correct?

Were you able to mitigate this in skipper without polling? Perhaps we could have OPA fallback to a polling behaviour when the file in question is a symlink?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do just polling every minute or 30s (something like this). Keep it simple. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I'll need to do some digging by the sounds of things!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no worries I just wanted to hint that it might not work as expected in all environments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support fsnotify based reloading of certificate, key and CA certs
5 participants