Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renew address resolution in conn.reset #559

Merged
merged 1 commit into from
Mar 1, 2024

Conversation

larskanis
Copy link
Collaborator

@larskanis larskanis commented Feb 28, 2024

libpq resolves the host address while PQreset, but ruby-pg doesn't. This is because we explicit set the hostaddr connection parameter when the connection is established the first time. This prevents a newly DNS resolution when running PQresetStart.

This patch adds DNS resolution to conn.reset
Since we can not change the connection parameters after connection start, the underlying PGconn pointer is exchanged in reset_start2. This is done by a PQfinish() + PQconnectStart() sequence. That way the hostaddr parameter is updated and a new connection is established with it.

There is a /etc/hosts and sudo based test in the specs.
The behavior of libpq is slightly different to that of ruby-pg.
It can be verified by the following code:

require "pg"

puts "pg version: #{PG::VERSION}"
# PG::Connection.async_api = false  # Enable to see libpq behaviour

system "sudo sed -i 's/.* abcd/::1 abcd/g' /etc/hosts"
conn = PG.connect host: "abcd", password: "l"
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)

system "sudo sed -i 's/.* abcd/127.0.0.1 abcd/g' /etc/hosts"
conn.reset
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)

system "sudo sed -i 's/.* abcd/::2 abcd/g' /etc/hosts"
conn.reset
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)

This gives the following output showing, that the IP address is updated:

pg version: 1.5.5
{:host=>"abcd", :hostaddr=>"::1", :port=>"5432"}
{:host=>"abcd", :hostaddr=>"127.0.0.1", :port=>"5432"}
ruby-pg/lib/pg/connection.rb:573:in `reset_start2': connection to server at "::2", port 5432 failed: Network is unreachable (PG::ConnectionBad)
	Is the server running on that host and accepting TCP/IP connections?

Whereas libpq resolves similarly with async_api=false, but doesn't raise the error in conn.reset but in the subsequent conn.exec.

pg version: 1.5.5
{:host=>"abcd", :hostaddr=>nil, :port=>"5432"}
{:host=>"abcd", :hostaddr=>nil, :port=>"5432"}
test-reset-dns.rb:18:in `sync_exec': no connection to the server (PG::UnableToSend)

Fixes #558

…his is because we explicit set the `hostaddr` connection parameter when the connection is established the first time. This prevents a newly DNS resolution when running PQresetStart.

This patch adds DNS resolution to `conn.reset`
Since we can not change the connection parameters after connection start, the underlying PGconn pointer is exchanged in reset_start2. This is done by a PQfinish() + PQconnectStart() sequence. That way the `hostaddr` parameter is updated and a new connection is established with it.

There is a `/etc/hosts` and `sudo` based test in the specs.
The behavior of libpq is slightly different to that of ruby-pg.
It can be verified by the following code:

```ruby
require "pg"

puts "pg version: #{PG::VERSION}"

system "sudo sed -i 's/.* abcd/::1 abcd/g' /etc/hosts"
conn = PG.connect host: "abcd", password: "l"
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)

system "sudo sed -i 's/.* abcd/127.0.0.1 abcd/g' /etc/hosts"
conn.reset
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)

system "sudo sed -i 's/.* abcd/::2 abcd/g' /etc/hosts"
conn.reset
conn.exec("select 1")
p conn.conninfo_hash.slice(:host, :hostaddr, :port)
```

This gives the following output showing, that the IP address is updated:
```
pg version: 1.5.5
{:host=>"abcd", :hostaddr=>"::1", :port=>"5432"}
{:host=>"abcd", :hostaddr=>"127.0.0.1", :port=>"5432"}
ruby-pg/lib/pg/connection.rb:573:in `reset_start2': connection to server at "::2", port 5432 failed: Network is unreachable (PG::ConnectionBad)
	Is the server running on that host and accepting TCP/IP connections?
```

Whereas libpq resolves similarly with `async_api=false`, but doesn't raise the error in `conn.reset` but in the subsequent `conn.exec`.

```
pg version: 1.5.5
{:host=>"abcd", :hostaddr=>nil, :port=>"5432"}
{:host=>"abcd", :hostaddr=>nil, :port=>"5432"}
test-reset-dns.rb:18:in `sync_exec': no connection to the server (PG::UnableToSend)
```

Fixes ged#558
@bnferguson
Copy link

Hello! Very excited to see this PR! I had previously been fixing this issue via a patch to Rails to do a full disconnect and reconnect (instead of a reset) but this is so much nicer as there were some concerns around not doing a reset.

My test setup is a Rails app with Sidekiq workers attaching to a postgres 14 primary with a read replica. They're behind a single hostname via dnsmasq. I start the failover by setting the primary to read only via ALTER SYSTEM SET default_transaction_read_only = on; and then promote the secondary via SELECT pg_promote();.

I then switch the hostname entries in dnsmasq and disconnect all db connections via SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid();

The previous behavior had Rails and Sidekiq workers immediately reconnecting to the old primary and then throwing errors around being readonly.

With pg set to this branch we see that Rails and Sidekiq no longer reconnect to the old db and get the DNS change. This is verified with select * from pg_stat_activity;

Seems very successful! 🎉 🙌

@larskanis larskanis merged commit 27d7c2a into ged:master Mar 1, 2024
13 of 17 checks passed
@larskanis larskanis deleted the reset-with-dns branch March 1, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

connection.reset don't check again DNS
2 participants