I feel like this is the reaction of everyone who has ever tried to set up postgres replication. With your audience, you settling on a particular setup will probably help a LOT of people, and ultimately the postgres project as well.
If you worry about your data, you should not use automatic failover. It's nearly impossible for a standby to know why the master stopped responding. Maybe there was a hardware failure, or maybe the master is just busy. This is why manual failover is better: you can find out the real reason and decide whether to perform the failover or just wait.
With tools like repmgr it is just a single command invoked on the standby.
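As an illustration, a manual failover with repmgr looks roughly like this (paths and the exact sequence are a sketch; check the repmgr docs for your version):

```shell
# Run on the standby you want to promote, assuming repmgr is
# already configured on every node.

# First, confirm the old master really is gone:
repmgr -f /etc/repmgr.conf cluster show

# Then promote this standby to be the new primary:
repmgr -f /etc/repmgr.conf standby promote

# On any remaining standbys, re-point them at the new primary:
repmgr -f /etc/repmgr.conf standby follow
```

The key point is that a human runs `standby promote` only after looking at `cluster show` and deciding the old master is actually dead, not just busy.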
If you absolutely don't want to lose any data, you should have two masters in close proximity (so the latency isn't high) set up with synchronous replication, then have one or two standbys with asynchronous replication. This will reduce throughput, but you can be sure that the other machine has all the same transactions. If something happens to both, you can then fall back to an asynchronous standby, which might be a bit behind.
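As a sketch, the mixed setup described above might be configured on the primary something like this (the standby names are placeholders, not from the original comment):

```ini
# postgresql.conf on the primary (names illustrative)

# Wait for one of the two nearby standbys to confirm each commit
# before reporting success to the client:
synchronous_standby_names = 'FIRST 1 (sync_a, sync_b)'
synchronous_commit = on

# Any standby not listed above simply replicates asynchronously,
# so the distant standbys don't add commit latency.
```

The `FIRST 1 (...)` form means one of the listed standbys must acknowledge; the others in the list are candidates if it goes away.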
Quoting a former colleague here, but "if it hurts, do it more often". That is what you should do with your PostgreSQL failovers.
I have clusters running on timelines in the hundreds without a byte of data loss, thanks to synchronous replication, tools that help with leader election, and just doing it often.
Can Patroni tell if the master node is not responsive because it is busy vs. dead? GitHub (I believe) had a few outages that caused data loss because their auto-failover mechanism kicked in when it shouldn't have.
I would actually be interested in aphyr's analysis of Patroni and other distributed add-ons to PostgreSQL.
There is no real difference between dead and too busy.
The only question is how soon you are going to page humans: after the automated mechanism has flipped your master 2-3 times and the cluster still hasn't made progress (nothing coming out of the master, or it locks up again after a few minutes), or right after some other automated mechanism detects that there's a problem.
Whatever automation you have in place, it has advantages and disadvantages. In the GitHub case - I suppose - they determined post-mortem that it would have been better to just let the master chug through the incoming onslaught of queries instead of failing over, and over, and over. (But of course this seems like a trivial problem in any auto failover setup, so I suspect there's more to the story.)
> Can Patroni tell if master node is not responsive because it is busy vs dead
No. But the contract Patroni has is this:
I only serve a master (primary) if I have the lock.
If I do not have the lock I will demote.
The result is that only one primary can be active at any given point in time, even if the network is partitioned.
This in and of itself does not prevent split-brain situations: a split-brain can occur if writes were made on the former primary but had not yet reached the future primary.
This however can be mitigated with synchronous replication.
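For what it's worth, Patroni exposes this directly. Enabling synchronous mode in its config looks roughly like this (a fragment, and the exact keys may differ by Patroni version, so check the docs):

```yaml
# patroni.yml (illustrative fragment)
bootstrap:
  dcs:
    # Only fail over to a standby that was a synchronous replica
    # of the old primary, so acknowledged commits survive:
    synchronous_mode: true
    # Optionally refuse writes when no synchronous standby is
    # available (trades availability for durability):
    synchronous_mode_strict: true
```

With this on, a commit acknowledged to the client exists on at least one node that Patroni is allowed to promote.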
> tell if master node is not responsive because it is busy vs dead?
The postgres documentation will tell you that you'll need to set up your own mechanisms for this, and that they will need to integrate with OS facilities as appropriate. One-size-fits-all does not cut it. Not wrt. replication, not wrt. HA/failover.