
I started this analysis intending to do just that--it's been difficult, however, to figure out which of the dozens of replication/HA configurations to actually test. I settled on Stolon, since it seemed to make the strongest safety claims. However, I found bugs which turned out to be PostgreSQL's fault, so I backed off to investigate those first.


And herein lies the rub: HA Postgres is an extremely painful proposition. Based on our non-scientific research, Patroni seems to be the most battle tested solution, and as popular if not more so than Stolon.


"HA in Postgres" does not have a very well-defined meaning. The Postgres documentation provides an overview of different viable solutions: https://www.postgresql.org/docs/12/different-replication-sol... with features and drawbacks for each. But to call it "extremely painful" seems to be a bit overstated.


Patroni's documentation also seems to suggest that even with the strongest settings, it can lose transactions; Stolon makes stronger claims.


Patroni does have a synchronous_mode_strict setting, which may be what you're looking for:

This parameter prevents Patroni from switching off the synchronous replication on the primary when no synchronous standby candidates are available. As a downside, the primary is not available for writes (unless the Postgres transaction explicitly turns off synchronous_mode), blocking all client write requests until at least one synchronous replica comes up.

https://patroni.readthedocs.io/en/latest/replication_modes.h...
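For reference, this lives in the DCS section of the Patroni configuration. A minimal sketch (see the linked page for the surrounding settings):

```yaml
bootstrap:
  dcs:
    synchronous_mode: true          # use synchronous replication when a standby is available
    synchronous_mode_strict: true   # refuse writes rather than silently dropping sync replication
```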

edit: seems I missed this discussion on twitter: https://twitter.com/jepsen_io/status/1265626035380346881


Er, again, the docs say "it is still possible to lose transactions even when using synchronous_mode_strict". I've talked about this with some of the Stolon folks on Twitter, and we're not exactly sure how that manifests--possibly an SI or SSI violation.


Why is Stolon safer, then? I thought Stolon uses Postgres's synchronous replication as well?


> documentation also seems to suggest that even with the strongest settings, it can lose transactions;

This can be reproduced even on a single-node Postgres. Just hammer it with inserts and maintain a local counter of inserts performed, then kill -9 the postgres process. You'd expect your local counter to match the actual rows inserted, but you'll find that your counter will always be less than the actual number of rows. Like any networked system, it is possible to lose the commit acknowledgment even if the commit itself was successful.

So yes, you've not "lost" transactions per se. You've "gained" them, but it is still a data issue in either case.
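The asymmetry is easy to see even without a database. In this toy simulation (plain Python, no real Postgres; all names are made up), the "server" always commits before the acknowledgment can be lost, so the client's counter can only fall behind the server's, never run ahead:

```python
import random

class FlakyServer:
    """Toy stand-in for a database: commits are durable, but the ack can be lost."""
    def __init__(self, seed=42):
        self.rows = 0                 # rows actually committed ("durable" state)
        self.rng = random.Random(seed)

    def insert(self):
        self.rows += 1                # the commit itself succeeds...
        if self.rng.random() < 0.1:   # ...but ~10% of acks never reach the client
            raise ConnectionError("ack lost after commit")

server = FlakyServer()
acked = 0                             # client-side counter of acknowledged inserts
for _ in range(1000):
    try:
        server.insert()
        acked += 1
    except ConnectionError:
        pass                          # client never learns this insert landed

# The client can only undercount: acked <= rows actually committed.
print(acked, server.rows)
assert acked < server.rows
```

The same logic applies to the kill -9 test above: the WAL record may be durable even though the client never saw its COMMIT return.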


The classical solution to that is to use 2PC. But often it's not worth it...


2PC can have unacknowledged writes: https://aphyr.com/posts/282-call-me-maybe-postgres


Ah, I presumed you were talking about distributed failure situations (split brain, etc.) as opposed to the PG-level replication (which most solutions orchestrate anyway).


It's one of the reasons NoSQL databases got a lot of publicity during the early 2010s.


And are still widely used today.

People like to criticise NoSQL databases like MongoDB, but at least they took on the challenge of making clustering easy enough to use and safe enough to rely on, especially since it is such a complex and error-prone problem.


Odd that you would point out MongoDB as your named example, as it is pretty awful at sharding/clustering. For HA, the better example would be Cassandra or Scylla. Mongo's success is more tied to the ease of development with a native JSON document DB, rather than any claims to scalability. (Insert "MongoDB is web scale" video here.)


I lol'ed. (Full disclosure: I work for ScyllaDB.)

MongoDB has definitely come a long way in terms of HA, but yes, they still have a long way to go. A good primer talk on the differences can be found here: https://www.scylladb.com/tech-talk/mongodb-vs-scylla-product...


MongoDB was called out because of its ease of use. You can create replica sets and shards in seconds. And for many use cases it works great.

Cassandra is one of the best, if not the best, since it's multi-master, but it's a little more complex to set up.


> HA Postgres is an extremely painful proposition.

Does anyone here know how Amazon RDS's HA setup, particularly their multi-AZ option, works? That seems to be a switch that the AWS customer can just turn on. Do they have a proprietary implementation, even for non-Aurora Postgres?


Here is pretty much the most detailed post about how it works you'll be able to find in public: https://aws.amazon.com/blogs/database/amazon-rds-under-the-h...

They basically do replication at the storage layer. Each write has to be acknowledged by both the primary and secondary EBS volume.


They basically have built a proprietary, distributed block store.

And on top of this they have layered PostgreSQL, MySQL, MongoDB, Cassandra etc.

I doubt they will ever release the code for it, since it's very much a competitive advantage.


So if the hardware running the database is suddenly destroyed they try to start another instance really fast?

That seems inferior to having multiple sync replicas ready to take over without having to start a process and replay the WAL.

Also, such an HA block store seems very easy to replicate (I'd guess there would be something open source already), so it's not much of a competitive advantage.


There are two types of failover in RDS:

- Single instance: if the instance dies, they start a new one and mount the same storage. This can take some time, in general under 5 minutes, but I have seen it take 45 minutes, especially for the large instance types.

- Multi-AZ: they run a hot standby with replicated and physically separated storage. Failover takes about a minute. The replication happens at the storage level; every write has to be acknowledged by both availability zones. I'm not sure if Postgres is always running on the standby or if it gets started when failing over.

I guess you could replicate this using drbd.
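A DRBD resource for that kind of storage-level pair looks roughly like this (hostnames, devices, and addresses here are made up; protocol C is DRBD's synchronous mode, where a write is acknowledged only after both nodes have it):

```
resource r0 {
  protocol C;                # synchronous: ack only after both nodes persist the write
  on alpha {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on bravo {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
```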


- using block-level replication allows them to support multiple databases in a common way

- block-level replication can be more reliable in the long run operationally than some types of database replication, especially MySQL back in the day

- block-level replication has more scalable support staff available than hiring DBAs to fix database replication problems

- programming for all the edge cases is something that is a competitive advantage

- no licensing required for it

- you can probably guess which Open Source project it's based on

Source: DBA, worked there.


Just gonna put this here for people who might be wondering. Not too many block-level replication implementations exist.

https://www.linbit.com/drbd/

NB: It is Free software, don't be alarmed by the domain name.


Linstor can do that. I use it in Kubernetes and it is incredible!


That’s how the Aurora DBs work, but I believe standard RDS Postgres uses a more typical method of WAL replication.


For read replicas yes but not for the standby master. They give you access to pg_stat_replication so this is quite easy to verify.


Is there a proposed roadmap for a basic/default HA solution for Postgres? It seems MySQL has this well covered, while Postgres continues to treat it as not a core part of the DB and relies on third parties. (Not suggesting that is necessarily a bad thing.)



