
I started this analysis intending to do just that--it's been difficult, however, to figure out which of the dozens of replication/HA configurations to actually test. I settled on Stolon, since it seemed to make the strongest safety claims. However, I found bugs which turned out to be PostgreSQL's fault, so I backed off to investigate those first.


And herein lies the rub: HA Postgres is an extremely painful proposition. Based on our non-scientific research, Patroni seems to be the most battle tested solution, and as popular if not more so than Stolon.


"HA in Postgres" does not have a very well-defined meaning. The Postgres documentation provides an overview of different viable solutions: https://www.postgresql.org/docs/12/different-replication-sol... with features and drawbacks for each. But to call it "extremely painful" seems to be a bit overstated.


Patroni's documentation also seems to suggest that even with the strongest settings, it can lose transactions; Stolon makes stronger claims.


Patroni does have a synchronous_mode_strict setting, which may be what you're looking for:

This parameter prevents Patroni from switching off the synchronous replication on the primary when no synchronous standby candidates are available. As a downside, the primary is not available for writes (unless the Postgres transaction explicitly turns off synchronous_mode), blocking all client write requests until at least one synchronous replica comes up.

https://patroni.readthedocs.io/en/latest/replication_modes.h...
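For reference, this lives in the DCS section of the Patroni configuration. A minimal sketch (see the linked page for the surrounding settings):

```yaml
bootstrap:
  dcs:
    synchronous_mode: true          # use synchronous replication when a standby is available
    synchronous_mode_strict: true   # refuse writes rather than silently dropping sync replication
```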

edit: seems I missed this discussion on twitter: https://twitter.com/jepsen_io/status/1265626035380346881


Er, again, the docs say "it is still possible to lose transactions even when using synchronous_mode_strict". I've talked about this with some of the Stolon folks on Twitter, and we're not exactly sure how that manifests--possibly an SI or SSI violation.


Why is Stolon safer, then? I thought Stolon uses Postgres's synchronous replication as well?


> documentation also seems to suggest that even with the strongest settings, it can lose transactions;

This can be reproduced even on a single-node Postgres. Just hammer it with inserts and maintain a local counter of inserts performed, then kill -9 the postgres process. You'd expect your local counter to match the actual rows inserted, but you'll find that your counter will always be less than the actual number of rows. Like any networked system, it is possible to lose the commit acknowledgment even if the commit itself was successful.

So yes, you've not "lost" transactions per se. You've "gained" them, but it is still a data issue in either case.
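The asymmetry is easy to see even without a database. In this toy simulation (plain Python, no real Postgres; all names are made up), the "server" always commits before the acknowledgment can be lost, so the client's counter can only fall behind the server's, never run ahead:

```python
import random

class FlakyServer:
    """Toy stand-in for a database: commits are durable, but the ack can be lost."""
    def __init__(self, seed=42):
        self.rows = 0                 # rows actually committed ("durable" state)
        self.rng = random.Random(seed)

    def insert(self):
        self.rows += 1                # the commit itself succeeds...
        if self.rng.random() < 0.1:   # ...but ~10% of acks never reach the client
            raise ConnectionError("ack lost after commit")

server = FlakyServer()
acked = 0                             # client-side counter of acknowledged inserts
for _ in range(1000):
    try:
        server.insert()
        acked += 1
    except ConnectionError:
        pass                          # client never learns this insert landed

# The client can only undercount: acked <= rows actually committed.
print(acked, server.rows)
assert acked < server.rows
```

The same logic applies to the kill -9 test above: the WAL record may be durable even though the client never saw its COMMIT return.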


The classical solution to that is to use 2PC. But often it's not worth it...


2PC can have unacknowledged writes: https://aphyr.com/posts/282-call-me-maybe-postgres


Ah, I presumed you were talking about distributed failure situations (split brain, etc.) as opposed to the PG-level replication (which most solutions orchestrate anyway).


It's one of the reasons NoSQL databases got a lot of publicity during the early 2010s.


And are still widely used today.

People like to criticise NoSQL databases like MongoDB, but at least they took on the challenge of making clustering easy enough to use and safe enough to rely on, especially since it is such a complex and error-prone problem.


Odd that you would point out MongoDB as your named example, as it is pretty awful at sharding/clustering. For HA, the better example would be Cassandra or Scylla. Mongo's success is more tied to the ease of development with a native JSON document DB, rather than any claims to scalability. (Insert "MongoDB is web scale" video here.)


I lol'ed. (Full disclosure: I work for ScyllaDB.)

MongoDB has definitely come a long way in terms of HA, but yes, they still have a long way to go. A good primer talk on the differences can be found here: https://www.scylladb.com/tech-talk/mongodb-vs-scylla-product...


MongoDB was called out because of its ease of use. You can create replica sets and shards in seconds. And for many use cases it works great.

Cassandra is one of the best, if not the best, since it's multi-master, but it's a little more complex to set up.


> HA Postgres is an extremely painful proposition.

Does anyone here know how Amazon RDS's HA setup, particularly their multi-AZ option, works? That seems to be a switch that the AWS customer can just turn on. Do they have a proprietary implementation, even for non-Aurora Postgres?


Here is pretty much the most detailed post about how it works you'll be able to find in public: https://aws.amazon.com/blogs/database/amazon-rds-under-the-h...

They basically do replication at the storage layer. Each write has to be acknowledged by both the primary and secondary EBS volume.


They basically have built a proprietary, distributed block store.

And on top of this they have layered PostgreSQL, MySQL, MongoDB, Cassandra etc.

I doubt they will ever release the code for it, since it's very much a competitive advantage.


So if the hardware running the database is suddenly destroyed they try to start another instance really fast?

That seems inferior to having multiple sync replicas ready to take over without having to start a process and replay the WAL.

Also, such an HA block store seems very easy to replicate (I'd guess there would be something open source already), so it's not much of a competitive advantage.


There are two types of failover in RDS:

- Single instance: if the instance dies, they start a new one and mount the same storage. This can take some time, in general under 5 minutes, but I have seen it take 45 minutes, especially for the large instance types.

- Multi-AZ: they run a hot standby with replicated and physically separated storage. Failover takes about a minute. The replication happens at the storage level; every write has to be acknowledged by both availability zones. I'm not sure if Postgres is always running on the standby or if it gets started when failing over.

I guess you could replicate this using drbd.
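A DRBD resource for that kind of storage-level pair looks roughly like this (hostnames, devices, and addresses here are made up; protocol C is DRBD's synchronous mode, where a write is acknowledged only after both nodes have it):

```
resource r0 {
  protocol C;                # synchronous: ack only after both nodes persist the write
  on alpha {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on bravo {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
```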


- using block-level replication allows them to support multiple databases in a common way

- block-level replication can be more reliable in the long run operationally than some types of database replication, especially MySQL back in the day

- block-level replication has more scalable support staff available than hiring DBAs to fix database replication problems

- programming for all the edge cases is something that is a competitive advantage

- no licensing required for it

- you can probably guess which Open Source project it's based on

Source: DBA, worked there.


Just gonna put this here for people who might be wondering. Not too many block-level replication implementations exist.

https://www.linbit.com/drbd/

NB: It is Free software, don't be alarmed by the domain name.


Linstor can do that. I use it in Kubernetes and it is incredible!


That’s how the Aurora DBs work, but I believe standard RDS Postgres uses a more typical method of WAL replication.


For read replicas yes but not for the standby master. They give you access to pg_stat_replication so this is quite easy to verify.


Is there a proposed roadmap for a basic/default HA solution for Postgres? It seems MySQL has this well covered, while Postgres continues to treat it as not a core part of the DB and relies on third parties. (Not suggesting that is necessarily a bad thing.)



