I love ClickHouse and use it heavily. One area where I wish CH could do better was as a primary data store.
Basically, every proposed use case for CH is based on event-sourcing some data, from Postgres or logs or whatever. The implication is that the data either already exists as a "source of truth" in some primary ACID database, or at least there is an archive of raw data files, or maybe (as with logs and metrics) the risk of data loss isn't that big of a deal.
But what if you actually want to store the data in a single place? CH doesn't really offer peace of mind here. Its entire architecture is based on best-effort management of data. One of ClickHouse's best features is that it can store the data in cloud storage, to allow separation of data and compute at an incredible price point. But it can lose data.
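To be concrete about the cloud-storage feature I mean: ClickHouse lets you declare an S3-backed disk and point a storage policy at it, so table data lives in object storage while compute stays on cheap nodes. A rough sketch of that config (bucket URL and credentials are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- S3-backed disk; endpoint and credentials are placeholders -->
            <s3_disk>
                <type>s3</type>
                <endpoint>https://my-bucket.s3.amazonaws.com/clickhouse/</endpoint>
                <access_key_id>REPLACE_ME</access_key_id>
                <secret_access_key>REPLACE_ME</secret_access_key>
            </s3_disk>
        </disks>
        <policies>
            <!-- Policy that tables reference via SETTINGS storage_policy = 's3_main' -->
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>
```

Great price point, but note it does nothing to change the durability semantics I'm complaining about.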
So if you have, say, 30TB of data that is very columnar and cannot be efficiently queried in Postgres, you cannot simply store it in CH alone. You'd have to pay quite a lot of $ to have it safely guarded by (let's say) Postgres, even if that copy isn't being used as the main source of queries. If you have heavy ingest rates, you're going to have to pay for more expensive SSD storage, too.
There are columnar databases that are ACID and focus on consistency, like TimescaleDB. But they tend to be cloud databases. For example, you can self-host Timescale, but you don't get access to the tiered cloud storage layer. So when self-hosting, you need to run expensive SSDs again, with no separation of compute and storage.
If CH had a better consistency story, or maybe a clustering story that ensures redundancy, I would be really inclined to use it as a primary store.