There are circumstances where you really don't know the shape of the data, especially when prototyping for proof of concept purposes, but usually not understanding the shape of your data is something that you should fix up-front as it indicates you don't actually understand the problem you are trying to solve.

More often than not it is worth spending some time thinking and planning to work out at least the core requirements in that area, to save yourself a lot of refactoring (or throwing away and restarting) later, and potentially hitting bugs in production that a relational DB with well-defined constraints could have saved you from while still in dev.

Programming is brilliant. Many weeks of it sometimes save you whole hours of up-front design work.



> usually not understanding the shape of your data is something that you should fix up-front as it indicates you don't actually understand the problem you are trying to solve.

This is a good point and probably correct often enough, but I also think not understanding the entire problem you are solving is not only common, but in fact necessary to most early-stage velocity. There is a need to iterate and adapt frequently, sometimes as part of your go-to-market strategy, in order to fully understand the problem space.

> a relational DB with well-defined constraints could have saved you from while still in dev

This presumes that systems built on top of non-RDBMS are incapable of enforcing similar constraints. This has not been my experience personally. But it's possible I don't understand your meaning of constraints in this context. I assumed it to mean, for instance, something like schemas, which are fairly common now in the nosql world. Curious what other constraints you were referencing?
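For concreteness, the sort of schema enforcement I had in mind looks roughly like this; a hedged sketch assuming a local MongoDB and the pymongo driver (the collection and field names are made up for illustration):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["app"]

    # A validator is a JSON Schema document attached to the collection;
    # writes that violate it are rejected, much like a table schema would reject them.
    db.create_collection("users", validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["email", "created_at"],
            "properties": {
                "email": {"bsonType": "string"},
                "created_at": {"bsonType": "date"},
            },
        }
    })

    db.users.insert_one({"email": 42})  # raises WriteError: wrong type, missing created_at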


> There is a need to iterate and adapt frequently, sometimes as part of your go-to-market strategy, in order to fully understand the problem space.

If you're pivoting so hard that your SQL schema breaks, how is a schemaless system going to help you? You'll still have to either throw out your old data (easy in both cases) or figure out a way to map old records onto new semantics (hard in both cases).
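To make the second option concrete, here is a toy sketch of that kind of backfill (stdlib sqlite3 so it runs anywhere; the table and columns are hypothetical, e.g. splitting a single name field in two):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    con.execute("ALTER TABLE users ADD COLUMN last_name TEXT")

    # The hard part is deciding what each old value *means* under the new
    # semantics, not whether the store enforced a schema.
    rows = con.execute("SELECT id, full_name FROM users").fetchall()
    for user_id, full_name in rows:
        first, _, last = (full_name or "").partition(" ")
        con.execute("UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
                    (first, last, user_id))
    con.commit()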

I agree with GP that this is a central problem to solve, not something to figure out _after_ you write software. Build your house on stone.


>If you're pivoting so hard that your SQL schema breaks, how is a schemaless system going to help you? You'll still have to either throw out your old data (easy in both cases) or figure out a way to map old records onto new semantics (hard in both cases).

I agree with your comment that it's a central problem to solve, and that the choice between throwing out data and mapping old records onto new semantics is one both stacks have to make. I don't agree that it's always possible to solve entirely up front though.

In my experience, it has been less about whether the storage engine is schemaless or not; even many modern nosql stacks now ship with schemas (e.g. MongoDB). I think the differentiation I make between these platforms is mostly around APIs: expressive, flexible semantics that (in theory) let you move quickly.

As an aside, I also think the differences between all these systems are largely inconsequential for most software engineers, and the choice is often made on qualitative/subjective grounds like dev ergonomics. At scale there are certainly implementation details that begin to disproportionately impact the way you write software, sometimes prohibitively so, but most folks aren't in that territory.


Admittedly, my experience with MongoDB and Cassandra has gained some rust over the last decade, but what makes you say such nosql databases have expressive APIs? Compared to PostgreSQL they have minuscule query languages, and it is very hard, if at all possible, to express constraints. And constraints, sometimes self-imposed and sometimes not, are what make projects successful, even startups. Many startups try to find this one little niche they can dominate. That is a self-imposed constraint. People tend to think freedom makes them creative, productive, and inventive, when in fact the opposite is true. By the opposite of freedom I mean carefully selected constraints, not oppression.


> schemas

The confusion there is due to the fact that non-RDBMS (particularly when referred to as noSQL) can mean several different things.

In this context I was replying to a comment about not knowing the shape of your data, which implies that person was thinking about solutions specifically described as schemaless. That is what a lot of people assume (in my experience) if you say non-relational or noSQL.

Those are the sort of constraints I meant: primary & unique keys, and foreign keys for enforcing referential integrity, plus other validity rules enforced at the storage level. There are times when you can't enforce these things immediately with good performance (significantly distributed data stores that need concurrent distributed writes, for instance, though the need for those is less common for most developers than the big data hype salespeople might have you believe), so then you have to consider letting go of them (I would advise considering that very carefully).
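As a rough, runnable sketch of what I mean (stdlib sqlite3 here just so it runs anywhere, though Postgres DDL would look almost identical; the tables are invented):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # sqlite needs FK enforcement switched on
    con.executescript("""
        CREATE TABLE customers (
            id    INTEGER PRIMARY KEY,        -- primary key
            email TEXT NOT NULL UNIQUE        -- unique key
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),  -- referential integrity
            total_cents INTEGER NOT NULL CHECK (total_cents >= 0)   -- other validity rules
        );
    """)
    con.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")

    # Both of these are rejected at the storage level, in dev, before bad data lands:
    # con.execute("INSERT INTO orders (customer_id, total_cents) VALUES (999, 100)")  # no such customer
    # con.execute("INSERT INTO orders (customer_id, total_cents) VALUES (1, -5)")     # negative total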


> This presumes that systems built on top of non-RDBMS are incapable of enforcing similar constraints.

Are you kidding? They never can.

The entire point of ditching the relational model is discarding data constraints and normalization.


Never? Many NoSQL stores now offer parity in many of the feature areas that were historically the sole domain of RDBMSs.

Mongo has always had semantics to support normalized data modeling, has schema support, and has had distributed multi-document ACID transactions since 2019 [1]. You don't have to use those features as they're opt-in, but they're there.

I know that full parity between the two isn't feasible, but to say they never can is a mischaracterization.

[1] Small addendum on this: Jepsen highlighted some issues with their implementation of snapshot isolation and some legitimate gripes about poor default config settings and a wonky API (you need to specify snapshot read concern on all queries in conjunction with majority write concern, which isn't highlighted in some docs). But with the right config, their only throat clearing was over whether snapshot isolation counts as "full ACID", which would apply to postgres as well given they use the same model.
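For reference, the opt-in path with that config looks roughly like this in pymongo (a sketch only; it assumes a replica set, which transactions require, and the collection and documents are made up):

    from pymongo import MongoClient
    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    db = client["shop"]

    with client.start_session() as session:
        # Snapshot reads plus majority writes, per the Jepsen guidance above.
        with session.start_transaction(read_concern=ReadConcern("snapshot"),
                                       write_concern=WriteConcern("majority")):
            db.accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}}, session=session)
            db.accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}}, session=session)
        # commits when the start_transaction block exits without an exception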


What is the point of using MongoDB with multi-document ACID transactions? Enabling durability in MongoDB is usually costly enough that you can't find a performance benefit compared to Postgres. With JSONB support, I dare say PostgreSQL can express anything that MongoDB can express with its data model and query language. That leaves scalability as the only possible advantage of MongoDB over PostgreSQL. And the scalability of MongoDB is rather limited compared to, e.g., Cassandra.
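To illustrate the JSONB point, a rough sketch with psycopg2 (it assumes a reachable Postgres; the table and document shape are invented):

    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, doc jsonb NOT NULL)")
    cur.execute("INSERT INTO events (doc) VALUES (%s)",
                [Json({"type": "signup", "user": {"email": "a@example.com"}})])

    # Containment query, roughly what a Mongo find({"type": "signup"}) would do;
    # a GIN index on doc would make it fast.
    cur.execute("SELECT doc FROM events WHERE doc @> %s::jsonb", [Json({"type": "signup"})])
    print(cur.fetchall())
    conn.commit()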

And I would never trust a database that has such a bad track record regarding durability as MongoDB, although I admit that PostgreSQL had theoretical problems there as well in the past.


I actually agree with you on the point about multi-document tx's; I wouldn't choose mongo solely for that feature. It's nice to have for the niche use case in your nosql workload where it's beneficial. But the point I was originally making was that nosql stacks are not fundamentally incompatible with most of the features or safety constraints offered by an RDBMS.

> And I would never trust a database that has such a bad track record regarding durability as MongoDB

I can't comment on your own experience obviously, but I've been using mongo since 2011 in high throughput distributed systems and it's been mostly great (one of my current systems averages ~38 million docs per minute, operating currently at 5 9s of uptime).

Definitely some pain points initially in the transition to WiredTiger, but that largely was a positive move for the stack as a whole. Durability fires have not plagued my experience thus far, not to say they won't in the future of course.

As you noted, Postgres has had its own headaches as well. Finding out that their literature claimed transactions were serializable when they were in fact not could be considered a mark against their record. But much like mongo, they have been quick to address implementation bugs as they are found.


> I can't comment on your own experience obviously, but I've been using mongo since 2011 in high throughput distributed systems and it's been mostly great (one of my current systems averages ~38 million docs per minute, operating currently at 5 9s of uptime).

> Definitely some pain points initially in the transition to WiredTiger, but that largely was a positive move for the stack as a whole. Durability fires have not plagued my experience thus far, not to say they won't in the future of course.

Good to read that some are actually using MongoDB to their benefit. I have indeed encountered problems with durability in the wild, and nothing I would like to repeat. But as always, for a specific use case the answer is: it depends. As general advice on what to start a new project with, I would select PostgreSQL 10 times out of 10, if a database server is actually required.



