
Why would you use UUIDv7 rather than UUIDv4 though?


UUIDv4 is much more scattered (i.e., uniformly distributed), so new rows land at random positions in an index instead of being appended near the end, which heavily degrades indexing performance in databases.
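To make the ordering difference concrete, here is a minimal sketch that builds a UUIDv7-style value by hand (48-bit Unix millisecond timestamp, then version, variant, and random bits). The helper name `uuid7_sketch` and the fixed base timestamp are assumptions for illustration; it is not a replacement for a vetted library.

```python
import os
import uuid


def uuid7_sketch(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-style value from a millisecond timestamp (illustrative only)."""
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # 48-bit timestamp in the top bits
    value |= 0x7 << 76                             # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


base_ms = 1_700_000_000_000  # hypothetical starting timestamp
v7_ids = [uuid7_sketch(base_ms + i) for i in range(100)]
v4_ids = [uuid.uuid4() for _ in range(100)]

# Keys generated at increasing timestamps are already in sorted order,
# so an ordered index only ever grows at its tail.
print("v7 in insertion order == sorted:", v7_ids == sorted(v7_ids))
# Random keys are almost never in sorted order, so inserts hit arbitrary index pages.
print("v4 in insertion order == sorted:", v4_ids == sorted(v4_ids))
```

The timestamp prefix is what keeps consecutive inserts adjacent; how much that matters in practice still depends on the database and workload.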


But that mainly affects writes; reads are not affected much.

And if your database is 99% reads 1% writes, the difference probably doesn't really matter.

And tons of database indexes operate on randomly distributed data -- looking up email addresses or all sorts of things. So in many cases this is not an optimization worth caring about.


This depends on the database and should not be written as gospel.


In which databases does it not degrade performance when used as an indexed field?


UUIDv7 seems popular for Postgres performance improvements, but it causes issues with databases like Spanner.

https://medium.com/google-cloud/understanding-uuidv7-and-its...


Lots of distributed, NoSQL databases work (or partially work) this way too (e.g., HBase rowkey, Accumulo row ID, Cassandra clustering key, DynamoDB sort key). They partition the data into shards based upon key ranges and then spread those shards across as many servers as possible. UUIDv7 is (by design) temporally clustered. Since many workloads place far more value on recent data, and all recent data is likely to end up in the same shard, you bottleneck on the throughput of a single server or, even with replication, a small number of servers.
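A rough simulation of that hotspot, assuming 16 fixed shards that each own an equal slice of the key space (real systems split ranges dynamically, so treat this only as a sketch). `time_prefixed_key` is a hypothetical stand-in for UUIDv7: it has the same timestamp-first layout but skips the version and variant bits.

```python
import os
import uuid
from collections import Counter

NUM_SHARDS = 16  # assumed fixed shard count, for illustration only


def range_shard(key: uuid.UUID) -> int:
    """Assign a shard by key range: the top 4 bits pick one of 16 equal slices."""
    return key.int >> 124


def time_prefixed_key(unix_ms: int) -> uuid.UUID:
    """Timestamp-first key (UUIDv7-like layout, not spec-compliant)."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return uuid.UUID(int=((unix_ms & ((1 << 48) - 1)) << 80) | rand)


base_ms = 1_700_000_000_000  # hypothetical "now"
recent_time_keys = [time_prefixed_key(base_ms + i) for i in range(1000)]
random_keys = [uuid.uuid4() for _ in range(1000)]

time_shards = Counter(range_shard(k) for k in recent_time_keys)
random_shards = Counter(range_shard(k) for k in random_keys)

print("shards hit by recent time-ordered keys:", len(time_shards))
print("shards hit by random keys:", len(random_shards))
```

All keys written within the same second share the same leading bits, so they fall into one key range, while random keys spread across every range.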


FWIW it looks like Cassandra doesn't belong on this list, and DynamoDB only with qualifications.

Though Cassandra is more like quasi-SQL than NoSQL, the bigger issue is that the clustering key is never used for sharding. Cassandra (today) always puts all data with the same partition key on the same shard, and the partition key is hashed, so there's no situation in which UUIDv7 would perform differently (better or worse) than UUIDv4.

In DynamoDB, it is possible for sort keys to be used for sharding, but only if there is a large number of distinct sort keys for the same partition key. Generally, you would be putting a UUID in the partition key and not the sort key, so UUIDv7 vs UUIDv4 typically has no impact on DB performance.


i think the standard recommendation is to do range partitioning on the hash of the key, aka hash range partitioning (i know yugabyte supports this out of the box, i'd be surprised if others don't). this prevents the situation of all recent uuids ending up on the same shard.
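a minimal sketch of the idea, assuming 16 fixed shards and SHA-256 as the hash (real databases use their own hash functions and split ranges dynamically). `time_prefixed_key` is a hypothetical stand-in for a UUIDv7-like, timestamp-first key.

```python
import hashlib
import os
import uuid
from collections import Counter

NUM_SHARDS = 16  # assumed fixed shard count, for illustration only


def hash_range_shard(key: uuid.UUID) -> int:
    """Range-partition on the hash of the key rather than on the key itself."""
    digest = hashlib.sha256(key.bytes).digest()
    return digest[0] >> 4  # top 4 bits of the hash pick one of 16 ranges


def time_prefixed_key(unix_ms: int) -> uuid.UUID:
    """Timestamp-first key (UUIDv7-like layout, not spec-compliant)."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return uuid.UUID(int=((unix_ms & ((1 << 48) - 1)) << 80) | rand)


base_ms = 1_700_000_000_000  # hypothetical "now"
recent_keys = [time_prefixed_key(base_ms + i) for i in range(1000)]
shard_counts = Counter(hash_range_shard(k) for k in recent_keys)

print("shards hit by recent time-ordered keys:", len(shard_counts))
print("busiest shard holds:", max(shard_counts.values()), "of", len(recent_keys))
```

the trade-off is that scans in key order now have to touch every shard, since hashing throws away the time ordering at the partition level.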


Indeed. In fact, Cassandra and DynamoDB have both hash keys and range keys; I've edited my comment to be more specific.



