It's not just better performance on latency benchmarks; it likely improves throughput as well, because writes get batched together.
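To make the batching point concrete, here's a minimal group-commit sketch (purely illustrative, not any particular database's implementation): many writers enqueue records, a single flusher appends them, issues one fsync, and then acks the whole batch, so the fsync cost is amortized over however many writes piled up.

```python
import os
import queue
import threading

class GroupCommitLog:
    """Illustrative group commit: one fsync covers a whole batch of writes."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.pending = queue.Queue()
        threading.Thread(target=self._flusher, daemon=True).start()

    def write(self, record: bytes) -> None:
        done = threading.Event()
        self.pending.put((record, done))
        done.wait()  # returns only once the batch containing this record is fsynced

    def _flusher(self):
        while True:
            batch = [self.pending.get()]         # block for the first record
            while not self.pending.empty():      # then drain whatever piled up
                batch.append(self.pending.get_nowait())
            os.write(self.fd, b"".join(rec for rec, _ in batch))
            os.fsync(self.fd)                    # one fsync for the whole batch
            for _, done in batch:
                done.set()                       # ack every writer in the batch at once
```

Each caller still pays at most one fsync's worth of latency, but the log does far fewer fsyncs than writes, which is where the throughput win comes from.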
Many applications do not require true durability and it is likely that many applications benefit from lazy fsync. Whether it should be the default is a lot more questionable though.
It's like using a non-cryptographically secure RNG: if you don't know enough to go looking for the fsync flag yourself, you're unlikely to know enough to evaluate the impact of weaker durability on your application.
I also think fsync before acking writes is a better default.
That aside, if you were to choose async fsync for the sake of batching writes, their default value surprises me.
2 minutes seems like an eternity. Wouldn't you get very good batching, and therefore throughput, even at something like 2 seconds? (Rough numbers below.)
Still not safe, but safer.
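Rough numbers behind that hunch, with made-up rates just to show the diminishing returns:

```python
# Made-up numbers: 5,000 writes/s and a 10 ms fsync cost.
writes_per_sec, fsync_ms = 5_000, 10.0
for window_s in (0.0, 2.0, 120.0):     # fsync per write, every 2 s, every 2 min
    batch = max(1, int(writes_per_sec * window_s))
    print(f"window {window_s:>5}s: ~{batch} writes per fsync, "
          f"{fsync_ms / batch:.4f} ms of fsync cost per write")
```

Almost all of the amortization is already there at 2 seconds; the remaining 118 seconds mostly just widen the data-loss window.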
Maybe what's confusing here is "true durability", but most people want to know that when data is committed they can reason about the durability of that data using something like a basic MTBF formula; that is, your durability is "X computers out of Y total have to fail at the same time, at which point N data loss occurs". They expect that as Y goes up, X goes up too.
When your system doesn't do things like fsync, you can't do that at all. X is 1. That is not what people expect.
Most people probably don't require X == Y, but they may have requirements that X > 1.
I think you're still not getting my point. Yes, a rare event of data loss may not be a big deal. What is a big deal is being able to reason about how rare that event is. When you have durable Raft, you can reason using straightforward MTBF calculations. When you don't, you can keep adding nodes, but you can't use MTBF anymore because a single failure is sufficient to cause data loss.
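A toy illustration of that contrast, assuming independent node failures with a made-up per-window failure probability p; it's only meant to show how the two regimes scale:

```python
from math import comb

def p_at_least(x, y, p):
    """Probability that at least x of y nodes fail in the same window."""
    return sum(comb(y, k) * p**k * (1 - p)**(y - k) for k in range(x, y + 1))

p, Y = 1e-3, 5                  # made-up per-window failure probability, 5 nodes
print(p_at_least(3, Y, p))      # durable quorum: ~1e-8; X grows with Y, so adding nodes helps
print(p_at_least(1, Y, p))      # X collapses to 1: ~5e-3, and it gets worse as Y grows
```

Adding nodes only improves the first regime, which is exactly the point about being able to reason with MTBF.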