Dragonfly Is Production Ready (and we raised $21M) (dragonflydb.io)
82 points by ksec on March 21, 2023 | 66 comments


Cool initiative. Just watch out for bullshit. Redis reply from some months ago to those benchmarks: https://redis.com/blog/redis-architecture-13-years-later/

(Originally posted 3 months ago on https://news.ycombinator.com/item?id=34231033)


Yep. Dragonfly compares a single-threaded Redis with a multi-threaded Dragonfly. It’s an extremely misleading benchmark.


How is it misleading when the whole point is that Redis can only be single threaded†? That's why Dragonfly claims to scale better. If anything, it's the Redis rebuttal that comes across as misleading; the posted announcement is very up front that Dragonfly's value proposition is that you get vertical scaling for free without the additional ops overhead of a Redis cluster, which is very much not free in terms of maintenance and opportunity cost.

†: Redis 6 added threads, but AFAIK this is only for handling connection I/O. Actual database access is still single threaded. The only way I'm aware of to scale Redis is via clustering.


> How is it misleading

It's misleading because the fair comparison would be Redis Cluster vs Dragonfly. There's no speed-up if the Redis user isn't fully saturating a single core. The real question is: why is it only 25x faster on a 64-vCPU machine? Why isn't it 64x? Does this mean it's 60% slower when the request volume is below the needs of a single-threaded Redis?
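For reference, the 25x and 64-vCPU figures quoted above work out as follows. This is only a back-of-the-envelope sketch; `parallel_efficiency` is a name made up for illustration, and the numbers are the ones cited in this thread, not new measurements.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved for a given speedup."""
    return speedup / cores


speedup = 25  # throughput ratio claimed in the announcement
cores = 64    # vCPU count cited in the comment above

eff = parallel_efficiency(speedup, cores)
print(f"per-core efficiency: {eff:.1%}")      # 39.1%
print(f"shortfall vs linear: {1 - eff:.1%}")  # 60.9%
```

This only quantifies the gap from linear scaling. It says nothing about where the gap comes from, or about behavior at request volumes a single core could handle.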

> Dragonfly's value proposition

Dragonfly has zero value proposition other than a ticking-time-bomb of pricing fuckery when they're forced to yield a return on that $21M investment.


It compares a process with a listening port with another process with a listening port. To give another example: nobody compares minio with a bunch of disks that you can write to separately, and probably more efficiently.


Hmm, Redis Labs is setting up a cluster of 40 Redis processes on the same instance. It would be extremely difficult for anyone else to do that with Redis OSS.


But

"For the last 15 years, Redis has been the primary technology for developers looking to provide a real-time experience to their users. Over this period, the amount of data the average application uses has increased dramatically, as has the available hardware to serve that data. Readily available cloud instances today have at least 10X the CPUs and 100X more memory than their equivalent counterparts had 15 years ago. However, the single-threaded design of Redis has not evolved to meet modern data demands nor to take full advantage of modern hardware."

That's not what they are saying is wrong with Redis. Is Redis really 'antique tech'? Arguably, concurrent processing with a scale-up-only approach is a poor fit for "modern hardware".

So yes, you are correct: Redis from github requires knowledge and (your) code to make n instances work together (whether on the same node or not). But to claim that this is the case for "anyone else [but Redis Labs]" is questionable.

From a certain architectural camp, the pin-to-core, process-in-parallel approach is optimal for [scaling on] "modern hardware". Salvatore can correct me on this, but I don't recall that being a consideration in the early days; it just turned out to be a good choice. Some of the Redis APIs, however, require participation of the whole dataset ensemble (any kind of total-order semantics over the partitioned set), which is what is "difficult" to do effectively.

So basically any startup that can do that should theoretically be able to squeeze more performance from their SaaS infrastructure than by running a Dragonfly type of architecture. A bonus, as pointed out by Redis Labs, is that lots of parallel k/v processes can bust out of the max-jumbo-box should you ever need that to happen (for 'reliability', for example).


They chose those numbers because they wanted a fair comparison with their benchmark instance of AWS c6gn.16xlarge. Says so in the 4th paragraph.


I think using the word "misleading" is also "misleading". Dragonfly hides complexity. Docker hid the complexity of managing cgroups and deploying applications. S3 hid the complexity of writing to separate disks. But you do not call S3 or minio misleading because they store stuff similarly to how a disk stores files. Dragonfly hides the complexity of managing a bunch of processes on the same instance, and the outcome of this is a cheaper production stack. What do you think has higher effective memory capacity on c6gn.16xlarge: a single process using all the memory, or 40 processes which you need to provision independently?


It's misleading because, practically speaking, the type of people who are after the performance you advertise are running clusters to begin with. So what you are selling is just a simplified stack that lets you not have to manage one more "system". That's fair, but you could mention that? Or at least acknowledge that if you repeat these tests with Redis Cluster the results will be wildly different and you won't have those crazy looking charts.

For example, it's like me claiming that my new Python web framework is X times faster than Flask because it comes bundled with uWSGI. Yes, technically mine is faster, but it's not a fair comparison.


What's odd is that they probably saw the reply but they still chose to re-iterate their misleading claims rather than not mentioning anything.


License: https://github.com/dragonflydb/dragonfly/blob/main/LICENSE.m...

It is source available. Generally can't use it to create a competing product... but also means you cannot combine it with any of the popular open source licenses.

Decide for yourself what that means to you.


I think maybe I'm just small-potatoes, but the only limitations or constraints I've ever run into with Redis are: 1. memory utilization 2. deployment/orchestration 3. bugs I created for myself related to using caches

What are the use cases that max out Redis speed/throughput?


Dragonfly is better with: 1. Memory utilization (read the announcement, they mention it). 2. Deployment/orchestration: your initial threshold that forces you to scale horizontally just went up by an order of magnitude. In fact, for many use cases you will never need to scale horizontally with Dragonfly. 3. Dragonfly also provides a better experience when working with the system. Just a week ago one of the community contributors submitted a PR that introduces automatic recognition of hot keys: https://github.com/dragonflydb/dragonfly/pull/951 (this feature is not ready for production use yet but we will get there). It also has built-in open-metrics support, built-in CPU profiler support, fully asynchronous I/O that allows answering INFO commands even under load, etc.


It really depends on the type of application, but a common one is getting a large spike in traffic beyond the norm (front page on HN, flash sale, etc.). I do think #1 and #2 that you mentioned are more common constraints, and both are also addressed in Dragonfly (much more efficient memory utilization, and the ability to scale vertically, which negates the need for complex orchestration).


Congrats on the funding and getting production ready, it's good that KeyDB (and Redis) get some competition.

https://docs.keydb.dev/

Open question, how does Dragonfly differ from KeyDB?


KeyDB implements multithreading with spin-locks that protect a global shared data structure.

Dragonfly is built upon a shared-nothing architecture where each thread manages its own slice of data, hence no need for classical locks and no contention under high load. It still provides atomicity guarantees but allows multiple transactions to progress independently as long as they do not need exclusive access to the same keys. So basically, different approaches to the same promise: scale. Also different trade-offs. The shared-nothing approach has less contention and a more flexible transaction framework but exhibits a slightly higher 50th percentile latency (on the order of 30 usec).
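To make the shared-nothing idea concrete, here is a toy sketch of the general pattern: each worker thread owns a private dictionary, and requests are routed to the owner by hashing the key. This is not Dragonfly's actual implementation. The `Shard` and `ShardedStore` names and the CRC32 routing are assumptions made for illustration, and Python's `queue.Queue` uses locks internally, so only the data slices are lock-free here.

```python
import queue
import threading
import zlib


class Shard(threading.Thread):
    """Worker thread that exclusively owns one slice of the keyspace."""

    def __init__(self):
        super().__init__(daemon=True)
        self.data = {}              # touched only by this thread
        self.inbox = queue.Queue()  # requests arrive as messages

    def run(self):
        while True:
            op, key, value, reply = self.inbox.get()
            if op == "SET":
                self.data[key] = value
                reply.put("OK")
            elif op == "GET":
                reply.put(self.data.get(key))


class ShardedStore:
    """Routes each key to the single shard that owns it."""

    def __init__(self, num_shards=4):
        self.shards = [Shard() for _ in range(num_shards)]
        for shard in self.shards:
            shard.start()

    def _owner(self, key):
        # Hypothetical routing rule: a stable hash of the key picks the shard.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def _call(self, op, key, value=None):
        reply = queue.Queue(maxsize=1)
        self._owner(key).inbox.put((op, key, value, reply))
        return reply.get()  # block until the owning thread answers

    def set(self, key, value):
        return self._call("SET", key, value)

    def get(self, key):
        return self._call("GET", key)


store = ShardedStore(num_shards=4)
store.set("user:1", "alice")
print(store.get("user:1"))
```

The message hop to the owning thread is also a plausible source of the small extra median latency mentioned above, though that is a guess and not something the comment states.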


KeyDB is a fork of Redis, whereas Dragonfly introduces a brand-new architecture, crafted from the ground up using a shared-nothing, multi-threaded design. It implements both the Redis and memcached APIs.


Does it support modules from the Redis Stack, like RediSearch?

https://redis.io/docs/stack/


Not yet but it supports JSON API


Thanks!


> Dragonfly throughput is 25X higher than Redis for both GET and SET operations.

That's unbelievable. In both senses of that phrase.


Putting aside what seems to be an amazing new product, the blog post showing the results is also downplaying what (to me at least) seems to be a 25% or more increase in latency.


I wouldn't take their latency numbers too seriously, since their measure isn't relative to the throughput. It's not always obvious, but the latency of high performance systems is tightly coupled with throughput. That latency is the sum of the server-side execution time (for Redis, a couple of microseconds), the network hop (probably a couple of hundred microseconds), and the congestion on the server (the time it takes the server to actually get around to processing your request, since it probably needs to handle other requests before yours). The congestion is directly tied to the throughput.

The most useful measurement I've seen is to pick a latency target and evaluate how many QPS you can send to the server while still meeting that target. That gives a fairly simple dimension to compare on.
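A rough way to see the coupling is a textbook M/M/1 queue, where mean time in the server is 1 / (service_rate - arrival_rate). This is a simplifying assumption and not a model of Redis or Dragonfly; real comparisons would use percentile targets measured under load. The function names and the example numbers below are made up for illustration.

```python
def mean_latency(service_rate: float, arrival_rate: float, network_s: float) -> float:
    """Mean request latency in seconds: network hop plus M/M/1 time in system."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return network_s + 1.0 / (service_rate - arrival_rate)


def max_qps_for_target(service_rate: float, target_s: float, network_s: float) -> float:
    """Highest arrival rate whose mean latency still meets the target."""
    budget = target_s - network_s  # time left for queueing plus execution
    if budget <= 0:
        return 0.0
    return max(0.0, service_rate - 1.0 / budget)


# Illustrative numbers only: 5 usec per request, 200 usec network hop, 1 ms target.
service_rate = 200_000.0
network_s = 200e-6
for qps in (50_000, 150_000, 199_000):
    print(qps, mean_latency(service_rate, qps, network_s))
print(max_qps_for_target(service_rate, 1e-3, network_s))
```

Latency stays nearly flat until the arrival rate approaches the service rate and then climbs steeply, which is why a latency figure quoted without the throughput it was measured at is hard to compare.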


Dragonfly employee here... where are you seeing the 25% increase in latency? That's not the case, so I want to make sure the post is not confusing.


> it's worth noting that if we were to reduce Dragonfly's throughput to match that of Redis, Dragonfly would have much lower P99 latency than Redis

It sounds like this is a function of higher throughput in the benchmark for Dragonfly.


I think they get most of the speedup from just being multi-threaded. A fair comparison would use a Redis cluster.


Seems to be Redis/Memcached API-compatible. Can I use it as a drop-in replacement?


From the web page: Dragonfly is an in-memory data store built for modern application workloads. It is fully compatible with the Redis and Memcached APIs, requiring no code changes to adopt.


Yes, you can. In fact, it supports existing frameworks and clients like Redisson, Sidekiq and many others out of the box.
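For context, "API-compatible" for a Redis replacement generally means speaking the same wire protocol (RESP), so an existing client only needs a different host and port. Below is a minimal sketch of how a command is framed on the wire as a RESP array of bulk strings. `encode_command` is a hypothetical helper written for this example, not part of any client library, and whether a given command is supported is something to check against Dragonfly's own documentation.

```python
def encode_command(*args: str) -> bytes:
    """Frame a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]  # array header: number of elements
    for arg in args:
        data = arg.encode("utf-8")
        # each element: $<byte length>\r\n<payload>\r\n
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


print(encode_command("SET", "greeting", "hello"))
```

Any server that parses these bytes and answers in the same format looks like Redis to the client, which is what makes a drop-in swap possible in principle.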


> ...the most performant in-memory datastore for cloud workloads...

Oh, ok, so def. not about Dragonfly BSD.


TLDR: Dragonfly throughput is 25X faster than Redis for both GET and SET operations.

https://dragonflydb.io/blog/scaling-performance-redis-vs-dra...


When I saw the title, I thought it was a post about DragonFly BSD. But apparently it is about a closed source in-memory database meant as a competitor to Redis.


I honestly would rather see a BSD get 21m in funding versus another database startup that won't exist in 4 years.


Honestly, I don't think giving 21m to DragonflyBSD would be a good idea, not sure how the project could adapt back to a shoestring budget after the money runs out.

I would not be against giving them a few 100Ks so that they could get one or two full time developers however.

(Joke, but also food for thought: alternatively, just "safely" invest the 21M. At a 3% interest rate, that's 630k per year, enough for ~2 to 10 developers depending on the geo. An OSS project can do a lot of things with this kind of budget.)
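The arithmetic checks out. A quick sketch using only the figures from the comment above; the per-developer amounts are simply what the "2 to 10 developers" range implies, and `annual_income` is a name made up for illustration.

```python
def annual_income(principal: int, rate_percent: int) -> int:
    """Yearly interest on the principal, using integer arithmetic."""
    return principal * rate_percent // 100


income = annual_income(21_000_000, 3)
print(income)  # 630000

# Implied yearly budget per developer at both ends of the quoted range.
for developers in (2, 10):
    print(developers, income // developers)
```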


It does not have to be instead of our startup, really.


I seriously thought a bunch of BSD experts got together and raised funds for making 2023 the "Year of the BSD".

On a different note, I wish new projects respected the history of other projects when choosing a name. I am sure they could have found a better name for this DB.


I made an honest mistake. While I saw DragonflyBSD, I didn't realize its prominence within the BSD community. I assumed that since our project only focuses on Linux, having a similar name in a different "namespace" wasn't a significant issue.


Assuming you are the project lead, I would implore you to change the name. There are noble precedents in our industry, e.g. Firebird -> Firefox [1].

[1]: https://en.wikipedia.org/wiki/Firefox_early_version_history#...


>DragonFly BSD

You too, huh?


There are dozens of us. Dozens!


I think DragonflyBSD has been production ready for some time.



The license is a closed-source, proprietary, source-available license.

It is not open to use as you please.


Not open either. :)


TBH, I love BSL licenses. You can use the software as you want for free, except as a competitor, and there is a high chance of a sustainable business model. (What benefit do you expect if the company goes bankrupt?) Unpopular opinion or not, you have free and easy access to the code, so it is open source in the literal sense of the words. Just not in the sense that you can steal their business like AWS does. Feel free to start a similar project, invest your time and money, and make it available under whatever license you want.


It doesn't matter how much you love it. It is still not open source.


It is open for me to use as I please. And I don't want to destroy their business. I can understand your ideological drive, but in reality it doesn't matter until you behave unethically and steal their intellectual property. Do you want that?


If you are sure that what you please is and always will be in line with what their company pleases, then I suppose so. Ideology has nothing to do with it. Nor does "unethically and steal their intellectual property" have anything to do with it: they can choose whatever license they want for their intellectual property, and I can ethically choose not to use it. Unless your idea of "ethics" is to force me to use their intellectual property and agree to their license?


> It is open for me to use as I please.

Sure, but that's not what "open source" means.

I mean I can go around calling C a functional programming language because it functions and I can build functioning programs with it... but that's not what the word means in anyone's discourse besides yours. I mean I support your freedom of speech, but also mine in saying your usage is disingenuous.


What's not open about it? I see the source, I see instructions to build from source, I see a license that seems to say I can copy and make derivative works out of it. Can you try to make a more substantive comment?


It's a weird license that seems to say "you can use this as long as you don't compete with us", with an automatic switchover to Apache 2.0 in 5 years. Definitely better than closed source, but probably not Open Source by the OSI definition.


Restricting Usage or Distribution makes it de-facto not OSS.

It's actually a slightly different form of an old debate. I'm thinking in particular about the Crockford license (the MIT-like one with "The Software shall be used for Good, not Evil." bit). It was determined to be non-free quite a while back due to such restrictions.

That being said, it is hard to be a commercially successful software vendor with an OSS model (RethinkDB comes to mind).

I do understand why BSL exists, but it feels to me like an unsatisfactory compromise.


BSL license is more company-friendly than AGPL.


Being company-friendly is a separate point from being open source.


The license comes with restrictions on what you can use your derivative works for - e.g. not creating an in-memory datastore service. It's essentially an Apache 2 with a "Also AWS can't just steal it and sell it as a service when it gets huge"

Is it open-source? Well, depending on your definition probably not. Is it a fair license? Yeah I'd think so.


It is open-source, no question.

What you might mean is that it isn't free/libre as in FOSS.


That’s the Stallman kind of point of view. https://en.m.wikipedia.org/wiki/Open_source section "Open" versus "free" versus "free and open"

But to a lot of us, “open source” has a specific meaning as well.

> Open source doesn’t just mean access to the source code.

https://opensource.org/osd/


From my point of view, for example, GPL is not "open" at all. And yet it is on the list. In my opinion, BSL is even more "open" than GPL. Feel free to have a different view.


Open Source as defined by the OSI, which means BSL is not considered Open Source.

Open Source as defined by the majority of programmers on the Internet, which means either GPL, MIT, or Apache and their derivatives.

Open Source as defined by HN: depending on when you joined HN, it could be anything from MIT and BSD only all the way to AGPL only.

Open Source as defined by the layman: anything whose source I can see is considered Open. Open Source in its literal sense.


DragonflyDB cofounder here. I am not shy about our choice of license. Like with software design, everything is about trade-offs. Folks here voiced reasons why we chose BSL. I am sure you are perfectly aware of all this.

I do not know you personally, but I noticed that you posted the link to the announcement. I am guessing you are passionate about technology and innovation. Dragonfly is much more than the licensing choice we made. I wish HN discussions here were about how fibers work in Dragonfly, how SSD tiering is gonna be implemented, how we provide atomicity for Lua scripts while running many of them in parallel, etc. Btw, Dragonfly relies on an io-engine called helio (roughly equivalent to tokio) that has been developed by me and open sourced under Apache 2.0.


While technical discussion about those details would be interesting, HN is also a strong entrepreneurial community and your license choice has a big impact on whether or not a business would choose to depend on your product. I would not choose to build on a BSL licensed foundation for my product because of real business concerns with vendor lock-in. I would also not choose to contribute to a BSL licensed code base because the CLA means I am not free to use the code base on equal grounds with other contributors (DragonflyDB Ltd in this case).

Because of these two things, I didn't spend a lot of time looking into the technical details and therefore can't really say much on them. By all means it sounds like a very interesting piece of technology- but the license makes it useless to me.


Please do not express your opinion as the opinion of the majority. Instead of discussing license choices, create your project and open source it as you like. Do not tell others how to build their business, especially on HN.


I never expressed my opinion as that of the majority. I expressed my opinion as my opinion, and I can express whatever opinion I want to and discuss whatever I want to on Hacker News, so long as Y Combinator (which owns Hacker News) is okay with it.

Are you a representative of Y Combinator/Hacker News? If you are, I will be glad to comply with your request.


The reply wasn't directed at Dragonfly, but was simply an answer to the parent. I think you got a lot of stick on HN. [1] In case you are wondering, I am much closer to the layman in my definition of Open Source, so I am actually extremely supportive of BSL. A perfect balance of business needs and open source. But HN has gotten a lot more ideological than it used to be. So please keep up the good work.

[1] Part of the reason why I submitted Dragonfly: interesting technology should get more coverage on HN, and not be ignored simply because of some overzealous ideological reason.


It is not open to use as you please. For example, I cannot pay the supplier of my choosing to provide services using the code.



