
This is "appeal to authority" fallacy incarnate.

Google, Amazon, etc. are likely happy to pay the cost because Linux really is "good enough", and the benefits of Linux over FreeBSD are otherwise quite considerable.

Google in particular seems blissfully happy to throw hardware at problems, since hardware is (for them especially) fundamentally extremely cheap.

Even multi-percent gains in throughput are unnecessary for most applications, and Linux is decent enough on latency if you avoid complex iptables/nftables rulesets and avoid conntrack like the plague.
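On the conntrack point, a minimal sketch (the table name and port 8080 are purely illustrative, assuming a reasonably recent nftables): you can exempt a hot service from connection tracking entirely with notrack rules in the raw-priority hooks:

    # Hypothetical ruleset: skip conntrack for traffic on port 8080.
    # "notrack" must run before the conntrack hooks, hence priority raw.
    table inet perf {
        chain pre {
            type filter hook prerouting priority raw;
            tcp dport 8080 notrack
        }
        chain out {
            type filter hook output priority raw;
            tcp sport 8080 notrack
        }
    }

Load it with "nft -f" and those flows skip the conntrack lookups entirely; just make sure no NAT or stateful rule elsewhere depends on their connection state.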

As u/jeffbee says anyway, most of the larger tech companies these days are using userland networking and bypass the kernel almost completely for networking.



I know they bypass the kernel, but my point still stands: most of the servers on the internet run Linux. That's a fact, so more money, time, and manpower have been invested in that OS than in any other.


Your point is that popularity means that it will improve.

This is true, to a point.

Counterpoint: Windows Desktop Experience.

EDIT: that comment was glib; let me make a proper counterpoint.

Common areas are in reality some of the least maintained; think of meet-me rooms or central fibre hubs in major cities: they are expensive and subject to a lot of the whims of the major provider.

Crucially, despite large amounts of investment, the underlying architecture or infrastructure remains, even if the entire fabric of the area changes around it. Most providers using these kinds of common areas do everything they can to avoid touching the area itself, especially as after a while it becomes very difficult to navigate and politically charged.

Fundamentally, the architecture of Linux's network stack really is "good enough", which is almost worse than it sounds, since "good enough" means there's no reason to look there. There's an old adage that "worse is better": if something is truly broken, people will put effort into fixing it.

Linux's networking stack is fine; it's just not quite as good an architecture as FreeBSD's. FreeBSD's gets a lot less attention, but it's fundamentally a cleaner implementation, and it's easier to get much more out of it.

You will find the same argument ad infinitum on other subjects, such as epoll vs. IOCP vs. kqueue (epoll was abysmally terrible, though, and ended up being superseded by io_uring, but even that took over a decade).
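To give a flavour of the io_uring model, here's a minimal receive sketch using liburing (sockfd is assumed to be an already-connected socket; link with -luring):

    /* Minimal io_uring receive: operations are queued into a
       shared submission ring and reaped from a completion ring,
       so many I/Os can be batched behind a single syscall. */
    #include <liburing.h>

    int recv_once(int sockfd, char *buf, unsigned len) {
        struct io_uring ring;
        if (io_uring_queue_init(64, &ring, 0) < 0)
            return -1;

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_recv(sqe, sockfd, buf, len, 0);
        io_uring_submit(&ring);

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        int n = cqe->res;              /* bytes received, or -errno */
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        return n;
    }

The contrast with epoll is that readiness polling plus a syscall per fd is replaced with batched submission/completion queues, which is much closer in spirit to kqueue's batched changelists (and to completion models like IOCP).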


Yes, things improve when we're talking about multiple billions of dollars of infra cost.

Linux is not some "random, on-the-side" feature that is merely good enough.


To start with: it's not that much infra cost.

Especially since you don't even know what you're attempting to optimise for.

Latency? Linux's p99 is fine; nobody is going to care that a request took 300μs longer. Even in aggregate across a huge fleet of machines, waiting an extra 3ms is totally, totally fine.

Throughput? You'll most likely bottleneck on something else anyway: getting a storage array to hydrate at line rate for 100 Gbps is difficult, and you want authentication, chunk distribution, and metadata operations on top of that anyway, right?

You're forgetting that solving that throughput issue with raw hardware likely costs an additional couple of million dollars per year, which, in TCO terms, is a couple of developers (a fully loaded senior engineer at these companies easily runs to high six or seven figures a year).

Engineering effort to replace the foundation of an OS? Probably an order of magnitude more. It definitely carries significantly more risk, plus the potential for political backlash from upending some other company's weird workflow.

Hardware isn't so expensive really.

Of course, you could just bypass the kernel with much less effort and avoid all of this shit entirely.
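For what that bypass can look like in practice, here's a hypothetical minimal DPDK receive loop (port 0, the pool and ring sizes are illustrative, and a real deployment also needs NIC binding and EAL arguments):

    /* Sketch of a kernel-bypass RX loop with DPDK: the NIC queue
       is polled directly from userspace and the kernel never
       touches these packets. */
    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST 32

    int main(int argc, char **argv) {
        if (rte_eal_init(argc, argv) < 0)
            return 1;

        struct rte_mempool *pool = rte_pktmbuf_pool_create(
            "mbufs", 8191, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
            rte_socket_id());

        uint16_t port = 0;                     /* illustrative */
        struct rte_eth_conf conf = {0};
        rte_eth_dev_configure(port, 1, 1, &conf);
        rte_eth_rx_queue_setup(port, 0, 1024,
            rte_eth_dev_socket_id(port), NULL, pool);
        rte_eth_tx_queue_setup(port, 0, 1024,
            rte_eth_dev_socket_id(port), NULL);
        rte_eth_dev_start(port);

        for (;;) {
            struct rte_mbuf *bufs[BURST];
            uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST);
            for (uint16_t i = 0; i < n; i++)
                rte_pktmbuf_free(bufs[i]);     /* process, then free */
        }
    }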


Do you know for a fact that Google primarily uses userland networking, or does that just seem accurate to you?


Google makes heavy use of userspace networking. I was there roughly a decade ago. At least at that time, a major factor in the choice of userspace over kernel networking was time to deployment. Services like the ones described above were built in the monorepo and could be deployed in seconds at the touch of a button.

Meanwhile, Google had a building full of people maintaining the Google kernel (e.g., maintaining rejected or unsubmitted patches that were critical for business reasons), and it took many months to do a kernel release.


Yes. I don't think anyone is disputing that Google does significant userspace networking things. But the premise of this thread is that "ordinary" (i.e., non-network-infrastructure --- SDN, load balancer, routing) applications, things that would normally just get BSD sockets, are based on userspace networking. That seems not to be the case.
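(For clarity, the "ordinary" path being discussed is just this, straight through the kernel stack --- a trivial sketch, with the address and port purely illustrative:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int dial(void) {
        /* A plain BSD socket: every packet traverses the kernel stack. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        struct sockaddr_in addr = {
            .sin_family = AF_INET,
            .sin_port   = htons(80),
            .sin_addr   = { htonl(INADDR_LOOPBACK) },
        };
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

)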


I can't honestly answer that with the NDA I signed.

However, there is some public information on some components that has been shared in this thread, which allows you to draw your own conclusions.


Yes, the one link shared says essentially the opposite thing.


You may have read it wrong.


Did I? How?


For one, by assuming that the work done primarily for microkernels/appliances is the absolute limit of userspace networking at Google, and that similar work would not go into a hypervisor (hypervisors being universally treated as a vSwitch in almost all virtual environments the world over).

And by making that assumption when there are many public examples of Google doing this in other areas, such as gVisor and netstack?


If you have information about other userspace networking projects at Google, I'd love to read it, but the Snap paper repeatedly suggests that the userspace networking characteristics of the design are distinctive. Certainly, most networking at Google isn't netstack. Have you done much with netstack? It is many things, but ultra-high-performance isn't one of them.


Userspace networking takes different forms depending on the use case.

Which is one of the arguments for doing it that way instead of using general-purpose networking.

I haven't the time or inclination to find anything public on this, nor am I really interested in convincing you. Ask a former Googler.


OK. I did. They said "no, it's not the case that networking at Google is predominately user-mode". (They also said "it depends on what you mean by most"). Do you have more you want me to relay to them? Did you work on this stuff at Google?

Per the Snap thread above: if you're building a router or a load balancer or some other bit of network infrastructure, it's not unlikely that there's userland IP involved. But if you're shipping a normal program on, like, Borg or whatever, it's kernel networking.


I worked as a Google partner for some specialised projects within AAA online gaming.

I continue in a similar position today, and thus my NDA is still in full effect, which limits what I can say where there's nothing public.

I have not worked for Google, just very closely.


Oh. Then, unless a Googler jumps in here and says I'm wrong: no, ordinary applications at Google are not as a rule built on userspace networking. That's not my opinion (though: it was my prior, having done a bunch of userspace networking stuff), it's the result of asking Google people about it.

Maybe it's all changed in the last year! But then: that makes all of this irrelevant to the thread, about FreeBSD vs. Linux network stack performance.


Based on this, I understand why you're talking like this: I think you have made an assumption/interpretation here and argued against that assumption, because nobody here (I believe) has claimed that Google only uses userspace networking; merely that Google makes use of userspace networking where it's "appropriate" (i.e., where FreeBSD would have had an advantage). Which is backed up by basically everything in this thread.

Which is why I said you "probably read it wrong".

Google is much happier to throw hardware at the problem in most cases; only when it really matters, and they would otherwise have had to rearchitect the kernel to improve the situation any further, do they break out the userspace networking.

The point I was driving at was that it's more common than you think.

Your base assertion that it's ubiquitous is very obviously false, because Chromebooks are pretty common inside Google offices, and those run stock ChromeOS (except in the offices that are developing ChromeOS).


Let me make my point clearly: Google depends on the Linux kernel stack as much as the top-of-the-thread comment suggests they do, and the things they're doing in user mode, they would also be doing in user mode on FreeBSD.

That's all I'm here to say.

As these things go, in the course of making your argument, you made a falsifiable and, I believe, flatly incorrect claim:

Most of the larger tech companies these days are using userland networking and bypass the kernel almost completely for networking

At least in Google's case, this isn't true. People doing custom network stack stuff totally do bypass the kernel stack (sometimes with usermode stacks, and sometimes with eBPF, and sometimes with offload). But the way you phrased this, you implied pretty directly that networking writ large was usermode at Google, and while I entertained the possibility that this might be true, when I investigated, it wasn't (unsurprisingly, given how annoying user mode networking is to interact directly with, as opposed to in middlebox applications).


OK, this is very hostile and could not possibly be considered a charitable interpretation of what I said; in fact, I'd say it borders on trying to pick an argument where there isn't one. I did not "flat out lie", and I detest the insinuation. I honestly expect better of you.

To answer: "Most of the larger tech companies these days are using userland networking and bypass the kernel almost completely for networking"

You could read "almost completely" as "almost across the whole company", which would be a weird way to read it, but you seem to have read it this way.

I intended it to mean: when they bypass the kernel, it is a near-complete bypass.

There actually is still a network connection going through the kernel (the host itself will still be connected, of course), which is the inverse of what you seem to have taken away: usermode networking is used even less than completely, even on a single node.

EDIT: In fact, I stated multiple times in my post that "Google is mostly happy to just throw hardware at this", which you seemed to just... ignore? Google is absolutely happy to throw hardware at issues until they can't anymore or the gains are too enormous to pass up. I thought I was extremely clear about that.

Your point about userland networking on FreeBSD is just a nonsense one to make -- and not the point we were discussing anyway. Like asking "if it rained beer, would we all get drunk?", it's completely hypothetical and not grounded in any subjective reality or objective truth. You have absolutely no way of knowing whether FreeBSD could do those things. That the architecture permits it is shown somewhat by FreeBSD's use at Netflix, which, as commented elsewhere in this thread, achieves close to 0.8 Tb/s (800 Gb/s) of data transfer per server; but knowing what Google would have done with FreeBSD would require seeing into alternative realities.

I don't know where you work, but as far as I know: nobody has managed to perfect that technology yet.


That is exactly how I took your comment.

This is a weird cursed thread that started out with a pretty silly† claim about FreeBSD vs. Linux network stack performance (neither of us started it; it's a standard platform war argument). Someone made a comment that the hyperscalers all depend on the Linux kernel stack, to a far greater degree than they do on FreeBSD. That's a true statement; at this point, you and I have both agreed on it.

When that point was pressed earlier, you and another comment brought up kernel bypass (usermode networking, specifically) as a way of dismissing hyperscaler Linux dependencies. But that's not really a valid argument. Hyperscalers do kernel bypass stuff! Of course they do! But they're doing it for the things that you'd formerly have bought dedicated network equipment for, and in no case are they doing it in a situation where deploying FreeBSD and using the FreeBSD stack would be a valid alternative.

The disconnect between us is that I'm still talking about the subject that the thread was originally about --- whether Google using the Linux stack is a valid point backing up its fitness for purpose. I think it pretty clearly is a valid point. I think the usermode networking stuff is interesting, but is a sideshow.

"Silly" because these kinds of claims never, ever get resolved, and just bring out each side's cheering section, not because I have low opinions of FreeBSD's kernel stack --- I came up on that stack and find it easier to follow than Linux's, though I rather doubt the claim that it has a decisive performance advantage.


Thank you for clarifying, and I apologise as I also became quite hostile.

I agree that we agree on many points; but I think where we diverge (perhaps fundamentally) is on the base assumption that "because Google uses it, it must be the best, because even if it wasn't, Google would make it so" (at least, this is my interpretation of the GP's comment).

I have little doubt that the low-hanging... mid-hanging, and perhaps even most of the high-hanging fruit has been well and truly plucked when it comes to Linux throughput, at the behest of the large tech companies, because a few percent of improvement translates to a lot at their scale. -- However, I am reminded of an allegory given in a talk (that I can't find) about bowling.

In the talk, the speaker describes how they got "really good" at bowling with completely the wrong technique; it worked for them, up to a point past which they could not improve no matter what they did. They had to go back to basics, learn proper technique, and become much worse before they could overtake their previous scores with the bad technique... but after that point there were further improvements to be had.

My argument is merely this: an architectural rewrite of the Linux kernel to be scalable in the way FreeBSD's is would be very punishing for too many people; and additionally, the economics are not favourable when, if you do reach a point where you cannot scale because of the kernel, you can just break out into userland -- then simply accept the adequate-but-not-insane performance everywhere else, where it isn't needed anyway.

So, to summarise my points:

* That a big company uses something does not mean it is perfect in all areas.

* That Linux has a lot of attention on it does not necessarily mean it has the most potential: though I don't doubt that the majority of its potential has been reached.

* Diminishing returns means once it's "good enough" people will try to get performance elsewhere if they need it.

* Rewriting the network stack in Linux completely would likely be harmful to many, and subtly so. I haven't seen people moving towards this idea either; it feels as though it could be political as well as technical.

* Hyperscalers will often trade performance for convenience: regardless, CPU time is much cheaper for them than it is for us.


I think it's totally legit to say that hyperscaler Linux adoption isn't dispositive of the Linux stack's performance advantage over FreeBSD. I basically treat this FreeBSD vs. Linux stack debate as unknowable (it probably isn't, but it's liberating to decide I'm not going to resolve it to anyone's satisfaction). So I'm not here to say "Google uses Linux, ergo it's faster than FreeBSD"; that adoption is a useful observation, but that's all it is.



