Context switches on Linux are a pretty heavy affair, this is the result of some ...

PaulDavisThe1st · on Sept 15, 2021

A citation is required for this claim.

I know of no evidence that Linux context switching on x86/x86_64 is slower than any other OS, and some suggestions that it is faster (Linux does not save/restore FP state, which Windows (at least at one point) does).

Linux is as capable or more capable of realtime work than any other general purpose OS, and the latency numbers from actual measurement are excellent (when using RT_PREEMPT etc).

What are you referring to?

jacquesm · on Sept 15, 2021

Back in the day when the Linux kernel was first written there was a huge argument between Linus and Andrew Tanenbaum about whether or not the micro or macro kernel road was the superior one.

Tanenbaum argued that a microkernel was lighter, and could switch context faster than a macrokernel (the likes of which UNIX was typically reincarnated with). Linus argued that throughput, not latency is what matters to end users. At that time your typical OS switched tasks 18.5 times per second and Linux did substantially better than that. Case closed, the throughput argument won.

But now, many years later the consequences of that mean that we are switching contexts orders of magnitude slower than we could have because the context contains a lot more information than it strictly speaking has to. My own QnX clone switched 10K / second on a 486/33, and yes, the IPC mechanism meant that throughput suffered but for real time applications with a lot of the hard stuff in userspace context switches are far more important than throughput (and incidentally, also for perceived responsiveness of the OS and apps).

The latency numbers are excellent from the perspective of very forgiving applications, a typical DAW runs with 1K or even larger sample buffers which is acceptable, but for many real time applications that is an eternity and so those are not typically built using Linux as the core but some dedicated RTOS.

edit: I had 100K / second before, this was in error. It's been 30 years ;)

PaulDavisThe1st · on Sept 15, 2021

If you read the article linked to in the article I linked to above (which is fairly out of date, from 2010):

https://blog.tsunanet.net/2010/11/how-long-does-it-take-to-m...

you will find that on Linux a context switch takes about 30 usec. More recent measurements that take account of the effect of the TLB flush put the range at 10-300usec.

That means that in 2010, on Linux, you could reasonably expect to do at least 30k/sec. In 2021, with realistic audio processing workloads, the range is probably 3-50k/sec.

The 486 is a much lower register count than contemporary processors, which accounts for the faster context switching.

Modern audio processing software on Linux can run with 64 sample buffers, not 1k.

This recent paper on RT linux on RPi/Beagleboard single board systems concludes that on some of these relatively "low power" systems, 95% of latencies are in the 40-60usec range, which is completely adequate for the majority of RTOS tasks (but not all).

https://www.mdpi.com/2073-431X/10/5/64/pdf

>"The majority of Linux kernels’ measurements with PREEMPT_RT-patched kernel show the minimum response latency to be below 50 μs, both in user and kernel space. The maximum worst-case response latency (wcrl) reached 147 μs for RPi3 and 160 μs for BBB in user space, and 67 μs and 76 μs, respectively, in kernel space (average values). Most of the latencies are quite below this maximum (90% and 95%, respectively, for user space and kernel space). In general, it seems that maximal latencies do not often cross these values."

[ ... ]

"As an outcome, Linux kernels patched with PREEMPT_RT on such devices have the ability to run in a deterministic way as long as a latency value of about 160 μs, as an upper bound, is an acceptable safety margin. Such results reconfirm the reliability of such COTS devices running Linux with real-time support and extend their life cycle for the running applications."

This slide presentation offers up very similar numbers with graphs, also on ARM systems (I think):

https://elinux.org/image/d/de/Real_Time_Linux_Scheduling_Per...

This article shows cyclictest, a very minimal scheduling latency tester, getting the following results on an x86_64 system:

"The average average latency (Avg) is 4.875 us and the average maximum latency (Max) is 20.750 us, with the Max latency on 23 us. So, the average latency raises by 1.875 us, while the average maximum raises by 1.875 us, with the maximum latency raised by 2 us."

https://bristot.me/demystifying-the-real-time-linux-latency/

They conclude

> "Maximum observed latency values generally range from a few microseconds on single-CPU systems to 250 microseconds on non-uniform memory access systems, which are acceptable values for a vast range of applications with sub-millisecond timing precision requirements. This way, PREEMPT_RT Linux closely fulfills theoretical fully-preemptive system assumptions that consider atomic scheduling operations with negligible overheads."

I'm not sure where you're getting your current info from, but I'm extremely confident that it's wrong. If I had to guess, you have not kept up with the impact of the PREEMPT_RT patchset on the kernel, nor scheduling improvements in general, but I don't know (obviously).

jacquesm · on Sept 15, 2021

The last time that I've been actively involved with the development of real time control of time critical hardware on linux was about 2007 (very high speed stepper motor driven plasmacutter, slow down in a curve and you've ruined the workpiece), so for sure I'm out of the loop but I do have a fairly large Linux audio setup with all of the real time patches installed and clearly if it is possible to run with 64 sample buffers I have not been able to do so on my hardware, 1K really is the minimum before I get - inevitably, unfortunately - dropouts under relatively light load.

It might be worth documenting my setup (reproduced across three different machines, a laptop, an 'all-in-one' and a very beefy desktop), to see what could be improved because that difference is substantial.

jcelerier · on Sept 15, 2021

> but I do have a fairly large Linux audio setup with all of the real time patches installed and clearly if it is possible to run with 64 sample buffers I have not been able to do so on my hardware, 1K really is the minimum before I get - inevitably, unfortunately - dropouts under relatively light load.

that sounds very weird, I don't even run a RT kernel and I have no trouble running at 64 with a fair amount of plug-ins and even 32 samples when I just want some live guitar effects (i7 6900k, RME multiface 2). My only configuration is installing this AUR package: https://archlinux.org/packages/community/any/realtime-privil...

PaulDavisThe1st · on Sept 15, 2021

I've linked to it elsewhere in these comments, but this tries to describe in broad terms why any given x86* computer may not be able to meet your latency goals:

https://manual.ardour.org/setting-up-your-system/the-right-c...

There's a wide variety of reasons, all of which can interact. It's one of the few good arguments for buying Apple hardware, where this is not an issue.

Over the years I've been working on pro-audio/music creation on Linux (22+ years), I've had a couple of systems that could perform reliably at 64 samples/buffer. My current, based on a Ryzen Threadripper 2950X, can get down to 256 but not 128 or 64.

jacquesm · on Sept 15, 2021

Ok, so at a guess then that 64 is an optimum configured set of hardware bought specifically with the goal of reaching that minimum, and for more realistic 'run of the mill' hardware it would be 256 and up?

If someone were to put together a guaranteed low latency config and keep it patched using a custom distro (assuming say 'Ubuntu Studio' would not be up to the task, would there be a market for that? Are there such suppliers? What specifically is different in Apple hardware that it works there?

I read that page earlier, its helpful, but more helpful would be a shopping list that says 'get this: it will work, assuming you install this particular distro'. And after independent verification you could then add alternatives for each slot. For me for instance a big question would if NVidia video cards would break the latency guarantees (their driver is pretty opaque) by keeping interrupts masked for too long in their drivers. If that would be a deal breaker then I'd have to set up a system only for studio use.

PaulDavisThe1st · on Sept 15, 2021

The problem with "shopping lists" is that, at least in the past, it's turned out that companies like e.g. mobo manufacturers change the chipsets in the corners of these devices without even changing the product ID. If I told you a mobo to buy, there's no guarantee that you'll actually get what I was recommending.

Lots of efforts have been made over the years to create "audio PC" companies. Even with the Windows market within their intent, I don't know of a single one that has lasted more than a year or two. How much of that is a market problem and how much of it is a problem of actually sourcing reliable components, I don't know. I do know that when large scale mixing console companies find mobos that work for them, they buy dozens of them, just to ensure they don't get switched out by the manufacturer.

Apple stuff works because Apple sort of has to care about this workflow functioning correctly. There's no magic beyond careful selection of components and then rigorously sourcing them for the duration of a given Apple product's lifetime.

I have no actual evidence on the video adapter front, but my limited experience would keep me aware from NVidia if I was trying to build a low latency audio workstation. Back in the olden days (say, 2002), there were companies like Matrox who deliberately made video adapters that were "2D only, targetting audio professionals". These cards were properly engineered to get the hell off the bus ASAP, and didn't have of the 3D capabilities that audio professionals (while wearing that hat) really don't tend to need.