When I did low-latency work, everyone was offloading TCP to dedicated hardware.

They would shut down every single process on the server and bind the trading app to the CPUs during trading hours to ensure nothing interrupted it.

Electrons travel slower than light, so they would rent server space at the exchange; that gave them direct access to the exchange network, so their orders didn't have to traverse miles of cable.

They would multicast their traffic, and there were separate systems to receive the multicast, log packets, and write orders to databases. There were redundant trading servers that monitored the multicast traffic so that if they had to take over they would know all of the open positions and orders.
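
A minimal sketch of what one of those multicast consumers might look like on Linux - the group address, port, and handling logic here are placeholders, not anything from a real feed:

    // Minimal multicast listener sketch (Linux, IPv4). The group address,
    // port, and message handling are placeholders, not a real feed.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main() {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        // Allow several consumers (logger, DB writer, hot standby) to bind
        // the same group/port on the same host.
        int reuse = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(31337);              // hypothetical feed port
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

        // Join the multicast group on the default interface.
        ip_mreq mreq{};
        mreq.imr_multiaddr.s_addr = inet_addr("239.1.1.1");  // hypothetical group
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

        char buf[2048];
        for (;;) {
            ssize_t n = recv(fd, buf, sizeof(buf), 0);
            if (n > 0) {
                // A logger would write the packet, a DB writer would persist
                // the order, a standby would update its view of open positions.
            }
        }
        close(fd);
    }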

They did all of their testing against simulators - never against live data or even the exchange test systems. They had a petabyte of exchange data they could play back to verify their code worked and to see whether tweaks to the algorithm yielded better or worse trading decisions over time.
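
A toy version of that playback idea, assuming a hypothetical on-disk record format of (nanosecond timestamp, length, payload) and a made-up Strategy interface:

    // Toy replay harness: streams recorded messages into a strategy callback.
    // The record format and the Strategy interface are hypothetical.
    #include <cstdint>
    #include <cstdio>
    #include <fstream>
    #include <vector>

    struct Strategy {
        void on_market_data(uint64_t ts_ns, const std::vector<char>& payload) {
            // Trading logic under test would go here.
        }
    };

    int main() {
        std::ifstream in("capture.bin", std::ios::binary);  // hypothetical capture file
        Strategy strat;

        uint64_t ts_ns = 0;
        uint32_t len = 0;
        while (in.read(reinterpret_cast<char*>(&ts_ns), sizeof(ts_ns)) &&
               in.read(reinterpret_cast<char*>(&len), sizeof(len))) {
            std::vector<char> payload(len);
            if (!in.read(payload.data(), len)) break;
            // Replay as fast as possible; a fuller simulator could pace by
            // ts_ns to reproduce the real inter-arrival times.
            strat.on_market_data(ts_ns, payload);
        }
        std::printf("replay complete\n");
    }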

A solid understanding of the underlying hardware was required: you would make sure network interfaces were arranged so they wouldn't cause contention on the PCI bus. You usually had separate interfaces for market data and orders.
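
A rough sketch of keeping the two flows on separate interfaces with SO_BINDTODEVICE on Linux - the interface names are made up, and the option needs CAP_NET_RAW or root:

    // Sketch: pin the market-data socket and the order-entry socket to
    // different NICs with SO_BINDTODEVICE. Interface names are hypothetical.
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <cstring>

    int open_on_interface(int type, const char* ifname) {
        int fd = socket(AF_INET, type, 0);
        setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, strlen(ifname));
        return fd;
    }

    int main() {
        int md_fd    = open_on_interface(SOCK_DGRAM,  "eth_md");      // market data feed
        int order_fd = open_on_interface(SOCK_STREAM, "eth_orders");  // order entry session
        // ... join multicast on md_fd, connect order_fd to the gateway ...
        (void)md_fd; (void)order_fd;
    }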

All changes were done after exchange hours, once trades had been submitted to the back office. The IT department was responsible for reimbursing traders for any losses caused by IT activity - there were shady traders who would look for IT problems and bank them so they could blame a bad trade on one at some future time.



You don't need to shut down processes on the server. All you have to do is isolate CPU cores and move your workloads onto those cores. That's been a common practice in low latency networking for decades.
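
A minimal pinning sketch, assuming core 3 (hypothetical) was reserved at boot, e.g. via the isolcpus= parameter or a cpuset:

    // Sketch: pin the current thread to an isolated core. Assumes core 3
    // (hypothetical) was kept free of other work at boot.
    #include <pthread.h>
    #include <sched.h>
    #include <cstdio>

    int main() {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);  // hypothetical isolated core

        // Pin this thread; other processes stay wherever the scheduler puts
        // them, but they can't land on core 3 if it was isolated at boot.
        if (int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set)) {
            std::fprintf(stderr, "setaffinity failed: %d\n", err);
            return 1;
        }
        // ... hot path runs here, undisturbed by other workloads ...
    }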


I'm not in HFT, but I wouldn't expect that to be enough.

Not only do you want to isolate cores, you also want to isolate any cache shared between cores. You do not want your critical data evicted from the cache because a different core sharing it has decided it needs that space. Which of course starts with knowing exactly which CPU you are using, since different ones have different cache layouts.
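
On recent Intel parts one way to do that is to partition the shared L3 with Cache Allocation Technology through the Linux resctrl filesystem; a rough sketch, with illustrative paths and bitmask, assuming resctrl is mounted and the CPU supports it:

    // Rough sketch of carving out a dedicated slice of the shared L3 with
    // Linux resctrl (Intel CAT). The group name, cache-way bitmask, and
    // domain are illustrative; check resctrl docs for your topology.
    #include <fstream>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
        mkdir("/sys/fs/resctrl/trading", 0755);          // new control group

        // Reserve a few L3 ways on cache domain 0 for this group (example mask).
        std::ofstream("/sys/fs/resctrl/trading/schemata") << "L3:0=0x00f\n";

        // Move this process into the group so its cache lines aren't evicted
        // by workloads left in the default group.
        std::ofstream("/sys/fs/resctrl/trading/tasks") << getpid() << "\n";
    }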

You also don't want those other cores using up precious main memory or IO bandwidth at the moment you need it.


Just to add to your good points: since there's always a faster cache for your working set to not fit in, you can use memory streaming instructions to reduce cache pollution. Depending on the algorithm, increasing cache hit rates can give ridiculous speed-ups.
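
For example, a large buffer you won't read again soon can be written with non-temporal stores so it doesn't displace the hot working set - an x86-64 sketch with an illustrative buffer size:

    // Sketch: copy a buffer with non-temporal (streaming) stores so the
    // destination doesn't displace hot data in the caches. x86-64 only.
    #include <immintrin.h>
    #include <cstdint>
    #include <vector>

    void stream_copy(uint64_t* dst, const uint64_t* src, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            // MOVNTI: the store bypasses the cache hierarchy via write combining.
            _mm_stream_si64(reinterpret_cast<long long*>(dst + i),
                            static_cast<long long>(src[i]));
        }
        _mm_sfence();  // make the streaming stores globally visible before reuse
    }

    int main() {
        std::vector<uint64_t> src(1 << 20, 42), dst(1 << 20);
        stream_copy(dst.data(), src.data(), src.size());
    }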


Correct. I was just pointing out to OP that shutting down processes isn't necessary; isolation is how you'd do it.


I’ve worked at a few firms and never heard of an IT budget for f-ups. Sounds like a toxic work environment.


Same. That sounds like a way to make that relationship between front office and back office as toxic and unproductive as possible.


Depends on how it's set up. You take a chunk of profits as well if things go well.


It's just business, no? Would you rather trade with a service that's liable for their mistakes or one that isn't?


Any good books/resources you can recommend to learn about the above architectures/techniques?


Some years ago I wrote a gist about HFT/HPC systems patterns (versus OP's C++ patterns) applied to dockerized Redis. Might be dated, but it touches on core isolation/pinning, NUMA/cgroups, and kernel bypass, with some links to go deeper. Nowadays I do it with Kubernetes and Nomad facilities, but the same basic ideas:

https://gist.github.com/neomantra/3c9b89887d19be6fa5708bf401...


Nice; reminds me of the Red Hat Performance Tuning and Real Time Low Latency Optimization guides.


A few episodes of Signals and Threads, a podcast from Jane Street, go into parts of it.


Thank you.


A great insightful comment, thank you!



