When I did low-latency work, everyone was offloading TCP to dedicated hardware.

They would shut down every single process on the server and bind the trading app to the CPUs during trading hours to ensure nothing interrupted it.

Electrons travel slower than light, so they would rent server space at the exchange; that gave them direct access to the exchange network, so their orders didn't have to traverse miles of cable.

They would multicast their traffic, and there were separate systems to receive the multicast, log packets, and write orders to databases. There were redundant trading servers that monitored the multicast traffic so that if they had to take over they would know all of the open positions and orders.
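
A minimal sketch of what one of those multicast consumers might look like on Linux - the group address, port, and handling logic here are placeholders, not anything from a real feed:

    // Minimal multicast listener sketch (Linux, IPv4). The group address,
    // port, and message handling are placeholders, not a real feed.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main() {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        // Allow several consumers (logger, DB writer, hot standby) to bind
        // the same group/port on the same host.
        int reuse = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(31337);              // hypothetical feed port
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

        // Join the multicast group on the default interface.
        ip_mreq mreq{};
        mreq.imr_multiaddr.s_addr = inet_addr("239.1.1.1");  // hypothetical group
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

        char buf[2048];
        for (;;) {
            ssize_t n = recv(fd, buf, sizeof(buf), 0);
            if (n > 0) {
                // A logger would write the packet, a DB writer would persist
                // the order, a standby would update its view of open positions.
            }
        }
        close(fd);
    }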

They did all of their testing against simulators - never against live data or even the exchange test systems. They had a petabyte of exchange data they could play back to verify their code worked and to see whether tweaks to the algorithm yielded better or worse trading decisions over time.
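
A toy version of that playback idea, assuming a hypothetical on-disk record format of (nanosecond timestamp, length, payload) and a made-up Strategy interface:

    // Toy replay harness: streams recorded messages into a strategy callback.
    // The record format and the Strategy interface are hypothetical.
    #include <cstdint>
    #include <cstdio>
    #include <fstream>
    #include <vector>

    struct Strategy {
        void on_market_data(uint64_t ts_ns, const std::vector<char>& payload) {
            // Trading logic under test would go here.
        }
    };

    int main() {
        std::ifstream in("capture.bin", std::ios::binary);  // hypothetical capture file
        Strategy strat;

        uint64_t ts_ns = 0;
        uint32_t len = 0;
        while (in.read(reinterpret_cast<char*>(&ts_ns), sizeof(ts_ns)) &&
               in.read(reinterpret_cast<char*>(&len), sizeof(len))) {
            std::vector<char> payload(len);
            if (!in.read(payload.data(), len)) break;
            // Replay as fast as possible; a fuller simulator could pace by
            // ts_ns to reproduce the real inter-arrival times.
            strat.on_market_data(ts_ns, payload);
        }
        std::printf("replay complete\n");
    }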

A solid understanding of the underlying hardware was required: you would make sure network interfaces were arranged so they wouldn't cause contention on the PCI bus. You usually had separate interfaces for market data and orders.
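
A rough sketch of keeping the two flows on separate interfaces with SO_BINDTODEVICE on Linux - the interface names are made up, and the option needs CAP_NET_RAW or root:

    // Sketch: pin the market-data socket and the order-entry socket to
    // different NICs with SO_BINDTODEVICE. Interface names are hypothetical.
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <cstring>

    int open_on_interface(int type, const char* ifname) {
        int fd = socket(AF_INET, type, 0);
        setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, strlen(ifname));
        return fd;
    }

    int main() {
        int md_fd    = open_on_interface(SOCK_DGRAM,  "eth_md");      // market data feed
        int order_fd = open_on_interface(SOCK_STREAM, "eth_orders");  // order entry session
        // ... join multicast on md_fd, connect order_fd to the gateway ...
        (void)md_fd; (void)order_fd;
    }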

All changes were done after exchange hours, once trades had been submitted to the back office. The IT department was responsible for reimbursing traders for any losses caused by IT activity - there were shady traders who would look for IT problems and bank them so they could blame a bad trade on one at some future time.



You don't need to shut down processes on the server. All you have to do is isolate CPU cores and move your workloads onto those cores. That's been a common practice in low latency networking for decades.
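
A minimal pinning sketch, assuming core 3 (hypothetical) was reserved at boot, e.g. via the isolcpus= parameter or a cpuset:

    // Sketch: pin the current thread to an isolated core. Assumes core 3
    // (hypothetical) was kept free of other work at boot.
    #include <pthread.h>
    #include <sched.h>
    #include <cstdio>

    int main() {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);  // hypothetical isolated core

        // Pin this thread; other processes stay wherever the scheduler puts
        // them, but they can't land on core 3 if it was isolated at boot.
        if (int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set)) {
            std::fprintf(stderr, "setaffinity failed: %d\n", err);
            return 1;
        }
        // ... hot path runs here, undisturbed by other workloads ...
    }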


I'm not in HFT, but I wouldn't expect that to be enough.

Not only do you want to isolate cores, you also want to isolate any cache shared between cores. You do not want your critical data evicted from the cache because a different core sharing it has decided it needs that space. Which of course starts with knowing exactly which CPU you are using, since different ones have different cache layouts.
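
On recent Intel parts one way to do that is to partition the shared L3 with Cache Allocation Technology through the Linux resctrl filesystem; a rough sketch, with illustrative paths and bitmask, assuming resctrl is mounted and the CPU supports it:

    // Rough sketch of carving out a dedicated slice of the shared L3 with
    // Linux resctrl (Intel CAT). The group name, cache-way bitmask, and
    // domain are illustrative; check resctrl docs for your topology.
    #include <fstream>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
        mkdir("/sys/fs/resctrl/trading", 0755);          // new control group

        // Reserve a few L3 ways on cache domain 0 for this group (example mask).
        std::ofstream("/sys/fs/resctrl/trading/schemata") << "L3:0=0x00f\n";

        // Move this process into the group so its cache lines aren't evicted
        // by workloads left in the default group.
        std::ofstream("/sys/fs/resctrl/trading/tasks") << getpid() << "\n";
    }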

You also don't want those other cores using up precious main memory or IO bandwidth at the moment you need it.


Just to add to your good points: since there's always a faster cache for your working set to not fit in, you can use memory streaming instructions to reduce cache pollution. Depending on the algorithm, increasing cache hit rates can give ridiculous speed-ups.
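
For example, a large buffer you won't read again soon can be written with non-temporal stores so it doesn't displace the hot working set - an x86-64 sketch with an illustrative buffer size:

    // Sketch: copy a buffer with non-temporal (streaming) stores so the
    // destination doesn't displace hot data in the caches. x86-64 only.
    #include <immintrin.h>
    #include <cstdint>
    #include <vector>

    void stream_copy(uint64_t* dst, const uint64_t* src, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            // MOVNTI: the store bypasses the cache hierarchy via write combining.
            _mm_stream_si64(reinterpret_cast<long long*>(dst + i),
                            static_cast<long long>(src[i]));
        }
        _mm_sfence();  // make the streaming stores globally visible before reuse
    }

    int main() {
        std::vector<uint64_t> src(1 << 20, 42), dst(1 << 20);
        stream_copy(dst.data(), src.data(), src.size());
    }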


Correct. I was just pointing out to OP that shutting down processes isn't necessary; isolation is how you'd do it.


I’ve worked at a few firms and never heard of an IT budget for f-ups. Sounds like a toxic work environment.


Same. That sounds like a way to make that relationship between front office and back office as toxic and unproductive as possible.


Depends on how it's set up. You take a chunk of profits as well if things go well.


It's just business, no? Would you rather trade with a service that's liable for their mistakes or one that isn't?


Any good books/resources you can recommend to learn about the above architectures/techniques?


Some years ago I wrote a gist about HFT/HPC systems patterns (versus OP's C++ patterns) applied to dockerized Redis. Might be dated, but it touches on core isolation/pinning, NUMA/cgroups, and kernel bypass, with some links to go deeper. Nowadays I do it with Kubernetes and Nomad facilities, but the same basic ideas:

https://gist.github.com/neomantra/3c9b89887d19be6fa5708bf401...


Nice; reminds me of the Red Hat Performance Tuning and Real Time Low Latency Optimization guides.


A few episodes of Signals and Threads, a podcast from Jane Street, go into parts of it.


Thank you.


A great insightful comment, thank you!



