
I've also heard the opinion that modern compilers are better at generating optimized code than someone writing assembly by hand. Not sure how true it is, but considering the unfathomable complexity of modern CPUs, it does feel believable.


As a low-level performance guy I trust the compiler nowadays, especially with deep instruction pipelines. The compiler is beatable - a lot of its decisions are heuristic - but it takes a lot of work to beat it.


Last time I looked, Intel CPUs had something like 1,700 instructions, and every generation expands the ISA even further. I doubt that compilers use more than a small fraction of it, especially considering that binaries are often expected to run on a wide range of older CPUs. I know that there are intrinsic functions which provide access to some of the powerful but special-purpose instructions. It is unrealistic to expect the compiler to make effective use of all the fancy instructions you paid for with your latest hardware upgrade.
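
As a sketch of what an intrinsic looks like (assuming GCC or Clang on x86-64; `_mm_popcnt_u64` from `<nmmintrin.h>` is the real intrinsic that maps to the single POPCNT instruction):

    /* popcount.c - compile with: gcc -O2 -mpopcnt popcount.c
       _mm_popcnt_u64 compiles to one POPCNT instruction; a plain C
       bit-counting loop may not, depending on target flags. */
    #include <stdio.h>
    #include <stdint.h>
    #include <nmmintrin.h>

    int main(void) {
        uint64_t x = 0xF0F0F0F0F0F0F0F0ULL;
        printf("%d bits set\n", (int)_mm_popcnt_u64(x));
        return 0;
    }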


> It is unrealistic to expect the compiler to make effective use of all the fancy instructions you paid for with your latest hardware upgrade.

I'd add "yet" - we ruminated that the reason new machines with similar shapes (quad core to quad core of a newer generation) don't immediately seem like as large a jump in performance as they ought is that it takes time for people other than Intel to update their compilers to make effective use of the new instructions. icc is obviously going to generate faster-executing code on new Intel hardware more quickly (in the sense of how long after the CPU is released, not `time`), but gcc will take longer to catch up.
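
A quick way to see the gap in practice (a sketch; both flags are standard GCC options): the same loop gets baseline SSE2 code by default, but AVX2 or wider vector code once the compiler is allowed to target the actual hardware:

    /* saxpy.c - compare the generated assembly:
         gcc -O2 -S saxpy.c                 (baseline x86-64: SSE2 only)
         gcc -O2 -march=native -S saxpy.c   (AVX2/AVX-512 if the CPU has them) */
    void saxpy(float *restrict y, const float *restrict x, float a, int n) {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }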

There's a sweet spot from about 1-4 years after a CPU's initial release where hardware speeds up, but toward the end of that run programs bloat and wipe out the benefits of the new instructions, leading to needing a new CPU that isn't that much faster than the one you replaced.

Yet.

Which reminds me, I need to benchmark a Linux kernel compile to see if my above supposition is correct. I have the timings from when I first bought it, as compared to a 10-year-old HP 40-core machine (the Ryzen 5950 is 5% faster but used 1/4th the wall power).


These kinds of SIMD instructions are usually used by things like media codecs and DSPs. They would include several versions of the performance-critical number-crunching code and would pick the best one at runtime depending on which SIMD instructions your CPU supports.
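
A minimal sketch of that runtime dispatch, assuming GCC or Clang (whose real `__builtin_cpu_supports` does the feature check; the `sum_*` kernel names here are hypothetical stand-ins for the hand-tuned versions):

    #include <stdio.h>

    static float sum_scalar(const float *v, int n) {
        float s = 0.0f;
        for (int i = 0; i < n; i++) s += v[i];
        return s;
    }

    /* In a real codec these would be genuinely different AVX2/SSE4.1
       implementations, usually compiled separately with target flags. */
    static float sum_avx2(const float *v, int n)  { return sum_scalar(v, n); }
    static float sum_sse41(const float *v, int n) { return sum_scalar(v, n); }

    int main(void) {
        float data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float (*sum)(const float *, int) = sum_scalar;

        __builtin_cpu_init();  /* must run before __builtin_cpu_supports */
        if (__builtin_cpu_supports("avx2"))
            sum = sum_avx2;
        else if (__builtin_cpu_supports("sse4.1"))
            sum = sum_sse41;

        printf("sum = %f\n", sum(data, 8));
        return 0;
    }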


If and only if someone has taken the time to write optimizations for your specific CPU.

In embedded land, if your microcontroller is unpopular, you don't get much in the way of optimization. The assembly GCC generates is frankly hot steaming trash, and an intern with an hour of assembly experience can do better. This is not in any way an exaggeration.

I've run into several situations where hand-optimized assembly is tens of times faster than optimized C mangled by GCC.
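
For anyone who wants to check their own target, a sketch (using the real avr-gcc toolchain as a stand-in; substitute your own cross-compiler and -mmcu): compile with -S and read what actually comes out:

    /* twiddle.c - inspect the generated assembly:
         avr-gcc -O2 -mmcu=atmega328p -S twiddle.c -o twiddle.s */
    unsigned char reverse_bits(unsigned char b) {
        b = (unsigned char)((b & 0xF0) >> 4 | (b & 0x0F) << 4);
        b = (unsigned char)((b & 0xCC) >> 2 | (b & 0x33) << 2);
        b = (unsigned char)((b & 0xAA) >> 1 | (b & 0x55) << 1);
        return b;
    }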

I do not trust compilers anymore unless the target is specifically x86_64, and only for CPUs made this decade.


I'm curious! Can you provide an example of something gcc does poorly that you think such an intern could actually improve upon?



