> it simply works better this way

That is simply not true.

> clang messing up x86 intrinsics code

The code is correct, and on some processors it runs slightly faster than the original. Clang is the only compiler that does anything like that. The example is also irrelevant to ffmpeg: it operates on FP64 numbers, while video codecs mostly do integer math.

> they're so hard to read that the asm is actually more maintainable

That’s subjective. I’ve been using SIMD intrinsics for years, and I find them way better than assembly.

Another thing: you can treat C as a high-level language rather than a portable assembler. If you define structures, functions and classes in C++ which use these SIMD vectors, intrinsics become far more readable than assembly. Here’s a good example of a library designed that way: https://github.com/microsoft/DirectXMath
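To make that concrete, here is a rough sketch of what I mean. It is not taken from DirectXMath or ffmpeg; `Pixels16`, `average` and `blendRows` are names I made up for illustration, and it assumes an x86 target with SSE2. It wraps a register of 16 unsigned 8-bit pixels in a small struct, which is the kind of integer math video code tends to do.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: 16 unsigned 8-bit pixels in one SSE2 register.
struct Pixels16 {
    __m128i v;

    // Unaligned load of 16 bytes.
    static Pixels16 load(const std::uint8_t* p) {
        return { _mm_loadu_si128(reinterpret_cast<const __m128i*>(p)) };
    }

    // Unaligned store of 16 bytes.
    void store(std::uint8_t* p) const {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(p), v);
    }
};

// Saturating add: each lane clamps at 255 instead of wrapping around.
inline Pixels16 operator+(Pixels16 a, Pixels16 b) {
    return { _mm_adds_epu8(a.v, b.v) };
}

// Rounded average of each pair of lanes: (a + b + 1) >> 1.
inline Pixels16 average(Pixels16 a, Pixels16 b) {
    return { _mm_avg_epu8(a.v, b.v) };
}

// Blend two rows of pixels. Assumes n is a multiple of 16.
inline void blendRows(std::uint8_t* dst, const std::uint8_t* a,
                      const std::uint8_t* b, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 16)
        average(Pixels16::load(a + i), Pixels16::load(b + i)).store(dst + i);
}
```

Once the wrappers exist, the loop body in `blendRows` reads like ordinary C++, and the compiler still does register allocation and scheduling for you.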