xenn0010's comments

Found this while fighting with ROCm last week. A team called Aule Technologies built a FlashAttention implementation for AMD hardware that actually runs. Anyone who's tried attention kernels on MI200/MI300 knows the ecosystem is a graveyard of abandoned CUDA ports; this one works out of the box.

Repo: https://github.com/AuleTechnologies/Aule-Attention (53 stars, still pretty new). Looked at the code: it's clean HIP, not some janky translation layer. Benchmarks look legit, though I haven't verified them independently. Not affiliated, just surprised something in the ROCm attention space isn't broken. Figured others here running AMD might find it useful.
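If you want to sanity-check correctness yourself before trusting the benchmarks, a quick parity test against PyTorch's reference kernel is cheap to write. To be clear, the aule_attention import and flash_attention() call below are my guess at what the repo might expose, not its confirmed API; F.scaled_dot_product_attention is the real PyTorch reference, and on ROCm builds of torch the "cuda" device maps to HIP.

    # Hypothetical parity check. aule_attention / flash_attention() are
    # assumptions (left commented out); only the PyTorch reference runs as-is.
    import torch
    import torch.nn.functional as F
    # from aule_attention import flash_attention  # assumed entry point, not verified

    device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda" == HIP on ROCm torch
    dtype = torch.float16 if device == "cuda" else torch.float32

    B, H, S, D = 2, 8, 1024, 64  # batch, heads, sequence length, head dim
    q = torch.randn(B, H, S, D, device=device, dtype=dtype)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    ref = F.scaled_dot_product_attention(q, k, v)  # reference output
    # out = flash_attention(q, k, v)                # kernel under test (hypothetical API)
    # print((out - ref).abs().max())                # expect small fp16 error, roughly < 1e-2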

Says it also works through Vulkan, covering Intel as well as AMD consumer cards. Sounds pretty great if it delivers on that.

