> Although Apple has included a matrix accelerator in its devices since 2019, it used a proprietary instruction set inaccessible to developers, who officially could only use Apple-provided numerical libraries.
How does that work? Does the hardware throw some kind of fault when using those instructions? Or are they merely undocumented and you could use them if you figure out how they work? I guess the second, as hinted by the "officially"?
IIRC there was a BLIS fork that used AMX instructions. I think it was unofficial though(?). It is hard to do science without properly documented tools.
How does that work? Does the hardware throw some kind of fault when using those instructions? Or are they merely undocumented and you could use them if you figure out how they work? I guess the second, as hinted by the "officially"?