If you have designed FPUs, you should know that FP computation involves many more operations than just shifting (e.g. rounding, subnormals, and special-value handling). That’s why, for example, CPUs use different hardware blocks for INT and FP computation.
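To make that concrete, here is a rough Python sketch (my own illustration, not taken from any real FPU design) of what happens if you treat "multiply by 2" on a float32 as a bare exponent bump, the FP analogue of a one-bit shift. The helper names `f32_bits`, `bits_f32`, and `naive_double` are made up for this example.

```python
import math
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Interpret the low 32 bits of b as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def naive_double(x: float) -> float:
    """Hypothetical shortcut: multiply by 2 by adding 1 to the exponent field.

    Deliberately ignores zero, subnormals, infinity, and overflow.
    """
    return bits_f32(f32_bits(x) + (1 << 23))

# Integers: multiplying by 2 really is a one-bit left shift.
assert (3 << 1) == 6

# Normal floats: the exponent bump happens to work.
assert naive_double(1.5) == 3.0

# Zero: the result becomes the smallest normal number instead of staying 0.
assert naive_double(0.0) != 0.0

# Subnormals: the mantissa has no implicit leading 1, so the bump is wrong.
tiny = bits_f32(1)  # smallest positive subnormal
assert naive_double(tiny) != 2 * tiny

# Infinity: the exponent overflows into the sign bit and yields -0.0.
assert naive_double(math.inf) == 0.0

# Largest finite value: should overflow to +inf, but turns into a NaN pattern.
assert math.isnan(naive_double(bits_f32(0x7F7FFFFF)))
```

Every failing case needs its own detection and fix-up logic in hardware, and this is before rounding even comes into the picture.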
But that’s not the point. The point is, this particular method to speed up matmul is not suitable for FP.