Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I dunno man. My claim was that for specific cases with unique properties, it's not hard to beat BLAS, without getting too exotic with your code. BLAS doesn't have routines for multiplies with non-contiguous data, various patterns of sparsity, mixed precision inputs/outputs, etc. The example I gave is for a specific case close-ish to the case I cared about.

You're changing it to a very different case, presumably one that you cared about, although 4096x4096 is oddly square and a very clean power of 2... I said right at the beginning of this long digression that what is hard about reproducing BLAS is its generality.



When I run your benchmark with matrices larger than 1024x1024 it errors out in the verification step. Since your implementation isn't even correct I think my original point about OpenBLAS being extremely difficult to replicate still stands.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: