https://github.com/ixaxaar/pytorch-dni
The concept here goes a bit further and tries to replicate backprop with an external network, arguing that this is probably what the brain actually does.
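To make that concrete, here is a minimal sketch of the idea in PyTorch. It assumes a small MLP as the gradient predictor; the class name `SyntheticGradient` and all dimensions are illustrative and not taken from the linked repo.

```python
import torch
import torch.nn as nn

class SyntheticGradient(nn.Module):
    """Small auxiliary net that predicts dLoss/dActivation from the activation itself."""
    def __init__(self, dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, activation):
        return self.net(activation)

layer = nn.Linear(784, 256)                        # layer we want to update right away
dni = SyntheticGradient(256)                       # its gradient predictor
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(32, 784)
h = layer(x)

# Seed the backward pass with a *predicted* gradient of the loss w.r.t. h,
# so `layer` can be updated without waiting for the true back-propagated signal.
predicted_grad = dni(h.detach())
h.backward(predicted_grad.detach())
opt.step()

# When the true gradient for h eventually arrives from the layers above, the
# predictor itself is trained by regression against it, e.g.
# torch.nn.functional.mse_loss(predicted_grad, true_grad.detach()).
```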
I'm not seeing the connection. This work is about low-level optimization of matrix multiplication. The repo you linked seems to be about replacing back-propagated gradients with a cheaper estimate. What's the similarity you see between these two?
Correct, I think I misread it as "use a small neural net to approximate matrix multiplication", when it actually seems to be "use cheaper replacements for matrix multiplication without much accuracy loss".
This feels like a "no free lunch" situation. I would imagine that any time saved by approximating the gradients this way would be lost to the extra training iterations needed because of the loss in gradient accuracy. Is that not the case?
However, if this really is the biological analogue of credit assignment, it might scale better than training LLMs from scratch every time. Even if it could only approximate the gradients of a new network to a certain degree, normal backprop could then fine-tune for a few epochs or so, dramatically reducing overall training costs.
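Roughly, the hybrid could look like this hypothetical two-phase schedule. `grad_predictor`, `synthetic_step`, `bulk_batches` and `finetune_batches` are all made-up names for the sake of the sketch, not anything from the linked repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy stand-in for a DNI-style module that predicts dLoss/dLogits from the logits.
# In a real setup, predictors would sit at layer boundaries so each layer can
# update without waiting for the layers above.
grad_predictor = nn.Linear(10, 10)

def synthetic_step(x):
    # Cheap update: seed the backward pass with a *predicted* gradient
    # instead of the true loss gradient.
    opt.zero_grad()
    logits = model(x)
    logits.backward(grad_predictor(logits.detach()).detach())
    opt.step()

def backprop_step(x, y):
    # Exact update: ordinary backprop, used only for the final fine-tuning epochs.
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()

# Phase 1: bulk of training with approximate gradients.
# for x, _ in bulk_batches: synthetic_step(x)
# Phase 2: a few epochs of exact backprop to recover accuracy lost to the approximation.
# for x, y in finetune_batches: backprop_step(x, y)
```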