Man, this reminded me of something similar I had tried working on in 2018, but gave up on after all my PhD applications got rejected.

https://github.com/ixaxaar/pytorch-dni

The concept here goes a bit further and tries to replicate backprop with an external network, arguing that that's probably what the brain actually does.
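For the unfamiliar: the synthetic-gradient / DNI idea is to bolt a small auxiliary net onto a layer that predicts the gradient of the loss w.r.t. that layer's activations, so the layer can update without waiting for the full backward pass. A rough PyTorch sketch of the idea (illustrative names and shapes, not the repo's actual API):

    import torch
    import torch.nn as nn

    class SyntheticGradient(nn.Module):
        # Tiny net that predicts dL/dh for a hidden activation h.
        def __init__(self, dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
            )

        def forward(self, h):
            return self.net(h)

    def decoupled_step(layer, sg, opt_layer, opt_sg, x, true_grad=None):
        # Update `layer` with the predicted gradient instead of waiting
        # for the rest of the backward pass.
        h = layer(x)
        g_hat = sg(h.detach())
        opt_layer.zero_grad()
        h.backward(g_hat.detach())
        opt_layer.step()
        # When the true gradient eventually becomes available, regress the
        # predictor onto it so it stays a decent estimate.
        if true_grad is not None:
            opt_sg.zero_grad()
            nn.functional.mse_loss(sg(h.detach()), true_grad).backward()
            opt_sg.step()
        return h.detach()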



I'm not seeing the connection. This work is about low-level optimization of matrix multiplication. The repo you linked seems to be about replacing back-propagated gradients with a cheaper estimate. What's the similarity you see between these two?


Correct, I think I mistook it as "use a small neural net to approximate matrix multiplication", whereas it actually seems to be "use cheaper replacements for matrix multiplication without much accuracy loss".

Wellll that means I can give dni another try :D
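(On the "cheaper replacements for matmul" point: one common flavor is sketching the shared dimension down to low rank. This is purely an illustrative stand-in, not necessarily what the submission does, which may be hashing/lookup based.)

    import numpy as np

    def approx_matmul(A, B, k):
        # Approximate A @ B by sketching the shared dimension d down to k.
        # Cost is roughly O(n*d*k + k*d*m + n*k*m) instead of O(n*d*m).
        d = A.shape[1]
        S = np.random.randn(d, k) / np.sqrt(k)   # E[S @ S.T] = I, so unbiased on average
        return (A @ S) @ (S.T @ B)

    A = np.random.randn(512, 1024)
    B = np.random.randn(1024, 256)
    err = np.linalg.norm(A @ B - approx_matmul(A, B, 128)) / np.linalg.norm(A @ B)
    print(f"relative error at rank 128: {err:.3f}")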


This feels like a "no free lunch" situation. I would imagine that any time saved by approximating the gradients this way would be lost to having to train for more iterations because of the loss in gradient accuracy. Is that not the case?


I think that's the reason it turned out to be a dead end.

However, if this really is the biological analogue of credit assignment, it might scale better than training LLMs from scratch every time. Even if it could only approximate the gradients for a new network to a certain degree, normal backprop could then fine-tune for a few epochs or so, dramatically reducing overall training costs.
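Roughly, the two-phase schedule I'm imagining (warm-start with approximate gradient steps, finish with exact backprop) would look something like the sketch below; `grad_estimator` is a hypothetical placeholder for whatever approximation you trust:

    import torch

    def two_phase_train(model, loss_fn, loader, grad_estimator,
                        approx_epochs=8, exact_epochs=2, lr=1e-3):
        # Phase 1: cheap approximate-gradient steps; phase 2: exact backprop.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for epoch in range(approx_epochs + exact_epochs):
            exact = epoch >= approx_epochs
            for x, y in loader:
                opt.zero_grad()
                if exact:
                    loss_fn(model(x), y).backward()   # normal backprop
                else:
                    # grad_estimator is hypothetical: it returns one tensor per
                    # parameter, matching model.parameters() order and shapes.
                    for p, g in zip(model.parameters(), grad_estimator(model, x, y)):
                        p.grad = g
                opt.step()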


Unrelated to the technical discussion, but I was wondering what you made that architecture GIF with? Looks neat!


I think that image is from the paper and was not created by me. Looks cool indeed!



