How does BDH handle long-range dependencies compared to Transformers, given its locally interacting neuron particles? Does the scale-free topology implicitly support efficient global information propagation?


From the authors: great question. If you take an "easy" long-range-dependency task where a Mamba-like architecture flies (and the transformer doesn't, or gets messy), the hatchling should be able to fly there too. For more ambitious benchmarks, try it on a task you care about. The paper is deliberately vanilla and focused on explaining what's happening inside the model, but it should be good enough as a starting point for architecture tweaks and experiments.
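For readers who want to probe this themselves, below is a minimal sketch of one such "easy" long-range task: a synthetic associative-recall dataset where the model must retrieve a value paired with a key seen much earlier in the sequence. The task design and all names here are my own illustration (not from the paper or the BDH codebase); it just produces token sequences you could feed to a BDH hatchling, a Transformer, or a Mamba-style model with your own training loop.

    import random

    def make_recall_example(seq_len=512, vocab=64, n_pairs=8, seed=None):
        """Build one synthetic associative-recall example.

        The sequence scatters n_pairs (key, value) token pairs across a long
        stretch of filler tokens and ends with a query key; the target is the
        value that accompanied that key earlier. Solving it requires carrying
        information across an arbitrarily long gap.
        """
        rng = random.Random(seed)
        filler = 0
        keys = rng.sample(range(1, vocab // 2), n_pairs)            # distinct key tokens
        values = [rng.randrange(vocab // 2, vocab) for _ in keys]   # paired value tokens
        seq = [filler] * seq_len
        # Place each (key, value) pair at a distinct even offset, leaving the
        # final position free for the query key.
        slots = rng.sample(range((seq_len - 1) // 2), n_pairs)
        for slot, k, v in zip(slots, keys, values):
            seq[2 * slot], seq[2 * slot + 1] = k, v
        q = rng.randrange(n_pairs)
        seq[-1] = keys[q]                                           # query key at the end
        return seq, values[q]                                       # target: recalled value

    # A batch any sequence model under comparison can be trained on.
    batch = [make_recall_example(seq_len=1024, seed=i) for i in range(32)]

Accuracy as seq_len grows (with the pairs placed far from the query) gives a rough read on how well a given architecture propagates information over long ranges.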



