
I haven't read through the entire thing yet, but the long abstract, the way the acronym BDH is introduced (what does the B stand for?), and the very "flowery" name (neither "dragon" nor "hatchling" appears again past page 2) are rather off-putting.

- It seems strange to use the term "scale-free" and then defer its definition until halfway through the paper (in fact, the term is mentioned 14 times before said definition and only 3 times after)

- This might just be CS people doing CS things, but the notation in the paper is awful: Claims/Observations end with a QED-symbol (for example on pages 29 and 30) but without a proof

- They make strong claims about performance and scaling ("It exhibits Transformer-like scaling laws") but the only (I think?) benchmark is a translation task comparison with <1B models, which is ~2 orders of magnitude smaller than SOTA



It's a 'dragon hatchling' because it is 'scale-free'.


Hah, that's pretty clever if it's true :D


The B stands for "Baby". Baby Dragon Hatchling is their model name.


Seems like this should be in the paper! Thanks though


> Claims/Observations end with a QED-symbol

Author comment: as a fairly common convention, QED immediately after a particular statement means that the statement should be considered proven. Depending on the text, this may be because the statement (Observation) is self-explanatory, because the discussion in the text leading up to the statement is sufficient, or because the final statement of a Theorem follows as a direct corollary of Lemmas previously proven in the text.


I could agree with that, but the example on p29 (Claim 6) ends with QED, and only then does the proof follow. I realize I'm nitpicking form here, but still.


Well spotted. Thank you!



