
>They did something to quantize >90% of the model parameters to the MXFP4 format (4.25 bits/parameter) to let the 120B model fit on a single 80GB GPU, which is pretty cool

They said it's native FP4, which suggests they actually trained it in that format rather than applying post-training quantisation.
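
For a sense of why 4.25 bits/parameter is enough, here's a rough back-of-envelope check (treating all ~120B parameters as stored in MXFP4, which slightly overstates things since not every tensor is quantized):

    # rough memory estimate, purely illustrative
    params = 120e9           # ~120B parameters
    bits_per_param = 4.25    # MXFP4: 4-bit elements plus a shared per-block scale
    print(params * bits_per_param / 8 / 1e9)  # ~63.8 GB of weights

That leaves headroom on an 80GB card for activations and the KV cache.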



The native FP4 is one of the most interesting architectural aspects here IMO, as going below FP8 is known to come with accuracy tradeoffs. I'm curious how they navigated this and how FP8 weights (if they exist) would perform in comparison.


One thing to note is that MXFP4 is a block-scaled format, at 4.25 bits per weight: each block of 4-bit elements shares an additional scale factor. That lets it represent a much wider range of values than raw FP4 (say, 1 mantissa and 2 exponent bits) would on its own.
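
A minimal sketch of how the block scaling works, assuming the OCP Microscaling layout for MXFP4 (blocks of 32 FP4 E2M1 elements sharing one 8-bit E8M0 power-of-two scale, which is where 4.25 bits/weight comes from); the helper below is illustrative, not OpenAI's actual kernel:

    import numpy as np

    # The 16 values representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit)
    E2M1_VALUES = np.array(
        [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
         -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0], dtype=np.float32)

    def dequantize_mxfp4_block(codes, scale_exp):
        """codes: 32 4-bit indices into E2M1_VALUES; scale_exp: shared E8M0 exponent."""
        scale = np.float32(2.0) ** (scale_exp - 127)  # E8M0 is a biased power of two
        return E2M1_VALUES[codes] * scale

    codes = np.random.randint(0, 16, size=32)              # one block of 4-bit codes
    block = dequantize_mxfp4_block(codes, scale_exp=130)   # shared scale = 2^3 = 8

Raw E2M1 only covers roughly ±6, so the per-block scale is what recovers the dynamic range you'd otherwise lose going below FP8.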



