Yes, I mean a Mac Studio with MLX. An M3 Ultra with 256GB of RAM is $5599. That ... | Hacker News


simonw 44 days ago | on: Kimi K2 Thinking, a SOTA open-source trillion-para...

Yes, I mean a Mac Studio with MLX. An M3 Ultra with 256GB of RAM is $5599. That should just about be enough to fit MiniMax M2 at 8bit for MLX: https://huggingface.co/mlx-community/MiniMax-M2-8bit

Or maybe run a smaller quantized one to leave more memory for other apps!

Here are performance numbers for the 4bit MLX one: https://x.com/ivanfioravanti/status/1983590151910781298 - 30+ tokens per second.
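For a rough sense of the "just about enough" arithmetic, here is a minimal sketch of a weights-only memory estimate. The parameter count of roughly 230 billion for MiniMax M2 is an assumption, not something stated in this thread, and the function name is hypothetical. The estimate ignores the KV cache, quantization scales, and runtime overhead, so real usage would likely be somewhat higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Weights only: ignores KV cache, quantization metadata, and runtime overhead.
    """
    return params_billions * bits_per_weight / 8


# Assumption: ~230B total parameters for MiniMax M2 (not stated in the thread).
ASSUMED_PARAMS_B = 230
RAM_GB = 256  # the Mac Studio configuration mentioned above

for bits in (8, 4):
    needed = weight_memory_gb(ASSUMED_PARAMS_B, bits)
    print(f"{bits}-bit: ~{needed:.0f} GB of weights, ~{RAM_GB - needed:.0f} GB left of {RAM_GB} GB")
```

Under that assumption the 8-bit weights would take most of the 256GB, which is consistent with the suggestion to run a smaller quantization if other apps need memory.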

zht 44 days ago

It’s kinda misleading to omit the generally terrible prompt processing speed on Macs

30 tokens per second looks good until you have to wait minutes for the first token
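The trade-off described here can be sketched with simple arithmetic: time to first token is roughly prompt length divided by prompt processing (prefill) speed, and only after that does the generation rate matter. The prompt length and prefill speed below are hypothetical placeholders, not measurements from the linked chart; only the 30 tokens per second generation figure comes from the thread.

```python
def time_to_first_token_s(prompt_tokens: int, prefill_tokens_per_s: float) -> float:
    """Seconds spent processing the prompt before any output token appears."""
    return prompt_tokens / prefill_tokens_per_s


def total_time_s(prompt_tokens: int, prefill_tokens_per_s: float,
                 output_tokens: int, decode_tokens_per_s: float) -> float:
    """Prefill time plus generation time for the whole response."""
    prefill = time_to_first_token_s(prompt_tokens, prefill_tokens_per_s)
    return prefill + output_tokens / decode_tokens_per_s


# Hypothetical inputs: a 30,000-token prompt at 250 tokens/s prefill.
PROMPT_TOKENS = 30_000
PREFILL_TPS = 250.0
DECODE_TPS = 30.0  # generation rate quoted in the thread

ttft = time_to_first_token_s(PROMPT_TOKENS, PREFILL_TPS)
total = total_time_s(PROMPT_TOKENS, PREFILL_TPS, 600, DECODE_TPS)
print(f"time to first token: {ttft:.0f} s, total for 600 output tokens: {total:.0f} s")
```

With these placeholder numbers, the wait before the first token dominates the total, which is the scenario the comment is warning about for long prompts.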

simonw 44 days ago

The tweet I linked to includes that information in the chart.

oxcidized 44 days ago

Thanks for the info! Definitely much better than I expected.
