Agreed. With some work, 13B runs on consumer hardware at this point. That redefines "consumer" to mean a 3090 (but hey, some depressed crypto guys are selling theirs off; I recently got another GPU for my homelab this way). A rough sketch of what "some work" looks like is below.
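
For the curious, here's a minimal sketch of the 8-bit path via transformers + bitsandbytes, which is one way a 13B fits in a 3090's 24 GB. The checkpoint id is illustrative, and the exact kwargs have moved around between library versions:

    # Sketch: load a 13B model in 8-bit so it fits in 24 GB of VRAM.
    # Assumes transformers + bitsandbytes installed; model id is illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "huggyllama/llama-13b"  # illustrative checkpoint id

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_8bit=True,   # ~1 byte/weight: ~13 GB instead of ~26 GB in fp16
        device_map="auto",   # let accelerate place the layers on the GPU
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))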

30B is within reach, with compression techniques that seem to lose very little of the network's overall information. Many argue that machine learning IS fundamentally a compression technique, but it turns out the topology of the trained network matters more than the precision of its individual weights, assuming an appropriate activation function after the transformation.
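
As a toy illustration of why so little is lost: here's a minimal sketch of grouped symmetric 4-bit weight quantization. This is a simplification; real schemes (GPTQ, the ggml q4 formats) differ in grouping and rounding details:

    import numpy as np

    def quantize_q4(w, group_size=64):
        # One fp32 scale per group of weights, plus 4-bit signed ints.
        w = w.reshape(-1, group_size)
        scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 holds -8..7; +/-7 keeps it symmetric
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize_q4(q, scale):
        return (q.astype(np.float32) * scale).reshape(-1)

    w = np.random.randn(4096 * 64).astype(np.float32)
    q, s = quantize_q4(w)
    err = np.abs(dequantize_q4(q, s) - w).mean()
    print(f"mean abs error: {err:.4f}")  # small relative to typical weight magnitudes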

No… definitely not your GPT-4 replacement. However, this is the kind of PoC I keep following… every… 18 hours or so? Amazing.



> That redefines consumer to a 3090

Or a beefy MacBook Pro. I recently bought one with 64 GB of memory, and LLaMA 65B infers very promptly as long as I'm using quantized weights (and the Mac's GPU).
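
The arithmetic works out: 65B parameters at ~4.5 bits/weight is roughly 35-40 GB, which is why it fits in 64 GB of unified memory. A minimal sketch of that setup via the llama-cpp-python bindings (paths and parameters are illustrative; Metal offload needs a Metal-enabled build):

    from llama_cpp import Llama

    # Illustrative path; 4-bit quantized weights, ~35-40 GB for 65B
    llm = Llama(
        model_path="./models/65B/ggml-model-q4_0.bin",
        n_gpu_layers=1,   # any nonzero value enables Metal offload
        n_ctx=2048,
    )

    out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
    print(out["choices"][0]["text"])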


This is very impressive. I think everyone should pay very close attention to what M1/M2 have given us.

But I’m waiting until my friends can afford it. Right now (which at this pace might mean I change my mind tonight)

…I am earnestly studying how to make this something anyone can install as part of a product they can use without a subscription.


And beam size 1?



