Or a beefy MacBook Pro. I recently bought one with 64 GB of memory, and Llama 65B infers very promptly as long as I'm using quantized weights (and the Mac's GPU).
But I’m waiting until my friends can afford it. Right now (which at this pace might mean I change my mind tonight), I am earnestly studying how to make this something anyone can install as part of a product they can use without a subscription.