
We're not yet to the point where a single PCIe device will get you anything meaningful; IMO 128 GB of RAM available to the GPU is essential.

So while you don't need a ton of compute on the CPU, you do need the ability to address multiple PCIe lanes. A relatively low-spec AMD EPYC processor is fine if the motherboard exposes enough lanes.


There is plenty that can run within 32/64/96 GB of VRAM. IMO models like Phi-4 are underrated for many simple tasks. Some quantized Gemma 3 variants are quite good as well.

There are larger/better models as well, but those tend to really push the limits of 96 GB.

FWIW when you start pushing into 128 GB+, the ~500 GB models really start to become attractive, because at that point you're probably wanting just a bit more out of everything.
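
As a rough rule of thumb for what fits where (weights only; KV cache and context push real usage higher, and these numbers are back-of-the-envelope, not benchmarks):

    # Rough VRAM estimate: weights only, ignoring KV cache and activations.
    def vram_gb(params_billion, bits_per_weight):
        return params_billion * bits_per_weight / 8

    print(vram_gb(27, 4))    # e.g. a 27B model at 4-bit: ~13.5 GB
    print(vram_gb(120, 4))   # a 120B model at 4-bit: ~60 GB
    print(vram_gb(120, 8))   # the same model at 8-bit: ~120 GB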


IDK, all of my personal and professional projects involve pushing the SOTA to the absolute limit. Using anything other than the latest OpenAI or Anthropic model is out of the question.

Smaller open source models are a bit like 3D printing in the early days: fun to experiment with, but really not that valuable for anything other than making toys.

Text summarization, maybe? But even then I want a model that understands the complete context and does a good job. Even for things like "generate one sentence about the action we're performing", I usually find I can just incorporate it into the output schema of a larger request instead of making a separate request to a smaller model.


It seems to me like the use case for local GPUs is almost entirely privacy.

If you buy a 15k AUD RTX 6000 96GB, that card will _never_ pay for itself on a gpt-oss:120b workload vs just using OpenRouter - no matter how many tokens you push through it - because the cost of residential power in Australia means you cannot generate tokens cheaper than the cloud even if the card were free.
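
Back-of-the-envelope version of that claim (the wattage, tariff, and throughput below are assumptions for illustration, not measurements):

    # Assumed: ~300 W draw under load, ~0.35 AUD/kWh residential tariff,
    # ~60 tok/s sustained. All three numbers are guesses; plug in your own.
    watts, aud_per_kwh, tok_per_s = 300, 0.35, 60
    kwh_per_mtok = (watts / 1000) * (1_000_000 / tok_per_s) / 3600
    print(kwh_per_mtok * aud_per_kwh)  # ~0.49 AUD per million tokens, power alone

Compare that against the per-million-token price the cloud charges for the same model; if the cloud price sits at or below your power-only cost, the card can never amortize.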


> because the cost of residential power in Australia

This doesn't really matter to your overall point, which I agree with, but:

The rise of rooftop solar and home battery energy storage flips this a bit now in Australia, IMO. At least where I live, every house has a solar panel on it.

Not worth it just for local LLM usage, but an interesting change to energy economics IMO!


There are a few more considerations:

- You can use the GPU for training and run your own fine tuned models

- You can have much higher generation speeds

- You can sell the GPU on the used market in ~2 years' time for a significant portion of its value

- You can run other types of models, like image, audio, or video generation, that are not available via an API or cost significantly more

- Psychologically, you don’t feel like you have to constrain your token spending and you can, for instance, just leave an agent to run for hours or overnight without feeling bad that you just “wasted” $20

- You won’t be running the GPU at max power constantly


Or censorship avoidance

This is simply not true. Your heuristic is broken.

The recent Gemma 3 models, which are produced by Google (a little startup - heard of 'em?), outperform the last several OpenAI releases.

Closed does not necessarily mean better. Plus the local ones can be finetuned to whatever use case you may have, won't have any inputs blocked by censorship functionality, and you can optimize them by distilling to whatever spec you need.

Anyway all that is extraneous detail - the important thing is to decouple "open" and "small" from "worse" in your mind. The most recent Gemma 3 model specifically is incredible, and it makes sense, given that Google has access to many times more data than OpenAI for training (something like a factor of 10 at least). Which is of course a very straightforward idea to wrap your head around; Google was scraping the internet for decades before OpenAI even entered the scene.

So just because their Gemma model is released in an open-source (open-weights) way doesn't mean it should be discounted. There's no magic voodoo happening behind the scenes at OpenAI or Anthropic; the models are essentially of the same type. But Google releases theirs to undercut the profitability of their competitors.



I'm holding out for someone to ship a GPU with DIMM slots on it.

DDR5 is a couple of orders of magnitude slower than really good VRAM. That's one big reason.

DDR5 is ~8GT/s, GDDR6 is ~16GT/s, GDDR7 is ~32GT/s. It's faster but the difference isn't crazy and if the premise was to have a lot of slots then you could also have a lot of channels. 16 channels of DDR5-8200 would have slightly more memory bandwidth than RTX 4090.
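
Worked out (peak bandwidth = transfer rate x bus width):

    # Peak memory bandwidth in GB/s = GT/s * bus width in bytes
    def gbps(gt_per_s, bus_bits):
        return gt_per_s * bus_bits / 8

    print(gbps(8.2, 16 * 64))  # 16 channels of DDR5-8200: ~1049 GB/s
    print(gbps(21.0, 384))     # RTX 4090 (384-bit GDDR6X @ 21 GT/s): 1008 GB/s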

Yeah, so DDR5 is 8 GT/s and GDDR7 is 32 GT/s. Bus width is 64 vs 384 bits. That already makes the VRAM 4*6 (24) times faster.

You can add more channels, sure, but each channel makes it less and less likely for you to boot. Look at modern AM5 struggling to boot at over 6000 with more than two sticks.

So you’d have to get an insane six channels to match the bus width, at which point your only choice to be stable would be to lower the speed so much that you’re back to the same orders of magnitude difference, really.

Now we could instead solder that RAM, move it closer to the GPU and cross-link channels to reduce noise. We could also increase the speed and oh, we just invented soldered-on GDDR…


> Bus width is 64 vs 384.

The bus width is the number of channels. They don't call them channels when they're soldered, but 384 bits is already the equivalent of 6. The premise is that you would have more. Dual-socket Epyc systems already have 24 channels (12 channels per socket). It costs money, but so does 256GB of GDDR.

> Look at modern AM5 struggling to boot at over 6000 with more than two sticks.

The relevant number for this is the number of sticks per channel. With 16 channels and 64GB sticks you could have 1TB of RAM with only one stick per channel. Use CAMM2 instead of DIMMs and you get the same speed and capacity from 8 slots.
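
The same arithmetic, spelled out:

    print(384 // 64)    # a 384-bit soldered bus = 6 channel-equivalents
    print(2 * 12)       # dual-socket Epyc: 24 channels
    print(16 * 1 * 64)  # 16 channels x 1 stick x 64 GB = 1024 GB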


But it would still be faster than splitting the model up across a cluster, right? I've also wondered why they haven't just shipped GPUs like CPUs.

Man, I'd love to have a GPU socket. But it'd be pretty hard to get a standard going that everyone would support. Look at sockets for CPUs; we barely had crossover for like 2 generations.

But boy, a standard GPU socket so you could easily BYO cooler would be nice.


The problem isn't the sockets. It costs a lot to spec and build new sockets; we wouldn't swap them for no reason.

The problem is that the signals and features that the motherboard and CPU expect are different between generations. We use different sockets on different generations to prevent you plugging in incompatible CPUs.

We used to have cross-generational sockets in the 386 era because the hardware supported it. Motherboards weren't changing so you could just upgrade the CPU. But then the CPUs needed different voltages than before for performance. So we needed a new socket to not blow up your CPU with the wrong voltage.

That's where we are today. Each generation of CPU wants different voltages, power, signals, a specific chipset, etc. Within the same ±1 generation you can swap CPUs because they're electrically compatible.

To have universal CPU sockets, we'd need a universal electrical interface standard, which is too much of a moving target.

AMD would probably love to never have to tool up a new CPU socket. They don't make money on the motherboard you have to buy. But the old motherboards just can't support new CPUs. Thus, new socket.


For AI, "really good" isn't really a requirement. If a middle-ground memory module could be made, then it'd be pretty appealing.

Would that be worth anything, though? What about the overhead of clock cycles needed for loading from and storing to RAM? It might not amount to a net benefit for performance, and I bet it could also complicate heat management.

A single CAMM might suit better.

It might seem minor, but the little things add up. Making your dev environment mirror prod from the start will save you a bunch of headaches. Then, when you're ready to deploy, there is nothing to change.

Even better, stage to a production-like environment early, and then deploy day can be as simple as a DNS record change.


Thanks to Let's Encrypt's DNS-01 challenge, you can absolutely spin up a production-like environment with SSL and everything. It's definitely worth doing.
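
For anyone who hasn't tried it: DNS-01 proves ownership via a TXT record, so the staging box never has to be reachable from the internet. With certbot it looks roughly like this (Cloudflare is assumed here purely as an example; the DNS plugin and domain are placeholders for whatever you use):

    certbot certonly --dns-cloudflare \
        --dns-cloudflare-credentials ~/.secrets/cloudflare.ini \
        -d staging.example.com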

Can I attach multiple GPUs to a container?

I use it as much as my brain can handle and I never exceed my Max plan quota.

Just a warning for those not on the Max plan: if you pay by the token or have the lower-tier plans, you can easily blow through $100s or hit your plan's cap in under an hour. The rates for paying by the token are insane, and the scaling from Pro to Max is also pretty crazy.

They made Pro many times more cost-effective than paying per token, and then Max on the $200 plan again has 25x more tokens than Pro.

It’s a bit like being offered rice at $1 per grain (pay per token) or a tiny bag of rice for $20 (pro) or a truck load for $200. That’s the pricing structure right now.

So while I agree you can't easily exceed the quota on the big plans, it's a little crazy how they've tiered pricing. I hope no one out there's paying per token!


> I hope no one out there’s paying per token!

Some companies are. Yes, for Claude Code. My co used to be like that, as it's an easy ramp-up instead of giving devs who might not use it that much a $150/mo seat; if you use it enough, you can have a seat and save money, but if you're not touching $150 in credits a month, just use the API. Oxide also recommends using API pricing. [0]

0: https://gist.github.com/david-crespo/5c5eaf36a2d20be8a3013ba...


They should publish the token limits, not just talk about conversations or what average users can expect: https://support.claude.com/en/articles/11145838-using-claude...

For comparison’s sake, this is clear: https://support.cerebras.net/articles/9996007307-cerebras-co...

And while the Cerebras service is pretty okay, their website otherwise kinda sucks - and yet you can find clear info!


Oh yeah, totally - my bill used to be closer to $1000/mo when paying per token.

Yeah well, wait til they take it away

Exactly, I feel like my brain burns out after a few days. Like I'm the limit already (yet I'm the maximizer also); it's a very weird feeling.

Wait, if I am providing essential data to your service, why am I paying you?

Perfect opportunity to run a project that benefits its users (monetarily), if you only did the legwork to market that value to map consumers. And, as a consumer, you don't need the sophisticated hardware, anyway.


"you don't need the sophisticated hardware, anyway."

It depends on what kind of map you are building, for which use cases, and how passive you want it to be. Sure, you can use an iPhone or Android device, but it's not very passive (requires starting up, etc.) and it will quickly overheat when it gets hot. We tried it, and most people gave up after a few weeks given that it's not passive.

For most commercial fleets there is real value in the services we provide, e.g. monitoring, accident detection, remote video retrieval in case of accident, ELD compliance, etc.

You should read the article about rewards/incentives as it talks about that.


That's definitely a possible future abstraction, and one area of the future of technology I'm excited about.

First we get to tackle all of the small ideas and side projects we haven't had time to prioritize.

Then, we start taking ownership of all of the software systems that we interact with on a daily basis; hacking in modifications and reverse engineering protocols to suit our needs.

Finally our own interaction with software becomes entirely boutique: operating systems, firmware, user interfaces that we have directed ourselves to suit our individual tastes.


There is a solution to this. You can become a US citizen.


* terms and conditions may apply


Didn't realise it was that easy. Why don't all the illegal immigrants just do that?


It doesn't have to be easy to be factual. You simply are not owed entry into any country if you are not a citizen of that country, that is a fundamental part of what things like "citizenship" and "sovereign state" mean in the modern world.


Ah so this is basically a meaningless platitude?


Very hard process though.


Another option is that we can treat our guests better.


And then wonder if they'll try to take your citizenship away anyway - the exact boat I'm in. Naturalized after almost 20 years of holding a GC, because I expected trouble with this administration - and now wondering if they'll try to take away my citizenship because I did it recently.

I actually expected to leave and have my right to come back not be dependent on GC status (which effectively lapses after 6 months out of the country), but due to family I have stayed so far. By the by, I'm a citizen of that dangerous country bordering the US - Canada.


I started using Django before the official 1.0 release and used it almost exclusively for years on web projects.

Lately I prefer to mix my own tooling and a couple of major packages in for backends (FastAPI, SQLAlchemy) that are still heavily inspired by patterns I picked up while using Django. I end up with a little more boilerplate, but I also end up with a little more stylistic flexibility.
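
For a sense of what "a little more boilerplate" means in practice, a minimal sketch (Django's ORM and app config do the equivalent of this wiring for you out of the box):

    # Minimal FastAPI + SQLAlchemy wiring - the part Django gives you for free.
    from fastapi import Depends, FastAPI
    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session, sessionmaker

    engine = create_engine("sqlite:///app.db")
    SessionLocal = sessionmaker(bind=engine)
    app = FastAPI()

    def get_db():
        # One session per request, closed when the request ends.
        db = SessionLocal()
        try:
            yield db
        finally:
            db.close()

    @app.get("/items")
    def list_items(db: Session = Depends(get_db)):
        ...  # query however you like - the stylistic flexibility part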


> I started using Django before the official 1.0 release

Indeed. I'm still using the 0.97beta. It's perfectly good for production use!

</obscure joke>


Rather than having multiple agents running inside of one IDE window, I structure my codebase in a way that is somewhat siloed to facilitate development by multiple agents. This is an obvious and common pattern when you have a front-end and a back-end. Super easy to just open up those directories of the repository in separate environments and have them work in their own siloed space.

Then I take it a step further and create core libraries that are structured like standalone packages and are architected like third-party libraries with their own documentation and public API, which gives clear boundaries of responsibility.

Then the only somewhat manual step you have is to copy/paste the agent's notes of the changes that they made so that dependent systems can integrate them.

I find this to be way more sustainable than spawning multiple agents on a single codebase and then having to rectify merge conflicts between them as each task is completed; it's not unlike traditional software development where a branch that needs review contains some general functionality that would be beneficial to another branch and then you're left either cherry-picking a commit, sharing it between PRs, or lumping your PRs together.

Depending on the project I might have 6-10 IDE sessions. Each agent then has its own history, and anything to do with running test harnesses or CLI interactions gets managed on that instance as well.
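
Concretely, the layout ends up looking something like this (names are illustrative):

    repo/
      frontend/           # agent A, its own IDE session
      backend/            # agent B, its own IDE session
      libs/
        core-models/      # structured like a third-party package:
          README.md       #   own docs, own public API, clear boundary
          src/
          tests/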


I'm making an effort to support open source projects that I use every day, much in the way I support creators on YouTube via Patreon with small monthly commitments, so it's a welcome opportunity that Ghostty has made that easy to accomplish.


I give a lot of money to the free things I use as well, but even if I used Ghostty I'd struggle to give them any money since the founder is extraordinarily wealthy.

Please fund projects that actually need it, and don't voluntarily gift money to a literal billionaire.

> I get asked the same about terminals all the time. “How will you turn this into a business? What’s the monetization strategy?” The monetization strategy is that my bank account has 3 commas mate.

Original post: https://x.com/mitchellh/status/1964785527741427940


My intention is that the project isn't wholly dependent on me, so that I can move on (one day) and refocus my efforts elsewhere. I think no matter who the donor is, any charity dependent on the welfare of a single large whale is not a healthy organization. I intend to resolve this over time.

That all being said, everyone should give where they want, and if you don't want to give to a terminal emulator non-profit project, then don't! Don't let anyone bully you (me, the person I'm responding to, or anyone else) into what you should and shouldn't charitably support. Enjoy.

(Also, I don't want to repeat this everywhere but I paid taxes and I lost a comma, so no need to worry about that anymore! Everyone please pull out your most microscopic violins!)


> Also, I don't want to repeat this everywhere but I paid taxes and I lost a comma, so no need to worry about that anymore! Everyone please pull out your most microscopic violins!

Well, since we're talking about it, maybe you're down to answer a question I've always wondered about: money into the hundreds of millions, let alone billions, is for me an unfathomable amount of capital for one person to wield. I've always thought that if I ever had that kind of power to swing around, I'd spend it all trying to solve every problem I could get my hands on, until there was nothing left but my retirement fund (which could be 10 million and still let me spend hundreds of millions while retiring in permanent wealthy comfort). Hunger in specific areas, housing crises, underfunded education - across the world there are many issues that, at least locally, one individual with that kind of money could, so far as I can tell, independently resolve.

Why aren't the ultra rich doing it? You seem to have a more philanthropic mind than most; you're doing this cool project and nobody can deny your FOSS contributions. But even you are still holding onto keeping that count in the hundreds rather than the tens - is there some quality-of-life aspect hidden to us that's just really difficult to imagine giving up or something? Yacht life? Private flights? Chumming it up with Gabe and Zuck?

Becoming that wealthy won't happen to me but if it did, what would change about me that'd make me not want to spend it all anymore?


While I understand that people might downvote the parent post because it seems in bad taste and touches on a culturally sensitive thing, haven't we all wondered this? Why is it that the poor give relatively more generously than the rich?

It's such an interesting phenomenon that so many ultra rich people are essentially just hoarding wealth beyond what they could reasonably make use of across multiple generations. Worse, some of them simply cannot seem to get enough and will literally commit crimes and/or do indisputably morally wrong things to get even more.

I would personally never ask anyone this, and I wouldn't expect anyone who could answer it to actually answer it, but I think what komali2 asked is one of the most interesting questions out there.


I think it might be because I'm autistic but can you help me understand why it's in bad taste to ask it? I see YouTube videos of people talking about how they became really wealthy or showing off their houses or cars, and this person was talking about his bank account directly and has mentioned the 3 comma thing before, so I'm a bit confused why it's not ok to ask more about it.

You did mention something I didn't think of which is lifetimes, I guess if someone wanted to guarantee an ultra wealthy lifestyle for all generations of their kids and grandkids forever, that would be a reason to hoard wealth into the hundreds of millions.


> don't voluntarily gift money to a literal billionaire.

The entire point of this post is that the money is not going to him.

