The article talks about all of this and references the DeepSeek R1 paper [0], section 4.2 (the first bullet point, on PRMs), on why this is much trickier to do than it appears.
A large number of breakthroughs in AI come from turning unsupervised learning into supervised learning (AlphaZero-style MCTS as a policy improver is like this too), so the confusion is kind of intrinsic.
It's interesting to also compare this to getting a bare metal instance and provisioning microVMs on it using Firecracker. (Obviously something you shouldn't roll yourself in most cases.)
You can get a bare metal AX162 from Hetzner for 200 EUR/mo, with 48 cores and 256 GB of RAM. At 4:1 virtual:physical CPU oversubscription you could run 192 single-vCPU guests on such a machine, yielding a cost of 200/192 ≈ 1.04 EUR/mo and giving each guest a bit over 1 GiB of RAM. Interestingly, that's not groundbreakingly cheaper than just getting one of Hetzner's virtual machines!
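If you want to poke at the numbers, here's the back-of-the-envelope math as a tiny sketch; one vCPU per guest and evenly split RAM are my assumptions, not anything Hetzner specifies:

```python
# Back-of-the-envelope cost per microVM guest on one bare-metal box.
# Assumptions: specs as quoted above, one vCPU per guest, RAM split evenly.
monthly_cost_eur = 200
physical_cores = 48
ram_gib = 256
cpu_oversub = 4                               # 4:1 virtual:physical

guests = physical_cores * cpu_oversub         # 192 single-vCPU guests
cost_per_guest = monthly_cost_eur / guests    # ~1.04 EUR/mo
ram_per_guest = ram_gib / guests              # ~1.33 GiB

print(f"{guests} guests at {cost_per_guest:.2f} EUR/mo, {ram_per_guest:.2f} GiB each")
```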
"Interestingly, that's not groundbreakingly cheaper than just getting one of Hetzner's virtual machines!" .... yea.. cause this is what these companies are doing behind the scenes :)
Yeah that's fair (although the original comment was only talking about energy costs).
But this is kind of a worst-case cost analysis. I fully expect the average non-Pro Sora 2 video to use one to two orders of magnitude less GPU time than I listed here, because those video tokens are probably generated at a batch size of roughly 100.
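To make the batching point concrete, here's a toy sketch; every number in it (GPU-hour price, wall-clock time, GPU count) is invented purely for illustration, and it ignores that a larger batch runs somewhat slower per step:

```python
# Toy illustration: batching spreads the same GPU time across many videos.
# All numbers below are made up, not measurements of Sora 2.
gpu_hour_cost_usd = 3.0      # assumed price of one GPU-hour
wall_clock_min    = 5        # assumed time the GPUs spend on one generation pass
gpus_in_use       = 8        # assumed GPUs working on that pass

for batch_size in (1, 100):
    gpu_hours = gpus_in_use * wall_clock_min / 60
    cost_per_video = gpu_hours * gpu_hour_cost_usd / batch_size
    print(f"batch {batch_size:4d}: ~{cost_per_video:.3f} USD of GPU time per video")
```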
Well, this was a trip down memory lane. I built a small game on Irrlicht at the time and I remember these discussions too.
Irrlicht had its own editor (irrEdit) and sound system (irrKlang), and basic collision detection and an FPS controller were built right into the engine. That was enough to get you a considerable way toward a fully featured tech demo, at the very least. (I even remember Irrlicht shipping a beautiful first-person tech demo of traversing a large BSP-partitioned castle level.)
However, for those not afraid to stitch in these additional parts from other promising libraries (or derive them from first principles, as was fashionable), OGRE offered more raw rendering prowess: a working deferred shading system (this was the heyday of deferred shading), a pop-less terrain implementation with texture splatting, and more impressive shader and rendering-pipeline support via the Cg multi-platform shading language. I remember fairly impressive ocean-surface and Fresnel refraction/reflection demos from OGRE at the time.
What an astounding achievement. In 6 years, this person has written not only a very well-designed microkernel, but a build system, UEFI bootloader, graphical shell, UI framework, and a browser engine.
The story of 10x developers among us is not a myth... if anything, it's understated.
Are you saying that you think Sonnet 4 has 100B-200B _active_ params? And that Opus has 2T active? What data are you basing these outlandish assumptions on?
Oh, nothing official. There are people who estimate the sizes based on tok/s, cost, benchmarks, etc. The one most people go by is https://lifearchitect.substack.com/p/the-memo-special-editio.... He estimated Claude 3 Opus to be a 2T-param model (given the pricing + speed), and puts Opus 4 at 1.2T params (though then I don't understand why the price remained the same). Sonnet is estimated by various people to be around 100B-200B params.
tok/s cannot by itself be used to estimate parameter count; it's a tradeoff made at inference time. You can adjust your batch size to serve one user at a huge tok/s or many users at a slow tok/s.
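Here's a toy roofline sketch of why; the hardware and model numbers are rough assumptions, purely to show the shape of the tradeoff:

```python
# Toy roofline for transformer decoding: each step is the slower of
# (a) streaming the active weights from HBM once, shared by the whole batch, and
# (b) doing the matmul FLOPs for everyone in the batch.
# Hardware/model numbers below are ballpark assumptions, not measurements.
hbm_bandwidth = 3.35e12      # bytes/s (roughly one H100 SXM)
peak_flops    = 1.0e15       # FLOP/s (rough 8-bit ballpark)
active_params = 100e9        # assumed active parameter count
bytes_per_w   = 1            # 8-bit weights

weight_time   = active_params * bytes_per_w / hbm_bandwidth
flops_per_tok = 2 * active_params

for batch in (1, 32, 256, 2048):
    step_time = max(weight_time, batch * flops_per_tok / peak_flops)
    print(f"batch {batch:5d}: ~{1/step_time:5.0f} tok/s per user, "
          f"~{batch/step_time:7.0f} tok/s aggregate")
```

The same observed per-user tok/s is consistent with many different (model size, batch size) combinations, which is why speed alone doesn't identify the parameter count.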
- The compute requirements would be massive compared to the rest of the industry
- Not a single large open source lab has trained anything over 32B dense in the recent past
- There is considerable crosstalk between researchers at large labs; notice how all of them seem to be going in similar directions all the time. If dense models of this size actually provided benefit compared to MoE, the info would've spread like wildfire.
Seems heavily vibe-coded, down to the Claude-generated README and a lot of the LLM prompts themselves (which, in my experience, work very poorly compared to human-written prompts). While none of this is necessarily bad, it raises the burden of proof that the tool actually works beyond toy problems [0]. I think everyone would appreciate some examples of vulnerabilities it can find. The missing JWT check showcased in the screenshot would probably have been caught by ordinary AI code review, so to my eye that by itself is not persuasive.
Good luck!
[0]: Why I say this: a 10kLOC piece of software that was mostly human-written would require a large amount of testing, even manual testing, to work reliably at all. All that testing and experimentation naturally forces a certain depth of exploration of the approach, the LLM prompts, etc. across a variety of use cases. A mostly AI-written codebase of this size would have required much less testing to reach "doesn't crash and runs reliably", so that depth is no longer a given.
Thanks for sharing this! It's difficult to find good examples of useful codebases where coding agents have done most of the work. I'm always actively looking at how I can push these agents to do more for me and it's very instructive to hear from somebody who has had success on this level. (Would be nice to read a writeup, too)
It's coming soon! This experiment has really taught me a lot about the limits of agentic code assistants: the stuff they're good at, they're insanely good at, and the stuff they're horrible at, they can't seem to overcome. I did write a little bit about how I use Claude Code [1] before I started this project a while back, and I'm planning to finish a sequel pretty soon.
[0]: https://arxiv.org/abs/2501.12948