a_wild_dandan's comments | Hacker News

> Unlike the previous GPT-5.1 model, GPT-5.2 has new features for managing what the model "knows" and "remembers to improve accuracy.

Dumb nit, but why not put your own press release through your model to prevent basic things like missing quote marks? Reminds me of that time OAI released wildly inaccurate copy/pasted bar charts.


It does seem to raise fair questions about either the utility of these tools or adoption inertia. If not even OpenAI feels compelled to integrate this kind of model check into its pipeline, what does that say about the business world at large? Is it too onerous to set up? Too hard to get only true-positive corrections? Too low-value for the effort?
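
For the curious, here's a minimal sketch of what such a pre-publish check could look like, assuming the standard OpenAI Python SDK; the model name, prompt, and file path are all illustrative, not anyone's actual pipeline:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def proofread(text: str) -> str:
        # Ask the model to flag errors, not fix them, so a human
        # reviews every catch and false positives don't ship.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            temperature=0,
            messages=[
                {"role": "system", "content": (
                    "You are a copy editor. List only objective errors: "
                    "unbalanced quote marks, typos, broken punctuation. "
                    "If there are none, reply with exactly PASS.")},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content

    # Gate the publish step on the check (hypothetical file name).
    report = proofread(open("press_release.txt").read())
    if report.strip() != "PASS":
        raise SystemExit("Copy check failed:\n" + report)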

> what does that say about the business world at large?

Nothing. OpenAI is a terrible baseline to extrapolate anything from.


I always remember this old image: https://i.imgur.com/MCsOM8e.jpeg

Their model doesn't handle punctuation, quote marks, and similar things very well at all.

It may have been used; how could we know?

Mainly, I don't get why there are quote marks at all.


Humans are now expected to parse sloppy typing without complaining about it, just like LLMs do. Slop is the new normal.

Maybe they did

Businesses do whatever’s cheap. AI labs will continue making their models smarter, more persuasive. Maybe the SWE profession will thrive/transform/get massacred. We don’t know.

No. I like being able to ignore them. I can’t do that if people chop off their disclaimers to avoid comment removal.

Someone will make a killing on a rechargeable version of this. The ergonomics are a good idea.

If the claims in the abstract are true, then this is legitimately revolutionary. I don't believe it. There are probably some major constraints/caveats that keep these results from generalizing. I'll read through the paper carefully this time instead of skimming it, and come back with thoughts once I've digested it.

What's not to believe? Qwerky-32b has already done something similar: a finetune of QwQ-32b that doesn't use a traditional attention architecture.

And hybrid models aren't new; an MLA-based hybrid model is basically just DeepSeek V3.2 in a nutshell. Note that DeepSeek V3.2 (and V3.1, R1, and V3... and V2, actually) all use MLA. DeepSeek V3.2 is what adds the linear attention stuff.

Actually, since DeepSeek V3.1 and DeepSeek V3.2 are just post-training on top of the original DeepSeek V3 pretrain run, I'd say this paper is basically doing exactly what DeepSeek V3.2 did in terms of efficiency.


DeepSeek-V3.2 is a sparse attention architecture, while Zebra-Llama is a hybrid attention/SSM architecture. The outcome might be similar in some ways (close to linear complexity) but I think they are otherwise quite different.
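
For readers who haven't seen one, here's a toy sketch of the hybrid idea being discussed: keep full (quadratic) attention in only some layers and use a linear-time SSM-style mixer everywhere else. The block design and the 1-in-4 ratio below are made up for illustration; this is not Zebra-Llama's (or DeepSeek's) actual architecture.

    import torch
    import torch.nn as nn

    class ToySSMBlock(nn.Module):
        """Linear-time sequence mixer: a gated running mean standing in
        for a real SSM (Mamba etc.). Cost is O(n) in sequence length."""
        def __init__(self, d):
            super().__init__()
            self.in_proj = nn.Linear(d, d)
            self.gate = nn.Linear(d, d)
            self.out_proj = nn.Linear(d, d)

        def forward(self, x):  # x: (batch, seq, d)
            h = torch.cumsum(self.in_proj(x), dim=1)  # crude causal recurrence
            h = h / torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
            return self.out_proj(h * torch.sigmoid(self.gate(x)))

    class AttnBlock(nn.Module):
        """Full self-attention: O(n^2) in sequence length."""
        def __init__(self, d, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

        def forward(self, x):
            out, _ = self.attn(x, x, x)
            return out

    class HybridStack(nn.Module):
        """Keep full attention in every k-th layer; SSM blocks elsewhere."""
        def __init__(self, d=512, layers=12, attn_every=4):
            super().__init__()
            self.blocks = nn.ModuleList(
                AttnBlock(d) if i % attn_every == 0 else ToySSMBlock(d)
                for i in range(layers))
            self.norms = nn.ModuleList(nn.LayerNorm(d) for _ in range(layers))

        def forward(self, x):
            for norm, block in zip(self.norms, self.blocks):
                x = x + block(norm(x))  # pre-norm residual connection
            return x

    x = torch.randn(2, 128, 512)
    print(HybridStack()(x).shape)  # torch.Size([2, 128, 512])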

His specific thesis is that pods fundamentally clean worse than powder because they're inherently single-stage releases of detergent in machines designed for two-stage releases. Despite this, he still explicitly says that pods have their uses. So I'm unclear on how his goal is "proving that everyone is wrong." Did we watch different videos?


I think the main advantage of pods is accessibility.


Is there a list of machines that are designed for two-stage releases?


Even if it doesn't have a specific prewash section, you can literally just toss a bit of powder into the machine, since the prewash happens first.


More interesting would be a list of machines not designed for two-stage release. They probably exist, but it'll be a much smaller list.


All of them


Out of the 5 machines I've used at different apartments, none had a separate pre-wash dispenser. I've saved the manual for my current one, and it says nothing about adding extra detergent in with the dishes. And all of them washed just fine with powder, without any additional mumbo-jumbo.


I have a dishwasher that is loaded with a cartridge holding 400g of powder: an ideal scenario for dispensing detergent at will. Yet, no matter what cycle I use, it dispenses only during the main wash cycle.

I've also had machines from 5 different manufacturers in the past. None of them had mechanisms that facilitate two releases, or pre-wash compartments.


> I've also had machines from 5 different manufacturers in the past. None of them had mechanisms that facilitate two releases, or pre-wash compartments.

Did you check the manual? I think in a previous video he mentioned that for machines like that, the manual says to add prewash powder directly into the machine.


They all washed dishes just fine without any prewash powder added. Somebody "here" even quoted a Bosch manual saying there's no need for prewash powder. Most of the time I use a cycle that doesn't even have a prewash.


How does having management strategies over an alleged addiction imply that it isn’t an addiction?


I take it you are unfamiliar with the “do not get addicted to water” speech in Mad Max.


Intelligence is whatever an LLM can’t do yet. Fluid intelligence is the capacity to quickly move goal posts.


I'm not sure I understand your statement. Are you implying that once an LLM can do something, "it" is not intelligent anymore? ("it" being the model, the capability, or both?)


I would bet that it's far lower now. Inference is expensive, but we've made extraordinary efficiency gains through techniques like distillation. That said, GPT-5 is a reasoning model, and those are notorious for high token burn. So who knows, it could be a wash. But the selective pressure to optimize for scale/growth/revenue/independence from MSFT/etc. makes me think that OpenAI is chasing those watt-hours pretty doggedly. So 0.34 is probably high...

...but then Sora came out.
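
To make the "could be a wash" intuition concrete (treating 0.34 as watt-hours per query): energy per query is roughly tokens generated times joules per token, so per-token efficiency gains and reasoning-token burn pull in opposite directions. Every number below is an illustrative guess, not a measurement.

    # Back-of-envelope: Wh per query ~ tokens generated x joules per token.
    JOULES_PER_WH = 3600

    def wh_per_query(tokens, joules_per_token):
        return tokens * joules_per_token / JOULES_PER_WH

    # A distilled non-reasoning model: cheap per token, short answers.
    print(wh_per_query(tokens=500, joules_per_token=0.5))   # ~0.07 Wh

    # A reasoning model with the same per-token efficiency: 10x the
    # tokens from long chains of thought wipes out the per-token gains.
    print(wh_per_query(tokens=5000, joules_per_token=0.5))  # ~0.69 Wh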


Yeah, a couple of things we can be fairly confident about:

a) training is where the bulk of an AI system's energy usage goes (based on a report released by Mistral)

b) video generation is very likely a few orders of magnitude more expensive than text generation.

That said, I still believe that data centres in general - including AI ones - don't consume a significant amount of energy compared with everything else we do, especially heating and cooling and transport.

Pre-LLM data centres consume about 1% of the world's electricity. AI data centres may bump that up to 2%.
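
As a sanity check on those percentages, with round numbers (world electricity generation is roughly 30,000 TWh/yr; that total is an approximation):

    # Rough scale check; every figure is a round-number approximation.
    world_twh_per_year = 30_000
    print(world_twh_per_year * 0.01)  # ~300 TWh: pre-LLM data-centre ballpark
    print(world_twh_per_year * 0.02)  # ~600 TWh: the "AI doubles it" scenario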


You mean this Mistral report? https://mistral.ai/news/our-contribution-to-a-global-environ...

I don't think it shows that training uses more energy than inference over the lifetime of the model - they don't appear to share that ratio.


> don't consume a significant amount of energy compared with everything else we do, especially heating and cooling and transport

Ok, but heating and cooling are largely not negotiable. We need those technologies to make places liveable.

LLMs are not remotely as crucial to our lives


You gotta start thinking about the energy used to mine and refine the raw materials used to make the chips and GPUs. Then take into account the infrastructure and data centers.

The amount of energy is insane.


And yet it's still tiny in relation to transportation's energy requirements, and transportation itself is still mostly stuck on fossil fuels.

At the end of the day, green energy is a perfect fit for AI workloads.


This might be a dumb question but like...why does it matter? Are other companies reporting training run costs including amortized equipment/labor/research/etc expenditures? If so, then I get it. DeepSeek is inviting an apples-and-oranges comparison. If not, then these gotcha articles feel like pointless "well ackshually" criticisms. Akin to complaining about the cost of a fishing trip because the captain didn't include the price of their boat.

