Hacker News | sethkim's comments

An under-discussed superpower of LLMs is open-set labeling, which I sort of consider to be inverse classification. Instead of using a static set of predetermined labels, you're using the LLM to find the semantic clusters within a corpus of unstructured data. It feels like "data mining" in the truest sense.
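
A minimal sketch of the idea, assuming an OpenAI-style chat completions client and a hypothetical `corpus` list of documents (the model name and prompt are placeholders):

    import json
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()

    def open_set_label(text):
        # No fixed taxonomy: the model proposes its own labels.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any capable model works
            messages=[{
                "role": "user",
                "content": "Return 1-3 short topical labels for the text "
                           "below, as a JSON array of strings.\n\n" + text,
            }],
        )
        # In practice you'd constrain the output format so this parse can't fail.
        return json.loads(resp.choices[0].message.content)

    # Counting label frequencies across the corpus surfaces the
    # semantic clusters the model "discovered" on its own.
    labels = Counter(l for doc in corpus for l in open_set_label(doc))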


OP here. This is exactly right! You perfectly encapsulated the idea I stumbled upon so beautifully.


The problem is these don't bin properly.


The models you called out at the beginning were all released this year. What do you think is the difference between this generation of models and previous ones?


Yes! Both Llama 3 and Gemma 3 have 128k context windows.


Llama 3 had an 8192-token context window. Llama 3.1 increased it to 131072.


Yes, we're a startup! And LLM inference is a major component of what we do - more importantly, we're working on making these models accessible as analytical processing tools, so we have a strong focus on making them cost-effective at scale.


I see your prices page lists the average cost per million tokens. Is that because you are using the formula you describe, which depends on hardware time and throughput?

> API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
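
As a rough worked example of that formula, with hypothetical numbers:

    hourly_hardware_cost = 2.00   # hypothetical $/hr for one GPU
    throughput_tps = 2_500        # hypothetical sustained tokens/sec
    tokens_per_hour = throughput_tps * 3600          # 9,000,000 tokens/hr
    cost_per_token = hourly_hardware_cost / tokens_per_hour
    print(cost_per_token * 1_000_000)                # ~$0.22/M tokens, before margin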


My two cents here is the classic answer - it depends. If you need general "reasoning" capabilities, I see this being a strong possibility. If you need specific, factual information baked into the weights themselves, you'll need something large enough to store that data.

I think the best of both worlds is a sufficiently capable reasoning model with access to external tools and data that can perform CPU-based lookups for information that it doesn't possess.
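
As a minimal sketch of that pattern (hypothetical `facts.db` store; any keyword or vector search would do), the CPU-side lookup the model calls out to can be very simple:

    import sqlite3

    def lookup_fact(query):
        # Cheap CPU-side retrieval; the model only has to decide
        # *what* to look up, not memorize the answer in its weights.
        conn = sqlite3.connect("facts.db")  # hypothetical local store
        row = conn.execute(
            "SELECT answer FROM facts WHERE question LIKE ?",
            (f"%{query}%",),
        ).fetchone()
        conn.close()
        return row[0] if row else "no match"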


Both great points, but more or less speak to the same root cause - customer usage patterns are becoming more of a driver for pricing than underlying technology improvements. If so, we likely have hit a "soft" floor for now on pricing. Do you not see it this way?


Even given how much prices have decreased over the past 3 years, I think there's still room for them to keep going down. I expect there remain a whole lot of optimizations that have not yet been discovered, in both software and hardware.

That 80% price drop for o3 was only a few weeks ago!


No doubt prices will continue to drop! We just don't think the drops will be anything like the orders-of-magnitude YoY improvements we're used to seeing. Consequently, developers shouldn't expect the cost of building and scaling AI applications to get anywhere close to "free" in the near future, as many seem to expect.


I do not see it this way. Google is a publicly traded company responsible for creating value for their shareholders. When they became dicks about ad blockers on youtube last year or so, was it because they hit a bandwidth Moore's law? No. It was a money grab.

ChatGPT is simply what Google should've been 5-7 years ago, but Google was more interested in presenting me with ads to click on instead of helping me find what I was looking for. ChatGPT is at least 50% of my searches now. And they're losing revenue because of that.


I run a batch inference/LLM data processing service and we do a lot of work around cost and performance profiling of (open-weight) models.

One odd disconnect that still exists in LLM pricing is that providers charge linearly with respect to token consumption, while the underlying compute cost grows quadratically with sequence length.
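
As a back-of-the-envelope illustration (hypothetical model dimensions), attention FLOPs grow with the square of sequence length, so per-token compute rises with context even though billing stays flat:

    def attention_flops(n, d=4096, layers=32):
        # rough O(layers * n^2 * d) for the attention score + mix terms
        return layers * 2 * n * n * d

    for n in (1_000, 10_000, 100_000):
        print(n, attention_flops(n) / n)  # per-token cost grows ~linearly in n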

At this point, since a lot of models have converged on the same architecture, inference algorithms, and hardware, the prices providers choose likely come from a historical, statistical analysis of the shape of customer requests. In other words, I'm not surprised to see prices increase as providers gather more data about real-world consumption patterns.


Aren't advances in KV caching making compute cost not quite quadratic?


Sutro.sh (fka Skysight) | Infrastructure/LLMs & Research Engineering | SF Bay Area | Full-time

We are building batch inference infrastructure and a great user/developer experience around it. We believe LLMs have not yet been meaningfully unlocked as data processing tools - we're changing that.

Our work involves interesting distributed systems and LLM research problems, newly-imagined user experiences, and a meaningful focus on mission and values.

Open Roles:

Infrastructure/LLM Engineer — https://jobs.skysight.inc/Member-of-Technical-Staff-Infrastr...

Research Engineer — https://jobs.skysight.inc/Member-of-Technical-Staff-Research...

If you're interested in applying, please send an email to jobs@sutro.sh with a resume/LinkedIn Profile. For extra priority, please include [HN] in the subject line.


Skysight | Infrastructure/LLMs & Research Engineering | SF Bay Area | Full-time

We are building large-scale batch inference infrastructure and a great user/developer experience around it. We believe LLMs have not yet been meaningfully unlocked as data processing tools - we're changing that.

Our work involves interesting distributed systems and LLM research problems, newly-imagined user experiences, and a meaningful focus on mission and values.

Open Roles:

Infrastructure/LLM Engineer — https://jobs.skysight.inc/Member-of-Technical-Staff-Infrastr...

Research Engineer — https://jobs.skysight.inc/Member-of-Technical-Staff-Research...

If you're interested in applying, please send an email to jobs@skysight.inc with a resume/LinkedIn Profile. For extra priority, please include [HN] in the subject line.


How "huge" are these datasets? Did you build your own tooling to accomplish this?

