LLMs can use search engines as a tool. One possibility is that Google embeds the search query using these embeddings, does retrieval against them, and then pastes the retrieved result into the model's chain of thought (which, unless they have an external memory module in their model, is basically the model's only working memory).
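A minimal sketch of that possibility, with a stubbed-out search tool and a callable standing in for the model (all names here are hypothetical; this is not Google's actual API):

```python
def search_tool(query: str) -> str:
    # Imagine this does embedding-based retrieval server-side and returns plain text.
    return "retrieved snippet relevant to: " + query

def answer_with_search(llm, question: str) -> str:
    snippet = search_tool(question)
    # The retrieved text is pasted straight into the prompt; the context window
    # is the model's only working memory, so this is how the result "enters" it.
    prompt = f"Search result:\n{snippet}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)

# Usage with a stand-in "model":
print(answer_with_search(lambda p: "(model sees) " + p, "What did the court decide?"))
```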
I'm reading the docs and it does not appear Google keeps these embeddings at all. I send some text to them, they return the embedding for that text at the size I specified.
So the flow is something like this (see the code sketch after the list):
1. Have a text doc (or library of docs)
2. Chunk it into small pieces
3. Send each chunk to <provider> and get an embedding vector of some size back
4. Use the embedding to:
4a. Semantic search / RAG: put the embeddings in a vector DB and do similarity search over them; the ultimate output is the source chunk
4b. Run a clustering algorithm on the embeddings to generate some kind of graph representation of my data
4c. Run a classification algorithm on the embeddings so I can classify new data
5. Crucially, the output of every step in 4 is text
6. Send that text to an LLM
At no point is the embedding directly in the model's memory.
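To make that concrete, here is a minimal end-to-end sketch of the flow in Python. Everything in it is a stand-in: embed() fakes whatever embedding endpoint you would actually call, the chunks and labels are toy data, and the final LLM call is left as a comment. The point is that the embeddings never leave this pipeline; only text reaches the model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Step 3 stand-in: a fake embedder so the sketch runs offline. In practice you
# would call the provider's embedding endpoint here and get back a vector of
# the size you specified.
def embed(text: str, dim: int = 128) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# Steps 1-2: a "library" of docs, chunked into small pieces (toy data).
chunks = [
    "Contracts must be signed by both parties.",
    "The court summarized the case in three pages.",
    "An IPO filing lists risk factors and financials.",
    "Embedding vectors place similar text near each other.",
]
vectors = np.stack([embed(c) for c in chunks])   # step 3: one vector per chunk

# 4a. Semantic search / RAG: similarity search over the vectors, but what you
#     ultimately return is the source chunk text.
def retrieve(query: str, k: int = 2) -> list[str]:
    scores = vectors @ embed(query)              # cosine similarity (unit-norm vectors)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# 4b. Clustering the embeddings to get a coarse structure over the data.
cluster_ids = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)

# 4c. A classifier trained on the embeddings to label new data (toy labels).
labels = [0, 0, 0, 1]
clf = LogisticRegression().fit(vectors, labels)
new_label = clf.predict([embed("The judge issued a summary ruling.")])[0]

# Steps 5-6: everything handed to the LLM is plain text.
context = "\n".join(retrieve("How are cases summarized?"))
prompt = f"Context:\n{context}\n\nQuestion: How are cases summarized?"
# llm.generate(prompt)  # the embeddings themselves never enter the model
```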
The most straightforward approach would be to ask the model to generate different evaluation metrics (which they already seem to do) and use each one as a dimension.
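A toy sketch of that idea (all names hypothetical, and both functions are stand-ins for real LLM calls): ask the model for metrics, score the item on each one, and treat the scores as coordinates, one dimension per metric.

```python
import numpy as np

def ask_model(prompt: str) -> str:
    # Stand-in for a real LLM call that proposes evaluation metrics.
    return "clarity, factual accuracy, originality"

def score(item: str, metric: str) -> float:
    # Stand-in: in practice you'd ask the model to rate `item` on `metric`, e.g. 0-10.
    return float((len(item) + len(metric)) % 10)

metrics = [m.strip() for m in ask_model("List evaluation metrics for this answer.").split(",")]
item = "Some answer to be evaluated."
vector = np.array([score(item, m) for m in metrics])  # one dimension per metric
print(dict(zip(metrics, vector)))
```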
I think the idea for this is anything that can be set in a literal exam for humans. So anything that would take the best human in the world on that topic more than, say, an hour to complete is out.
Also, IIRC 42% of the questions are math-related, not memorization of knowledge.
Yes, I doubt any one human could score more than about three points. But it's certainly a worthy illustration of an AI safety exam thought experiment, in the sense of: "if you are developing an AI that may be capable of passing this exam, how confident will you need to be of its alignment, and how will you obtain that confidence?"
PS: It's probably doable by a program capable of all of the above, but perhaps another useful question is: "9. Secure your compute infrastructure and power supply against a nation-state-level adversary interested in switching you off, or else secure enough influence over them to keep you powered on."
In the real world, judges I know are using it to do case summaries that used to take weeks, Goldman is using it to do 95% of the work on IPO filings, and I personally am using O1 pro to write a ton of code.
AI's biggest use cases are doing actual work, not necessarily replacing regular interactions with your mobile or entertainment devices.
Very plausible, but that would also be noteworthy. As I've mentioned in some other comments here, those of us outside DeepMind don't (as far as I know) know anything about the computing power required to run AlphaProof, and the tradeoff between compute required and the complexity of problems it can address is really key to understanding how useful it might be.