All you have done is contribute a Wikipedia article, which is the second Google result if you search the title of the video. Another user made a comment referencing a textbook they used to learn this material, as well as some extended comments of their own - that actually provides information, unlike a bare Wikipedia link presented with a dismissive attitude.
> Why do you think that the 2024 Putnam programs that they used to test were in the training data?
Putnam solutions can be found in multiple places online: https://kskedlaya.org/putnam-archive/, https://artofproblemsolving.com/community/c3249_putnam. These could have appeared in the training of the base LLM DeepSeek-V3.2-Exp, or as problems in their training set - they do not give further detail on which problems they selected from AoPS, and, as the second link shows, they are there.
> By that logic I can slice open a sphere and call it a sheet
You can do this. If you remove a point (or an arc, or really any small closed disk), you get a space which is the same as the plane. What happens if you remove two distinct points? You end up with a very thick circle. Three points? It starts to get harder to visualize, but you end up with two circles joined at a point. As you remove more points you will get more circles joined together. From a mathematical perspective, these spaces are all very different. If we start to allow gluing arbitrary points of the sphere together it gets even worse, and you can get some pretty wild spaces.
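For concreteness, here is the standard bookkeeping behind that paragraph, written in LaTeX (here $\cong$ means homeomorphic and $\simeq$ means homotopy equivalent - nothing specific to this thread, just textbook facts):

    \begin{align*}
    S^2 \setminus \{p\}                &\cong \mathbb{R}^2        && \text{(one point: stereographic projection onto the plane)} \\
    S^2 \setminus \{p, q\}             &\simeq S^1                && \text{(two points: a cylinder, the ``very thick circle'')} \\
    S^2 \setminus \{p, q, r\}          &\simeq S^1 \vee S^1       && \text{(three points: two circles joined at a point)} \\
    S^2 \setminus \{k \text{ points}\} &\simeq \bigvee^{k-1} S^1  && \text{(in general: a wedge of $k-1$ circles)}
    \end{align*}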
The point of surgery is that by requiring this gluing in of spheres along the boundary of the piece we cut out, the resulting spaces are not as wild - or at least are easier to handle than if we allowed arbitrary cutting and gluing. To give an example, suppose we have some space and we want to determine whether it has property A. The problem is that our space has some property B which makes it difficult to determine property A directly. But by performing surgery in a specific way, we can produce a new space which has property A if and only if the original space did, and which, importantly, no longer has property B.
For the properties A that mathematicians care about, surgery often does a good job of preserving them. In contrast, operations like just cutting and gluing points together without care will typically change property A, so they do not help as much.
> Likewise I wonder why we need to import a sphere rather than just pinch the ends of the tube shut and say it's now a sphere.
I am not an expert on surgery, but I think that from a mathematical perspective, pinching the ends of the tube shut and gluing in a new sphere would be equivalent operations. The pinching would be formalized as a "quotient space", and the resulting quotient is the same space you get from the gluing.
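To spell out the quotient I have in mind (a standard fact, written in LaTeX: the unreduced suspension of a circle is a sphere, which is exactly the "pinch both ends" picture):

    \begin{align*}
    &\text{On the tube } S^1 \times [0,1], \text{ declare } (x,0) \sim (y,0) \text{ and } (x,1) \sim (y,1) \text{ for all } x, y \in S^1 \\
    &\quad \text{(i.e.\ pinch each end to its own point). Then } \bigl(S^1 \times [0,1]\bigr)/{\sim} \;\cong\; S^2 .
    \end{align*}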
> In guideline 1v1 a lot of very high level games are decided by garbage RNG which I think is even less interesting than determining who is 0.1pps faster.
I have played a lot of (moderately high level) 1v1 tetris and I would have to disagree. In fact I often felt that the reverse is true - if I felt I died to garbage hole RNG, really that meant I was getting out pressured and would have lost eventually anyways. And while my playstyle was more aggressive - try to out-speed the opponent - I lost my fair share of games to people playing (much) slower but just incredibly efficiently.
I agree there is an overall disappointing amount of interaction between players, though. Watching your opponent's board and adjusting to it is hard, and it takes a while to build the skill to do it. And a lot of the time you can just get away with playing faster, out-pressuring, and ignoring the other player.
> I have played a lot of (moderately high level) 1v1 tetris and I would have to disagree. In fact I often felt that the reverse is true - if I felt I died to garbage hole RNG, really that meant I was getting out pressured and would have lost eventually anyways.
To be honest, I was never good enough for it to be a big issue, but it does seem apparent to me that it is an issue for the highest level players. I could be wrong, of course, but assuming I'm not, I think this brings up an interesting question: if it only affects you meaningfully once you are that good at the game, does it really matter for the 99.9% of players, including myself, who will certainly never get there? I guess the answer is probably not, but it does have a psychological impact of sorts. It definitely can make tournament outcomes feel less interesting.
So really, random garbage just irks me because it seems like an unnecessary addition of RNG to an otherwise skill-based game. I don't think random garbage is more fun than deterministic garbage schemes. I suppose some people disagree.
The lack of serious interaction and a deep meta game, though... That's a bigger problem, yeah. I am not sure you can fix that while still producing something that you can really call "Tetris".
(And even when Nintendo called "Panel de Pon" "Tetris Attack" outside of Japan, I don't think it wound up having a terribly interesting interaction between players, either, despite being an entirely different game from the ground up! Still pretty fun though.)
Thanks for the correction. I hadn't considered bulk memory operations to be part of SIMD, but it makes sense -- they operate at a larger grain than word size, so they can do the same operation with less micro-op overhead.
> There have been efforts to reprove it with a more easily verified proof, but they've gone nowhere.
My understanding was that the so-called "second generation proof" of the classification of finite simple groups, led by Gorenstein, Lyons, and Solomon, has been progressing slowly but steadily, and that only the quasithin case had a significant (but now fixed) hole. Are there other significant gaps that aren't as well known?
Still, they have a couple more books of proof left, and I have to wonder how carefully it will be reviewed. This will still be a massive improvement, but I'd be a lot happier if the entire proof could be formalized.
Plus there is still the possibility that another significant hole turns up.
If any theorem needs to be formalized, this is the one. No other theorem is this big, this hard to prove, and this important to get right.
I am somewhat surprised issues of scripting and trading even exist in the registration system. Staggering enrollment times over a few days, with new waves every 20 minutes or so, mostly solves the scripting problem, since at any moment you are only competing with a fraction of the student body. Giving courses waitlists once they are full, instead of allowing people to register directly the moment a spot frees up, makes trading impossible: a freed spot goes to the next person on the waitlist, and if that person is the would-be buyer, they would have gotten the seat anyway without the trade.
I understand that the registration system is probably old and tied up in tons of equally old management software, but if the university really cared, the solutions are there.
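To sketch what the waitlist half of that could look like - purely illustrative Python, with class and method names I made up rather than any real registration system's API:

    from collections import deque


    class Course:
        """Toy model of a course with a fixed capacity and a first-come waitlist."""

        def __init__(self, capacity: int):
            self.capacity = capacity
            self.enrolled: set[str] = set()
            self.waitlist: deque[str] = deque()

        def register(self, student: str) -> str:
            # If the course is full, the student joins the waitlist instead of
            # being told to poll for a freed seat.
            if len(self.enrolled) < self.capacity:
                self.enrolled.add(student)
                return "enrolled"
            self.waitlist.append(student)
            return "waitlisted"

        def drop(self, student: str) -> None:
            self.enrolled.discard(student)
            # A freed seat goes to the head of the waitlist, not to whoever
            # scripts the fastest refresh - which is what kills spot trading.
            while self.waitlist and len(self.enrolled) < self.capacity:
                self.enrolled.add(self.waitlist.popleft())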
When I went there it was staggered, which causes this desire for spot trading - seniors register first, so if you are a freshman/sophomore/junior, you beg a senior to register for a spot in a class you want, then coordinate with them to deregister just before you register for the class. This automated that, and at scale it could be a big issue.
The school I attended in 2010 had a system like this as well. Awful backend with a simple, but still awful, web interface to talk to it. There was a rumor you could telnet in and use an actual text interface, but I never saw it.
The system was replaced a few years later with an Oracle PeopleSoft implementation. Everybody hated it more.
It's much easier to build a gateway API for a legacy system than to extend it. Not disagreeing with you. Honestly, though, software systems for academic institutions are ridiculously complicated, because they are essentially a student portal, a school, a sales organization, a rules engine, etc. etc. all wrapped up into one and interconnected in ways that aren't obvious on the surface.
You can just drop the course - pretty much every university (in the United States, at least) allows students to drop courses one or two weeks into the semester without any record (on, say, a transcript). Otherwise students could not possibly plan their semesters, since courses may not make material available until after the semester actually starts.
So if you are planning to sell the slot and it does not work out, you just drop the course - no harm to you.
"No harm" except for the several slots that were taken up by people hoping to make a quick buck (selling the trade) who drop a day or two into classes and cause the students who actually NEED the class to be very stressed and annoyed and possibly have to adjust their plans because they don't think they will be able to take a class even though it will actually be free by the third time the class meets.
It also just fucks with the university's ability to gauge class interest. At my university, if a class filled up early in the registration window, the university would try to increase capacity or add another section of the class, but that's not always an option.
A reminder that this is not done for technical reasons. Plenty of colleges across the US, big and small, whether their registration software is custom-built or purchased on the open market, have fully functional waitlist features. "First come, first served" is a policy choice.
But how does the student, or in your case the LLM, know that it actually has the solution? For students, this is done by a grader grading the homework, asking the professor at office hours, or working on problems with peers who cross-check as you go. I see no reason why this LLM-produced synthetic data, without that correction factor, would not devolve into a mess of incorrect, maybe even not-even-wrong style "proofs". And then how can training on it yield anything?
When you get good enough at mathematics, you can tell if your proofs are correct or not without asking a TA to grade them for you. Most mathematicians reach this level before they finish undergrad (a rare few reach it before they finish high school). While AI hasn't reached this level yet, there is no fundamental barrier stopping it from happening - and for now, researchers can use formal proof-checking software like Lean, Coq, or Isabelle to act as a grader.
(In principle, it should also be possible to get good enough at philosophy to avoid devolving into a mess of incoherence while reasoning about concepts like "knowledge", "consciousness", and "morality". I suspect some humans have achieved that, but it seems rather difficult to tell...)
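As a toy illustration of the "proof checker as grader" point - the theorem below is just a placeholder, not anything from this discussion - Lean only accepts the file if the proof actually establishes the stated claim:

    -- Lean 4: the checker rejects this file if the proof is wrong,
    -- which is exactly the automated-grader role described above.
    theorem toy_grading_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b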
> When you get good enough at mathematics, you can tell if your proofs are correct or not without asking a TA to grade them for you.
This is simply not true - you can get a very good sense of when your argument is correct, yes. But having graded for (graduate, even!) courses, I can say that even advanced students make mistakes. It's not limited to students, either; tons of textbooks have significant errata, and it's not as if no retraction in math has ever been issued.
These get corrected by talking with other people - if you have an LLM spew out this synthetic chain-of-reasoning data, you probably get at least some wrong proofs, and if you blindly try to scale with this I would expect it to collapse.
Even tying into a proof checker seems non-trivial to me. If you work purely in the proof checker, you never say anything wrong - but the presentation in a proof-checking language is very different from a textual one, so I would anticipate issues with the LLM leveraging knowledge from, say, textbooks in its proofs. You might also run into issues of the AI playing a game against the compiler rather than building understanding (you see elements of this in the proofs produced by AlphaProof). And if you start mixing natural language and proof checkers, you've just kicked the verification can up the road a bit, since you need some way of ensuring the natural language actually matches the statements being shown by the proof checker.
I don't think these are insurmountable challenges, but I also don't think it's as simple as the "generate synthetic data and scale harder" approach the parent comment suggests. Perhaps I'm wrong - time will tell.
The error rate of human mathematical work is not zero, but it does go down exponentially with the amount of time that the mathematician spends carefully thinking about the problem. Mistakes tend to be the result of typos, time pressure, or laziness - showing your work to others and having them check it over does help (it's one of the reasons we have peer review!), but is absolutely not necessary.
If the error rate is low enough - and by simply spending a constant factor more time finding and correcting errors, you can get it below one in a million - then you do get a virtuous feedback loop even without tying in a proof checker. That's how humans have progressed, after all. While you are right to say that the proof-checker approach is certainly not trivial, it is currently much easier than you would expect - modern LLMs are surprisingly good at converting math written in English directly into math formalized in Lean.
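For a flavor of what that English-to-Lean conversion looks like (assuming Mathlib is available; the theorem name here is mine, but the underlying lemma Nat.exists_infinite_primes is real):

    import Mathlib

    -- English: "There are infinitely many primes."
    -- One standard Lean 4 formalization: for every n there is a prime p ≥ n.
    theorem english_statement_formalized (n : ℕ) : ∃ p, n ≤ p ∧ p.Prime :=
      Nat.exists_infinite_primes n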
I do think that LLMs will struggle to learn to catch their mistakes for a while. This is mostly because the art of catching mistakes on your own is not taught well (often it is not taught at all), and the data sets that modern LLMs train on probably contain very, very few examples of people applying this art.
A tangent: how do human mathematicians reliably manage to catch mistakes in proofs? Going line-by-line through a proof and checking that each line logically follows from the previous lines is what many people believe we do, but it is actually a method of last resort - something we only do if we are completely lost and have given up on concretely understanding what is going on. What we really do is build up a collection of concrete examples and counterexamples within a given subject, and watch how the steps of the proof play out in each of these test cases. This is why humans tend to become much less reliable at catching mistakes when they leave their field of expertise - they haven't yet built up the necessary library of examples to allow them to properly interact with the proofs, and must resort to reading line by line.
> All of which effort and edifice would collapse into the dumpster
Except it wouldn't, because the work towards BSD would still be correct and applicable to other problems. If someone proved the Riemann hypothesis false, all of our math (and there is a lot of it) surrounding the problem would not immediately be made worthless. The same is true for any mathematical conjecture.
I don't doubt the rest of your comment might have played a role, however.