Hacker News

> Isn't your input your confidence that GPT-4 gives the correct answer

You may be right that that’s the intent; however, what’s the point (other than collecting data about user confidences)? If I enter 0.3 and GPT provides a correct answer, that doesn’t mean the 0.3 was somehow wrong.



In that case 0.3 would be more wrong than 0.4 and less wrong than 0.2. The closer your predictions are to reality over a bunch of questions, the better you understand reality.
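One way to make "more wrong than" precise is a proper scoring rule such as the Brier score. A minimal sketch (the function name is illustrative, not part of the quiz):

```python
# Brier score: squared error between the stated confidence and the
# outcome (1 if GPT-4 answered correctly, 0 if not). Lower is better.
def brier(confidence, correct):
    return (confidence - (1 if correct else 0)) ** 2

# If the answer turns out correct, 0.4 scores better than 0.3,
# which scores better than 0.2:
scores = [brier(p, True) for p in (0.2, 0.3, 0.4)]
```

Under this rule a single data point does move the score, but only slightly; it takes many questions for the scores to separate calibrated guessers from uncalibrated ones.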


You can’t really say that for a single data point. The 0.3 may be completely correct. Now, if you try ten times, things might be different.


It’s a noisy measurement of how right you were.


You're right. But if you enter 0.3 on average over 28 questions and the actual number of correct answers differs by a lot from 8 (0.3 × 28 ≈ 8.4), then you have learned that your general sense of GPT-4's abilities is uncalibrated.
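The arithmetic above can be sketched as follows; the numbers and the helper name are illustrative, matching the 28-question example rather than any actual quiz:

```python
# With an average stated confidence p over n questions, a calibrated
# guesser expects about p * n correct answers.
n_questions = 28
avg_confidence = 0.3
expected_correct = avg_confidence * n_questions  # about 8.4

def calibration_gap(confidences, outcomes):
    """Predicted minus actual number of correct answers.

    confidences: one probability per question.
    outcomes: 1 for a correct GPT-4 answer, 0 otherwise.
    A gap far from zero suggests miscalibration.
    """
    return sum(confidences) - sum(outcomes)
```

For example, entering 0.3 for all 28 questions and seeing 8 correct answers gives a gap of about 0.4, i.e. well calibrated; seeing 20 correct would give a gap of about -11.6.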


Isn't the whole point to show you how right your confidence is about GPT's capabilities?

At least the results are about the quiz taker and their confidence.



