I just tried "analyze this audio file recording of a meeting and notes along with a transcript labeling all the speakers" (using the language from the parent's comment) and indeed Gemini 3 was significantly better than 2.5 Pro.
Gemini 3 created a great "Executive Summary", identified the speakers' names, and then gave me a second-by-second transcript:
[00:00] Greg: Hello.
[00:01] X: You great?
[00:02] Greg: Hi.
[00:03] X: I'm X.
[00:04] Y: I'm Y.
...
Super impressive!