I am (genuinely, really asking) curious why you think so. I've got no stake in this, but the skepticism around phishing attacks from this site of all places really surprises me. People like Kevin Mitnick have done more sophisticated phishing with fewer tools. Why wouldn't someone intent on running a social engineering scam use one of the voice-faking technologies that are widely available now? Keep in mind that they're simple enough to use that people are making memes with voices generated from ~5 seconds of voice recordings.
Making a meme is nothing like an interactive telephone conversation.
It's not impossible, but it's not trivial either. Mainly, though, it's just unnecessary.
If a user isn't fooled by well-crafted phishing because they apply the most trivial countermeasures, such as calling back on a known number, they aren't going to be fooled by a deepfake either. In practice, the effort that goes into a phishing operation is mostly better spent elsewhere. So while we shouldn't dismiss deepfakes completely, they clearly don't pay off against a smallish company of limited economic value, so that's very unlikely to be the case here.
There have been a handful of high-profile media cases involving deepfakes, none of which has held up under further investigation. That's understandable: nobody wants to be known as the one who didn't recognize their own kid's voice on the phone. But the truth is simpler, and it actually helps us when designing countermeasures.
I suppose the disconnect, then, is that we fundamentally disagree on what the simpler answer is. My understanding is that using a deepfake voice as part of a phishing scam can be done trivially (or at least by a determined actor using free tools, which is trivial enough for this case), so to me that's the simplest, most obvious explanation compared to a company-wide conspiracy. But I can see your point if we assume that isn't the case and that deepfake voices are actually hard to pull off.
Try it! The tools are publicly available. You might find that it's harder than you think. We are very sensitive to uncanny conversations. Analogue imitation and pitch-shifting the voice are much easier to work with.
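To make "try it" concrete, here is a minimal sketch of what the publicly available tools look like, assuming the open-source Coqui TTS package ("pip install TTS") and its XTTS v2 voice-cloning model; the file names are placeholders. It produces a canned clip from a short reference sample, which is exactly the gap I mean: generating a one-off audio file is easy, holding a convincing live conversation is not.

    # Minimal voice-cloning sketch, assuming the open-source Coqui TTS
    # package ("pip install TTS") and its XTTS v2 model. File names
    # below are hypothetical placeholders.
    from TTS.api import TTS

    # Download and load the multilingual XTTS v2 voice-cloning model.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    # Clone a voice from a few seconds of clean reference speech and
    # synthesize one fixed sentence to a file. This is offline, one-shot
    # generation, not an interactive phone call.
    tts.tts_to_file(
        text="Hey, it's me. Sorry, I can't talk right now.",
        speaker_wav="reference_clip.wav",  # ~5-10 s sample of the target voice
        language="en",
        file_path="cloned_output.wav",
    )

Even when the clip sounds passable, the latency and turn-taking of a live call are where the uncanniness shows up.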
However, my point is that none of that matters. After all, deepfakes are only going to get easier, so it's only a matter of time before they're as cheap as you describe. The real issue is that imitating a voice has very little impact on the outcome of a phishing operation. Sure, it might not hurt, but other things determine whether a scam succeeds. Don't rely on impersonating a voice, especially since a trivial callback completely defeats it, no matter how many resources you put into it.
Which is also why none of these recent media stories makes sense. And when investigated, none of them has held up to scrutiny, precisely as expected. I haven't done the digging myself, but look out for follow-up stories by respected bloggers and journalists.
Lots of people work on defending against these operations, and none of them spend any time on correctly identifying deepfakes, for a reason. Don't take my word for it, since I'm not in the business, but ask anyone who is whether they find these details believable.