This is really cool. Can you describe more about the scenarios you define and how you manage them over time? I can imagine that if you end up with a lot of them, it becomes time-consuming to run all of the visual tests or handle the human-in-the-loop part.
I wonder if you could do some automated diffing between the scenario results. If a new result is pretty similar to the previous one, and the previous one was already considered to be in a good state, then you don't necessarily need a QA person to review it.
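
To make the diffing idea concrete, here is a rough sketch of what I have in mind. It is not based on your actual setup: I'm assuming each scenario result can be loaded as a grid of RGB pixels, and the names `changed_fraction`, `triage`, `tolerance`, and `max_changed` are all made up for illustration. The thresholds are arbitrary and would need tuning.

```python
from dataclasses import dataclass

# Hypothetical representation: an image is a list of rows of (R, G, B) tuples.
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


@dataclass
class DiffResult:
    changed_fraction: float  # share of pixels that differ beyond the tolerance
    needs_review: bool       # True if a human should look at this scenario


def changed_fraction(baseline: Image, candidate: Image, tolerance: int = 8) -> float:
    """Return the fraction of pixels whose largest channel difference exceeds `tolerance`."""
    # A size mismatch is treated as a full change so it always goes to review.
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        return 1.0
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for pa, pb in zip(row_a, row_b):
            if max(abs(x - y) for x, y in zip(pa, pb)) > tolerance:
                changed += 1
    return changed / total


def triage(baseline: Image, candidate: Image, max_changed: float = 0.01) -> DiffResult:
    """Compare a new result against the previously approved one.

    Only results that changed by more than `max_changed` are flagged for a human.
    """
    frac = changed_fraction(baseline, candidate)
    return DiffResult(changed_fraction=frac, needs_review=frac > max_changed)
```

The QA person would then only see the scenarios where `needs_review` is true, and approving one of those would replace its baseline. One thing to watch for is drift: many small changes that each pass the threshold could add up, so it might be worth comparing against the last human-approved result instead of the last automated pass.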