Hey, Whisper can indeed transcribe Japanese and even translate it (but only into English). For the best results you need to use the largest model, which may be slow depending on your hardware.
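In case it helps, here is a minimal sketch of what that looks like with the open-source `openai-whisper` Python package. It assumes the package and `ffmpeg` are installed; the function name `transcribe_japanese` and the file name in the usage comment are just placeholders for illustration.

```python
def transcribe_japanese(path, model_name="large", to_english=False):
    """Transcribe Japanese audio, or translate it into English, with Whisper.

    Assumes `pip install openai-whisper` and ffmpeg on the PATH.
    """
    # Imported here so the module loads even where Whisper is not installed.
    import whisper

    # "large" gives the best quality; smaller names such as "small" or
    # "medium" trade accuracy for speed on weaker hardware.
    model = whisper.load_model(model_name)

    # Whisper's built-in translation only targets English.
    task = "translate" if to_english else "transcribe"
    result = model.transcribe(path, language="ja", task=task)

    # result["segments"] is a list of dicts with "start", "end", "text".
    return result["text"], result["segments"]


# Example usage (hypothetical file name):
#   text, segments = transcribe_japanese("interview.mp3", to_english=True)
```

If the large model is too slow on your machine, passing a smaller `model_name` is the simplest knob to turn.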
Another option is to use something like VideoToTextAI, which lets you transcribe quickly, translate the result into 100+ languages, and then export a subtitle (SRT) file.
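For anyone who wants to produce the SRT file themselves, the format is just numbered cues with `HH:MM:SS,mmm` timestamps. Below is a rough sketch that assumes you already have timed segments as dicts with `start`, `end` (in seconds) and `text` keys, which is the shape Whisper returns; the helper names are made up for this example and are not part of any tool mentioned here.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn a list of {"start", "end", "text"} dicts into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue: sequence number, time range, then the subtitle text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Writing the returned string to a `.srt` file with UTF-8 encoding should be enough for most players to pick it up.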
https://www.videototextai.com/ - an AI platform for transcription, translation, and chatting with your video/audio. We are very close to releasing an update that makes it possible to caption any video in any language, which is perfect for making social media content.
We're currently developing https://www.videototextai.com/ – ChatGPT for video and audio. The idea is to get to an all-in-one video and audio editing/insights platform. We’re actively building out new features to fully realise our vision, and we'd love to get any feedback from HackerNews!
If you are looking for something automatic that also lets you interact with your transcripts ChatGPT-style, then I would recommend https://www.videototextai.com/
That cookies box though... Dark pattern (accept lots + accept all, fake drag affordance, covering a quarter of the page) for cookies doesn't bode well for privacy protections around the transcripts.
You can delete any transcription you make, and once you do, we do not keep any copy of it :). The cookie banner is there to comply with EU law.