I've got codex-cli with speech-to-text hooked up to (among other things) Home Assistant via MCP.
It'll do anything. I can literally tell it to play some music from a playlist and make the lights flash to the beat, and it'll just figure out how to do that.
Is it fast? Not really. Is it annoyingly slow for quick tasks like turning the lights off? Not enough to bother me, anyway: turning the lights on or off takes about 4 seconds from when I finish speaking.
And you haven't felt the need to hook up an LLM yourself, even though you already have the hardware! :p