I evaluated RWKV recently, and it's interesting for sure. It's undertrained and has a quirky architecture, so some parts of working with it are different from playing with the Llama ecosystem. The huge context length is super appealing, and in my tests, long prompts do seem to work and give coherent results.
Where it's slow is in processing the initial prompt -- it can be very, very slow to feed a long prompt through before it starts generating (it looks like slow tokenization, but I think it's really the model itself). I think this has to do with how the network actually functions: it's recurrent, so there's a forward loop that feeds each token into the network sequentially, rather than processing the whole prompt in parallel the way a transformer can.
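To illustrate what I mean, here's a rough sketch of how an RNN-style model like RWKV ingests a prompt. The names (model.init_state, model.forward, the tokenizer object) are hypothetical placeholders, not the actual RWKV API; the point is just the shape of the loop:

    # Hypothetical RNN-style prompt ingestion: one forward pass per prompt token.
    # A transformer can "prefill" the whole prompt in a single batched pass instead.
    def ingest_prompt(model, tokenizer, prompt):
        state = model.init_state()          # recurrent state, carried token to token
        tokens = tokenizer.encode(prompt)   # splitting text into tokens is cheap
        logits = None
        for tok in tokens:                  # this loop is why long prompts are slow
            logits, state = model.forward(tok, state)
        return logits, state                # state now summarizes the whole prompt

So the startup cost scales linearly with prompt length: one full network evaluation per prompt token before the first generated word appears.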
I would guess that if it got the same level of attention and work the Llama stack is getting, it would be pretty fantastic, but that's just a guess; I'm only a hobbyist.
Nope, not yet; the current 14B version is much worse than LLaMA 65B. But there are apparently plans to train an RWKV 65B by the end of the year, and if including the LLaMA training dataset results in something like LLaMA-65B but with infinite context, that would be really amazing.