From the article linked from this blog post: "Enabling computers to understand language remains one of the hardest problems in artificial intelligence."
I worked on this task for a year, and it doesn't work very well: in embedding space, relatedness, synonymy, and antonymy are mixed up and require pairwise thresholding. You can probably get to 90% this way, but not to 99%. Better to use a cross-entropy approach.
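A minimal sketch of the conflation problem (the model name and word pairs here are my own illustration, not from the original comment; exact scores will vary by model):

```python
# Cosine similarity in embedding space tends to score synonyms, antonyms,
# and merely related words in the same band, so no single threshold
# separates them cleanly.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

pairs = [
    ("hot", "warm"),     # near-synonyms
    ("hot", "cold"),     # antonyms
    ("hot", "weather"),  # merely related
]
for a, b in pairs:
    ea, eb = model.encode([a, b], normalize_embeddings=True)
    print(f"{a} / {b}: cosine = {float(ea @ eb):.3f}")
```

All three pairs typically land in a similar similarity range, which is why you end up tuning thresholds per pair rather than setting one global cutoff.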
This is why modern RAG applications return the top-k results: the retriever can't reliably surface the correct snippet as a single top hit, so the hard part, deciding what is useful and what is not, is left to the LLM.
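A toy top-k retrieval sketch (same assumed model as above; the corpus and k are illustrative, not any particular RAG framework's API):

```python
# Retrieve the k nearest snippets instead of trusting rank 1 alone;
# the LLM downstream judges which of them are actually relevant.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

corpus = [
    "The capital of France is Paris.",
    "Paris is known for the Eiffel Tower.",
    "French is a Romance language.",
    "The Louvre holds the Mona Lisa.",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True)

query = "What is the capital of France?"
query_emb = model.encode(query, normalize_embeddings=True)

# On normalized vectors, cosine similarity is just a dot product.
scores = corpus_emb @ query_emb

k = 3
top_k = np.argsort(scores)[::-1][:k]
for i in top_k:
    print(f"{scores[i]:.3f}  {corpus[i]}")
```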