The reranker is a cross-encoder that sees the docs and the query at the same time. What you normally do is generate embeddings ahead of time, independent of the prompt, then calculate cosine similarity with the prompt's embedding, select the top-k chunks that best match the prompt, and only then use a reranker to sort them.
Embeddings are a lossy compression, so if you feed the chunks and the prompt to the model at the same time, the results are better. But you can't do this for your whole db (that would be one model pass per chunk for every query), which is why you filter with cosine similarity first.
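To make the two stages concrete, here is a rough sketch in plain Python. Everything in it is a toy stand-in I made up for illustration: `embed` is a bag-of-words counter rather than a real embedding model, and `cross_score` is a bigram-overlap count rather than a real cross-encoder. The only point is the shape of the pipeline: chunk vectors are built once without seeing the query, cosine similarity picks the top-k, and the pairwise scorer only runs on those k candidates.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    It is lossy on purpose: word order is thrown away."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cross_score(query, doc):
    """Toy stand-in for a cross-encoder: it sees query and doc together.
    Score = number of query bigrams that also occur in the doc."""
    q = query.lower().split()
    d = doc.lower().split()
    doc_bigrams = set(zip(d, d[1:]))
    return sum(1 for bigram in zip(q, q[1:]) if bigram in doc_bigrams)


def retrieve(query, chunks, chunk_vecs, k=2):
    """Stage 1: cosine top-k over precomputed vectors. Stage 2: rerank the k."""
    q_vec = embed(query)
    order = sorted(range(len(chunks)),
                   key=lambda i: cosine(q_vec, chunk_vecs[i]),
                   reverse=True)
    candidates = [chunks[i] for i in order[:k]]
    # The expensive pairwise scorer only ever touches k chunks, not the whole db.
    return sorted(candidates, key=lambda c: cross_score(query, c), reverse=True)


chunks = [
    "man bites dog",
    "the dog bites man today",
    "cats sleep all day",
    "a man walks a dog",
]
# Computed ahead of time, independent of any query.
chunk_vecs = [embed(c) for c in chunks]

result = retrieve("dog bites man", chunks, chunk_vecs, k=2)
print(result)
```

With this toy data, "man bites dog" wins stage 1 because the bag-of-words vector can't tell it apart from the query, and the reranker then moves "the dog bites man today" to the top because it looks at the actual word order. A real setup would swap in a proper embedding model and cross-encoder, but the flow stays the same.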