Modern embedding models (particularly those with context windows of 2048+ tokens) let you YOLO and plop an entire text blob into them and still get meaningful vectors.
Formatting the input text with a consistent schema is optional but recommended, since it makes comparisons between the resulting vectors more meaningful.
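One way to get that consistency is to render every record through the same template before embedding it. Here's a minimal sketch; the field names and the `format_for_embedding` helper are hypothetical, and the actual embedding call would depend on whichever model or API you use.

```python
# Hypothetical helper: render a record as "Key: value" lines in a fixed
# field order, so every document sent to the embedding model has the
# same shape. (The schema and field names here are just illustrative.)

def format_for_embedding(record: dict, fields: list[str]) -> str:
    lines = []
    for field in fields:
        # Missing fields become empty strings, keeping the schema stable
        # across records instead of silently changing the layout.
        value = record.get(field, "")
        lines.append(f"{field.capitalize()}: {value}")
    return "\n".join(lines)

schema = ["title", "author", "body"]
doc = {"body": "Embeddings are neat.", "title": "Notes on embeddings"}
print(format_for_embedding(doc, schema))
# Title: Notes on embeddings
# Author: 
# Body: Embeddings are neat.
```

The formatted string would then be passed to your embedding model of choice in place of the raw blob; because every record shares the same field order and labels, differences between vectors reflect differences in content rather than in layout.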