@florianhoenicke.bsky.social
Reposted
Embedding models become "blind" beyond a context length of roughly 4K tokens. Building on the NoLiMa paper, our experiments show that on needle-in-a-haystack tasks, embedding-model performance drops to near-random chance at long context lengths—even when the needle is an exact keyword match 🤔 🧵
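A minimal sketch of a needle-in-a-haystack harness like the one described above, using a toy bag-of-words embedding as a stand-in (the thread's experiments used real neural embedding models; the filler words, needle, and helper names here are illustrative assumptions). With a purely lexical embedding, an exact keyword match is trivially retrievable at any context length—which is what makes neural models falling to near-random on the same setup surprising.

```python
import math
import random
from collections import Counter

def embed(text):
    # Toy L2-normalized bag-of-words vector; a stand-in for a real
    # neural embedding model, NOT the models tested in the thread.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}

def cosine(a, b):
    # Sparse-dict cosine similarity (both vectors are unit-normalized).
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def needle_in_haystack(needle, query, filler_words, context_len,
                       n_docs=10, seed=0):
    """Build n_docs filler documents of context_len words each, hide the
    needle at a random position in one of them, and report whether the
    query's top-scoring document is the one containing the needle."""
    rng = random.Random(seed)
    docs = [" ".join(rng.choices(filler_words, k=context_len))
            for _ in range(n_docs)]
    target = rng.randrange(n_docs)
    words = docs[target].split()
    words.insert(rng.randrange(len(words)), needle)
    docs[target] = " ".join(words)
    q = embed(query)
    scores = [cosine(q, embed(d)) for d in docs]
    return scores.index(max(scores)) == target

filler = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]
for context_len in (100, 1000, 4000):
    hits = sum(needle_in_haystack("zanzibar", "zanzibar", filler,
                                  context_len, seed=s) for s in range(20))
    print(context_len, hits / 20)
```

Swapping `embed` for calls to a real embedding model turns this into the degradation experiment: the lexical baseline stays at 1.0 accuracy regardless of length, so any drop toward chance (here 1/`n_docs`) is attributable to the model, not the task.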
March 7, 2025 at 9:28 AM