We show that even strong RAG systems quickly break under these conditions.
Awesome project led by @neelbhandari.bsky.social and @tianyucao.bsky.social!!
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?
We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline 🧵
Check out this extensive eval by @neelbhandari.bsky.social and @tianyucao.bsky.social!