Emily Xiao
@emilyxiao.bsky.social
Student @ CMU
Some insights we found:
- preceding context + an attention sink are both critical for making block-sparse attention work without additional training (see the mask sketch below this post).
- grouping examples for encoding & retrieval also boosts performance vs. purely individual retrieval.

[5/n]
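Not from the paper, just a minimal illustration of what a "sink + preceding context" block-sparse mask can look like; the block size and window below are made-up parameters, not the values used in the work:

```python
import numpy as np

def streaming_block_sparse_mask(n_tokens, block_size=64, sink_blocks=1, local_blocks=4):
    """Boolean attention mask (True = attend) for block-sparse encoding.

    Each query block attends to the first `sink_blocks` blocks (attention sink)
    and to the `local_blocks` most recent blocks (preceding context); all other
    cross-block attention is dropped. Parameter values are illustrative only.
    """
    n_blocks = (n_tokens + block_size - 1) // block_size
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    for qb in range(n_blocks):
        q_lo, q_hi = qb * block_size, min((qb + 1) * block_size, n_tokens)
        keep = set(range(min(sink_blocks, qb + 1)))                 # attention sink
        keep |= set(range(max(0, qb - local_blocks + 1), qb + 1))   # preceding blocks
        for kb in keep:
            k_lo, k_hi = kb * block_size, min((kb + 1) * block_size, n_tokens)
            mask[q_lo:q_hi, k_lo:k_hi] = True
    return np.tril(mask)  # preserve causality inside the allowed blocks
```

Because each new demo only attends to the sink and a bounded window of preceding blocks, encoding cost per added demo stays constant, which is the property the method post below relies on.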
Storage Cost?
Yes, the cache for thousands of examples can be large. However, it is also easy to re-compute if needed, unlike fine-tuned parameters, which likewise require substantial storage across a large number of tasks and are often stored indefinitely.

[4/n]
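For a rough sense of scale (my own back-of-the-envelope numbers, not figures from the thread), assuming a Llama-3-8B-style model and an fp16 KV cache:

```python
# Back-of-the-envelope KV-cache size for a many-shot demonstration cache.
# Architecture numbers assume a Llama-3-8B-style model (32 layers, 8 KV heads
# via GQA, head dim 128) with an fp16 cache; adjust for your actual model.
n_layers, n_kv_heads, head_dim = 32, 8, 128
bytes_per_value = 2            # fp16
context_tokens = 90_000        # roughly the longest context mentioned in the thread

bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value  # K and V
cache_gb = context_tokens * bytes_per_token / 1e9
print(f"{bytes_per_token / 1024:.0f} KiB per token, ~{cache_gb:.1f} GB total")
# -> 128 KiB per token, ~11.8 GB total
```

Sizeable, but as the post notes, it can be regenerated on demand rather than kept around per task.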
Results:
We evaluate DBSA with Llama models at context lengths up to 90k. DBSA achieves per-request latency comparable to fine-tuning while maintaining, on average, >95% of the best accuracy.

[3/n]
Method:
- DBSA pre-encodes the many-shot examples with streaming block-sparse attention, so encoding time stays constant as new demos are added.
- At inference, it dynamically selects the relevant KV chunks for each test query, and works with any retrieval method (rough selection sketch below this post).

[2/n]
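To make the selection step concrete, here is a hypothetical sketch of per-query chunk retrieval (my reading of the post, not the released code); `query_emb`, `chunk_embs`, `chunk_kv`, and `top_k` are assumed names, and cosine similarity stands in for whatever retrieval method you plug in:

```python
import numpy as np

def select_demo_chunks(query_emb, chunk_embs, chunk_kv, top_k=8):
    """Pick which pre-encoded demo chunks to feed to the model for one query.

    query_emb:  (d,) embedding of the test query
    chunk_embs: (n_chunks, d) embeddings of the cached demo chunks
    chunk_kv:   list mapping chunk index -> its pre-encoded KV blocks
    All names and the value of top_k are illustrative, not from the paper.
    """
    sims = chunk_embs @ query_emb / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    chosen = np.argsort(-sims)[:top_k]
    # Keep the original demo order so positions stay consistent with encoding.
    chosen = np.sort(chosen)
    return [chunk_kv[i] for i in chosen]
```

The selected KV chunks would then be passed to decoding without re-encoding the demonstrations, which, as I read it, is where the per-request latency savings come from.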
Many-shot ICL (thousands of examples or more) can match fine-tuning on many tasks, but its high inference cost makes deployment impractical.

We introduce DBSA, a training-free framework that achieves the best efficiency even under high request volumes, while maintaining strong accuracy 🧵