We introduce DBSA, a training-free framework that achieves the best efficiency even under high request volumes, while maintaining strong accuracy 🧵
- DBSA pre-encodes the many-shot examples with streaming block-sparse attention, allowing constant encoding time for new demos.
- During inference, it dynamically selects relevant KV chunks for each test query, and is compatible with any retrieval method.
[2/n]
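Here's a toy sketch of the encode-once / retrieve-per-query flow (illustrative only, not our actual code: `embed`, `encode_chunk`, and the KV-cache layout are stand-ins, and the dot-product scorer can be swapped for any retriever):

```python
import torch

torch.manual_seed(0)
D = 64  # toy hidden size

def embed(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in retrieval embedding: mean of token vectors."""
    return tokens.mean(dim=0)

def encode_chunk(tokens: torch.Tensor) -> dict:
    """Stand-in for running the model over one demo chunk with
    streaming block-sparse attention; returns that chunk's KV cache.
    A chunk may hold a group of examples rather than a single demo."""
    return {"k": tokens.clone(), "v": tokens.clone()}

# One-time encoding: each chunk is encoded independently of later
# chunks, so appending a new demo costs O(chunk), not O(total context).
demo_chunks = [torch.randn(16, D) for _ in range(32)]
kv_cache = [encode_chunk(c) for c in demo_chunks]
chunk_embs = torch.stack([embed(c) for c in demo_chunks])

# Per request: pick the top-k KV chunks for this query and hand them
# to the decoder as its prefix cache.
def select_kv(query_tokens: torch.Tensor, k: int = 4) -> list[dict]:
    q = embed(query_tokens)
    scores = chunk_embs @ q                      # pluggable retrieval scorer
    top = scores.topk(k).indices.sort().values   # keep original chunk order
    return [kv_cache[int(i)] for i in top]

query = torch.randn(8, D)
selected = select_kv(query)
keys = torch.cat([kv["k"] for kv in selected])
vals = torch.cat([kv["v"] for kv in selected])
print(keys.shape, vals.shape)
```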
We evaluate DBSA with Llama models at context lengths up to 90k. DBSA achieves per-request latency comparable to fine-tuning while maintaining, on average, >95% of the best accuracy.
[3/n]
- Preceding context + attention sink are both critical for making block-sparse attention work without additional training.
- Grouping examples for encoding & retrieval also boosts performance vs. encoding and retrieving each example individually.
[5/n]
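For intuition, here's a toy version of the kind of streaming block-sparse mask these findings point at (illustrative assumptions: block size, sink width, and window are made-up parameters). Each block sees the attention-sink block(s), a short window of preceding blocks, and itself, causally:

```python
import torch

def block_sparse_mask(n_blocks: int, block: int,
                      sink_blocks: int = 1, window: int = 2) -> torch.Tensor:
    """True = position may attend."""
    n = n_blocks * block
    mask = torch.zeros(n, n, dtype=torch.bool)
    for b in range(n_blocks):
        rows = slice(b * block, (b + 1) * block)
        mask[rows, : sink_blocks * block] = True   # attention sink
        lo = max(0, b - window) * block            # preceding-context window
        mask[rows, lo : (b + 1) * block] = True
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    return mask & causal  # enforce causality within the visible blocks

print(block_sparse_mask(n_blocks=6, block=4).int())
```

Drop either the sink columns or the preceding-context window from this mask and, per the ablations above, accuracy degrades.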