Archiki Prasad
@archiki.bsky.social
620 followers 840 following 24 posts
Ph.D. Student at UNC NLP | Apple Scholar in AI/ML Ph.D. Fellowship | Prev: FAIR at Meta, AI2, Adobe (Intern) | Interests: #NLP, #ML | https://archiki.github.io/
Pinned
archiki.bsky.social
🥳🥳 Honored and grateful to be awarded the 2025 Apple Scholars in AI/ML PhD Fellowship! ✨

Huge shoutout to my advisor @mohitbansal.bsky.social, & many thanks to my lab mates @unccs.bsky.social , past collaborators + internship advisors for their support ☺️🙏

machinelearning.apple.com/updates/appl...
Reposted by Archiki Prasad
esteng.bsky.social
Extremely excited to announce that I will be joining
@utaustin.bsky.social Computer Science in August 2025 as an Assistant Professor! 🎉
[Image: UT Austin campus]
Reposted by Archiki Prasad
esteng.bsky.social
🌵 I'm going to be presenting PBT (Persuasion-Balanced Training) at #NAACL2025 today at 2PM! Come by poster session 2 if you want to hear about:
-- balancing positive and negative persuasion
-- improving LLM teamwork/debate
-- training models on simulated dialogues

With @mohitbansal.bsky.social and @peterbhase.bsky.social
esteng.bsky.social
🎉Very excited that our work on Persuasion-Balanced Training has been accepted to #NAACL2025! We introduce a multi-agent tree-based method for teaching models to balance:

1️⃣ Accepting persuasion when it helps
2️⃣ Resisting persuasion when it hurts (e.g. misinformation)

arxiv.org/abs/2410.14596
🧵 1/4
Reposted by Archiki Prasad
esteng.bsky.social
✈️ Heading to #NAACL2025 to present 3 main conf. papers, covering training LLMs to balance accepting and rejecting persuasion, multi-agent refinement for more faithful generation, and adaptively addressing varying knowledge conflict.

Reach out if you want to chat!
Reposted by Archiki Prasad
esteng.bsky.social
Check out 🚨CAPTURe🚨 -- a new benchmark testing spatial reasoning by making VLMs count objects under occlusion.

SOTA VLMs (GPT-4o, Qwen2-VL, Intern-VL2) have high error rates on CAPTURe (but humans have low error ✅) and models struggle to reason about occluded objects.

arxiv.org/abs/2504.15485

🧵👇
Reposted by Archiki Prasad
mohitbansal.bsky.social
In Singapore for #ICLR2025 this week to present papers + keynotes 👇, and looking forward to seeing everyone -- happy to chat about research, or faculty+postdoc+phd positions, or simply hanging out (feel free to ping)! 🙂

Also meet our awesome students/postdocs/collaborators presenting their work.
archiki.bsky.social
Can RAG systems handle imbalanced evidence or increasing misinformation?

➡️ As document support becomes imbalanced, baselines ignore under-supported correct answers but MADAM-RAG maintains stable performance

➡️ As misinformation 📈, baselines degrade sharply (−46%) but MADAM-RAG remains more robust
archiki.bsky.social
How important are multi-round debate and aggregation in MADAM-RAG?

Increasing debate rounds in MADAM-RAG improves performance by allowing agents to refine answers via debate.

Aggregator provides even greater gains, especially in early rounds, aligning conflicting views & suppressing misinfo.
archiki.bsky.social
We evaluate on 3 datasets: FaithEval (suppression of misinformation), AmbigDocs (disambiguation across sources), RAMDocs (our dataset w/ different types of conflict).

MADAM-RAG consistently outperforms concatenated-prompt and Astute RAG baselines across all three datasets and model backbones.
archiki.bsky.social
We propose MADAM-RAG, a structured, multi-agent framework designed to handle inter-doc conflicts, misinformation, & noise in retrieved content, comprising:

1️⃣ Independent LLM agents - each generates an intermediate response conditioned on a single doc
2️⃣ Centralized aggregator
3️⃣ Iterative multi-round debate

(rough sketch below 👇)
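A rough sketch of how such a loop could be wired up, assuming a generic `call_llm` placeholder (the prompts and function names are illustrative, not the paper's actual implementation):

```python
# Illustrative MADAM-RAG-style loop; `call_llm` and the prompts are
# placeholders, not the paper's actual implementation.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in any LLM backend here

def madam_rag(query: str, docs: list[str], rounds: int = 3) -> str:
    # 1️⃣ Independent agents: one intermediate answer per retrieved document.
    answers = [call_llm(f"Answer '{query}' using only this document:\n{d}") for d in docs]
    summary = ""
    for _ in range(rounds):
        # 2️⃣ Centralized aggregator: reconcile per-document answers,
        # discarding claims that look like misinformation or noise.
        summary = call_llm(
            f"Question: {query}\nAgent answers:\n" + "\n".join(answers)
            + "\nAggregate these into one response, resolving conflicts."
        )
        # 3️⃣ Multi-round debate: each agent revisits its answer given the
        # aggregator's summary, still grounded in its own document.
        answers = [
            call_llm(f"Document:\n{d}\nAggregator summary: {summary}\n"
                     f"Revise your answer to '{query}' if warranted.")
            for d in docs
        ]
    return summary
```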
archiki.bsky.social
📂RAMDocs is designed to reflect the complexities of real-world retrieval. It includes:

➡️ Ambiguous queries w/ multiple valid answers
➡️ Imbalanced document support (some answers backed by many sources, others by fewer)
➡️ Docs w/ misinformation (plausible but wrong claims) or noisy/irrelevant content

(hypothetical example below 👇)
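To make that concrete, a hypothetical RAMDocs-style record (my own illustration of the ingredients above, not an actual dataset entry) could look like:

```python
# Hypothetical illustration of the structure described above;
# not an actual RAMDocs record.
example = {
    "query": "Who is the president of the university?",  # ambiguous: which university?
    "documents": [
        {"text": "... A leads University X ...", "answer": "A", "label": "valid"},
        {"text": "... President A of University X said ...", "answer": "A", "label": "valid"},
        {"text": "... B heads University Y ...", "answer": "B", "label": "valid"},  # fewer supporting docs
        {"text": "... C runs University X ...", "answer": "C", "label": "misinformation"},
        {"text": "... campus bookstore hours are ...", "answer": None, "label": "noise"},
    ],
    "valid_answers": ["A", "B"],  # imbalanced support: A > B
}
```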
archiki.bsky.social
🚨Real-world retrieval is messy: queries are ambiguous or docs conflict & have incorrect/irrelevant info. How can we jointly address these problems?

➡️RAMDocs: challenging dataset w/ ambiguity, misinformation & noise
➡️MADAM-RAG: multi-agent framework, debates & aggregates evidence across sources

🧵⬇️
Reposted by Archiki Prasad
codezakh.bsky.social
What if we could transform advanced math problems into abstract programs that can generate endless, verifiable problem variants?

Presenting EFAGen, which automatically transforms static advanced math problems into their corresponding executable functional abstractions (EFAs).
🧵👇
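One way to picture an EFA is as a small program that parameterizes a problem and can both render and verify fresh instances. The toy class below is my own illustration of that idea, not EFAGen's actual output format:

```python
import random

# Toy illustration of an "executable functional abstraction" (EFA);
# the class layout is illustrative, not EFAGen's actual output format.
class QuadraticRootSumEFA:
    """Abstraction of: 'Find the sum of the roots of x^2 - 5x + 6 = 0.'"""

    def sample_params(self) -> dict:
        # Sample integer roots so every variant stays solvable.
        r1, r2 = random.randint(-9, 9), random.randint(-9, 9)
        return {"b": -(r1 + r2), "c": r1 * r2}

    def render(self, p: dict) -> str:
        return f"Find the sum of the roots of x^2 + ({p['b']})x + ({p['c']}) = 0."

    def solve(self, p: dict) -> int:
        # Vieta's formulas: for x^2 + bx + c, the root sum is -b.
        return -p["b"]

efa = QuadraticRootSumEFA()
params = efa.sample_params()
print(efa.render(params), "Answer:", efa.solve(params))
```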
Reposted by Archiki Prasad
esteng.bsky.social
🚨Announcing TaCQ🚨: a new mixed-precision quantization method that identifies critical weights to preserve. We integrate key ideas from circuit discovery, model editing, and input attribution to improve low-bit quantization, retaining 96% of 16-bit accuracy at 3.1 avg bits (~6x compression)

📃 arxiv.org/abs/2504.07389
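The general recipe, sketched with |weight × gradient| as a stand-in saliency score (not necessarily TaCQ's exact criterion):

```python
import torch

# Generic mixed-precision quantization sketch: preserve "critical" weights
# at 16-bit precision, quantize the rest to a low-bit grid. The saliency
# score here is a common stand-in, not necessarily TaCQ's exact criterion.
def mixed_precision_quantize(w: torch.Tensor, grad: torch.Tensor,
                             keep_frac: float = 0.01, bits: int = 3) -> torch.Tensor:
    # Score each weight and mark the top keep_frac fraction as critical.
    saliency = (w * grad).abs()
    k = max(1, int(keep_frac * w.numel()))
    critical = torch.zeros_like(w, dtype=torch.bool)
    critical.view(-1)[saliency.view(-1).topk(k).indices] = True

    # Uniform low-bit grid for the non-critical weights.
    levels = 2 ** bits - 1
    scale = w.abs().max() / (levels / 2) + 1e-12
    low_bit = torch.round(w / scale).clamp(-(levels // 2) - 1, levels // 2) * scale

    # Critical weights keep 16-bit precision; the rest use the low-bit grid.
    return torch.where(critical, w.half().float(), low_bit)
```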
Reposted by Archiki Prasad
unccs.bsky.social
🎉 A big congratulations to @archiki.bsky.social (advised by Prof. @mohitbansal.bsky.social) for being awarded the 2025 Apple Scholars in AI/ML PhD Fellowship! We are proud of you! 👏
archiki.bsky.social
🥳🥳 Honored and grateful to be awarded the 2025 Apple Scholars in AI/ML PhD Fellowship! ✨

Huge shoutout to my advisor @mohitbansal.bsky.social, & many thanks to my lab mates @unccs.bsky.social , past collaborators + internship advisors for their support ☺️🙏

machinelearning.apple.com/updates/appl...
archiki.bsky.social
Thanks Jaemin, learned so much from you as well!
Reposted by Archiki Prasad
mohitbansal.bsky.social
🎉🎉 Big congrats to @archiki.bsky.social on being awarded the @Apple AI/ML PhD Fellowship, for her extensive contributions in evaluating+improving reasoning in language/reward models and their applications to new domains (ReCEval, RepARe, System-1.x, ADaPT, ReGAL, ScPO, UTGen, GrIPS)! #ProudAdvisor
Reposted by Archiki Prasad
shoubin.bsky.social
Introducing VEGGIE 🥦—a unified, end-to-end, and versatile instructional video generative model.

VEGGIE supports 8 skills, from object addition/removal/changing and stylization to concept grounding/reasoning. It exceeds SoTA and shows zero-shot multimodal instructional & in-context video editing.
Reposted by Archiki Prasad
mohitbansal.bsky.social
🚨 Check out "UTGen & UTDebug" for learning to automatically generate unit tests (i.e., discovering inputs which break your code) and then applying them to debug code with LLMs, with strong gains (>12% pass@1) across multiple models/datasets! (see details in 🧵👇)

1/4
archiki.bsky.social
🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨
which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and to debug code using those generated tests.

UTGen+UTDebug yields large gains in debugging (+12% pass@1) & addresses 3 key questions:

🧵👇
archiki.bsky.social
Lastly, we show that both test-time scaling and backtracking are crucial for UTDebug, and scaling the number of generated UTs also consistently improves code accuracy.
archiki.bsky.social
Combining UTGen with UTDebug 🤝, we consistently outperform baselines using no UT feedback, randomly sampled UTs, and prompted targeted UTs across 3 models & datasets.

For partially correct code with subtle errors (our MBPP+Fix hard split), debugging with UTGen improves over baselines by >12.35% on Qwen 2.5!
archiki.bsky.social
RQ3: We also propose ✨UTDebug✨ with two key modifications:

1⃣ Test-time scaling (self-consistency over multiple samples) to increase the accuracy of generated UT outputs

2⃣ Validation & backtracking: generate multiple UTs for validation, accept an edit only when the overall pass rate increases, & backtrack otherwise

(rough sketch below 👇)
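Putting the two pieces together, a rough sketch of the debug loop (names like `llm_fix_code` and `run_tests` are placeholders I'm assuming, not the paper's actual code):

```python
from collections import Counter

# 1⃣ Test-time scaling: majority vote (self-consistency) over several
# sampled expected outputs to get a more reliable unit-test oracle.
def self_consistent_output(sampled_outputs: list[str]) -> str:
    return Counter(sampled_outputs).most_common(1)[0][0]

# 2⃣ Validation & backtracking: accept an edit only if the overall pass
# rate over the generated UTs improves; otherwise keep the previous code.
def ut_debug(code: str, tests: list, llm_fix_code, run_tests, max_rounds: int = 3) -> str:
    best_code, best_pass = code, run_tests(code, tests)   # run_tests -> pass rate in [0, 1]
    for _ in range(max_rounds):
        failing = [t for t in tests if run_tests(best_code, [t]) < 1.0]
        if not failing:
            break
        candidate = llm_fix_code(best_code, failing)       # LLM proposes a fix from failing UTs
        pass_rate = run_tests(candidate, tests)
        if pass_rate > best_pass:                          # accept only on improvement
            best_code, best_pass = candidate, pass_rate
        # else: backtrack, i.e., keep the previous best_code
    return best_code
```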