Dustin Wright
@dustinbwright.com
Postdoc @ University of Copenhagen (CopeNLU) | Making the world's knowledge reliable and accessible w/ ML + NLP | Former UMSI, AI2, IBM Research, UCSD | https://dustinbwright.com
Pinned
dustinbwright.com
🎉 Our work on attribution in summarization is now accepted to #EMNLP2025 main! 🎉

"Unstructured Evidence Attribution for Long Context Query Focused Summarization"

w/ @zainmujahid.me , Lu Wang, @iaugenstein.bsky.social , and @davidjurgens.bsky.social
dustinbwright.com
🦾 We demonstrate across 5 LLMs and 4 datasets that LLMs adapted with SUnsET generate more relevant and factually consistent evidence, extract evidence from more diverse locations in their context, and generate more relevant and consistent summaries than baselines.
dustinbwright.com
🔎 We show for existing large language models that evidence is often copied incorrectly and "lost-in-the-middle". To help perform this task, we create the Summaries with Unstructured Evidence Text dataset (☀️SUnsET☀️), a synthetic dataset which can be used to train unstructured evidence citation.
dustinbwright.com
💡 Normally, when automatically generated summaries cite supporting evidence, they cite evidence at a fixed granularity, e.g., individual sentences or whole documents. Our work proposes extracting spans of *any* length as more relevant and consistent evidence for long context query focused summaries.
dustinbwright.com
There’s something really special about seeing a physical print copy of our work 🤩

You can read “Efficiency is Not Enough: A Critical Perspective on Environmentally Sustainable AI” now in CACM!!!

dl.acm.org/doi/10.1145/...
Reposted by Dustin Wright
andersgiovanni.com
No fewer than three people were needed to cover all the aspects of our dialogue simulation paper. Thanks for the interest — check out the preprint. Link in Dustin’s post.
@dustinbwright.com @ic2s2.bsky.social #ic2s2
dustinbwright.com
We had a great time talking about dialogue simulation with LLMs at @ic2s2.bsky.social !!! Amazing work by all of our colleagues at UMich.

See the preprint of this work here: arxiv.org/abs/2409.08330
Reposted by Dustin Wright
ariannapera.bsky.social
The work “Extracting Participation in Collective Action from Social Media”, in collaboration with @lajello.bsky.social, at @ic2s2.bsky.social today!

Check out the paper ojs.aaai.org/index.php/IC... and models huggingface.co/ariannap22

Feat. poster and research buddy @alessianetwork.bsky.social ♥️
dustinbwright.com
Open PhD positions in Denmark! daracademy.dk/fellowship/f...

If you want to apply to work with me and Johannes Bjerva at @aau.dk Copenhagen, I'll be at @ic2s2.bsky.social this week and @aclmeeting.bsky.social next week! DM me if you'd like to meet :)
Dara
daracademy.dk
dustinbwright.com
Join us for the Pre-ACL 2025 Workshop in Copenhagen, 26 July, 2025!
🇩🇰 We are bringing international NLP experts from Columbia, UCLA, University of Michigan, and more to Copenhagen to meet with the Danish NLP community. 🇩🇰
📅 Poster submission deadline: June 16, 2025
🔗 Register: www.aicentre.dk/events/pre-a...
Pre-ACL 2025 Workshop | Event | Pioneer Centre for Artificial Intelligence
www.aicentre.dk
Reposted by Dustin Wright
aicentre.dk
Thanks to @dustinbwright.com (@copenlu.bsky.social) and @mxij.me (@itu.dk) for sharing insights on your research within the collaboratory of Speech & Language, at the Last Fridays Talks!
dustinbwright.com
This work on fact checking with summarized evidence was accepted to #SIGIR 2025!
Reposted by Dustin Wright
nicolang.bsky.social
Latest newsletter featuring a perspective paper by @dustinbwright.com et al. on "Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI"
@aicentre.dk
climateainordics.com
📝🌍🍃Our latest newsletter is out! Read about coming webinars, and about why it is crucial to look at the impact of machine learning (ML) using systems thinking, i.e. across the entire life cycle of models, from development to deployment.
climateainordics.com/newsletter/2...
Climate AI Nordics Newsletter, March 2025
Welcome to the Climate AI Nordics Newsletter March 2025. Since the launch in October, the network has grown to 129 people spread over the Nordic countries (166 including international supporting affili...
climateainordics.com
dustinbwright.com
I am still in need of emergency reviewers for ARR this cycle for the computational social science track, please DM me if you have capacity 🙏
dustinbwright.com
Our long context summarization dataset is now on 🤗 Hugging Face!

huggingface.co/datasets/dwr...

Use it as a training or a test set for long context query focused summarization! It includes evidence attribution of free-form text spans from the context, making summaries more transparent and reliable!
dustinbwright.com
3) Evidence tends to be lost-in-the-middle for all base models
4) Shuffling the document sections helps mitigate evidence being lost-in-the-middle
5) Learning to cite with SUnsET also improves the quality of the final summaries
dustinbwright.com
We have the following main findings across 5 models and 4 disparate test datasets:
1) All base models struggle both to extract evidence text and to use it correctly.
2) Fine-tuning on SUnsET improves model ability to extract and correctly use evidence across the board
⬇️
dustinbwright.com
We use SUnsET to adapt LLMs to the unstructured evidence attribution task using two approaches: standard fine-tuning and fine-tuning on shuffled documents, in order to overcome the lost-in-the-middle problem for evidence attribution.
dustinbwright.com
SUnsET is created using a novel inductive synthetic data generation pipeline. Each stage has carefully engineered prompts designed to maximize diversity and accuracy. The dataset consists of documents broken into multiple sections, with multiple queries and summaries which cite specific text spans.