Amanda Bertsch
@abertsch.bsky.social
PhD student @ CMU LTI, working on text generation + long context

https://www.cs.cmu.edu/~abertsch/
Reposted by Amanda Bertsch
LLMs don't accumulate information over the course of a text the way you'd hope!

I think this is why LLMs often feel 'fixated on the wrong thing' or 'overly literal'—they are usually responding using the most relevant single thing they remember, not the aggregate of what was said
Can LLMs accurately aggregate information over long, information-dense texts? Not yet…

We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
November 9, 2025 at 8:06 PM
Can LLMs accurately aggregate information over long, information-dense texts? Not yet…

We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
November 7, 2025 at 5:07 PM
Reposted by Amanda Bertsch
why intern at Ai2?

🐟interns own major parts of our model development, sometimes even leading whole projects
🐡we're committed to open science & actively help our interns publish their work

reach out if u wanna build open language models together 🤝

links 👇
November 5, 2025 at 11:11 PM
Reposted by Amanda Bertsch
DeltaNet Explained by Songlin Yang

A gentle and comprehensive introduction to DeltaNet

Part 1: sustcsonglin.github.io/blog/2024/de...
Part 2: sustcsonglin.github.io/blog/2024/de...
Part 3: sustcsonglin.github.io/blog/2024/de...
November 5, 2025 at 11:45 PM
Reposted by Amanda Bertsch
I’ll be presenting this work in **2 hours** at EMNLP’s Gather Session 3. Come by to chat about fanfiction, literary notions of similarity, long-context modeling, and consent-focused data collection!
Digital humanities researchers often care about fine-grained similarity based on narrative elements like plot or tone, which don’t necessarily correlate with surface-level textual features.

Can embedding models capture this? We study this in the context of fanfiction!
November 5, 2025 at 10:01 PM
Reposted by Amanda Bertsch
Digital humanities researchers often care about fine-grained similarity based on narrative elements like plot or tone, which don’t necessarily correlate with surface-level textual features.

Can embedding models capture this? We study this in the context of fanfiction!
November 5, 2025 at 9:59 PM
. @gneubig.bsky.social and I are co-teaching a new class on LM inference this fall!

We designed this class to give a broad view on the space, from more classical decoding algorithms to recent methods for LLMs, plus a wide range of efficiency-focused work.

website: phontron.com/class/lminfe...
11-664/763 LM Inference
A class at Carnegie Mellon University on language model inference algorithms.
phontron.com
September 12, 2025 at 5:14 PM
Reposted by Amanda Bertsch
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9
June 9, 2025 at 1:47 PM
super excited to see folks at #NAACL25 this week! I'll be presenting our work on long-context ICL Wednesday in the 2pm poster session in Hall 3-- would love to chat with folks there or at the rest of the conference about long context data, ICL, inference time methods, New Mexican food, etc :)
April 30, 2025 at 12:03 AM
Reposted by Amanda Bertsch
How can we better think and talk about human-like qualities attributed to language technologies like LLMs? In our #CHI2025 paper, we taxonomize how text outputs from cases of user interactions with language technologies can contribute to anthropomorphism. arxiv.org/abs/2502.09870 1/n
March 6, 2025 at 3:43 AM
Reposted by Amanda Bertsch
9.6 million seconds = 1 PhD 🔥

Finally analyzed my PhD time tracking data so you can plan your own research journey more effectively: mxij.me/x/phd-learning-dynamics

For current students: I hope this helps put your journey into perspective. Wishing you all the best!
The Learning Dynamics of a PhD
This is what a PhD looks like: 9.6 million seconds of research.
mxij.me
December 23, 2024 at 10:08 PM
Reposted by Amanda Bertsch
When I started on the ARL project that funds my PhD, the thing we were supposed to build was a "MaterialsGPT".

What is a MaterialsGPT? Where does that idea come from? I got to spend a lot of time thinking about that second question with @davidthewid.bsky.social and Lucy Suchman (!) working on this:
December 17, 2024 at 2:33 PM
Reposted by Amanda Bertsch
📢 NEW Paper!

@siree.sh, Lucy Suchman, and I examine a corpus of 7,000 US Military grant solicitations to ask what the world’s largest military wants to do with AI, by looking at what it seeks to fund. #STS

📄: arxiv.org/pdf/2411.17840

We find…
December 9, 2024 at 2:18 PM
Reposted by Amanda Bertsch
when you try to convert your text into smaller pieces but all it gives you is Elvish, that’s a tolkienizer
November 20, 2024 at 5:51 PM
Reposted by Amanda Bertsch
That’s right. You might think that all successful CS academics are good at running. But that’s only because the ones who weren’t, have been eaten by bears.
A disproportionate number of successful CS academics have some intense cardio hobby. Took me some years to understand.
Every time I see someone post this image it goes viral
November 21, 2024 at 12:45 AM
Reposted by Amanda Bertsch
💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length!
🌟 In our new paper, we rethink how we should be controlling for these factors 🧵:
November 20, 2024 at 6:08 PM
Reposted by Amanda Bertsch
I'm keeping track of people at the CMU Language Technologies Institute here: go.bsky.app/NhTwCVb. Follow along!
November 12, 2024 at 2:54 PM
Reposted by Amanda Bertsch
Today is the day!! Find me at 2 PM in the Jasmine Hall (the one on the floor near food).
I will be at #EMNLP2024 presenting our work on "Extrinsic Evaluation of Cultural Competence in Large Language Models" in Poster Session 12 on Thursday 2-3:30 PM.

In this work we take the first steps towards asking whether LLMs can cater to diverse cultures in *user-facing generative* tasks.

[1/7]
November 14, 2024 at 12:19 PM
Reposted by Amanda Bertsch
(Hehe first bsky post!) I'll be at #EMNLP2024 💃🌴! Happy to chat about (among other things):
✨linguistically+cognitively motivated evaluation
✨NLP for low-resource+endangered languages
✨figuring out what features of language data LMs are *actually* learning
I'll be presenting two posters 🧵:
November 8, 2024 at 6:39 PM
Reposted by Amanda Bertsch
Taking a stand that we aren’t doing the #nlproc tag here. It’s #nlp. We used #nlproc because a decade ago the #nlp tag was full of sleazy scammers selling guides for hypnotizing women into sleeping with you.

But guess what? We won. All the sleazy scammers are doing natural language processing now.
November 8, 2024 at 3:10 AM
Reposted by Amanda Bertsch
Building/customizing your own LLM? You'll want to curate training data for it, but how do you know what makes the data good?
You can try out recipes👩‍🍳 iterate on ✨vibes✨ but we can't actually test all possible combos of tweaks,,, right?? 🙅‍♂️WRONG! arxiv.org/abs/2410.15661 (1/n) 🧵
November 5, 2024 at 10:37 PM
Reposted by Amanda Bertsch
bsky.app/profile/sire... it was so interesting to see participants' perceptions of "paradigm shifts" paralleled across eras and cycles of NLP, and at the same time nothing until recent years had reached quite the level of *47%* of ACL papers in 2021 citing BERT
We conducted long-form interviews with established NLP researchers, which reveal larger trends and forces that have been shaping the NLP research community since the 1980s.
October 12, 2023 at 3:47 PM
Reposted by Amanda Bertsch
We all know that “recently large language models have”, “large language models are”, and “large language models can.” But *why* LLMs? How did we get here? (where is “here”?) What forces are shaping NLP, and how recent are they, actually?

To appear at EMNLP 2023: arxiv.org/abs/2310.07715
October 12, 2023 at 1:59 PM
Reposted by Amanda Bertsch
Talking to the youths in 2023: did you know that "podcast" comes from a pun on "broadcast" plus the Apple iPod, a precursor to the iPhone that only played music

Talking to the youths in 2043: did you know that "tweet" comes from a pun on Twitter, a precursor to various shortform social media
October 11, 2023 at 3:40 AM