@juliekallini.bsky.social
Reposted
jemoka.com
New Paper Day! For EMNLP Findings: in LM red-teaming, we show you have to optimize for **both** perplexity and toxicity to get high-probability, hard-to-filter, and natural attacks!
Reposted
jemoka.com
New Paper Day! For ACL 2025 Findings:

You should **drop dropout** when you are training your LMs AND MLMs!
Reposted
kmahowald.bsky.social
I might be able to hire a postdoc for this fall in computational linguistics at UT Austin. Topics in the general LLM + cognitive space (particularly reasoning, chain of thought, LLMs + code) and LLM + linguistic space. If this could be of interest, feel free to get in touch!
Reposted
kaitlynzhou.bsky.social
Life update! Excited to announce that I’ll be starting as an assistant professor at Cornell Info Sci in August 2026! I’ll be recruiting students this upcoming cycle!

An abundance of thanks to all my mentors and friends who helped make this possible!!
juliekallini.bsky.social
I once again want to thank my wonderful coauthors for making this work possible!

@shikharmurty.bsky.social Chris Manning @cgpotts.bsky.social @robertcsordas.bsky.social

Can’t wait to connect with folks @iclr-conf.bsky.social—come say hi if you're around!
juliekallini.bsky.social
As the models get larger, MrT5 gets better.

At 1.23B params, the gap in PPL between ByT5 and MrT5 shrinks dramatically—suggesting that MrT5’s deletion mechanism scales effectively with model size.

This means: better efficiency–performance trade-offs in high-resource settings.
juliekallini.bsky.social
MrT5 is a variant of ByT5 that dynamically shortens inputs for faster inference, addressing the limitations of tokenizer-free modeling!

In the final version, we include:
- A new controller algorithm for targeted compression rates
- More baselines and downstream tasks
- MrT5 at 1.23B parameter scale
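For intuition only, here is a minimal toy sketch (not the paper's implementation) of the general idea: a learned gate scores byte positions after an early encoder layer and prunes the lowest-scoring ones, so the remaining layers run over a shorter sequence. The `DeleteGate` module and the 50% keep ratio are illustrative assumptions, not MrT5's actual mechanism.

```python
import torch
import torch.nn as nn

class DeleteGate(nn.Module):
    """Toy gate: scores each byte position; low-scoring positions are dropped
    so later encoder layers attend over a shorter sequence."""
    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor, keep_ratio: float = 0.5):
        # hidden: (batch, seq_len, d_model) from an early encoder layer
        scores = self.scorer(hidden).squeeze(-1)        # (batch, seq_len)
        k = max(1, int(hidden.size(1) * keep_ratio))    # positions to keep
        keep_idx = scores.topk(k, dim=-1).indices.sort(dim=-1).values
        batch_idx = torch.arange(hidden.size(0)).unsqueeze(-1)
        return hidden[batch_idx, keep_idx], keep_idx    # shortened states

# Example: halve a batch of 128 byte-level hidden states
gate = DeleteGate(d_model=64)
shortened, kept = gate(torch.randn(2, 128, 64))
print(shortened.shape)  # torch.Size([2, 64, 64])
```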
juliekallini.bsky.social
If you’re at #ICLR2025, come see me present 💪MrT5 on Thursday (4/24)!

🪧 Poster: 10–12:30 in Hall 3 + 2B (#273)
⚡️ Lightning talk: right after in Opal 103–104 (Session on Tokenizer-Free, End-to-end Architectures)

Plus, MrT5 has many exciting updates 🧵
Reposted
mbartelds.bsky.social
🎙️ Speech recognition is great - if you speak the right language.

Our new @stanfordnlp.bsky.social paper introduces CTC-DRO, a training method that reduces worst-language errors by up to 47.1%.

Work w/ Ananjan, Moussa, @jurafsky.bsky.social, Tatsu Hashimoto and Karen Livescu.

Here’s how it works 🧵
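To give a rough flavor of this family of methods, here is a sketch of plain group DRO applied to per-language CTC losses; CTC-DRO modifies this update, so treat the weighting scheme, the `lang_ids` input, and the `eta` step size below as illustrative assumptions rather than the paper's method.

```python
import torch
import torch.nn.functional as F

def group_dro_ctc_step(log_probs, targets, input_lens, target_lens,
                       lang_ids, group_weights, eta=0.01):
    """One reweighted step: languages with higher recent CTC loss get
    exponentially larger weight, focusing training on the worst languages."""
    # log_probs: (T, batch, vocab) log-softmax outputs; lang_ids: (batch,)
    per_utt = F.ctc_loss(log_probs, targets, input_lens, target_lens,
                         blank=0, reduction="none")
    group_losses = torch.zeros_like(group_weights)
    for g in range(group_weights.numel()):
        mask = lang_ids == g
        if mask.any():
            group_losses[g] = per_utt[mask].mean()
    # Exponentiated-gradient update on the language weights (standard group DRO)
    new_weights = group_weights * torch.exp(eta * group_losses.detach())
    new_weights = new_weights / new_weights.sum()
    loss = (new_weights * group_losses).sum()   # backprop through this
    return loss, new_weights
```

In practice the log-probs would come from the acoustic model and `group_weights` would be carried across training steps.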
juliekallini.bsky.social
"Mission: Impossible" was featured in Quanta Magazine! Big thank you to @benbenbrubaker.bsky.social for the wonderful article covering our work on impossible languages. Ben was so thoughtful and thorough in all our conversations, and it really shows in his writing!
Reposted
kmahowald.bsky.social
Quanta write-up of our Mission: Impossible Language Models work, led by @juliekallini.bsky.social. As the photos suggest, Richard, @isabelpapad.bsky.social, and I do all our work sitting together around a single laptop and pointing at the screen.