@juliekallini.bsky.social
Reposted
jemoka.com
New Paper Day! For EMNLP Findings: in LM red-teaming, we show you have to optimize for **both** perplexity and toxicity to get high-probability, hard-to-filter, and natural attacks!
Reposted
jemoka.com
New Paper Day! For ACL 2025 Findings:

You should **drop dropout** when you are training your LMs AND MLMs!
Reposted
kmahowald.bsky.social
I might be able to hire a postdoc for this fall in computational linguistics at UT Austin. Topics in the general LLM + cognitive space (particularly reasoning, chain of thought, LLMs + code) and LLM + linguistic space. If this could be of interest, feel free to get in touch!
Reposted
kaitlynzhou.bsky.social
Life update! Excited to announce that I’ll be starting as an assistant professor at Cornell Info Sci in August 2026! I’ll be recruiting students this upcoming cycle!

An abundance of thanks to all my mentors and friends who helped make this possible!!
juliekallini.bsky.social
I once again want to thank my wonderful coauthors for making this work possible!

@shikharmurty.bsky.social Chris Manning @cgpotts.bsky.social @robertcsordas.bsky.social

Can’t wait to connect with folks @iclr-conf.bsky.social—come say hi if you're around!
juliekallini.bsky.social
As the models get larger, MrT5 gets better.

At 1.23B params, the gap in PPL between ByT5 and MrT5 shrinks dramatically—suggesting that MrT5’s deletion mechanism scales effectively with model size.

This means: better efficiency–performance trade-offs in high-resource settings.
juliekallini.bsky.social
MrT5 is a variant of ByT5 that dynamically shortens inputs for faster inference, addressing the limitations of tokenizer-free modeling!

In the final version, we include:
- A new controller algorithm for targeted compression rates
- More baselines and downstream tasks
- MrT5 at 1.23B parameter scale
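For intuition only, here is a minimal toy sketch (not the paper's implementation) of the general idea: a learned gate scores byte positions after an early encoder layer and prunes the lowest-scoring ones, so the remaining layers run over a shorter sequence. The `DeleteGate` module and the 50% keep ratio are illustrative assumptions, not MrT5's actual mechanism.

```python
import torch
import torch.nn as nn

class DeleteGate(nn.Module):
    """Toy gate: scores each byte position; low-scoring positions are dropped
    so later encoder layers attend over a shorter sequence."""
    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor, keep_ratio: float = 0.5):
        # hidden: (batch, seq_len, d_model) from an early encoder layer
        scores = self.scorer(hidden).squeeze(-1)        # (batch, seq_len)
        k = max(1, int(hidden.size(1) * keep_ratio))    # positions to keep
        keep_idx = scores.topk(k, dim=-1).indices.sort(dim=-1).values
        batch_idx = torch.arange(hidden.size(0)).unsqueeze(-1)
        return hidden[batch_idx, keep_idx], keep_idx    # shortened states

# Example: halve a batch of 128 byte-level hidden states
gate = DeleteGate(d_model=64)
shortened, kept = gate(torch.randn(2, 128, 64))
print(shortened.shape)  # torch.Size([2, 64, 64])
```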
juliekallini.bsky.social
If you’re at #ICLR2025, come see me present 💪MrT5 on Thursday (4/24)!

🪧 Poster: 10–12:30 in Hall 3 + 2B (#273)
⚡️ Lightning talk: right after in Opal 103–104 (Session on Tokenizer-Free, End-to-end Architectures)

Plus, MrT5 has many exciting updates 🧵
Reposted
mbartelds.bsky.social
🎙️ Speech recognition is great - if you speak the right language.

Our new @stanfordnlp.bsky.social paper introduces CTC-DRO, a training method that reduces worst-language errors by up to 47.1%.

Work w/ Ananjan, Moussa, @jurafsky.bsky.social, Tatsu Hashimoto and Karen Livescu.

Here’s how it works 🧵
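To give a rough flavor of this family of methods, here is a sketch of plain group DRO applied to per-language CTC losses; CTC-DRO modifies this update, so treat the weighting scheme, the `lang_ids` input, and the `eta` step size below as illustrative assumptions rather than the paper's method.

```python
import torch
import torch.nn.functional as F

def group_dro_ctc_step(log_probs, targets, input_lens, target_lens,
                       lang_ids, group_weights, eta=0.01):
    """One reweighted step: languages with higher recent CTC loss get
    exponentially larger weight, focusing training on the worst languages."""
    # log_probs: (T, batch, vocab) log-softmax outputs; lang_ids: (batch,)
    per_utt = F.ctc_loss(log_probs, targets, input_lens, target_lens,
                         blank=0, reduction="none")
    group_losses = torch.zeros_like(group_weights)
    for g in range(group_weights.numel()):
        mask = lang_ids == g
        if mask.any():
            group_losses[g] = per_utt[mask].mean()
    # Exponentiated-gradient update on the language weights (standard group DRO)
    new_weights = group_weights * torch.exp(eta * group_losses.detach())
    new_weights = new_weights / new_weights.sum()
    loss = (new_weights * group_losses).sum()   # backprop through this
    return loss, new_weights
```

In practice the log-probs would come from the acoustic model and `group_weights` would be carried across training steps.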
juliekallini.bsky.social
"Mission: Impossible" was featured in Quanta Magazine! Big thank you to @benbenbrubaker.bsky.social for the wonderful article covering our work on impossible languages. Ben was so thoughtful and thorough in all our conversations, and it really shows in his writing!
Reposted
kmahowald.bsky.social
Quanta write-up of our Mission: Impossible Language Models work, led by @juliekallini.bsky.social. As the photos suggest, Richard, @isabelpapad.bsky.social, and I do all our work sitting together around a single laptop and pointing at the screen.