Pete Shaw
@ptshaw.bsky.social
1.6K followers 360 following 9 posts
Research Scientist at Google DeepMind. Mostly work on ML, NLP, and BioML. Based in Seattle. http://ptshaw.com
ptshaw.bsky.social
We hope this work adds some conceptual clarity around how Kolmogorov complexity relates to neural networks, and provides a path towards identifying new complexity measures that enable greater compression and generalization.
ptshaw.bsky.social
We prove that asymptotically optimal objectives exist for Transformers, building on a new demonstration of their computational universality. We also highlight potential challenges related to effectively optimizing such objectives.
ptshaw.bsky.social
To address this question, we define the notion of asymptotically optimal description length objectives. We establish that a minimizer of such an objective achieves optimal compression, for any dataset, up to an additive constant, in the limit as model resource bounds increase.
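One rough way to read this guarantee (my paraphrase; see the paper for the precise statement): writing L_t for a description length objective under resource bound t and \theta_t^* for its minimizer, asymptotic optimality amounts to something like

  L_t(\theta_t^*, D) \le K(D) + c   as t \to \infty, for every dataset D,

where K(D) is the Kolmogorov complexity of the dataset D and c is a constant that does not depend on D.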
ptshaw.bsky.social
The Kolmogorov complexity of an object is the length of the shortest program that prints that object. Combining Kolmogorov complexity with the MDL principle provides an elegant foundation for formalizing Occam’s razor. But how can these ideas be applied to neural networks?
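For reference, the standard definitions in generic notation (mine, not necessarily the paper's): relative to a fixed universal machine U, the Kolmogorov complexity of a string x is

  K_U(x) = \min \{ |p| : U(p) = x \},

the length in bits of the shortest program p that prints x. Two-part MDL then prefers the hypothesis H minimizing L(H) + L(D \mid H), the cost of describing the model plus the cost of describing the data given the model.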
ptshaw.bsky.social
Excited to share a new paper that aims to narrow the conceptual gap between the idealized notion of Kolmogorov complexity and practical complexity measures for neural networks.
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
Reposted by Pete Shaw
abeirami.bsky.social
Excited to share InfAlign!

The alignment optimization objective implicitly assumes *sampling* from the resulting aligned model. But we are increasingly using different, and sometimes sophisticated, inference-time compute algorithms.

How to resolve this discrepancy? 🧵
InfAlign: Inference-aware language model alignment
Ananth Balashankar, Ziteng Sun, Jonathan Berant, Jacob Eisenstein, Michael Collins, Adrian Hutter, Jong Lee, Chirag Nagpal, Flavien Prost, Aradhana Sinha, Ananda Theertha Suresh, Ahmad Beirami
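For context, the alignment objective being referred to is typically the KL-regularized reward maximization used in RLHF-style training (my gloss of the setup, not a quote from the paper):

  \max_\pi  E_{x \sim D, \, y \sim \pi(\cdot|x)} [ r(x, y) ] - \beta \, KL( \pi(\cdot|x) \| \pi_{ref}(\cdot|x) ),

which scores the policy \pi as if responses were sampled directly from it. Inference-time procedures such as best-of-N sampling change the distribution actually used at test time, which is the mismatch the thread points at.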
ptshaw.bsky.social
I'll be at NeurIPS this week. Please reach out if you would like to chat!
Reposted by Pete Shaw
kevinkaichuang.bsky.social
Two BioML starter packs now:

Pack 1: go.bsky.app/2VWBcCd
Pack 2: go.bsky.app/Bw84Hmc

DM if you want to be included (or nominate people who should be!)
kevinkaichuang.bsky.social
I tried to make a bioml starter pack. DM if you want me to add or remove you?

go.bsky.app/2VWBcCd
kevinkaichuang.bsky.social
Anybody have a bioml starter pack?
ptshaw.bsky.social
Hi Marc, thanks for putting this together, mind adding me?
Reposted by Pete Shaw
maosbot.bsky.social
New here? Interested in AI/ML? Check out these great starter packs!

AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS

You can also search all starter packs here: blueskydirectory.com/starter-pack...
ptshaw.bsky.social
Getting set up on Bluesky today!
Reposted by Pete Shaw
jacobeisenstein.bsky.social
I’m pretty excited about this one!

ALTA is A Language for Transformer Analysis.

Because ALTA programs can be compiled to transformer weights, it provides constructive proofs of transformer expressivity. It also offers new analytic tools for *learnability*.

arxiv.org/abs/2410.18077
ALTA: Compiler-Based Analysis of Transformers
We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights. ALTA is inspired by RASP, a language proposed by Weiss et al. (2021), and Tracr (Lin...
arxiv.org