Pete Shaw
@ptshaw.bsky.social
1.6K followers 360 following 9 posts
Research Scientist at Google DeepMind. Mostly work on ML, NLP, and BioML. Based in Seattle. http://ptshaw.com
ptshaw.bsky.social
We hope this work adds some conceptual clarity around how Kolmogorov complexity relates to neural networks, and provides a path towards identifying new complexity measures that enable greater compression and generalization.
ptshaw.bsky.social
We prove that asymptotically optimal objectives exist for Transformers, building on a new demonstration of their computational universality. We also highlight potential challenges related to effectively optimizing such objectives.
ptshaw.bsky.social
To address this question, we define the notion of asymptotically optimal description length objectives. We establish that a minimizer of such an objective achieves optimal compression, for any dataset, up to an additive constant, in the limit as model resource bounds increase.
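One rough way to read this guarantee (my paraphrase; see the paper for the precise statement): writing L_t for a description length objective under resource bound t and \theta_t^* for its minimizer, asymptotic optimality amounts to something like

  L_t(\theta_t^*, D) \le K(D) + c   as t \to \infty, for every dataset D,

where K(D) is the Kolmogorov complexity of the dataset D and c is a constant that does not depend on D.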
ptshaw.bsky.social
The Kolmogorov complexity of an object is the length of the shortest program that prints that object. Combining Kolmogorov complexity with the MDL principle provides an elegant foundation for formalizing Occam’s razor. But how can these ideas be applied to neural networks?
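For reference, the standard definitions in generic notation (mine, not necessarily the paper's): relative to a fixed universal machine U, the Kolmogorov complexity of a string x is

  K_U(x) = \min \{ |p| : U(p) = x \},

the length in bits of the shortest program p that prints x. Two-part MDL then prefers the hypothesis H minimizing L(H) + L(D \mid H), the cost of describing the model plus the cost of describing the data given the model.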
ptshaw.bsky.social
Excited to share a new paper that aims to narrow the conceptual gap between the idealized notion of Kolmogorov complexity and practical complexity measures for neural networks.
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
Reposted by Pete Shaw
abeirami.bsky.social
Excited to share InfAlign!

The alignment optimization objective implicitly assumes *sampling* from the resulting aligned model. But we are increasingly using different, and sometimes sophisticated, inference-time compute algorithms.

How to resolve this discrepancy? 🧵
InfAlign: Inference-aware language model alignment
Ananth Balashankar, Ziteng Sun, Jonathan Berant, Jacob Eisenstein, Michael Collins, Adrian Hutter, Jong Lee, Chirag Nagpal, Flavien Prost, Aradhana Sinha, Ananda Theertha Suresh, Ahmad Beirami
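For context, the alignment objective being referred to is typically the KL-regularized reward maximization used in RLHF-style training (my gloss of the setup, not a quote from the paper):

  \max_\pi  E_{x \sim D, \, y \sim \pi(\cdot|x)} [ r(x, y) ] - \beta \, KL( \pi(\cdot|x) \| \pi_{ref}(\cdot|x) ),

which scores the policy \pi as if responses were sampled directly from it. Inference-time procedures such as best-of-N sampling change the distribution actually used at test time, which is the mismatch the thread points at.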
ptshaw.bsky.social
I'll be at NeurIPS this week. Please reach out if you would like to chat!
Reposted by Pete Shaw
kevinkaichuang.bsky.social
Two BioML starter packs now:

Pack 1: go.bsky.app/2VWBcCd
Pack 2: go.bsky.app/Bw84Hmc

DM if you want to be included (or nominate people who should be!)
kevinkaichuang.bsky.social
I tried to make a bioml starter pack. DM if you want me to add or remove you?

go.bsky.app/2VWBcCd
kevinkaichuang.bsky.social
Anybody have a bioml starter pack?
ptshaw.bsky.social
Hi Marc, thanks for putting this together, mind adding me?
Reposted by Pete Shaw
maosbot.bsky.social
New here? Interested in AI/ML? Check out these great starter packs!

AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS

You can also search all starter packs here: blueskydirectory.com/starter-pack...
ptshaw.bsky.social
Getting set up on Bluesky today!
Reposted by Pete Shaw
jacobeisenstein.bsky.social
I’m pretty excited about this one!

ALTA is A Language for Transformer Analysis.

Because ALTA programs can be compiled to transformer weights, it provides constructive proofs of transformer expressivity. It also offers new analytic tools for *learnability*.

arxiv.org/abs/2410.18077
ALTA: Compiler-Based Analysis of Transformers
We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights. ALTA is inspired by RASP, a language proposed by Weiss et al. (2021), and Tracr (Lin...
arxiv.org