Houjun Liu
@jemoka.com
290 followers 610 following 59 posts
NLP & POMDPs; CS@Stanford; gradient descent enthusiast www: jemoka.com ac: nlp.stanford.edu/~houjun/
Reposted by Houjun Liu
jemoka.com
Introducing 𝘁𝗵𝗼𝘂𝗴𝗵𝘁𝗯𝘂𝗯𝗯𝗹𝗲𝘀: a *fully unsupervised* LM for input-adaptive parallel latent reasoning

✅ Learn yourself a reasoning model with normal pretraining
✅ Better perplexity compared to fixed thinking tokens

No fancy loss, no chain of thought labels 🚀
jemoka.com
I'm really excited about this. Because this model is trained with literally nothing but LM loss, it helps create a new reasoning paradigm where reasoning capabilities are baked right in at pretraining, unifying train and test time behaviors.
Look ma, no distribution shift! 🙏
jemoka.com
Better yet, without us teaching the model to do this at all, it learned to allocate more compute at tokens of higher entropy (even as measured by an independently trained model of the same architecture), and use less compute where there's either too little or too much entropy. 🤯
jemoka.com
By just using our approach, you don't have to do any extra work to get pretraining gains! We show, across scales AND under matched computation, that our approach achieves better pretraining perplexity than both regular transformers and manually inserted non-adaptive thinking tokens. 🥳
jemoka.com
We design a transformer variant that uses a score-attenuated "forking" mechanism to clone useful residuals the model wants to update and attend to, creating a 𝗯𝘂𝗯𝗯𝗹𝗲 of latent computation around those highly informative tokens.
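A minimal toy sketch of the forking idea described above, assuming a hard threshold and fixed bubble size for illustration (the actual mechanism is learned and score-attenuated; `fork_bubbles`, `threshold`, and `bubble_size` are hypothetical names, not from the paper):

```python
import numpy as np

def fork_bubbles(residuals, scores, threshold=0.5, bubble_size=2):
    """Clone the residual stream of high-score tokens, appending extra
    latent positions (a 'bubble') that downstream layers can attend to."""
    out = []
    for h, s in zip(residuals, scores):
        out.append(h)                             # original token residual
        if s > threshold:                         # informative token: fork
            out.extend([h.copy()] * bubble_size)  # cloned latent slots
    return np.stack(out)

h = np.random.randn(4, 8)           # 4 tokens, hidden size 8
s = np.array([0.1, 0.9, 0.2, 0.8])  # per-token fork scores
print(fork_bubbles(h, s).shape)     # → (8, 8): 4 originals + 2 forks × 2 slots
```

The point of the sketch is only the shape arithmetic: compute is allocated per token, so the sequence grows where the score says extra latent reasoning is worthwhile.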
jemoka.com
Current approaches to scaling inference-time compute require supervision with explicit chain-of-thought data, which limits thoughts to being sequential and expressed in human language only. 😔
Wouldn't it be nice if you could do normal pretraining and somehow get latent thinking for free? 🤔
Reposted by Houjun Liu
jemoka.com
New Paper Day! For EMNLP findings—in LM red-teaming, we show you have to optimize for **both** perplexity and toxicity to get high-probability, hard-to-filter, natural attacks!
jemoka.com
Thanks to @schmidtsciences.bsky.social and Lambda Labs for generously supporting our work :)
jemoka.com
☝️ And so.... You should optimize for **BOTH** attack success and perplexity to get the most effective attacks!
jemoka.com
Even across baseline methods, low-perplexity prompts result in more effective attacks, but optimizing for attack success alone results in high-perplexity prompts.
jemoka.com
In fact, our method allows us to discover a Pareto tradeoff (🤯) between attack success and prompt likelihood; tuning a single parameter in our method travels along the Pareto-optimal front.
jemoka.com
Using the Adaptive Stress Testing (AST) framework as a reward signal for online DPO-based optimization, we present a method that discovers prompts that are **both** high-probability and successful as attacks.
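A toy sketch of the core idea in this post, blending attack success with prompt likelihood in one reward (the weighting scheme and the name `combined_reward` are illustrative assumptions, not the paper's exact AST reward):

```python
def combined_reward(attack_success, prompt_logprob, lam=0.5):
    """Blend attack success with prompt log-likelihood so optimization
    favors attacks that are both effective and natural-sounding.
    lam trades off the two objectives (cf. the Pareto front in the thread)."""
    return lam * attack_success + (1.0 - lam) * prompt_logprob

# With success tied, the more natural (higher log-prob) prompt scores higher:
r_natural = combined_reward(0.9, prompt_logprob=-10.0)
r_weird   = combined_reward(0.9, prompt_logprob=-50.0)
print(r_natural > r_weird)  # → True
```

Optimizing attack success alone corresponds to `lam=1.0`, which is exactly the regime the thread warns about: effective but high-perplexity, easy-to-filter prompts.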
jemoka.com
Most gradient-based red-teaming approaches produce very low-probability prompts, which previous work has shown are both easier to filter and poor negative examples for downstream hardening.
jemoka.com
Done at the Stanford Intelligent Systems Laboratory with my joint first author Amelia Hardy, along with our wonderful collaborators Allie Griffith, @bernardlange.bsky.social, Duncan Eddy, and Mykel Kochenderfer.

Paper:
arxiv.org/pdf/2407.09447
Python package to do this for yourself:
github.com/sisl/astra-rl
Reposted by Houjun Liu
haskell.org
You're not too dumb for Haskell, you just need a reason to practice. :)
Reposted by Houjun Liu
joss-openjournals.bsky.social
Just published in JOSS: 'Turftopic: Topic Modelling with Contextual Representations from Sentence Transformers' https://doi.org/10.21105/joss.08183
Reposted by Houjun Liu
martin.kleppmann.com
OCaml @ocaml.org is in The Economist!
economist.com
Jane Street is the quant shop's quant shop. The goose that lays the golden egg is its tech system, which is built rather unusually https://econ.trib.al/MPdov6Y
Jane Street’s sneaky retention tactic
It involves the use of an obscure, French programming language
Reposted by Houjun Liu
tticconnect.bsky.social
We’re proud to announce three new tenure-track assistant professors joining TTIC in Fall 2026: Yossi Gandelsman, Will Merrill, and Nick Tomlin (@nickatomlin.bsky.social). Meet them here: buff.ly/JH1DFtT
Reposted by Houjun Liu
mathurinmassias.bsky.social
New paper on the generalization of Flow Matching www.arxiv.org/abs/2506.03719

🤯 Why does flow matching generalize? Did you know that the flow matching target you're trying to learn *can only generate training points*?

w @quentinbertrand.bsky.social @annegnx.bsky.social @remiemonet.bsky.social 👇👇👇
Reposted by Houjun Liu
jemoka.com
New Paper Day! For ACL 2025 Findings:

You should **drop dropout** when you are training your LMs AND MLMs!