Jørgen Lund
@jaalu.bsky.social
Industry Ph.D. student in ML, DIPS AS, UiT The Arctic University of Norway

github.com/jaalu | he/him
Reposted by Jørgen Lund
I read through ~5 pages of the Olmo 3 tech report… whoa

this is the best and most detailed summary of the current state of SOTA LLM training

nanochat is good for understanding LLM training; this tech report catches you up to SOTA methods
November 20, 2025 at 2:38 PM
Reposted by Jørgen Lund
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey.
Best fully open 32B reasoning model & best 32B base model. 🧵
November 20, 2025 at 2:37 PM
Reposted by Jørgen Lund
Secure your NLDL 2026 registration before the fees increase on December 1st! 🤖 ❄️

Registration is open until January 1st 2026, but we recommend registering early to avoid high hotel prices

More info in the comments 👇
November 14, 2025 at 8:00 AM
Today I learnt that OpenSSL really does not like it if you try to pass in an X.509 certificate which only consists of the word "Blah"
November 13, 2025 at 4:37 PM
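A minimal way to reproduce this from Python, assuming the `cryptography` package (which parses certificates through OpenSSL) rather than the openssl CLI:

```python
# Sketch, not the original experiment: hand OpenSSL (via the `cryptography`
# package) a "certificate" consisting only of the word "Blah".
from cryptography import x509

try:
    x509.load_pem_x509_certificate(b"Blah")
except ValueError as err:
    print("Rejected, as expected:", err)
```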
Playing around with the PleIAs "smallest viable model" Monad, and realizing that with 4-bit quantization (storing 56M parameters in ~27 MB) and a SuperDisk drive (to use the FD32MB format), you could turn it into a chat model that fits on a standard 3.5-inch diskette
November 13, 2025 at 3:49 PM
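The arithmetic behind that claim (the ~27 MB and 32 MB FD32MB figures are from the post; the rest is plain division):

```python
# 56M parameters at 4 bits each, vs. a 32 MB FD32MB-formatted 3.5" diskette.
params = 56_000_000
bits_per_param = 4

weight_bytes = params * bits_per_param // 8
print(f"{weight_bytes / 1e6:.0f} MB ({weight_bytes / 2**20:.1f} MiB) of raw weights")
print("FD32MB capacity: 32 MB")  # requires a SuperDisk LS-240 drive
```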
Reposted by Jørgen Lund
We are excited to have Mihaela van der Schaar and Anders Boyd as Winter School speakers at the Northern Lights Deep Learning Conference 2026!

Read more about van der Schaar's and Boyd's and other Winter School tutorials in the comments 👇
November 5, 2025 at 2:14 PM
Having used the Mac for a bit, I realize most of my time is spent in the same software I was using on Windows (Obsidian, Zotero, Marimo), but Homebrew is quite nice, Ghostty is a good terminal, and UTM is a good QEMU/virtualization frontend for the Windows things I do need
As someone who hasn't used macOS in a minute (since... Sierra?), which macOS utilities do people on #MLSky recommend?

(Homebrew is a given, but not sure which terminal emulators people prefer now, for instance)
October 27, 2025 at 12:45 PM
Reposted by Jørgen Lund
> RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets

Paper: arxiv.org/abs/2502.09615
Web: www.liuisabella.com/RigAnything/
Code: github.com/Isabella98Li...
Model: huggingface.co/Isabellaliu/...
October 25, 2025 at 1:05 AM
Reposted by Jørgen Lund
Neural audio codecs: how to get audio into LLMs

The plan: sandwich a language model between an audio encoder/decoder pair (= a neural audio codec), allowing it to predict audio continuations.

kyutai.org/next/codec-e...
October 24, 2025 at 9:43 AM
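A toy sketch of that sandwich (not Kyutai's actual codec): a nearest-neighbour codebook stands in for the learned encoder/decoder, and any autoregressive LM would operate on the resulting token ids.

```python
# Toy codec "sandwich": waveform -> discrete tokens -> (LM) -> tokens -> waveform.
import torch

frame_size, codebook_size = 160, 1024
codebook = torch.randn(codebook_size, frame_size)  # a real codec learns these vectors

def encode(waveform: torch.Tensor) -> torch.Tensor:
    """Frame the audio and map each frame to its nearest codebook entry (a token id)."""
    usable = len(waveform) // frame_size * frame_size
    frames = waveform[:usable].reshape(-1, frame_size)
    return torch.cdist(frames, codebook).argmin(dim=-1)

def decode(tokens: torch.Tensor) -> torch.Tensor:
    """Look up each token's codebook vector and concatenate back into audio."""
    return codebook[tokens].reshape(-1)

audio = torch.randn(16_000)    # one second of fake 16 kHz audio
tokens = encode(audio)         # the LM in the middle is trained on sequences like this
# continuation = language_model.generate(tokens)   # hypothetical autoregressive LM
print(tokens.shape, decode(tokens).shape)          # torch.Size([100]) torch.Size([16000])
```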
Reposted by Jørgen Lund
Is 32B-4bit equal to 16B-8bit? Depends on the task

* math: precision matters
* knowledge: effective param count is more important
* 4B-8bit is the threshold: above it, prefer quantization; below it, prefer more params
* parallel TTC (test-time compute) only works above 4B-8bit

arxiv.org/abs/2510.10964
October 15, 2025 at 11:10 AM
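The memory side of the comparison is easy to sanity-check (plain arithmetic, not from the paper): both configurations occupy roughly the same space for weights, which is what makes the question interesting.

```python
# Same weight footprint, different parameter/precision tradeoff.
def weight_gb(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * bits / 8 / 1e9

print(weight_gb(32, 4))  # 32B params at 4-bit -> 16.0 GB
print(weight_gb(16, 8))  # 16B params at 8-bit -> 16.0 GB
```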
Reposted by Jørgen Lund
GPU computing before CUDA was *weird*.

Memory primitives were graphics-shaped, not computer-science-shaped.

Want to do math on an array? Store it as an RGBA texture.

A fragment shader does the processing. *Paint* the result into a big rectangle.
October 14, 2025 at 8:43 PM
Reposted by Jørgen Lund
nanochat by Andrej Karpathy is neat - 8,000 lines of code (mostly Python, a tiny bit of Rust) that can train an LLM on $100 of rented cloud compute, which can then be served with a web chat UI on a much smaller machine. simonwillison.net/2025/Oct/13/...
nanochat
Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference and a web UI, that can be …
simonwillison.net
October 14, 2025 at 1:58 AM
Reposted by Jørgen Lund
Only 7 days left to submit your abstract for the Northern Lights Deep Learning Conference 2026! 🤖 ❄️

📅 Abstract submission deadline: October 17th 2025

More information about submission guidelines on nldl.org
October 10, 2025 at 10:26 AM
Reposted by Jørgen Lund
Defying Transformers: Searching for "Fixed Points" of Pretrained LLMs by Jiacheng Liu

He wondered: what CAN'T be transformed by Transformers? So he wrote a fun blog post on finding "fixed points" of your LLMs. If you prompt it with a fixed-point token,
October 10, 2025 at 1:08 AM
Reposted by Jørgen Lund
new blog post! why do LLMs freak out over the seahorse emoji? i put llama-3.3-70b through its paces with the logit lens to find out, and explain what the logit lens (everyone's favorite underrated interpretability tool) is in the process.

link in reply!
October 5, 2025 at 2:36 PM
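For anyone new to the logit lens: the idea is just to push each layer's hidden state through the model's final norm and unembedding and see which token that layer would already predict. A minimal sketch with a small Hugging Face model (GPT-2 here purely for illustration, not the llama-3.3-70b from the post):

```python
# Logit-lens sketch: decode every layer's residual stream with ln_f + lm_head.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for layer, hidden in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1]))  # last position only
    print(f"layer {layer:2d}: {tokenizer.decode(logits.argmax(dim=-1))!r}")
```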
Reposted by Jørgen Lund
How Diffusion Models Memorize

Juyeop Kim, Songkuk Kim, Jong-Seok Lee
tl;dr: classifier-free guidance is to blame
arxiv.org/abs/2509.25705
October 1, 2025 at 10:48 AM
Reposted by Jørgen Lund
Very happy this paper got accepted to NeurIPS 2025 as a Spotlight! 😁

Main takeaway: In mechanistic interpretability, we need assumptions about how DNNs encode concepts in their representations (e.g., the linear representation hypothesis). Without them, we can claim any DNN implements any algorithm!
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No!⚠️ In our new paper, we show many mech interp methods implicitly rely on the linear representation hypothesis🧵
October 1, 2025 at 3:00 PM
Reposted by Jørgen Lund
really neat, clear explainer for the new work on "central flows" to theoretically model learning dynamics
Understanding Optimization in Deep Learning with Central Flows
centralflows.github.io
October 1, 2025 at 12:20 PM
Half-serious question: Could one tackle every "language models produce X because Y is in the training data" hypothesis in one go by training a large retrieval transformer, and doing side-by-side evaluations with/without a filter for Y on the retriever? #MLSky
October 2, 2025 at 8:01 AM
Reposted by Jørgen Lund
And new paper out: Pleias 1.0: the First Family of Language Models Trained on Fully Open Data

How we train an open-everything model in a new pretraining environment with releasable data (Common Corpus) and an open-source framework (Nanotron from HuggingFace).

www.sciencedirect.com/science/arti...
September 27, 2025 at 11:44 AM
Stepping outside my lane for a bit, NPM really needs trusted publishing/provenance, but I think a per-package "will use secure-context-only features" flag would go a long way too; there are few good reasons for a package to suddenly _start_ calling eval(), using Fetch, or executing arbitrary commands
ctrl/tinycolor and 40+ NPM Packages Compromised - StepSecurity
The popular @ctrl/tinycolor package with over 2 million weekly downloads has been compromised alongside 40+ other NPM packages in a sophisticated supply chain attack dubbed
www.stepsecurity.io
September 18, 2025 at 1:38 PM
As someone interested in cryptography and privacy-preserving measures, it's cool to see differential privacy applied to ML in practice like this - it seems to basically stop memorization of passages from the training set, even allowing for approximate matches (up to 10% edit distance)
luokai @luok.ai · Sep 13
VaultGemma just dropped as the world’s largest open-source, differentially private LLM—think privacy meets power!

Built on Google’s Gemma, it’s designed for responsible AI, using new scaling laws to balance privacy, compute, and utility.
September 16, 2025 at 2:25 PM
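The mechanism behind that guarantee is DP-SGD: clip each example's gradient, add calibrated Gaussian noise, then take the averaged step (the accounting that turns the noise into an (ε, δ) budget is a separate piece). A toy sketch of the recipe, not VaultGemma's actual training code:

```python
# Toy DP-SGD step: per-example clipping + Gaussian noise (privacy accounting omitted).
import torch

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    # Scale each example's gradient so its norm is at most clip_norm.
    clipped = [g * (clip_norm / (g.norm() + 1e-12)).clamp(max=1.0)
               for g in per_example_grads]
    noisy_sum = torch.stack(clipped).sum(dim=0) \
        + torch.randn_like(params) * noise_multiplier * clip_norm
    return params - lr * noisy_sum / len(per_example_grads)

params = torch.zeros(4)
grads = [torch.randn(4) for _ in range(8)]   # stand-in per-example gradients
print(dp_sgd_step(params, grads))
```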
Reposted by Jørgen Lund
We've just released an amazing Embedding model:

EmbeddingGemma, the new best-in-class open embedding model! 🚀

🏆 Top multilingual model on MTEB (<500M)
💾 Runs on <200MB RAM
⚙️ Customizable output for on-device use
🧩 Integrated with your favorite tools

developers.googleblog.com/en/introduci...
Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings - Google Developers Blog
Discover EmbeddingGemma, Google's new on-device embedding model designed for efficient on-device AI, enabling features like RAG and semantic search.
developers.googleblog.com
September 4, 2025 at 5:17 PM
Reposted by Jørgen Lund
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model.
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.

Read more: actu.epfl.ch/news/apertus...
September 2, 2025 at 11:48 AM