github.com/jaalu | he/him
this is the best and most detailed summary of the current state of SOTA LLM training
nanochat is good for understanding LLM training; this tech report catches you up to SOTA methods
Best fully open 32B reasoning model & best 32B base model. 🧵
Registration is open until January 1st 2026, but we recommend registering early to avoid high hotel prices
More info in the comments 👇
Read more about van der Schaar's and Boyd's and other Winter School tutorials in the comments 👇
(Homebrew is a given, but not sure which terminal emulators people prefer now, for instance)
for Diverse 3D Assets
Paper: arxiv.org/abs/2502.09615
Web: www.liuisabella.com/RigAnything/
Code: github.com/Isabella98Li...
Model: huggingface.co/Isabellaliu/...
The plan: sandwich a language model between an audio encoder/decoder pair (a neural audio codec), allowing it to predict audio continuations.
kyutai.org/next/codec-e...
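Not Kyutai's actual code, just a toy torch sketch of that sandwich: a codec encoder that quantizes waveform frames into discrete tokens, a small transformer LM over those tokens, and a codec decoder that turns tokens back into audio. Every shape, module, and constant here is made up for illustration.

```python
import torch
import torch.nn as nn

CODEBOOK = 1024   # assumed codebook size
FRAME = 320       # assumed samples per codec frame (~20 ms at 16 kHz)

class ToyCodecEncoder(nn.Module):
    """Waveform -> one discrete token id per frame, via nearest-codeword quantization."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(FRAME, 64)
        self.codebook = nn.Embedding(CODEBOOK, 64)

    def forward(self, wav):                                   # wav: (batch, samples)
        frames = wav.unfold(-1, FRAME, FRAME)                 # (batch, n_frames, FRAME)
        z = self.proj(frames)                                 # (batch, n_frames, 64)
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        return dists.argmin(-1)                               # (batch, n_frames) token ids

class ToyCodecDecoder(nn.Module):
    """Discrete token ids -> waveform."""
    def __init__(self):
        super().__init__()
        self.codebook = nn.Embedding(CODEBOOK, 64)
        self.out = nn.Linear(64, FRAME)

    def forward(self, tokens):
        return self.out(self.codebook(tokens)).flatten(1)     # (batch, samples)

class ToyAudioLM(nn.Module):
    """Transformer over codec tokens (causal mask omitted for brevity)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(CODEBOOK, 128)
        layer = nn.TransformerEncoderLayer(128, 4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(128, CODEBOOK)

    def forward(self, tokens):
        return self.head(self.backbone(self.emb(tokens)))     # (batch, seq, CODEBOOK)

# "Predict an audio continuation": encode a prompt, predict one more token, decode.
enc, dec, lm = ToyCodecEncoder(), ToyCodecDecoder(), ToyAudioLM()
prompt = torch.randn(1, FRAME * 8)                            # 8 frames of fake audio
tokens = enc(prompt)
next_token = lm(tokens)[:, -1].argmax(-1, keepdim=True)
continuation = dec(torch.cat([tokens, next_token], dim=1))    # 9 frames of audio back out
```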
Model: huggingface.co/deepseek-ai/...
Paper: github.com/deepseek-ai/...
Repo: github.com/deepseek-ai/...
* math: precision matters
* knowledge: effective param count is more important
* 4B-8bit threshold: for bigger, prefer quantization; for smaller, prefer more params
* parallel TTC (test-time compute) only works above 4B-8bit
arxiv.org/abs/2510.10964
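To make the tradeoff concrete, here's the back-of-the-envelope budget arithmetic behind those bullets (my toy numbers, not the paper's): at a fixed weight-memory budget, bits per parameter and parameter count trade off against each other, and 4B at 8-bit is the crossover point the thread names.

```python
# Rough illustration of the memory budget the bullets are about, not the paper's method.
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory in GB for a dense model."""
    return params_billion * 1e9 * bits / 8 / 1e9

# Two ways to spend the same ~4 GB of weight memory:
print(weight_memory_gb(4, 8))   # 4B params at 8-bit -> 4.0 GB
print(weight_memory_gb(8, 4))   # 8B params at 4-bit -> 4.0 GB
```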
Memory primitives were graphics-shaped, not computer-science-shaped.
Want to do math on an array? Store it as an RGBA texture.
Use a fragment shader for processing. *Paint* the result in a big rectangle.
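A rough numpy emulation of that workflow (nothing here is actual GPU code; the real thing compiled the per-texel function into a GLSL fragment shader and drew a full-screen quad):

```python
import numpy as np

data = np.arange(64, dtype=np.float32)             # the array we actually care about

# Step 1: reshape it into a W x H x 4 "RGBA texture" (4 floats per texel).
tex = data.reshape(4, 4, 4)                        # 4x4 texels, 4 channels each

# Step 2: the "fragment shader": one tiny function evaluated independently per texel.
def fragment_shader(texel: np.ndarray) -> np.ndarray:
    return np.sqrt(texel) * 2.0                    # whatever math you wanted to do

# Step 3: "paint" the result into an output texture, then read the array back out.
out_tex = np.stack(
    [fragment_shader(texel) for row in tex for texel in row]
).reshape(tex.shape)
result = out_tex.reshape(-1)
print(result[:4])
```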
📅 Abstract submission deadline: October 17th 2025
More information about submission guidelines on nldl.org
He wondered: what CAN'T be transformed by Transformers? So he wrote a fun blog post on finding "fixed points" of your LLMs. If you prompt it with a fixed-point token, the most likely next token is that same token, so the model just keeps repeating it.
link in reply!
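Not the post's code, but the brute-force version of the idea is easy to sketch with a small HF model (gpt2 here as a stand-in; the blog post presumably does something smarter than scanning the vocabulary one token at a time):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

fixed_points = []
with torch.no_grad():
    for token_id in range(1000):                     # only the first 1k tokens, for speed
        input_ids = torch.tensor([[token_id]])
        next_id = model(input_ids).logits[0, -1].argmax().item()
        if next_id == token_id:                      # prompting with t predicts t again
            fixed_points.append(tok.decode([token_id]))

print(fixed_points)
```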
Juyeop Kim, Songkuk Kim, Jong-Seok Lee
tl;dr: classifier-free guidance is to blame
arxiv.org/abs/2509.25705
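For anyone who needs the refresher on what's being blamed, classifier-free guidance combines the conditional and unconditional predictions like this (textbook form, not the paper's code):

```python
# Standard classifier-free guidance: run the denoiser with and without conditioning,
# then extrapolate by a guidance scale w. Larger w pushes harder toward the prompt.
def cfg_prediction(eps_uncond, eps_cond, w: float):
    return eps_uncond + w * (eps_cond - eps_uncond)

print(cfg_prediction(0.1, 0.4, w=7.5))   # ~2.35
```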
Main takeaway: In mechanistic interpretability, we need assumptions about how DNNs encode concepts in their representations (e.g., the linear representation hypothesis). Without them, we can claim any DNN implements any algorithm!
How we train an open-everything model in a new pretraining environment, using releasable data (Common Corpus) and an open-source framework (Nanotron from HuggingFace).
www.sciencedirect.com/science/arti...
EmbeddingGemma, the new best-in-class open embedding model! 🚀
🏆 Top multilingual model on MTEB (<500M)
💾 Runs on <200MB RAM
⚙️ Customizable output for on-device use
🧩 Integrated with your favorite tools
developers.googleblog.com/en/introduci...
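A minimal usage sketch, assuming the sentence-transformers integration and the google/embeddinggemma-300m checkpoint id (double-check both, plus the recommended query/document prompts, against the blog post):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")   # assumed checkpoint id
embeddings = model.encode([
    "Which planet is known as the Red Planet?",
    "Mars is often called the Red Planet.",
])
print(embeddings.shape)                           # (2, embedding_dim)
print(model.similarity(embeddings, embeddings))   # pairwise similarity matrix
```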
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.
Read more: actu.epfl.ch/news/apertus...