Sebastian Raschka (rasbt)
@sebastianraschka.com
9.5K followers 240 following 250 posts
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://amzn.to/4fqvn0D) & reasoning (https://mng.bz/Nwr7). Also blogging about AI research at magazine.sebastianraschka.com.
Pinned
sebastianraschka.com
How do we evaluate LLMs?
I wrote up a new article on
(1) multiple-choice benchmarks,
(2) verifiers,
(3) leaderboards, and
(4) LLM judges

All with from-scratch code examples, of course!

sebastianraschka.com/blog/2025/ll...
Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
sebastianraschka.com
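To give a flavor of the from-scratch style, here is a minimal sketch of the multiple-choice approach (a hypothetical example, not the article's code; query_model stands in for whatever LLM call you use, and the question is made up for illustration):

def query_model(prompt: str) -> str:
    # Placeholder: in practice this would call a local or hosted LLM
    return "B"

questions = [
    {"question": "2 + 2 = ?", "choices": {"A": "3", "B": "4", "C": "5"}, "answer": "B"},
]

correct = 0
for q in questions:
    choices = "\n".join(f"{key}. {text}" for key, text in q["choices"].items())
    prompt = f"{q['question']}\n{choices}\nAnswer with a single letter."
    prediction = query_model(prompt).strip()[:1].upper()
    correct += prediction == q["answer"]

print(f"Accuracy: {correct / len(questions):.2%}")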
sebastianraschka.com
Interesting! I am not sure it will be a concatenation of models (I can't see how it would work), but I can see a generalist model calling modules like this for specific problems.
sebastianraschka.com
This was probably developed with ARC in mind. But they also had models for Sudoku-hard and Maze, so I don't think that's true. I.e., you can train a similar model on other logic puzzle tasks as well.
sebastianraschka.com
7/ In practice, we often start by throwing LLMs at a problem, which makes sense for quick prototyping and establishing a baseline. But I can see a point where someone sits down afterward and trains a focused model like this to solve the same task more efficiently.
sebastianraschka.com
6/ That said, HRM and TRM are fascinating proof‑of‑concepts that show what’s possible with relatively small and efficient architectures. I'm still curious what the real‑world use case will look like. Maybe they could serve as reasoning or planning modules within a larger tool‑calling system.
sebastianraschka.com
5/ My personal caveat: comparing this method (or HRMs) to LLMs feels a bit unfair since HRMs/TRM are specialized models trained for specific tasks (here: ARC, Sudoku, and Maze pathfinding) while LLMs are generalists. It’s like comparing a pocket calculator to a computer.
sebastianraschka.com
4/ TRM backpropagates through the full recursion once per step, whereas HRM only backpropagates through the final few steps. And TRM also removes HRM's extra forward pass for halting and instead uses a simple binary cross-entropy loss to learn when to stop iterating.
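A rough illustration of the halting idea (my own simplification in PyTorch, not the paper's code): a scalar halting logit per example is trained with binary cross-entropy against whether the current prediction already matches the target, so the model learns when further recursion is unnecessary.

import torch
import torch.nn.functional as F

def halting_loss(halt_logit, logits, targets):
    # halt_logit: (batch,), logits: (batch, seq_len, vocab), targets: (batch, seq_len)
    # The label is 1.0 if the current prediction is already fully correct, else 0.0
    pred_correct = (logits.argmax(dim=-1) == targets).all(dim=-1).float()
    return F.binary_cross_entropy_with_logits(halt_logit, pred_correct)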
sebastianraschka.com
3/ In short, HRM recurses multiple times through two small neural nets with 4 transformer blocks each (high and low frequency). TRM is much smaller (i.e., ~4x smaller) and uses only a single network with 2 transformer blocks.
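To make the recursion concrete, here is a loose PyTorch sketch of refining an answer with a single tiny network applied repeatedly (my own simplification; the block count, dimensions, and update rule are assumptions, not the exact TRM design):

import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    def __init__(self, dim=256, num_blocks=2, num_steps=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.net = nn.TransformerEncoder(layer, num_layers=num_blocks)
        self.num_steps = num_steps

    def forward(self, x_emb, y_emb):
        # Reuse the same small network at every refinement step
        for _ in range(self.num_steps):
            combined = torch.cat([x_emb, y_emb], dim=1)
            y_emb = y_emb + self.net(combined)[:, x_emb.size(1):]
        return y_emb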
sebastianraschka.com
2/ Now, the new "Less is More: Recursive Reasoning with Tiny Networks" paper proposes the Tiny Recursive Model (TRM), which is a simpler and even smaller model (7M parameters, 4× smaller than HRM) that performs even better on the ARC challenge.
sebastianraschka.com
From the Hierarchical Reasoning Model (HRM) to a new Tiny Recursive Model (TRM).

A few months ago, the HRM made big waves in the AI research community as it showed really good performance on the ARC challenge despite its small size of only 27M parameters. (That's about 22x smaller than the smallest Qwen3 0.6B model.)
sebastianraschka.com
Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc.).
Also, I cover 15 recent articles focused on RL & Reasoning.

🔗 magazine.sebastianraschka.com/p/the-state-...
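For readers who want a taste before diving in, here is a minimal sketch of the group-relative advantage computation at the heart of GRPO (my simplification; the article covers the full method): sample several completions per prompt and normalize each reward by its group's mean and standard deviation instead of using a learned value function.

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8):
    # rewards: (num_prompts, num_samples_per_prompt)
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

advantages = group_relative_advantages(torch.tensor([[1.0, 0.0, 0.5, 1.0]]))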
sebastianraschka.com
Ha, yeah! It's interesting that they went with the GPT-4 tokenizer as the base tokenizer. But why not.
sebastianraschka.com
Gemma 3 is great! But that's a project for another day ... 😅
sebastianraschka.com
Coded Llama 3.2 model from scratch and shared it on the HF Hub.
Why? Because I think 1B & 3B models are great for experimentation, and I wanted to share a clean, readable implementation for learning and research: huggingface.co/rasbt/llama-...
sebastianraschka.com
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling: www.youtube.com/watch?v=Zar2...
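For reference, temperature scaling plus top-k sampling for picking the next token can be sketched in a few lines of PyTorch (hyperparameter values here are just examples, not the tutorial's exact settings):

import torch

def sample_next_token(logits, temperature=0.8, top_k=50):
    # logits: (vocab_size,) for the last position; top_k must not exceed vocab_size
    top_logits, top_idx = torch.topk(logits, top_k)
    probs = torch.softmax(top_logits / temperature, dim=-1)
    return top_idx[torch.multinomial(probs, num_samples=1)]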
sebastianraschka.com
wow, small world! And thanks regarding the code base, that means a lot to me (I really did spend a lot of time making it nice!)
sebastianraschka.com
I can't tell if you are joking...I hope you are joking...😅
Reposted by Sebastian Raschka (rasbt)
xowap.dev
Rémy @xowap.dev · Mar 17
I'm right now on the last chapter of "Build a Large Language Model (from scratch)" by @sebastianraschka.com and it's absolutely amazing to get started. Now I can understand why people lose their shit over DeepSeek, for example
sebastianraschka.com
all I can say is that there will likely be another book in due time...but it may take a bit (this book took me almost 1.5 years...) 😅
sebastianraschka.com
Ha, glad you are liking it so far! Chapter 7 is maybe also the most fun one, as we get to leverage all the stuff from the previous chapters to finally train a simple chatbot.
Re DeepSeek... it may take a bit but there'll be a sequel one day I hope!
sebastianraschka.com
I just shared a new tutorial: Implementing GPT From Scratch!

In this 1h 45min hands-on coding session, I go over implementing the GPT architecture, the foundation of modern LLMs (and I also have bonus material converting it to Llama 3.2): www.youtube.com/watch?v=YSAk...
Build an LLM from Scratch 4: Implementing a GPT model from Scratch To Generate Text
YouTube video by Sebastian Raschka
www.youtube.com
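As a rough orientation, a pre-norm GPT-style transformer block (the kind of building block such an implementation centers on) can be sketched as follows; the dimensions are example values and the causal attention mask is omitted for brevity, so this is not the session's exact code:

import torch.nn as nn

class GPTBlockSketch(nn.Module):
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # causal mask omitted for brevity
        return x + self.mlp(self.norm2(x))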