I assume you are specifically referring to the first line “202x…”? I merely wanted to say that the focus in the early 2020s was more on pre-training than on anything else back then. (I think the term LLM wasn’t coined until the 175B GPT-3 model came out.)
Re your LLM idea, I could see it working as a benchmark for agentic LLMs, though, to see if they can extract the correct architecture info from the code bases.
Or in other words, I don't think they trained the model in just a month, given all the ablation studies in that paper.
E.g., looking at arXiv, the Mixtral report came out on 8 Jan 2024 (arxiv.org/abs/2401.04088), and DeepSeekMoE around the same time, on 11 Jan 2024 (arxiv.org/abs/2401.06066).