@erogol.com
110 followers 44 following 83 posts
Doing ML erogol.com erogol.substack.com github.com/erogol
erogol.com
My post on MiMo-Audio

open.substack.com/pub/erogol/p...

🔥 Trained on 100M+ hours and shows emergent few-shot learning:
• Voice conversion
• Emotion transfer
• Speech translation
• Cross-modal reasoning

⚡ Key finding: Speech follows same scaling laws as text LLMs
Model Check - MiMo-Audio: Scaling Speech Pre-Training to 100M Hours
Going over the code and the technical report of the new speech LM from Xiaomi that rivals GPT-4o Audio and Gemini
open.substack.com
erogol.com
KyutaiTTS solved streaming text-to-speech with a state machine that generates audio word-by-word as text arrives.

220ms latency, 10-second voice cloning, 32 concurrent users on a single GPU.

No more waiting for complete sentences.
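Rough sketch of what word-level streaming looks like from the caller's side (my own toy code, not Kyutai's actual API; synthesize_word is a placeholder):

```python
# Toy illustration of word-level streaming TTS. This is my own sketch, not
# Kyutai's API: synthesize_word stands in for a per-word model call.
def stream_tts(word_stream, synthesize_word):
    """Yield audio as soon as each word arrives, instead of waiting for
    the full sentence before synthesis starts."""
    for word in word_stream:          # text arrives incrementally
        yield synthesize_word(word)   # emit audio for this word right away

# Usage with a stub synthesizer that returns fake audio chunks.
fake_synth = lambda w: f"<audio:{w}>"
for chunk in stream_tts(iter("no more waiting for sentences".split()), fake_synth):
    print(chunk)  # first chunk is ready after the first word, not the last
```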

Full analysis: erogol.substack.com/p/model-chec...
Model check - KyutaiTTS: Streaming Text-to-Speech with Delayed Streams Modeling
Going over Kyutai's new TTS model and its delayed streams modeling approach.
erogol.substack.com
erogol.com
This is such a great idea
sakanaai.bsky.social
We’re excited to introduce Text-to-LoRA: a Hypernetwork that generates task-specific LLM adapters (LoRAs) based on a text description of the task. Catch our presentation at #ICML2025!

Paper: arxiv.org/abs/2506.06105
Code: github.com/SakanaAI/Tex...
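To make the idea concrete, a toy sketch of the hypernetwork-to-LoRA mapping (my own illustration, not the SakanaAI code; every dimension and name here is made up):

```python
import torch
import torch.nn as nn

# Toy hypernetwork-to-LoRA sketch: a task-description embedding is mapped to
# the low-rank A/B matrices of a single LoRA adapter.
class TextToLoRAHyperNet(nn.Module):
    def __init__(self, text_dim=768, hidden=512, d_model=1024, rank=8):
        super().__init__()
        self.d_model, self.rank = d_model, rank
        self.hyper = nn.Sequential(
            nn.Linear(text_dim, hidden), nn.GELU(),
            nn.Linear(hidden, 2 * d_model * rank),   # weights for A and B
        )

    def forward(self, task_embedding):                # (text_dim,)
        flat = self.hyper(task_embedding)
        a, b = flat.split(self.d_model * self.rank)
        A = a.view(self.rank, self.d_model)           # down-projection
        B = b.view(self.d_model, self.rank)           # up-projection
        return A, B                                   # delta_W = B @ A

# Usage: embed the task description with any text encoder, then add the
# generated LoRA delta to the frozen layer's output.
hyper = TextToLoRAHyperNet()
A, B = hyper(torch.randn(768))
x = torch.randn(1024)
lora_out = B @ (A @ x)    # task-specific correction for this layer
```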
erogol.com
claude is the best coding model

gemini causes frequent syntax errors

openai does not even understand the task at hand
Reposted
sakanaai.bsky.social
This work was done in collaboration with Jeff Clune’s lab at UBC, and led by his PhD students Jenny Zhang and Shengran Hu, together with Cong Lu and Robert Lange.

Paper: arxiv.org/abs/2505.22954
Code: github.com/jennyzzt/dgm
erogol.com
⚡ Machine Learns issue 48 is out

🚀 dKV-Cache accelerates diffusion models by up to 10x
🔐 OpenAI's authentication play (think OAuth for AI)
🎯 PaTH Attention beats RoPE on long-context tasks
🤖 Humanoid Robot fights became real

open.substack.com/pub/erogol/p...
Machine Learns #48
OpenAI's 'Sign in with ChatGPT', Meta's AGI ambitions, new models like Gemma 3 & MAGI-1, research breakthroughs in KV caching for diffusion & PaTH Attention, and fresh open-source releases.
open.substack.com
erogol.com
My results:

• Canon Layers definitely improved performance when placed before Attention/MLP blocks
• Softpick had worse validation loss but completely removed attention sinks
• Parallel blocks matched baseline performance but trained 15% faster
erogol.com
Parallel Transformer blocks run MLP and Attention in parallel instead of one after another.

So you get: z = x + MLP(x) + Attention(x)

PaLM models use this approach, which improves memory usage and speed without hurting performance.
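A minimal sketch of the parallel formulation (PaLM applies a shared pre-norm to both branches; causal mask omitted for brevity):

```python
import torch
import torch.nn as nn

# Minimal parallel transformer block: attention and MLP read the same
# normalized input and their outputs are summed into the residual,
# instead of running one after another.
class ParallelBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                  # x: (batch, seq, d_model)
        h = self.norm(x)                   # one shared pre-norm (PaLM-style)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.mlp(h)  # z = x + Attention(h) + MLP(h)

y = ParallelBlock()(torch.randn(2, 16, 512))
```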
erogol.com
The Canon Layers paper shows they boost performance when added to transformer blocks.

They also help models without positional encoding work just as well as RoPE models.

❗Worth noting that RWKV used a similar idea years ago.
erogol.com
Canon Layers are basically causal 1D convolutions that mix the current hidden state with previous states (how many depends on the kernel size).
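Roughly, in code (my own sketch of the idea as a depthwise causal Conv1d with a residual):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a Canon-style layer: a causal depthwise Conv1d that mixes each
# hidden state with the previous (kernel_size - 1) states, plus a residual.
class CanonLayer(nn.Module):
    def __init__(self, d_model=512, kernel_size=4):
        super().__init__()
        self.pad = kernel_size - 1                     # left-pad only => causal
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        h = x.transpose(1, 2)                          # (batch, d_model, seq)
        h = F.pad(h, (self.pad, 0))                    # pad the past, not the future
        h = self.conv(h).transpose(1, 2)               # back to (batch, seq, d_model)
        return x + h                                   # residual mix

# Usage: drop it right before an Attention or MLP block.
y = CanonLayer()(torch.randn(2, 16, 512))
```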
erogol.com
Softpick replaces regular softmax in attention blocks.

It allows zero values in the numerator and lets negative values contribute to the denominator.

This prevents attention sinks while keeping math properties similar to regular softmax.
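A sketch of how I read the formula from the paper (naive version, no numerical-stability tricks):

```python
import torch

# My reading of the rectified form: ReLU(e^x - 1) in the numerator,
# sum of |e^x - 1| (plus eps) in the denominator. A real kernel needs a
# numerically stable rewrite for large scores.
def softpick(scores, dim=-1, eps=1e-6):
    e = torch.exp(scores) - 1.0
    num = torch.relu(e)                              # exact zeros for scores <= 0
    den = e.abs().sum(dim=dim, keepdim=True) + eps   # negatives still add mass
    return num / den                                 # rows may sum to < 1 -> no sink

# Drop-in replacement for softmax over attention scores:
attn = softpick(torch.randn(2, 4, 16, 16))           # (batch, heads, query, key)
```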
erogol.com
🧵 Here is a small thread with my notes about some of the recent Transformer papers.

- Softpick: an alternative to softmax in Attention
- Canon Layers: mixing states with conv1d
- Parallel Transformer blocks
erogol.com
Updated my LLM usage and cancelled ChatGPT sub for now

Coding - Claude, Gemini 2.5
Reading papers - Claude
Research - Gemini 2.5
Daily - Gemini 2.5
Search - Gemini 2.5
erogol.com
Here is my use of LLMs

Coding - Claude (best by far), QwenChat
Reading papers - Claude
Research - ChatGPT (best UI/UX), Gemini (better results)
Daily - ChatGPT
Search - ChatGPT

I'd love to try searching with Claude, but it's not there yet.

Any suggestions for change?