@erogol.com
110 followers 44 following 83 posts
Doing ML erogol.com erogol.substack.com github.com/erogol
erogol.com
My post on MiMo-Audio

open.substack.com/pub/erogol/p...

🔥 Trained on 100M+ hours and shows emergent few-shot learning:
• Voice conversion
• Emotion transfer
• Speech translation
• Cross-modal reasoning

⚡ Key finding: Speech follows same scaling laws as text LLMs
Model Check - MiMo-Audio: Scaling Speech Pre-Training to 100M Hours
Going over the code and the technical report of the new speech LM from Xiaomi that rivals GPT-4o Audio and Gemini
open.substack.com
erogol.com
KyutaiTTS solved streaming text-to-speech with a state machine that generates audio word-by-word as text arrives.

220ms latency, 10-second voice cloning, 32 concurrent users on a single GPU.

No more waiting for complete sentences.
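Rough sketch of what word-level streaming looks like from the caller's side (my own toy code, not Kyutai's actual API; synthesize_word is a placeholder):

```python
# Toy illustration of word-level streaming TTS. This is my own sketch, not
# Kyutai's API: synthesize_word stands in for a per-word model call.
def stream_tts(word_stream, synthesize_word):
    """Yield audio as soon as each word arrives, instead of waiting for
    the full sentence before synthesis starts."""
    for word in word_stream:          # text arrives incrementally
        yield synthesize_word(word)   # emit audio for this word right away

# Usage with a stub synthesizer that returns fake audio chunks.
fake_synth = lambda w: f"<audio:{w}>"
for chunk in stream_tts(iter("no more waiting for sentences".split()), fake_synth):
    print(chunk)  # first chunk is ready after the first word, not the last
```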

Full analysis: erogol.substack.com/p/model-chec...
Model check - KyutaiTTS: Streaming Text-to-Speech with Delayed Streams Modeling
Going over Kyutai's new TTS model and its delayed streams modeling approach.
erogol.substack.com
erogol.com
This is such a great idea
sakanaai.bsky.social
We’re excited to introduce Text-to-LoRA: a Hypernetwork that generates task-specific LLM adapters (LoRAs) based on a text description of the task. Catch our presentation at #ICML2025!

Paper: arxiv.org/abs/2506.06105
Code: github.com/SakanaAI/Tex...
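To make the idea concrete, a toy sketch of the hypernetwork-to-LoRA mapping (my own illustration, not the SakanaAI code; every dimension and name here is made up):

```python
import torch
import torch.nn as nn

# Toy hypernetwork-to-LoRA sketch: a task-description embedding is mapped to
# the low-rank A/B matrices of a single LoRA adapter.
class TextToLoRAHyperNet(nn.Module):
    def __init__(self, text_dim=768, hidden=512, d_model=1024, rank=8):
        super().__init__()
        self.d_model, self.rank = d_model, rank
        self.hyper = nn.Sequential(
            nn.Linear(text_dim, hidden), nn.GELU(),
            nn.Linear(hidden, 2 * d_model * rank),   # weights for A and B
        )

    def forward(self, task_embedding):                # (text_dim,)
        flat = self.hyper(task_embedding)
        a, b = flat.split(self.d_model * self.rank)
        A = a.view(self.rank, self.d_model)           # down-projection
        B = b.view(self.d_model, self.rank)           # up-projection
        return A, B                                   # delta_W = B @ A

# Usage: embed the task description with any text encoder, then add the
# generated LoRA delta to the frozen layer's output.
hyper = TextToLoRAHyperNet()
A, B = hyper(torch.randn(768))
x = torch.randn(1024)
lora_out = B @ (A @ x)    # task-specific correction for this layer
```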
erogol.com
claude is the best coding model

gemini causes frequent syntax errors

openai does not even understand the task at hand
Reposted
sakanaai.bsky.social
This work was done in collaboration with Jeff Clune’s lab at UBC, and led by his PhD students Jenny Zhang and Shengran Hu, together with Cong Lu and Robert Lange.

Paper: arxiv.org/abs/2505.22954
Code: github.com/jennyzzt/dgm
erogol.com
⚡ Machine Learns issue 48 is out

🚀 dKV-Cache accelerates diffusion models by up to 10x
🔐 OpenAI's authentication play (think OAuth for AI)
🎯 PaTH Attention beats RoPE on long-context tasks
🤖 Humanoid Robot fights became real

open.substack.com/pub/erogol/p...
Machine Learns #48
OpenAI's 'Sign in with ChatGPT', Meta's AGI ambitions, new models like Gemma 3 & MAGI-1, research breakthroughs in KV caching for diffusion & PaTH Attention, and fresh open-source releases.
open.substack.com
erogol.com
My results:

• Canon Layers definitely improved performance when placed before Attention/MLP blocks
• Softpick had worse validation loss but completely removed attention sinks
• Parallel blocks matched baseline performance but trained 15% faster
erogol.com
Parallel Transformer blocks run MLP and Attention in parallel instead of one after another.

So you get: z = x + MLP(x) + Attention(x)

PaLM models use this approach, which improves memory usage and speed without hurting performance.
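A minimal sketch of the parallel formulation (PaLM applies a shared pre-norm to both branches; causal mask omitted for brevity):

```python
import torch
import torch.nn as nn

# Minimal parallel transformer block: attention and MLP read the same
# normalized input and their outputs are summed into the residual,
# instead of running one after another.
class ParallelBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                  # x: (batch, seq, d_model)
        h = self.norm(x)                   # one shared pre-norm (PaLM-style)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.mlp(h)  # z = x + Attention(h) + MLP(h)

y = ParallelBlock()(torch.randn(2, 16, 512))
```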
erogol.com
The Canon Layers paper shows they boost performance when added to transformer blocks.

They also help models without positional encoding work just as well as RoPE models.

❗Worth noting that RWKV used a similar idea years ago.
erogol.com
Canon Layers are basically causal 1D convolutions that mix the current hidden state with previous states (how many depends on the kernel size).
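Roughly, in code (my own sketch of the idea as a depthwise causal Conv1d with a residual):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a Canon-style layer: a causal depthwise Conv1d that mixes each
# hidden state with the previous (kernel_size - 1) states, plus a residual.
class CanonLayer(nn.Module):
    def __init__(self, d_model=512, kernel_size=4):
        super().__init__()
        self.pad = kernel_size - 1                     # left-pad only => causal
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        h = x.transpose(1, 2)                          # (batch, d_model, seq)
        h = F.pad(h, (self.pad, 0))                    # pad the past, not the future
        h = self.conv(h).transpose(1, 2)               # back to (batch, seq, d_model)
        return x + h                                   # residual mix

# Usage: drop it right before an Attention or MLP block.
y = CanonLayer()(torch.randn(2, 16, 512))
```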
erogol.com
Softpick replaces regular softmax in attention blocks.

It allows zero values in the numerator and lets negative values contribute to the denominator.

This prevents attention sinks while keeping math properties similar to regular softmax.
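A sketch of how I read the formula from the paper (naive version, no numerical-stability tricks):

```python
import torch

# My reading of the rectified form: ReLU(e^x - 1) in the numerator,
# sum of |e^x - 1| (plus eps) in the denominator. A real kernel needs a
# numerically stable rewrite for large scores.
def softpick(scores, dim=-1, eps=1e-6):
    e = torch.exp(scores) - 1.0
    num = torch.relu(e)                              # exact zeros for scores <= 0
    den = e.abs().sum(dim=dim, keepdim=True) + eps   # negatives still add mass
    return num / den                                 # rows may sum to < 1 -> no sink

# Drop-in replacement for softmax over attention scores:
attn = softpick(torch.randn(2, 4, 16, 16))           # (batch, heads, query, key)
```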
erogol.com
🧵 Here is a small thread with my notes about some of the recent Transformer papers.

- Softpick: an alternative to softmax in Attention
- Canon Layers: mixing states with conv1d
- Parallel Transformer blocks
erogol.com
Updated my LLM usage and cancelled ChatGPT sub for now

Coding - Claude, Gemini 2.5
Reading papers - Claude
Research - Gemini 2.5
Daily - Gemini 2.5
Search - Gemini 2.5
erogol.com
Here is my use of LLMs

Coding - Claude (best by far), QwenChat
Reading papers - Claude
Research - ChatGPT (best UI/UX), Gemini (better results)
Daily - ChatGPT
Search - ChatGPT

I'd love to try searching with Claude, but it's not there yet.

Any suggestions for change?