with
- Grouped Query Attention
- Rolling Buffer KV Cache
- Sparse MoEs
- Rotary Positional Embeddings
Trained it on TinyStories. (Rough sketch of the GQA + rolling-buffer cache below.)
github.com/kabir2505/ti...
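A quick hedged sketch of how two of those pieces fit together: grouped-query attention sharing a small set of KV heads across the query heads, and a rolling-buffer KV cache that overwrites slot `pos % window` so memory stays fixed during generation. Everything here (module name, sizes, decode-only forward) is an illustrative assumption, not code from the repo; RoPE and the sparse-MoE layers are left out for brevity.

```python
# Illustrative sketch only; not the repo's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQAWithRollingCache(nn.Module):
    def __init__(self, dim=256, n_heads=8, n_kv_heads=2, window=64):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.window = window                       # rolling-buffer length
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)
        self.cache_k = None                        # (B, window, n_kv_heads, head_dim)
        self.cache_v = None

    def forward(self, x, pos):
        # Decode-only: x is one token, shape (B, 1, dim); pos is its absolute position.
        B = x.size(0)
        q = self.wq(x).view(B, 1, self.n_heads, self.head_dim)
        k = self.wk(x).view(B, 1, self.n_kv_heads, self.head_dim)
        v = self.wv(x).view(B, 1, self.n_kv_heads, self.head_dim)

        if self.cache_k is None:
            shape = (B, self.window, self.n_kv_heads, self.head_dim)
            self.cache_k = torch.zeros(shape, device=x.device, dtype=x.dtype)
            self.cache_v = torch.zeros(shape, device=x.device, dtype=x.dtype)

        # Rolling buffer: token at position pos lands in slot pos % window,
        # overwriting the oldest entry once the window is full.
        slot = pos % self.window
        self.cache_k[:, slot] = k[:, 0].detach()
        self.cache_v[:, slot] = v[:, 0].detach()

        valid = min(pos + 1, self.window)          # slots that hold real tokens
        keys = self.cache_k[:, :valid]             # (B, valid, n_kv_heads, head_dim)
        vals = self.cache_v[:, :valid]

        # Grouped-query attention: each KV head serves n_heads // n_kv_heads query heads.
        rep = self.n_heads // self.n_kv_heads
        keys = keys.repeat_interleave(rep, dim=2).transpose(1, 2)  # (B, n_heads, valid, hd)
        vals = vals.repeat_interleave(rep, dim=2).transpose(1, 2)
        q = q.transpose(1, 2)                                      # (B, n_heads, 1, hd)

        scores = (q @ keys.transpose(-2, -1)) / self.head_dim ** 0.5
        out = F.softmax(scores, dim=-1) @ vals                     # (B, n_heads, 1, hd)
        return self.wo(out.transpose(1, 2).reshape(B, 1, -1))

# Usage: decode tokens one at a time.
attn = GQAWithRollingCache()
with torch.no_grad():
    for pos in range(5):
        y = attn(torch.randn(1, 1, 256), pos)      # y: (1, 1, 256)
```

With a single query token, everything in the cache is in the past, so no causal mask is needed here; RoPE would normally be applied to q and k before the keys are written into the buffer.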
github.com/kabir2505/De...
On to some more GAN models & VAEs :)
github.com/kabir2505/pr...
Code: github.com/kabir2505/De...
Notes: kabir25.notion.site/BERT-1533fc0...
Still a work in progress...