kabir
@kabir25.bsky.social
Built a *tiny-Mixtral model* (~172M, 8 experts) from scratch with:
- Grouped Query Attention
- Rolling Buffer KV Cache
- Sparse MoEs
- Rotary Positional Embeddings
Trained it on TinyStories.

github.com/kabir2505/ti...
May 5, 2025 at 7:57 AM
Read a super interesting paper recently: “LLMs Know More Than They Show” (openreview.net/forum?id=KRn...). It dives into how large language models actually encode way more truthfulness internally than they let on in their outputs.
April 8, 2025 at 3:07 PM
implemented the Llama architecture from scratch in pytorch

github.com/kabir2505/De...
March 18, 2025 at 2:49 AM
let's implement llama today 😋
March 9, 2025 at 7:12 AM
Not enough ml/dl folks on my feed
March 6, 2025 at 5:58 AM
implemented wgan & wgan-gp in torch
github.com/kabir2505/De...
github.com/kabir2505/De...

onto some more gan models & vaes :)
Deep-Learning-History/GANs/WGan at main · kabir2505/Deep-Learning-History
March 6, 2025 at 3:18 AM
Spent the day revisiting dropout, so I figured I’d turn it into a blog - kabir25.notion.site/Dropout-16e3...
January 1, 2025 at 1:51 PM
Implemented *instruction fine-tuning* for GPT-2 on a small dataset, and my model claimed Robert Frost wrote Pride and Prejudice 😅

github.com/kabir2505/pr...
December 28, 2024 at 5:30 PM
today's agenda..
December 21, 2024 at 4:27 AM
my notes on the gpt-3 paper: kabir25.notion.site/GPT3-1603fc0...
December 20, 2024 at 3:37 PM
Reposted by kabir
If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.
December 19, 2024 at 12:55 AM
today's read :)
December 18, 2024 at 2:33 PM
NeurIPS FOMO is real 🫠 wish I could teleport..
December 14, 2024 at 4:03 PM
Built *BERT* from scratch in pytorch. Took a bit to understand *MLM* (Masked Language Modeling) and *NSP* (Next Sentence Prediction), but totally worth the grind.
Code: github.com/kabir2505/De...
Notes: kabir25.notion.site/BERT-1533fc0...
Deep-Learning-papers/transformers/bert at main · kabir2505/Deep-Learning-papers
December 13, 2024 at 12:56 PM
tackling my first nlp kaggle competition, any suggestions or references?
December 6, 2024 at 4:55 AM
notes on bert: kabir25.notion.site/BERT-1533fc0...
Still a work in progress..
December 5, 2024 at 4:29 PM
Diving into BERT today :)
December 5, 2024 at 12:31 PM