with
- Grouped Query Attention
- Rolling Buffer KV Cache
- Sparse MoEs
- Rotary Positional Embeddings
Trained it on TinyStories. (Rough sketch of the GQA + rolling-buffer cache below.)
github.com/kabir2505/ti...
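A quick hedged sketch of how two of those pieces fit together: grouped-query attention sharing a small set of KV heads across the query heads, and a rolling-buffer KV cache that overwrites slot `pos % window` so memory stays fixed during generation. Everything here (module name, sizes, decode-only forward) is an illustrative assumption, not code from the repo; RoPE and the sparse-MoE layers are left out for brevity.

```python
# Illustrative sketch only; not the repo's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQAWithRollingCache(nn.Module):
    def __init__(self, dim=256, n_heads=8, n_kv_heads=2, window=64):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.window = window                       # rolling-buffer length
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)
        self.cache_k = None                        # (B, window, n_kv_heads, head_dim)
        self.cache_v = None

    def forward(self, x, pos):
        # Decode-only: x is one token, shape (B, 1, dim); pos is its absolute position.
        B = x.size(0)
        q = self.wq(x).view(B, 1, self.n_heads, self.head_dim)
        k = self.wk(x).view(B, 1, self.n_kv_heads, self.head_dim)
        v = self.wv(x).view(B, 1, self.n_kv_heads, self.head_dim)

        if self.cache_k is None:
            shape = (B, self.window, self.n_kv_heads, self.head_dim)
            self.cache_k = torch.zeros(shape, device=x.device, dtype=x.dtype)
            self.cache_v = torch.zeros(shape, device=x.device, dtype=x.dtype)

        # Rolling buffer: token at position pos lands in slot pos % window,
        # overwriting the oldest entry once the window is full.
        slot = pos % self.window
        self.cache_k[:, slot] = k[:, 0].detach()
        self.cache_v[:, slot] = v[:, 0].detach()

        valid = min(pos + 1, self.window)          # slots that hold real tokens
        keys = self.cache_k[:, :valid]             # (B, valid, n_kv_heads, head_dim)
        vals = self.cache_v[:, :valid]

        # Grouped-query attention: each KV head serves n_heads // n_kv_heads query heads.
        rep = self.n_heads // self.n_kv_heads
        keys = keys.repeat_interleave(rep, dim=2).transpose(1, 2)  # (B, n_heads, valid, hd)
        vals = vals.repeat_interleave(rep, dim=2).transpose(1, 2)
        q = q.transpose(1, 2)                                      # (B, n_heads, 1, hd)

        scores = (q @ keys.transpose(-2, -1)) / self.head_dim ** 0.5
        out = F.softmax(scores, dim=-1) @ vals                     # (B, n_heads, 1, hd)
        return self.wo(out.transpose(1, 2).reshape(B, 1, -1))

# Usage: decode tokens one at a time.
attn = GQAWithRollingCache()
with torch.no_grad():
    for pos in range(5):
        y = attn(torch.randn(1, 1, 256), pos)      # y: (1, 1, 256)
```

With a single query token, everything in the cache is in the past, so no causal mask is needed here; RoPE would normally be applied to q and k before the keys are written into the buffer.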
github.com/kabir2505/De...
On to some more GAN models & VAEs :)
github.com/kabir2505/pr...
Code: github.com/kabir2505/De...
Notes: kabir25.notion.site/BERT-1533fc0...
Still a work in progress...