Vimal Thilak
@aggieinca.bsky.social
17 followers 28 following 5 posts
ML Engineer-ist @ Apple Machine Learning Research
Reposted by Vimal Thilak
cemkoch.bsky.social
Today we have released the code and a demo iOS application for FastVLM - our extremely efficient and fast vision language model which runs on your device using MLX! You can check out the code and the app here: github.com/apple/ml-fas...
Reposted by Vimal Thilak
davidgrangier.bsky.social
#ICLR #TrainBetterLM I am at ICLR, come to our posters for improved language model training!

Recycle gradients for faster neural net training with AdEMAmix iclr.cc/virtual/2025... (Fri Apr 25, 10 am).

1/3
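For readers curious about the mechanism behind "recycling gradients": AdEMAmix augments Adam with a second, much slower exponential moving average of the gradient, so old gradients keep contributing to each update. Below is a minimal, illustrative single-parameter sketch of that idea; the hyperparameter names and values (`beta3`, `alpha`, the toy loss) are assumptions for demonstration, not the paper's exact recipe.

```python
import numpy as np

def ademamix_step(param, grad, state, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8):
    """One illustrative AdEMAmix-style update: Adam plus a slow gradient EMA.

    beta3 and alpha control the slow EMA; the values here are assumptions.
    """
    state["t"] += 1
    t = state["t"]
    # Fast EMA (as in Adam) and second-moment EMA, both bias-corrected.
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m1_hat = state["m1"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    # Slow EMA that "recycles" older gradients over a much longer horizon.
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad
    update = (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)
    return param - lr * update, state

# Toy usage on f(w) = w^2.
state = {"t": 0, "m1": 0.0, "v": 0.0, "m2": 0.0}
w = np.array(1.0)
for _ in range(100):
    g = 2 * w
    w, state = ademamix_step(w, g, state)
```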
aggieinca.bsky.social
Check out Pau and his Apple MLR team's blog post on activation transport! Soon to be featured as a spotlight at ICLR :)
paurodriguez.bsky.social
Our work on fine-grained control of LLMs and diffusion models via Activation Transport will be presented @iclr_conf as spotlight✨Check out our new blog post machinelearning.apple.com/research/tra...
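The gist of activation-transport-style steering is to map a model's internal activations from a "source" distribution toward a "target" distribution (e.g., a concept or style). The sketch below is a heavily simplified, per-unit Gaussian version of that idea written for illustration only; the function names (`fit_gaussian_ot_map`, `transport`) and the interpolation-by-strength knob are my assumptions, not the paper's implementation.

```python
import numpy as np

def fit_gaussian_ot_map(src, tgt):
    """Per-unit affine map sending the source activation distribution to the
    target one, assuming both are roughly Gaussian (illustrative simplification).
    src, tgt: arrays of shape (n_samples, n_units)."""
    mu_s, sd_s = src.mean(0), src.std(0) + 1e-8
    mu_t, sd_t = tgt.mean(0), tgt.std(0)
    scale = sd_t / sd_s
    shift = mu_t - scale * mu_s
    return scale, shift

def transport(acts, scale, shift, strength=1.0):
    """Interpolate between the original and transported activations."""
    mapped = scale * acts + shift
    return (1 - strength) * acts + strength * mapped

# Toy demo: nudge hidden activations toward a "target concept" distribution.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(512, 16))   # activations on neutral prompts
tgt = rng.normal(1.5, 0.7, size=(512, 16))   # activations on concept prompts
scale, shift = fit_gaussian_ot_map(src, tgt)
steered = transport(src[:4], scale, shift, strength=0.5)
```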
aggieinca.bsky.social
More scaling laws? Mustafa and his team at Apple MLR have you covered, at least when it comes to scaling laws for native multimodal models :)
kindsuss.bsky.social
Check out our Apple research work on scaling laws for native multimodal models! Combined with mixtures of experts, native models develop both specialized and multimodal representations! Lots of rich findings and opportunities for follow-up research!
cscv-bot.bsky.social
Shukor, Fini, da Costa, Cord, Susskind, El-Nouby: Scaling Laws for Native Multimodal Models https://arxiv.org/abs/2504.07951 https://arxiv.org/pdf/2504.07951 https://arxiv.org/html/2504.07951
aggieinca.bsky.social
Calling all SSL practitioners -- check out this library from the amazing α-ReQ crew
arnaghosh.bsky.social
Are you training self-supervised/foundation models, and worried if they are learning good representations? We got you covered! 💪
🦖Introducing Reptrix, a #Python library to evaluate representation quality metrics for neural nets: github.com/BARL-SSL/rep...
🧵👇[1/6]
#DeepLearning
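To give a flavor of the kind of metric such a library reports, here is a small stand-alone sketch of a RankMe-style effective rank computed from frozen-encoder features; higher values suggest the embedding spreads information across more dimensions. This is an illustrative metric written from scratch, not the Reptrix API.

```python
import numpy as np

def effective_rank(features):
    """RankMe-style effective rank of a representation matrix.

    features: (n_samples, dim) activations from a frozen encoder.
    Computes exp(entropy) of the normalized singular-value spectrum.
    """
    s = np.linalg.svd(features - features.mean(0), compute_uv=False)
    p = s / (s.sum() + 1e-12)
    entropy = -(p * np.log(p + 1e-12)).sum()
    return float(np.exp(entropy))

feats = np.random.default_rng(0).normal(size=(1024, 256))
print(effective_rank(feats))  # close to 256 for isotropic Gaussian features
```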
Reposted by Vimal Thilak
preetumnakkiran.bsky.social
Paper🧵 (cross-posted at X): When does composition of diffusion models "work"? Intuitively, the reason dog+hat works and dog+horse doesn’t has something to do with independence between the concepts being composed. The tricky part is to formalize exactly what this means. 1/
Left Image: A shaggy dog-horse hybrid standing in a rural landscape.
Right Image: A golden dog wearing a red beret against a blurred outdoor background.
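The standard way such compositions are built is additive guidance: start from the unconditional noise prediction and add each concept's guidance direction. Whether the composed sample looks right (dog+hat) or not (dog+horse) is what the thread's independence analysis is about. Below is a minimal sketch of that additive recipe; `model`, the condition arguments, and the guidance weights are placeholders, not the authors' code.

```python
import torch

def composed_noise_pred(model, x_t, t, cond_a, cond_b, null_cond, w_a=3.0, w_b=3.0):
    """Compose two concepts (e.g. "dog" and "hat") by adding their guidance
    directions to the unconditional noise prediction. `model` is any
    noise-prediction network taking (x_t, t, cond)."""
    eps_null = model(x_t, t, null_cond)
    eps_a = model(x_t, t, cond_a)
    eps_b = model(x_t, t, cond_b)
    return eps_null + w_a * (eps_a - eps_null) + w_b * (eps_b - eps_null)

# Toy check with a stand-in "model" so the function runs end to end.
dummy = lambda x, t, c: x * 0.1 + c
x = torch.randn(1, 3, 8, 8)
print(composed_noise_pred(dummy, x, 10, cond_a=1.0, cond_b=-1.0, null_cond=0.0).shape)
```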
Reposted by Vimal Thilak
pierreablin.bsky.social
Excited to share Soup-of-Experts, a new neural network architecture that, for any given task, can instantiate in a flash a small model that performs very well on it.

Made with ❤️ at Apple

Thanks to my co-authors David Grangier, Angelos Katharopoulos, and Skyler Seto!

arxiv.org/abs/2502.01804
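As a rough intuition for "instantiate in a flash": keep a shared bank of expert parameters and, for a given task mixture, collapse the bank into a single small parameter vector with task-dependent mixing coefficients, so inference runs one small model. The sketch below is only my reading of that idea; the sizes, the softmax mixing, and all names are assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, n_params = 8, 10_000   # toy sizes, assumptions

expert_bank = rng.normal(size=(n_experts, n_params))  # shared expert parameters
shared_base = rng.normal(size=n_params)               # parameters common to all tasks

def instantiate_model(domain_weights):
    """Collapse the expert bank into one small parameter vector for a task.

    domain_weights describes the task's data mixture; here it is mapped to
    mixing coefficients with a plain softmax, an illustrative choice rather
    than the paper's routing function.
    """
    logits = np.asarray(domain_weights, dtype=float)
    coeffs = np.exp(logits - logits.max())
    coeffs /= coeffs.sum()
    return shared_base + coeffs @ expert_bank  # one weighted "soup" of experts

task_mix = rng.normal(size=n_experts)
small_model_params = instantiate_model(task_mix)
print(small_model_params.shape)  # (10000,), a single small model built in one step
```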
aggieinca.bsky.social
🚨 Apple Machine Learning Research Internship opportunity! My colleagues in Apple MLR are looking for a PhD research intern with a strong interest in reinforcement learning/post-training for LLMs. If interested, apply by sending an email to Etai Littwin (elittwin at apple dot com)
aggieinca.bsky.social
Mixture of experts is an interesting architecture, or so @samiraabnar.bsky.social told me when I joined the project last year. After some brilliant work from @harshay-shah.bsky.social and @samiraabnar.bsky.social, we have a scaling law paper!
samiraabnar.bsky.social
🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute?

We explored this through the lens of MoEs:
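The reason MoEs are a natural lens for this question is that sparse routing decouples parameter count from per-token compute: total parameters grow with the number of experts, while each token only pays for its top-k experts. The toy layer below illustrates that knob; sizes and the loop-based routing are simplifications for clarity, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2     # toy sizes

router = rng.normal(size=(d_model, n_experts)) * 0.02
experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.02

def moe_layer(x):
    """Top-k mixture-of-experts layer: parameters scale with n_experts,
    but each token only runs `top_k` expert matmuls."""
    logits = x @ router                              # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # chosen experts per token
    out = np.zeros_like(x)
    for i, idx in enumerate(top):
        w = np.exp(logits[i, idx]); w /= w.sum()     # softmax over selected experts
        for weight, e in zip(w, idx):
            out[i] += weight * (x[i] @ experts[e])
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # (4, 64)
```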