Vimal Thilak
@aggieinca.bsky.social
17 followers 28 following 5 posts
ML Engineer-ist @ Apple Machine Learning Research
Reposted by Vimal Thilak
cemkoch.bsky.social
Today we have released the code and a demo iOS application for FastVLM - our extremely efficient and fast vision language model which runs on your device using MLX! You can check out the code and the app here: github.com/apple/ml-fas...
Reposted by Vimal Thilak
davidgrangier.bsky.social
#ICLR #TrainBetterLM I am at ICLR, come to our posters for improved language model training!

Recycle gradients for faster neural net training with AdEMAmix iclr.cc/virtual/2025... (Fri Apr 25, 10 am).

1/3
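For readers curious about the mechanism behind "recycling gradients": AdEMAmix augments Adam with a second, much slower exponential moving average of the gradient, so old gradients keep contributing to each update. Below is a minimal, illustrative single-parameter sketch of that idea; the hyperparameter names and values (`beta3`, `alpha`, the toy loss) are assumptions for demonstration, not the paper's exact recipe.

```python
import numpy as np

def ademamix_step(param, grad, state, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8):
    """One illustrative AdEMAmix-style update: Adam plus a slow gradient EMA.

    beta3 and alpha control the slow EMA; the values here are assumptions.
    """
    state["t"] += 1
    t = state["t"]
    # Fast EMA (as in Adam) and second-moment EMA, both bias-corrected.
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m1_hat = state["m1"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    # Slow EMA that "recycles" older gradients over a much longer horizon.
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad
    update = (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)
    return param - lr * update, state

# Toy usage on f(w) = w^2.
state = {"t": 0, "m1": 0.0, "v": 0.0, "m2": 0.0}
w = np.array(1.0)
for _ in range(100):
    g = 2 * w
    w, state = ademamix_step(w, g, state)
```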
aggieinca.bsky.social
Check out Pau and his Apple MLR team's blog post on activation transport! Soon to be featured as a spotlight at ICLR :)
paurodriguez.bsky.social
Our work on fine-grained control of LLMs and diffusion models via Activation Transport will be presented @iclr_conf as spotlight✨Check out our new blog post machinelearning.apple.com/research/tra...
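The gist of activation-transport-style steering is to map a model's internal activations from a "source" distribution toward a "target" distribution (e.g., a concept or style). The sketch below is a heavily simplified, per-unit Gaussian version of that idea written for illustration only; the function names (`fit_gaussian_ot_map`, `transport`) and the interpolation-by-strength knob are my assumptions, not the paper's implementation.

```python
import numpy as np

def fit_gaussian_ot_map(src, tgt):
    """Per-unit affine map sending the source activation distribution to the
    target one, assuming both are roughly Gaussian (illustrative simplification).
    src, tgt: arrays of shape (n_samples, n_units)."""
    mu_s, sd_s = src.mean(0), src.std(0) + 1e-8
    mu_t, sd_t = tgt.mean(0), tgt.std(0)
    scale = sd_t / sd_s
    shift = mu_t - scale * mu_s
    return scale, shift

def transport(acts, scale, shift, strength=1.0):
    """Interpolate between the original and transported activations."""
    mapped = scale * acts + shift
    return (1 - strength) * acts + strength * mapped

# Toy demo: nudge hidden activations toward a "target concept" distribution.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(512, 16))   # activations on neutral prompts
tgt = rng.normal(1.5, 0.7, size=(512, 16))   # activations on concept prompts
scale, shift = fit_gaussian_ot_map(src, tgt)
steered = transport(src[:4], scale, shift, strength=0.5)
```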
aggieinca.bsky.social
More scaling laws? Mustafa and his team at Apple MLR have you covered, at least when it comes to scaling laws for native multimodal models :)
kindsuss.bsky.social
Check out our Apple research work on scaling laws for native multimodal models! Combined with mixtures of experts, native models develop both specialized and multimodal representations! Lots of rich findings and opportunities for follow-up research!
cscv-bot.bsky.social
Shukor, Fini, da Costa, Cord, Susskind, El-Nouby: Scaling Laws for Native Multimodal Models https://arxiv.org/abs/2504.07951 https://arxiv.org/pdf/2504.07951 https://arxiv.org/html/2504.07951
aggieinca.bsky.social
Calling all SSL practitioners -- check out this library from the amazing α-ReQ crew
arnaghosh.bsky.social
Are you training self-supervised/foundation models, and worried if they are learning good representations? We got you covered! 💪
🦖Introducing Reptrix, a #Python library to evaluate representation quality metrics for neural nets: github.com/BARL-SSL/rep...
🧵👇[1/6]
#DeepLearning
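To give a flavor of the kind of metric such a library reports, here is a small stand-alone sketch of a RankMe-style effective rank computed from frozen-encoder features; higher values suggest the embedding spreads information across more dimensions. This is an illustrative metric written from scratch, not the Reptrix API.

```python
import numpy as np

def effective_rank(features):
    """RankMe-style effective rank of a representation matrix.

    features: (n_samples, dim) activations from a frozen encoder.
    Computes exp(entropy) of the normalized singular-value spectrum.
    """
    s = np.linalg.svd(features - features.mean(0), compute_uv=False)
    p = s / (s.sum() + 1e-12)
    entropy = -(p * np.log(p + 1e-12)).sum()
    return float(np.exp(entropy))

feats = np.random.default_rng(0).normal(size=(1024, 256))
print(effective_rank(feats))  # close to 256 for isotropic Gaussian features
```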
Reposted by Vimal Thilak
preetumnakkiran.bsky.social
Paper🧵 (cross-posted at X): When does composition of diffusion models "work"? Intuitively, the reason dog+hat works and dog+horse doesn’t has something to do with independence between the concepts being composed. The tricky part is to formalize exactly what this means. 1/
Left Image: A shaggy dog-horse hybrid standing in a rural landscape.
Right Image: A golden dog wearing a red beret against a blurred outdoor background.
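The standard way such compositions are built is additive guidance: start from the unconditional noise prediction and add each concept's guidance direction. Whether the composed sample looks right (dog+hat) or not (dog+horse) is what the thread's independence analysis is about. Below is a minimal sketch of that additive recipe; `model`, the condition arguments, and the guidance weights are placeholders, not the authors' code.

```python
import torch

def composed_noise_pred(model, x_t, t, cond_a, cond_b, null_cond, w_a=3.0, w_b=3.0):
    """Compose two concepts (e.g. "dog" and "hat") by adding their guidance
    directions to the unconditional noise prediction. `model` is any
    noise-prediction network taking (x_t, t, cond)."""
    eps_null = model(x_t, t, null_cond)
    eps_a = model(x_t, t, cond_a)
    eps_b = model(x_t, t, cond_b)
    return eps_null + w_a * (eps_a - eps_null) + w_b * (eps_b - eps_null)

# Toy check with a stand-in "model" so the function runs end to end.
dummy = lambda x, t, c: x * 0.1 + c
x = torch.randn(1, 3, 8, 8)
print(composed_noise_pred(dummy, x, 10, cond_a=1.0, cond_b=-1.0, null_cond=0.0).shape)
```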
Reposted by Vimal Thilak
pierreablin.bsky.social
Excited to share Soup-of-Experts, a new neural network architecture that, for any given task, can instantiate in a flash a small model that performs very well on it.

Made with ❤️ at Apple

Thanks to my co-authors David Grangier, Angelos Katharopoulos, and Skyler Seto!

arxiv.org/abs/2502.01804
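As a rough intuition for "instantiate in a flash": keep a shared bank of expert parameters and, for a given task mixture, collapse the bank into a single small parameter vector with task-dependent mixing coefficients, so inference runs one small model. The sketch below is only my reading of that idea; the sizes, the softmax mixing, and all names are assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, n_params = 8, 10_000   # toy sizes, assumptions

expert_bank = rng.normal(size=(n_experts, n_params))  # shared expert parameters
shared_base = rng.normal(size=n_params)               # parameters common to all tasks

def instantiate_model(domain_weights):
    """Collapse the expert bank into one small parameter vector for a task.

    domain_weights describes the task's data mixture; here it is mapped to
    mixing coefficients with a plain softmax, an illustrative choice rather
    than the paper's routing function.
    """
    logits = np.asarray(domain_weights, dtype=float)
    coeffs = np.exp(logits - logits.max())
    coeffs /= coeffs.sum()
    return shared_base + coeffs @ expert_bank  # one weighted "soup" of experts

task_mix = rng.normal(size=n_experts)
small_model_params = instantiate_model(task_mix)
print(small_model_params.shape)  # (10000,), a single small model built in one step
```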
aggieinca.bsky.social
🚨 Apple Machine Learning Research Internship opportunity! My colleagues in Apple MLR are looking for a PhD research intern with a strong interest in reinforcement learning/post-training for LLMs. If interested, apply by sending an email to Etai Littwin (elittwin at apple dot com)
aggieinca.bsky.social
Mixture of experts is an interesting architecture, or so @samiraabnar.bsky.social told me when I joined the project last year. After some brilliant work from @harshay-shah.bsky.social and @samiraabnar.bsky.social, we have a scaling law paper!
samiraabnar.bsky.social
🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute?

We explored this through the lens of MoEs:
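The reason MoEs are a natural lens for this question is that sparse routing decouples parameter count from per-token compute: total parameters grow with the number of experts, while each token only pays for its top-k experts. The toy layer below illustrates that knob; sizes and the loop-based routing are simplifications for clarity, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2     # toy sizes

router = rng.normal(size=(d_model, n_experts)) * 0.02
experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.02

def moe_layer(x):
    """Top-k mixture-of-experts layer: parameters scale with n_experts,
    but each token only runs `top_k` expert matmuls."""
    logits = x @ router                              # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # chosen experts per token
    out = np.zeros_like(x)
    for i, idx in enumerate(top):
        w = np.exp(logits[i, idx]); w /= w.sum()     # softmax over selected experts
        for weight, e in zip(w, idx):
            out[i] += weight * (x[i] @ experts[e])
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)  # (4, 64)
```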