Josh Susskind
@kindsuss.bsky.social
63 followers 230 following 34 posts
Machine learning research scientist at Apple, Ramen whisperer
Reposted by Josh Susskind
marcocuturi.bsky.social
scaling up the computation of optimal transport couplings to hundreds of thousands of 3k dimensional vectors made easy using sharding and OTT-JAX! check this notebook, it only takes a few lines of code thanks to JAX's native sharding abilities ott-jax.readthedocs.io/en/latest/tu...
Sharded Sinkhorn — ott 0.5.1.dev34+g3462f28 documentation
ott-jax.readthedocs.io
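For readers curious what those few lines look like, here is a minimal sketch, not the notebook's exact code. It assumes OTT-JAX's standard point-cloud Sinkhorn API; the mesh axis name, sizes, epsilon, and batch_size are all illustrative, and the end-to-end sharding behavior is exactly what the linked tutorial demonstrates.

```python
import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from ott.geometry import pointcloud
from ott.problems.linear import linear_problem
from ott.solvers.linear import sinkhorn

# Illustrative sizes: the post mentions hundreds of thousands of
# ~3k-dimensional vectors; scale to whatever your devices can hold.
n, d = 100_000, 3_072
kx, ky = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(kx, (n, d))
y = jax.random.normal(ky, (n, d))

# Shard both point clouds across all devices along the batch dimension.
mesh = Mesh(np.array(jax.devices()), axis_names=("batch",))
x = jax.device_put(x, NamedSharding(mesh, P("batch")))
y = jax.device_put(y, NamedSharding(mesh, P("batch")))

# PointCloud computes cost-matrix blocks on the fly; batch_size keeps
# the full n-by-n cost matrix from ever being materialized.
geom = pointcloud.PointCloud(x, y, epsilon=1e-2, batch_size=1024)
out = jax.jit(sinkhorn.Sinkhorn())(linear_problem.LinearProblem(geom))
print(out.reg_ot_cost)
```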
kindsuss.bsky.social
Wow. Thank you for your bravery whoever you are.
jeremotographs.bsky.social
At the ICE raid and subsequent community resistance in Paramount, California this morning, this skater kid ate dozens of munitions from Border Patrol agents, walked away slowly and flipped them off.
kindsuss.bsky.social
How about "Machine Learning and Computer Science"? MLCS. 😁
kindsuss.bsky.social
Check out our Apple research work on scaling laws for native multimodal models! Combined with mixtures of experts, native models develop both specialized and multimodal representations! Lots of rich findings and opportunities for follow-up research!
cscv-bot.bsky.social
Shukor, Fini, da Costa, Cord, Susskind, El-Nouby: Scaling Laws for Native Multimodal Models https://arxiv.org/abs/2504.07951 https://arxiv.org/pdf/2504.07951 https://arxiv.org/html/2504.07951
kindsuss.bsky.social
My colleagues in #Apple ML Research posted a fun paper investigating how autoregressive design choices affect reasoning (in this case, multiple-choice question answering), showing a benefit to right-to-left (R2L) ordering. Reminds me of similar findings for reverse-order addition in arxiv.org/abs/2310.16028!
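The reverse-order trick is easy to see in a toy example. Purely illustrative, not either paper's code: writing the answer least-significant digit first means every digit an autoregressive model emits depends only on digits it has already generated, so the carry chain runs in generation order.

```python
def format_addition(a: int, b: int, reverse: bool) -> str:
    """Render 'a+b=answer', optionally with the answer digits reversed."""
    ans = str(a + b)
    if reverse:
        ans = ans[::-1]  # least-significant digit first (R2L)
    return f"{a}+{b}={ans}"

# L2R: the first answer token '9' already depends on a carry from
# digits the model hasn't produced yet.
print(format_addition(347, 589, reverse=False))  # 347+589=936
# R2L: each token depends only on previously generated tokens.
print(format_addition(347, 589, reverse=True))   # 347+589=639
```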
kindsuss.bsky.social
Permanent Hellcountry is a badass name for a band! Too bad it's also us. Stranger than fiction.
kindsuss.bsky.social
My colleague Shuangfei Zhai is looking for a summer research intern to work on improving TarFlow at Apple. If interested, send your CV to szhai at apple.com by this week.
kindsuss.bsky.social
Is there an article associated with this thread?
kindsuss.bsky.social
Here's a great paper on scaling laws for teacher-student neural network distillation led by @dbusbridge.bsky.social and Apple colleagues. I've often seen people struggle to get distillation working well enough in practical settings, and I expect the insights in this paper can really help!
dbusbridge.bsky.social
Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering:

"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"

Our distillation scaling law shows, well, it's complicated... 🧵

arxiv.org/abs/2502.08606
Distillation Scaling Laws
We provide a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings reduce the risks associated ...
arxiv.org
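If you've never wired up distillation, the objective behind all of this is the classic Hinton-style mix of soft and hard targets. A minimal JAX sketch for intuition only; the temperature and mixing weight are illustrative, and the paper's scaling law is about how to budget compute around this loss, not about the loss itself.

```python
import jax
import jax.numpy as jnp

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft teacher targets with hard-label cross-entropy."""
    # Soft targets: cross-entropy against the temperature-smoothed teacher,
    # rescaled by T^2 so gradient magnitudes match the hard term.
    t_probs = jax.nn.softmax(teacher_logits / temperature)
    s_logprobs = jax.nn.log_softmax(student_logits / temperature)
    soft = -(t_probs * s_logprobs).sum(axis=-1).mean() * temperature**2
    # Hard targets: ordinary cross-entropy on the true labels.
    logp = jax.nn.log_softmax(student_logits)
    hard = -jnp.take_along_axis(logp, labels[:, None], axis=-1).mean()
    return alpha * soft + (1.0 - alpha) * hard
```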
kindsuss.bsky.social
Here's a fun Apple research paper seeking to understand when/why diffusion models can be composed to generate images containing multiple independent concepts. For example, composing images from a model trained on Preetum's dog and a model trained on hats. Because why wouldn't you want to do that?!!
preetumnakkiran.bsky.social
Paper🧵 (cross-posted at X): When does composition of diffusion models "work"? Intuitively, the reason dog+hat works and dog+horse doesn’t has something to do with independence between the concepts being composed. The tricky part is to formalize exactly what this means. 1/
Left Image: A shaggy dog-horse hybrid standing in a rural landscape.
Right Image: A golden dog wearing a red beret against a blurred outdoor background.
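Mechanically, "composing" here boils down to combining the two models' predicted noises (scores) at each denoising step; the paper's contribution is formalizing when that combination actually samples something sensible. A schematic sketch under that reading, with hypothetical eps_dog / eps_hat noise-prediction nets, not the paper's code:

```python
import jax
import jax.numpy as jnp

def composed_eps(eps_dog, eps_hat, x_t, t, w_dog=1.0, w_hat=1.0):
    # Product-of-experts-style composition: sum the noise predictions.
    return w_dog * eps_dog(x_t, t) + w_hat * eps_hat(x_t, t)

def ddpm_reverse_step(eps_fn, x_t, t, alpha_t, alpha_bar_t, key):
    # One simplified DDPM reverse step driven by the composed score.
    eps = eps_fn(x_t, t)
    mean = (x_t - (1 - alpha_t) / jnp.sqrt(1 - alpha_bar_t) * eps) \
        / jnp.sqrt(alpha_t)
    return mean + jnp.sqrt(1 - alpha_t) * jax.random.normal(key, x_t.shape)
```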
kindsuss.bsky.social
Yeah! That's what I said when I saw it too :) Better than any dog-horse I could make!
kindsuss.bsky.social
If you are interested in doing an internship in ML research at Apple, I highly recommend talking with Etai Littwin (and Vimal Thilak is pretty awesome too!)
aggieinca.bsky.social
🚨 Apple Machine Learning Research Internship opportunity! My colleagues in Apple MLR are looking for a PhD research intern with a strong interest in reinforcement learning/post-training for LLMs. If interested, apply by sending an email to Etai Littwin (elittwin at apple dot com)
kindsuss.bsky.social
I think it's really important for more of this kind of work to be published openly rather than be walled off due to corporate greed -- scientific inquiry benefits us all. Hopefully we will continue to see lots and lots more of this!
kindsuss.bsky.social
This work was born from an Apple internship with Harshay Shah. Samira provided excellent direction and technical contributions along with Vimal, and the entire team was incredibly helpful! I'm intrigued that reading comprehension tasks do not follow pre-training scaling curves -- gotta follow this up!
samiraabnar.bsky.social
🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute?

We explored this through the lens of MoEs:
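A toy top-1 MoE layer makes the parameters-vs-compute distinction concrete: total parameter count grows with the number of experts, while each token only pays for the one expert it's routed to. Purely illustrative shapes and names, not the paper's code:

```python
import jax
import jax.numpy as jnp

def moe_layer(params, x):
    """Toy top-1 mixture-of-experts feed-forward layer.

    params["w_in"]: (n_experts, d_model, d_ff), params["w_out"]:
    (n_experts, d_ff, d_model), params["router"]: (d_model, n_experts).
    Parameter count scales with n_experts; per-token FLOPs do not,
    since each token runs through only its selected expert.
    """
    logits = x @ params["router"]             # (tokens, n_experts)
    expert_idx = jnp.argmax(logits, axis=-1)  # hard top-1 routing
    gate = jnp.take_along_axis(
        jax.nn.softmax(logits, axis=-1), expert_idx[:, None], axis=-1)

    def run_token(token, idx):
        h = jax.nn.gelu(token @ params["w_in"][idx])
        return h @ params["w_out"][idx]

    return gate * jax.vmap(run_token)(x, expert_idx)
```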
Reposted by Josh Susskind
kindsuss.bsky.social
I was too disgusted by the Twitter/X vomit and could not justify keeping my account there. Hoping this platform steers clear of disinformation and hate -- and remains a positive place to share science and other good things.
kindsuss.bsky.social
Here's a really cool cross-institution study leveraging optimal transport techniques developed by my Apple ML Research colleagues! It's great to see basic research in machine learning translate into scientific tools like this. Cuts into the AI hype a bit ;)
marcocuturi.bsky.social
Today is a great day for optimal transport 🎉! Lots of gratitude 🙏 for all folks who contributed to ott-jax.readthedocs.io and pushed for the MOSCOT (now @ nature!) paper, from visionaries @dominik1klein.bsky.social, G. Palla, Z. Piran to the magician, Michal Klein! ❤️

www.nature.com/articles/s41...
Reposted by Josh Susskind
cemkoch.bsky.social
Excited about vision-language models? 🚀 Check out our latest work on FastVLM, a new family of efficient vision-language models that balances the tradeoff between high-resolution image understanding and latency without compromising accuracy!

arxiv.org/abs/2412.13303
kindsuss.bsky.social
If you're looking for research scientist roles in Europe, check out Marco's post! The Paris team is fantastic, and does diverse idea-driven and impactful research. In addition, MLR is highly collaborative across timezones, so you'd have a chance to work with many others too.
marcocuturi.bsky.social
The Apple Machine Learning Research (MLR) team in Paris has openings for both FTE roles and a short-term post-doc position to contribute to our team's research agenda. Researchers at Apple's MLR (led by Samy Bengio) target impactful publications in top-tier ML venues and OSS.
kindsuss.bsky.social
Last but not least, please check out the flurry of papers being presented at #NeurIPS2024, highlighted in this post machinelearning.apple.com/research/neu... that showcases work from many teams at Apple and their academic collaborators.

Thanks for making it to the end ;-)
Apple Machine Learning Research at NeurIPS 2024
Apple researchers are advancing the field of ML through fundamental research that improves the world’s understanding of this technology and…
machinelearning.apple.com
kindsuss.bsky.social
EC-IJEPA makes the JEPA approach less brittle and further unlocks its use in diverse planning and reasoning tasks that leverage pre-trained visual representations as a world model. We're excited to see others build on this work with us!
12/n
kindsuss.bsky.social
Returning to the theme of powerful visual representation learning, please check out Vimal Thilak, Etai Littwin, and Anand Gopalakrishnan's EC-IJEPA paper, on improving JEPA models with spatial conditioning:
x.com/AggieInCA/st...
11/n
x.com