Elie
@eliebak.hf.co
2.4K followers 260 following 20 posts
Training LLMs at Hugging Face | hf.co/science
Reposted by Elie
anton-l.bsky.social
LLM Reasoning labs will be eating good today🍔

We commandeered the HF cluster for a few days and generated 1.2M reasoning-filled solutions to 500k NuminaMath problems with DeepSeek-R1 🐳
Have fun!
Reposted by Elie
qgallouedec.hf.co
Last moments of closed-source AI 🪦 :
Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration.

🫵 Let's go!
github.com/huggingface/...
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.
github.com
Reposted by Elie
lewtun.bsky.social
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open!

Follow along: github.com/huggingface/...
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.
github.com
Reposted by Elie
anton-l.bsky.social
Introducing 📐FineMath: the best open math pre-training dataset with 50B+ tokens!

Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

🤗 huggingface.co/datasets/Hug...

Here’s a breakdown 🧵
A plot showing increased performance of Llama-3.2-3B when pretrained on FineMath
eliebak.hf.co
Elie @eliebak.hf.co · Dec 11
WOW, Gemini Flash 2.0 is really impressive. Wondering about the size of this supposedly smol model.

One odd thing is that the model seems to lose some ability with long contexts compared to Flash 1.5. If any Google friends could share insights, I'd love to hear them!
eliebak.hf.co
Elie @eliebak.hf.co · Dec 5
Curious about this: what is the % of "new ideas" that you are not allowed to publish? (if you can answer ofc)
eliebak.hf.co
Elie @eliebak.hf.co · Dec 5
should be good now
eliebak.hf.co
Elie @eliebak.hf.co · Dec 4
Hey, I'll be at NeurIPS next week! My DMs are open if you want to meet and talk about pre-training/data/whatever you want 🫡
eliebak.hf.co
Elie @eliebak.hf.co · Dec 3
Link: www.freepatentsonline.com/y2024/037844...
I've probably missed a lot, feel free to add more ⬇️
www.freepatentsonline.com
eliebak.hf.co
Elie @eliebak.hf.co · Dec 3
- They use some kind of metadata tokens to give information about toxicity and data leakage, but also a "quality" token?
- [0118] talks about using some kind of LoRAs during the finetuning/alignment phase to adapt to multiple downstream tasks
- ~[0154] some memory evaluation technique?
eliebak.hf.co
Elie @eliebak.hf.co · Dec 3
Google patent on "Training of large neural network". 😮

I don't know if this gives much information, but from a quick pass through it, it seems that:
- They are not only using "causal language modeling" as a pre-training task, but also "span corruption" and "prefix modeling" (ref [0805]-[0091])
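For reference, here is a minimal sketch of what a T5-style span-corruption objective looks like; the masking ratio, span length, and sentinel ids below are illustrative assumptions, not details from the patent:

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, sentinel_start=32000):
    """Drop random contiguous spans and replace each with a sentinel token id.

    Returns (corrupted_input, target): instead of predicting every next token
    as in causal LM, the model learns to reconstruct the dropped spans, each
    prefixed by its sentinel.
    """
    n_to_mask = max(1, int(len(tokens) * corruption_rate))
    masked = set()
    while len(masked) < n_to_mask:
        start = random.randrange(len(tokens))
        for i in range(start, min(len(tokens), start + mean_span_len)):
            masked.add(i)

    corrupted, target, sentinel = [], [], sentinel_start
    i = 0
    while i < len(tokens):
        if i in masked:
            corrupted.append(sentinel)
            target.append(sentinel)
            while i < len(tokens) and i in masked:
                target.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, target

# e.g. for tokens [10, 11, 12, 13, 14, 15, 16, 17] with spans {2,3,4} and {7} masked:
# corrupted = [10, 11, 32000, 15, 16, 32001], target = [32000, 12, 13, 14, 32001, 17]
```

Prefix modeling is simpler still: pick a random split point, attend bidirectionally over the prefix, and compute the loss only on the suffix.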
Reposted by Elie
merve.bsky.social
So many open-source and open releases last week!
Here's a recap, find the text-readable version here huggingface.co/posts/merve/...
Reposted by Elie
loubnabnl.hf.co
📬 Summarize and rewrite your text/emails faster, and offline!

Check @andimara.bsky.social's Smol Tools for summarization and rewriting. It uses SmolLM2 to summarize text and make it more friendly or professional, all running locally thanks to llama.cpp: github.com/huggingface/...
smollm/smol_tools at main · huggingface/smollm
Everything about the SmolLM & SmolLM2 family of models - huggingface/smollm
github.com
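If you want to hack on something similar yourself, here is a rough sketch using llama-cpp-python with a local SmolLM2 GGUF checkpoint; the model filename and the prompt are placeholders (Smol Tools ships its own prompts), so treat this as an illustration rather than the tool's actual code:

```python
from llama_cpp import Llama

# Hypothetical local path to a SmolLM2 GGUF checkpoint downloaded from the Hub.
llm = Llama(model_path="SmolLM2-1.7B-Instruct-Q4_K_M.gguf", n_ctx=4096, verbose=False)

def summarize(text: str) -> str:
    # Simple chat-style prompt; the real tool uses its own, more refined prompts.
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Summarize the user's text in 2-3 sentences."},
            {"role": "user", "content": text},
        ],
        max_tokens=256,
        temperature=0.2,
    )
    return out["choices"][0]["message"]["content"]

print(summarize("Paste the email or text you want to shorten here."))
```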
eliebak.hf.co
Elie @eliebak.hf.co · Nov 30
What else should we log during LLM training? Right now, it's just loss, grad_norm, and evals, but I want to log more to have a better understanding of pre-training. Thinking about adding stuff like entropix metrics (agreement, varentropy?)

Any thoughts or cool ideas?
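For context on those metrics: token entropy and varentropy come straight from the logits you already compute at each step, so logging them is nearly free. A rough PyTorch sketch, where the exact definitions and reductions are my own interpretation rather than a fixed standard:

```python
import torch
import torch.nn.functional as F

def entropy_metrics(logits: torch.Tensor) -> dict:
    """Per-batch entropy stats from logits of shape (batch, seq, vocab)."""
    logp = F.log_softmax(logits.float(), dim=-1)
    p = logp.exp()
    # Token-level entropy: H = -sum_v p(v) log p(v)
    ent = -(p * logp).sum(dim=-1)                                 # (batch, seq)
    # Varentropy: variance of -log p(v) under p, i.e. how spread out the surprise is
    varent = (p * (logp + ent.unsqueeze(-1)) ** 2).sum(dim=-1)    # (batch, seq)
    return {
        "token_entropy/mean": ent.mean().item(),
        "token_entropy/std": ent.std().item(),
        "varentropy/mean": varent.mean().item(),
    }

# e.g. inside the training loop, alongside loss and grad_norm:
# metrics = entropy_metrics(outputs.logits.detach())
# wandb.log({"loss": loss.item(), "grad_norm": grad_norm, **metrics}, step=step)
```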
eliebak.hf.co
Elie @eliebak.hf.co · Nov 28
Glad to have you back!
eliebak.hf.co
Elie @eliebak.hf.co · Nov 28
I find it sad, but imo it's good news that those people block 'us'. I'm tired of seeing hateful comments on my colleagues' (and other ML engineers'/researchers') posts.
eliebak.hf.co
Elie @eliebak.hf.co · Nov 28
why not flex attention?
eliebak.hf.co
Elie @eliebak.hf.co · Nov 28
should be okay!
Reposted by Elie
xenova.bsky.social
WOW! 🤯 Language models are becoming smaller and more capable than ever! Here's SmolLM2 running 100% locally in-browser w/ WebGPU on a 6-year-old GPU. Just look at that speed! ⚡️😍

Powered by 🤗 Transformers.js and ONNX Runtime Web!

How many tokens/second do you get? Let me know! 👇
Reposted by Elie
muellerzr.bsky.social
I'm looking for an intern!

If you are:
* Driven
* Love OSS
* Interested in distributed PyTorch training/FSDPv2/DeepSpeed

Come work with me!

Fully remote, more details to apply in the comments
A job description stating:
About this Role

This internship works at the intersection of software engineering, machine learning engineering, and education. With a strong focus on distributed training through the accelerate library (https://huggingface.co/docs/accelerate/index), we'll focus on bringing state-of-the-art training techniques into the library while also documenting and helping teach others how they work. By the end of this internship, the candidate will have touched on all aspects of distributed training and core library contributions, including large-scale distributed training, API design, writing educational material aimed at a semi-technical audience, and understanding the nuances of writing software that scales.
eliebak.hf.co
Elie @eliebak.hf.co · Nov 27
10000% agree with Omar, this is totally disproportionate
osanseviero.bsky.social
I'm disheartened by how toxic and violent some responses were here.

There was a mistake, a quick follow-up to mitigate it, and an apology. I worked with Daniel for years, and he is one of the people most concerned with the ethical implications of AI. Some replies are Reddit-level toxic. We need empathy.
danielvanstrien.bsky.social
I've removed the Bluesky data from the repo. While I wanted to support tool development for the platform, I recognize this approach violated principles of transparency and consent in data collection. I apologize for this mistake.
eliebak.hf.co
Elie @eliebak.hf.co · Nov 26
super nice! 🤗