Lightnews — Scholar-powered news

Ben Burtenshaw

@benburtenshaw.bsky.social

4.3K followers 210 following 190 posts

Building tools for AI datasets. 😽 Looking in AI datasets. 🙀 Sharing clean open AI datasets. 😻 at https://bsky.app/profile/hf.co

Pinned

Ben Burtenshaw @benburtenshaw.bsky.social · Dec 3

For anyone interested in fine-tuning or aligning LLMs, I’m running this free and open course called smol course. It’s not a big deal, it’s just smol.

🧵>>

9 64 330

Reposted by Ben Burtenshaw

Margaret Mitchell @mmitchell.bsky.social · 22d

🤖 As AI-generated content is shared in movies/TV/across the web, there's one simple low-hanging fruit 🍇 to help know what's real: Visible watermarks. With others @hf.co, I've made sure it's trivially easy to add this disclosure to images, video, chatbot text. See how:
huggingface.co/blog/waterma...

3 6 32

Reposted by Ben Burtenshaw

kb @keighbee.bsky.social · Jun 12

I'm writing an article series about creating tensors from scratch in Rust. #tensors #machine-learning #ml #ai

huggingface.co/blog/KeighBe...

Building Tensors From Scratch in Rust: Part 1, Core Structure and Indexing

A Blog post by Kyle Birnbaum on Hugging Face

huggingface.co

3 5

Reposted by Ben Burtenshaw

Leshem (Legend) Choshen @ICML @ACL @lchoshen.bsky.social · Mar 26

AI doesn’t get your culture?❌ butchers your language? 😤
With FeeL – you can fix that🛠️🌍

💬 Talk to AI in your language
✏️ Correct its mistakes
👁‍🗨 Watch it improve
The more we use it, the smarter it gets for everyone!

👉 Try it now: huggingface.co/spaces/feel-...

👶🤖📈
#ai #genAI #llm

Feel - a Hugging Face Space by feel-fl

Discover amazing ML apps made by the community

huggingface.co

1 2 7

Reposted by Ben Burtenshaw

Lucie-Aimée Kaffee @frimelle.bsky.social · Feb 12

How should AI tools be designed to support rather than replace workers?

At the Reshaping Work conference, I led a roundtable exploring AI’s impact on labor. We published a blogpost on our key takeaways on responsible AI and the future of work w/ Franco Bastida
🔗 www.rsm.nl/discovery/20...
🧵👇

Start-Up Approaches to Responsible AI: Worker-Centric Innovation

Explore how start-ups are reshaping AI development through transparency, worker inclusivity, and ethical approaches that prioritise human augmentation over replacement.

www.rsm.nl

1 3 4

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 28

I've put together some of the handier tools for building courses and educational material on the @huggingface hub.

These should bootstrap your projects with quizzes, friendly-sized models, useful datasets, and informative spaces.

Let me know if you use or need more.

https://buff.ly/42qyanw

3 8

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 27

The science team at Hugging Face reproduced and open-sourced DeepSeek-R1. https://buff.ly/4jtbp8x

GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1

Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.

buff.ly

1 6 33

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 27

Manic few days in open source AI, with game-changing developments all over the place. Here's a round-up of the resources:

Here's a thread on it all:

1 1 8

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 24

quiz app https://buff.ly/4atPzxo
dataset with questions https://buff.ly/3ClY9Sm
agents course we're working on https://buff.ly/4gehzqi

Dataset Quiz - a Hugging Face Space by burtenshaw

A quiz app for rows of a dataset

buff.ly

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 24

Here's how it works:

- make a dataset of multiple choice questions
- duplicate the space and set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset

I made this to get ready for the agents course, but I hope it's useful for your projects too!
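
The steps above can be sketched in plain Python. This is only an illustration, not the quiz app's actual code: the row schema (`question`, `choices`, `answer`) and the helper names are assumptions, and a real setup would push the rows to a dataset repo on the hub rather than keep them in memory.

```python
import json

# Hypothetical row schema for a multiple choice question (field names are assumptions).
questions = [
    {
        "question": "Which library does the smol course agents chapter use?",
        "choices": ["smolagents", "numpy", "requests"],
        "answer": "smolagents",
    },
    {
        "question": "What does the quiz app use to make questions?",
        "choices": ["a dataset", "a spreadsheet"],
        "answer": "a dataset",
    },
]


def validate(row):
    """Return True if the row has a question, at least two choices, and an answer among them."""
    choices = row.get("choices", [])
    return bool(row.get("question")) and len(choices) >= 2 and row.get("answer") in choices


def grade(rows, responses):
    """Pair each question with the submitted response and mark whether it is correct."""
    return [
        {
            "question": row["question"],
            "response": response,
            "correct": response == row["answer"],
        }
        for row, response in zip(rows, responses)
    ]


def to_jsonl(rows):
    """Serialise rows as JSON Lines, one row per line, ready to save as a new dataset file."""
    return "\n".join(json.dumps(row) for row in rows)


results = grade(questions, ["smolagents", "a spreadsheet"])
print(to_jsonl(results))
```

The graded rows form the "new dataset" of answers from the last step; keeping questions and responses as flat records makes them easy to upload or inspect later.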

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 24

Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TLDR: it's a quiz that uses a dataset to make questions and save answers.

2 1 2

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 23

If you need long context for RAG, tool use, agents, or just because, Nvidia released a new library to make it super simple.

TLDR: You can get 128k context at 50% less memory 🐳

Here's a blog post on everything:

Mastering Long Contexts in LLMs with KVPress

A Blog post by NVIDIA on Hugging Face

buff.ly
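
To get a feel for what "128k context at 50% less memory" could mean, here is a back-of-the-envelope calculation of KV cache size. It is a sketch under stated assumptions: the model shape (32 layers, 8 key/value heads, head dimension 128, 16-bit values) is a hypothetical configuration, not one named in the post, and the 50% figure is assumed to apply to the KV cache only.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Size of the key/value cache in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value * batch_size


# Assumed model shape (hypothetical, chosen for illustration).
config = dict(num_layers=32, num_kv_heads=8, head_dim=128)

seq_len = 128 * 1024  # 128k tokens
full = kv_cache_bytes(seq_len=seq_len, **config)

# Assumption: the cache shrinks by the 50% quoted in the post.
compression_ratio = 0.5
compressed = full * (1 - compression_ratio)

GIB = 1024 ** 3
print(f"full cache: {full / GIB:.1f} GiB")                  # full cache: 16.0 GiB
print(f"at 50% compression: {compressed / GIB:.1f} GiB")    # at 50% compression: 8.0 GiB
```

Under these assumptions each token costs 128 KiB of cache, so the cache grows linearly with context length and halving it saves about 8 GiB at 128k tokens.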

Reposted by Ben Burtenshaw

Adina Yakup @adinayakup.bsky.social · Jan 21

What happened yesterday in the Chinese AI community? 🚀
huggingface.co/posts/AdinaY...

3 8

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 20

Deepseek just dropped a frontier reasoning model on the hub. It's 685 billion parameters of bleeding edge performance on COMPLEX tasks.

Who's considering this for synthetic datasets, distillation, or pruning?

2 2 15

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 17

Here's a blog post I wrote with the details https://buff.ly/4gVpudi

Gradio spaces are the perfect agent tools!

A Blog post by ben burtenshaw on Hugging Face

huggingface.co

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 17

Playing around with AI agents, and I reckon Gradio spaces on the hub make the perfect tools.

- super easy to connect your agents to a bunch of useful tools and apps.
- find a Space you like on Hugging Face Hub or make your own with Gradio.
- link it up with smolagents.

🧵

Gradio And Llm Agents

A Step-by-Step Gradio Tutorial

www.gradio.app

1 3 9
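
The idea of handing an agent a set of named, described tools can be sketched without any libraries. This is a conceptual illustration only: `SpaceTool` and `ToolBox` are hypothetical names, the travel function is a local stub standing in for a call to a hosted Space, and none of this is the smolagents or Gradio API (see the linked guides for those).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SpaceTool:
    """A named, described callable that an agent can choose to invoke (hypothetical type)."""
    name: str
    description: str
    run: Callable[..., str]


def stub_travel_duration(origin: str, destination: str) -> str:
    # Stand-in for a remote call to a hosted app; returns a canned string.
    return f"Travel from {origin} to {destination}: duration unavailable (stub)"


class ToolBox:
    """Registry that lets an agent list its tools and call one by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, SpaceTool] = {}

    def register(self, tool: SpaceTool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> str:
        # Text an agent's prompt could include so the model knows what is available.
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def call(self, name: str, **kwargs) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(**kwargs)


toolbox = ToolBox()
toolbox.register(
    SpaceTool(
        name="travel_duration",
        description="Estimate travel time between two places.",
        run=stub_travel_duration,
    )
)

print(toolbox.describe())
print(toolbox.call("travel_duration", origin="Antwerp", destination="Paris"))
```

The name and description matter as much as the function itself, since they are what the model reads when deciding which tool to use.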

Reposted by Ben Burtenshaw

Thomas Simonini @thomassimonini.bsky.social · Jan 15

We’re launching a FREE course on LLM Agents 🥳

📖 Learn what Agents are
🕵️ Build your own Agents using the latest libraries and tools.
🎓 Earn a certificate of completion to showcase your achievement.

Enroll now 👉 huggingface.us17.list-manage.com/subscribe?u=...

15 58

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 15

Here's a collection with tools to:

- create a plotly visualisation
- get travel duration
- transcribe youtube video
- transform image

https://buff.ly/3PAU6od

Tools 4 Agents - a burtenshaw Collection

This is a collection of spaces on the hub that are useful for building agents. https://huggingface.co/docs/smolagents/en/tutorials/tools

buff.ly

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 15

These should set up a few cool agent applications, but if not, it's easy to build a tool within a Gradio application. Here's a guide:

https://buff.ly/3Wm2ZG1

Tools

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

buff.ly

1 7

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 15

Agents need tools and the Hugging Face hub is full of them. You can use Gradio spaces on the hub as agent tools. I created a short list of ones that I tried out or made. Here's an overview:

🧵

1 9

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 13

Great deep dive blog post on Agents, covering all the fundamentals from the ground up.

@mmitchell.bsky.social @sashamtl.bsky.social @giadapistilli.com @evijit.io

huggingface.co/blog/ethics-...

AI Agents Are Here. What Now?

huggingface.co

1 20

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 13

Here's the chapter on agents in smol course: https://buff.ly/3Cf5NOf

smol-course/8_agents at main · huggingface/smol-course

A course on aligning smol models. Contribute to huggingface/smol-course development by creating an account on GitHub.

github.com

1 12

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 13

Free course on Agents by Hugging Face. We just added a chapter to smol course on agents. Naturally, using smolagents! The course covers these topics:

- Code agents
- Retrieval agents
- Custom function agents

If you're building agent applications, this course should help.

1 9 32

Ben Burtenshaw @benburtenshaw.bsky.social · Jan 10

If you're looking for real talk from experience, check out this blogpost on emissions from generative ai models:

huggingface.co/blog/leaderb...

CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard

huggingface.co

3 9

Ben Burtenshaw @benburtenshaw.bsky.social · Dec 27

❓ What do we need now?
Most of us aren't building systems to solve frontier math problems on a daily basis. Shucks! That means we need reward models and datasets that represent the kinds of problems we're trying to solve. Crucially, in the domains and languages we're actually working in!

2 2

Ben Burtenshaw @benburtenshaw.bsky.social · Dec 27

⏩ What does it mean for us builders?
As these approaches develop, we can use small models on our use cases, and increase inference compute for challenging domain-specific tasks. This means that for most tasks models need minimal compute, but for complex tasks we'll scale up compute.
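
One common way to spend more inference compute on hard tasks is best-of-N sampling scored by a reward model. The post does not name a specific method, so treat this as an assumption; `toy_generate` and `toy_score` are hypothetical stand-ins for a small model and a reward model.

```python
def best_of_n(prompt, n, generate, score):
    """Sample n candidates and return the one the scorer rates highest."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda candidate: score(prompt, candidate))


def samples_for(task_is_hard, easy_n=1, hard_n=8):
    """Minimal compute for most tasks, more samples for complex ones."""
    return hard_n if task_is_hard else easy_n


def toy_generate(prompt, seed):
    # Hypothetical stand-in for a small model: each seed yields a different guess.
    return seed * 3


def toy_score(prompt, candidate):
    # Hypothetical stand-in for a reward model: guesses closer to 12 score higher.
    return -abs(candidate - 12)


easy = best_of_n("easy task", samples_for(False), toy_generate, toy_score)
hard = best_of_n("hard task", samples_for(True), toy_generate, toy_score)
print(easy, hard)  # 0 12
```

With a single sample the toy model is stuck with its first guess, while eight samples let the scorer pick the best one. This is why the quality of the reward model, in the right domain and language, matters so much.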