Caleb Fahlgren
@calebfahlgren.hf.co
850 followers 140 following 28 posts
SWE @hf.co
Posts Media Videos Starter Packs
calebfahlgren.hf.co
You can just ask things 🗣️

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Argilla Llama3.1 405B synthetic dataset 🔥
Reposted by Caleb Fahlgren
thomwolf.bsky.social
Most liked and most downloaded open-source AI models from 2022 to 2024

Interactive viz: aiworld.eu/embed/model/...
Discussion: huggingface.co/spaces/huggi...
calebfahlgren.hf.co
It doesn't get easier than this. Why are you writing SQL by yourself when it's almost 2025
calebfahlgren.hf.co
The amazing, new Qwen2.5-Coder 32B model can now write SQL for any @hf.co dataset ✨
calebfahlgren.hf.co
This is insane! Structured generation in the browser with the new @hf.co SmolLM2-1.7B model

• Tiny 1.7B LLM running at 88 tokens / second ⚡
• Powered by MLC/WebLLM on WebGPU 🔥
• JSON Structured Generation entirely in the browser 🤏
Reposted by Caleb Fahlgren
thomwolf.bsky.social
Releasing SmolVLM, a small 2 billion parameters Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos.

Outperforms all models at similar GPU RAM usage and tokens throughputs

Blog post: huggingface.co/blog/smolvlm
calebfahlgren.hf.co
I did it via

Settings > Account > Handle > I have my own domain

and it should show there!
calebfahlgren.hf.co
You can literally do the histogram in one line in less than 10 seconds 💨

> from histogram(train, "Average ⬆️")
calebfahlgren.hf.co
Here's what the model licenses look like:

Lots of great open licenses in there too! 💪
calebfahlgren.hf.co
The OpenLLM Leaderboard just passed 2k evals 🥳

Here's a look at the distribution of average scores for all those models!

Great work by the @huggingface.bsky.social team to do these evals!
calebfahlgren.hf.co
Let us know what you think or what you want to see :)

cc: @davidberenstein.bsky.social
calebfahlgren.hf.co
** log and get out of the way **
calebfahlgren.hf.co
using supabase theme, @tylerhillery.com would approve
calebfahlgren.hf.co
Automatically tracking all Ollama requests to a dataset with the new observers python library!

With just a few lines of code all your requests can be sent to @huggingface.bsky.social datasets for annotating, analysis and observability 🔭
calebfahlgren.hf.co
The main three stores are:
• DuckDB (local, SQL over traces)
• Hugging Face Datasets (dataset viewer, sql console)
• Argilla - annotation and filtering UI
calebfahlgren.hf.co
observers 🔭 - automatically log all OpenAI compatible requests to a dataset 💽

• supports any OpenAI compatible endpoint 💪
• supports @duckdb.org, @huggingface.bsky.social datasets and Argilla as stores

> pip install observers
calebfahlgren.hf.co
That’s okay, there are lots of incomplete and even snapshots. The UpVoteWeb reddit dataset is one that comes to mind.

Any data that is more accessible is a win :). My hub stats dataset is just a cron script as well haha

huggingface.co/datasets/Ope...
OpenCo7/UpVoteWeb · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
calebfahlgren.hf.co
+1 and let me know if you need any help with it @tobilg.com would be nice to have the dataset viewer for it!
Reposted by Caleb Fahlgren
davidberenstein.bsky.social
Observers: A Lightweight SDK for AI Observability

TLDR;
- Track and record interactions with AI models
- Store observations in multiple backends @huggingface.bsky.social, @duckdb.org or Argilla
- Query and analyse your AI interactions with ease

GitHub:
github.com/cfahlgren1/o...
lightweight SDK for AI observability