Caleb Fahlgren
@calebfahlgren.hf.co
850 followers 140 following 28 posts
SWE @hf.co
Posts Media Videos Starter Packs
You can just ask things 🗣️

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Argilla Llama3.1 405B synthetic dataset 🔥
Reposted by Caleb Fahlgren
Most liked and most downloaded open-source AI models from 2022 to 2024

Interactive viz: aiworld.eu/embed/model/...
Discussion: huggingface.co/spaces/huggi...
It doesn't get easier than this. Why are you writing SQL by yourself when it's almost 2025
The amazing, new Qwen2.5-Coder 32B model can now write SQL for any @hf.co dataset ✨
This is insane! Structured generation in the browser with the new @hf.co SmolLM2-1.7B model

• Tiny 1.7B LLM running at 88 tokens / second ⚡
• Powered by MLC/WebLLM on WebGPU 🔥
• JSON Structured Generation entirely in the browser 🤏
Reposted by Caleb Fahlgren
Releasing SmolVLM, a small 2 billion parameters Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos.

Outperforms all models at similar GPU RAM usage and tokens throughputs

Blog post: huggingface.co/blog/smolvlm
I did it via

Settings > Account > Handle > I have my own domain

and it should show there!
You can literally do the histogram in one line in less than 10 seconds 💨

> from histogram(train, "Average ⬆️")
Here's what the model licenses look like:

Lots of great open licenses in there too! 💪
The OpenLLM Leaderboard just passed 2k evals 🥳

Here's a look at the distribution of average scores for all those models!

Great work by the @huggingface.bsky.social team to do these evals!
Let us know what you think or what you want to see :)

cc: @davidberenstein.bsky.social
** log and get out of the way **
using supabase theme, @tylerhillery.com would approve
Automatically tracking all Ollama requests to a dataset with the new observers python library!

With just a few lines of code all your requests can be sent to @huggingface.bsky.social datasets for annotating, analysis and observability 🔭
The main three stores are:
• DuckDB (local, SQL over traces)
• Hugging Face Datasets (dataset viewer, sql console)
• Argilla - annotation and filtering UI
observers 🔭 - automatically log all OpenAI compatible requests to a dataset 💽

• supports any OpenAI compatible endpoint 💪
• supports @duckdb.org, @huggingface.bsky.social datasets and Argilla as stores

> pip install observers
That’s okay, there are lots of incomplete and even snapshots. The UpVoteWeb reddit dataset is one that comes to mind.

Any data that is more accessible is a win :). My hub stats dataset is just a cron script as well haha

huggingface.co/datasets/Ope...
OpenCo7/UpVoteWeb · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
+1 and let me know if you need any help with it @tobilg.com would be nice to have the dataset viewer for it!
Reposted by Caleb Fahlgren
Observers: A Lightweight SDK for AI Observability

TLDR;
- Track and record interactions with AI models
- Store observations in multiple backends @huggingface.bsky.social, @duckdb.org or Argilla
- Query and analyse your AI interactions with ease

GitHub:
github.com/cfahlgren1/o...
lightweight SDK for AI observability