kb
@keighbee.bsky.social
24 followers 150 following 10 posts
Machine Learning Engineer @ HuggingFace
keighbee.bsky.social
🕯️🔥[Candle](github.com/huggingface/...) is now much faster on macOS thanks to a contribution by @EricLBuehler, which brings major speed improvements to the Metal backend.🍎📈
Try it out by running some of our examples with the `--features metal` flag.
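For instance, from the repo root (the whisper example here is just one pick; any of the examples should work the same way):

```
cargo run --example whisper --release --features metal
```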

#Candle #RustLang #macOS #Metal #HuggingFace
MLX, Llama.cpp, and Candle are performing about equally on an M3 Max now.
keighbee.bsky.social
I just published part 2 of my article series about creating tensors from scratch in Rust. This one is about view operations.
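The core idea in one snippet (a minimal sketch, not the article's actual code): a tensor is a flat buffer plus shape and strides, and a view op like transpose only rewrites the metadata, never the data.

```rust
// A tensor "view": flat storage plus shape/strides metadata.
struct View {
    data: Vec<f32>,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl View {
    // Map a multi-dimensional index onto the flat buffer via the strides.
    fn get(&self, idx: &[usize]) -> f32 {
        let offset: usize = idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum();
        self.data[offset]
    }

    // Transposing a 2-D view swaps shape and strides; no data is copied.
    fn transpose(&mut self) {
        self.shape.swap(0, 1);
        self.strides.swap(0, 1);
    }
}

fn main() {
    // A 2x3 row-major tensor: [[0, 1, 2], [3, 4, 5]].
    let mut t = View {
        data: (0..6).map(|x| x as f32).collect(),
        shape: vec![2, 3],
        strides: vec![3, 1],
    };
    assert_eq!(t.get(&[1, 2]), 5.0);
    t.transpose(); // now 3x2, same buffer
    assert_eq!(t.get(&[2, 1]), 5.0);
}
```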
#tensors #machine-learning #ml #ai

Take a look here:
huggingface.co/blog/KeighBe...
Building Tensors from Scratch in Rust (Part 2): View Operations
A Blog post by Kyle Birnbaum on Hugging Face
keighbee.bsky.social
The mixture-of-experts model is also an option:

```
cargo run --example qwen --features metal --release -- --prompt "Write a poem about butterflies. <think></think>." --model "3-moe-a3b"
```
keighbee.bsky.social
Qwen 3 is now supported in Candle!
Run the 3-4B model locally with:

```
cargo run --example qwen --release -- --model 3-4b --prompt 'The capital of France is '
```

On macOS, enable Metal for faster inference:

```
--features metal
```

Clone the repo and test it out. github.com/huggingface/...
GitHub - huggingface/candle: Minimalist ML framework for Rust
Reposted by kb
danielvanstrien.bsky.social
RIFTS Dataset: Solving Critical LLM Conversation Failures

- LLMs are 3x less likely to clarify than humans
- 16x less likely to make follow-up requests
- Early failures predict later breakdowns
- Includes preliminary intervention strategies

huggingface.co/datasets/mic...
microsoft/rifts · Datasets at Hugging Face
keighbee.bsky.social
Google just released Gemma 3, an open, on-device LLM with vision capabilities and support for over 140 languages. Models range from 1B to 27B parameters.

Zero-day support for multiple frameworks including transformers, MLX, llama.cpp, and more! 💼 🚀

Read more here:
huggingface.co/blog/gemma3
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM
Reposted by kb
danielvanstrien.bsky.social
Made some significant updates to the @hf.co semantic datasets search app. If you love falling into a wiki black hole, you might like this...

huggingface.co/spaces/libra...
Reposted by kb
srushnlp.bsky.social
What to know about DeepSeek

youtu.be/0eMzc-WnBfQ?...

In which we attempt to figure out MoE, o1, scaling, tech reporting, modern semiconductors, microeconomics, and international geopolitics.
How DeepSeek Changes the LLM Story
YouTube video by Sasha Rush 🤗
Reposted by kb
ducha-aiki.bsky.social
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou

tl;dr: increasing the input vocabulary is always good; increasing the output vocabulary is good for bigger models.
arxiv.org/abs/2501.16975
Reposted by kb
sashamtl.bsky.social
It’s a green light for the Frugal AI Challenge! 🚀
For the next month, we invite all members of the AI community to participate in one of our 3 AI for Climate tasks, with the goal of developing a highly accurate model while consuming as little energy as possible ⚡
keighbee.bsky.social
We’ve got great examples of PyTorch-to-Core ML conversions in the Hugging Face coreml-examples repo. Currently, there’s one tutorial, but more are coming soon. After converting, you can choose which compute units you want the model to run on!
GitHub - huggingface/coreml-examples: Swift Core ML Examples
Reposted by kb
cyrilzakka.bsky.social
Christmas came early! 🎅🏻 Today marks the newest HuggingChat 🤗 release, with some really exciting capabilities! First up, automatic context injection!

1) Open a file in a supported app, summon HFChat, and it pre-populates the context window. No more copy-pasting. /cc @hf.co
keighbee.bsky.social
Or, put another way: my laptop has a 72.4 Wh battery (~208,512 J, assuming only 80% is usable). Running Llama3.2-1B would drain the battery after processing:

- CPU: 674,249 tokens (~518,653 words, ~7 novels)
- GPU: 2,799,550 tokens (~2,153,500 words, ~30 novels)
- ANE: 11,273,184 tokens (~8,671,679 words, ~123 novels)
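If you want to redo the back-of-the-envelope math, here's a minimal Rust sketch. It uses the rounded per-20-token energies from the next post down (6 J CPU, 1.4 J GPU, 0.3 J ANE), so the results land slightly off the measured numbers above:

```rust
// How many tokens the usable battery energy buys at each energy cost.
fn main() {
    let usable_joules = 72.4 * 3600.0 * 0.8; // ~208,512 J
    for (hw, joules_per_20_tokens) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        let tokens = usable_joules * 20.0 / joules_per_20_tokens;
        println!("{hw}: ~{tokens:.0} tokens");
    }
}
```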
keighbee.bsky.social
To put it in perspective: Llama3.2-1B uses ~280 GFLOPs per 20 tokens. My laptop (~2 kg) running the model would be the energy equivalent of:

- CPU (6 J): dropping it from 1 foot (31 cm)
- GPU (1.4 J): dropping it from 3 inches (7 cm)
- ANE (0.3 J): dropping it by just half an inch (1.5 cm)!
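Those heights are just the potential-energy formula E = mgh rearranged; a quick sketch with the ~2 kg mass:

```rust
// Drop height h = E / (m * g) for a 2 kg laptop at each energy cost.
fn main() {
    let (mass_kg, g) = (2.0, 9.81);
    for (hw, energy_j) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        let height_cm = energy_j / (mass_kg * g) * 100.0;
        println!("{hw}: {energy_j} J ≈ a {height_cm:.1} cm drop");
    }
}
```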
keighbee.bsky.social
Preliminary data shows the Apple Neural Engine uses ~94% less energy than the CPU and ~75% less than the GPU 🤯

On the On-Device team at Hugging Face, we've been profiling energy usage for CoreML models. Here’s some data I collected:
Chart: Model Hardware vs. Energy per GigaFLOP (mJ/GFLOP, log scale):

| Hardware  | Min | Q1   | Median | Q3   | Max  |
|-----------|-----|------|--------|------|------|
| CPU       | 6.9 | 11.7 | 13.4   | 35.6 | 53.1 |
| CPU + GPU | 4.6 | 4.6  | 4.7    | 6.2  | 9.6  |
| CPU + ANE | 0.9 | 1.0  | 1.1    | 1.4  | 1.8  |
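A rough sanity check against the medians in that table (the headline ~94%/~75% figures presumably come from the full measurement runs):

```rust
// Relative energy savings of the ANE path, from the median mJ/GFLOP values.
fn main() {
    let (cpu, gpu, ane) = (13.4, 4.7, 1.1);
    println!("ANE vs CPU: ~{:.0}% less energy", (1.0 - ane / cpu) * 100.0);
    println!("ANE vs GPU: ~{:.0}% less energy", (1.0 - ane / gpu) * 100.0);
}
```

The medians give ~92% and ~77%, in the same ballpark as the headline numbers.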