kb
@keighbee.bsky.social
24 followers 150 following 10 posts
Machine Learning Engineer @ HuggingFace
keighbee.bsky.social
🕯️🔥[Candle](github.com/huggingface/...) is now much faster on macOS thanks to a contribution by @EricLBuehler, which brings major speed improvements to the Metal backend.🍎📈
Try it out by running some of our examples with the `--features metal` flag.
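For instance, from the repo root (the whisper example here is just one pick; any of the examples should work the same way):

```
cargo run --example whisper --release --features metal
```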

#Candle #RustLang #macOS #Metal #HuggingFace
MLX, Llama.cpp, and Candle are performing about equally on an M3 Max now.
keighbee.bsky.social
I just published part 2 of my article series about creating tensors from scratch in Rust. This one is about view operations.
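The core idea in one snippet (a minimal sketch, not the article's actual code): a tensor is a flat buffer plus shape and strides, and a view op like transpose only rewrites the metadata, never the data.

```rust
// A tensor "view": flat storage plus shape/strides metadata.
struct View {
    data: Vec<f32>,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl View {
    // Map a multi-dimensional index onto the flat buffer via the strides.
    fn get(&self, idx: &[usize]) -> f32 {
        let offset: usize = idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum();
        self.data[offset]
    }

    // Transposing a 2-D view swaps shape and strides; no data is copied.
    fn transpose(&mut self) {
        self.shape.swap(0, 1);
        self.strides.swap(0, 1);
    }
}

fn main() {
    // A 2x3 row-major tensor: [[0, 1, 2], [3, 4, 5]].
    let mut t = View {
        data: (0..6).map(|x| x as f32).collect(),
        shape: vec![2, 3],
        strides: vec![3, 1],
    };
    assert_eq!(t.get(&[1, 2]), 5.0);
    t.transpose(); // now 3x2, same buffer
    assert_eq!(t.get(&[2, 1]), 5.0);
}
```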
#tensors #machine-learning #ml #ai

Take a look here:
huggingface.co/blog/KeighBe...
Building Tensors from Scratch in Rust (Part 2): View Operations
A Blog post by Kyle Birnbaum on Hugging Face
keighbee.bsky.social
The mixture-of-experts model is also an option:

```
cargo run --example qwen --features metal --release -- --prompt "Write a poem about butterflies. <think></think>." --model "3-moe-a3b"
```
keighbee.bsky.social
Qwen 3 is now supported in Candle!
Run the 3-4B model locally with:

```
cargo run --example qwen --release -- --model 3-4b --prompt 'The capital of France is '
```

On macOS, enable Metal for faster inference:

```
--features metal
```

Clone the repo and test it out. github.com/huggingface/...
GitHub - huggingface/candle: Minimalist ML framework for Rust
Reposted by kb
danielvanstrien.bsky.social
RIFTS Dataset: Solving Critical LLM Conversation Failures

- LLMs are 3x less likely to clarify than humans
- 16x less likely to make follow-up requests
- Early failures predict later breakdowns
- Includes preliminary intervention strategies

huggingface.co/datasets/mic...
microsoft/rifts · Datasets at Hugging Face
keighbee.bsky.social
Google just released Gemma 3, an open, on-device LLM with vision capabilities and support for over 140 languages. Models range from 1B to 27B parameters.

Zero-day support for multiple frameworks including transformers, MLX, llama.cpp, and more! 💼 🚀

Read more here:
huggingface.co/blog/gemma3
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM
Reposted by kb
danielvanstrien.bsky.social
Made some significant updates to the @hf.co semantic datasets search app. If you love falling into a wiki black hole, you might like this...

huggingface.co/spaces/libra...
Reposted by kb
srushnlp.bsky.social
What to know about DeepSeek

youtu.be/0eMzc-WnBfQ?...

In which we attempt to figure out MoE, o1, scaling, tech reporting, modern semiconductors, microeconomics, and international geopolitics.
How DeepSeek Changes the LLM Story
YouTube video by Sasha Rush 🤗
Reposted by kb
ducha-aiki.bsky.social
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou

tl;dr: increasing the input vocabulary is always good; increasing the output vocabulary is good for bigger models.
arxiv.org/abs/2501.16975
Reposted by kb
sashamtl.bsky.social
It’s a green light for the Frugal AI Challenge! 🚀
For the next month, we invite all members of the AI community to participate in one of our 3 AI for Climate tasks, with the goal of developing a highly accurate model while consuming as little energy as possible ⚡
keighbee.bsky.social
We’ve got great examples of PyTorch-to-Core ML conversions in the Hugging Face coreml-examples repo. Currently, there’s one tutorial, but more are coming soon. After converting, you can choose which compute units you want the model to run on!
GitHub - huggingface/coreml-examples: Swift Core ML Examples
Reposted by kb
cyrilzakka.bsky.social
Christmas came early! 🎅🏻 Today marks the newest HuggingChat 🤗 release, with some really exciting capabilities! First up, automatic context injection!

1) Open a file in a supported app, summon HFChat, and it pre-populates the context window. No more copy-pasting. /cc @hf.co
keighbee.bsky.social
Or, put another way: my laptop has a 72.4 Wh battery (~208,512 J, assuming only 80% is usable). Running Llama3.2-1B would drain the battery after processing:

- CPU: 674,249 tokens (~518,653 words, ~7 novels)
- GPU: 2,799,550 tokens (~2,153,500 words, ~30 novels)
- ANE: 11,273,184 tokens (~8,671,679 words, ~123 novels)
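If you want to redo the back-of-the-envelope math, here's a minimal Rust sketch. It uses the rounded per-20-token energies from the next post down (6 J CPU, 1.4 J GPU, 0.3 J ANE), so the results land slightly off the measured numbers above:

```rust
// How many tokens the usable battery energy buys at each energy cost.
fn main() {
    let usable_joules = 72.4 * 3600.0 * 0.8; // ~208,512 J
    for (hw, joules_per_20_tokens) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        let tokens = usable_joules * 20.0 / joules_per_20_tokens;
        println!("{hw}: ~{tokens:.0} tokens");
    }
}
```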
keighbee.bsky.social
To put it in perspective: Llama3.2-1B uses ~280 GFLOPs per 20 tokens. My laptop (~2 kg) running the model would be the energy equivalent of:

- CPU (6 J): dropping it from 1 foot (31 cm)
- GPU (1.4 J): dropping it from 3 inches (7 cm)
- ANE (0.3 J): dropping it by just half an inch (1.5 cm)!
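Those heights are just the potential-energy formula E = mgh rearranged; a quick sketch with the ~2 kg mass:

```rust
// Drop height h = E / (m * g) for a 2 kg laptop at each energy cost.
fn main() {
    let (mass_kg, g) = (2.0, 9.81);
    for (hw, energy_j) in [("CPU", 6.0), ("GPU", 1.4), ("ANE", 0.3)] {
        let height_cm = energy_j / (mass_kg * g) * 100.0;
        println!("{hw}: {energy_j} J ≈ a {height_cm:.1} cm drop");
    }
}
```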
keighbee.bsky.social
Preliminary data shows the Apple Neural Engine uses ~94% less energy than the CPU and ~75% less than the GPU 🤯

On the On-Device team at Hugging Face, we've been profiling energy usage for CoreML models. Here’s some data I collected:
Chart: Model Hardware vs. Energy per GigaFLOP (mJ/GFLOP, log scale):

| Hardware  | Min | Q1   | Median | Q3   | Max  |
|-----------|-----|------|--------|------|------|
| CPU       | 6.9 | 11.7 | 13.4   | 35.6 | 53.1 |
| CPU + GPU | 4.6 | 4.6  | 4.7    | 6.2  | 9.6  |
| CPU + ANE | 0.9 | 1.0  | 1.1    | 1.4  | 1.8  |
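A rough sanity check against the medians in that table (the headline ~94%/~75% figures presumably come from the full measurement runs):

```rust
// Relative energy savings of the ANE path, from the median mJ/GFLOP values.
fn main() {
    let (cpu, gpu, ane) = (13.4, 4.7, 1.1);
    println!("ANE vs CPU: ~{:.0}% less energy", (1.0 - ane / cpu) * 100.0);
    println!("ANE vs GPU: ~{:.0}% less energy", (1.0 - ane / gpu) * 100.0);
}
```

The medians give ~92% and ~77%, in the same ballpark as the headline numbers.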