I post about AI News, Open Models, Interesting AI Paper Summaries, blog posts, and guides!
My blog is at www.philschmid.de
Make sure to follow! 🤗
Paper: huggingface.co/papers/2410....
For now, it supports Llama. Which one would you want to see next?
SQLite is all you need! Big sqlite-vec update! 🚀 sqlite-vec is an extension that adds vector search to SQLite and libSQL databases. v0.1.6 now allows storing non-vector data in vec0 virtual tables, enabling metadata conditioning and filtering! 🤯
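A minimal sketch of what that enables, using the sqlite-vec Python bindings. The table and column names are made up for illustration, and the exact v0.1.6 metadata syntax should be checked against the release notes:

```python
import sqlite3
import sqlite_vec  # pip install sqlite-vec

# Open an in-memory database and load the sqlite-vec extension.
db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# v0.1.6: vec0 virtual tables can hold non-vector columns next to the
# embedding, so metadata lives in the same table. (names are hypothetical)
db.execute("""
    CREATE VIRTUAL TABLE docs USING vec0(
        embedding float[4],
        category text
    )
""")

rows = [
    (1, [0.1, 0.1, 0.1, 0.1], "news"),
    (2, [0.9, 0.9, 0.9, 0.9], "guide"),
]
for rowid, vec, category in rows:
    db.execute(
        "INSERT INTO docs(rowid, embedding, category) VALUES (?, ?, ?)",
        (rowid, sqlite_vec.serialize_float32(vec), category),
    )

# KNN search, filtered on the metadata column.
hits = db.execute(
    """
    SELECT rowid, distance
    FROM docs
    WHERE embedding MATCH ?
      AND k = 5
      AND category = 'guide'
    """,
    (sqlite_vec.serialize_float32([0.8, 0.8, 0.8, 0.8]),),
).fetchall()
print(hits)
```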
SmolTalk: the 1M-sample synthetic dataset used to train SmolLM2 is here! Available under Apache 2.0, it combines newly generated datasets with publicly available ones.
Here’s what you need to know 🧵👇
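A quick way to poke at it with the datasets library. The repo id, config name, and column schema below are assumptions based on the SmolLM org naming, so double-check on the Hub:

```python
from datasets import load_dataset

# Repo id and config are assumptions -- verify on the Hugging Face Hub.
ds = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

print(ds)                  # row count and column names
print(ds[0]["messages"])   # chat-style list of {"role", "content"} turns (assumed schema)
```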
R1, "thoughts" are streamed, no MCTS is used during inference. They must have baked the "search" and "backtracking" directly into the model. huggingface.co/papers/2404....
R1, "thoughts" are streamed, no MCTS is used during inference. They must have baked the "search" and "backtracking" directly into the model. huggingface.co/papers/2404....
> o1-preview-level performance on AIME & MATH benchmarks.
> Access to CoT and transparent thought process in real-time.
> Open-source models & API coming soon!
> 3x speed up over Flash Attention2, maintaining 99% performance
> INT4/8 for Q and K matrices, and FP8/16 for P and V + smoothing methods for Q and V
> Drop-in replacement of torch scaled_dot_product_attention (see the sketch below)
> SageAttention 2 code to be released soon
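"Drop-in" roughly means swapping the call site. A hedged sketch of what that could look like once the code lands; the sageattention import and function name are assumptions, not the released API:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Toy attention inputs: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)
k = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)
v = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)

# Today: PyTorch's built-in attention.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Hypothetical swap once SageAttention 2 is released: same q/k/v in,
# same-shaped output out, with INT4/8 + FP8/16 kernels underneath.
# (import path and function name are assumptions, not the released API)
# from sageattention import sageattn
# out = sageattn(q, k, v, is_causal=True)
```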