- Using compute-optimal test-time scaling, a Llama 3 3B model outperforms a 70B model (22x larger) on mathematical reasoning tasks
- Different search strategies work better for different problem difficulties - beam search for harder problems, Best-of-N for simpler ones
- Explored Best-of-N sampling, beam search, and Diverse Verifier Tree Search (DVTS); a Best-of-N sketch follows this list
- Llama 3 1B achieved 55% accuracy on the MATH benchmark using optimal search strategies
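Best-of-N is simple to reproduce. Below is a minimal sketch assuming a transformers causal LM; the checkpoint id is illustrative, and `score_fn` is a trivial placeholder for the process reward model (PRM) that scores candidates in the actual experiments.

```python
# Minimal Best-of-N sketch: sample n candidates, keep the best-scoring one.
# Assumptions: the checkpoint id is illustrative, and score_fn is a trivial
# placeholder for the process reward model used in the real experiments.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
lm = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

def best_of_n(prompt: str, n: int = 8, score_fn=None) -> str:
    """Sample n completions and return the one the scorer ranks highest."""
    inputs = tok(prompt, return_tensors="pt")
    out = lm.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,
        max_new_tokens=512,
        num_return_sequences=n,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        tok.decode(seq[prompt_len:], skip_special_tokens=True) for seq in out
    ]
    score_fn = score_fn or (lambda text: -len(text))  # placeholder scorer
    return max(candidates, key=score_fn)
```

Beam search and DVTS differ mainly in using the verifier to score partial solutions step by step rather than only complete answers.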
OpenAI trained a new Turbo model to make it easier and faster to use. With "storyboards", users get a CapCut/TikTok/Reels-like text-to-video editor that can be used to edit and create new short-form content! Social media will be flooded. 🌊
🔓 Released under Apache 2.0 on @huggingface.bsky.social
📱 Can run efficiently on laptops and edge devices (quick-start sketch after this list)
🛠️ Released in 3 variants: Base, Synthetic, and Instruct
💾 Requires only 5GB GPU RAM and achieves 38.8% on MMMU, 81.6% on DocVQA
⚡ 3.3-4.5x faster prefill and 7.5-16x faster generation vs Qwen2-VL
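For reference, a quick-start sketch assuming the standard transformers Vision2Seq API; the checkpoint id (inferred from the variant names above) and the image path are placeholders.

```python
# Quick-start sketch (assumed checkpoint id and image path).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed Instruct variant id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("photo.jpg")  # any local image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```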
- ⚡ 1.4-2.1x better multi-query throughput
- 🌱 Pruned using 13B training tokens in 26 hours on 32 H100s
- 🔧 Optimized for NVIDIA Ampere GPUs and newer
- 🚀 30% higher throughput and 1.8x lower latency, with up to 5.0x total speedup when combined with quantization
- 💻 Works with 4-bit quantization (GPTQ) and Sparse-Marlin kernels; see the vLLM sketch after this list
For now, it supports Llama. Which one would you want to see next?
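A hedged serving sketch with vLLM, which ships Sparse-Marlin kernels; the checkpoint id below is an assumption, not something confirmed by the post.

```python
# Serving sketch (assumed checkpoint id; vLLM picks sparse/quantized
# kernels such as Sparse-Marlin based on the checkpoint's metadata).
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Sparse-Llama-3.1-8B-2of4")  # assumed id
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Explain 2:4 structured sparsity in one paragraph."], params
)
print(outputs[0].outputs[0].text)
```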
Structured outputs can actually improve LLM performance when implemented correctly.
🔮 Examples in prompts should match the exact format expected in the actual tasks
🧰 Structured generation works best when implemented as "running our response parser as a generator"
📌 JSON generation requires careful prompt design, including specifying the desired schema (see the sketch after this list)
📝 Good prompts should give the model the same information a human would need to understand the task and the expected response format
📊 Structured outputs outperform unstructured ones on the test sets: GSM8K 0.78 vs 0.77, Last Letter 0.77 vs 0.73, Shuffled Objects 0.44 vs 0.41
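To make the prompt-design points concrete, here is a small schema-in-prompt sketch using only the standard library; the schema and few-shot example are illustrative, and any chat API can supply the raw response.

```python
# Sketch of the advice above: put the exact JSON schema in the prompt, keep
# the few-shot example in the same format, and validate what comes back.
import json

SCHEMA = {
    "type": "object",
    "properties": {"reasoning": {"type": "string"}, "answer": {"type": "string"}},
    "required": ["reasoning", "answer"],
}

def build_prompt(question: str) -> str:
    return (
        "Respond with JSON matching this schema exactly:\n"
        f"{json.dumps(SCHEMA, indent=2)}\n\n"
        'Example: {"reasoning": "2 + 2 = 4", "answer": "4"}\n\n'
        f"Question: {question}"
    )

def parse_response(raw: str) -> dict:
    """Fails loudly on malformed output, as a parser-as-verifier should."""
    obj = json.loads(raw)
    assert set(SCHEMA["required"]) <= obj.keys(), "missing required keys"
    return obj
```

Constrained-decoding libraries take the same idea further by compiling the parser into the sampling loop itself, which is what "running our response parser as a generator" refers to.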