Martin Genzel
@martingenzel.bsky.social
Staff Machine Learning Researcher @MerantixMomentum | Applied Mathematician | Interested in Deep Learning, LLMs, Tabular & Time-Series Data | 🌐 martingenzel.com | GH martin-genzel | Berlin-based
Image sources:
Llama: pixabay.com/illustration...
Slider Tool: squoosh.app
June 26, 2025 at 3:24 PM
👏 Big shout out to all co-authors for an amazing collab: @pputzky.bsky.social, Pengfei Zhao, Sebastian Schulze, Mattes Mollenhauer, Robert Seidel, Stefan Dietzel, and Thomas Wollmann

📄 Paper: arxiv.org/abs/2502.01717
🧑‍💻 Code: github.com/merantix-mom...
🤗 Models: huggingface.co/collections/...
Choose Your Model Size: Any Compression by a Single Gradient Descent
June 26, 2025 at 3:24 PM
You can find many more details and experiments in the paper 📄
Our work has been accepted at the ES-FoMo Workshop at @icmlconf.bsky.social 2025 🥳 So if you’re in Vancouver this year, let’s meet and discuss 🤝
Workshop: es-fomo.com
June 26, 2025 at 3:24 PM
The pruning order allows us to estimate the global importance of all targeted singular values. This gives rise to a score map that drives an independent compression stage, in which a user can flexibly create a model of any size, without re-computation or re-calibration.
June 26, 2025 at 3:24 PM
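To make this concrete, here is a minimal sketch of what a score-map-driven compression step could look like. It assumes calibration has already assigned each singular value an importance score (e.g., how long it survived pruning); everything here is my illustration, not the released ACIP API.

```python
import torch

def compress_to_ratio(scores: dict[str, torch.Tensor], keep_ratio: float):
    """Keep roughly `keep_ratio` of all singular values model-wide.
    `scores` maps layer name -> importance score per singular value."""
    all_scores = torch.cat(list(scores.values()))
    k = max(1, int(keep_ratio * all_scores.numel()))
    # Global threshold: the k-th largest score across all layers.
    threshold = torch.topk(all_scores, k).values.min()
    # Boolean keep-masks; apply them to truncate each layer's SVD factors.
    return {name: s >= threshold for name, s in scores.items()}
```

Because the scores are fixed after calibration, each new target size is just a cheap thresholding pass, which is exactly why no re-computation is needed.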
The key idea of ACIP is to decouple an optimization-based pruning stage (calibration) from the actual compression stage. To ensure parameter-efficient pruning, we use low-rank factorizations and L1-regularization to iteratively eliminate singular values of large linear layers.
June 26, 2025 at 3:24 PM
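A minimal sketch of this parametrization in PyTorch, assuming a plain `nn.Linear` target layer (my illustration of the idea, not the official implementation): the weight is factorized via its SVD, a trainable mask sits on the singular values, and an L1 penalty on the mask drives entries to zero during calibration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDLinear(nn.Module):
    """Linear layer reparametrized as U @ diag(mask * s) @ Vt."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        U, s, Vt = torch.linalg.svd(linear.weight.detach(), full_matrices=False)
        self.register_buffer("U", U)
        self.register_buffer("s", s)
        self.register_buffer("Vt", Vt)
        self.bias = linear.bias
        # Trainable mask on the singular values; L1 shrinks entries to zero.
        self.mask = nn.Parameter(torch.ones_like(s))

    def forward(self, x):
        # Scaling the columns of U by (mask * s) realizes U @ diag(mask * s).
        weight = (self.U * (self.mask * self.s)) @ self.Vt
        return F.linear(x, weight, self.bias)

def l1_penalty(model: nn.Module, lam: float = 1e-3) -> torch.Tensor:
    # Added to the calibration loss; the order in which mask entries
    # reach zero later defines each singular value's importance.
    return lam * sum(m.mask.abs().sum()
                     for m in model.modules() if isinstance(m, SVDLinear))
```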
To achieve this, we introduce Any Compression via Iterative Pruning (ACIP). This novel algorithm allows you to determine the entire compression-performance trade-off from a single gradient-descent run, enabling any target size for the model without re-computation.
June 26, 2025 at 3:24 PM
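Tying the two sketches above together, the workflow reduces to one expensive step followed by arbitrarily many cheap ones. This is a pseudo-usage outline (the `calibrate` stand-in and the loop are illustrative, not the released API):

```python
# One gradient-descent run over calibration data, optimizing the task loss
# plus l1_penalty(model) and recording how long each mask entry survives:
scores = calibrate(model)  # hypothetical stand-in for the pruning stage

# Afterwards, any target size is a cheap thresholding pass over `scores`:
for ratio in (0.9, 0.7, 0.5, 0.3):
    masks = compress_to_ratio(scores, keep_ratio=ratio)
    # ... truncate SVD factors according to `masks` and evaluate ...
```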
Turning the workflow around, we advocate for Any Compression: perform a single upfront computation, and you can then generate a model of any desired size in real time, at no extra cost. In other words, you get a slider like in image compression 🎚️
June 26, 2025 at 3:24 PM
The conventional process with existing methods can be inefficient: you typically pick one of a few preset target sizes, run a costly calibration, and then have to repeat the entire process for every new compression rate you want to test.
June 26, 2025 at 3:24 PM
Post-training compression is an effective way to make LLMs more accessible, but it creates a fundamental trade-off between size and performance. Unfortunately, the process can feel like a black box for users: finding an acceptable setup requires expertise and trial and error.
June 26, 2025 at 3:24 PM