Martin Genzel
@martingenzel.bsky.social
Staff Machine Learning Researcher @MerantixMomentum | Applied Mathematician | Interested in Deep Learning, LLMs, Tabular & Time-Series Data | 🌐 martingenzel.com | GH martin-genzel | Berlin-based
Image sources:
Llama: pixabay.com/illustration...
Slider Tool: squoosh.app
June 26, 2025 at 3:24 PM
👏 Big shout out to all co-authors for an amazing collab: @pputzky.bsky.social, Pengfei Zhao, Sebastian Schulze, Mattes Mollenhauer, Robert Seidel, Stefan Dietzel, and Thomas Wollmann

📄 Paper: arxiv.org/abs/2502.01717
🧑‍💻 Code: github.com/merantix-mom...
🤗 Models: huggingface.co/collections/...
Choose Your Model Size: Any Compression by a Single Gradient Descent
June 26, 2025 at 3:24 PM
You can find many more details and experiments in the paper 📄
Our work has been accepted at the ES-FoMo Workshop at @icmlconf.bsky.social 2025 🥳 So if you’re in Vancouver this year, let’s meet and discuss 🤝
Workshop: es-fomo.com
June 26, 2025 at 3:24 PM
The pruning order allows us to estimate the global importance of all targeted singular values. This gives rise to a score map that drives an independent compression stage, in which a user can flexibly create a model of any size, without re-computation or re-calibration.
June 26, 2025 at 3:24 PM
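To make this concrete, here is a minimal sketch of what a score-map-driven compression step could look like. It assumes calibration has already assigned each singular value an importance score (e.g., how long it survived pruning); everything here is my illustration, not the released ACIP API.

```python
import torch

def compress_to_ratio(scores: dict[str, torch.Tensor], keep_ratio: float):
    """Keep roughly `keep_ratio` of all singular values model-wide.
    `scores` maps layer name -> importance score per singular value."""
    all_scores = torch.cat(list(scores.values()))
    k = max(1, int(keep_ratio * all_scores.numel()))
    # Global threshold: the k-th largest score across all layers.
    threshold = torch.topk(all_scores, k).values.min()
    # Boolean keep-masks; apply them to truncate each layer's SVD factors.
    return {name: s >= threshold for name, s in scores.items()}
```

Because the scores are fixed after calibration, each new target size is just a cheap thresholding pass, which is exactly why no re-computation is needed.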
The key idea of ACIP is to decouple an optimization-based pruning stage (calibration) from the actual compression stage. To ensure parameter-efficient pruning, we use low-rank factorizations and L1-regularization to iteratively eliminate singular values of large linear layers.
June 26, 2025 at 3:24 PM
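A minimal sketch of this parametrization in PyTorch, assuming a plain `nn.Linear` target layer (my illustration of the idea, not the official implementation): the weight is factorized via its SVD, a trainable mask sits on the singular values, and an L1 penalty on the mask drives entries to zero during calibration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDLinear(nn.Module):
    """Linear layer reparametrized as U @ diag(mask * s) @ Vt."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        U, s, Vt = torch.linalg.svd(linear.weight.detach(), full_matrices=False)
        self.register_buffer("U", U)
        self.register_buffer("s", s)
        self.register_buffer("Vt", Vt)
        self.bias = linear.bias
        # Trainable mask on the singular values; L1 shrinks entries to zero.
        self.mask = nn.Parameter(torch.ones_like(s))

    def forward(self, x):
        # Scaling the columns of U by (mask * s) realizes U @ diag(mask * s).
        weight = (self.U * (self.mask * self.s)) @ self.Vt
        return F.linear(x, weight, self.bias)

def l1_penalty(model: nn.Module, lam: float = 1e-3) -> torch.Tensor:
    # Added to the calibration loss; the order in which mask entries
    # reach zero later defines each singular value's importance.
    return lam * sum(m.mask.abs().sum()
                     for m in model.modules() if isinstance(m, SVDLinear))
```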
To achieve this, we introduce Any Compression via Iterative Pruning (ACIP). This novel algorithm allows you to determine the entire compression-performance trade-off from a single gradient-descent run, enabling any target size for the model without re-computation.
June 26, 2025 at 3:24 PM
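Tying the two sketches above together, the workflow reduces to one expensive step followed by arbitrarily many cheap ones. This is a pseudo-usage outline (the `calibrate` stand-in and the loop are illustrative, not the released API):

```python
# One gradient-descent run over calibration data, optimizing the task loss
# plus l1_penalty(model) and recording how long each mask entry survives:
scores = calibrate(model)  # hypothetical stand-in for the pruning stage

# Afterwards, any target size is a cheap thresholding pass over `scores`:
for ratio in (0.9, 0.7, 0.5, 0.3):
    masks = compress_to_ratio(scores, keep_ratio=ratio)
    # ... truncate SVD factors according to `masks` and evaluate ...
```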
Turning the workflow around, we advocate for Any Compression: perform a single upfront computation, and you can then generate a model of any desired size in real time, at no extra cost. In other words, you get a slider like in image compression 🎚️
June 26, 2025 at 3:24 PM
The conventional process with existing methods can be inefficient: you typically pick one of a few preset target sizes, run a costly calibration, and then have to repeat the entire process for every new compression rate you want to test.
June 26, 2025 at 3:24 PM
Post-training compression is an effective way to make LLMs more accessible, but it creates a fundamental trade-off between size and performance. Unfortunately, the process can feel like a black box for users: finding an acceptable setup requires expertise and trial and error.
June 26, 2025 at 3:24 PM