Edoardo Ponti
@edoardo-ponti.bsky.social
1.3K followers 78 following 27 posts
Assistant professor in Natural Language Processing at the University of Edinburgh and visiting professor at NVIDIA | A Kleene star shines on the hour of our meeting.
Reposted by Edoardo Ponti
digitaluom.bsky.social
Up next on stage, Dr. @edoardo-ponti.bsky.social ( @edinburgh-uni.bsky.social / NVIDIA)
🎤 “Adaptive Units of Computation: Towards Sublinear-Memory and Tokenizer-Free Foundation Models”

Fascinating glimpse into the next gen of foundation models.

#FoundationModels #NLP #TokenizerFree #ADSAI2025
edoardo-ponti.bsky.social
Thanks to my amazing collaborators Adrian Łańcucki, Konrad Staniszewski, and Piotr Nawrot!

It was a pleasure to spend a year at NVIDIA as a visiting professor!

arXiv: arxiv.org/pdf/2506.05345

Code and models coming soon!
edoardo-ponti.bsky.social
🏆 We evaluate inference-time hyper-scaling on DeepSeek R1-distilled models of different sizes, increasing accuracy on maths, science, and coding by up to 15 points for a given budget.
edoardo-ponti.bsky.social
💡 The idea behind DMS is to *train* existing LLMs to evict tokens from the KV cache, while delaying the eviction until some time after the decision.

This allows LLMs to preserve information while reducing latency and memory footprint.
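For intuition, here is a minimal sketch of delayed eviction under my own assumptions: the learned eviction decision (`should_evict`), the delay window, and the cache layout are placeholders, not the paper's implementation.

```python
# Minimal sketch of delayed KV-cache eviction (placeholder logic, not the DMS code).
# `should_evict` stands in for the learned eviction decision; DELAY is a
# hypothetical gap, in decoding steps, between the decision and the removal.
from collections import deque

DELAY = 16                # assumed delay window
kv_cache = []             # one (key, value) slot per token; None = evicted
pending = deque()         # (step_of_decision, cache_index) awaiting eviction

def decode_step(t, key, value, should_evict):
    """Append the new token's KV, record its eviction decision,
    and execute decisions whose delay has elapsed."""
    kv_cache.append((key, value))
    if should_evict:
        pending.append((t, len(kv_cache) - 1))
    while pending and t - pending[0][0] >= DELAY:
        _, idx = pending.popleft()
        kv_cache[idx] = None  # the token stayed attendable until now, then is dropped
```

The point is that a token marked for eviction remains attendable for DELAY more steps, so information is not lost at the moment of the decision.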
edoardo-ponti.bsky.social
⚖️ The magic works only if accuracy is preserved even at high compression ratios.

Enter Dynamic Memory Sparsification (DMS), which achieves 8x KV cache compression with 1K training steps and retains accuracy better than SOTA methods.
edoardo-ponti.bsky.social
🚀 By *learning* to compress the KV cache in Transformer LLMs, we can generate more tokens for the same compute budget.

This unlocks *inference-time hyper-scaling*.

For the same runtime or memory load, we can boost LLM accuracy by pushing reasoning even further!
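As a back-of-the-envelope illustration (toy numbers of my own, not figures from the paper): under a fixed KV-cache memory budget, a compression ratio c leaves room for roughly c times more generated tokens.

```python
# Toy arithmetic for inference-time hyper-scaling (illustrative numbers only):
# a fixed KV-cache memory budget holds ~c times more tokens at compression ratio c.
dense_budget_tokens = 8_192         # assumed tokens that fit in the uncompressed cache
compression_ratio = 8               # e.g. the 8x KV-cache compression reported for DMS
compressed_budget_tokens = dense_budget_tokens * compression_ratio
print(compressed_budget_tokens)     # 65536: room for ~8x longer reasoning chains
```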
Reposted by Edoardo Ponti
emilevankrieken.com
We propose Neurosymbolic Diffusion Models! We find diffusion is especially compelling for neurosymbolic approaches, combining powerful multimodal understanding with symbolic reasoning 🚀

Read more 👇
edoardo-ponti.bsky.social
4) Finally, we introduce novel scaling laws for sparse attention and validate them on held-out results: evidence that our findings are likely to hold broadly.

Our insights demonstrate that sparse attention will play a key role in next-generation foundation models.
edoardo-ponti.bsky.social
3) There is no single best strategy across tasks and phases.

However, on average, Verticals-Slashes for prefilling and Quest for decoding are the most competitive. Context-aware and highly adaptive variants are preferable.
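For intuition, a rough sketch of what a vertical-plus-slash attention mask looks like; the particular columns and offsets are placeholders of my own, not the selection procedure used in the study.

```python
# Rough sketch of a "vertical + slash" sparse attention mask for prefilling
# (placeholder column/offset choices, not the method's learned selection).
import numpy as np

def vertical_slash_mask(seq_len, vertical_cols, slash_offsets):
    """Boolean (query, key) mask: True = attend."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    mask[:, vertical_cols] = True                  # verticals: keys attended by every query
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    for off in slash_offsets:                      # slashes: fixed query-key offsets (diagonals)
        mask |= (q - k) == off
    return mask & (k <= q)                         # keep the mask causal

mask = vertical_slash_mask(16, vertical_cols=[0, 1], slash_offsets=[0, 1, 2, 3])
```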
edoardo-ponti.bsky.social
2) The sparsity attainable while statistically guaranteeing accuracy preservation is higher during decoding ✍️ than during prefilling 🧠, and correlates with model size in the former.

Importantly, for most settings there is at least one degraded task, even at moderate compression ratios (<5x).
edoardo-ponti.bsky.social
1) For very long sequences, *larger and highly sparse models* are preferable to small, dense ones for the same FLOPS budget.

This suggests a strategy shift where scaling up model size must be combined with sparse attention to achieve an optimal trade-off.
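A heavily hedged back-of-the-envelope, using the standard per-token FLOPs approximation (dense matmuls ≈ 2 × parameters, attention ≈ 4 × layers × width × context) with toy model sizes of my own: at very long context the attention term dominates, so sparsifying it can pay for a much larger model.

```python
# Toy FLOPs-per-token accounting (rough standard approximation, illustrative numbers):
# ~2 * n_params for dense matmuls + ~4 * n_layers * d_model * n_ctx for attention,
# with the attention term scaled by the fraction of keys kept under sparse attention.
def flops_per_token(n_params, n_layers, d_model, n_ctx, keep_frac=1.0):
    return 2 * n_params + 4 * n_layers * d_model * n_ctx * keep_frac

small_dense  = flops_per_token(1e9, 24, 2048, 256_000, keep_frac=1.0)
large_sparse = flops_per_token(7e9, 32, 4096, 256_000, keep_frac=0.1)
print(f"{small_dense:.2e} vs {large_sparse:.2e}")  # here the larger, highly sparse model is even cheaper
```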
edoardo-ponti.bsky.social
Sparse attention is one of the most promising strategies to unlock long-context processing and long-generation reasoning in LLMs.

We performed the most comprehensive study on training-free sparse attention to date.

Here is what we found:
Reposted by Edoardo Ponti
digitaluom.bsky.social
🚀 Excited to welcome Dr. @edoardo-ponti.bsky.social to #ADSAI2025! Lecturer in NLP @edinburghuni.bsky.social, Affiliated Lecturer @cambridgeuni.bsky.social & Visiting Prof NVIDIA.
🎟️ Tickets for Advances in Data Science & AI Conference 2025 are live!
🔗Secure your spot: tinyurl.com/yurknk7y
#AI
Reposted by Edoardo Ponti
bminixhofer.bsky.social
We created Approximate Likelihood Matching, a principled (and very effective) method for *cross-tokenizer distillation*!

With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level, and a bunch more 🧵
[Image: ALM enables ensembling, transfer to bytes, and general cross-tokenizer distillation.]
edoardo-ponti.bsky.social
I have a scholarship for a PhD in efficient memory and tokenization in LLM architectures at
@edinburgh-uni.bsky.social!

Eligibility: UK home fee status

Starting date: flexible, from July 2025 onwards.

informatics.ed.ac.uk/study-with-u...

Please contact me if you're interested!
edoardo-ponti.bsky.social
We're hiring a lecturer or reader in embodied NLP at the University of Edinburgh!

Deadline: 31 Jan 2025
Call for applications: elxw.fa.em3.oraclecloud.com/hcmUI/Candid...
edoardo-ponti.bsky.social
What's in the future?
- Richer proxies for meaning, including a temporal dimension and internal agent states
- The study of grammaticalization through the lens of groundedness

We release an extensive dataset to support these studies: osf.io/bdhna/
A Grounded Typology of Word Classes (hosted on the Open Science Framework: osf.io)
edoardo-ponti.bsky.social
We focus on the groundedness of lexical classes and find that it
- follows a continuous cline cross-linguistically: nouns > adjectives > verbs
- is non-zero even for functional classes (e.g., adpositions)
- is contextual, so agrees with psycholinguistic norms only in part
edoardo-ponti.bsky.social
We leverage advances in multilingual and multimodal foundation models to quantify the surprisal of word forms, both alone and given their function.

Their difference (pointwise mutual information) corresponds to the groundedness of a word: the surprisal explained away once its function is known.
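In formula form (my reading of the thread, with placeholder numbers rather than actual model outputs): groundedness(w) = surprisal(w) - surprisal(w | perceptual context), i.e. the PMI between form and function.

```python
# Sketch of the groundedness score as described above (toy values, not model outputs):
# the pointwise mutual information between a word's form and its grounded function.
import math

def groundedness(logp_form: float, logp_form_given_function: float) -> float:
    surprisal_form = -logp_form                    # e.g. from a text-only LM
    surprisal_given = -logp_form_given_function    # e.g. from a multimodal LM that sees the image
    return surprisal_form - surprisal_given        # surprisal explained away by grounding

# Toy example: a concrete noun becomes far more predictable once the image is visible.
print(groundedness(math.log(0.001), math.log(0.05)))   # ≈ 3.9 nats
```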
edoardo-ponti.bsky.social
**Grounded typology**: a new paradigm.

Traditionally, linguists posit functions to compare forms in different languages; however, these are aprioristic and partly arbitrary.

Instead, we resort to perceptual modalities (like vision) as measurable proxies for function.
colemanhaley.bsky.social
NEW PREPRINT!

Language is not just a formal system—it connects words to the world. But how do we measure this connection in a cross-linguistic, quantitative way?

🧵 Using multimodal models, we introduce a new approach: groundedness ⬇️
edoardo-ponti.bsky.social
Two considerations:

1) reusing / interpolating old tokens is reminiscent of our FOCUS baseline. Unfortunately, it degrades performance, as even identical tokens may change their function.

2) you incur a large overhead for calculating the co-occurrence matrix for every new tokenizer.
edoardo-ponti.bsky.social

Two amazing papers from my students at #NeurIPS today:

⛓️💥 Switch your LLM's tokenizer vocabulary and embeddings zero-shot, on the fly (@bminixhofer.bsky.social)
neurips.cc/virtual/2024...

🌊 Align your LLM gradient-free with spectral editing of activations (Yifu Qiu)
neurips.cc/virtual/2024...