Martin Gubri
@mgubri.bsky.social
110 followers 410 following 31 posts
Research Lead @parameterlab.bsky.social working on Trustworthy AI Speaking 🇫🇷, English and 🇨🇱 Spanish | Living in Tübingen 🇩🇪 | he/him https://gubri.eu
mgubri.bsky.social
They found the universal intro for all papers:
<insert name> should be correct. But in reality, that is rarely true.
mgubri.bsky.social
Thanks a lot Guillaume :)
mgubri.bsky.social
🎉 Delighted to announce that our 🫗Leaky Thoughts paper about contextual privacy with reasoning models is accepted to #EMNLP main!
Huge congrats to the amazing team Tommaso Green, Haritz Puerto @coallaoh.bsky.social @oodgnas.bsky.social
parameterlab.bsky.social
🫗 An LLM's "private" reasoning may leak your sensitive data!

🎉 Excited to share our paper "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" was accepted at #EMNLP main!

1/2
Overall diagram about contextual privacy & LRMs
Reposted by Martin Gubri
elisabethbik.bsky.social
Fantastic new paper by @reeserichardson.bsky.social et al.

An enormous amount of work showing the extent of coordinated scientific fraud and involvement of some editors.
The number of fraudulent publications grows at a rate far outpacing that of legitimate science.
www.pnas.org/doi/10.1073/...
mgubri.bsky.social
I agree that there is a gap between the number of parameters a high-end device and a cheap one can run. I guess "common consumer device" means a mid-range one. But I totally agree that they should specify the type of device: a mobile phone is quite different from a desktop computer.
mgubri.bsky.social
My pleasure! Yes, I guess so. I agree that a moving definition can be quite annoying for research. At the same time, I think it is not specific to LMs: a large file or a heavy piece of software from 15 years ago, which required a lot of resources back then, is probably quite small for today's hardware.
mgubri.bsky.social
There are more details in Appendix A.
Detailed SLM definition
mgubri.bsky.social
This NVIDIA position paper has a clear definition of an SLM: arxiv.org/abs/2506.02153
They consider <10B parameters.
Personally, I would not consider 13B models to be SLMs (not even 7B). They require quite a lot of resources unless you use aggressive efficient-inference techniques (like 4-bit quantization).
SLM definition
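To make the resource point concrete, here is a rough back-of-envelope sketch (my own illustration, not from the paper) of the memory needed just to store the weights at different precisions:

```python
# Approximate memory for model weights alone (ignores KV cache,
# activations, and runtime overhead).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for params_billions in (3, 7, 13):
    for precision, bytes_per_param in BYTES_PER_PARAM.items():
        gib = params_billions * 1e9 * bytes_per_param / 2**30
        print(f"{params_billions}B params @ {precision}: ~{gib:.1f} GiB")
```

A 13B model needs roughly 24 GiB at fp16, beyond most consumer GPUs, while 4-bit quantization brings it down to about 6 GiB.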
mgubri.bsky.social
This has been explored quite a lot for the task of jailbreaking an LLM (i.e., adversarial examples against LLM alignment). For example:
- arxiv.org/abs/2310.08419
- arxiv.org/abs/2312.02119
- arxiv.org/abs/2502.01633
mgubri.bsky.social
📢 New paper out: Does SEO work for LLM-based conversational search?

We introduce C-SEO Bench, a benchmark to test if conversational SEO methods actually help.
Our finding? They don't. But traditional SEO still works because LLMs favour content already ranked higher in the prompt.
parameterlab.bsky.social
🔎Does Conversational SEO actually work? Our new benchmark has an answer!
Excited to announce our new paper: C-SEO Bench: Does Conversational SEO Work?

🌐 RTAI: researchtrend.ai/papers/2506....
📄 Paper: arxiv.org/abs/2506.11097
💻 Code: github.com/parameterlab...
📊 Data: huggingface.co/datasets/par...
Paper thumbnail.
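A minimal sketch of how one could probe the position bias mentioned above (hypothetical prompts and a placeholder `query_llm`, not the benchmark's actual code): place the same target document at each rank in the context and count how often the model recommends it.

```python
def query_llm(prompt: str) -> str:
    return ""  # placeholder: replace with a real chat-API call

filler_docs = [f"Doc {i}: description of product {i}." for i in range(9)]
target_doc = "Doc T: description of the product we care about."

for position in range(len(filler_docs) + 1):
    docs = filler_docs[:position] + [target_doc] + filler_docs[position:]
    prompt = "\n".join(docs) + "\nBased on the documents above, which product do you recommend?"
    answer = query_llm(prompt)
    # Record whether `answer` mentions the target; if the mention rate drops
    # sharply as `position` grows, the model favours higher-ranked content.
```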
mgubri.bsky.social
The mood on a Friday evening
Meme: 'EMNLP' crashing in 'The week-end after NeurIPS deadline'
Reposted by Martin Gubri
parameterlab.bsky.social
Excited to share that our paper "Scaling Up Membership Inference: When and How Attacks Succeed on LLMs" will be presented next week at #NAACL2025!
🖼️ Catch us at Poster Session 8 - APP: NLP Applications
🗓️ May 2, 11:00 AM - 12:30 PM
🗺️ Hall 3
Hope to see you there!
mgubri.bsky.social
📄 Excited to share our latest paper on the scale required for successful membership inference in LLMs! We investigate a continuum from single sentences to large document collections. Huge thanks to an incredible team: Haritz Puerto, @coallaoh.bsky.social and @oodgnas.bsky.social!
Main figure of the paper
mgubri.bsky.social
The authors show that LLMs often give opposite answers when forced to answer vs. when not forced (e.g., open-ended generation). Similarly, the conclusions are highly unstable across prompts.
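For illustration, the contrast looks roughly like this (hypothetical prompts and a placeholder client, not the paper's exact setup):

```python
def query_llm(prompt: str) -> str:
    return ""  # placeholder: replace with a real chat-API call

question = "Is it acceptable to lie to protect someone's feelings?"

# Forced choice: the model must commit to one side.
forced = f"{question}\nAnswer with exactly one word: Yes or No."
# Open-ended: the model is free to argue and conclude on its own.
open_ended = f"{question}\nDiscuss briefly and state your conclusion."

print(query_llm(forced))
print(query_llm(open_ended))
```

The finding is that the stance extracted from these two outputs frequently disagrees.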
mgubri.bsky.social
A Bluesky filter to recommend only posts about papers from your followers. This is what I was missing to use Bluesky!
nkgarg.bsky.social
*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...
mgubri.bsky.social
I am pleased to announce that our paper on the scale of LLM membership inference from @parameterlab.bsky.social has been accepted for publication in the Findings of #NAACL2025!
mgubri.bsky.social
📄 Excited to share our latest paper on the scale required for successful membership inference in LLMs! We investigate a continuum from single sentences to large document collections. Huge thanks to an incredible team: Haritz Puerto, @coallaoh.bsky.social and @oodgnas.bsky.social!
Main figure of the paper
mgubri.bsky.social
Congrats Michael! 👏🎉
Will you stay in Paris?
Reposted by Martin Gubri
parameterlab.bsky.social
🎉We’re pleased to share the release of the models from our Apricot🍑 paper, accepted at ACL 2024!
At Parameter Lab, we believe openness and reproducibility are essential for advancing science, and we've put in our best effort to ensure it.
🤗 huggingface.co/collections/...
🧵 bsky.app/profile/dnns...
mgubri.bsky.social
Congrats Guillaume! 👏
mgubri.bsky.social
Have a look at the 🍑 Apricot paper that we presented at ACL earlier this year. This project was a wonderful collaboration with @dnnslmr.bsky.social!
dnnslmr.bsky.social
Obtaining calibrated confidence scores from LLMs is hard, especially for black-box models. So, can we maybe predict them directly from the generated text? 🤔 Internship work at Parameter Lab with Martin Gubri, Sangdoo Yun, Hwaran Lee, Seong Joon Oh! arxiv.org/abs/2403.059... [1/6]
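Very roughly, the idea can be sketched like this (a toy stand-in with hypothetical data; Apricot itself fine-tunes a small LM rather than using TF-IDF features):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: (question, black-box answer) pairs with correctness labels.
qa_pairs = [
    ("What is 2+2?", "4"),
    ("What is the capital of France?", "Lyon"),
    ("What is 3+5?", "8"),
    ("What is the capital of Spain?", "Barcelona"),
]
correctness = [1, 0, 1, 0]

# Train an auxiliary model that maps text to P(answer is correct);
# that probability serves as a confidence score for the black-box model.
texts = [q + " [SEP] " + a for q, a in qa_pairs]
calibrator = make_pipeline(TfidfVectorizer(), LogisticRegression())
calibrator.fit(texts, correctness)

confidence = calibrator.predict_proba(["What is 1+1? [SEP] 2"])[0, 1]
print(f"confidence: {confidence:.2f}")
```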
Reposted by Martin Gubri
joestacey.bsky.social
After going to NAACL, ACL and #EMNLP2024 this year, here are a few tips I’ve picked up about attending #NLP conferences.

Would love to hear any other tips if you have them!

This proved very popular on another (more evil) social media platform, so sharing here also 🙂

My 10 tips:
mgubri.bsky.social
Thanks a lot 😊 I am also happy about what we achieved!
mgubri.bsky.social
Overall, we propose a new fingerprinting algorithm for LLMs based on prompt suffixes optimized to make the model output an answer chosen at random.
🎉A big shoutout to my amazing co-authors from @parameterlab.bsky.social & Naver AI Lab: @dnnslmr.bsky.social, Hwaran Lee, @oodgnas.bsky.social @coallaoh.bsky.social!
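In sketch form, the identification step could look like this (my own hedged illustration with hypothetical helpers; the suffix optimization itself, the core of the method, is omitted):

```python
import random

# Pick a target answer at random, then (offline) optimize a suffix so
# that the reference LLM reliably outputs this target.
target = str(random.randint(0, 999))
base_prompt = "Write a random string of three digits."
suffix = "<suffix optimized on the reference model>"  # placeholder

def identify(query_model, n_trials: int = 20, threshold: float = 0.8) -> bool:
    """query_model: callable str -> str wrapping the deployed black-box LLM.

    If the deployed model outputs the random target far more often than
    chance, it is very likely the model the suffix was optimized on.
    """
    hits = sum(
        query_model(base_prompt + " " + suffix).strip() == target
        for _ in range(n_trials)
    )
    return hits / n_trials >= threshold
```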
mgubri.bsky.social
🛡️Nevertheless, a third party can deploy the reference LLM with changes, so we explore the robustness of our identification:
- TRAP is robust to generation hyperparameters (within usual ranges)
- TRAP is not robust to some system prompts