Aida Nematzadeh
@aidanematzadeh.bsky.social
170 followers 35 following 10 posts
Research scientist at Google DeepMind.🦎 She/her. http://www.aidanematzadeh.me/
aidanematzadeh.bsky.social
This result is particularly interesting for capabilities that are harder for models (like numerical reasoning or text rendering) as prompt-aware guidance boosts performance on these failure modes without retraining.
aidanematzadeh.bsky.social
Most diffusion-based models use a fixed (model-tuned) guidance schedule. We show that picking the guidance value during inference, conditioned on the prompt/capability, significantly improves performance.

arxiv.org/abs/2509.16131
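The fixed-vs-adaptive schedule idea can be sketched in a few lines of classifier-free guidance. This is a toy illustration, not the paper's method: the heuristic prompt-to-scale mapping, the function names, and the scale values are all assumptions.

```python
import numpy as np

def cfg_step(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance mix of the two noise predictions."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def pick_guidance_scale(prompt, table=None):
    """Choose a guidance weight based on the prompt's apparent capability.

    A real system would use a learned or benchmarked mapping; the keyword
    heuristic below is only a stand-in.
    """
    table = table or {"numerical": 9.0, "text": 10.0, "default": 7.5}
    if any(w in prompt.lower().split() for w in ("one", "two", "three", "four", "five")):
        return table["numerical"]
    if "sign that reads" in prompt or "text saying" in prompt:
        return table["text"]
    return table["default"]

# Toy noise predictions standing in for a diffusion model's outputs.
eps_uncond = np.zeros(4)
eps_cond = np.ones(4)

scale = pick_guidance_scale("a photo of three red apples")
guided = cfg_step(eps_uncond, eps_cond, scale)
print(scale)      # 9.0 for this numerical-reasoning prompt
print(guided[0])  # 9.0
```

The point of the sketch: `guidance_scale` becomes a per-prompt decision made at inference time instead of one constant tuned for the whole model.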
Reposted by Aida Nematzadeh
askoepke.bsky.social
Our #CVPR2025 workshop on Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo) is taking place this afternoon (1-6pm) in room 210.

Workshop schedule: sites.google.com/view/eval-fo...
EVAL-FoMo 2 - Schedule
Date: June 11 (1:00pm - 6:00pm)
sites.google.com
aidanematzadeh.bsky.social
Also at #ICLR2025: See this work in action! We're demoing "Gecko" showing how we use capability-based evaluators to help users customize evals & select models. 🦎

Find us at the Google booth, Fri 4/26, 12:00-12:30 PM.
aidanematzadeh.bsky.social
At #ICLR2025, we're diving into what makes prompt adherence evaluators work for image/video generation.
Check out our poster Friday at 3 PM: iclr.cc/virtual/2025/p… 🦎
aidanematzadeh.bsky.social
Generative models are powerful evaluators/verifiers, impacting evaluation and post-training. Yet, making them effective, particularly for highly similar models/checkpoints, is challenging. The devil is in the details.
Reposted by Aida Nematzadeh
metaomicsnerd.bsky.social
I know multiple people who need to hear this piped into their offices during working hours
Reposted by Aida Nematzadeh
askoepke.bsky.social
Our 2nd Workshop on Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo) is accepting submissions. We are looking forward to talks by our amazing speakers that include @saining.bsky.social, @aidanematzadeh.bsky.social, @lisadunlap.bsky.social, and @yukimasano.bsky.social. #CVPR2025
bayesiankitten.bsky.social
🔥 #CVPR2025 Submit your cool papers to Workshop on
Emergent Visual Abilities and Limits of Foundation Models 📷📷🧠🚀✨

sites.google.com/view/eval-fo...

Submission Deadline: March 12th!
EVAL-FoMo 2
A Vision workshop on Evaluations and Analysis
sites.google.com
Reposted by Aida Nematzadeh
pcastr.bsky.social
if you would like to attend #ICLR2025 but have financial barriers, apply for financial assistance!

our priority categories are student authors, and contributors from underrepresented demographic groups & geographic regions.

deadline is march 2nd.

iclr.cc/Conferences/...
ICLR 2025 Financial Assistance
iclr.cc
Reposted by Aida Nematzadeh
eringrant.me
Our representational alignment workshop returns to #ICLR2025! Submit your work on how ML/cogsci/neuro systems represent the world & what shapes these representations 💭🧠🤖

w/ @thisismyhat.bsky.social @dotadotadota.bsky.social, @sucholutsky.bsky.social @lukasmut.bsky.social @siddsuresh97.bsky.social
dotadotadota.bsky.social
🚨Call for Papers🚨
The Re-Align Workshop is coming back to #ICLR2025

Our CfP is up! Come share your representational alignment work at our interdisciplinary workshop at
@iclr-conf.bsky.social

Deadline is 11:59 pm AOE on Feb 3rd

representational-alignment.github.io
aidanematzadeh.bsky.social
The RE application is now open: boards.greenhouse.io/deepmind/job...

And here is the link to the RS position:
boards.greenhouse.io/deepmind/job...
Reposted by Aida Nematzadeh
thomwolf.bsky.social
What was the most impactful/visible/useful release on evaluation in AI in 2024?
Reposted by Aida Nematzadeh
janexwang.bsky.social
A brilliant colleague and wonderful soul Felix Hill recently passed away. This was a shock and in an effort to sort some things out, I wrote them down. Maybe this will help someone else, but at the very least it helped me. Rest in peace, Felix, you will be missed. www.janexwang.com/blog/2025/1/...
Felix — Jane X. Wang
From the moment I heard him give a talk, I knew I wanted to work with Felix. His ideas about generalization and situatedness made explicit thoughts that had been swirling around in my head, incohe...
www.janexwang.com
Reposted by Aida Nematzadeh
lampinen.bsky.social
Felix Hill was such an incredible mentor — and occasional cold water swimming partner — to me. He's a huge part of why I joined DeepMind and how I've come to approach research. Even a month later, it's still hard to believe he's gone.
Felix Hill and some other DMers and I after cold water swimming at Parliament Hill Lido a few years ago
Reposted by Aida Nematzadeh
carlbergstrom.com
It seems to me that the time is ripe for a Bluesky thread about how—and maybe even why—to befriend crows.

(1/n)
Beautiful crow against a black background
Reposted by Aida Nematzadeh
sedielem.bsky.social
Here's Veo 2, the latest version of our video generation model, as well as a substantial upgrade for Imagen 3 🧑‍🍳🚢

(Did I mention we are hiring on the Generative Media team, btw 👀)

blog.google/technology/g...
State-of-the-art video and image generation with Veo 2 and Imagen 3
We’re rolling out a new, state-of-the-art video model, Veo 2, and updates to Imagen 3. Plus, check out our new experiment, Whisk.
blog.google
Reposted by Aida Nematzadeh
sedielem.bsky.social
I've been getting a lot of questions about autoregression vs diffusion at #NeurIPS2024 this week! I'm speaking at the adaptive foundation models workshop at 9AM tomorrow (West Hall A), about what happens when we combine modalities and modelling paradigms.
adaptive-foundation-models.org
NeurIPS 2024 Workshop on Adaptive Foundation Models
adaptive-foundation-models.org
aidanematzadeh.bsky.social
We design 3 main tasks with varying degrees of difficulty and evaluate 13 models across different families. Models show rudimentary numerical reasoning skills, limited to small numbers and simple prompt formats; many models are affected by non-numerical prompt manipulations.
Our main task categories
aidanematzadeh.bsky.social
What do text-to-image models know about numbers? Find out in our new paper 🦎 "Evaluating Numerical Reasoning in Text-to-Image Models" to be presented at #NeurIPS2024 (Wed 4:30-7:30 PM, #5304).

Dataset: github.com/google-deepm... (1386 prompts, 52,721 images, 479,570 annotations)
GitHub - google-deepmind/geckonum_benchmark_t2i: GeckoNum Benchmark for T2I Model Eval.
GeckoNum Benchmark for T2I Model Eval. Contribute to google-deepmind/geckonum_benchmark_t2i development by creating an account on GitHub.
github.com
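A minimal sketch of how a counting eval like the one above might score a model, assuming per-image human annotations of object counts. The record layout, word-to-number table, and function names are hypothetical, not the benchmark's actual format.

```python
# Map simple number words (as they appear in prompts) to integers.
WORD_TO_NUM = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def prompt_target(prompt):
    """Extract the target count from a simple 'N objects' style prompt."""
    for word, n in WORD_TO_NUM.items():
        if word in prompt.lower().split():
            return n
    raise ValueError("no number word found in prompt")

def exact_count_accuracy(records):
    """Fraction of images whose annotated object count matches the prompt."""
    hits = sum(1 for prompt, count in records if prompt_target(prompt) == count)
    return hits / len(records)

# (prompt, annotated object count in the generated image)
records = [
    ("A photo of three cats", 3),  # correct
    ("A photo of five books", 4),  # off by one
]
print(exact_count_accuracy(records))  # 0.5
```

Exact-match accuracy is the simplest metric; a finer-grained eval could also report absolute count error or break results down by number magnitude and prompt format, as the task design above suggests.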
Reposted by Aida Nematzadeh
jennhu.bsky.social
Stop by our #NeurIPS tutorial on Experimental Design & Analysis for AI Researchers! 📊

neurips.cc/virtual/2024/tutorial/99528

Are you an AI researcher interested in comparing models/methods? Then your conclusions rely on well-designed experiments. We'll cover best practices + case studies. 👇
NeurIPS 2024 Tutorial: Experimental Design and Analysis for AI Researchers
neurips.cc
Reposted by Aida Nematzadeh
sharky6000.bsky.social
If you will be at #NeurIPS2024 @neuripsconf.bsky.social and would like to come see our models in action, come say hi 👋 and check out our demo at the GDM booth!

Wednesday, Dec. 11th @ 9:30-10:00.

Lots of other great things to see as well! Check it out: 👇
deepmind.google/discover/blo...
aidanematzadeh.bsky.social
I am hiring for RS/RE positions! If you are interested in language-flavored multimodal learning, evaluation, or post-training apply here 🦎 boards.greenhouse.io/deepmind/job...

I will also be at #NeurIPS2024, so come say hi! (Please email me to find time to chat)
Research Scientist, Language
London, UK
boards.greenhouse.io