Gillian Hadfield
@ghadfield.bsky.social
1.2K followers 1.1K following 42 posts
Economist and legal scholar turned AI researcher focused on AI alignment and governance. Prof of government and policy and computer science at Johns Hopkins where I run the Normativity Lab. Recruiting CS postdocs and PhD students. gillianhadfield.org
ghadfield.bsky.social
Future work should focus on developing smarter debate protocols that weight expertise, discourage blind agreement, and reward critical verification of reasoning. We need to move beyond the naive assumption that 'more talk = better outcomes'. (10/10) arxiv.org/abs/2509.05396
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
While multi-agent debate has been proposed as a promising strategy for improving AI reasoning ability, we find that debate can sometimes be harmful rather than helpful. The prior work has exclusively…
arxiv.org
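The "weight expertise" idea above can be sketched as a weighted vote over agent answers. This is a hypothetical illustration of the kind of protocol the thread suggests, not the paper's method; the function name and the weight values are invented for the example.

```python
from collections import defaultdict

def weighted_vote(answers, weights):
    """Aggregate agent answers using per-agent expertise weights
    instead of a naive one-agent-one-vote majority."""
    totals = defaultdict(float)
    for ans, w in zip(answers, weights):
        totals[ans] += w
    return max(totals, key=totals.get)

# One high-expertise agent (weight 0.7) outvotes two low-expertise
# agents (0.2 each), unlike a plain majority vote.
print(weighted_vote(["right", "wrong", "wrong"], [0.7, 0.2, 0.2]))  # → right
```

How expertise weights would be estimated (e.g., from calibration on held-out questions) is exactly the open design question the post points to.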
ghadfield.bsky.social
We suspect RLHF training creates sycophantic behavior: models trained to be agreeable may prioritize consensus over critical evaluation. This suggests current alignment techniques might undermine collaborative reasoning.
ghadfield.bsky.social
Stronger agents were more likely to change from correct to incorrect answers in response to weaker agents' reasoning than vice versa. Models showed a tendency toward favoring agreement over critical evaluation, creating an echo chamber instead of an actual debate.
ghadfield.bsky.social
However, we still observed performance gains on math problems under most conditions, suggesting debate effectiveness depends heavily on the type of reasoning required.
ghadfield.bsky.social
The impact varies significantly by task type. On CommonSenseQA—a dataset we newly examined—debate reduced performance across ALL experimental conditions.
ghadfield.bsky.social
Even when stronger models outweighed weaker ones, group accuracy decreased over successive debate rounds. Introducing weaker models into debates produced worse results than when agents hadn't engaged in discussion at all.
ghadfield.bsky.social
We tested debate effectiveness across three tasks (CommonSenseQA, MMLU, GSM8K) using three different models (GPT-4o-mini, LLaMA-3.1-8B, Mistral-7B) in various configurations.
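The debate setup described in the thread can be sketched as a simple loop: each agent answers, then revises after seeing its peers' latest answers, and a majority vote yields the group answer. This is an illustrative stub, not the paper's protocol; real runs would call LLM APIs, and the agents below are toy functions invented to show the echo-chamber failure mode the thread describes.

```python
def debate(agents, question, rounds=3):
    """Run a multi-agent debate: each round, every agent sees its
    peers' latest answers and may revise its own."""
    answers = [agent(question, peer_answers=[]) for agent in agents]
    for _ in range(rounds):
        answers = [
            agent(question, peer_answers=answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Final group answer by majority vote.
    return max(set(answers), key=answers.count)

# Toy agents: a "strong" agent that starts correct but defers to a
# peer majority (sycophancy), and "weak" agents that stay wrong.
def strong_agent(question, peer_answers):
    if peer_answers and peer_answers.count("wrong") > len(peer_answers) / 2:
        return "wrong"  # capitulates to the (incorrect) majority
    return "right"

def weak_agent(question, peer_answers):
    return "wrong"

print(debate([strong_agent, weak_agent, weak_agent], "q"))  # → wrong
```

With two weak agents, the strong agent flips to the wrong answer in round one and the group converges on it, mirroring the correct-to-incorrect flips reported in the thread.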
ghadfield.bsky.social
We found that multi-agent debate among large language models can sometimes harm performance rather than improve it, contradicting the assumption that more discussion can lead to better outcomes.
ghadfield.bsky.social
My lab members Harsh Satija and Andrea Wynn and I have a new preprint examining multi-agent debate among diverse AI models, based on our ICML MAS 2025 workshop paper.
ghadfield.bsky.social
Using debate among AI agents has been proposed as a promising strategy for improving AI reasoning capabilities. Our new research shows that this strategy can often have the opposite effect - and the implications for AI deployment are significant. (1/10) arxiv.org/abs/2509.05396
ghadfield.bsky.social
These roles will shape the conversation on AI and provide the opportunity for rich, interdisciplinary collaboration with colleagues and researchers in the Department of Computer Science and the School of Government and Policy.
Please spread the word in your network! 5/5
gillianhadfield.org/jobs/
Jobs
I have postdoc and staff openings for our lab at the Johns Hopkins University in either Baltimore, MD or Washington, DC. Postdoctoral Fellow: We are hiring an interdisciplinary scholar with a track re…
gillianhadfield.org
ghadfield.bsky.social
We're recruiting a postdoctoral fellow with a track record in computational modeling of AI systems and autonomous AI agent dynamics, and with ML systems experience, to investigate the foundations of human normativity and how to build AI systems aligned with human values. 4/5
ghadfield.bsky.social
We're hiring an AI Communications Associate to craft and execute a multi-channel strategy that turns leading computer science and public policy research into accessible content for a broad audience of stakeholders. 3/5
ghadfield.bsky.social
We're hiring an AI Policy Researcher to conduct in-depth research into the technical and policy challenges in AI alignment, safety, and governance, and to produce high-quality research reports, white papers, and policy recommendations. 2/5
ghadfield.bsky.social
My lab @johnshopkins is recruiting research and communications professionals, and AI postdocs to advance our work ensuring that AI is safe and aligned to human well-being worldwide. 1/5
ghadfield.bsky.social
destabilize or harm our communities, economies, or politics. Together with @djjrjr.bsky.social and @torontosri.bsky.social we held a design workshop last year with a stunning group of experts from AI labs, regulatory technology startups, enterprise clients, civil society, academia, and government. 2/3
ghadfield.bsky.social
Six years ago @jackclarksf.bsky.social and I proposed regulatory markets as a new model for AI governance that would attract more investment—money and brains—in a democratically legitimate way, fostering AI innovation while ensuring these powerful technologies don't 1/2
Reposted by Gillian Hadfield
aihub.org
AIhub.org @aihub.org · May 23
In this insightful interview, AIhub ambassador Kumar Kshitij Patel met @ghadfield.bsky.social, keynote speaker at @ijcai.org, to find out more about her interdisciplinary research, career trajectory, AI alignment, and her thoughts on AI systems in general.

aihub.org/2025/05/22/i...
Interview with Gillian Hadfield: Normative infrastructure for AI alignment - AIhub
aihub.org
Reposted by Gillian Hadfield
aihub.org
Our latest monthly digest features:
-Ananya Joshi on healthcare data monitoring
-AI alignment with @ghadfield.bsky.social
-Onur Boyar on drug and material design
-Object state classification with Filippos Gouidis
aihub.org/2025/05/30/a...
AIhub monthly digest: May 2025 – materials design, object state classification, and real-time monitoring for healthcare data - AIhub
aihub.org
ghadfield.bsky.social
Everyone, including those who think we're building powerful AI to improve lives for everyone, should take seriously how poorly our current economic indicators (unemployment, earnings, inflation) capture the well-being of low- and moderate-income folks. www.politico.com/news/magazin...
Voters Were Right About the Economy. The Data Was Wrong.
Here’s why unemployment is higher, wages are lower and growth less robust than government statistics suggest.
www.politico.com
Reposted by Gillian Hadfield
sean-o-h.bsky.social
I was at this meeting Mon, and the quality & seriousness of discussion made it a highlight. But Fu Ying is right that forging the cooperation needed, even limited to the extreme risks that threaten everyone, is becoming ever harder. We must keep trying.
www.scmp.com/news/china/d...
China, US should fight rogue AI risks together, despite tensions: ex-diplomat
Open-source AI models like DeepSeek allow collaborators to find security vulnerabilities more easily, Fu Ying tells Paris’ AI Action Summit.
www.scmp.com
ghadfield.bsky.social
I think that would only require “read” access
ghadfield.bsky.social
Do we think Musk is using treasury payments data to train, fine tune or do inference on AI models? @caseynewton.bsky.social
ghadfield.bsky.social
Video from our tutorial @NeurIPSConf 2024 is up! @dhadfieldmenell @jzl86 @rstriv and I explore how frameworks from economics, institutional and political theory, and biological and cultural evolution can advance approaches to AI alignment neurips.cc/virtual/2024...
NeurIPS Tutorial Cross-disciplinary insights into alignment in humans and machinesNeurIPS 2024
neurips.cc