Gillian Hadfield
@ghadfield.bsky.social
1.2K followers 1.1K following 42 posts
Economist and legal scholar turned AI researcher focused on AI alignment and governance. Prof of government and policy and computer science at Johns Hopkins where I run the Normativity Lab. Recruiting CS postdocs and PhD students. gillianhadfield.org
ghadfield.bsky.social
Future work should focus on developing smarter debate protocols that weight expertise, discourage blind agreement, and reward critical verification of reasoning. We need to move beyond the naive assumption that 'more talk = better outcomes'. (10/10) arxiv.org/abs/2509.05396
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
While multi-agent debate has been proposed as a promising strategy for improving AI reasoning ability, we find that debate can sometimes be harmful rather than helpful. The prior work has exclusively…
arxiv.org
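The "weight expertise" idea above can be sketched as a weighted vote over agent answers. This is a hypothetical illustration of the kind of protocol the thread suggests, not the paper's method; the function name and the weight values are invented for the example.

```python
from collections import defaultdict

def weighted_vote(answers, weights):
    """Aggregate agent answers using per-agent expertise weights
    instead of a naive one-agent-one-vote majority."""
    totals = defaultdict(float)
    for ans, w in zip(answers, weights):
        totals[ans] += w
    return max(totals, key=totals.get)

# One high-expertise agent (weight 0.7) outvotes two low-expertise
# agents (0.2 each), unlike a plain majority vote.
print(weighted_vote(["right", "wrong", "wrong"], [0.7, 0.2, 0.2]))  # → right
```

How expertise weights would be estimated (e.g., from calibration on held-out questions) is exactly the open design question the post points to.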
ghadfield.bsky.social
We suspect RLHF training creates sycophantic behavior: models trained to be agreeable may prioritize consensus over critical evaluation. This suggests current alignment techniques might undermine collaborative reasoning.
ghadfield.bsky.social
Stronger agents were more likely to change from correct to incorrect answers in response to weaker agents' reasoning than vice versa. Models showed a tendency toward favoring agreement over critical evaluation, creating an echo chamber instead of an actual debate.
ghadfield.bsky.social
However, we still observed performance gains on math problems under most conditions, suggesting debate effectiveness depends heavily on the type of reasoning required.
ghadfield.bsky.social
The impact varies significantly by task type. On CommonSenseQA—a dataset we newly examined—debate reduced performance across ALL experimental conditions.
ghadfield.bsky.social
Even when stronger models outweighed weaker ones, group accuracy decreased over successive debate rounds. Introducing weaker models into debates produced worse results than when agents hadn't engaged in discussion at all.
ghadfield.bsky.social
We tested debate effectiveness across three tasks (CommonSenseQA, MMLU, GSM8K) using three different models (GPT-4o-mini, LLaMA-3.1-8B, Mistral-7B) in various configurations.
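The debate setup described in the thread can be sketched as a simple loop: each agent answers, then revises after seeing its peers' latest answers, and a majority vote yields the group answer. This is an illustrative stub, not the paper's protocol; real runs would call LLM APIs, and the agents below are toy functions invented to show the echo-chamber failure mode the thread describes.

```python
def debate(agents, question, rounds=3):
    """Run a multi-agent debate: each round, every agent sees its
    peers' latest answers and may revise its own."""
    answers = [agent(question, peer_answers=[]) for agent in agents]
    for _ in range(rounds):
        answers = [
            agent(question, peer_answers=answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Final group answer by majority vote.
    return max(set(answers), key=answers.count)

# Toy agents: a "strong" agent that starts correct but defers to a
# peer majority (sycophancy), and "weak" agents that stay wrong.
def strong_agent(question, peer_answers):
    if peer_answers and peer_answers.count("wrong") > len(peer_answers) / 2:
        return "wrong"  # capitulates to the (incorrect) majority
    return "right"

def weak_agent(question, peer_answers):
    return "wrong"

print(debate([strong_agent, weak_agent, weak_agent], "q"))  # → wrong
```

With two weak agents, the strong agent flips to the wrong answer in round one and the group converges on it, mirroring the correct-to-incorrect flips reported in the thread.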
ghadfield.bsky.social
We found that multi-agent debate among large language models can sometimes harm performance rather than improve it, contradicting the assumption that more discussion can lead to better outcomes.
ghadfield.bsky.social
My lab members Harsh Satija and Andrea Wynn and I have a new preprint examining multi-agent debate among diverse AI models, based on our ICML MAS 2025 workshop paper.
ghadfield.bsky.social
Using debate among AI agents has been proposed as a promising strategy for improving AI reasoning capabilities. Our new research shows that this strategy can often have the opposite effect - and the implications for AI deployment are significant. (1/10) arxiv.org/abs/2509.05396
ghadfield.bsky.social
These roles will shape the conversation on AI and provide the opportunity for rich, interdisciplinary collaboration with colleagues and researchers in the Department of Computer Science and the School of Government and Policy.
Please spread the word in your network! 5/5
gillianhadfield.org/jobs/
Jobs
I have postdoc and staff openings for our lab at the Johns Hopkins University in either Baltimore, MD or Washington, DC. Postdoctoral Fellow: We are hiring an interdisciplinary scholar with a track re…
gillianhadfield.org
ghadfield.bsky.social
We're recruiting a postdoctoral fellow with a track record in computational modeling of AI systems and autonomous AI agent dynamics, and with ML systems experience, to investigate the foundations of human normativity and how to build AI systems aligned with human values. 4/5
ghadfield.bsky.social
We're hiring an AI Communications Associate to craft and execute a multi-channel strategy that turns leading computer science and public policy research into accessible content for a broad audience of stakeholders. 3/5
ghadfield.bsky.social
We're hiring an AI Policy Researcher to conduct in-depth research into the technical and policy challenges in AI alignment, safety, and governance, and to produce high-quality research reports, white papers, and policy recommendations. 2/5
ghadfield.bsky.social
My lab @johnshopkins is recruiting research and communications professionals, and AI postdocs to advance our work ensuring that AI is safe and aligned to human well-being worldwide. 1/5
ghadfield.bsky.social
destabilize or harm our communities, economies, or politics. Together with @djjrjr.bsky.social and @torontosri.bsky.social we held a design workshop last year with a stunning group of experts from AI labs, regulatory technology startups, enterprise clients, civil society, academia, and government. 2/3
ghadfield.bsky.social
Six years ago @jackclarksf.bsky.social and I proposed regulatory markets as a new model for AI governance that would attract more investment—money and brains—in a democratically legitimate way, fostering AI innovation while ensuring these powerful technologies don't 1/2
Reposted by Gillian Hadfield
aihub.org
AIhub.org @aihub.org · May 23
In this insightful interview, AIhub ambassador Kumar Kshitij Patel met @ghadfield.bsky.social, keynote speaker at @ijcai.org, to find out more about her interdisciplinary research, career trajectory, AI alignment, and her thoughts on AI systems in general.

aihub.org/2025/05/22/i...
Interview with Gillian Hadfield: Normative infrastructure for AI alignment - AIhub
aihub.org
Reposted by Gillian Hadfield
aihub.org
Our latest monthly digest features:
-Ananya Joshi on healthcare data monitoring
-AI alignment with @ghadfield.bsky.social
-Onur Boyar on drug and material design
-Object state classification with Filippos Gouidis
aihub.org/2025/05/30/a...
AIhub monthly digest: May 2025 – materials design, object state classification, and real-time monitoring for healthcare data - AIhub
aihub.org
ghadfield.bsky.social
Everyone, including those who think we're building powerful AI to improve lives for everyone, should take seriously how poorly our current economic indicators (unemployment, earnings, inflation) capture the well-being of low- and moderate-income folks. www.politico.com/news/magazin...
Voters Were Right About the Economy. The Data Was Wrong.
Here’s why unemployment is higher, wages are lower and growth less robust than government statistics suggest.
www.politico.com
Reposted by Gillian Hadfield
sean-o-h.bsky.social
I was at this meeting Mon, and the quality & seriousness of discussion made it a highlight. But Fu Ying is right that forging the cooperation needed, even limited to the extreme risks that threaten everyone, is becoming ever harder. We must keep trying.
www.scmp.com/news/china/d...
China, US should fight rogue AI risks together, despite tensions: ex-diplomat
Open-source AI models like DeepSeek allow collaborators to find security vulnerabilities more easily, Fu Ying tells Paris’ AI Action Summit.
www.scmp.com
ghadfield.bsky.social
I think that would only require “read” access
ghadfield.bsky.social
Do we think Musk is using treasury payments data to train, fine tune or do inference on AI models? @caseynewton.bsky.social
ghadfield.bsky.social
Video from our tutorial @NeurIPSConf 2024 is up! @dhadfieldmenell @jzl86 @rstriv and I explore how frameworks from economics, institutional and political theory, and biological and cultural evolution can advance approaches to AI alignment neurips.cc/virtual/2024...
NeurIPS Tutorial Cross-disciplinary insights into alignment in humans and machinesNeurIPS 2024
neurips.cc