Lightnews — Scholar-powered news

Fazl Barez

@fbarez.bsky.social

🚨 New AI Safety Course @aims_oxford!

I’m thrilled to launch a new course called AI Safety & Alignment (AISAA), on the foundations & frontier research of making advanced AI systems safe and aligned, at @UniofOxford

what to expect 👇
robots.ox.ac.uk/~fazl/aisaa/

October 6, 2025 at 4:40 PM

Reposted by Fazl Barez

Toby Ord

@tobyord.bsky.social

Evaluating the Infinite
🧵
My latest paper tries to solve a longstanding problem afflicting fields such as decision theory, economics, and ethics — the problem of infinities.
Let me explain a bit about what causes the problem and how my solution avoids it.
1/N
arxiv.org/abs/2509.19389

Evaluating the Infinite

I present a novel mathematical technique for dealing with the infinities arising from divergent sums and integrals. It assigns them fine-grained infinite values from the set of hyperreal numbers in a ...

arxiv.org

September 25, 2025 at 3:28 PM

Fazl Barez

@fbarez.bsky.social

🚀 Excited to have 2 papers accepted at #NeurIPS2025! 🎉 Congrats to my amazing co-authors!

More details (and more bragging) soon! And maybe even more news on Sep 25 👀

See you all in… Mexico? San Diego? Copenhagen? Who knows! 🌍✈️

September 19, 2025 at 9:08 AM

Reposted by Fazl Barez

Jakob Mökander

@jakobmokander.bsky.social

🚨 NEW PAPER 🚨: Embodied AI (incl. AI-powered drones, self-driving cars and robots) is here, but policies are lagging. We analyzed the EAI risks and found significant gaps in governance

arxiv.org/pdf/2509.00117

Co-authors: Jared Perlo, @fbarez.bsky.social, Alex Robey & @floridi.bsky.social

1/4

September 4, 2025 at 5:51 PM

Reposted by Fazl Barez

Martin Tutek

@mtutek.bsky.social

Other works have highlighted that CoTs ≠ explainability alphaxiv.org/abs/2025.02 (@fbarez.bsky.social), and that intermediate (CoT) tokens ≠ reasoning traces arxiv.org/abs/2504.09762 (@rao2z.bsky.social).

Here, FUR offers a fine-grained test of whether LMs latently used information from CoTs for their answers!

Chain-of-Thought Is Not Explainability | alphaXiv

alphaxiv.org

August 21, 2025 at 3:21 PM

Reposted by Fazl Barez

Jeroen ‘Jeremy’ Fransen

@jeroenjeremy.bsky.social

It is so easy to confuse chain of thought with explainability. In fact, much of the media presents it as if current LLMs let us view their actual thought processes. It is not that!

Fazl Barez @fbarez.bsky.social · Jul 1

Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their steps (CoT) aren't necessarily revealing their true reasoning. Spoiler: the transparency can be an illusion. (1/9) 🧵

July 2, 2025 at 12:41 PM

Fazl Barez

@fbarez.bsky.social

Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their steps (CoT) aren't necessarily revealing their true reasoning. Spoiler: the transparency can be an illusion. (1/9) 🧵

July 1, 2025 at 3:41 PM

Fazl Barez

@fbarez.bsky.social

Technology = power. AI is reshaping power — fast.

Today’s AI doesn’t just assist decisions; it makes them. Governments use it for surveillance, prediction, and control — often with no oversight.

Technical safeguards aren’t enough on their own — but they’re essential for AI to serve society.

June 27, 2025 at 8:07 AM

Reposted by Fazl Barez

David Duvenaud

@davidduvenaud.bsky.social

And Anna Yelizarov, @fbarez.bsky.social, @scasper.bsky.social, Beatrice Erkers, among others.

We'll draw from political theory, cooperative AI, economics, mechanism design, history, and hierarchical agency.

June 18, 2025 at 6:12 PM

Reposted by Fazl Barez

Yoav Gur Arieh

@yoav.ml

This is a step toward targeted, interpretable, and robust knowledge removal — at the parameter level.

Joint work with Clara Suslik, Yihuai Hong, and @fbarez.bsky.social, advised by @megamor2.bsky.social
🔗 Paper: arxiv.org/abs/2505.22586
🔗 Code: github.com/yoavgur/PISCES

May 29, 2025 at 4:22 PM

Fazl Barez

@fbarez.bsky.social

Come work with me at Oxford this summer! Paid research opportunity on:

- White-box LLMs & model security
- Safe RL & reward hacking
- Interpretability & governance tools

Remote or Oxford.

Apply by 30 May 23:59 UTC. DM with questions.

May 20, 2025 at 5:13 PM

Fazl Barez

@fbarez.bsky.social

Come work with me at Oxford!

We’re hiring a Postdoc in Causal Systems Modelling to:

- Build causal & white-box models that make frontier AI safer and more transparent
- Turn technical insights into safety cases, policy briefs, and governance tools

DM if you have any questions.

May 15, 2025 at 11:12 AM

Fazl Barez

@fbarez.bsky.social

First-time Area Chair seeking advice! What helped you most when evaluating papers beyond just averaging scores?

After suffering through unhelpful reviews as an author, I want to do right by papers in my track.

April 8, 2025 at 11:59 AM

Reposted by Fazl Barez

Mor Geva

@megamor2.bsky.social

🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!

March 31, 2025 at 4:59 PM

Reposted by Fazl Barez

Technical AI Governance @ ICML 2025

@taig-icml.bsky.social

Organizers: Ben Bucknall, @lisasoder.bsky.social, @ankareuel.bsky.social, @fbarez.bsky.social, @carlosmougan.bsky.social, Weiwei Pan, Siddharth Swaroop, Robert Trager, @maosbot.bsky.social

April 1, 2025 at 2:58 PM

Fazl Barez

@fbarez.bsky.social

Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver!

Credit to Ben and Lisa for all the work!

We have a new centre at Oxford working on technical AI governance with Robert Trager, @maosbot.bsky.social, and many other great minds. We are hiring - please reach out!

Technical AI Governance @ ICML 2025 @taig-icml.bsky.social · Apr 1

📣We’re thrilled to announce the first workshop on Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver! Join us (& this stellar list of speakers) in bringing together technical & policy experts to shape the future of AI governance! www.taig-icml.com

April 1, 2025 at 3:10 PM

Reposted by Fazl Barez

Naomi Saphra

@nsaphra.bsky.social

Life update: I'm starting as faculty at Boston University @bucds.bsky.social in 2026! BU has SCHEMES for LM interpretability & analysis, and I couldn't be more pumped to join a burgeoning supergroup w/ @najoung.bsky.social @amuuueller.bsky.social. Looking for my first students, so apply and reach out!

CDS building which looks like a jenga tower

March 27, 2025 at 2:24 AM

Reposted by Fazl Barez

Itay Itzhak @ COLM 🍁

@itay-itzhak.bsky.social

New paper alert!

Curious how small prompt tweaks impact LLM accuracy but don’t want to run endless inferences? We got you. Meet DOVE - a dataset built to uncover these sensitivities.

Use DOVE for your analysis or contribute samples - we're growing and welcome you aboard!

Eliya Habba @eliyahabba.bsky.social · Mar 17

Care about LLM evaluation? 🤖 🤔

We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs on different prompts, domains, tokens, models...

Join our community effort to expand it with YOUR model predictions & become a co-author!

March 17, 2025 at 4:33 PM

Reposted by Fazl Barez

wdmacaskill.bsky.social

@wdmacaskill.bsky.social

What happens once AI can design better AI, which can itself design better AI? Will we get an "intelligence explosion" where AI capabilities increase very rapidly? Tom Davidson, Rose Hadshar and I have a new paper out with analysis of these dynamics.

March 17, 2025 at 2:54 PM

Reposted by Fazl Barez

Jakob Foerster

@jfoerst.bsky.social

My group @FLAIR_Ox is recruiting a postdoc and looking for someone who can get started by the end of April. Deadline to apply is in one week (!), 19th of March at noon, so please help spread the word: my.corehr.com/pls/uoxrecru...

Job Details

my.corehr.com

March 12, 2025 at 3:17 PM

Reposted by Fazl Barez

Tal Haklay

@talhaklay.bsky.social

1/13 LLM circuits tell us where the computation happens inside the model—but the computation varies by token position, a key detail often ignored!
We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇

March 6, 2025 at 10:15 PM

Fazl Barez

@fbarez.bsky.social

🔍 Excited to share our paper: "Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness"!

March 4, 2025 at 5:24 PM

Fazl Barez

@fbarez.bsky.social

New paper alert! 🚨

Important question: Do SAEs generalise?
We explore answerability detection in LLMs by comparing SAE features vs. linear residual stream probes.

Answer: probes outperform SAE features in-domain, while out-of-domain generalization varies sharply across features and datasets. 🧵

March 1, 2025 at 6:14 PM

Reposted by Fazl Barez

Adi Simhi

@adisimhi.bsky.social

🚨New arXiv preprint!🚨
LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯
We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov

February 19, 2025 at 3:50 PM

Reposted by Fazl Barez

Oxford Martin AI Governance Initiative

@aigioxfordmartin.bsky.social

We are excited to welcome Fazl Barez @fbarez.bsky.social, who joins us as a senior postdoctoral research fellow. He will be leading research initiatives in AI safety and interpretability.
@oxmartinschool.bsky.social

Find out more: www.oxfordmartin.ox.ac.uk/people/fazl-...

February 18, 2025 at 3:37 PM
