Ryan Moulton
@moultano.bsky.social
6.3K followers
1.1K following
8.3K posts
Algorithmist
https://moultano.wordpress.com/
Reposted by Ryan Moulton
⚡️🌙
@dystopiabreaker.xyz
· 1d
Stress Testing Deliberative Alignment for Anti-Scheming Training — Apollo Research
Future AIs might secretly pursue unintended goals — “scheme”. In a collaboration with OpenAI, we tested a training method to reduce existing versions of such behavior. We see major improvements, but ...
www.apolloresearch.ai