Reposted by Seven Years in Quebec
⚡️🌙
@dystopiabreaker.xyz
· 13h
Stress Testing Deliberative Alignment for Anti-Scheming Training — Apollo Research
Future AIs might secretly pursue unintended goals — “scheme”. In a collaboration with OpenAI, we tested a training method to reduce existing versions of such behavior. We see major improvements, but ...
www.apolloresearch.ai