⚡️🌙
@dystopiabreaker.xyz
4.6K followers
290 following
1.4K posts
recovering cryptographer building ML models, doing systems work, security, etc.
⚡️🌙
@dystopiabreaker.xyz
· 6h
Stress Testing Deliberative Alignment for Anti-Scheming Training — Apollo Research
Future AIs might secretly pursue unintended goals — “scheme”. In a collaboration with OpenAI, we tested a training method to reduce existing versions of such behavior. We see major improvements, but ...
www.apolloresearch.ai