docandalf.bsky.social
@docandalf.bsky.social
Reposted
Early signs of deception, cheating & self-preservation in top-performing models in terms of reasoning are extremely worrisome. We don't know how to guarantee AI won't have undesired behavior to reach goals & this must be addressed before deploying powerful autonomous agents.
time.com/7259395/ai-c...
When AI Thinks It Will Lose, It Sometimes Cheats
When sensing defeat in a match against a skilled chess bot, advanced models sometimes hack their opponent, a study found.
time.com
February 20, 2025 at 4:45 PM