Working towards the safe development of AI for the benefit of all at Université de Montréal, LawZero and Mila.
A.M. Turing Award Recipient and most-cited AI researcher.
https://lawzero.org/en
https://yoshuabengio.org/profile/
Yoshua Bengio is a Canadian computer scientist and a pioneer of artificial neural networks and deep learning. He is a professor at the Université de Montréal and scientific director of the AI institute Mila.
Reposted by Yoshua Bengio
As the founder of Korbit, he brings deep expertise in GenAI, software security, and research to our mission.
Full press release: lawzero.org/en/news/lawz...
(2/6)
Read the full Update here: internationalaisafetyreport.org/publication/...
(3/6)
· Improvements in adversarial training methods to make models more resistant to potentially harmful requests,
· Better tools for tracking AI-generated content.
(4/6)
· Technical safeguards are beginning to inform transparency measures in governance frameworks.
(6/6)
· When given 10 attempts, attackers can use malicious prompts to bypass leading systems' safeguards about half the time.
· Inserting as few as 250 malicious documents into a model's training data can introduce vulnerabilities.
From a technical perspective, safeguarding open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.
🧵🧵🧵
To continue benefiting from the advantages of open-weight models, we must develop risk mitigation methodologies specifically for them, as discussed in this paper.
Innovation & regulation must go hand in hand, notably by building technical safeguards to make AI systems more trustworthy for individuals and businesses. That's at the heart of our work at @law-zero.bsky.social.
We touched on AI's early signs of self-preservation and deceptive behaviours, as well as the technical and policy solutions on the horizon.
www.ft.com/content/181f...