hal
@harold.bsky.social
180 followers 160 following 48 posts
part-time poster | researching privacy in/and/of public data @ cornell tech and wikimedia | writing for joinreboot.org
Posts Media Videos Starter Packs
Reposted by hal
hellgatenyc.com
"I'm happy to report I'm just fine. I lost a button. But I'm gonna sleep in my bed tonight, safe, with my family... At that elevator, I was separated from someone named Edgardo... Edgardo is in ICE detention and he's not going to sleep in his bed tonight."
harold.bsky.social
@cameron.pfiffer.org planning to work on it soon!
harold.bsky.social
hi @alt.psingletary.com! you tagged the right person—I was working on this for a class project this semester

got it to a mvp stage about a week ago and hit pause to work on some other projects, but will keep working on it and would definitely would love to hear your feedback if you have any :)
harold.bsky.social
line go up📈📈📈

up to 717k requests to wikipedia per second!!

grafana.wikimedia.org/d/O_OXJyTVk/...
harold.bsky.social
and please remember to thank your local site reliability engineer!!!!
harold.bsky.social
continuing on the real-time public Wikipedia data train:

here's a graph of requests / second to WMF infra over the last 3h, since "Habemus papam"

The infrastructure has gone from 172k req / sec to 243k req / sec (⬆️41%) in under an hour!

follow along here: grafana.wikimedia.org/d/O_OXJyTVk/...
a graph of Wikimedia requests per second, with a huge spike right when the papal selection was announced
harold.bsky.social
harold.bsky.social
english wikipedia pageviews for the conclave movie starting from oct 20 2024 (five days before release in the US)

first big spike is the academy awards, second is pope francis’ death

pageviews.wmcloud.org?project=en.w...
a line graph of wikipedia pageviews, with big spikes around early march and late april
harold.bsky.social
english wikipedia pageviews for the conclave movie starting from oct 20 2024 (five days before release in the US)

first big spike is the academy awards, second is pope francis’ death

pageviews.wmcloud.org?project=en.w...
a line graph of wikipedia pageviews, with big spikes around early march and late april
harold.bsky.social
excited to share this new piece by @bkeremg.bsky.social and @m0na.net (edited by me) about conceptualizing AI alignment as a process of censorship

really fascinating line of critique — I strongly encourage you to read it and lmk what you think!

joinreboot.org/p/ai-alignme...
harold.bsky.social
and set your devices to update automatically!
evangreer.bsky.social
Other than using signal probably one of the easiest things most people can do to boost their digital security is get a password manager and start using real passwords instead of your dogs name or FaceID
harold.bsky.social
There's a quickly-developing line of work on how insecure these agent systems can be, particularly when they have access to write and execute code.

The attacks on them are simple + devastating, up to and including reverse shells, data exfiltration, and more!

arxiv.org/abs/2503.12188
Multi-Agent Systems Execute Arbitrary Malicious Code
Multi-agent systems coordinate LLM-based agents to perform tasks on users' behalf. In real-world applications, multi-agent systems will inevitably interact with untrusted inputs, such as malicious Web...
arxiv.org
harold.bsky.social
Anyhow, there’s a lot more in the paper. Please read it if you’re interested and let us know if you have any thoughts, questions, concerns, etc!

arxiv.org/abs/2503.12188

12/12
A screenshot of the title / abstract of the paper.
harold.bsky.social
Modern Web browsers isolate untrusted content using the same-origin policy. AI agents today do not distinguish safe from unsafe content, nor data from (potentially malicious) instructions.

developer.mozilla.org/en-US/docs/W...

en.wikipedia.org/wiki/Same-or...

11/12
Same-origin policy - Wikipedia
en.wikipedia.org
harold.bsky.social
The narrative around AI safety shouldn’t be “Terminator” or “AI Chernobyl.” The right analogy is Netscape Navigator 1.0—the era when Web browsers first became a thing, and it was unclear how to protect users from potentially harmful Web content.

10/12
A screenshot of the Netscape browser circa the 90s.
harold.bsky.social
Much of the AI safety world is obsessing about “AGI.” They research containment, alignment, and jailbreaking, and view users as potential adversaries.

But users aren’t the enemy. They are victims whose data and devices are put at risk by companies pushing insecure systems.

9/12
harold.bsky.social
At the root, these are “confused deputy” vulnerabilities: agents blindly trust other agents, enabling adversaries to launder their instructions by making them appear as trusted outputs of trusted agents.

en.wikipedia.org/wiki/Confuse...

8/12
Confused deputy problem - Wikipedia
en.wikipedia.org
harold.bsky.social
In our experiments, we saw cases where a MAS …
… executes code that they recognize as harmful
… automatically pivots to harmful tasks that are simply in the same directory as benign tasks
… is vulnerable to screenshots and even audio files where we read out the attack (see example below⬇️⬇️⬇️)

7/12
harold.bsky.social
These attacks are effective …
… across multiple agent frameworks (we tested AutoGen, MetaGPT, Crew AI), orchestrators, and LLMs
… even when direct and indirect prompt injection attacks don’t work
… even when individual agents are “aligned” and refuse to take harmful actions

6/12
A table of attack success rates from the paper, showing that our attacks achieve a 45-64% average success rate across models, compared with 0-1% for other indirect prompt injection attacks.
harold.bsky.social
This attack is simple and deadly (and multi-modal, too!): an attacker puts up a static webpage and lures a MAS to it. Without any user involvement, the page gets the MAS to run arbitrary malicious code on the user’s device or container, giving the attacker full control.

5/12
A figure from the paper depicting how a website, video, image, or other multi-modal content can also cause a MAS hijacking attack.
harold.bsky.social
MASes rely on control flow processes: agents exchange metadata (status reports, error messages, etc.) to jointly plan and fulfill tasks on users’ behalf. Our paper demonstrates how adversarial content can hijack these processes to stage devastating attacks.

4/12
A figure containing two diagrams of logic processing in multi-agent systems. The first is a benign example, and the second is a logical demonstration of MAS hijacking.
harold.bsky.social
But not all internet content is trustworthy and safe. Adversaries can put up webpages and social media posts, send emails with attachments, etc. – all of which will be processed by a MAS. These systems will inevitably encounter malicious, adversarial content.

arxiv.org/abs/2503.12188

3/12
Multi-Agent Systems Execute Arbitrary Malicious Code
Multi-agent systems coordinate LLM-based agents to perform tasks on users' behalf. In real-world applications, multi-agent systems will inevitably interact with untrusted inputs, such as malicious Web...
arxiv.org
harold.bsky.social
LLM agents are all the rage. Multi-agent systems (MAS) are promising a future where people interact with the internet via commands to semi-autonomous agents. Frameworks like AutoGen, Crew AI, and MetaGPT already enable developers to do this.

arxiv.org/abs/2503.12188

2/12
Multi-Agent Systems Execute Arbitrary Malicious Code
Multi-agent systems coordinate LLM-based agents to perform tasks on users' behalf. In real-world applications, multi-agent systems will inevitably interact with untrusted inputs, such as malicious Web...
arxiv.org
harold.bsky.social
Excited to announce a new preprint from my lab (with @rishi-jha.bsky.social and Vitaly Shmatikov; my first as a first author!) about severe security vulnerabilities in LLM-based multi-agent systems:

“Multi-Agent Systems Execute Arbitrary Malicious Code”

arxiv.org/abs/2503.12188

1/12
A screenshot of the abstract of the paper, detailing our findings that several multi-agent frameworks can be hijacked to enable a complete security breach.