Joan Velja
joanvelja.bsky.social
Container of multitudes | MS AI @UvA Amsterdam | Prev @LondonSafeAI

joanvelja.vercel.app/about
Reposted by Joan Velja
I will be at @neuripsconf.bsky.social this week!

Would love to chat about Multi-agent systems, RL, Human-AI Alignment, or anything interesting :)

I'm also applying for PhD programs this cycle, feel free to reach out for any advice!

More about me: karim-abdel.github.io
December 8, 2024 at 11:59 PM
I’ll be at #NeurIPS for the first time :)

I’ll be presenting arxiv.org/abs/2410.03768 at SoLaR!

I’d be happy to chat all things Alignment and Safety, test-time compute and RL.

I’m also looking for PhD programs starting Fall 2025 👀, lmk if you would like to talk it over a coffee!
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
The rapid proliferation of frontier model agents promises significant societal advances but also raises concerns about systemic risks arising from unsafe interactions. Collusion to the disadvantage of...
arxiv.org
December 9, 2024 at 12:13 AM
Is this the most consequential finding of modern AI to date?
(Sentiment Neuron paper, openai.com/index/unsupe...)
November 19, 2024 at 4:15 PM
As a lefty, the only nitpick I have so far about Bluesky is the constant pop-up menu that comes from inadvertently stopping on someone’s pfp…

Research vibes are immaculate so far tho
November 19, 2024 at 2:10 PM