Lightnews — Scholar-powered news

Javier Rando

@javirandor.com

Adversarial ML research is evolving, but not necessarily for the better. In our new paper, we argue that LLMs have made problems harder to solve, and even tougher to evaluate. Here’s why another decade of work might still leave us without meaningful progress. 👇

February 10, 2025 at 4:24 PM

Javier Rando

@javirandor.com

This Thursday, I will be presenting my work on poisoning RLHF and LLM pretraining @cohereforai.bsky.social

More info here cohere.com/events/coher...

Cohere For AI - Javier Rando, AI Safety PhD Student at ETH Zürich

Javier Rando, AI Safety PhD Student at ETH Zürich - Poisoned Training Data Can Compromise LLMs

cohere.com

January 20, 2025 at 3:39 PM

Reposted by Javier Rando

Daniel Paleka

@dpaleka.bsky.social

Recent LLM forecasters are getting better at predicting the future. But there's a challenge: How can we evaluate and compare AI forecasters without waiting years to see which predictions were right? (1/11)

January 11, 2025 at 1:53 AM

Javier Rando

@javirandor.com

Tomorrow @jakublucki.bsky.social will be presenting the BEST TECHNICAL PAPER at the SoLaR workshop at NeurIPS. Come check our poster and his oral presentation!

Jakub Łucki @jakublucki.bsky.social · Dec 6

Our paper on how unlearning fails to remove hazardous knowledge from LLM weights received 🏆 Best Paper 🏆 award at SoLaR @ NeurIPS!

Join my oral presentation on Saturday at 4:30 pm to learn more.

December 14, 2024 at 3:43 AM

Reposted by Javier Rando

Kristina Nikolić

@nkristina.bsky.social

I am at NeurIPS 🇨🇦, please reach out if you want to grab a coffee!

December 12, 2024 at 10:36 PM

Reposted by Javier Rando

Michael Aerni

@aemai.bsky.social

I am in beautiful Vancouver for #NeurIPS2024 with those amazing folks!
Say hi if you want to chat about ML privacy and security
(or speciality ☕)

Javier Rando @javirandor.com · Dec 10

SPY Lab is in Vancouver for NeurIPS! Come say hi if you see us around 🕵️

December 10, 2024 at 7:48 PM

Javier Rando

@javirandor.com

SPY Lab is in Vancouver for NeurIPS! Come say hi if you see us around 🕵️

December 10, 2024 at 7:43 PM

Javier Rando

@javirandor.com

A new competition on LLM-agents prompt injection is out! Send malicious emails and get agents to perform unauthorised actions.

The competition is hosted at SaTML 2025 and has a pool of $10k in prizes! What are you waiting for?

Santiago Zanella-Beguelin @xefffffff.bsky.social · Dec 9

📢Have experience jailbreaking LLMs?
Want to learn how an indirect / cross prompt injection attack works? Want to try something different to an advent of code?
Then, I have a challenge for you!

The LLMail-Inject competition (llmailinject.azurewebsites.net) starts at 11am UTC (that's in 5min!)

December 9, 2024 at 5:06 PM

Javier Rando

@javirandor.com

I will be at #NeurIPS2024 in Vancouver. I am excited to meet people working on AI Safety and Security. Drop a DM if you want to meet.

I will be presenting two (spotlight!) works. Come say hi to our posters.

December 9, 2024 at 5:02 PM

Reposted by Javier Rando

Jakub Łucki

@jakublucki.bsky.social

🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨

Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training.

Here's what we found👇

December 6, 2024 at 5:47 PM

Reposted by Javier Rando

floriantramer.bsky.social

@floriantramer.bsky.social

Come do open AI with us in Zurich!
We're hiring PhD students, postdocs (and faculty!)

Javier Rando @javirandor.com · Dec 4

Zurich is a great place to live and do research. It became a slightly better one overnight! Excited to see OAI opening an office here with such a great starting team 🎉

Alexander Kolesnikov @kolesnikov.ch · Dec 4

Ok, it is yesterdays news already, but good night sleep is important.

After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish OpenAI Zurich office. Proud of our past work and looking forward to the future.

December 4, 2024 at 1:49 PM

Javier Rando

@javirandor.com

I am curating a list of researchers working on AI Safety and Security here go.bsky.app/BcjeVbN.

Reply to this post with your user or other people you think should be included!

AI Safety and Security

Join the conversation

go.bsky.app

December 4, 2024 at 10:38 AM

Javier Rando

@javirandor.com

Zurich is a great place to live and do research. It became a slightly better one overnight! Excited to see OAI opening an office here with such a great starting team 🎉

Alexander Kolesnikov @kolesnikov.ch · Dec 4

Ok, it is yesterdays news already, but good night sleep is important.

After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish OpenAI Zurich office. Proud of our past work and looking forward to the future.

December 4, 2024 at 9:46 AM

Javier Rando

@javirandor.com

Great opportunity to do impactful work on AI alignment!

Dylan Hadfield-Menell @dhadfieldmenell.bsky.social · Dec 2

📢 Seeking PhD students for AI alignment research. Our lab investigates technical mechanisms for value learning, pre-training alignment, and regulatory frameworks. Come work with us if you want to bridge technical ML and legal/policy domains. Details in thread 🧵

December 2, 2024 at 4:07 PM

Javier Rando

@javirandor.com

Jailbreaks have become a new sort of ImageNet competition instead of helping us better understand LLM security. I wrote a blogpost about what I think valuable research could look like 🧵

📖 javirando.com/blog/2024/ja...

Do not write that jailbreak paper | Javier Rando | AI Safety and Security

Jailbreaks are becoming a new ImageNet competition instead of helping us better understand LLM security. Some takes on how LLM jailbreak and security research should look like.

javirando.com

November 26, 2024 at 12:18 PM

Javier Rando

@javirandor.com

Anyone may be able to compromise LLMs with malicious content posted online. With just a small amount of data, adversaries can backdoor chatbots to become unusable for RAG, or bias their outputs towards specific beliefs. Check our latest work! 👇🧵

November 25, 2024 at 12:27 PM

Reposted by Javier Rando

floriantramer.bsky.social

@floriantramer.bsky.social

Ensemble Everything Everywhere is a defense against adversarial examples that people got quite exited about a few months ago (in particular, the defense causes "perceptually aligned" gradients just like adversarial training)

Unfortunately, we show it's not robust...

arxiv.org/abs/2411.14834

Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust

Ensemble everything everywhere is a defense to adversarial examples that was recently proposed to make image classifiers robust. This defense works by ensembling a model's intermediate representations...

arxiv.org

November 25, 2024 at 8:38 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news