Kevin Schaul
@kevinschaul.bsky.social
4.2K followers 260 following 110 posts
hacker/journalist covering AI for the washington post, lives in chicago, wants to see your data visualizations and hear about your open source projects https://kschaul.com
kevinschaul.bsky.social
That's it for last week. What are you testing?
kevinschaul.bsky.social
Made a bunch of Sora 2 videos. Mindblowing, terrifying stuff. Here's Sam Altman brewing a fresh pot of AI slop. Lmk if you need an invite code, I have a few left.
kevinschaul.bsky.social
Asked claude code to evaluate whether I should migrate my llm evals site to Eleuther’s eval harness. It churned a bit and suggested “inspect ai” instead since I want to include tool usage evals too. Seems like the right call. ✔️
kevinschaul.bsky.social
Tried the same task with Gemini for Chrome. Total failure -- made up a bunch of dates. Seemed like it didn’t have access to click through different links? Not sure what went wrong. ✖️
Screenshot of Gemini
kevinschaul.bsky.social
Needed to know when something was removed from a website. I asked ChatGPT “agent mode” to visit the url on wayback machine and figure it out. Five minutes later, it gave me the answer. Saved me a ton of tedious clicking, and easily verifiable, too. ✔️
Screenshot of ChatGPT
kevinschaul.bsky.social
I keep a running log of how AI did on real tasks. This week's notes 🧵
Reposted by Kevin Schaul
blockclubchi.bsky.social
The Block Club Chicago newsroom continues to report on today’s federal immigration activity across Chicago.

If you have footage of agent activity you filmed, please reach out via this website:
blockclubchi.co/3IIYou4
Reposted by Kevin Schaul
blockclubchi.bsky.social
Federal agents handcuffed Humboldt Park Ald. Jessie Fuentes as she confronted them Friday at Humboldt Park Hospital.

Agents handcuffed her, forced her out of the hospital and threatened her with arrest, Fuentes told a Block Club reporter. blockclubchi.co/3Wkje5V
Reposted by Kevin Schaul
clancyny.bsky.social
Logan Square neighborhood, Chicago. Man wearing a balaclava in a white, unmarked vehicle, pulls the pin on a tear gas canister and tosses it in the road.

"Just trying to grab some lunch and these fucking losers showed up. FUCK ICE!"
source: www.reddit.com/r/LoganSquar...
Reposted by Kevin Schaul
reichlinmelnick.bsky.social
Surreal moment for America. Needless to say, if the normal police ever pulled something like this — pulling every single person out of an apartment building and handcuffing them to run warrant checks — they would be sued into oblivion.

Yet ICE is going to get away with it entirely.
"It was scary, because I had never had a gun in my face," Fisher said. "They asked my name and my date of birth and asked me, did I have any warrants? And I told them, 'No, I didn't.'"
Fisher said she was handcuffed before being released around 3 a.m., and she was told that if anyone had any kind of warrant out for them, even if it was unrelated to immigration, they would not be released.
kevinschaul.bsky.social
We're about to get flooded with deepfakes.

Here's an AI-generated clip of me "asking" Sam Altman what they train their systems on (made in 10 seconds with Sora 2)
kevinschaul.bsky.social
With Sora 2 (launched yesterday), you don't even need the trick. Just type "spongebob explains government shutdown" and it'll happily make it, no questions asked
Reposted by Kevin Schaul
drewharwell.com
OpenAI employees are very excited about how well their new AI tool can create fake videos of people doing crimes and have definitely thought through all the implications of this
kevinschaul.bsky.social
Just ran some evals on Claude Sonnet 4.5. It's better than 4 on some but worse on a lot. LLM progress is so weird. You really gotta test this stuff on what you care about.
kevinschaul.bsky.social
Worth a read: OpenAI released an eval for real work tasks across a bunch of industries. They didn't release the individual results (lame), but you can replicate them from the prompts and files.

A usable nugget: If you're outputting pdfs, xlsx or pptx, use Claude.

https://openai.com/index/gdpval/
Chart showing Claude with highest winrate for non-text file extensions
kevinschaul.bsky.social
My favorite nugget is when I tried to make SpongeBob but kept hitting the policy filter.

Thought for a bit, tried "robert the sponge" -- it worked.
AI-generated videos of a SpongeBob-ish character
kevinschaul.bsky.social
New by me: OpenAI won’t say whose content trained its video tool. We found some clues.

Gift link: wapo.st/3KeqLR0
kevinschaul.bsky.social
Google's blog post on launching Gemini in Chrome does not include the word "privacy" or "security." Am I missing something or are they not addressing the very real threat of prompt injections?

https://blog.google/products/chrome/new-ai-features-for-chrome/
Screenshot of agentic browsing assistant checking out with Instacart
kevinschaul.bsky.social
New lawsuit against Character AI is “third high-profile case to allege an AI chatbot contributed to a teen’s death by suicide” https://www.washingtonpost.com/technology/2025/09/16/character-ai-suicide-lawsuit-new-juliana/
Reposted by Kevin Schaul
nitasha.bsky.social
A new wrongful death lawsuit alleges that Character AI and Google are liable for the death by suicide of a 13-year-old girl in Colorado, Juliana Peralta

This is the third wrongful death claim against a popular AI app for a teen's death by suicide this year www.washingtonpost.com/technology/2...
A teen contemplating suicide turned to a chatbot. Is it liable for her death?
A lawsuit filed by the parents of 13-year-old Juliana Peralta against Character AI is the latest to allege a chatbot contributed to a teen’s death by suicide.
www.washingtonpost.com