Lightnews — Scholar-powered news

Sayash Kapoor

@sayash.bsky.social

This is the specific use case I have in mind (Operator shouldn't be the *only* thing developers use, but it can be a helpful addition to a suite of tools): x.com/random_walke...

x.com

February 3, 2025 at 6:12 PM

Sayash Kapoor

@sayash.bsky.social

It is also better for end users. As @randomwalker.bsky.social and I have argued, focusing on products (rather than just models) means companies must understand user demand and build tools people want. It leads to more applications that people can productively use: www.aisnakeoil.com/p/ai-compani...

AI companies are pivoting from creating gods to building products. Good.

Turning models into products runs into five challenges

www.aisnakeoil.com

February 3, 2025 at 6:10 PM

Sayash Kapoor

@sayash.bsky.social

Finally, the new product launches from OpenAI (Operator, Search, Computer use, Deep research) show that it doesn't just want to be in the business of creating more powerful AI — it also wants a piece of the product pie. This is a smart move as models become commoditized.

February 3, 2025 at 6:10 PM

Sayash Kapoor

@sayash.bsky.social

This also highlights the need for agent interoperability: who would want to teach a new agent 100s of tasks from scratch? If web agents become widespread, preventing agent lock-in will be crucial.

(I'm working on fleshing out this argument with @sethlazar.org + Noam Kolt)

February 3, 2025 at 6:10 PM

Sayash Kapoor

@sayash.bsky.social

Seen this way, Operator is a *tool* to easily create new web automation using natural language.

It could expand the web automation that businesses already use, making it easier to create new ones.

So it is quite surprising that Operator isn't available on ChatGPT Teams yet.

February 3, 2025 at 6:09 PM

Sayash Kapoor

@sayash.bsky.social

Instead of thinking of Operator as a "universal assistant" that completes all tasks, it is better to think of it as a task template tool that automates specific tasks (for now).

Once a human has overseen a task a few times, we can estimate Operator's ability to automate it.

OpenAI allows you to delegate daily tasks to Operator

February 3, 2025 at 6:09 PM

Sayash Kapoor

@sayash.bsky.social

OpenAI also allows you to "Save" tasks you completed using Operator. Once you complete a task and provide feedback to complete it successfully, you don't need to repeat it the next time.

I can imagine this becoming powerful (though it's not very detailed right now).

screenshot of the save task template for Operator

February 3, 2025 at 6:09 PM

Sayash Kapoor

@sayash.bsky.social

3) In many cases, the challenge isn't Operator's ability to complete a task, it is eliciting human preferences. Chatbots aren't a great form factor for that.

But there are many tasks where reliability isn't important. This is where today's agents shine. For example: x.com/random_walke...

x.com

February 3, 2025 at 6:08 PM

Sayash Kapoor

@sayash.bsky.social

Could more training data lead to automation without human oversight? Not quite:

1) Prompt injection remains a pitfall for web agents. Anyone who sends you an email can control your agent.
2) Low reliability means agents fail on edge cases.

February 3, 2025 at 6:08 PM

Sayash Kapoor

@sayash.bsky.social

But being able to see agent actions and give feedback with a human in the loop converts Operator from an unreliable agent, like the Humane Pin or Rabbit R1, to a workable but imperfect product.

Operator is as much a UX advance as it is a tech advance.

February 3, 2025 at 6:08 PM

Sayash Kapoor

@sayash.bsky.social

In the end, Operator struggled to file my expense reports even after an hour of trying and prompting. Then I took over, and my reports were filed 5 minutes later.

This is the bind for web agents today: not reliable enough to be automatable, not quick enough to save time.

February 3, 2025 at 6:08 PM

Sayash Kapoor

@sayash.bsky.social

OpenAI also trained Operator to ask the user for feedback before taking consequential actions, though I am not sure how robust this is — a simple instruction to avoid asking the user changed its behavior, and I can easily imagine this being exploited by prompt injection attacks.

February 3, 2025 at 6:07 PM

Sayash Kapoor

@sayash.bsky.social

But things went south quickly. It couldn't match the receipts to the amounts. Even after prompts directing it to missing receipts, it couldn't download them. It almost deleted previous receipts from other expenses!

February 3, 2025 at 6:07 PM

Sayash Kapoor

@sayash.bsky.social

It navigated to the correct URLs and asked me to log into my OpenAI and Concur accounts. Once in my accounts, it downloaded receipts from the correct URL, and even started uploading the receipts under the right headings!

screenshot of concur with the categories for the expense filled in

February 3, 2025 at 6:07 PM

Sayash Kapoor

@sayash.bsky.social

I asked Operator to file reports for my OpenAI and Anthropic API expenses for the last month. This is a task I do manually each month, so I knew exactly what it would need to do. To my surprise, Operator got the first few steps exactly right:

screenshot of a conversation with Operator

February 3, 2025 at 6:06 PM

Sayash Kapoor

@sayash.bsky.social

OpenAI's Operator is a web agent that can solve arbitrary tasks on the internet *with human supervision*. It runs on a virtual machine (*not* your computer). Users can see what the agent is doing in the browser in real time. It is available to ChatGPT Pro subscribers.

screenshot of Operator writing "Hello World" in an online notepad.

February 3, 2025 at 6:05 PM

Sayash Kapoor

@sayash.bsky.social

Grateful to @katygb.bsky.social for feedback on the draft. Read the full essay (w/@randomwalker.bsky.social): www.aisnakeoil.com/p/we-looked-...

We Looked at 78 Election Deepfakes. Political Misinformation is not an AI Problem.

Technology Isn’t the Problem—or the Solution.

www.aisnakeoil.com

December 16, 2024 at 3:11 PM
