We reproduced Tau-bench to see for ourselves. Here's what we learned:
medium.com/alan/benchma...
We reproduced Tau-bench to see for ourselves. Here's what we learned:
medium.com/alan/benchma...
Real agents need to handle database state, tool calling, and multi-turn conversations. Stateful benchmarks show the path forward.
New post on agent evaluation 👇
Real agents need to handle database state, tool calling, and multi-turn conversations. Stateful benchmarks show the path forward.
New post on agent evaluation 👇
medium.com/alan/our-iso...
medium.com/alan/our-iso...
Our Claim Agent investigates dynamically - just like human agents, but faster. Now automating 30% of tickets it receives.
Our Claim Agent investigates dynamically - just like human agents, but faster. Now automating 30% of tickets it receives.
Our new blog post reveals how Devbox transformed our dev experience, slashed onboarding time, and created consistent environments across our entire team.
Check it out: medium.com/alan/from-ch...
Our new blog post reveals how Devbox transformed our dev experience, slashed onboarding time, and created consistent environments across our entire team.
Check it out: medium.com/alan/from-ch...
Learn about our journey:
Learn about our journey: