Alan engineering
@alanengineering.bsky.social
All things engineering @avec_alan https://medium.com/alan/tagged/engineering
There are many LLM benchmarks, such as MMLU and GSM8K, but they're useless for AI agents.
Real agents need to handle database state, tool calling, and multi-turn conversations. Stateful benchmarks show the path forward.

New post on agent evaluation 👇
Benchmarking AI Agents: The Challenge of Real-World Evaluation
AI agents need stateful benchmarks. Unlike LLMs, agents interact with databases and users. We explore why and how to evaluate them…
medium.com
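To make "stateful" concrete, here is a minimal sketch of the idea, not the harness from the linked post. It assumes a toy in-memory database and grades the agent on the final database state after a multi-turn conversation rather than on its text output. Every name in it (`Task`, `update_claim`, `scripted_agent`, `run_task`) is hypothetical, and the scripted agent stands in for a real LLM.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy "database": claim id -> status.
State = Dict[str, str]


@dataclass
class Task:
    initial_state: State   # database contents before the conversation
    user_turns: List[str]  # messages the simulated user sends, in order
    expected_state: State  # database contents a correct agent leaves behind


def update_claim(state: State, claim_id: str, status: str) -> str:
    """Hypothetical tool: mutates the database and returns an observation."""
    if claim_id not in state:
        return f"error: unknown claim {claim_id}"
    state[claim_id] = status
    return f"ok: {claim_id} -> {status}"


Agent = Callable[[str, State], None]


def scripted_agent(message: str, state: State) -> None:
    """Stand-in for an LLM agent: approves any claim id named in the message."""
    for word in message.split():
        token = word.strip(".,!?")
        if token in state:
            update_claim(state, token, "approved")


def run_task(agent: Agent, task: Task) -> bool:
    """Replay every user turn, then compare the final state to the expected one."""
    state = copy.deepcopy(task.initial_state)  # each run starts from a clean copy
    for turn in task.user_turns:
        agent(turn, state)
    return state == task.expected_state


task = Task(
    initial_state={"C1": "pending", "C2": "pending"},
    user_turns=["Hi, I have a question.", "Please approve C1."],
    expected_state={"C1": "approved", "C2": "pending"},
)
passed = run_task(scripted_agent, task)
print(passed)
```

Because the check compares database contents, an agent that replies politely but never calls the tool fails the task, which a static question-answering benchmark would not catch.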
You've heard of the recent ISO 27001:2022 certification of Alan by SGS, but want to know more about our journey towards certification? Head over to Maxime's post and enjoy the read!
medium.com/alan/our-iso...
Our ISO 27001 journey: From security blueprint to certification success
Hey 👋 I’m Maxime, the ISMS lead at Alan, and I’d like to tell you about our ISO journey 🗺️
medium.com
Static chatbots couldn't handle complex support tickets about insurance claims. So we built something different with tool calls and the ReAct framework.

Our Claim Agent investigates dynamically - just like human agents, but faster. It now automates 30% of the tickets it receives.
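For readers new to ReAct, a rough sketch of the loop follows: the model alternates between a thought, a tool call, and the observation that comes back, until it decides to answer. This is an illustration under stated assumptions, not Alan's Claim Agent. The `lookup_claim` tool, its data, and the `scripted_policy` function (which stands in for an LLM) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def lookup_claim(claim_id: str) -> str:
    """Hypothetical tool: returns a canned claim record."""
    claims = {"C1": "status=paid, amount=42.00"}
    return claims.get(claim_id, "not found")


# Registry of tools the agent is allowed to call.
TOOLS: Dict[str, Callable[[str], str]] = {"lookup_claim": lookup_claim}

Step = Tuple[str, str, str]  # (thought, action name, action argument)
Policy = Callable[[str, List[str]], Step]


def scripted_policy(question: str, observations: List[str]) -> Step:
    """Stand-in for the LLM: look the claim up once, then answer."""
    if not observations:
        return ("I need the claim record.", "lookup_claim", "C1")
    return ("I have enough information.", "finish", f"Claim C1: {observations[-1]}")


def react(question: str, policy: Policy, max_steps: int = 5) -> str:
    """Run the thought -> action -> observation loop until 'finish' or the step limit."""
    observations: List[str] = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, observations)
        if action == "finish":
            return arg
        tool = TOOLS.get(action)
        # Unknown tools become an observation, so the policy can recover.
        observation = tool(arg) if tool else f"unknown tool {action}"
        observations.append(observation)
    return "gave up: step limit reached"


answer = react("Has claim C1 been paid?", scripted_policy)
print(answer)
```

The step limit is a design choice in this sketch: it keeps a confused policy from looping forever on a ticket it cannot resolve.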
🛠️ How we tamed the "works on my machine" chaos at Alan Engineering!

Our new blog post reveals how Devbox transformed our dev experience, slashed onboarding time, and created consistent environments across our entire team.

Check it out: medium.com/alan/from-ch...
From Chaos to Consistency: How Alan Transformed Developer Experience with Devbox
medium.com
In late January, DeepSeek shocked the world by dropping an open-weight successor to OpenAI's o1: R1. Their tech report discusses how to incentivize reasoning capability in LLMs. We share our learnings at:
DeepSeek R1: Demystifying LLM’s Reasoning Capabilities
DeepSeek shocked the world by dropping an open-weight successor to OpenAI’s o1: R1. This post summarizes our learnings from the tech…
medium.com