EvalEval Coalition
@eval-eval.bsky.social
40 followers 8 following 7 posts
We are a researcher community developing scientifically grounded research outputs and robust deployment infrastructure for broader impact evaluations. https://evalevalai.com/
eval-eval.bsky.social
🚨New blog: The AI Evaluation Chart Crisis 📝

From misleading bar heights to missing error bars, recent model launches have sparked debate on AI evals. In our new blog post, we dig into what’s broken, why it matters, and how evaluation results should be presented 👇

evalevalai.com/documentatio...
The AI Evaluation Chart Crisis
Charts used to showcase performance demonstrate broader issues in the AI evaluation ecosystem: a lack of balance between competitive benchmarking and statistical rigor.
eval-eval.bsky.social
This kickoff post lays out: 1) 🔍 Why we need a science of evaluation; 2) 🤝 Our goals for the community; 3) 🛠️ How you can get involved (2/2)

Interested in joining? Check out evalevalai.com
eval-eval.bsky.social
🚨 AI Evals Crisis: Officially kicking off the Eval Science Workstream 🚨

We’re building a shared scientific foundation for evaluating AI systems, one that’s rigorous, open, and grounded in real-world & cross-disciplinary best practices👇 (1/2)

Read our new blog post: tinyurl.com/evalevalai
The Science of Evaluations: Workstream Kickoff Post
Announcing the launch of a research-driven initiative among a community of researchers to strengthen the science of AI evaluations.
eval-eval.bsky.social
Join us for the Eval Eval Coalition Social at @facct.bsky.social tomorrow, Tuesday, June 24th, from 4–4:30 pm during the coffee break! We look forward to seeing you there! #FAccT2025 #EvalEval
eval-eval.bsky.social
Our coalition is focused on producing scientifically grounded research outputs, robust deployment infrastructure for broader impact evaluations, and fostering a community of researchers passionate about developing better evaluations 🌎🌍🌏 (2/3)
eval-eval.bsky.social
Introducing the Eval Eval Coalition! ✨
We are a community of researchers dedicated to designing, developing, and deploying better evaluations (1/3)