Werner Geyer
@wernergeyer.bsky.social
160 followers 210 following 34 posts
Chief Scientist Human-Centered Trustworthy AI @ IBM Research. Interested in Human+AI Interaction & AI-Assisted Productivity. Opinions are my own! https://wernergeyer.com
Reposted by Werner Geyer
picardtips.bsky.social
Picard management tip: Even without game-changing results, experimentation is time well spent.
wernergeyer.bsky.social
5/ And we’re planning to bring several backend capabilities into the UI soon. Stay tuned 👀
wernergeyer.bsky.social
4/ ⚙️ Backend updates
• Independent Judges module (no UI - see: github.com/IBM/eval-ass...)
• Unified Judge API
• Extensible: supports Unitxt, M-Prometheus & more
• Self-consistency: run judges multiple times
• In-context examples
• Multi-criteria evals w/roll-ups
• Custom prompts supported
eval-assist/backend/src/evalassist/judges at main · IBM/eval-assist
EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refin...
github.com
wernergeyer.bsky.social
3/ 🖥️ UI updates
• Export & import test data (CSV)
• More benchmarks: JudgeBench & BigGen, grouped by capabilities
• 50+ criteria via Unitxt (www.unitxt.ai) catalog integration
• Export/import test cases in JSON
• Model provider connections can be tested before evals
www.unitxt.ai
wernergeyer.bsky.social
2/ 📄 Paper @acmuist.bsky.social : EvalAssist: Insights on Task-Specific Evaluations and AI-Assisted Judgment Strategy Preferences
By @dohyojin.bsky.social - presenting Wed 9:00–10:30 in the “Managing Tasks” session
👉 arxiv.org/pdf/2410.00873
arxiv.org
wernergeyer.bsky.social
1/ EvalAssist makes it easier to test, refine & share evaluation criteria for LLMs. ibm.github.io/eval-assist/
We’ve added powerful new features on both the UI and backend, plus we’ll be at UIST next week presenting our paper on task-specific evaluations & AI-assisted judgment strategies.
EvalAssist
EvalAssist simplifies LLM-as-a-Judge by supporting users in iteratively refining evaluation criteria in a web-based user experience.
ibm.github.io
wernergeyer.bsky.social
🚀 Excited to share some updates from EvalAssist, the open-source LLM-as-a-Judge framework we released a few months ago! 🧵
wernergeyer.bsky.social
We've just extended the IUI Workshop deadline by one week to August 29.

Looking forward to your contributions!
acm-iui.bsky.social
📢 Call for Workshop & Tutorial Proposals 📢
Bring your ideas and discuss them with fellow researchers in Paphos, Cyprus, from March 22-26, 2026.

iui.hosting.acm.org/2026/call-fo...

#CallForProposals #IUI2026 #HCI #AI
wernergeyer.bsky.social
Getting ready! Come visit us at the IBM booth @acl to learn about our latest Research. We have a number of super interesting demos lined up. research.ibm.com/events/acl-2...
Reposted by Werner Geyer
chiwork.bsky.social
We’re growing and going global! 🌍

CHIWORK 2025 is shaping up to be our biggest and most diverse edition yet. Thanks to everyone who submitted, reviewed, and supported us 💙

Can’t wait to see you in Amsterdam!

🔗 chiwork.org

#CHIWORK2025 #HCI #FutureOfWork
Reposted by Werner Geyer
pkahr.bsky.social
📣 Call for Workshop & Tutorial Proposals 📣 #IUI2026 is looking forward to your contribution! Bring your ideas and discuss them with fellow researchers in Paphos, Cyprus, from March 22-26, 2026. 🚨 Proposal Deadlines: Aug 22 (Workshops) and Oct 17 (Tutorials)🚨 iui.hosting.acm.org/2026/call-fo...
Call for Workshop & Tutorial Proposals | IUI
iui.hosting.acm.org
wernergeyer.bsky.social
📣 IUI 2026 Call for Workshops and Tutorials is live 📣

iui.acm.org/2026/call-fo...

Note that this year, submissions are due August 22, earlier than in previous years. Please spread the word! We had a fantastic workshop program in 2025 and I'm looking forward to an even better one in 2026 in Cyprus.
Call for Workshop & Tutorial Proposals | IUI
iui.acm.org
wernergeyer.bsky.social
We just published a summary of the 6th workshop on Human-AI Co-Creation with Generative Models at IUI 2025 in March. This year's special topic was, of course, AI agents and agency. Two of our sessions covered this topic and we had an exciting panel discussion. Check it out! medium.com/human-center...
HAI-GEN 2025: 6th Workshop on Human-AI Co-Creation with Generative Models
by Osnat Mokryn (University of Haifa, IL), Orit Shaer (Wellesley College, US), Werner Geyer (IBM Research, US), Mary Lou Maher (Computing…
medium.com
Reposted by Werner Geyer
krvarshney.bsky.social
A summary of decolonial AI alignment in the Human-Centered AI publication on Medium. Thanks to @jweisz3.bsky.social for asking me to write it, and for editing the piece. medium.com/human-center...
Decolonial AI Alignment
by Kush Varshney (IBM Research, US)
medium.com
Reposted by Werner Geyer
krvarshney.bsky.social
I'm on the IBM Mixture of Experts podcast wearing a safety vest. We talk about all the new things in AI this week. I also connect to older work by IBM Fellows Irene Greif, Bob Dennard, Rolf Landauer, and Charlie Bennett and to Mauro Martino's new AI-generated film. www.youtube.com/watch?v=CgqH...
DeepSeek-V3-0324, Gemini Canvas and GPT-4o image generation
YouTube video by IBM Technology
www.youtube.com
wernergeyer.bsky.social
And the final product 😋
wernergeyer.bsky.social
Asparagus time in Germany. This is an automated peeling machine. No AI 😀
wernergeyer.bsky.social
All set up for demo time at IUI. We are showing a tool for GenAI-assisted hypotheses exploration. dl.acm.org/doi/10.1145/...
wernergeyer.bsky.social
IBM Research on their way to IUI