Lightnews — Scholar-powered news

Reposted by Werner Geyer

Picard Tips @picardtips.bsky.social · 12d

Picard management tip: Even without game-changing results, experimentation is time well spent.

Werner Geyer @wernergeyer.bsky.social · 13d

6/ Try it out & explore more:
👉 GitHub: github.com/IBM/eval-ass...
👉 Demo: evalassist-evalassist.hf.space
👉 Project page: ibm.github.io/eval-assist/

Werner Geyer @wernergeyer.bsky.social · 13d

5/ And we’re planning to bring several backend capabilities into the UI soon. Stay tuned 👀

Werner Geyer @wernergeyer.bsky.social · 13d

4/ ⚙️ Backend updates
• Independent Judges module (no UI - see: github.com/IBM/eval-ass...)
• Unified Judge API
• Extensible: supports Unitxt, M-Prometheus & more
• Self-consistency: run judges multiple times
• In-context examples
• Multi-criteria evals w/roll-ups
• Custom prompts supported
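
A minimal sketch of two ideas from this list, self-consistency and multi-criteria roll-ups, in plain Python. It does not use the EvalAssist API: `make_stub_judge`, `self_consistent_verdict`, and `roll_up` are hypothetical names, and the stub judge stands in for a real LLM call. Majority voting and simple averaging are assumptions about how repeated runs and per-criterion results might be aggregated, not the project's documented behavior.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Dict, Sequence, Tuple


def make_stub_judge(verdicts: Sequence[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM judge: replays fixed verdicts in order."""
    replay = cycle(verdicts)

    def judge(response: str) -> str:
        # A real judge would send `response` and a criterion to a model here.
        return next(replay)

    return judge


def self_consistent_verdict(
    judge: Callable[[str], str], response: str, runs: int = 5
) -> Tuple[str, float]:
    """Run the judge `runs` times; return the majority verdict and its vote share."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    votes = Counter(judge(response) for _ in range(runs))
    verdict, count = votes.most_common(1)[0]
    return verdict, count / runs


def roll_up(per_criterion: Dict[str, bool]) -> float:
    """Assumed roll-up rule: fraction of criteria that passed."""
    if not per_criterion:
        raise ValueError("need at least one criterion")
    return sum(per_criterion.values()) / len(per_criterion)


if __name__ == "__main__":
    noisy_judge = make_stub_judge(["Yes", "Yes", "No"])
    print(self_consistent_verdict(noisy_judge, "Thanks for reaching out!", runs=5))
    print(roll_up({"polite": True, "concise": False, "on_topic": True, "safe": True}))
```

The vote share gives a rough signal of how stable a judge is on a given input; a low share suggests the criterion wording may need another refinement pass.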

eval-assist/backend/src/evalassist/judges at main · IBM/eval-assist

EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refin...

github.com

Werner Geyer @wernergeyer.bsky.social · 13d

3/ 🖥️ UI updates
• Export & import test data (CSV)
• More benchmarks: JudgeBench & BigGen, grouped by capabilities
• 50+ criteria via Unitxt (www.unitxt.ai) catalog integration
• Export/import test cases in JSON
• Model provider connections can be tested before evals

www.unitxt.ai

Werner Geyer @wernergeyer.bsky.social · 13d

2/ 📄 Paper @acmuist.bsky.social: EvalAssist: Insights on Task-Specific Evaluations and AI-Assisted Judgment Strategy Preferences
By @dohyojin.bsky.social - presenting Wed 9:00–10:30 in the “Managing Tasks” session
👉 arxiv.org/pdf/2410.00873

arxiv.org

Werner Geyer @wernergeyer.bsky.social · 13d

1/ EvalAssist makes it easier to test, refine & share evaluation criteria for LLMs. ibm.github.io/eval-assist/
We’ve added powerful new features on both the UI and backend, plus we’ll be at UIST next week presenting our paper on task-specific evaluations & AI-assisted judgment strategies.

EvalAssist

EvalAssist simplifies LLM-as-a-Judge by supporting users in iteratively refining evaluation criteria in a web-based user experience.

ibm.github.io

Werner Geyer @wernergeyer.bsky.social · 13d

🚀 Excited to share some updates from EvalAssist, the open-source LLM-as-a-Judge framework we released a few months ago! 🧵

Werner Geyer @wernergeyer.bsky.social · Aug 21

We've just extended the IUI Workshop deadline by one week to August 29.

Looking forward to your contributions!

ACM - Intelligent User Interfaces @acm-iui.bsky.social · Jun 9

📢 Call for Workshop & Tutorial Proposals 📢
Bring your ideas and discuss them with fellow researchers in Paphos, Cyprus, from March 22-26, 2026.

iui.hosting.acm.org/2026/call-fo...

#CallForProposals #IUI2026 #HCI #AI

Werner Geyer @wernergeyer.bsky.social · Jul 28

Getting ready! Come visit us at the IBM booth @acl to learn about our latest research. We have a number of super interesting demos lined up. research.ibm.com/events/acl-2...

Reposted by Werner Geyer

CHIWORK @chiwork.bsky.social · Apr 4

We’re growing and going global! 🌍

CHIWORK 2025 is shaping up to be our biggest and most diverse edition yet. Thanks to everyone who submitted, reviewed, and supported us 💙

Can’t wait to see you in Amsterdam!

🔗 chiwork.org

#CHIWORK2025 #HCI #FutureOfWork

Werner Geyer @wernergeyer.bsky.social · Jun 16

📣 Today we open-sourced EvalAssist, a web-based tool that makes it super easy to develop criteria for LLM judges. You can run it locally now and then scale up with notebooks using Unitxt. Check out the AI Alliance article to get the scoop:
thealliance.ai/blog/llm-as-...

LLM-as-a-Judge Without the Headaches: EvalAssist Brings Structure and Simplicity to the Chaos of LLM Output Review | AI Alliance

Evaluating AI model outputs at scale is a major challenge for teams using LLMs, especially when assessing nuanced qualities like politeness, fairness, and tone that traditional benchmarks miss. IBM Re...

thealliance.ai

Reposted by Werner Geyer

Patricia Kahr @pkahr.bsky.social · Jun 5

📣 Call for Workshop & Tutorial Proposals 📣 #IUI2026 is looking forward to your contribution! Bring your ideas and discuss them with fellow researchers in Paphos, Cyprus, from March 22-26, 2026. 🚨 Proposal Deadlines: Aug 22 (Workshops) and Oct 17 (Tutorials)🚨 iui.hosting.acm.org/2026/call-fo...

Call for Workshop & Tutorial Proposals | IUI

iui.hosting.acm.org

Werner Geyer @wernergeyer.bsky.social · Jun 5

📣 IUI 2026 Call for Workshops and Tutorials is live 📣

iui.acm.org/2026/call-fo...

Note that this year submissions are due August 22, earlier than in previous years. Please spread the word! We had a fantastic workshop program in 2025 and I'm looking forward to an even better one in 2026 in Cyprus.

Call for Workshop & Tutorial Proposals | IUI

iui.acm.org

Werner Geyer @wernergeyer.bsky.social · May 6

We just published a summary of the 6th Workshop on Human-AI Co-Creation with Generative Models, held at IUI 2025 in March. This year's special topic was, of course, AI agents and agency. Two of our sessions covered this topic and we had an exciting panel discussion. Check it out! medium.com/human-center...

HAI-GEN 2025: 6th Workshop on Human-AI Co-Creation with Generative Models

by Osnat Mokryn (University of Haifa, IL), Orit Shaer (Wellesley College, US), Werner Geyer (IBM Research, US), Mary Lou Maher (Computing…

medium.com

Werner Geyer @wernergeyer.bsky.social · Apr 29

Great work from our team @ IBM Research

jweisz3.bsky.social @jweisz3.bsky.social · Apr 29

🤖 ✏️ There is a better way to explain how you used AI in your {research paper, college essay, blog posts, …}. Check out our new AI Attribution Toolkit and look for us at #CHI2025!

aiattribution.github.io
dl.acm.org/doi/full/10....

AI Attribution Toolkit

An attribution statement identifies not only the presence of AI involvement, but also how AI was used. This approach makes important distinctions between different types and amounts of AI…

aiattribution.github.io

Reposted by Werner Geyer

Kush Varshney कुश वार्ष्णेय @krvarshney.bsky.social · Apr 8

A summary of decolonial AI alignment in the Human-Centered AI publication on Medium. Thanks to @jweisz3.bsky.social for asking me to write it, and for editing the piece. medium.com/human-center...

Decolonial AI Alignment

by Kush Varshney (IBM Research, US)

medium.com

Reposted by Werner Geyer

Kush Varshney कुश वार्ष्णेय @krvarshney.bsky.social · Mar 28

I'm on the IBM Mixture of Experts podcast wearing a safety vest. We talk about all the new things in AI this week. I also connect to older work by IBM Fellows Irene Greif, Bob Dennard, Rolf Landauer, and Charlie Bennett and to Mauro Martino's new AI-generated film. www.youtube.com/watch?v=CgqH...