Matt Groh
@mattgroh.bsky.social
1K followers 140 following 43 posts
Assistant professor at Northwestern Kellogg | human AI collaboration | computational social science | affective computing
Pinned
mattgroh.bsky.social
When are LLMs-as-judge reliable?

That's a big question for frontier labs and it's a big question for computational social science.

Excited to share our findings (led by @aakriti1kumar.bsky.social!) on how to address this question for any subjective task & specifically for empathic communications
aakriti1kumar.bsky.social
How do we reliably judge if AI companions are performing well on subjective, context-dependent, and deeply human tasks? 🤖

Excited to share the first paper from my postdoc (!!) investigating when LLMs are reliable judges - with empathic communication as a case study 🧐

🧵👇
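Not from the paper itself, but here's a rough sketch of what "reliable judge" can mean operationally: compare an LLM judge's ratings to human experts' ratings on the same items, and check whether LLM-expert agreement approaches expert-expert agreement. Everything below (the 1-5 empathy scale, the simulated ratings, Spearman correlation as the agreement metric) is illustrative, not the paper's method.

```python
# Toy sketch (simulated data, not the paper's analysis): treat an LLM judge as
# "reliable" when its agreement with human experts is comparable to the
# experts' agreement with each other.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical 1-5 empathy ratings for 100 responses: two human experts plus
# an LLM judge, all simulated from a shared latent "true empathy" signal.
latent = rng.normal(3, 1, size=100)
expert_a = np.clip(np.round(latent + rng.normal(0, 0.5, 100)), 1, 5)
expert_b = np.clip(np.round(latent + rng.normal(0, 0.5, 100)), 1, 5)
llm_judge = np.clip(np.round(latent + rng.normal(0, 0.8, 100)), 1, 5)

expert_expert, _ = spearmanr(expert_a, expert_b)                 # human ceiling
llm_expert, _ = spearmanr(llm_judge, (expert_a + expert_b) / 2)  # LLM vs experts

print(f"expert-expert agreement: {expert_expert:.2f}")
print(f"LLM-expert agreement:    {llm_expert:.2f}")
```

On subjective tasks like empathic communication, the expert-expert number is the ceiling to compare against: if experts themselves disagree in a given setting, no judge (human or LLM) can be validated as reliable there.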
mattgroh.bsky.social
Happening today!
nicoatnu.bsky.social
People handle the everyday physical world with remarkable ease, but how do we do it? This Wednesday, NICO is thrilled to host @tomerullman.bsky.social, to discuss: "Good Enough: Approximations in Mental Simulation and Intuitive Physics".

🗓️ Wed 10/8 at 12pm US Central
🔗 bit.ly/WedatNICO
An event flyer for the Wednesdays@NICO Seminar Series, on October 8, 2025, featuring Tomer Ullman of Harvard University.
mattgroh.bsky.social
On my way to @ic2s2.bsky.social in Norrköping!! Super excited to share this year’s projects in the HAIC lab revealing how (M)LLMs can offer insights into human behavior & cognition

More at human-ai-collaboration-lab.kellogg.northwestern.edu/ic2s2

See you there!

#IC2S2
mattgroh.bsky.social
Thanks! I imagine we'd see similar results in the Novelty Challenge: when experts are reliable, we can fine-tune LLMs to be reliable, but experts may only be reliable in some disciplines/settings and less reliable in others.

Very cool challenge!!
mattgroh.bsky.social
Thank you for sharing your brilliance, quirks, and wisdom. I started reading your work after coming across your Aeon article on Awe many years ago, and I feel inspired every time I read what you write.
mattgroh.bsky.social
This taxonomy offers a shared language (and see our how-to guide on arXiv for many examples) to help people better communicate what looks or feels off.

It's also a framework that can generalize to multimedia.

Consider this: what do you notice about her legs at the 16s mark?
mattgroh.bsky.social
Based on generating thousands of images, reading the literatures on AI-generated images and digital forensics (along with social media and journalistic commentary), and analyzing 30k+ participant comments, we propose a taxonomy for characterizing diffusion model artifacts in images
mattgroh.bsky.social
Scene complexity, artifact types, display time, and human curation of AI-generated images all play significant roles in how accurately people distinguish real and AI-generated images.
mattgroh.bsky.social
We examine photorealism in generative AI by measuring people's accuracy at distinguishing 450 AI-generated and 150 real images

Photorealism varies from image to image and person to person

83% of AI-generated images are identified as AI-generated at rates better than random chance would predict
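Not the paper's analysis, but a minimal sketch of how a per-image "detected better than chance" claim like the one above can be framed: a one-sided binomial test of each image's detection rate against 50% guessing. The counts here are made up for illustration.

```python
# Toy sketch (hypothetical counts): is this one AI-generated image identified
# as AI more often than random chance (p = 0.5) would predict?
from scipy.stats import binomtest

n_viewers = 120      # participants who rated this image (hypothetical)
n_labeled_ai = 78    # of those, how many labeled it AI-generated (hypothetical)

result = binomtest(n_labeled_ai, n_viewers, p=0.5, alternative="greater")
print(f"detection rate: {n_labeled_ai / n_viewers:.2f}, p = {result.pvalue:.4f}")
# Repeating this per image (with a multiple-comparisons correction) yields the
# share of AI-generated images detected above chance.
```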
mattgroh.bsky.social
💡New paper at #CHI2025 💡

Large-scale experiment with 750k observations addressing:

(1) How photorealistic are today's AI-generated images?

(2) What features of images influence people's ability to distinguish real/fake?

(3) How should we categorize artifacts?
mattgroh.bsky.social
Agreed with your observation that disciplinary perspectives can be too narrow-minded on this problem and lose sight of the big picture on both sides

www.nature.com/articles/s41... does a really nice job systematically reviewing the Human-AI collaboration literature across a bunch of different domains
When combinations of humans and AI are useful: A systematic review and meta-analysis - Nature Human Behaviour
Vaccaro et al. present a systematic review and meta-analysis of the performance of human–AI combinations, finding that on average, human–AI combinations performed significantly worse than the best of ...
www.nature.com
mattgroh.bsky.social
At a high level, it depends on:

- human expertise
- human understanding of what the AI system is capable of
- quality of AI explanations
- task-specific potential for cognitive biases and satisficing constraints to influence humans
- instance-specific potential for OOD data to influence AI
mattgroh.bsky.social
📣 📣 Postdoc Opportunity at Northwestern

Dashun Wang and I are seeking a creative, technical, interdisciplinary researcher for a joint postdoc fellowship between our labs.

If you're passionate about Human-AI Collaboration and Science of Science, this may be for you! 🚀

Please share widely!
mattgroh.bsky.social
You're welcome!! Def makes makers who move between both worlds feel very seen
mattgroh.bsky.social
Impressive on the 20-minute bits approach!

I definitely need 4-hour windows for productive, creative work.

Paul Graham's essay on the Maker/Manager schedule (paulgraham.com/makersschedu...) offers some tips for creating schedules that work for roles where one is both a Maker and a Manager
Maker's Schedule, Manager's Schedule
paulgraham.com
mattgroh.bsky.social
V2 of the Human and Machine Intelligence course 😊🤖🧠 is in the books!

So many fantastic discussions as we witnessed the frontier of AI shift even further into hyperdrive✨

Props to students for all the hard work and big thanks to teaching assistants and guest speakers 🙏
mattgroh.bsky.social
and present evidence that perception is more than simply transforming light into representations of objects and their features: perception also automatically extracts relations between objects!
mattgroh.bsky.social
What is perception? What do we really see when we look at the world?

And, why does the amodal completion illusion lead us to see a super long reindeer in the image on the right?

This week @chazfirestone.bsky.social joined the NU CogSci seminar series to address these fundamental questions
mattgroh.bsky.social
2024 marks the official launch of the Human-AI Collaboration Lab, so I wrote a one-page letter to introduce the lab, share highlights, and begin a lab tradition: an easy-to-digest annual letter reflecting on the year and sharing what we're working on with friends and colleagues.