I design tools and processes to support principled evaluation of AI systems.
lukeguerdan.com
We run experiments on 11 rating tasks and find that measuring agreement with forced-choice ratings (e.g., Hit-Rate, shown on the right) yields substantial mis-rankings relative to downstream evaluation task performance.
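As a rough sketch (the function name and toy data are illustrative, not taken from the paper), hit rate here can be read as the fraction of items where a judge's forced-choice rating matches the human forced-choice rating:

```python
def hit_rate(judge_ratings, human_ratings):
    """Fraction of items where the judge's forced-choice rating
    matches the human forced-choice rating."""
    assert len(judge_ratings) == len(human_ratings)
    hits = sum(j == h for j, h in zip(judge_ratings, human_ratings))
    return hits / len(judge_ratings)

# Illustrative toy data: 1 = "toxic", 0 = "not toxic"
human = [1, 0, 1, 1, 0]
judge = [1, 0, 0, 1, 1]
print(hit_rate(judge, human))  # 0.6
```

A metric like this collapses genuinely ambiguous items (where raters reasonably disagree) into a single "correct" label, which is one way agreement-based rankings can diverge from downstream performance.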
Is this toxic? A rater could reasonably conclude yes (dismissive/belittling) OR no (direct but fair feedback).
💡 Simplicity
⚙️ Resource requirements
🎯 Predictive performance
🌎 Portability
But how does target variable construction unfold in practice, and how can we better support it going forward? #CSCW2025 🧵
Sign up for a 45-minute Zoom session to provide feedback on a new tool for building trustworthy evals.
Learn more at tinyurl.com/llm-as-a-judge - receive $35 for participating in a session!