Timo Kaufmann
@timokauf.bsky.social
PhD student at LMU Munich. Focus on RL, reward learning, and learning from human preferences.
Joint work with Yannick Metz, Daniel Keim, and Eyke Hüllermeier.
December 4, 2025 at 9:06 PM
Benefits: improved reward model generalization, better data efficiency, and stronger policies. Looking forward to seeing you at the poster!

Paper and more: timokaufmann.com/responserank/
December 4, 2025 at 9:05 PM
The key insight: these signals only need to be locally valid and relative (e.g., within one annotator's comparisons). No need to model their exact relationship to preference strength; it's enough to rank which comparisons are stronger.
December 4, 2025 at 9:04 PM
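A minimal sketch of what such a strength-ranking objective could look like, assuming a reward model whose margin r(chosen) - r(rejected) is pushed higher for the stronger of two comparisons; the names and signature here are hypothetical, not the paper's implementation:

import torch
import torch.nn.functional as F

def strength_rank_loss(margin_strong: torch.Tensor,
                       margin_weak: torch.Tensor) -> torch.Tensor:
    # Each margin is r(chosen) - r(rejected) for one comparison.
    # margin_strong comes from comparisons implied to be stronger
    # (e.g., higher inter-annotator agreement), margin_weak from
    # weaker ones, with pairs formed within a single annotator.
    # Only the relative order of the two margins matters, never
    # their absolute scale, matching the locally-valid assumption.
    return -F.logsigmoid(margin_strong - margin_weak).mean()

In practice a term like this would presumably be combined with a standard Bradley-Terry preference loss, so the preferences fix the sign of each margin while the strength ranking shapes its magnitude.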
The core idea: Not all preferences are equal. ResponseRank learns preference strength from implicit signals in your data, like inter-annotator agreement, stated confidence, or response times.
December 4, 2025 at 9:04 PM
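To make those implicit signals concrete, here is a hedged sketch of how they might be turned into a relative strength ordering; the field names below are illustrative assumptions, not the paper's data schema:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Comparison:
    chosen: str           # preferred response
    rejected: str         # dispreferred response
    annotator: str        # who made the judgment
    agreement: float      # inter-annotator agreement in [0, 1]
    response_time: float  # seconds spent deciding

def stronger_than(a: Comparison, b: Comparison) -> Optional[bool]:
    # True if a looks stronger than b, False if weaker, None if the
    # signals are uninformative. Comparisons are only ordered within
    # one annotator, so the signal needs to be locally valid and
    # relative, never globally calibrated.
    if a.annotator != b.annotator:
        return None  # cross-annotator signals may not be comparable
    if a.agreement != b.agreement:
        return a.agreement > b.agreement  # higher agreement, stronger preference
    if a.response_time != b.response_time:
        return a.response_time < b.response_time  # quicker judgment, clearer preference
    return None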