Hyunwoo Kim
@hyunwoo-kim.bsky.social
630 followers 450 following 15 posts
Social Reasoning/Cognition + AI, Postdoc at NVIDIA | Previously @ai2.bsky.social | PhD from Seoul Natl Univ. http://hyunwookim.com
Pinned
hyunwoo-kim.bsky.social
🚨New Paper! So o3-mini and R1 seem to excel on math & coding. But how good are they on other domains where verifiable rewards are not easily available, such as theory of mind (ToM)? Do they show similar behavioral patterns? 🤔 What if I told you it's...interesting, like the below?🧵
hyunwoo-kim.bsky.social
Verrrry loosely, in the sense that we weight the hypotheses; sequential Monte Carlo does resampling based on the weights.
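To make that distinction concrete, here's a minimal Python sketch of the resampling step. This is generic multinomial resampling, not the paper's implementation, and the Sally-style hypothesis strings are made-up placeholders:

```python
import random

def resample(hypotheses, weights):
    """Draw a new population of hypotheses in proportion to their weights:
    low-weight hypotheses tend to be dropped, high-weight ones duplicated.
    This is the step SMC adds on top of plain hypothesis weighting."""
    return random.choices(hypotheses, weights=weights, k=len(hypotheses))

# Toy usage with made-up mental-state hypotheses and unnormalized weights.
hyps = ["Sally believes the ball is in the basket",
        "Sally believes the ball is in the box",
        "Sally is unsure where the ball is"]
print(resample(hyps, weights=[0.7, 0.2, 0.1]))
```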
hyunwoo-kim.bsky.social
Our algorithm is designed for social reasoning rather than solving math problems! But the general idea of maintaining multiple hypotheses during reasoning carries over! You can view it as a kind of stochastic beam search if applied to math or puzzle solving.
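For intuition, here's a toy sketch of what one stochastic beam step could look like. The `expand` and `score` functions are hypothetical stand-ins for whatever proposes and evaluates partial solutions (e.g., LLM calls), not functions from our paper:

```python
import random

def stochastic_beam_step(beams, expand, score, k):
    """One step of stochastic beam search: expand each partial solution,
    then sample (rather than deterministically keep) the k survivors in
    proportion to their scores."""
    candidates = [c for b in beams for c in expand(b)]
    weights = [score(c) for c in candidates]
    return random.choices(candidates, weights=weights, k=k)

# Toy usage: grow partial sums toward a target of 7.
beams = [0]
for _ in range(3):
    beams = stochastic_beam_step(
        beams,
        expand=lambda b: [b + 1, b + 2],          # hypothetical expansion
        score=lambda c: 1.0 / (1 + abs(7 - c)),   # closer to 7 scores higher
        k=2,
    )
print(beams)
```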
hyunwoo-kim.bsky.social
This work was done with my amazing and fabulous collaborators 🌟
@melaniesclar.bsky.social, @xuanalogue.bsky.social, Lance Ying, @sydneylevine.bsky.social, Yang Liu, @joshtenenbaum.bsky.social, @yejinchoinka.bsky.social
😊 Couldn't have done it without them!
hyunwoo-kim.bsky.social
✨Our paper aims to spark new discussions around inference-time compute and reasoning in broader domains, such as social reasoning. Check out our paper for more details on ThoughtTracing and interesting results on recent reasoning models! arxiv.org/abs/2502.11881
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
Existing LLM reasoning methods have shown impressive capabilities across various tasks, such as solving math and coding problems. However, applying these methods to scenarios without ground-truth answ...
arxiv.org
hyunwoo-kim.bsky.social
Results show TT outperforms reasoning models with significantly fewer output tokens. Also, unlike in math, we do not observe substantially higher token usage for incorrect responses from reasoning models on ToM benchmarks; in some cases, the pattern is even reversed 🤔 TT shows balanced token usage.
hyunwoo-kim.bsky.social
We present ThoughtTracing💭, an inference-time reasoning algorithm for tracing the mental states of specific agents. It's inspired by the sequential Monte Carlo algorithm and modeled after the Bayesian ToM framework, using LLMs to approximate probabilistic inference over agents' evolving mental states.
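In rough pseudocode, the loop looks something like the sketch below. This is a loose reading of the post, not the paper's actual algorithm: `llm_propose` and `llm_weight` are hypothetical stand-ins for the LLM calls that generate and evaluate mental-state hypotheses:

```python
import random

def thought_trace(events, llm_propose, llm_weight, n=5):
    """Loose propose-weight-resample loop over an agent's mental states,
    in the spirit of sequential Monte Carlo.

    For each observed event: an LLM updates each mental-state hypothesis,
    each hypothesis is weighted by how well it explains the agent's
    behavior, and the population is resampled by weight."""
    hypotheses = [""] * n  # start with empty mental-state traces
    for event in events:
        # Propose: let the LLM update each hypothesis given the new event.
        proposed = [llm_propose(h, event) for h in hypotheses]
        # Weight: how well does each hypothesis explain what the agent did?
        weights = [llm_weight(h, event) for h in proposed]
        # Resample: keep hypotheses in proportion to their weights.
        hypotheses = random.choices(proposed, weights=weights, k=n)
    return hypotheses

# Toy stand-ins so the sketch runs end-to-end:
trace = thought_trace(
    events=["Sally puts the ball in the basket", "Anne moves it to the box"],
    llm_propose=lambda h, e: h + " | " + e,         # pretend "LLM" update
    llm_weight=lambda h, e: 1.0 + h.count("box"),   # pretend plausibility score
)
print(trace[0])
```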
hyunwoo-kim.bsky.social
The false-belief Qs in ToM benchmarks are known to be challenging, whereas true-belief Qs are easier (see GPT-4o above). Interestingly, o3-mini scores near 100 on the hard false-belief Qs but crashes on the easy true-belief ones, resulting in a lower overall score than CoT. In contrast, our ThoughtTracing significantly improves GPT-4o 📈
hyunwoo-kim.bsky.social
Can you please add me too? Thanks!
hyunwoo-kim.bsky.social
You could follow me haha
hyunwoo-kim.bsky.social
Thank youuu!! I'm already in the list! 💙
hyunwoo-kim.bsky.social
Hello there! Can you please add me too? Thanks!
hyunwoo-kim.bsky.social
Oh wow, looks like you're allowed free speech, but you won't get any audience 😅
hyunwoo-kim.bsky.social
Hey Natalie, can you please add me to this pack?? Thank you!