Hyunwoo Kim
@hyunwoo-kim.bsky.social
630 followers 450 following 15 posts
Social Reasoning/Cognition + AI, Postdoc at NVIDIA | Previously @ai2.bsky.social | PhD from Seoul Natl Univ. http://hyunwookim.com
Pinned
hyunwoo-kim.bsky.social
🚨New Paper! So o3-mini and R1 seem to excel on math & coding. But how good are they on other domains where verifiable rewards are not easily available, such as theory of mind (ToM)? Do they show similar behavioral patterns? 🤔 What if I told you it's...interesting, like the below?🧵
hyunwoo-kim.bsky.social
Verrrry loosely, in the sense that we weight the hypotheses; sequential Monte Carlo does resampling based on the weights.
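To make that distinction concrete, here's a minimal Python sketch of the resampling step. This is generic multinomial resampling, not the paper's implementation, and the Sally-style hypothesis strings are made-up placeholders:

```python
import random

def resample(hypotheses, weights):
    """Draw a new population of hypotheses in proportion to their weights:
    low-weight hypotheses tend to be dropped, high-weight ones duplicated.
    This is the step SMC adds on top of plain hypothesis weighting."""
    return random.choices(hypotheses, weights=weights, k=len(hypotheses))

# Toy usage with made-up mental-state hypotheses and unnormalized weights.
hyps = ["Sally believes the ball is in the basket",
        "Sally believes the ball is in the box",
        "Sally is unsure where the ball is"]
print(resample(hyps, weights=[0.7, 0.2, 0.1]))
```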
hyunwoo-kim.bsky.social
Our algorithm is designed for social reasoning rather than solving math problems! But the general idea of maintaining multiple hypotheses during reasoning carries over! You can view it as a kind of stochastic beam search if applied to math or puzzle solving.
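For intuition, here's a toy sketch of what one stochastic beam step could look like. The `expand` and `score` functions are hypothetical stand-ins for whatever proposes and evaluates partial solutions (e.g., LLM calls), not functions from our paper:

```python
import random

def stochastic_beam_step(beams, expand, score, k):
    """One step of stochastic beam search: expand each partial solution,
    then sample (rather than deterministically keep) the k survivors in
    proportion to their scores."""
    candidates = [c for b in beams for c in expand(b)]
    weights = [score(c) for c in candidates]
    return random.choices(candidates, weights=weights, k=k)

# Toy usage: grow partial sums toward a target of 7.
beams = [0]
for _ in range(3):
    beams = stochastic_beam_step(
        beams,
        expand=lambda b: [b + 1, b + 2],          # hypothetical expansion
        score=lambda c: 1.0 / (1 + abs(7 - c)),   # closer to 7 scores higher
        k=2,
    )
print(beams)
```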
hyunwoo-kim.bsky.social
This work was done with my amazing and fabulous collaborators 🌟
@melaniesclar.bsky.social, @xuanalogue.bsky.social, Lance Ying, @sydneylevine.bsky.social, Yang Liu, @joshtenenbaum.bsky.social, @yejinchoinka.bsky.social
😊 Couldn't have done it without them!
hyunwoo-kim.bsky.social
✨Our paper aims to spark new discussions around inference-time compute and reasoning in broader domains, such as social reasoning. Check out our paper for more details on ThoughtTracing and interesting results on recent reasoning models! arxiv.org/abs/2502.11881
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
Existing LLM reasoning methods have shown impressive capabilities across various tasks, such as solving math and coding problems. However, applying these methods to scenarios without ground-truth answ...
arxiv.org
hyunwoo-kim.bsky.social
Results show TT outperforms reasoning models with significantly fewer output tokens. Also, unlike in math, we do not observe substantially higher token usage for incorrect responses from reasoning models on ToM benchmarks; in some cases, the pattern is even reversed 🤔 TT shows balanced token usage.
hyunwoo-kim.bsky.social
We present ThoughtTracing💭, an inference-time reasoning algorithm for tracing the mental states of specific agents. It's inspired by the sequential Monte Carlo algorithm and modeled after the Bayesian ToM framework, using LLMs to approximate probabilistic inference over agents' evolving mental states.
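In rough pseudocode, the loop looks something like the sketch below. This is a loose reading of the post, not the paper's actual algorithm: `llm_propose` and `llm_weight` are hypothetical stand-ins for the LLM calls that generate and evaluate mental-state hypotheses:

```python
import random

def thought_trace(events, llm_propose, llm_weight, n=5):
    """Loose propose-weight-resample loop over an agent's mental states,
    in the spirit of sequential Monte Carlo.

    For each observed event: an LLM updates each mental-state hypothesis,
    each hypothesis is weighted by how well it explains the agent's
    behavior, and the population is resampled by weight."""
    hypotheses = [""] * n  # start with empty mental-state traces
    for event in events:
        # Propose: let the LLM update each hypothesis given the new event.
        proposed = [llm_propose(h, event) for h in hypotheses]
        # Weight: how well does each hypothesis explain what the agent did?
        weights = [llm_weight(h, event) for h in proposed]
        # Resample: keep hypotheses in proportion to their weights.
        hypotheses = random.choices(proposed, weights=weights, k=n)
    return hypotheses

# Toy stand-ins so the sketch runs end-to-end:
trace = thought_trace(
    events=["Sally puts the ball in the basket", "Anne moves it to the box"],
    llm_propose=lambda h, e: h + " | " + e,         # pretend "LLM" update
    llm_weight=lambda h, e: 1.0 + h.count("box"),   # pretend plausibility score
)
print(trace[0])
```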
hyunwoo-kim.bsky.social
The false-belief Qs in ToM benchmarks are known to be challenging, whereas true-belief Qs are easier (see GPT-4o above). Interestingly, o3-mini scores near 100 on the hard false-belief Qs but crashes on the easy true-belief ones, resulting in a lower overall score than CoT. In contrast, our ThoughtTracing significantly improves GPT-4o 📈
hyunwoo-kim.bsky.social
Can you please add me too? Thanks!
hyunwoo-kim.bsky.social
You could follow me haha
hyunwoo-kim.bsky.social
Thank youuu!! I'm already in the list! 💙
hyunwoo-kim.bsky.social
Hello there! Can you please add me too? Thanks!
hyunwoo-kim.bsky.social
Oh wow, looks like you're allowed free speech, but you won't get any audience 😅
hyunwoo-kim.bsky.social
Hey Natalie, can you please add me to this pack?? Thank you!