Hokin
@hokin.bsky.social
38 followers 66 following 53 posts
Philosopher, Scientist, Engineer https://hokindeng.github.io/
hokin.bsky.social
I would love to review more papers. Could you send me invites from time to time?
hokin.bsky.social
I totally agree with you!
Reposted by Hokin
tobigerstenberg.bsky.social
Josh Tenenbaum's inspiring keynote at #cogsci2025 on growing vs scaling AI, the big questions of cognitive science, and the many open questions for the field.
Reposted by Hokin
quiltydunn.bsky.social
this was a hilarious final slide that just hung up there during q&a
drbarner.bsky.social
Susan Carey mic-drop at #cogsci2025. "There are no innate concepts: Discuss"
hokin.bsky.social
I had two private equity internships in my freshman & sophomore summers and a BCG PTA internship in my junior year. But it's the same thing for people who went straight into a research lab from their first semester of college.
hokin.bsky.social
💥 Huge thanks to my amazing collaborators @williamium3000.bsky.social, Kaia Gao, Icy Wang, Tianwei Zhao, Haoran Sun, Zoey Lyu, Robert Hawkins, Nuno Vasconcelos, @talgolanneuro.bsky.social & @carrot0817.bsky.social 💐
hokin.bsky.social
7) The result is astonishing 😱 No model is able to get both the manipulation tasks and the control tasks right at the same time, which seems completely trivial to humans ...

This suggests MLLMs completely lack core knowledge 🦜 and rely purely on shortcuts 🤫 ... (11/n)
hokin.bsky.social
6) Last, we introduce "Concept Hacking" to reveal core knowledge deficiencies in the control experiment set-up.

Concept Hacking systematically manipulates the task-relevant features while preserving all task-irrelevant conditions ... (10/n)
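A minimal sketch of what a concept-hacked stimulus pair could look like in code. The field names, file paths, and example task are hypothetical, not the paper's actual data schema; the point is that only the task-relevant feature changes between the pair:

```python
# Hypothetical sketch of a "Concept Hacking" stimulus pair: the hacked item
# flips only the task-relevant feature; everything else is held fixed.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Stimulus:
    image: str      # path to the rendered scene (illustrative)
    question: str   # VQA prompt shown to the model
    answer: str     # ground-truth label
    concept: str    # core-knowledge concept being probed

def concept_hack(control: Stimulus, hacked_image: str, hacked_answer: str) -> Stimulus:
    """Manipulate the task-relevant feature (scene + answer) while preserving
    all task-irrelevant conditions (question wording, concept, format)."""
    return replace(control, image=hacked_image, answer=hacked_answer)

control = Stimulus("occlusion_normal.png",
                   "Is the ball still behind the screen?",
                   "yes", "object_permanence")
hacked = concept_hack(control,
                      hacked_image="occlusion_trick.png",
                      hacked_answer="no")

# A model with genuine core knowledge answers both items; a shortcut learner
# passes the control but fails its hacked counterpart.
```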
hokin.bsky.social
5) Does Reasoning Help? 🤔 We further compared reasoning-augmented models with their corresponding instruction-tuned counterparts.

Scaling test-time compute doesn't seem to be a solution 😮‍💨 ... (9/n)
hokin.bsky.social
4) 🧐 Would Core Knowledge Emerge from Pure Scaling? ‼️‼️‼️Nope‼️‼️‼️

By regressing the performance of 230 models with different parameter counts & data sizes, we quantify the scaling effects.

The observation is very intriguing: "higher-level" abilities seem to be more "scalable" than "lower-level" abilities (8/n)
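A minimal sketch of the kind of per-ability scaling regression described above, with made-up numbers; the slope of accuracy against log model size serves as the "scaling effect" for one ability:

```python
# Sketch (not the paper's exact analysis): regress task accuracy on
# log-scaled model size to estimate a per-ability scaling slope.
import numpy as np
from scipy import stats

# Hypothetical records: (parameter count in billions, accuracy on one ability)
params_b = np.array([0.5, 1, 3, 7, 13, 34, 70, 180])
accuracy = np.array([0.42, 0.44, 0.47, 0.50, 0.51, 0.55, 0.58, 0.60])

slope, intercept, r, p, se = stats.linregress(np.log10(params_b), accuracy)
print(f"scaling slope = {slope:.3f} acc per decade of params (p = {p:.3g})")

# A near-zero slope for a "lower-level" ability vs. a clearly positive slope
# for a "higher-level" one would reproduce the pattern described above.
```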
hokin.bsky.social
3) Performance on core cognition abilities serves as a reliable predictor for achieving top results on high-level benchmarks.

Concretely, we compute the correlation matrices of MLLMs' abilities on our dataset against 26 other public benchmarks and the 9 higher-level abilities defined by SEED-Bench (7/n)
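A rough sketch of that correlation analysis, using random placeholder scores in place of the real model results; the shape of the computation (a models x abilities table correlated across models) is the point:

```python
# Sketch: correlate per-model core-cognition scores with higher-level
# benchmark scores across a models x abilities table.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
models = [f"model_{i}" for i in range(230)]
core = pd.DataFrame(rng.uniform(0, 1, (230, 12)), index=models,
                    columns=[f"core_{c}" for c in range(12)])   # 12 core concepts
high = pd.DataFrame(rng.uniform(0, 1, (230, 9)), index=models,
                    columns=[f"seed_{a}" for a in range(9)])    # 9 SEED-Bench abilities

# Pearson correlation between every core ability and every high-level ability
corr = pd.concat([core, high], axis=1).corr().loc[core.columns, high.columns]
print(corr.round(2))
```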
hokin.bsky.social
2) The topology of MLLMs' internal cognitive representations is akin to humans', suggesting "cognition" is a natural kind.

Like humans, physical vs. intention understanding are orthogonal, while tool-use and mechanics co-emerge.

🤔 "Artificial model organisms" for "cognition lesion study" ? 🤔 (6/n)
hokin.bsky.social
‼️ RESULTS ‼️ First, MLLMs exhibit a reversed developmental trajectory 📉 compared to humans 📈: they excel at "high-level" tasks that we learn later in life 🙀 but struggle with "basic" ones that we develop in infancy 👶

This observation is statistically significant (5/n) 📊
hokin.bsky.social
Living in the AI world, we sample from classic experiments built as physical setups or simulations & convert the stimuli to 4 VQA formats (<img>, <vid>, <multi w/wo interleave>) for inference on 230 MLLMs. Evals use exact-, in-, template-, LLM-, and hybrid-match w/ filters to ensure methodological integrity (4/n)
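A hedged sketch of what such a matching cascade could look like; the function names and the "(A)"-style template are illustrative assumptions, not the paper's implementation:

```python
# Illustrative hybrid answer-matching cascade: try cheap deterministic
# matchers first, fall back to an LLM judge only when they all abstain.
import re

def exact_match(pred: str, gold: str) -> bool:
    return pred.strip().lower() == gold.strip().lower()

def in_match(pred: str, gold: str) -> bool:
    return gold.strip().lower() in pred.strip().lower()

def template_match(pred: str, gold: str) -> bool:
    # e.g. pull "(A)"-style multiple-choice letters out of a free-form response
    m = re.search(r"\(([A-D])\)", pred)
    return bool(m) and m.group(1).lower() == gold.strip().lower()

def hybrid_match(pred: str, gold: str, llm_judge=None) -> bool:
    if exact_match(pred, gold) or in_match(pred, gold) or template_match(pred, gold):
        return True
    # Fallback: an LLM grader, filtered to skip degenerate/empty responses
    if llm_judge is not None and pred.strip():
        return llm_judge(pred, gold)
    return False
```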
hokin.bsky.social
For example, we have integrated Elizabeth Spelke and Renée Baillargeon's object permanence tasks, Susan Carey's core knowledge framework, Josh Tenenbaum and Michael McCloskey's intuitive physics tasks, Mary Hegarty's mechanical reasoning tasks, among many others ... (3/n)
hokin.bsky.social
Acknowledging Piaget's shortcomings and the efforts of succeeding generations of brilliant scholars and researchers, our CoreCognition framework aims to integrate 100 years of human cognitive-developmental research on the backbone of the Piagetian 4-stage theory of cognitive development ... (2/n)
hokin.bsky.social
#CoreCognition #LLM #multimodal #GrowAI We spent 3 years curating 1503 classic experiments spanning 12 core concepts in human cognitive development, and evaluated 230 MLLMs with 11 different prompts, 5 times each, collecting over 3.8 million inference data points.

A thread (1/n) - #ICML2025
hokin.bsky.social
🤣🤣🤣
hokin.bsky.social
2 years of effort, and the mighty @zoryzhang.bsky.social finally brought it to the finish line ❗️

Also huge thanks to @tomerullman.bsky.social for the advice, which was very inspiring for designing the control experiments and the dimensions we looked at 💐 💐

🏄 Kudos to the awesome people at GrowAI @growai.bsky.social
hokin.bsky.social
What is the best testbed for human-like cognition in machines? 🤔 Hardly anything beats gaze understanding.

Humans are extremely good at gaze reading 👁️ , and this ability develops extremely early in childhood.

Here, we tested 111 VLMs and 65 humans ⬇️
zoryzhang.bsky.social
👁️ 𝐂𝐚𝐧 𝐕𝐢𝐬𝐢𝐨𝐧 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 (𝐕𝐋𝐌𝐬) 𝐈𝐧𝐟𝐞𝐫 𝐇𝐮𝐦𝐚𝐧 𝐆𝐚𝐳𝐞 𝐃𝐢𝐫𝐞𝐜𝐭𝐢𝐨𝐧?
Knowing where someone looks is key to a Theory of Mind. We test 111 VLMs and 65 humans to compare their inferences.
Project page: grow-ai-like-a-child.github.io/gaze/
🧵1/11
Reposted by Hokin
lintonvision.bsky.social
Beautiful to see this initiative from a group of like minded PhD students collaborating together! 🚀
hokin.bsky.social
New Paper Alert ‼️ Current VLMs completely fail at human gaze understanding 🙀 and scaling does NOT help ‼️

However, humans, from an extremely early age 🧒, are extremely sensitive to other people's gaze 🙄 👀

No mentors, no labs, only pre-doc students, 111 VLMs, and we did it 😎
hokin.bsky.social
Thank you for your very nice words, Paul!