@lucaschubu.bsky.social
Reposted
marcelbinz.bsky.social
Excited to see our Centaur project out in @nature.com.
TL;DR: Centaur is a computational model that predicts and simulates human behavior for any experiment described in natural language.
lucaschubu.bsky.social
Excited to say our paper got accepted to ICML! We added new findings including this: models fine-tuned on a visual counterfactual reasoning task do not generalize to the underlying factual physical reasoning task, even with test images matched to the fine-tuning dataset.
lucaschubu.bsky.social
Finally, we fine-tuned a model on human responses for the synthetic intuitive physics dataset. We find that this model not only shows higher agreement with human observers, but also generalizes better to the real block towers.
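As a rough illustration of what an agreement measure like this can look like (a minimal sketch, not the code used in the work), agreement can be computed as the fraction of (trial, observer) pairs on which the model's stability judgment matches the human response; the arrays below are hypothetical:

```python
import numpy as np

# Hypothetical data: one row per block-tower trial.
# model_pred[i]  = model's stable (0) / unstable (1) judgment on trial i
# human_votes[i] = judgments of each human observer on trial i
model_pred = np.array([1, 0, 1, 1, 0])
human_votes = np.array([
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
])

# Agreement: fraction of (trial, observer) pairs where the model's judgment
# matches the human response, i.e. mean per-observer agreement.
agreement = (model_pred[:, None] == human_votes).mean()
print(f"human-model agreement: {agreement:.2f}")
```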
lucaschubu.bsky.social
Models fine-tuned on intuitive physics also do not robustly generalize to an almost identical but visually different dataset (Lerer columns below). They are fine-tuned on synthetic block towers, while the dataset by Lerer et al. features pictures of real block towers.
lucaschubu.bsky.social
We fine-tuned models on tasks from intuitive physics and causal reasoning. Models fine-tuned on intuitive physics (first two rows) do not perform well on causal reasoning, and vice versa. Models fine-tuned on both perform well in either domain, showing that models can learn both.
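The evaluation behind this can be pictured as a transfer matrix: every fine-tuned model is scored on every task family. A minimal sketch (not the actual evaluation code; models and datasets are stand-ins here):

```python
from typing import Callable, Dict, List, Tuple

# An evaluation set is a list of (stimulus, label) pairs; a model is any
# callable that maps a stimulus to a predicted label.
Dataset = List[Tuple[str, int]]
Model = Callable[[str], int]

def accuracy(model: Model, dataset: Dataset) -> float:
    """Fraction of items where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in dataset) / len(dataset)

def transfer_matrix(models: Dict[str, Model],
                    datasets: Dict[str, Dataset]) -> Dict[str, Dict[str, float]]:
    """Rows: fine-tuning domain; columns: evaluation domain."""
    return {ft: {ev: accuracy(m, ds) for ev, ds in datasets.items()}
            for ft, m in models.items()}

# Toy usage with dummy models and datasets.
datasets = {
    "intuitive_physics": [("tower_a", 1), ("tower_b", 0)],
    "causal_reasoning": [("blicket_a", 0), ("blicket_b", 1)],
}
models = {
    "ft_physics": lambda x: 1,
    "ft_causal": lambda x: 0,
    "ft_both": lambda x: {"tower_a": 1, "tower_b": 0,
                          "blicket_a": 0, "blicket_b": 1}[x],
}
print(transfer_matrix(models, datasets))
```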
lucaschubu.bsky.social
In previous work we found that VLMs fall short of human visual cognition. To make them better, we fine-tuned them on visual cognition tasks. We find that while this improves performance on the fine-tuning task, it does not lead to models that generalize to other related tasks:
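For context on what fine-tuning a VLM on a visual cognition task can look like mechanically, here is a minimal sketch (not the setup used in the work): the task is framed as image-conditioned text prediction with an off-the-shelf checkpoint; the model choice, prompt wording, and training details below are assumptions.

```python
# Minimal sketch, assuming a BLIP checkpoint and a task framed as predicting
# a short text response (e.g. a stability judgment) from an image.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

model_id = "Salesforce/blip-image-captioning-base"  # hypothetical checkpoint choice
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

model.vision_model.requires_grad_(False)  # keep the vision encoder frozen
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Placeholder example standing in for one (image, target response) pair from
# a visual cognition task such as block-tower stability judgment.
image = Image.new("RGB", (384, 384))
target = "the tower will fall"

inputs = processor(images=image, text=target, return_tensors="pt")
loss = model(**inputs, labels=inputs["input_ids"]).loss  # language-modeling loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```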