Jorge Morales
@jorge-morales.bsky.social
3.8K followers 3.2K following 650 posts
I'm a philosopher, psychologist and neuroscientist studying vision, mental imagery, consciousness and introspection. As S.S. Stevens said "there are numerous pitfalls in this business." https://www.subjectivitylab.org
Pinned
jorge-morales.bsky.social
Imagine an apple 🍎. Is your mental image more like a picture or more like a thought? In a new preprint led by Morgan McCarty—our lab's wonderful RA—we develop a new approach to this old cognitive science question and find that LLMs excel at tasks thought to be solvable only via visual imagery. 🧵
Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models
This study offers a novel approach for benchmarking complex cognitive behavior in artificial systems. Almost universally, Large Language Models (LLMs) perform best on tasks which may be included in th...
arxiv.org
Reposted by Jorge Morales
jpeelle.bsky.social
I’m scheduled for surgery today on my Achilles tendon, followed by 2 weeks of no weight bearing.😵‍💫

So, like any good scientist, I got together 7 colleagues to study the consequences of limb disuse.

Introducing the HEALING study

with @laurelgd.bsky.social @sneuroble.bsky.social @briemreid.bsky.social
jorge-morales.bsky.social
I had to google what that meant, I’m a complete newbie. But yeah, we were all puzzled and the most experienced among us showed us how to do it (it was a foot jam, actually). It was a pretty good group activity indeed!
jorge-morales.bsky.social
Our lab went climbing (yes, on a Tuesday morning, oops) and it was really fun! 🧗‍♂️ It was the first time for a few of us, and I can totally see why people get into it.
Reposted by Jorge Morales
brialong.bsky.social
We’re recruiting a postdoctoral fellow to join our team! 🎉

I’m happy to share that I’ve opened back up the search for this position (it was temporarily closed due to funding uncertainty).

See lab page and doc below for details!
Reposted by Jorge Morales
sampendu.bsky.social
Long time in the making: our preprint of a survey study on the diversity in how people seem to experience #mentalimagery. It suggests #aphantasia should be redefined as the absence of depictive thought, not merely as "not seeing". Some more take-home messages:
#psychskysci #neuroscience

doi.org/10.1101/2025...
jorge-morales.bsky.social
Interestingly, it may just be gaps all the way down. Our experiences themselves may be built out of impoverished signals. In other words, the richness of experience is not necessarily an illusion but a reconstruction. E.g.:
Subjective inflation: phenomenology’s get-rich-quick scheme
How do we explain the seemingly rich nature of visual phenomenology while accounting for impoverished perception in the periphery? This apparent misma…
www.sciencedirect.com
jorge-morales.bsky.social
Absolutely! Hard to capture in words and introspective reports, and impossible to fully capture once it’s operationalized in an experiment.
jorge-morales.bsky.social
This one explored differences in experience and eye movements during reading “Aphantasia modulates immersion experience but not eye fixation patterns during story reading” osf.io/preprints/ps...
OSF
osf.io
Reposted by Jorge Morales
francesegan.bsky.social
Shamelessly promoting my favorite paper. Everybody who was anybody in the history of science/philosophy/mathematics had a view on the moon illusion. frances-egan.org/uploads/3/5/...
frances-egan.org
jorge-morales.bsky.social
This is part of what makes it interesting. They are so good at some examples but terrible at others (in almost any task but definitely in ours). This means they aren't doing it in any principled way (otherwise it should be trivial to get most of them right). But how are they getting some right then?
jorge-morales.bsky.social
Thank you, Greg! This is very encouraging to hear from you. This project has been a lot of fun, and mentoring the undergrad who led it has been super rewarding. He's already working on new questions around the same topic, so hopefully there will be more to share in the coming months.
jorge-morales.bsky.social
That's pretty good! Even with graphic design elements!
jorge-morales.bsky.social
We haven't, but that's a cool idea!
jorge-morales.bsky.social
We have no clue what's going on under the hood. One thing we did explore was varying the reasoning effort parameter in the OpenAI reasoning models we tested. We found, perhaps unsurprisingly, that as reasoning token and time allocations decreased, so did the performance.
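(For the curious, a minimal sketch of what such a sweep can look like using the OpenAI Python SDK's reasoning_effort setting; the model name, prompt, and helper below are illustrative assumptions, not taken from the preprint.)

# Sketch only: sweeping the reasoning-effort setting on an OpenAI reasoning
# model. The prompt is a classic Finke-style imagery item used here purely
# as an example; names are illustrative, not from the preprint.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Imagine a capital letter D. Rotate it 90 degrees counterclockwise and "
    "place it on top of a capital letter J. What object does this resemble?"
)

def run_at_effort(effort: str) -> str:
    """Query a reasoning model at a given effort level and return its answer."""
    response = client.chat.completions.create(
        model="o3",                 # assumption: one of the reasoning models tested
        reasoning_effort=effort,    # "low", "medium", or "high"
        messages=[{"role": "user", "content": PROMPT}],
    )
    return response.choices[0].message.content

for effort in ("low", "medium", "high"):
    print(effort, "->", run_at_effort(effort))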
jorge-morales.bsky.social
A few people have asked if the reason why some LLMs perform this visual imagery task successfully is only because the stimuli / task-type we used were in the models' training data. If this were so, data contamination would make our results uninteresting. See this thread for why this isn't the case.
AI-generated image of two scientists looking really worried at a cat stepping on a computer’s keyboard with a monitor showing folder labeled “top secret” about to be moved into the folder labeled “training data”.
jorge-morales.bsky.social
Now *that* is cool! I guess I’m not surprised it didn’t work back then. We tried with several small, open models and none of them got a single answer right. In fact, had we done our study six months ago (before o3 and GPT-5 were released), we wouldn’t have found performance above the human baseline.
jorge-morales.bsky.social
Lastly, all models may have had access to the old stimuli and, hence, to that *style* of task. And yet, the majority of them performed terribly (way worse than humans). This suggests that there's something different about o3 and GPT-5 *as models* that allowed them to perform better (albeit not perfectly).
jorge-morales.bsky.social
Our novel examples were ~9% harder than the older ones. Importantly, every item was solvable (some rarely, some frequently). We think this difference in difficulty explains a similarly sized decrease in performance on our novel trials compared to the old ones. A data leak is an unlikely cause.
jorge-morales.bsky.social
We purposefully made the novel tasks more difficult (and with a wider difficulty range) than Finke's. We confirmed this with a weighted difficulty scale that had objective (number of steps, number of objects) and subjective dimensions (clarity, identifiability, and response uniqueness).
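(Purely as an illustration of how a weighted scale like this could be computed; the weights, rating scales, and function below are hypothetical, not the paper's.)

# Sketch only: one way to combine objective and subjective dimensions into a
# single weighted difficulty score. The dimension names follow the post; the
# weights and 1-5 rating scales are made up for illustration.
def difficulty_score(n_steps, n_objects, clarity, identifiability, uniqueness,
                     weights=(0.3, 0.2, 0.2, 0.15, 0.15)):
    """Higher score = harder item. Subjective ratings assumed on a 1-5 scale,
    where lower clarity/identifiability/uniqueness makes an item harder."""
    objective = (weights[0] * n_steps, weights[1] * n_objects)
    subjective = (weights[2] * (6 - clarity),
                  weights[3] * (6 - identifiability),
                  weights[4] * (6 - uniqueness))
    return sum(objective) + sum(subjective)

# Example: a 4-step, 3-object item with middling subjective ratings.
print(difficulty_score(n_steps=4, n_objects=3, clarity=3,
                       identifiability=3, uniqueness=2))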
jorge-morales.bsky.social
As you can see in our figure S3 above, the rank order and overall performance of each model is quite similar across the new and old instruction sets. The slight drop in performance for the new trials happened across the board, including in humans. This was driven by differences in difficulty:
jorge-morales.bsky.social
Indeed 80% of the trials were completely novel and could not have possibly been in the training data. But, importantly, 20% were the old ones from Finke et al. This gave us an opportunity to compare performance across both sets. We didn’t see major performance differences across old and new trials.
jorge-morales.bsky.social
Thanks, Brad! Great questions! We considered this issue as carefully as we could while planning our study and also while analyzing the results. We don’t think high performance by some models is explained by data contamination for several reasons. Bear with me: