Here's the (more updated) NeurIPS version: proceedings.neurips.cc/paper_files/...
Also, more recently we extended the use of power laws to characterize how representations change over (pre/post) training in LLMs. 🙂
🧵 here: bsky.app/profile/arna...
I have been asked this when talking about our work on using power laws to study representation quality in deep neural networks, glad to have a more concrete answer now! 😃
www.biorxiv.org/content/10.1...
His “epigenetic landscape” is a diagrammatic representation of the constraints influencing embryonic development.
On his 50th birthday, his colleagues gave him a pinball machine on the model of the epigenetic landscape.
🧪 🦫🦋 🌱🐋 #HistSTM #philsci #evobio
Funded by @ivado.bsky.social and in collaboration with the IVADO regroupement 1 (AI and Neuroscience: ivado.ca/en/regroupem...).
Interested? See the details in the comments. (1/3)
🧠🤖
Previously, we showed that neural representations for the control of movement are largely distinct following supervised or reinforcement learning. The latter most closely matches NHP recordings.
We used a combination of neural recordings & modelling to show that RL yields neural dynamics closer to biology, with useful continual learning properties.
www.biorxiv.org/content/10.1...
I feel spectral metrics can go a long way in unlocking LLM understanding+design. 🚀
@natolambert.bsky.social + the OLMo team!
Paper 📝: arxiv.org/abs/2509.23024
👩💻 Code : Coming soon! 👨💻
@melodylizx.bsky.social @kumarkagrawal.bsky.social Komal Teru @glajoie.bsky.social @adamsantoro.bsky.social @tyrellturing.bsky.social
at @mila-quebec.bsky.social @berkeleyair.bsky.social @cohere.com & @googleresearch.bsky.social!
🧵9/9
- Pretraining: Compress → Expand (Memorize) → Compress (Generalize).
- Post-training: SFT/DPO → Expand; RLVR → Consolidate.
Representation geometry offers insights into when models memorize vs. generalize! 🤓
🧵8/9
On SciQ:
- Removing top 10/50 directions barely hurts accuracy.✅
- Retaining only top 10/50 directions CRUSHES accuracy.📉
As supported by our theoretical results, the eigenspectrum tail encodes critical task information! 🤯
🧵7/9
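For intuition, here's a minimal sketch of this kind of intervention (my own construction, not the paper's exact pipeline): project hidden-state features out of, or onto, their top-k principal directions before re-evaluating the same downstream probe. The feature matrix and k values below are placeholders.

```python
# Hypothetical sketch: ablate or retain the top-k principal directions of an
# (n_samples x d) feature matrix before scoring it with the same probe.
import numpy as np

def project_features(features: np.ndarray, k: int, keep_top: bool) -> np.ndarray:
    X = features - features.mean(axis=0, keepdims=True)   # center the features
    _, _, Vt = np.linalg.svd(X, full_matrices=False)       # rows of Vt = principal directions
    top = Vt[:k]                                            # (k, d) dominant directions
    onto_top = X @ top.T @ top                              # component inside the top-k subspace
    return onto_top if keep_top else X - onto_top

# Usage: evaluate the same probe on both feature sets.
X = np.random.randn(500, 768)                               # stand-in for LLM hidden states
X_without_top = project_features(X, k=50, keep_top=False)   # "remove top 50 directions"
X_only_top = project_features(X, k=50, keep_top=True)       # "retain only top 50 directions"
```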
We show, both through theory and with simulations in a toy model, that these non-monotonic spectral changes occur due to gradient descent dynamics with cross-entropy loss under 2 conditions:
1. skewed token frequencies
2. representation bottlenecks
🧵6/9
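To make the two ingredients concrete, here's a toy sketch I put together (not the paper's model or theory): a linear softmax model with a narrow bottleneck (d much smaller than the vocab), trained by gradient descent with cross-entropy on Zipf-distributed tokens, while tracking the effective rank of the embeddings. All hyperparameters and the next-token rule are made up for illustration.

```python
# Toy sketch combining the two ingredients above: skewed (Zipfian) token
# frequencies + a low-dimensional representation bottleneck. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
V, d = 200, 16                                   # vocab size, bottleneck width (d << V)
freqs = 1.0 / np.arange(1, V + 1)                # Zipf-like token frequencies
freqs /= freqs.sum()

E = rng.normal(scale=0.1, size=(V, d))           # token -> bottleneck embedding
U = rng.normal(scale=0.1, size=(d, V))           # bottleneck -> logits
lr, steps, batch = 0.5, 3000, 256

def effective_rank(H, eps=1e-12):
    eigs = np.linalg.svd(H - H.mean(0), compute_uv=False) ** 2
    p = eigs / (eigs.sum() + eps)
    return float(np.exp(-(p * np.log(p + eps)).sum()))

for t in range(steps):
    x = rng.choice(V, size=batch, p=freqs)       # inputs drawn with skewed frequencies
    y = (x + 1) % V                              # toy next-token targets
    H = E[x]                                     # (batch, d) bottlenecked hidden states
    logits = H @ U
    P = np.exp(logits - logits.max(1, keepdims=True))
    P /= P.sum(1, keepdims=True)
    G = P.copy()
    G[np.arange(batch), y] -= 1.0                # d(cross-entropy)/d(logits)
    dU = (H.T @ G) / batch
    dE = (G @ U.T) / batch                       # gradient w.r.t. the used embedding rows
    U -= lr * dU
    np.add.at(E, x, -lr * dE)                    # accumulates updates for repeated tokens
    if t % 500 == 0:
        sample = E[rng.choice(V, size=512, p=freqs)]
        print(t, round(effective_rank(sample), 2))
```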
- SFT & DPO exhibit entropy-seeking expansion, favoring instruction memorization but reducing OOD robustness.📈
- RLVR exhibits compression-seeking consolidation, learning reward-aligned behaviors at the cost of reduced exploration.📉
🧵5/9
- Entropy-seeking: Correlates with short-sequence memorization (♾️-gram alignment).
- Compression-seeking: Correlates with dramatic gains in long-context factual reasoning, e.g. TriviaQA.
Curious about ♾️-grams?
See: bsky.app/profile/liuj...
🧵4/9
Warmup: Rapid compression, collapsing the representation onto dominant directions.
Entropy-seeking: Manifold expansion, adding info in non-dominant directions.📈
Compression-seeking: Anisotropic consolidation, selectively packing more info in dominant directions.📉
🧵3/9
BUT
🎢The spectral metrics (RankMe, αReQ) change non-monotonically (with more pretraining)!
Takeaway: We discover geometric phases of LLM learning!
🧵2/9
- Spectral Decay Rate, αReQ: Fraction of variance in non-dominant directions.
- RankMe: Effective Rank; #dims truly active.
⬇️αReQ ⇒ ⬆️RankMe ⇒ More complex!
🧵1/9
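For anyone who wants to poke at this, a minimal sketch (not necessarily the paper's exact estimators) of both metrics computed from a matrix of hidden-state features: the effective rank is the exponential of the entropy of the normalized eigenspectrum, and alpha comes from a simple least-squares power-law fit on the log-log spectrum.

```python
# Minimal sketch of the two spectral metrics, computed from LLM hidden states.
import numpy as np

def spectral_metrics(features: np.ndarray, eps: float = 1e-12):
    """features: (n_samples, d) matrix of hidden-state activations."""
    X = features - features.mean(axis=0, keepdims=True)    # center
    eigs = np.linalg.svd(X, compute_uv=False) ** 2           # covariance eigenspectrum (up to 1/n)

    # RankMe-style effective rank: exp of the entropy of the normalized spectrum.
    p = eigs / (eigs.sum() + eps)
    rankme = float(np.exp(-(p * np.log(p + eps)).sum()))

    # Decay exponent alpha: negative slope of log(eigenvalue) vs. log(rank).
    ranks = np.arange(1, len(eigs) + 1)
    alpha = float(-np.polyfit(np.log(ranks), np.log(eigs + eps), 1)[0])
    return rankme, alpha

# Usage: flatter spectra -> smaller alpha and larger effective rank.
rankme, alpha = spectral_metrics(np.random.randn(2000, 512))
print(f"RankMe ~ {rankme:.1f}, alpha ~ {alpha:.2f}")
```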
How does the complexity of this mapping change across LLM training? How does it relate to the model’s capabilities? 🤔
Announcing our #NeurIPS2025 📄 that dives into this.
🧵below
#AIResearch #MachineLearning #LLM
🚨 New preprint! 🚨
Excited and proud (& a little nervous 😅) to share our latest work on the importance of #theta-timescale spiking during #locomotion in #learning. If you care about how organisms learn, buckle up. 🧵👇
📄 www.biorxiv.org/content/10.1...
💻 code + data 🔗 below 🤩
#neuroskyence
This group got it working!
arxiv.org/abs/2506.17768
May be a great way to reduce AI energy use!!!
#MLSky 🧪
Can't wait to read in detail.
It's a pleasure to share our paper at @cp-cell.bsky.social, showing how mice learning over long timescales display key hallmarks of gradient descent (GD).
The culmination of my PhD supervised by @laklab.bsky.social, @saxelab.bsky.social and Rafal Bogacz!
Multi-agent reinforcement learning (MARL) often assumes that agents know when other agents cooperate with them. But for humans, this isn't always the case. For example, Plains Indigenous groups used to leave resources for others to use at effigies called Manitokan.
1/8