PhD student in computational neuroscience supervised by Wulfram Gerstner and Johanni Brea
https://flavio-martinelli.github.io/
Or come by our poster if you're at NeurIPS (Session 3, poster #4200)
Wonderful team with Alex Van Meegen @avm.bsky.social, Berfin Simsek, Wulfram Gerstner @gerstnerlab.bsky.social and Johanni Brea
Channels to infinity get sharper as O(γ^2); this is a clear example of the edge of stability phenomenon:
gradient descent does not converge to a minimum (which lies at infinity) but gets stuck where the sharpness of the channel reaches 2/η (η: learning rate)
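Where the 2/η threshold comes from, as a minimal sketch (not the paper's experiment, just the standard stability argument for gradient descent on a 1-D quadratic with sharpness λ, which shrinks the iterate only while λ < 2/η):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * lam * x**2 updates x <- (1 - eta*lam) * x,
# so it contracts only while lam < 2/eta. Once a channel's sharpness (largest
# Hessian eigenvalue) exceeds 2/eta, steps across the channel oscillate and grow,
# which is what stops GD from descending further along the channel.
eta = 0.1
for lam in (5.0, 20.0, 25.0):        # sharpness below, at, and above 2/eta = 20
    x = 1.0
    for _ in range(50):
        x -= eta * lam * x
    print(f"sharpness {lam:5.1f}: |x| after 50 steps = {abs(x):.3e}")
```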
But they can only be spotted by training for a very long time, for example by following the gradient flow with ODE solvers
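A minimal sketch of that idea (not the paper's code; the two-parameter toy loss below is just a stand-in for a real network's training loss):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Follow gradient flow d(theta)/dt = -grad L(theta) with an adaptive ODE solver
# instead of taking discrete gradient-descent steps, so very long training
# "times" can be reached cheaply and accurately.
def loss_grad(theta):
    # gradient of the hypothetical toy loss L(a, b) = (a*b - 1)^2
    a, b = theta
    return np.array([2 * (a * b - 1) * b, 2 * (a * b - 1) * a])

sol = solve_ivp(lambda t, th: -loss_grad(th), t_span=(0, 1e3),
                y0=np.array([0.1, 5.0]), method="RK45", rtol=1e-8, atol=1e-10)
print(sol.y[:, -1])   # parameters after integrating the flow for a long time
```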
In the limit of γ→∞ and ε→0 (where ε is the distance between the two neurons' input weights) they compute a directional derivative!
The MLP is learning to implement a Gated Linear Unit, with a nonlinearity that is the derivative of the original one
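A quick numerical check of this limit (a toy sketch, assuming a pair of neurons with input weights w and w+εu, output weights ±γ, γ = 1/ε, and tanh as the nonlinearity):

```python
import numpy as np

# Two neurons with input weights w and w + eps*u and output weights +gamma, -gamma.
# With gamma = 1/eps, their summed output tends to sigma'(w.x) * (u.x) as eps -> 0:
# a gated linear unit whose gate is the *derivative* of the original nonlinearity.
sigma = np.tanh
dsigma = lambda z: 1 - np.tanh(z) ** 2

rng = np.random.default_rng(0)
w, u, x = rng.normal(size=(3, 5))

for eps in (1e-1, 1e-3, 1e-5):
    gamma = 1.0 / eps
    pair = gamma * sigma((w + eps * u) @ x) - gamma * sigma(w @ x)
    glu = dsigma(w @ x) * (u @ x)
    print(f"eps={eps:.0e}: neuron pair {pair:.6f}  vs  GLU limit {glu:.6f}")
```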
The gradient dynamics are simple: after an initial alignment phase, the trajectories are straight lines and γ→∞
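A hypothetical diagnostic (not from the paper) for checking how straight a recorded parameter trajectory is after the alignment phase:

```python
import numpy as np

# Compare each displacement from the first checkpoint with the overall direction
# of travel; a minimum cosine of ~1.0 means the trajectory is a straight line.
def straightness(trajectory):
    """trajectory: array of shape (T, n_params), e.g. parameters saved every k steps."""
    d = trajectory - trajectory[0]                # displacements from the first checkpoint
    direction = d[-1] / np.linalg.norm(d[-1])     # overall direction of travel
    cos = (d[1:] @ direction) / np.linalg.norm(d[1:], axis=1)
    return cos.min()

# Example on a synthetic straight trajectory with growing norm:
traj = np.linspace(0, 1, 50)[:, None] ** 2 * np.ones(10)
print(straightness(traj))                         # ~1.0
```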
Saddles can be formed by taking a network at a local minimum and splitting a neuron's contribution into two, with splitting factor γ
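One standard way to write such a split (a sketch; the paper's exact parametrization of γ may differ): duplicate the neuron's input weights and share its output weight between the two copies as γ·a and (1-γ)·a, which leaves the network function unchanged for every γ:

```python
import numpy as np

# Split hidden neuron 0 of a tiny MLP: copy its input weights and divide its
# output weight into gamma*a and (1-gamma)*a. The network function (and hence
# the loss) is identical for every gamma; the post above is about when these
# split points are saddles of the larger network.
def mlp(x, W, a):
    return np.tanh(x @ W.T) @ a

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))          # 3 hidden neurons, 4 inputs (stand-in "trained" net)
a = rng.normal(size=3)
x = rng.normal(size=(10, 4))

gamma = 0.7
W_split = np.vstack([W, W[0:1]])                              # duplicate neuron 0's input weights
a_split = np.concatenate([a * [gamma, 1, 1], [(1 - gamma) * a[0]]])
print(np.allclose(mlp(x, W, a), mlp(x, W_split, a_split)))    # True
```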
My take is that NeuroAI just sounds a little broader as a term, incorporating cognition and behaviour into the picture (which were not modelled so accurately before ANNs).
To me, the goals of compneuro and NeuroAI overlap fully.