Lightnews — Scholar-powered news

Alexandra Proca @aproca.bsky.social · Jun 20

In summary, this work provides a novel, flexible framework for studying learning in linear RNNs, allowing us to generate new insights into their learning process and the solutions they find, and to advance our understanding of cognition in dynamic task settings.

Alexandra Proca @aproca.bsky.social · Jun 20

Finally, although many of the results we present are based on the SVD, we also derive a form based on an eigendecomposition, which allows for rotational dynamics and to which our framework naturally extends. We use this to study learning in terms of polar coordinates in the complex plane.
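
A minimal numpy illustration of why an eigendecomposition captures rotational dynamics that the SVD alone does not. This is not code from the paper. A real 2×2 recurrent matrix that scales by r and rotates by θ has complex eigenvalues r·e^{±iθ}, so applying it k times scales and rotates each eigenmode by r^k·e^{±ikθ}. The values of r, θ and k are arbitrary assumptions.

```python
import numpy as np

r, theta = 0.9, np.pi / 6  # assumed per-step decay (radius) and rotation angle
# Scaled rotation matrix: a real recurrent weight with rotational dynamics
W = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

eigvals, eigvecs = np.linalg.eig(W)
# Polar coordinates of each eigenvalue in the complex plane
moduli = np.abs(eigvals)
angles = np.angle(eigvals)
print("moduli:", moduli)
print("angles:", angles)

# k recurrent steps: each eigenmode is scaled by r**k and rotated by k*theta
k = 5
Wk = np.linalg.matrix_power(W, k)
Wk_from_eig = (eigvecs @ np.diag(eigvals**k) @ np.linalg.inv(eigvecs)).real
print(np.allclose(Wk, Wk_from_eig))
```

Tracking learning through (modulus, angle) pairs separates how strongly a mode persists over time from how fast it rotates.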

Alexandra Proca @aproca.bsky.social · Jun 20

To study how recurrence might impact feature learning, we derive the NTK for finite-width LRNNs and evaluate its movement during training. We find that recurrence appears to facilitate kernel movement across many settings, suggesting a bias towards rich learning.
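
A generic empirical-NTK recipe for a small linear RNN, sketched under stated assumptions. This is not the paper's derivation.

- **Setup:** the network sizes, initialization, toy target and step-size rule are all hypothetical choices.
- **Jacobian:** it is taken by central finite differences instead of autodiff, to keep the example numpy-only.
- **Kernel movement:** it is measured as the relative Frobenius-norm change between the initial and final kernels. This is a common choice and may differ from the paper's metric.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, d_in, n = 8, 4, 2, 8  # sequences, timesteps, input dim, hidden units (assumed)

shapes = [(n, d_in), (n, n), (n,)]  # W_x, W_h, w_y
sizes = [int(np.prod(s)) for s in shapes]

def unpack(theta):
    """Split a flat parameter vector into W_x, W_h, w_y."""
    out, i = [], 0
    for s, k in zip(shapes, sizes):
        out.append(theta[i:i + k].reshape(s))
        i += k
    return out

def forward(theta, X):
    """Linear RNN: h_t = W_h h_{t-1} + W_x x_t, scalar readout y = w_y . h_T."""
    W_x, W_h, w_y = unpack(theta)
    h = np.zeros((X.shape[0], n))
    for t in range(X.shape[1]):
        h = h @ W_h.T + X[:, t] @ W_x.T
    return h @ w_y

def jacobian(theta, X, eps=1e-6):
    """Central-difference Jacobian of the outputs w.r.t. all parameters, shape (N, P)."""
    J = np.zeros((X.shape[0], theta.size))
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        J[:, i] = (forward(theta + e, X) - forward(theta - e, X)) / (2 * eps)
    return J

def ntk(theta, X):
    """Empirical NTK over the training sequences: K = J J^T."""
    J = jacobian(theta, X)
    return J @ J.T

X = rng.standard_normal((N, T, d_in))
y = X[:, 0] @ rng.standard_normal(d_in) + X[:, -1] @ rng.standard_normal(d_in)  # toy target

theta = np.concatenate([
    rng.standard_normal(n * d_in) / np.sqrt(d_in),
    rng.standard_normal(n * n) / np.sqrt(n),
    rng.standard_normal(n) / np.sqrt(n),
])

K_init = ntk(theta, X)
loss_init = 0.5 * np.mean((forward(theta, X) - y) ** 2)

for step in range(200):
    J = jacobian(theta, X)
    resid = forward(theta, X) - y
    # Step size from the largest Gauss-Newton curvature, to keep gradient descent stable
    lr = 0.5 * N / np.linalg.eigvalsh(J @ J.T).max()
    theta = theta - lr * (J.T @ resid) / N  # gradient of 0.5 * mean(resid**2)

K_final = ntk(theta, X)
loss_final = 0.5 * np.mean((forward(theta, X) - y) ** 2)

# Relative kernel movement: near 0 means lazy (frozen kernel), larger means richer learning
movement = np.linalg.norm(K_final - K_init) / np.linalg.norm(K_init)
print(f"loss {loss_init:.3f} -> {loss_final:.3f}, relative NTK movement {movement:.3f}")
```

Comparing `movement` with and without the recurrent term (for example with `W_h` frozen at zero) is one way to probe whether recurrence biases training toward richer learning.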

Alexandra Proca @aproca.bsky.social · Jun 20

Motivated by this, we study task dynamics without zero-loss solutions and find that there exists a tradeoff between recurrent and feedforward computations that is characterized by a phase transition and leads to low-rank connectivity.

Alexandra Proca @aproca.bsky.social · Jun 20

By analyzing the energy function, we identify an effective regularization term that incentivizes small weights, especially when task dynamics are not perfectly learnable.

Alexandra Proca @aproca.bsky.social · Jun 20

Additionally, these results predict behavior in networks performing integration tasks, where we relax our theoretical assumptions.
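
A small numpy sketch of what "task dynamics" look like for an integration task. It assumes whitened Gaussian inputs and a target equal to the running sum of all inputs, read out at the last timestep; these are illustrative choices, not the paper's exact setup. Because E[y x_t^T] = I for every t, the input-output correlation singular values are flat across time.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, d = 20000, 5, 3  # samples, timesteps, input/output dim (assumed)

X = rng.standard_normal((N, T, d))   # whitened inputs x_1..x_T
Y = X.sum(axis=1)                     # integration target: y = x_1 + ... + x_T

# Input-output correlation for each timestep, Sigma_t = E[y x_t^T], and its singular values
sv_per_t = []
for t in range(T):
    Sigma_t = Y.T @ X[:, t] / N
    sv_per_t.append(np.linalg.svd(Sigma_t, compute_uv=False))
sv_per_t = np.array(sv_per_t)         # shape (T, d)
print(np.round(sv_per_t, 2))
```

Up to sampling noise, every singular value is close to 1 at every timestep.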

Alexandra Proca @aproca.bsky.social · Jun 20

Next, we show that task dynamics determine an RNN’s ability to extrapolate to other sequence lengths and its hidden-layer stability, even if there exists a perfect zero-loss solution.
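
A toy illustration of how zero-loss solutions can still extrapolate very differently. It assumes a simplified per-mode model (not necessarily the paper's parameterization):

- each mode has a scalar recurrent gain a and an input weight b;
- its response to an input at step t is proportional to a^{T−t};
- the target singular values are geometric, s_t = λ^{T−t}.

Both tasks are then fit exactly with a = λ. Only λ < 1 gives a hidden state that stays bounded when the sequence is lengthened.

```python
import numpy as np

def fitted_gain(s):
    """Per-mode recurrent gain implied by targets s_t proportional to a**(T-t) (toy assumption)."""
    return s[:-1] / s[1:]  # consecutive ratios; all equal to a when exactly fittable

def hidden_trajectory(a, b, T_eval):
    """Scalar hidden state h_t = a*h_{t-1} + b*x_t with constant input x_t = 1."""
    h, traj = 0.0, []
    for _ in range(T_eval):
        h = a * h + b
        traj.append(h)
    return np.array(traj)

T = 5
results = {}
for lam in (0.8, 1.2):  # assumed per-step decay or growth of the task's singular values
    s = lam ** (T - np.arange(1, T + 1))  # s_t for t = 1..T; the last input has s_T = 1
    a = fitted_gain(s)[0]
    results[lam] = hidden_trajectory(a, 1.0, T_eval=50)  # extrapolate to a 10x longer sequence
    print(f"lambda={lam}: h at T={T}: {results[lam][T - 1]:.2f}, h at T=50: {results[lam][-1]:.1f}")
```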

Alexandra Proca @aproca.bsky.social · Jun 20

We find that learning speed depends on both the scale of the singular values (SVs) and their temporal ordering: SVs occurring later in the trajectory have a greater impact on learning speed.
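
One plausible intuition for the temporal-ordering effect, sketched with an assumed toy per-mode model. Here the network's response to the input at step t is c·a^{T−t}·b, and the loss is 0.5·Σ_t (s_t − c·a^{T−t}·b)². At a small initialization with |a| < 1, each timestep's contribution to the gradient is weighted by a^{T−t}. The latest timesteps therefore dominate even when all SVs have equal scale. The initialization values are hypothetical.

```python
import numpy as np

T = 6
s = np.ones(T)            # equal-scale singular values at every timestep (assumed)
a, b, c = 0.3, 0.1, 0.1   # assumed small initialization of one mode

k = T - np.arange(1, T + 1)         # k = T - t for t = 1..T
resid = s - c * a**k * b            # per-timestep residual s_t - c * a**(T-t) * b
# Per-timestep contributions to -dL/dc for L = 0.5 * sum_t resid_t**2
grad_c_terms = resid * a**k * b
print(np.round(grad_c_terms / grad_c_terms.max(), 4))  # normalized; t = T is largest
```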

Alexandra Proca @aproca.bsky.social · Jun 20

Using this form, we separately derive solutions for the learning dynamics of the input-output modes and local approximations for the recurrent modes, and identify differences between the learning dynamics of recurrent and feedforward networks.

Alexandra Proca @aproca.bsky.social · Jun 20

We derive a form where the task dynamics are fully specified by the data correlation singular values (or eigenvalues) across time (t=1:T), and learning is characterized by a set of gradient flow equations and energy function that are decoupled across different dimensions.
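
The decoupling can be sketched with a toy energy that is assumed here and is not the paper's exact expression. Each mode i has a scalar recurrent gain a_i, input weight b_i and output weight c_i, and its response to an input at time t is c_i·a_i^{T−t}·b_i. The total energy is a sum of per-mode terms, so the gradient flow (integrated here with a small Euler step) for one mode only sees that mode's singular values s_{i,t}. The singular values and step size below are hypothetical.

```python
import numpy as np

def energy(a, b, c, s):
    """Toy per-mode energy: 0.5 * sum_t (s_t - c * a**(T-t) * b)**2 for t = 1..T."""
    k = len(s) - np.arange(1, len(s) + 1)
    return 0.5 * np.sum((s - c * a**k * b) ** 2)

def grad(a, b, c, s):
    """Analytic gradient of the toy energy w.r.t. (a, b, c)."""
    k = len(s) - np.arange(1, len(s) + 1)
    r = s - c * a**k * b
    da = -np.sum(r * c * b * k * a**np.maximum(k - 1, 0))  # d(a**k)/da = k * a**(k-1)
    db = -np.sum(r * c * a**k)
    dc = -np.sum(r * b * a**k)
    return np.array([da, db, dc])

# Assumed singular values s_{i,t} for 2 modes over T = 4 timesteps
S = np.array([[0.5, 0.7, 0.9, 1.0],
              [0.2, 0.2, 0.2, 0.2]])
params = np.full((2, 3), 0.1)  # (a, b, c) per mode, small initialization

dt = 0.05  # Euler step for the gradient flow
E0 = [energy(*p, s) for p, s in zip(params, S)]
for _ in range(4000):
    # Each mode is updated using only its own singular values: the flow is decoupled
    for i in range(len(S)):
        params[i] -= dt * grad(*params[i], S[i])
E1 = [energy(*p, s) for p, s in zip(params, S)]
print("energy per mode:", np.round(E0, 4), "->", np.round(E1, 4))
```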

Alexandra Proca @aproca.bsky.social · Jun 20

We study an RNN that receives an input at each timestep and produces a final output at the last timestep (we generalize to the autoregressive case later). For each input at time t and the output, we can construct correlation matrices and compute their SVD (or eigendecomposition).
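
A minimal numpy sketch of the per-timestep correlation construction. The teacher below is hypothetical: the final output is a fixed linear function of every input, with earlier inputs assumed to matter less. The dimensions and sample count are arbitrary assumptions. For each timestep t the code forms the input correlation E[x_t x_t^T] and the input-output correlation E[y x_t^T], then takes the SVD of the latter. The resulting singular values across t are what the thread calls the task dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, d_in, d_out = 5000, 4, 3, 3  # samples, timesteps, input and output dims (assumed)

# Hypothetical teacher: one linear map per timestep, scaled down for earlier inputs
teacher = [rng.standard_normal((d_out, d_in)) * 0.5 ** (T - 1 - t) for t in range(T)]

X = rng.standard_normal((N, T, d_in))                # inputs x_1..x_T
Y = sum(X[:, t] @ teacher[t].T for t in range(T))    # single output at the last timestep

corr = []
for t in range(T):
    Sigma_xx = X[:, t].T @ X[:, t] / N   # input correlation at step t
    Sigma_yx = Y.T @ X[:, t] / N         # input-output correlation at step t
    U, S, Vt = np.linalg.svd(Sigma_yx)
    corr.append((Sigma_xx, Sigma_yx, U, S, Vt))
    print(f"t={t + 1}: singular values {np.round(S, 2)}")
```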

Alexandra Proca @aproca.bsky.social · Jun 20

RNNs are popular in both ML and neuroscience to learn tasks with temporal dependencies and model neural dynamics. However, despite substantial work on RNNs, it's unknown how their underlying functional structures emerge from training on temporally-structured tasks.

Alexandra Proca @aproca.bsky.social · Jun 20

How do task dynamics impact learning in networks with internal dynamics?

Excited to share our ICML Oral paper on learning dynamics in linear RNNs!
with @clementinedomine.bsky.social @mpshanahan.bsky.social and Pedro Mediano

openreview.net/forum?id=KGO...

Learning dynamics in linear recurrent neural networks

Recurrent neural networks (RNNs) are powerful models used widely in both machine learning and neuroscience to learn tasks with temporal dependencies and to model neural dynamics. However, despite...

openreview.net

Reposted by Alexandra Proca

Clementine Domine 🍊 @CCN @clementinedomine.bsky.social · Apr 4

🚀 More exciting news! Our paper "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks" has been accepted at ICLR 2025!

arxiv.org/abs/2409.14623

A thread on how relative weight initialization shapes learning dynamics in deep networks. 🧵 (1/9)

Alexandra Proca @aproca.bsky.social · Dec 4

Had a really fun time collaborating on this project with a great team. I’ll be at NeurIPS next week to present it; come by and check out our paper for more!

Kai Sandbrink @ackaisa.bsky.social · Dec 3

Thrilled to share our NeurIPS Spotlight paper with Jan Bauer*, @aproca.bsky.social*, @saxelab.bsky.social, @summerfieldlab.bsky.social, Ali Hummos*! openreview.net/pdf?id=AbTpJ...

We study how task abstractions emerge in gated linear networks and how they support cognitive flexibility.

Alexandra Proca @aproca.bsky.social · Nov 22

🙋‍♀️
