k(·, u)
@k-utsunomiya.bsky.social
Univ. of Tsukuba, M.S. in CS ('27) / Researching learning dynamics & feature learning in DNNs.
Shokichi Takakura, Taiji Suzuki.
Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective.
arxiv.org/abs/2403.14917
Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective
In this paper, we study the feature learning ability of two-layer neural networks in the mean-field regime through the lens of kernel methods. To focus on the dynamics of the kernel induced by the...
arxiv.org
March 26, 2024 at 1:18 PM
Mathematics of Machine Learning (Taiji Suzuki, FD workshop "Artificial Intelligence and Mathematics")

youtu.be/qXjDEAdjw0s?...
March 20, 2024 at 9:08 AM
Understanding Neural Networks via Integral Representations (Sho Sonoda, FD workshop "Artificial Intelligence and Mathematics")

youtu.be/z8TXubu3Uko?...
March 20, 2024 at 9:07 AM
What a stellar lineup...

DL2024 (Tokyo Deep Learning Workshop)
sites.google.com/view/dl2024/
March 17, 2024 at 2:23 PM
Reposted by k(·, u)
Daniel Beaglehole, Ioannis Mitliagkas, Atish Agarwala
Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks
https://arxiv.org/abs/2402.05271
February 9, 2024 at 10:41 AM
Reposted by k(·, u)
Recent work.

A paper on using the Koopman operator to estimate the dynamics of hypergradients in gradient-based hyperparameter optimization
arxiv.org/abs/2402.02741

A paper that characterizes the relationship between the localization/uniformity of self-attention and the eigenvalues of the weight matrices, and makes it controllable via regularization
arxiv.org/abs/2402.02098
Glocal Hypergradient Estimation with Koopman Operator
Gradient-based hyperparameter optimization methods update hyperparameters using hypergradients, gradients of a meta criterion with respect to hyperparameters. Previous research used two distinct update strategies: optimizing hyperparameters using global hypergradients obtained after completing model training or local hypergradients derived after every few model updates. While global hypergradients offer reliability, their computational cost is significant; conversely, local hypergradients provide speed but are often suboptimal. In this paper, we propose glocal hypergradient estimation, blending "global" quality with "local" efficiency. To this end, we use the Koopman operator theory to linearize the dynamics of hypergradients so that the global hypergradients can be efficiently approximated only by using a trajectory of local hypergradients. Consequently, we can optimize hyperparameters greedily using estimated global hypergradients, achieving both reliability and efficiency simultaneously. Through numerical experiments of hyperparameter optimization, including optimization of optimizers, we demonstrate the effectiveness of the glocal hypergradient estimation.
arxiv.org
February 9, 2024 at 11:41 AM
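To make the idea in the abstract concrete: the method linearizes the dynamics of local hypergradients with a Koopman-style operator and extrapolates that linear model to approximate the global hypergradient. Below is a minimal, hypothetical sketch of that flow (a least-squares / DMD-like fit of the linear dynamics, then rolling it forward); the function name, inputs, and the fixed extrapolation horizon are my assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch only: fit linear dynamics to a trajectory of local
# hypergradients and extrapolate toward a "global" hypergradient estimate.
import numpy as np

def estimate_global_hypergradient(local_hypergrads, horizon=1000):
    """local_hypergrads: (T, d) array of local hypergradients collected
    every few model updates. Returns an extrapolated (d,) estimate."""
    H = np.asarray(local_hypergrads, dtype=float)
    X, Y = H[:-1].T, H[1:].T          # snapshot pairs: Y ≈ A @ X
    A = Y @ np.linalg.pinv(X)         # least-squares (DMD-like) fit of the linear operator
    h = H[-1]
    for _ in range(horizon):          # roll the linear model forward
        h = A @ h                     # assumes the fitted dynamics are stable
    return h

# Hypothetical usage: update a hyperparameter greedily with the estimate.
# hyperparam -= hyper_lr * estimate_global_hypergradient(trajectory)
```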
A perfectly round owl
January 8, 2024 at 10:01 AM
Starting 2024 by typing TeX
December 31, 2023 at 3:19 PM
Reposted by k(·, u)
December 25, 2023 at 3:10 AM
Hello Bluesky.
December 16, 2023 at 11:51 PM