Georg Bökman
@bokmangeorg.bsky.social
920 followers 410 following 220 posts
Geometric deep learning + Computer vision
bokmangeorg.bsky.social
Using Fourier theory of finite groups, we can block-diagonalize these group-circulant matrices. Hence, incorporating symmetries (group equivariance) in neural networks can make the networks faster. We used this to obtain 𝑞𝑢𝑖𝑐𝑘𝑒𝑟 𝑉𝑖𝑇𝑠. arxiv.org/abs/2505.15441
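A minimal NumPy sketch of the block-diagonalization claim, using the cyclic group C_n as the simplest case (there the blocks are 1×1 and the group Fourier transform is the ordinary DFT); this is my illustration of the idea, not code from the paper.

```python
import numpy as np

# Illustrative case: for the cyclic group C_n, a group-circulant matrix
# C[i, j] = c[(i - j) % n] is fully diagonalized by the ordinary DFT
# (1x1 "blocks"), so applying it costs O(n log n) via the FFT, not O(n^2).
rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

F = np.fft.fft(np.eye(n))              # DFT matrix (symmetric)
D = F @ C @ np.linalg.inv(F)           # conjugate by the Fourier transform...
assert np.allclose(D, np.diag(np.fft.fft(c)))   # ...and C becomes diagonal

# Fast multiply: pointwise in the Fourier domain.
x = rng.standard_normal(n)
y_fast = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
assert np.allclose(C @ x, y_fast)
```
For non-abelian groups like the dihedral group below, the Fourier transform gives small blocks (one per irreducible representation) rather than a full diagonal, but the speed argument is the same.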
bokmangeorg.bsky.social
Mapping such 8-tuples to new 8-tuples that permute in the same way under transformations of the input is done by convolutions over the transformation group, or (equivalently) multiplication with group-circulant matrices.
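A toy sketch of this equivalence for the dihedral group of order 8, written as pairs (r, f) with a composition law I'm assuming here (my construction, not from the thread): a group-circulant matrix C[g, h] = c(g·h⁻¹) commutes with translating the input over the group, which is exactly "new 8-tuples that permute in the same way".

```python
import numpy as np
from itertools import product

# Toy model of the dihedral group of order 8 as pairs (r, f): r = rotation,
# f = flip, with the (assumed) semidirect-product composition law.
G = list(product(range(4), range(2)))          # the 8 group elements
idx = {g: i for i, g in enumerate(G)}

def mul(a, b):
    return ((a[0] + (-1) ** a[1] * b[0]) % 4, (a[1] + b[1]) % 2)

def inv(a):
    return (a[0], 1) if a[1] else ((-a[0]) % 4, 0)

# Group-circulant matrix: C[g, h] = c(g * h^{-1}) for a filter c on G.
rng = np.random.default_rng(0)
c = rng.standard_normal(len(G))
C = np.array([[c[idx[mul(g, inv(h))]] for h in G] for g in G])

# Translation by a permutes the 8-tuple: (P_a x)[g] = x[g * a].
# Equivariance = C commutes with every such permutation matrix.
for a in G:
    P = np.array([[1.0 if h == mul(g, a) else 0.0 for h in G] for g in G])
    assert np.allclose(C @ P, P @ C)
```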
bokmangeorg.bsky.social
Images (or image patches) are secretly multi-channel signals over groups. Below, the dihedral group of order 8: reflecting/rotating the image permutes the values in the magenta vector. So we can reshape the image into 8-tuples that all permute according to the dihedral group (pixels on the diagonals are an edge case, with smaller orbits).
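A quick numerical check of the claim (my construction, assuming a square image and a generic off-diagonal pixel): the pixel's orbit under the dihedral group has 8 elements, and rotating or reflecting the image only permutes those 8 values among themselves.

```python
import numpy as np

# Generic off-diagonal pixel of an n x n image: its dihedral orbit has
# 8 elements, and transforming the image permutes the 8 values.
rng = np.random.default_rng(0)
n = 6
img = rng.standard_normal((n, n))

def orbit_values(img, i, j):
    """Sorted values at the 8 dihedral images of pixel (i, j)."""
    n = img.shape[0]
    coords = {(i, j), (j, n - 1 - i), (n - 1 - i, n - 1 - j), (n - 1 - j, i),
              (i, n - 1 - j), (j, i), (n - 1 - i, j), (n - 1 - j, n - 1 - i)}
    return sorted(img[r, c] for r, c in coords)

i, j = 0, 2                     # generic pixel: not on either diagonal
orig = orbit_values(img, i, j)
assert len(orig) == 8                                # full orbit
assert orig == orbit_values(np.rot90(img), i, j)     # rotation permutes it
assert orig == orbit_values(np.fliplr(img), i, j)    # so does reflection
```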
bokmangeorg.bsky.social
Had a skim of Kostelec-Rockmore. There are some interesting pointers at the end suggesting that fast implementations of asymptotically fast FFTs are non-trivial. 🙃 Also, there seems to be a version that uses three 1D FFTs, but it is not asymptotically as fast as possible.
bokmangeorg.bsky.social
At least some FFTs for SO(3) work by separation of variables and a sequence of 1D FFTs, right? So is the butterfly decomposition "straightforward" for them? Regarding small finite groups, the entire FFT might be unnecessary and can simply be a dense Fourier transform matrix.
bokmangeorg.bsky.social
Do you have good examples from other areas of taking the hardware as the prior?
bokmangeorg.bsky.social
Also quite generous to cite the paper as a generic reference for the term "FLOPs" 😅
bokmangeorg.bsky.social
Nice LLM generated citation found by @davnords.bsky.social. I wonder who M. Lindberg and A. Andersson are...
bokmangeorg.bsky.social
Got to honor the traditions. "In Sweden, the west coast city of Gothenburg is known for its puns."
Reposted by Georg Bökman
aaroth.bsky.social
The opportunities and risks of the entry of LLMs into mathematical research in one screenshot. I think it is clear that LLMs will make trained researchers more effective. But they will also lead to a flood of bad/wrong papers, and I'm not sure we have the tools to deal with this.
bokmangeorg.bsky.social
Nice perspective, you look like a giant! And congrats!
bokmangeorg.bsky.social
If you were working at meta you could have called the paper "Mental rotation capabilities emerge at scale with DINOv3" :)
bokmangeorg.bsky.social
I see, yeah plots of proportions over the layers would be cool!
bokmangeorg.bsky.social
Also, I think it is possible to argue for equivariance at scale from a purely computational perspective. bsky.app/profile/bokm...
bokmangeorg.bsky.social
A simple argument for equivariance at scale: 1) At scale, token-wise linear layers dominate compute. 2) Token-wise linear equivariant layers implemented in the Fourier domain are block-diagonal and hence fast.
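A back-of-the-envelope version of point 2), with made-up numbers (d and k are hypothetical, not from the post): k equal Fourier blocks cut the per-token multiply count by a factor of k.

```python
import numpy as np

# Hypothetical sizes: a dense token-wise linear layer on d channels costs
# ~d^2 multiplies per token; block-diagonal with k equal blocks of size
# d/k costs k * (d/k)^2 = d^2 / k, a k-fold saving.
d, k = 64, 8
assert k * (d // k) ** 2 == d * d // k

# The block-diagonal layer applied as a batched matmul (one small matrix
# per Fourier block).
rng = np.random.default_rng(0)
blocks = rng.standard_normal((k, d // k, d // k))
x = rng.standard_normal(d)
y = np.einsum("kij,kj->ki", blocks, x.reshape(k, d // k)).reshape(d)

# Sanity check against the explicit block-diagonal matrix.
W = np.zeros((d, d))
for b in range(k):
    s = b * (d // k)
    W[s:s + d // k, s:s + d // k] = blocks[b]
assert np.allclose(W @ x, y)
```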
bokmangeorg.bsky.social
I like the point made in this paragraph. It might follow that it's a good idea to build equivariant architectures that are as similar to proven non-equivariant architectures as possible.
bokmangeorg.bsky.social
I think the difficult part here is to tell whether the object has been rotated or both rotated and mirrored. I.e. the model needs to be sensitive to mirroring. Mirroring is often part of data aug, but the model can (should i.m.o.) still be internally sensitive to mirroring.
bokmangeorg.bsky.social
Interesting how basically one single layer in one single model out of all these can solve the "Shepard-Metzler Free" case.
bokmangeorg.bsky.social
Congrats, very interesting work! When training with only 8 filters, how did you choose how many of each to use in each layer? Did you just use equally many of each?
bokmangeorg.bsky.social
Thanks! Also I'm currently doing a postdoc in Amsterdam and would love to visit Delft ;)
bokmangeorg.bsky.social
Perhaps true, depending on the problem. But in general the framing could be "can I get away with equivariance for my problem?" rather than "do I need equivariance for my problem?".
bokmangeorg.bsky.social
However, these potential issues are "skill issues" in my opinion...
bokmangeorg.bsky.social
We've been exploring this for image data recently. Exhibit A arxiv.org/abs/2502.05169, exhibit B arxiv.org/abs/2505.15441 .