Kimon Fountoulakis
@kfountou.bsky.social
1.1K followers 88 following 140 posts
Associate Professor at CS UWaterloo Machine Learning Lab: opallab.ca
kfountou.bsky.social
2) Can a neural network discover the instructions for performing multiplication on its own?

The answer to the first question is yes, with high probability and up to some arbitrary, predetermined precision (see the quoted post).
kfountou.bsky.social
Learning to execute arithmetic exactly, with high probability, can be quite expensive. In the plot, 'ensemble complexity' refers to the number of independently trained models required to achieve exact learning with high probability. ell is the number of bits per number in the input.
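One plausible way to read "ensemble complexity" (my own gloss, assuming an ensemble succeeds as soon as any single member learns the task exactly): if each independently trained model is exact with probability at least p, then the number of models k needed to drive the overall failure probability below δ satisfies

```latex
% Sketch under the stated assumption: k independently trained models, each exact
% with probability at least p; at least one is exact with probability at least 1 - delta.
\[
1 - (1 - p)^k \;\ge\; 1 - \delta
\quad\Longleftrightarrow\quad
k \;\ge\; \frac{\ln(1/\delta)}{\ln\!\bigl(1/(1 - p)\bigr)} .
\]
```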
kfountou.bsky.social
New paper: Learning to Add, Multiply, and Execute Algorithmic Instructions Exactly with Neural Networks
kfountou.bsky.social
I never understood the point of trams. They're slow and expensive. I've been in two cities, Edinburgh and Athens, while they were building them, and in both cases the projects were born out of corruption. In Edinburgh especially, it was a disaster. en.wikipedia.org/wiki/Edinbur...
Edinburgh Tram Inquiry - Wikipedia
en.wikipedia.org
kfountou.bsky.social
Thanks, the connection to formal languages is quite interesting. I have a section in the repo on formal languages, but it's small, mainly because it's not a topic I'm familiar with. I will add them!
kfountou.bsky.social
Update: 14 empirical papers added!
kfountou.bsky.social
Computational Capability and Efficiency of Neural Networks: A Repository of Papers

I compiled a list of theoretical papers related to the computational capabilities of Transformers, recurrent networks, feedforward networks, and graph neural networks.

Link: github.com/opallab/neur...
kfountou.bsky.social
The SIAM Conference on Optimization 2026 will be in Edinburgh! I don’t really work on optimization anymore (at least not directly), but it’s cool to see a major optimization conference taking place where I did my PhD.
kfountou.bsky.social
NeurIPS currently has 21,390 submissions. Last year's final count was 15,671.

Observation made by my student George Giapitzakis.
Reposted by Kimon Fountoulakis
jasondeanlee.bsky.social
Our new work on scaling laws covers compute, model size, and number of samples. It builds on an extremely fine-grained analysis of online SGD, developed over the last 8 years of understanding SGD on simple toy models (tensors, single-index models, multi-index models).
eshaannichani.bsky.social
Excited to announce a new paper with Yunwei Ren, Denny Wu,
@jasondeanlee.bsky.social!

We prove a neural scaling law in the SGD learning of extensive-width two-layer neural networks.

arxiv.org/abs/2504.19983

🧵below (1/10)
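For readers less familiar with the term: a neural scaling law describes how the loss falls off as a power law in resources such as model size and sample count. A generic illustrative form (my own gloss, not the specific statement proved in this paper):

```latex
% Generic illustrative form of a neural scaling law (not the paper's theorem):
% loss decays as power laws in the number of parameters N and the number of samples D,
% down to an irreducible error floor L_infinity.
\[
L(N, D) \;\approx\; L_\infty \;+\; \frac{a}{N^{\alpha}} \;+\; \frac{b}{D^{\beta}},
\qquad a,\, b,\, \alpha,\, \beta > 0 .
\]
```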
kfountou.bsky.social
Hey, I definitely predicted this correctly.
kfountou.bsky.social
ChatGPT lets me expand my searches to topics that I can only roughly describe, or even just illustrate with a figure, when I don’t know the exact keywords to use in a Google search.
kfountou.bsky.social
Positional Attention has been accepted at ICML 2025! Thanks to all co-authors for the hard work (64 pages). If you’d like to read the paper, check the quoted post.
kfountou.bsky.social
NeurIPS 2026 in the Cyclades. Just saying.
kfountou.bsky.social
Wait, isn't that America?
kfountou.bsky.social
I enjoyed reading the paper "A Generalized Neural Tangent Kernel for Surrogate Gradient Learning" (Spotlight, NeurIPS 2024).

They extend the NTK framework to activation functions that have finitely many jumps.
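For context: surrogate gradient learning (common in spiking neural networks) uses a non-differentiable activation such as a Heaviside step in the forward pass and substitutes a smooth surrogate derivative in the backward pass. A minimal PyTorch-style sketch of the idea, where the sigmoid surrogate and the SCALE constant are my own illustrative choices, not anything taken from the paper:

```python
import torch

SCALE = 5.0  # surrogate sharpness (illustrative choice)

class StepWithSurrogate(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()                      # exact, non-differentiable step

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(SCALE * x)
        return grad_output * SCALE * s * (1 - s)    # derivative of a scaled sigmoid as surrogate

# Usage: drop the surrogate step into an otherwise standard layer.
x = torch.randn(8, 16)
w = torch.randn(16, 4, requires_grad=True)
out = StepWithSurrogate.apply(x @ w).sum()
out.backward()                                      # gradients flow through the surrogate
print(w.grad.shape)                                 # torch.Size([16, 4])
```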