Paul Barba
@happily-human.bsky.social
Machine Learning, AI, and Natural Language Processing Expert. Soon-to-be-published author. Curious human.
Research proposal I lack time for - Pull out the old Word2Vec King-Queen trick. Subtract embeddings for 2-1, 3-2, 4-3, 5-4, ...: is there a constant subspace? How about 2-1, 4-2, 8-3, 16-4, ...? My guess is you'd find them for various common f(x)'s.
December 4, 2024 at 3:37 PM
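A rough sketch of what that probe could look like, assuming a pretrained GloVe model from gensim-data whose vocabulary contains the number tokens; the model choice and the ranges below are arbitrary:

```python
# Rough probe: do successor differences of number-token embeddings share a
# direction? Assumes a pretrained GloVe model from gensim-data and that the
# number strings below are in its vocabulary.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-300")  # any pretrained KeyedVectors works

def diff(a, b):
    return model[str(a)] - model[str(b)]

succ = np.stack([diff(n + 1, n) for n in range(1, 10)])   # 2-1, 3-2, 4-3, ...
dbl = np.stack([diff(2 ** k, k) for k in range(1, 6)])    # 2-1, 4-2, 8-3, ...

def top_direction_share(diffs):
    """Fraction of total energy captured by the leading singular direction."""
    s = np.linalg.svd(diffs, compute_uv=False)
    return (s[0] ** 2) / (s ** 2).sum()

def mean_pairwise_cosine(diffs):
    unit = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    sims = unit @ unit.T
    return sims[np.triu_indices_from(sims, k=1)].mean()

for name, d in [("successor", succ), ("doubling", dbl)]:
    print(f"{name}: top-direction share {top_direction_share(d):.2f}, "
          f"mean pairwise cosine {mean_pairwise_cosine(d):.2f}")
```

A near-constant subspace would show up as a large top-direction share and high pairwise cosines among the difference vectors.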
Similarly, the model might occasionally need to guess somebody's height in cm in some context, so having a subspace that expresses the CDF of that Gaussian would help. Perhaps different subspaces of a number token's embedding end up encoding various mathematical functions?
December 4, 2024 at 3:37 PM
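One way to poke at that hypothesis is a linear probe from number-token embeddings to the CDF values; a sketch below, where the 150-199 cm range and the mean/std are illustrative and the tokens are assumed to be in the vocabulary:

```python
# Rough probe: can a linear map from number-token embeddings recover the CDF of
# a height-like Gaussian? The mean/std and the 150-199 range are illustrative.
import numpy as np
import gensim.downloader as api
from scipy.stats import norm
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

model = api.load("glove-wiki-gigaword-300")

heights = np.arange(150, 200)                      # heights in cm
X = np.stack([model[str(h)] for h in heights])     # token embeddings
y = norm.cdf(heights, loc=170, scale=10)           # target: Gaussian CDF values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("held-out R^2:", probe.score(X_te, y_te))
```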
Hypothesis - this reminds me of kernel methods. With few examples to work from, it feels like the MLP has to be "selecting" from already-existing numerical functions in the embedding. Since LLMs can "add", it seems like they would want some subspace that scales linearly with a token's numeric value.
December 4, 2024 at 3:37 PM
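A quick check for that linearly scaling subspace, sketched under the same GloVe assumption as above (the 1-100 range is an arbitrary choice): does the leading principal component of the number-token embeddings track the numeric value?

```python
# Rough probe for a linearly scaling subspace in number-token embeddings.
import numpy as np
import gensim.downloader as api
from scipy.stats import pearsonr

model = api.load("glove-wiki-gigaword-300")

values = np.arange(1, 101)
X = np.stack([model[str(v)] for v in values])
X = X - X.mean(axis=0)

_, _, vt = np.linalg.svd(X, full_matrices=False)   # leading right-singular vector
projection = X @ vt[0]

r, _ = pearsonr(projection, values)                # sign of r is arbitrary
print("correlation of top-PC projection with numeric value:", r)
```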
First, that's wild. It really feels like a regression algorithm run on the actual numbers should outperform one run on embeddings from a next-token prediction model trained on random web content.

Looking closer, it seems to work better when the number of degrees of freedom is high compared to the number of training instances.
December 4, 2024 at 3:37 PM
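A rough head-to-head along those lines, again assuming the GloVe embeddings from the snippets above; the target function and the tiny training set are illustrative choices, not the setup from the paper being discussed:

```python
# Ridge regression on the raw number vs. on its token embedding, with
# deliberately few training points (high embedding dimension vs. sample size).
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

model = api.load("glove-wiki-gigaword-300")
rng = np.random.default_rng(0)

xs = np.arange(1, 101)
y = np.sqrt(xs)                                     # an arbitrary smooth target

train = rng.choice(len(xs), size=8, replace=False)  # few training examples
test = np.setdiff1d(np.arange(len(xs)), train)

raw = xs.reshape(-1, 1).astype(float)
emb = np.stack([model[str(x)] for x in xs])

for name, feats in [("raw scalar", raw), ("embedding", emb)]:
    fit = Ridge(alpha=1.0).fit(feats[train], y[train])
    print(name, "held-out R^2:", r2_score(y[test], fit.predict(feats[test])))
```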
I look at it as 50% more papers I get to read! That said, I'm out in industry, where reading time feels a little too much like a luxury.
December 3, 2024 at 1:07 PM
To piggyback on this discussion of readable papers, I don't know when the field settled on "write everything as if it were a 10-page conference proceeding, and then shove everything else that ought to be there into 50 pages of appendices", but I love it.
December 3, 2024 at 2:02 AM
That was a great moment in the field. Last time I can think of a deep theoretical underpinning just showing up, instead of "I don't know, it just works ok???" Not that there aren't attempts, but never that definitive.
November 27, 2024 at 7:55 PM
Great prompt! For underrated NLP papers I'll go with arxiv.org/abs/1911.11423

Sitting somewhere between proper research and a blog article, it questions some fundamental directions behind the dominance of multi-head attention, and it's readable. I wish more papers were like that one somehow.
Single Headed Attention RNN: Stop Thinking With Your Head (arxiv.org)
November 27, 2024 at 5:28 PM
If you train the annotating model on the annotations, you can call it bootstrapping (although that's an annoyingly overloaded term in the stats community).
November 26, 2024 at 2:31 PM
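A minimal sketch of that bootstrapping (self-training) loop; the classifier, confidence threshold, and number of rounds here are illustrative choices:

```python
# Self-training loop: the annotating model is retrained on its own confident
# annotations. All hyperparameters below are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9, rounds=3):
    X, y = X_labeled.copy(), y_labeled.copy()
    pool = X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Promote confident pseudo-labels into the training set and retrain.
        X = np.vstack([X, pool[confident]])
        y = np.concatenate([y, clf.classes_[proba[confident].argmax(axis=1)]])
        pool = pool[~confident]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf
```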