Lightnews — Scholar-powered news

Paul Barba

@happily-human.bsky.social

13 followers 15 following 15 posts

Machine Learning, AI and Natural Language Processing Expert. Soon to be published author. Curious human.


Paul Barba

@happily-human.bsky.social

Research proposal I lack time for: pull out the old Word2Vec king-queen trick. Subtract pairs of number embeddings (2-1, 3-2, 4-3, 5-4, ...). Is there a constant subspace? How about 2-1, 4-2, 8-3, 16-4, ...? My guess is you'd find one for various common functions f(x).

December 4, 2024 at 3:37 PM
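Here is a minimal sketch of how the proposed experiment might be scored. It assumes embeddings are available as a plain dict from token string to vector, for example exported from a Word2Vec model (the loading step is left out). The function name `offset_spread` and its metric are illustrative choices, not an established measure. For each pair it takes the difference vector emb[f(x)] - emb[x], then reports how far those differences scatter around their mean, relative to the mean's length. A value near 0 suggests a single constant offset, as in king - man + woman ≈ queen. The toy embedding is synthetic, with a linear "number line" coordinate plus tiny noise, so it shows the metric behaving as intended but says nothing about real models.

```python
import math
import random


def offset_spread(emb, pairs):
    """Average distance of each difference vector emb[fx] - emb[x] from the
    mean difference, divided by the length of that mean difference.

    Near 0: the pairs share one constant offset vector.
    Large:  no single offset explains the pairs.
    """
    diffs = [[b - a for a, b in zip(emb[x], emb[fx])] for x, fx in pairs]
    dim = len(diffs[0])
    mean = [sum(d[k] for d in diffs) / len(diffs) for k in range(dim)]
    mean_norm = math.sqrt(sum(m * m for m in mean))
    if mean_norm == 0.0:
        return math.inf  # differences cancel out: no consistent offset at all
    avg_dev = sum(
        math.sqrt(sum((d[k] - mean[k]) ** 2 for k in range(dim))) for d in diffs
    ) / len(diffs)
    return avg_dev / mean_norm


# Synthetic toy embedding (NOT a trained model): coordinate 0 grows linearly
# with the number; the other 7 coordinates are tiny random noise.
rng = random.Random(0)
toy = {
    str(n): [float(n)] + [rng.gauss(0.0, 0.001) for _ in range(7)]
    for n in range(1, 20)
}

successor = [(str(x), str(x + 1)) for x in range(1, 5)]  # 2-1, 3-2, 4-3, 5-4
doubling = [(str(x), str(2 ** x)) for x in range(1, 5)]  # 2-1, 4-2, 8-3, 16-4

print(f"successor spread: {offset_spread(toy, successor):.3f}")
print(f"doubling spread:  {offset_spread(toy, doubling):.3f}")
```

This only tests for a constant offset in the full embedding space. Looking for a constant *subspace*, as the post asks, would mean repeating the check on a projection of the vectors, for example onto the top principal components of the difference vectors. This sketch leaves that step out.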

Paul Barba

@happily-human.bsky.social

Similarly, it might occasionally need to guess somebody's height in cm from context, so having a subspace that expressed the CDF of the height Gaussian would help. Perhaps different subspaces of a number token's embedding end up encoding various mathematical functions?

December 4, 2024 at 3:37 PM
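To make the height example concrete, here is a sketch of the kind of feature such a subspace might compute: the Gaussian CDF of the number a token denotes. The mean and standard deviation (170 cm and 10 cm) are placeholder values chosen for illustration, not population statistics. `cdf_feature` is a hypothetical name, and nothing here claims a real model stores this coordinate.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma^2), computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cdf_feature(token, mu=170.0, sigma=10.0):
    """Hypothetical 'height percentile' coordinate for a numeric token.

    mu and sigma are illustrative placeholders, not measured height statistics.
    """
    return gaussian_cdf(float(token), mu, sigma)


for tok in ["150", "160", "170", "180", "190"]:
    print(tok, round(cdf_feature(tok), 3))
```

Unlike a linear coordinate, this one saturates at both ends. It therefore tracks "how unusual is this height" rather than raw magnitude.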

Paul Barba

@happily-human.bsky.social

Hypothesis: this reminds me of kernel methods. With few examples to work from, it feels like the MLP has to be "selecting" from numerical functions that already exist in the embedding. Since LLMs can "add", it seems like they would want some subspace to scale linearly with numeric tokens.

December 4, 2024 at 3:37 PM
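Here is a toy illustration of the "selecting" intuition, offered only as an analogy. Given a handful of (x, y) examples, it fits y ≈ a·φ(x) + b separately for each function φ in a fixed bank and keeps the best fit. The bank (`FEATURE_BANK`) is a hypothetical stand-in for numerical functions already present in an embedding. The fitting step is ordinary one-feature least squares, not a claim about how the MLP actually works.

```python
import math

# Hypothetical bank of "pre-existing" numeric functions of a token's value.
FEATURE_BANK = {
    "linear": lambda x: x,
    "square": lambda x: x * x,
    "log": lambda x: math.log(x),
    "exp2": lambda x: 2.0 ** x,
}


def fit_one_feature(xs, ys, phi):
    """Least-squares fit of y = a*phi(x) + b; returns (a, b, sum of squared errors)."""
    fs = [phi(x) for x in xs]
    n = len(xs)
    mf, my = sum(fs) / n, sum(ys) / n
    var = sum((f - mf) ** 2 for f in fs)
    a = 0.0 if var == 0.0 else sum((f - mf) * (y - my) for f, y in zip(fs, ys)) / var
    b = my - a * mf
    sse = sum((a * f + b - y) ** 2 for f, y in zip(fs, ys))
    return a, b, sse


def select_feature(xs, ys):
    """Return the name of the bank function that best explains the examples."""
    scores = {name: fit_one_feature(xs, ys, phi)[2] for name, phi in FEATURE_BANK.items()}
    return min(scores, key=scores.get)


# Four examples, as if from a few-shot prompt (values are made up).
xs = [1.0, 2.0, 3.0, 4.0]
print(select_feature(xs, [3.0 * x for x in xs]))         # targets follow y = 3x
print(select_feature(xs, [2.0 ** x + 1.0 for x in xs]))  # targets follow y = 2^x + 1
```

With only four examples, a free-form regression has little to go on. Choosing among a few ready-made functions is a much smaller search, which is the intuition the post draws from kernel methods.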

Paul Barba

@happily-human.bsky.social

First, that's wild. It really feels like a regression algorithm on actual numbers should outperform the embedding from a next-token prediction model trained on random web content.

Looking closer, it seems to work better when the degrees of freedom are high relative to the number of training instances.

December 4, 2024 at 3:37 PM

Paul Barba

@happily-human.bsky.social

I look at it as 50% more papers I get to read! That said, I'm out in industry, where reading time feels a little too much like a luxury.

December 3, 2024 at 1:07 PM

Paul Barba

@happily-human.bsky.social

To piggyback on this discussion of readable papers: I don't know when the field settled on "write everything as if it were a 10-page conference proceeding, and then shove everything else that ought to be there into 50 pages of appendices", but I love it.

December 3, 2024 at 2:02 AM

Paul Barba

@happily-human.bsky.social

That was a great moment in the field. It's the last time I can think of a deep theoretical underpinning just showing up, instead of "I don't know, it just works ok???" Not that there haven't been attempts, but none that definitive.

November 27, 2024 at 7:55 PM

Paul Barba

@happily-human.bsky.social

Great prompt! For underrated NLP papers, I'll go with arxiv.org/abs/1911.11423

It sits somewhere between proper research and a blog article, questions some fundamental directions given the dominance of multi-head attention, and is readable. I wish more papers were like that one somehow.

Single Headed Attention RNN: Stop Thinking With Your Head

The leading approaches in language modeling are all obsessed with TV shows of my youth - namely Transformers and Sesame Street. Transformers this, Transformers that, and over here a bonfire worth of G...


November 27, 2024 at 5:28 PM

Paul Barba

@happily-human.bsky.social

If you train the annotating model on its own annotations, you can call it bootstrapping (although that's an annoyingly overloaded term, given its meaning in the stats community).

November 26, 2024 at 2:31 PM
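Here is a minimal sketch of that loop, often called self-training, using a toy 1-D nearest-centroid classifier and a confidence margin. The names `self_train` and `min_margin`, the classifier, and the data are illustrative assumptions, not a specific published method. Each round the current model annotates the unlabeled pool, only confident annotations are kept, and the model is retrained on them.

```python
def centroids(labeled):
    """Mean value per label for a list of (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Nearest-centroid label, plus the gap between the two closest distances."""
    dists = sorted((abs(x - c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(seed, unlabeled, min_margin=1.0, max_rounds=10):
    """Annotate the pool with the current model, keep confident labels, retrain."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)  # "train" on the current labels
        confident, rest = [], []
        for x in pool:
            label, margin = predict(cents, x)  # annotate with the current model
            if margin >= min_margin:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new the model is sure about
        labeled.extend(confident)  # train on our own annotations
        pool = rest
    return centroids(labeled), labeled


seed = [(0.0, "neg"), (10.0, "pos")]
unlabeled = [1.0, 2.0, 3.5, 6.5, 8.0, 9.0, 4.9]
cents, labeled = self_train(seed, unlabeled)
print(cents)
print(labeled)
```

The margin threshold is what keeps the loop from amplifying its own mistakes. Here the ambiguous point 4.9 is never self-labeled.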
