We will be advertising for a postdoc position soon, to work on #generative #models #structure #induction and #uncertainty with Michael Gutmann as part of @genaihub.bsky.social!
Keep an eye out, and get in touch! (#ML #AI #ICML2025)
Where this really shines is in the low-resource setting, where embeddings still play a critical role but scale just isn't available. That's what we evaluate next, and this time we compare to LLMs in the 100M–7B parameter range as well as supervised embedding models 6/🧵
Banyan turns out to be a pretty efficient learner! Its embeddings outperform our prior recursive net, as well as a RoBERTa medium (a few-million-parameter encoder) and several word-embedding baselines trained on 10x more data 5/🧵
2) We change our parameterization to a diagonal mechanism inspired by SSMs, which lets us reduce parameters by 10x while massively increasing performance 💪
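For the curious, here's roughly what "diagonal" means here (a toy sketch with made-up names, not our actual code): each child is scaled by a learned per-dimension vector instead of being multiplied through a dense matrix, so composing costs 2d parameters instead of 2d².
```python
# Toy contrast between a dense and a diagonal (SSM-style) composition function.
# Illustrative only: module and parameter names are invented for this sketch.
import torch
import torch.nn as nn

dim = 256

class DenseCompose(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim, bias=False)     # 2 * dim * dim params

    def forward(self, left, right):
        return torch.tanh(self.proj(torch.cat([left, right], dim=-1)))

class DiagonalCompose(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.a = nn.Parameter(torch.ones(dim))              # per-dimension gate for left child
        self.b = nn.Parameter(torch.ones(dim))              # per-dimension gate for right child

    def forward(self, left, right):
        return torch.tanh(self.a * left + self.b * right)   # elementwise, only 2 * dim params

dense, diag = DenseCompose(dim), DiagonalCompose(dim)
print(sum(p.numel() for p in dense.parameters()))  # 131072
print(sum(p.numel() for p in diag.parameters()))   # 512
```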
For our initial benchmarks we pre-train Banyan on 10M tokens of English and evaluate on STS, retrieval and classification... 4/🧵
We can make this setup much more powerful with two changes:
1) Entangling: whenever any instance of the encoder merges the same span, we reconstruct it from every context it occurs in, learning the global connective structure of our pre-training corpus 3/🧵
Banyan is a special type of autoencoder called a Self-StrAE (see fig). Given a sequence, it needs to learn which elements to merge with each other, and in what order, to get the best compression. This means its representations model compositional semantics 2/🧵
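A toy picture of the entangling idea (illustrative only, names made up): nodes for identical spans are shared across sentences, so each shared span collects reconstruction signal from every context it appears in.
```python
# Toy sketch of "entangling": nodes for identical spans are shared across
# sentences, so one shared node gets reconstructed from every context.
# Names and the span enumeration are illustrative, not the real implementation.
from collections import defaultdict

corpus = [
    ["the", "black", "cat", "sat"],
    ["a", "black", "cat", "slept"],
]

node_contexts = defaultdict(list)   # span -> list of (sentence_id, position)

for sid, sent in enumerate(corpus):
    # enumerate every contiguous span; in practice spans come from induced merges
    for i in range(len(sent)):
        for j in range(i + 1, len(sent) + 1):
            span = tuple(sent[i:j])
            node_contexts[span].append((sid, i))

# the span ("black", "cat") has one shared node with two contexts, so its
# embedding receives a reconstruction signal from both sentences
print(node_contexts[("black", "cat")])   # [(0, 1), (1, 1)]
```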
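If you like code, here's a rough toy sketch of the merge-then-reconstruct loop (illustrative only, not the actual Banyan/Self-StrAE implementation): greedily merge the most similar adjacent pair, remember the merge order, then reverse it to rebuild the leaves.
```python
# Toy sketch of the Self-StrAE idea: merge adjacent embeddings bottom-up,
# then decode back down the induced tree and score the reconstruction.
# Illustrative only; all names here are made up for this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySelfStrAE(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)      # merge two children -> parent
        self.decompose = nn.Linear(dim, 2 * dim)    # parent -> two children

    def encode(self, leaves):                       # leaves: (seq_len, dim)
        nodes, merges = list(leaves), []
        while len(nodes) > 1:
            # pick the adjacent pair whose embeddings are most similar
            sims = [F.cosine_similarity(nodes[i], nodes[i + 1], dim=0)
                    for i in range(len(nodes) - 1)]
            i = int(torch.stack(sims).argmax())
            parent = torch.tanh(self.compose(torch.cat([nodes[i], nodes[i + 1]])))
            merges.append(i)
            nodes[i:i + 2] = [parent]               # replace the pair with its parent
        return nodes[0], merges                     # root embedding + induced tree

    def decode(self, root, merges):
        nodes = [root]
        for i in reversed(merges):                  # undo merges in reverse order
            left, right = self.decompose(nodes[i]).chunk(2)
            nodes[i:i + 1] = [left, right]
        return torch.stack(nodes)                   # reconstructed leaf embeddings

model = ToySelfStrAE(dim=64)
leaves = torch.randn(5, 64)                         # stand-in for token embeddings
root, merges = model.encode(leaves)
recon = model.decode(root, merges)
loss = F.mse_loss(recon, leaves)                    # reconstruction = "compression" objective
```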