Lightnews — Scholar-powered news

Sam Blouir

@samblouir.bsky.social

Hey Marc! Could you add me to this student list? I can’t seem to DM you.

November 29, 2024 at 6:51 PM

Sam Blouir

@samblouir.bsky.social

Haha, cool list! Can you add me?

November 25, 2024 at 6:11 AM

Sam Blouir

@samblouir.bsky.social

Hi! Thanks for making this. Could you add me, please? :)

November 23, 2024 at 4:29 PM

Sam Blouir

@samblouir.bsky.social

Hah! Can you add me please?

November 23, 2024 at 6:17 AM

Sam Blouir

@samblouir.bsky.social

Sent a DM!

November 21, 2024 at 10:59 PM

Sam Blouir

@samblouir.bsky.social

Please add me :) PhD student here in a multilingual focused NLP lab.

November 20, 2024 at 8:18 PM

Sam Blouir

@samblouir.bsky.social

Hi, can you add me? Thank you.

November 20, 2024 at 2:36 PM

Sam Blouir

@samblouir.bsky.social

Hi! Could I please join this group? Thank you.

November 20, 2024 at 10:57 AM

Sam Blouir

@samblouir.bsky.social

Hi, could you please add me? Thank you.

November 19, 2024 at 5:51 PM

Sam Blouir

@samblouir.bsky.social

Hi! Can I join this group? Working on several AI for Science research projects :)

November 19, 2024 at 5:49 PM

Sam Blouir

@samblouir.bsky.social

Definitely D

November 19, 2024 at 5:22 AM

Sam Blouir

@samblouir.bsky.social

Huge thanks to the George Mason University NLP Lab (@GMNLP), Stanford AI Lab (@StanfordAILab), and all of our collaborators! 🙏

November 18, 2024 at 5:28 PM

Sam Blouir

@samblouir.bsky.social

General benchmark scores remain intact across 21 tasks on the EleutherAI LM Eval harness, and greatly improve on our new infilling task.

💡 With smarter training, we maintain SSMs’ efficiencies while dramatically enhancing their capabilities.

Table of results on the Story Infilling Task, in which the model is given a causal story of 3-7 entries. One entry is masked out, and the model is asked to choose the most likely option.

Hawk with Birdie gets 42.5% accuracy.
Hawk with a causal version of Birdie gets 41.5% accuracy.
Hawk with Next Token Prediction gets only 33.1%.
That is an enormous performance boost for Hawk trained with Birdie: 42.5% vs 33.1% accuracy.

A Transformer trained with Birdie gets 42.2% accuracy, and 41.9% with Next Token Prediction. The difference is more muted for the Transformer on this task, in contrast to the generative SQuAD V2 results, where the Transformer with Birdie pulled ahead strongly.
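
The "choose the most likely option" step in the infilling task described above could be scored roughly as follows. This is only a sketch: it assumes each candidate entry has already been assigned a score by the model (for example a log-likelihood), and the function names `choose_option` and `infilling_accuracy` are hypothetical, not taken from the Birdie code.

```python
from typing import Sequence


def choose_option(option_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate (first one on ties)."""
    if not option_scores:
        raise ValueError("at least one option is required")
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def infilling_accuracy(
    scores_per_example: Sequence[Sequence[float]],
    correct_indices: Sequence[int],
) -> float:
    """Percentage of examples where the top-scoring option is the correct one."""
    if len(scores_per_example) != len(correct_indices):
        raise ValueError("each example needs exactly one correct index")
    if not scores_per_example:
        return 0.0
    hits = sum(
        choose_option(scores) == correct
        for scores, correct in zip(scores_per_example, correct_indices)
    )
    return 100.0 * hits / len(scores_per_example)


# Assumed per-option scores for two toy examples (higher means more likely).
example_scores = [[-1.0, -0.5, -2.0], [-0.1, -3.0]]
example_answers = [1, 1]
accuracy = infilling_accuracy(example_scores, example_answers)
```

Here the first example is answered correctly and the second is not, so `accuracy` comes out at 50.0.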

November 18, 2024 at 5:28 PM

Sam Blouir

@samblouir.bsky.social

🔑 What's new?

• Dynamic Pre-training Curriculum: Optimized via Reinforcement Learning.

• Specialized Training Objectives: Tailored to SSMs' unique strengths.

• Bidirectional Processing: Maximizes fixed state capacity for extra performance.

November 18, 2024 at 5:28 PM

Sam Blouir

@samblouir.bsky.social

🌟 Stellar Results:

• Multi-Phone Number Retrieval: Birdie SSMs achieve 100% accuracy on single lookups; outperform standard SSMs even more as tasks become more complex.

• SQuAD V2: We match a Transformer's performance curve across sequence lengths, while standard SSMs fall behind.

Graph of the SQuAD V2 question-answering task. The X-axis shows the context length, i.e. the length of the tokenized Wikipedia article used as context, and the Y-axis shows "Response Contains Labels", the percentage of generated model responses that contained an acceptable answer to the question.

In the SQuAD V2 question-answering task, the model reads a Wikipedia article and is then immediately asked a question about what it just read. The information is always found in the article.
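
As a rough illustration of how a "Response Contains Labels" score like the one on the Y-axis could be computed, here is a minimal Python sketch. The function names and the case-insensitive, whitespace-normalized substring match are assumptions made for illustration; the posts do not specify the exact matching rule.

```python
from typing import Sequence


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def response_contains_label(response: str, labels: Sequence[str]) -> bool:
    """True if any acceptable answer appears somewhere in the response."""
    normalized_response = _normalize(response)
    return any(
        _normalize(label) in normalized_response
        for label in labels
        if label.strip()  # skip empty labels, which would match everything
    )


def response_contains_labels_rate(
    responses: Sequence[str],
    label_sets: Sequence[Sequence[str]],
) -> float:
    """Percentage of responses that contain at least one acceptable answer."""
    if len(responses) != len(label_sets):
        raise ValueError("responses and label_sets must have the same length")
    if not responses:
        return 0.0
    hits = sum(
        response_contains_label(response, labels)
        for response, labels in zip(responses, label_sets)
    )
    return 100.0 * hits / len(responses)


# Toy data, invented purely for this example.
toy_responses = ["The capital is Paris.", "I do not know."]
toy_labels = [["Paris"], ["Berlin"]]
rate = response_contains_labels_rate(toy_responses, toy_labels)
```

With this toy data one of the two responses contains its label, so `rate` is 50.0. A plot like the one described would bucket examples by context length and compute this rate per bucket.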

Training Hawk with Birdie strongly outperforms training it with Next Token Prediction. With Next Token Prediction, performance declines sharply as the Wikipedia article length increases to about 500 tokens.
In this 500-token scenario, Hawk trained with Next Token Prediction retrieves the exact label less than 10% of the time, while the Birdie procedure results in over 55% accuracy.

When the article is only 100 tokens long, the Birdie model retrieves the correct answer more than 40% of the time, while the Next Token Prediction model does so less than 30% of the time.
With Birdie, Hawk matches the shape of the "context length vs performance" curve of the Transformer trained with Next Token Prediction, but with slightly lower accuracy.

The Transformer trained with Birdie outperforms all models, with an average of about 75% accuracy, compared to the Next Token Prediction Transformer at 60%.
Hawk trained with Birdie gets around 50%.
Hawk trained with Next Token Prediction gets around 15%.

November 18, 2024 at 5:28 PM

Sam Blouir

@samblouir.bsky.social

🌟 Stellar Results:

• Multi-Phone Number Retrieval: Birdie SSMs achieve 100% accuracy on single lookups; outperform standard SSMs even more as tasks become more complex.

• SQuAD V2: We match a Transformer's performance curve across sequence lengths, while standard SSMs fall behind.

In the SQuAD V2 question-answering task, the model reads a Wikipedia article and is then immediately asked a question about what it just read. The information is always found in the article.

Training Hawk with Birdie strongly outperforms training it with Next Token Prediction. With Next Token Prediction, performance declines sharply as the Wikipedia article length increases to about 500 tokens.
In this 500-token scenario, Hawk trained with Next Token Prediction retrieves the exact label less than 10% of the time, while the Birdie procedure results in over 55% accuracy.

When the article is only 100 tokens long, the Birdie model retrieves the correct answer more than 40% of the time, while the Next Token Prediction model does so less than 30% of the time.
With Birdie, Hawk matches the shape of the "context length vs performance" curve of the Transformer trained with Next Token Prediction, but with slightly lower accuracy.

The Transformer trained with Birdie outperforms all models, with an average of about 75% accuracy, compared to the Next Token Prediction Transformer at 60%.
Hawk trained with Birdie gets around 50%.
Hawk trained with Next Token Prediction gets around 15%.

November 18, 2024 at 5:06 PM

Sam Blouir

@samblouir.bsky.social

🌟 Stellar Results:

• Multi-Phone Number Retrieval: Birdie SSMs achieve 100% accuracy on single lookups; outperform standard SSMs even more as tasks become more complex.

• SQuAD V2: We match a Transformer's performance curve across sequence lengths, while standard SSMs fall behind.

Hawk (SSM) trained using Birdie strongly outperforms Hawk trained using Next Token Prediction on the SQuAD V2 question-answering task, in which the model reads a Wikipedia article and is then immediately asked a question about what it just read. Hawk trained using Next Token Prediction declines sharply in performance as the Wikipedia article length increases to about 500 tokens. In this scenario, it retrieves the exact label less than 10% of the time. When the article was only 96 tokens long, it was correct about 25% of the time. Hawk trained using Birdie matches the performance curves of the Transformer trained with Next Token Prediction, but has slightly worse performance. The Transformer trained with Birdie outperforms all models, with an average of about 75% accuracy, compared to the Next Token Prediction Transformer at 60%. Hawk trained with Birdie gets around 50%. Hawk trained with Next Token Prediction gets around 15%.

November 18, 2024 at 4:48 PM

Sam Blouir

@samblouir.bsky.social

🔑 What's new with Birdie?
• Dynamic Pre-training Curriculum: Optimized via Reinforcement Learning.

• Specialized Training Objectives: Tailored to SSMs' unique strengths.

• Bidirectional Processing: Maximizes fixed state capacity for extra performance.

November 18, 2024 at 4:48 PM

Sam Blouir

@samblouir.bsky.social

Would like to be added to this :)

November 18, 2024 at 4:27 PM
