Lightnews — Scholar-powered news

Beomseok Lee

@beomseok-lee.bsky.social

35 followers 10 following 4 posts

PhD student @uniTrento. Affiliated with @naverlabseurope and @fbk_mt. Former research engineer @samsungresearch.

Beomseok Lee

@beomseok-lee.bsky.social

Can we make Speech LLMs actually think as they listen? 👂💭
This fascinating work applies CoT inspired by human “thinking while listening”, training models to find the inflection point when reasoning starts.
📄 arxiv.org/abs/2510.07497

Can Speech LLMs Think while Listening?

Recent advances in speech large language models (speech LLMs) have enabled seamless spoken interactions, but these systems still struggle with complex reasoning tasks. Previously, chain-of-thought (Co...

arxiv.org

October 29, 2025 at 12:48 PM

Beomseok Lee

@beomseok-lee.bsky.social

🤔 Ever wondered how discrete tokens vs. continuous features behave in SpeechLLMs?
This new work dives into 6 SLU tasks and reveals some interesting takeaways!
arxiv.org/abs/2508.17863

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs

With the rise of Speech Large Language Models (SpeechLLMs), two dominant approaches have emerged for speech processing: discrete tokens and continuous features. Each approach has demonstrated strong c...

arxiv.org

August 28, 2025 at 9:02 AM

Beomseok Lee

@beomseok-lee.bsky.social

Speech-language models show promise in multimodal tasks—but how well are speech & text actually aligned? 🤔

This paper arxiv.org/abs/2505.19937 proposes a new metric to measure layer-wise correlation between the two, with a focus on SLU tasks. 🔍🗣️📄

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

Large Language Models (LLMs) are widely used in Spoken Language Understanding (SLU). Recent SLU models process audio directly by adapting speech input into LLMs for better multimodal learning. A key c...

arxiv.org

June 11, 2025 at 12:53 PM

Beomseok Lee

@beomseok-lee.bsky.social

Should speech come before the instruction text, or should the instruction text come first in a speech-language model?
Find out the best positioning for speech and text—and the novel adapter that aligns speech and text modalities!
arxiv.org/abs/2412.01145

AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM

Integrating speech into LLMs (speech-LLM) has been gaining increased attention recently. The mainstream solution is to connect a well-trained speech encoder and LLM with a neural adapter. However, the lengt...

arxiv.org

April 3, 2025 at 10:42 AM
