Multimodality and multilinguality
prev. predoc at Google DeepMind
Trained on large-scale animal vocalization, human speech & music datasets, the model enables zero-shot classification, detection & querying across diverse species & environments 👇🏽
📂 Codebase (part of ESPnet): github.com/espnet/espnet
📖 README & User Guide: github.com/espnet/espne...
🎥 Demo Video: www.youtube.com/watch?v=kI_D...
📜: arxiv.org/abs/2503.08533
Live Demo: huggingface.co/spaces/Siddh...
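To make the zero-shot claim above concrete, here is a minimal sketch of the usual recipe: embed the audio clip and a set of free-text label prompts into a shared space, then pick the label whose prompt scores highest. The encoder functions below are random-projection placeholders, not the released model's actual ESPnet interface; consult the linked README and demo for the real entry points.

```python
# Minimal sketch of zero-shot classification with a joint audio-text
# embedding model. The two embed_* functions are stand-ins (random
# projections) for whatever encoders the released model ships with;
# they are NOT the ESPnet API, just placeholders so the recipe runs.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 512

def embed_audio(waveform: np.ndarray) -> np.ndarray:
    """Placeholder audio encoder: returns a unit-norm embedding."""
    v = rng.standard_normal(EMB_DIM)
    return v / np.linalg.norm(v)

def embed_text(prompt: str) -> np.ndarray:
    """Placeholder text encoder: returns a unit-norm embedding."""
    v = rng.standard_normal(EMB_DIM)
    return v / np.linalg.norm(v)

def zero_shot_classify(waveform: np.ndarray, labels: list[str]) -> str:
    """Score the clip against text prompts; highest cosine similarity wins."""
    audio_emb = embed_audio(waveform)
    prompts = [f"a recording of {label}" for label in labels]
    text_embs = np.stack([embed_text(p) for p in prompts])
    scores = text_embs @ audio_emb  # cosine similarity (unit vectors)
    return labels[int(np.argmax(scores))]

# Example: classify a 5-second clip at 16 kHz among unseen categories.
clip = rng.standard_normal(16_000 * 5)
print(zero_shot_classify(clip, ["humpback whale song", "dawn chorus", "human speech"]))
```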
Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊
We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇
📜: arxiv.org/abs/2503.01174
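For a sense of what "turn-taking ability" can mean operationally, here is a small, self-contained sketch that computes floor-transfer offsets (the gap or overlap at each speaker change) from timestamped turns; a system that responds too slowly or talks over the user shows up directly in these numbers. The data structures and example values are illustrative assumptions, not the benchmark's actual evaluation code (see the arXiv paper for that).

```python
# Illustrative turn-taking statistics from timestamped turns: the gap
# (or overlap, if negative) between one speaker ending and the other
# starting. A generic measure, not the paper's exact protocol.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str   # "user" or "system"
    start: float   # seconds
    end: float     # seconds

def floor_transfer_offsets(turns: list[Turn]) -> list[float]:
    """Offsets at speaker changes: negative = overlap, positive = silent gap."""
    turns = sorted(turns, key=lambda t: t.start)
    offsets = []
    for prev, cur in zip(turns, turns[1:]):
        if cur.speaker != prev.speaker:
            offsets.append(cur.start - prev.end)
    return offsets

# Hypothetical four-turn exchange.
dialogue = [
    Turn("user", 0.0, 2.1),
    Turn("system", 2.6, 5.0),   # 0.5 s gap before responding
    Turn("user", 4.8, 6.0),     # barge-in: 0.2 s overlap
    Turn("system", 7.5, 9.0),   # sluggish 1.5 s response
]
offsets = floor_transfer_offsets(dialogue)
print("mean offset (s):", sum(offsets) / len(offsets))
print("overlap rate:", sum(o < 0 for o in offsets) / len(offsets))
```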
Tokenisation is NP-Complete
https://arxiv.org/abs/2412.15210
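The title refers to the computational problem of choosing a tokeniser that compresses a corpus. One way to phrase the underlying decision problem, paraphrased here rather than quoted (the paper proves hardness for a direct and a bottom-up merge-based variant), is:

```latex
% Sketch of a tokenisation decision problem (paraphrase; see the paper
% for the exact direct and bottom-up variants it proves NP-complete).
\textbf{Instance:} a dataset $D$ of strings over an alphabet $\Sigma$,
a vocabulary budget $k \in \mathbb{N}$, and a target length $\delta \in \mathbb{N}$.

\textbf{Question:} is there a vocabulary $V \subseteq \Sigma^{*}$ with
$|V| \le k$ (or, in the bottom-up variant, a sequence of at most $k$
merge operations) such that encoding $D$ with $V$ uses at most
$\delta$ symbols in total?
```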
We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.
🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.
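To poke at the data itself, the sketch below streams a single language configuration with the Hugging Face datasets library rather than downloading all 8 TB. The repo id, config name, and field name are assumptions based on the FineWeb naming conventions; verify them against the dataset card.

```python
# Minimal sketch of streaming one language split of FineWeb2 with the
# Hugging Face `datasets` library. The repo id and per-language config
# ("fra_Latn" for French in Latin script) are assumptions; check the
# dataset card for the exact names.
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceFW/fineweb-2",   # assumed dataset repo id
    name="fra_Latn",             # assumed language/script config
    split="train",
    streaming=True,              # 8 TB compressed: stream, don't download
)

for i, example in enumerate(ds):
    print(example["text"][:200])  # assumed field name, as in FineWeb
    if i == 2:
        break
```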
#Interspeech2025
(Self-)nominations welcome!
openreview.net/forum?id=QCY...
github.com/visipedia/in...
#prattle 💬
#bioacoustics