Alexandre Défossez
@honualx.bsky.social
140 followers 59 following 9 posts
Chief Exploration Officer @kyutai-labs.bsky.social in Paris.
honualx.bsky.social
Learn more about it from Vaclav, lead engineer on Unmute, as he gets interviewed by the AI:
honualx.bsky.social
We just released unmute.sh 🔇🔊
It is a wrapper around a text LLM, built on in-house streaming ASR, TTS, and semantic VAD to reduce latency. ⏱️
Unlike Moshi 🟢, Unmute 🔊 is turn-based, but allows customization in two clicks 🖱️: voice and prompt!
Paper and open source coming soon.
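For intuition, here is a minimal sketch of how such a cascaded, turn-based loop can fit together. Every interface below is a hypothetical placeholder for illustration, not the actual Unmute API:

# Minimal sketch of a turn-based cascade: streaming ASR feeds a text LLM whose
# reply is spoken back through streaming TTS. A semantic VAD decides when the
# user's turn is over, so the reply can start with minimal latency.
# All component interfaces are hypothetical placeholders, not the Unmute API.
def chat_turn(mic, asr, vad, llm, tts, speaker):
    words = []
    for chunk in mic.stream():                 # raw audio, streamed from the mic
        words.extend(asr.feed(chunk))          # partial transcript, word by word
        if vad.turn_is_over(chunk, words):     # semantic end of turn, not just silence
            break
    # Stream the LLM answer token by token into the TTS, so audio playback
    # starts before the full reply has been generated.
    for token in llm.generate_stream(" ".join(words)):
        for audio in tts.feed(token):
            speaker.play(audio)
    speaker.play(tts.flush())                  # drain any buffered audio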
honualx.bsky.social
We just open-sourced a fine-tuning codebase for Moshi!
kyutai-labs.bsky.social
Have you enjoyed talking to 🟢Moshi and dreamt of making your own speech-to-speech chat experience🧑‍🔬🤖? It's now possible with the moshi-finetune codebase! Plug in your own dataset and change the voice/tone/personality of Moshi 💚🔌💿. An example after fine-tuning with only 20 hours of the DailyTalk dataset. 🧵
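As a rough illustration of the "plug in your own dataset" step, a hypothetical manifest-building sketch; the actual data format expected by moshi-finetune may differ, so check the repository's README:

# Hypothetical sketch: build a JSONL manifest listing conversation audio files
# for fine-tuning. The real schema expected by moshi-finetune may differ;
# the repository's README documents the actual format.
import json
from pathlib import Path

def build_manifest(audio_dir: str, out_path: str) -> None:
    with open(out_path, "w") as f:
        for wav in sorted(Path(audio_dir).glob("*.wav")):
            f.write(json.dumps({"path": str(wav)}) + "\n")

build_manifest("dailytalk_wavs/", "train.jsonl")  # e.g. ~20 hours of DailyTalk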
honualx.bsky.social
Just back from holidays, so a bit late in announcing MoshiVis, which extends Moshi's multimodal capabilities to take in images 📷.
Only 200M parameters were added to plug in a ViT through cross-attention with gating 🖼️🔀🎤
Training relies on a mix of text-only and text+audio synthetic data (~20k hours) 💽
kyutai-labs.bsky.social
Meet MoshiVis🎙️🖼️, the first open-source real-time speech model that can talk about images!

It sees, understands, and talks about images — naturally, and out loud.

This opens up new applications, from audio description for the visually impaired to visual access to information.
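For readers curious about the "cross-attention with gating" mentioned above, a minimal PyTorch sketch of the general pattern; the shapes and the zero-initialized tanh gate are illustrative assumptions, not MoshiVis's exact module:

# Sketch of gated cross-attention: the speech model's hidden states attend to
# ViT image tokens, and a learned gate, initialized at zero, lets the visual
# signal fade in gradually during training without disturbing the backbone.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # starts closed: no image influence

    def forward(self, x: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) hidden states of the speech model
        # image_tokens: (batch, n_patches, dim) projected ViT features
        attended, _ = self.attn(query=x, key=image_tokens, value=image_tokens)
        return x + torch.tanh(self.gate) * attended  # residual, gated injection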
honualx.bsky.social
I'll start my presentation in 10 minutes, you can join on Zoom: concordia-ca.zoom.us/j/81541793947
See you there!
honualx.bsky.social
I'll present a dive into Moshi 🟢 and our translation model Hibiki 🇫🇷♻️🇬🇧 as part of the next @convai-rg.bsky.social reading group 👨‍🏫📗.

📅 13th of March 🕰️ 11am ET, 4pm in Paris.

I'll discuss Mimi 🗜️ and multi-stream audio modeling 🔊.
Join on Zoom, replay on YT.

⬛ ⬛ 🟧 🟧 🟨 🟨 🟩 🟩 🟩 ⬛
⬛ 🟧 🟧 🟨 🟨 🟩 🟩 🟩 ⬛ ⬛
convai-rg.bsky.social
📢 Join our Conversational AI Reading Group!
📅 Thursday, March 13 | 11 AM - 12 PM EST
🎙Speaker: Alexandre Défossez
📖 Topic: "Moshi: a speech-text foundation model for real-time dialogue"
🔗 Details: (poonehmousavi.github.io/rg)
▶️ Missed a session? Watch on YouTube: (www.youtube.com/@CONVAI_RG) 🚀
Reposted by Alexandre Défossez
kyutai-labs.bsky.social
Even Kavinsky 🎧🪩 can't break Hibiki! Just like Moshi, Hibiki is robust to extreme background conditions 💥🔊.
Reposted by Alexandre Défossez
jeanremiking.bsky.social
Our latest studies on decoding text from brain activity, reviewed by MIT Tech Review @technologyreview.com

www.technologyreview.com/2025/02/07/1...
honualx.bsky.social
Excited to meet and exchange with a number of players from all around the world at the AI Summit 🌍
honualx.bsky.social
We just released Hibiki, a 🎙️-to-🔊 simultaneous translation model 🇫🇷🇬🇧
We leverage a large synthetic corpus generated with the text translation model MADLAD, our own TTS, and a simple lag rule.
The model is decoder-only and runs at scale, even on device 📲
github.com/kyutai-labs/hibiki
kyutai-labs.bsky.social
Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧.
Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech. 🧵
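As a toy illustration of what a "simple lag rule" can look like when building such synthetic simultaneous data, a sketch that delays each target word until a fixed number of source words have been heard; this shows the general idea only, not necessarily the exact rule used for Hibiki:

# Toy lag rule for simultaneous translation data: target word i may only start
# once source word i + lag has finished, so the translation trails the source
# by a fixed word count. Illustrative only, not Hibiki's published recipe.
def lagged_start_times(source_word_ends: list[float],
                       n_target_words: int, lag: int = 2) -> list[float]:
    starts = []
    for i in range(n_target_words):
        j = min(i + lag, len(source_word_ends) - 1)
        starts.append(source_word_ends[j])  # earliest start for target word i
    return starts

# Example: 4 source words ending at these times (seconds), 3 target words.
print(lagged_start_times([0.4, 0.9, 1.5, 2.1], n_target_words=3, lag=2))
# -> [1.5, 2.1, 2.1]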
Reposted by Alexandre Défossez
jeanremiking.bsky.social
🚨Job alert (Please RT)

What: master's internship and/or PhD positions
Where: Rothschild Foundation Hospital (Paris, France)
Topic: AI and Neuroscience
Supervised by: Pierre Bourdillon and myself
Apply here: forms.gle/KKnea2QAjhAe...
Deadline: Feb 5th
honualx.bsky.social
We just released the Helium-1 model, a 2B multilingual LLM that @exgrv.bsky.social and @lmazare.bsky.social have been crafting for us! Best model so far under 2.17B params on multilingual benchmarks 🇬🇧🇮🇹🇪🇸🇵🇹🇫🇷🇩🇪
On HF, under CC-BY licence: huggingface.co/kyutai/heliu...
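Assuming Helium-1 loads through the standard Hugging Face transformers API (the link above is truncated, so the model id below is a placeholder, not the real repo name), usage would look roughly like:

# Sketch of loading Helium-1 with Hugging Face transformers. The model id is a
# placeholder: the link above is truncated, so take the real id from the
# kyutai page on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyutai/helium-1"  # placeholder id, see the (truncated) link above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("La capitale de l'Italie est", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))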