Shikhar
shikharb.bsky.social
PhD student WAVLab@LTI, CMU
Multimodality and multilinguality
prev. predoc Google DeepMind
Reposted by Shikhar
🔗 Resources for ESPnet-SDS:
📂 Codebase (part of ESPnet): github.com/espnet/espnet
📖 README & User Guide: github.com/espnet/espne...
🎥 Demo Video: www.youtube.com/watch?v=kI_D...
March 17, 2025 at 2:29 PM
Wait I thought the rock was named Dwayne Johnson
February 6, 2025 at 1:29 PM
gpu poverty is real
January 28, 2025 at 5:10 AM
LanguageBind: arxiv.org/abs/2310.01852
Uses language as the pivot modality instead of images. Different training dataset.
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
The video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modaliti...
December 8, 2024 at 2:24 PM
🙋‍♂️
November 30, 2024 at 4:55 PM
🙋‍♂️🙏
November 24, 2024 at 11:49 PM
🙋‍♂️🙏
November 24, 2024 at 11:44 PM
🙋‍♂️
November 23, 2024 at 12:36 AM
bsky.app
November 22, 2024 at 11:09 PM