📜Paper: arxiv.org/pdf/2410.05586
🎦Demo: wx83.github.io/TeaserGen_Of...
1️⃣ a pretraining-based model using pretrained contrastive language-vision models, and
2️⃣ a deep sequential model that learns the mapping between the narrations and visuals.
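
To make the first option concrete, here is a minimal sketch of contrastive-embedding retrieval: each narration sentence picks the video frame whose embedding is closest in cosine similarity. It is an illustration under assumptions, not the method from the paper. The names `cosine` and `select_frames` are hypothetical, and the toy 3-d vectors stand in for embeddings that a pretrained language-vision model would produce.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_frames(
    sentence_embs: Sequence[Sequence[float]],
    frame_embs: Sequence[Sequence[float]],
) -> List[int]:
    """For each sentence embedding, return the index of the most similar frame."""
    if not frame_embs:
        raise ValueError("need at least one frame embedding")
    return [
        max(range(len(frame_embs)), key=lambda j: cosine(sent, frame_embs[j]))
        for sent in sentence_embs
    ]


# Toy embeddings (assumed, for illustration only): in practice the text and
# image encoders of a contrastive model would map into a shared space.
sentence_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
frame_embs = [[0.0, 0.9, 1.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]

selected = select_frames(sentence_embs, frame_embs)
print(selected)  # one frame index per narration sentence
```

Independent per-sentence matching like this can pick the same frame repeatedly and ignores temporal order, which is one reason a sequential model over narrations and visuals is a natural alternative.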
1️⃣ first, we generate the teaser narration given the transcribed narration of the documentary;
2️⃣ then, we select the relevant visual content to accompany the generated narration.