1️⃣ a pretraining-based model that uses pretrained contrastive language-vision models, and
2️⃣ a deep sequential model that learns the mapping between narrations and visuals (a minimal sketch of the first variant follows below).
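The sketch below illustrates the pretraining-based variant under simple assumptions: each teaser-narration sentence is matched against candidate frames by CLIP text-image similarity. The checkpoint name, the `candidate_frames` input, the `select_shots` helper, and the top-k value are illustrative, not the paper's exact implementation.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed off-the-shelf contrastive language-vision model (CLIP).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def select_shots(narration_sentences, candidate_frames, top_k=1):
    """For each narration sentence, return indices of the best-matching frames."""
    images = [Image.open(p) for p in candidate_frames]
    inputs = processor(text=narration_sentences, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_text: (num_sentences, num_frames) similarity scores.
    sims = outputs.logits_per_text
    return sims.topk(top_k, dim=-1).indices.tolist()
```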
1️⃣ first, we generate the teaser narration given the transcribed narration of the documentary;
2️⃣ then, we select the relevant visual content to accompany the generated narration (see the pipeline sketch after this list).
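A rough sketch of this two-stage pipeline, assuming an off-the-shelf BART summarizer as a stand-in for the narration-generation model and reusing the hypothetical `select_shots` helper from the earlier sketch; a real documentary transcript would need chunking to fit the summarizer's input window.

```python
from transformers import pipeline

# Stand-in summarizer for stage 1 (teaser-narration generation).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def generate_teaser(transcript: str, candidate_frames, max_len=120):
    # Stage 1: condense the transcribed narration into a teaser narration.
    summary = summarizer(transcript, max_length=max_len, min_length=30)[0]["summary_text"]
    sentences = [s.strip() for s in summary.split(".") if s.strip()]
    # Stage 2: select visual content to accompany each generated sentence.
    shot_ids = select_shots(sentences, candidate_frames)
    return list(zip(sentences, shot_ids))
```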
🔍 We explored a new task: generating teasers for long documentaries.
🤩 We presented a new dataset, new models, and new evaluation metrics for teaser generation.