The TVC dataset is a large-scale multimodal video captioning dataset consisting of 261,490 descriptions
of 108,965 moments from TVR.
Each training moment in TVC is paired with 2 descriptions, and each testing moment is paired with
4 descriptions for more accurate evaluation.
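As a quick sanity check on the statistics above, the overall average number of descriptions per moment should land between the per-moment counts of the training split (2) and the testing split (4):

```python
# Dataset-level counts reported for TVC.
num_descriptions = 261_490
num_moments = 108_965

# Overall average; expected between 2 (train) and 4 (test).
avg_desc_per_moment = num_descriptions / num_moments
print(f"{avg_desc_per_moment:.2f} descriptions per moment")
```

The average of roughly 2.4 is consistent with most moments belonging to the training split.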
Click the Play Moment
button to play the video clip specified by the timestamp annotation.
The salmon-colored label
that follows each query sentence indicates which
modality the annotator believed must be used to localize the corresponding query.
The videos come with subtitles; turn them on if you don't see them. Best viewed in Chrome.
Ask us questions: email@example.com or jielei [at] cs.unc.edu.