TVC Stats

The TVC dataset is a large-scale multimodal video captioning dataset consisting of 261,490 descriptions of 108,965 moments from TVR. Each training moment in TVC is paired with 2 descriptions, and each testing moment is paired with 4 descriptions for more accurate evaluation.


Click the Play Moment button to play the video clip specified by the timestamp annotation. The salmon-colored label that follows the query sentence indicates which modality the annotator believes must be used to localize the corresponding query. The videos come with subtitles; turn them on if you do not see them. Best viewed in Chrome.


Ask us questions: or jielei [at]

TVC Samples
