TVR Stats

The TVR dataset is a large-scale, high-quality video (with subtitle) moment retrieval dataset consisting of 108,965 queries on 21,793 videos from 6 TV shows of diverse genres, where each query is associated with a tight temporal alignment. Queries in TVR can relate to the video, the subtitle, or both. The query type distribution is shown below:

As TVR is collected from TV shows, queries often involve rich interactions between characters. Based on a random sample of 100 queries, we found that 66% of the queries involve at least two people and 67% involve at least two actions. This makes TVR an interesting testbed for studying multimodal interactions between people.


Click the Play Moment button to play the video clip specified by the timestamp annotation. The salmon-colored label that follows the query sentence indicates which modality the annotator believes is needed to localize the corresponding query. The videos come with subtitles; turn them on if you do not see them. Best viewed in Chrome.
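To make the annotation described above concrete, here is a minimal sketch of how one query, its timestamp, and its modality label might be represented. The class name, field names, modality strings, and example values are all hypothetical and are not taken from the released TVR files. The last line only restates the dataset-level counts given in the stats section.

```python
from dataclasses import dataclass


@dataclass
class QueryAnnotation:
    # All field names are hypothetical, chosen for illustration only.
    query: str        # the natural-language query sentence
    video_id: str     # identifier of the video the moment belongs to
    start_sec: float  # start of the annotated moment, in seconds
    end_sec: float    # end of the annotated moment, in seconds
    modality: str     # assumed labels: "video", "subtitle", or "video+subtitle"

    def duration(self) -> float:
        """Length of the annotated moment in seconds."""
        if self.end_sec <= self.start_sec:
            raise ValueError("moment must end after it starts")
        return self.end_sec - self.start_sec


# A made-up record; the values do not come from the dataset.
example = QueryAnnotation(
    query="A made-up query sentence.",
    video_id="made_up_clip_000",
    start_sec=12.5,
    end_sec=18.0,
    modality="video+subtitle",
)
moment_length = example.duration()

# Average number of queries per video, from the counts stated in the text.
queries_per_video = 108_965 / 21_793
```

With the stated counts, the average works out to exactly 5 queries per video.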


Ask us questions: or jielei [at]

TVR Samples
