The TVR dataset is a large-scale, high-quality moment retrieval dataset of videos with subtitles. It consists of 108,965 queries on 21,793 videos from 6 TV shows of diverse genres, and each query is associated with a tight temporal alignment. A query in TVR can relate to the video, the subtitles, or both. The query type distribution is shown below:
Because TVR is collected from TV shows, queries often involve rich interactions between characters. In a random sample of 100 queries, we found that 66% involve at least two people and 67% involve at least two actions. This makes TVR an interesting testbed for studying multimodal interactions between people.
Ask us questions: email@example.com or jielei [at] cs.unc.edu.