TVR (TV show Retrieval) is a new multimodal retrieval task in which a short video moment must be localized from a large video (with subtitle) corpus, given a natural language query. The associated TVR dataset is a large-scale, high-quality dataset consisting of 108,965 queries on 21,793 videos from 6 TV shows of diverse genres, where each query is associated with a tight temporal alignment. Read our paper for details.
TVR text data files, including train/val/test-public set annotations and subtitles:
We use the same set of videos as the TVQA dataset; click the button below to download 3FPS video frames. Note that you will be redirected to the TVQA website.
We provide a codebase to get you started, which includes basic data preprocessing and analysis tools, feature extraction tools, as well as our XML baseline model code. You can also find the associated video features in the repo.
The ground-truth video name and timestamp annotations are not released for the test-public set; you need to submit your model predictions to our evaluation server. Follow the instructions below:
Submission Instructions

Fill out the Google Form below if you want to show your results on our Leaderboard:
This research is supported by NSF, DARPA, Google, and ARO.
Ask us questions: tvr-tvc-unc@googlegroups.com or jielei [at] cs.unc.edu.
TVR tests a system's ability to localize a moment from a large video (with subtitle) corpus. Performance is measured by R@K (Recall@K, K = 1, 10, 100) at temporal IoU = 0.7.
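The metric above can be sketched in a few lines. This is an illustrative implementation, not the official evaluation-server code; the function names and the prediction/ground-truth data layout are assumptions made for the example.

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_k(predictions, ground_truths, k, iou_thd=0.7):
    """Fraction of queries with a correct moment in the top-k results.

    predictions:   {query_id: ranked list of (video_id, start, end)}
    ground_truths: {query_id: (video_id, start, end)}
    A query counts as a hit if any of its top-k predicted moments is in
    the correct video and overlaps the ground truth with tIoU >= iou_thd.
    """
    hits = 0
    for qid, (gt_vid, gt_s, gt_e) in ground_truths.items():
        for vid, s, e in predictions.get(qid, [])[:k]:
            if vid == gt_vid and temporal_iou((s, e), (gt_s, gt_e)) >= iou_thd:
                hits += 1
                break
    return hits / len(ground_truths)
```

Note that the 0.7 tIoU threshold makes the task strict: a prediction in the right video but loosely aligned in time still counts as a miss.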
| Rank | Date | Model | Team | R@1 | R@10 | R@100 |
|---|---|---|---|---|---|---|
| 1 | Sep 15, 2020 | HERO (Paper, Code) | MS D365 AI | 6.21 | 19.34 | 36.66 |
| 2 | Jan 20, 2020 | XML (Paper, Code) | UNC Chapel Hill | 3.32 | 13.41 | 30.52 |
| 3 | Jan 20, 2020 | MEE + CAL (MEE Paper, CAL Paper) | ENS & INRIA & CIIRC + KAUST & Adobe Research & INRIA (implemented by UNC) | 0.66 | 3.09 | 12.03 |
| 4 | Jan 20, 2020 | MEE + ExCL (MEE Paper, ExCL Paper) | ENS & INRIA & CIIRC + CMU (implemented by UNC) | 0.40 | 1.73 | 2.87 |
| 5 | Jan 20, 2020 | Chance | UNC Chapel Hill | 0.00 | 0.00 | 0.07 |