TVR (TV show Retrieval) is a new multimodal retrieval task in which a short video moment must be localized from a large video (with subtitles) corpus, given a natural language query. The associated TVR dataset is large-scale and high-quality, consisting of 108,965 queries on 21,793 videos from 6 TV shows of diverse genres, where each query is tightly aligned to a temporal span in its video. Read our paper
TVR text data files, including train/val/test-public set annotations and subtitles:
We use the same set of videos as the TVQA dataset; click the button below to download the 3 FPS video frames. Note that you will be redirected to the TVQA website.
We provide a codebase to get you started, including basic data preprocessing and analysis tools, feature extraction tools, and our XML baseline model code. The associated video features are also available in the repo.
Ground-truth video names and timestamp annotations are not released for the test-public set; you need to submit your model predictions to our evaluation server. Follow the instructions below:
Submission Instructions
Fill out the Google Form below if you would like your results to appear on our leaderboard:
This research is supported by NSF, DARPA, Google, and ARO.
Ask us questions: email@example.com or jielei [at] cs.unc.edu.
TVR tests a system's ability to localize a moment within a large video (with subtitles) corpus. Performance is measured by R@K (Recall@K, K = 1, 10, 100) at a temporal IoU threshold of 0.7.
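The metric above can be made concrete with a short sketch. This is not the official TVR evaluation code; the function and variable names (`temporal_iou`, `recall_at_k`) are illustrative, and the assumed prediction format (a ranked list of `(video_id, start, end)` triples per query) is a simplification:

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) intervals, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(predictions, ground_truths, k, iou_thd=0.7):
    """predictions: {query_id: ranked [(video_id, start, end), ...]};
    ground_truths: {query_id: (video_id, start, end)}.
    A query counts as a hit if any of its top-k predictions is in the
    correct video and overlaps the ground truth with IoU >= iou_thd."""
    hits = 0
    for qid, (gt_vid, gt_st, gt_ed) in ground_truths.items():
        for vid, st, ed in predictions.get(qid, [])[:k]:
            if vid == gt_vid and temporal_iou((st, ed), (gt_st, gt_ed)) >= iou_thd:
                hits += 1
                break
    return hits / len(ground_truths)
```

For example, a top-1 prediction of (10 s, 20 s) against a ground truth of (11 s, 20 s) in the same video has IoU 9/10 = 0.9, which clears the 0.7 threshold and counts toward R@1.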
| Rank | Date | Method | Affiliation | References |
|---|---|---|---|---|
| 1 | Jan 20, 2020 | — | UNC Chapel Hill | Paper, Code |
| 2 | Jan 20, 2020 | MEE + CAL | ENS & INRIA & CIIRC + KAUST & Adobe Research & INRIA (implemented by UNC) | MEE Paper, CAL Paper |
| 3 | Jan 20, 2020 | MEE + ExCL | ENS & INRIA & CIIRC + CMU (implemented by UNC) | MEE Paper, ExCL Paper |
| 4 | Jan 20, 2020 | — | UNC Chapel Hill | — |