This paper introduces MovieCORE, a novel video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content.
Unlike existing datasets that focus on surface-level comprehension, MovieCORE emphasizes thought-provoking questions that engage System-2 thinking while remaining specific to the video material. We propose an innovative agentic brainstorming approach, utilizing multiple large language models (LLMs) as thought agents to generate and refine high-quality question-answer pairs. To evaluate dataset quality, we develop a set of cognitive tests assessing depth, thought-provocation potential, and syntactic complexity. We also propose a comprehensive evaluation scheme for assessing VQA model performance on deeper cognitive tasks. Our work contributes to advancing movie understanding in AI systems and provides valuable insights into the capabilities and limitations of current VQA models when faced with more challenging, nuanced questions about cinematic content. We will make our agentic annotation system, the dataset and its metadata publicly available.
The Critic Agent, acting as the master of ceremonies (MC), orchestrates interactions among specialized agents using video context and task instructions. It sequentially engages the System II VQA Expert, Skeptical Researcher, Detective, and Meta Reviewer, accumulating insights at each stage.
Upon receiving final recommendations from the Meta Reviewer, the MC relays them to the System II VQA Expert for VQA refinement. Subsequently, a subset of these refined VQAs undergoes evaluation by human experts for final validation.
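The workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `CriticMC` class, the `Agent` callable type, and the `make_stub` helper are hypothetical names introduced here, and the stub agents stand in for real LLM calls. Only the agent names and the order in which they are engaged come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: an agent maps (video context, transcript so far) to a reply.
Agent = Callable[[str, List[str]], str]

# Engagement order taken from the description above.
ORDER = ["System II VQA Expert", "Skeptical Researcher", "Detective", "Meta Reviewer"]


@dataclass
class CriticMC:
    """Illustrative orchestrator (assumed design, not the released system)."""
    agents: Dict[str, Agent]
    transcript: List[str] = field(default_factory=list)

    def run(self, video_context: str, task: str) -> str:
        self.transcript = [f"TASK: {task}"]
        # Engage each agent in turn, accumulating its output in the transcript.
        for name in ORDER:
            reply = self.agents[name](video_context, self.transcript)
            self.transcript.append(f"{name}: {reply}")
        # Relay the Meta Reviewer's recommendations (now in the transcript)
        # back to the VQA expert for a final refinement pass.
        refined = self.agents["System II VQA Expert"](video_context, self.transcript)
        self.transcript.append(f"System II VQA Expert (refined): {refined}")
        return refined


def make_stub(name: str) -> Agent:
    """Stand-in for an LLM call; reports how much context it received."""
    def agent(video_context: str, transcript: List[str]) -> str:
        return f"{name} saw {len(transcript)} prior messages"
    return agent


mc = CriticMC(agents={name: make_stub(name) for name in ORDER})
refined = mc.run(video_context="caption of a movie clip", task="write deep VQA pairs")
```

In this sketch each agent sees the full accumulated transcript, which is one simple way to realize "accumulating insights at each stage"; the human validation step on a subset of refined VQAs happens outside the loop and is not modeled.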
Unlike other datasets, MovieCORE prioritizes System-2 thinking, which leads to greater depth in its question-answer pairs.
Single-Pass versus Agentic Annotation. The agentic method (bottom) elicits specific scene details, concrete examples, and detailed story elements, demonstrating the enhanced granularity achieved through multi-agent refinement.
@inproceedings{faure2moviecore,
title={MovieCORE: COgnitive REasoning in Movies},
author={Faure, Gueter Josmy and Chen, Min-Hung and Yeh, Jia-Fong and Cheng, Ying and Su, Hung-Ting and Lai, Shang-Hong and Hsu, Winston H},
booktitle={The First Workshop on System-2 Reasoning at Scale, NeurIPS'24}
}