[1] PREMACK D, WOODRUFF G. Does the chimpanzee have a theory of mind? [J]. Behavioral and Brain Sciences, 1978(4): 515-526.

[2] SHU T, BHANDWALDAR A, GAN C, et al. AGENT: A benchmark for core psychological reasoning [C]// International Conference on Machine Learning. 2021: 9614-9625.

[3] BISWAS-DIENER R, DIENER E. Theory of mind [EB/OL]. (2021-09-13) [2024-01-02]. https://nobaproject.com/modules/theory-of-mind.

[4] WIMMER H, PERNER J. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception [J]. Cognition, 1983, 13(1): 103-128.

[5] KIM J, MA M, KIM K, et al. Gaining extra supervision via multi-task learning for multi-modal video question answering [C]// 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019: 1-8.

[6] WANG A R, LUU A T, FOO C S, et al. Holistic multi-modal memory network for movie question answering [J]. IEEE Transactions on Image Processing, 2019, 29: 489-499.

[7] GAO J Y, GE R Z, CHEN K, et al. Motion-appearance co-memory networks for video question answering [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018: 6576-6585.

[8] GARCIA N, NAKASHIMA Y. Knowledge-based video question answering with unsupervised scene descriptions [C]// Computer Vision – ECCV 2020. 2020: 581-598.

[9] WANG J Y, BAO B K, XU C S. DualVGR: A dual-visual graph reasoning unit for video question answering [J]. IEEE Transactions on Multimedia, 2021, 24: 3369-3380.

[10] YANG A, MIECH A, SIVIC J, et al. Zero-shot video question answering via frozen bidirectional language models [C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. ACM, 2022: 124-141.

[11] PUIG X, SHU T, LI S, et al. Watch-and-help: A challenge for social perception and human-AI collaboration [EB/OL]. (2021-05-03) [2024-01-05]. https://arxiv.org/pdf/2010.09890.

[12] MAO Y, LIN X, NI Q, et al. BDIQA: a new dataset for video question answering to explore cognitive reasoning through theory of mind [C]// Proceedings of the AAAI Conference on Artificial Intelligence. AAAI, 2024: 583-591.

[13] BAE W, YOO J, YE J C. Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2017: 1141-1149.

[14] YANG Z K, GARCIA N, CHU C H, et al. BERT representations for video question answering [C]// 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2020: 1556-1565.

[15] LE T M, LE V, VENKATESH S, et al. Hierarchical conditional relation networks for video question answering [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 9972-9981.

[16] FAN C Y, ZHANG X F, ZHANG S, et al. Heterogeneous memory enhanced multimodal attention model for video question answering [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 1999-2007.

[17] JIANG P, HAN Y H. Reasoning with heterogeneous graph alignment for video question answering [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11109-11116.

[18] CHEN G Y, LIU X, WANG G R, et al. Tem-adapter: Adapting image-text pretraining for video question answer [C]// 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023: 13945-13955.