Journal of East China Normal University (Natural Science) ›› 2024, Vol. 2024 ›› Issue (2): 131-142. doi: 10.3969/j.issn.1000-5641.2024.02.014

• Computer Science •

基于解耦常识性关联的图像描述生成算法

刘家伟, 林欣*

  1. 华东师范大学 计算机科学与技术学院, 上海 200062
  • 收稿日期:2023-03-08 出版日期:2024-03-25 发布日期:2024-03-18
  • 通讯作者: 林欣 E-mail:xlin@cs.ecnu.edu.cn

An image caption generation algorithm based on decoupling commonsense association

Jiawei LIU, Xin LIN*

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2023-03-08 Online: 2024-03-25 Published: 2024-03-18
  • Contact: Xin LIN, E-mail: xlin@cs.ecnu.edu.cn

摘要:

基于解耦常识性关联的图像描述生成算法旨在排除各类实体间常识性关联对模型推理的干扰, 提高描述生成的流畅性与准确性. 针对当前图像描述生成中存在的符合常识但与图像内容不相符的关系语句, 该算法先通过一种新颖的训练方式加强关系检测模型对图像中真实关系的关注程度, 提高关系推理的准确性. 再通过一种关系感知的实体交互方法, 对存在关系的实体进行有针对性的信息交互, 对关系信息进行强化. 实验表明, 该算法能够纠正一些常识性的虚假关系, 生成较为准确的图像描述, 并在各项评价指标上获得了较好的实验结果.

关键词: 图像描述生成, 解耦常识性关联, 注意力机制

Abstract:

The proposed image captioning algorithm, based on decoupling commonsense associations, aims to eliminate the interference that commonsense associations between entities exert on model reasoning, thereby improving the fluency and accuracy of the generated captions. Current image captioning models often produce relational statements that agree with common sense but not with the image content. To address this, the algorithm first applies a novel training scheme that makes the relationship detection model attend more closely to the relationships actually present in the image, improving the accuracy of relationship reasoning. It then applies a relation-aware entity interaction method that exchanges information specifically between entities that share a relationship, reinforcing the relationship information. Experimental results show that the algorithm corrects some spurious commonsense relationships, generates more accurate image captions, and achieves good results across the evaluation metrics.
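
One possible reading of the relation-aware entity interaction step is attention restricted to related entity pairs. The following is a minimal sketch under that assumption, not the authors' implementation: the function name `relation_masked_attention`, the boolean `relation_mask`, and the use of plain dot-product attention without learned projections are illustrative choices that do not come from the paper.

```python
import numpy as np


def relation_masked_attention(features, relation_mask):
    """Let each entity exchange information only with entities it is related to.

    features:      (n, d) array, one feature vector per detected entity.
    relation_mask: (n, n) boolean array, True where entity i is related to
                   entity j (hypothetical output of a relation detector).
    Returns the updated (n, d) features and the (n, n) attention weights.
    """
    n, d = features.shape
    # Always allow self-attention so that no row is fully masked.
    mask = relation_mask | np.eye(n, dtype=bool)
    # Scaled dot-product scores between every pair of entities.
    scores = features @ features.T / np.sqrt(d)
    # Unrelated pairs get -inf, which becomes weight 0 after the softmax.
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ features, weights


# Toy example: entities 0 and 1 are related; entity 2 is related to nothing.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
relation_mask = np.array(
    [[False, True, False],
     [True, False, False],
     [False, False, False]]
)
out, weights = relation_masked_attention(features, relation_mask)
```

In this toy input, entity 2 has no related partner, so it attends only to itself and its features pass through unchanged, while entities 0 and 1 mix information with each other but not with entity 2.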

Key words: image captioning, decoupling commonsense association, attention mechanism

CLC number: