Journal of East China Normal University (Natural Science) ›› 2023, Vol. 2023 ›› Issue (2): 60-72. doi: 10.3969/j.issn.1000-5641.2023.02.008

• Computer Science •

Target-dependent event detection from news

Tiantian ZHANG, Man LAN*

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2021-09-28 Online: 2023-03-25 Published: 2023-03-23
  • Contact: Man LAN E-mail: lman@cs.ecnu.edu.cn

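Summary:

Massive volumes of news text often involve multiple entities and contain complex, diverse events. To mine this entity and event information, most previous event-centric extraction methods detect events first and then extract event arguments. Because they are limited by trigger-word and event detection, such methods cannot be applied to news event extraction in real industrial settings. Considering that named entity recognition (NER) achieves performance above 90%, this paper proposes an event extraction task from the perspective of the target entity: target-dependent event detection (TDED), which aims to extract entities and identify their corresponding events. For this task, a two-stage model framework is proposed that first extracts entities and then identifies target-level event types. The model incorporates event keywords and syntactic dependency distance features and can learn target-dependent contextual information. Experimental results on a purpose-built, real-world Chinese financial dataset show that the model performs well, even in complex cases where a sentence contains multiple entities or events.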


Abstract:

In real-world scenarios, news events are nuanced and hard to distinguish, and a single text often involves multiple entities. Previous event-centric methods detect events first and then extract their arguments, so they depend on event trigger detection, whose performance remains imperfect; this makes them ill-suited to the sheer volume of real-world news. Given that named entity recognition (NER) now reaches performance above 90%, we shift from an event-centric to a target-centric view. This paper proposes a new task, target-dependent event detection (TDED), which aims to extract target entities and detect their corresponding events. We also propose a semantics- and syntax-aware two-stage approach that first extracts target entities and then detects the event type for each target; it supports thousands of target entities and dozens of event types and can be applied to data covering a large number of corporations. Experimental results on a real-world Chinese financial dataset show that our model outperforms previous methods, particularly in complex scenarios where a sentence contains multiple entities or events.

Keywords: target-dependent, event detection, entity recognition, event keywords, syntactic dependency distance

CLC Number: