Computer Science

Automatic extraction of corporate bankruptcy-related events from ruling documents

  • YANG Jiale
  • WANG Junhao
  • QIAN Weining
  • LUO Yifeng
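  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2019-08-26

Online published: 2020-07-20

Funding

National Key Research and Development Program of China (2018YFC0831900)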


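Abstract

We propose a framework for extracting corporate bankruptcy events. It detects the relevant legal events in case files such as court rulings and extracts structured information about the elements of each event. The framework works in three stages: it applies distant supervision to rulings and other case files obtained from courts to build training data; it uses named entity recognition to perform sequence labeling on sentence-level text; and it combines a custom event trigger-word table with an event dictionary to identify events in the legal documents and output structured information for each event. Experimental results show that the framework achieves high precision in event identification, indicating that it is effective for extracting corporate bankruptcy events.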
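The abstract outlines a pipeline of distant supervision, sequence labeling, and trigger-word plus event-dictionary matching. The Python sketch below illustrates the first and last stages under stated assumptions; it is not the authors' implementation. The case record (`CASE_RECORD`), trigger table (`TRIGGERS`), event dictionary (`EVENT_SCHEMA`), the entity labels, event type names and example sentences are all hypothetical. Distant supervision is reduced here to exact string matching of known entity names into character-level BIO tags, and the trained NER model is omitted: the distantly supervised tags stand in for its output.

```python
"""Minimal sketch of a distant-supervision + trigger-word event pipeline.

All records, labels, triggers and event types below are hypothetical.
"""

# Hypothetical structured case record standing in for court-provided
# case-file data; it serves as the distant-supervision source.
CASE_RECORD = {
    "COURT": ["某市中级人民法院"],
    "COMPANY": ["甲公司"],
    "ADMIN": ["乙律师事务所"],
}

# Hypothetical trigger-word table: trigger -> event type.
TRIGGERS = {
    "受理": "BankruptcyAccepted",
    "宣告": "BankruptcyDeclared",
    "指定": "AdministratorAppointed",
}

# Hypothetical event dictionary: event type -> argument roles (entity labels).
EVENT_SCHEMA = {
    "BankruptcyAccepted": ["COURT", "COMPANY"],
    "BankruptcyDeclared": ["COURT", "COMPANY"],
    "AdministratorAppointed": ["COURT", "ADMIN", "COMPANY"],
}


def distant_label(sentence, record):
    """Character-level BIO tags from exact matches of known entity names.

    No manual annotation is used: occurrences of names listed in the
    record are labeled automatically, the core idea of distant supervision.
    """
    tags = ["O"] * len(sentence)
    for label, names in record.items():
        for name in names:
            if not name:
                continue  # an empty name would never advance the search
            start = sentence.find(name)
            while start != -1:
                end = start + len(name)
                # Keep the first label assigned; skip overlapping matches.
                if all(t == "O" for t in tags[start:end]):
                    tags[start] = "B-" + label
                    for i in range(start + 1, end):
                        tags[i] = "I-" + label
                start = sentence.find(name, end)
    return tags


def decode_spans(sentence, tags):
    """Collect (label, text) entity spans from a BIO tag sequence."""
    spans, label, start = [], None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if label is not None and not tag.startswith("I-"):
            spans.append((label, sentence[start:i]))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def extract_events(sentence, tags):
    """Detect events by trigger lookup, then fill roles from entity spans."""
    spans = decode_spans(sentence, tags)
    events = []
    for trigger, event_type in TRIGGERS.items():
        if trigger in sentence:
            roles = EVENT_SCHEMA[event_type]
            args = {lab: text for lab, text in spans if lab in roles}
            events.append({"type": event_type, "trigger": trigger, **args})
    return events


if __name__ == "__main__":
    for s in ["某市中级人民法院裁定受理甲公司的破产清算申请",
              "本院指定乙律师事务所担任甲公司管理人"]:
        # Stand-in for NER output: a trained tagger would predict these tags.
        tags = distant_label(s, CASE_RECORD)
        print(extract_events(s, tags))
```

In a full system, a sequence-labeling model trained on the distantly labeled sentences would supply the tags for unseen rulings. Exact string matching is the simplest form of distant supervision and misses abbreviated or variant entity names, which is presumably why a learned NER stage sits between labeling and event extraction.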

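Cite this article

YANG Jiale, WANG Junhao, QIAN Weining, LUO Yifeng. Automatic extraction of corporate bankruptcy-related events from ruling documents [J]. Journal of East China Normal University (Natural Science), 2020, 2020(4): 88-97. DOI: 10.3969/j.issn.1000-5641.201921015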

