Journal of East China Normal University (Natural Science), 2018, Vol. 2018, Issue (5): 1-16. doi: 10.3969/j.issn.1000-5641.2018.05.001
• Review Articles •
YU Ruo-nan, HUANG Ding-jiang, DONG Qi-wen
Received: 2018-06-27
Online: 2018-09-25
Published: 2018-09-26
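Corresponding author: HUANG Ding-jiang, male, professor. Research interests: machine learning and artificial intelligence, and the analysis and application of big data in cross-disciplinary fields such as computational finance. E-mail: djhuang@dase.ecnu.edu.cn
About the author: YU Ruo-nan, female, master's student. Research interests: deep learning and object detection. E-mail: yrn130814232@163.com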
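Abstract: Driven by big-data applications and by improvements in computer hardware performance, deep-learning-based object detection and image segmentation algorithms have broken through the bottlenecks of traditional algorithms and become the mainstream approach in computer vision. Scene text detection, shaped by these advances in object detection and image segmentation, has also seen major breakthroughs in recent years. This survey has three main aims: to introduce the progress of scene text detection over the past five years; to compare and analyze the strengths and weaknesses of state-of-the-art algorithms; and to summarize the benchmark datasets and evaluation methods relevant to the field.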
YU Ruo-nan, HUANG Ding-jiang, DONG Qi-wen. Survey on scene text detection based on deep learning[J]. Journal of East China Normal University (Natural Science), 2018, 2018(5): 1-16.