[1] ZHU Y, YAO C, BAI X. Scene text detection and recognition:Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1):19-36.
[2] YE Q, DOERMANN D. Text detection and recognition in imagery:A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(7):1480-1500.
[3] WANG K, BELONGIE S. Word spotting in the wild[C]//Computer Vision-ECCV 2010. Berlin:Springer, 2010:591-604.
[4] NEUMANN L, MATAS J. Scene text localization and recognition with oriented stroke detection[C]//2013 IEEE International Conference on Computer Vision. IEEE, 2013:97-104.
[5] JADERBERG M, VEDALDI A, ZISSERMAN A. Deep features for text spotting[C]//Computer Vision-ECCV 2014. Cham:Springer, 2014:512-528.
[6] WANG T, WU D J, COATES A, et al. End-to-end text recognition with convolutional neural networks[C]//Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). 2012:3304-3308.
[7] EPSHTEIN B, OFEK E, WEXLER Y. Detecting text in natural scenes with stroke width transform[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2010:2963-2970.
[8] MATAS J, CHUM O, URBAN M, et al. Robust wide baseline stereo from maximally stable extremal regions[J]. Image and Vision Computing, 2004, 22(10):761-767.
[9] YAO C, BAI X, LIU W, et al. Detecting texts of arbitrary orientations in natural images[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012:1083-1090.
[10] KANG L, LI Y, DOERMANN D. Orientation robust text line detection in natural images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014:4034-4041.
[11] YIN X C, YIN X, HUANG K, et al. Robust text detection in natural scene images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5):970-983.
[12] YIN X C, PEI W Y, ZHANG J, et al. Multi-orientation scene text detection with adaptive clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1930-1937.
[13] CHO H, SUNG M, JUN B. Canny text detector:Fast and robust scene text localization algorithm[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016:3566-3573.
[14] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014:580-587.
[15] GIRSHICK R. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015:1440-1448.
[16] REN S, HE K, GIRSHICK R, et al. Faster R-CNN:Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149.
[17] DAI J, LI Y, HE K, et al. R-FCN:Object detection via region-based fully convolutional networks[C]//Advances in Neural Information Processing Systems 29. NIPS, 2016:379-387.
[18] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once:Unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016:779-788.
[19] LIU W, ANGUELOV D, ERHAN D, et al. SSD:Single shot MultiBox detector[C]//European Conference on Computer Vision. Cham:Springer, 2016:21-37.
[20] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems 25. NIPS, 2012:1097-1105.
[21] UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2):154-171.
[22] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]//Computer Vision-ECCV 2014. Cham:Springer, 2014:346-361.
[23] REDMON J, FARHADI A. YOLO9000:Better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017:6517-6525.
[24] REDMON J, FARHADI A. YOLOv3:An incremental improvement[J]. arXiv preprint, arXiv:1804.02767v1[cs.CV] 8 Apr 2018.
[25] CIRESAN D, GIUSTI A, GAMBARDELLA L M, et al. Deep neural networks segment neuronal membranes in electron microscopy images[G]//Advances in Neural Information Processing Systems 25. Curran Associates, Inc, 2012:2843-2851.
[26] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015:3431-3440.
[27] LI Y, QI H, DAI J, et al. Fully convolutional instance-aware semantic segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017:4438-4446.
[28] HE K, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017:2980-2988.
[29] TIAN Z, HUANG W, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]//European Conference on Computer Vision. Cham:Springer, 2016:56-72.
[30] ZHONG Z, JIN L, ZHANG S, et al. DeepText:A unified framework for text proposal generation and text detection in natural images[J]. arXiv preprint, arXiv:1605.07314v1[cs.CV] 24 May 2016.
[31] JIANG Y, ZHU X, WANG X, et al. R2CNN:Rotational region CNN for orientation robust scene text detection[J]. arXiv preprint, arXiv:1706.09579v2[cs.CV] 30 Jun 2017.
[32] MA J, SHAO W, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. arXiv preprint, arXiv:1703.01086v3[cs.CV] 15 Mar 2018.
[33] ZHANG S, LIU Y, JIN L, et al. Feature enhancement network:A refined scene text detector[J]. arXiv preprint, arXiv:1711.04249v1[cs.CV] 12 Nov 2017.
[34] GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J]. Neural Networks, 2005, 18(5/6):602-610.
[35] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[J]. arXiv preprint, arXiv:1409.4842v1[cs.CV] 17 Sep 2014.
[36] SHI B, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017:3482-3490.
[37] TIAN S, LU S, LI C. WeText:Scene text detection under weak supervision[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017:1501-1509.
[38] QIN S, MANDUCHI R. Cascaded segmentation-detection networks for word-level text spotting[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017:1275-1282.
[39] HU H, ZHANG C, LUO Y, et al. WordSup:Exploiting word annotations for character based text detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017:4950-4959.
[40] ZHANG Z, ZHANG C, SHEN W, et al. Multi-oriented text detection with fully convolutional networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016:4159-4167.
[41] HE T, HUANG W, QIAO Y, et al. Accurate text localization in natural image with cascaded convolutional text network[J]. arXiv preprint, arXiv:1603.09423v1[cs.CV] 31 Mar 2016.
[42] YAO C, BAI X, SANG N, et al. Scene text detection via holistic, multi-channel prediction[J]. arXiv preprint, arXiv:1606.09002v2[cs.CV] 5 Jul 2016.
[43] POLZOUNOV A, ABLAVATSKI A, ESCALERA S, et al. WordFence:Text detection in natural images with border awareness[C]//2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017:1222-1226.
[44] DENG D, LIU H, LI X, et al. PixelLink:Detecting scene text via instance segmentation[J]. arXiv preprint, arXiv:1801.01315v1[cs.CV] 4 Jan 2018.
[45] YANG Q, CHENG M, ZHOU W, et al. IncepText:A new inception-text module with deformable PSROI pooling for multi-oriented scene text detection[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI). 2018:1071-1077.
[46] DAI Y, HUANG Z, GAO Y, et al. Fused text segmentation networks for multi-oriented scene text detection[J]. arXiv preprint, arXiv:1709.03272v4[cs.CV] 7 May 2018.
[47] HE W, ZHANG X Y, YIN F, et al. Deep direct regression for multi-oriented scene text detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017:745-753.
[48] JIANG F, HAO Z, LIU X. Deep scene text detection with connected component proposals[J]. arXiv preprint, arXiv:1708.05133v1[cs.CV] 17 Aug 2017.
[49] ZHOU X, YAO C, WEN H, et al. EAST:An efficient and accurate scene text detector[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017:2642-2651.
[50] KIM K H, HONG S, ROH B, et al. PVANET:Deep but lightweight neural networks for real-time object detection[J]. arXiv preprint, arXiv:1608.08021v3[cs.CV] 30 Sep 2016.
[51] JADERBERG M, SIMONYAN K, VEDALDI A, et al. Reading text in the wild with convolutional neural networks[J]. International Journal of Computer Vision, 2016, 116(1):1-20.
[52] GUPTA A, VEDALDI A, ZISSERMAN A. Synthetic data for text localisation in natural images[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016:2315-2324.
[53] LIAO M, SHI B, BAI X, et al. TextBoxes:A fast text detector with a single deep neural network[C]//31st AAAI Conference on Artificial Intelligence. 2017:4161-4167.
[54] LI H, WANG P, SHEN C. Towards end-to-end text spotting with convolutional recurrent neural networks[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017:5248-5256.
[55] BUSTA M, NEUMANN L, MATAS J. Deep TextSpotter:An end-to-end trainable scene text localization and recognition framework[C]//2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017:2223-2231.
[56] LIAO M, SHI B, BAI X. TextBoxes++:A single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8):3676-3690.
[57] BARTZ C, YANG H, MEINEL C. SEE:Towards semi-supervised end-to-end scene text recognition[J]. arXiv preprint, arXiv:1712.05404v1[cs.CV] 14 Dec 2017.
[58] LIU X, LIANG D, YAN S, et al. FOTS:Fast oriented text spotting with a unified network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2018:5676-5685.
[59] JADERBERG M, SIMONYAN K, VEDALDI A, et al. Synthetic data and artificial neural networks for natural scene text recognition[J]. arXiv preprint, arXiv:1406.2227v4[cs.CV] 9 Dec 2014.
[60] SHI B, BAI X, YAO C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11):2298-2304.
[61] GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification:Labelling unsegmented sequence data with recurrent neural networks[C]//Proceedings of the 23rd International Conference on Machine Learning. New York:ACM, 2006:369-376.
[62] JADERBERG M, SIMONYAN K, ZISSERMAN A. Spatial transformer networks[C]//Advances in Neural Information Processing Systems 28. NIPS, 2015:2017-2025.
[63] LUCAS S M, PANARETOS A, SOSA L, et al. ICDAR 2003 robust reading competitions:Entries, results, and future directions[J]. International Journal of Document Analysis and Recognition (IJDAR), 2005, 7(2/3):105-122.
[64] LUCAS S M. ICDAR 2005 text locating competition results[C]//8th International Conference on Document Analysis and Recognition (ICDAR'05). 2005:80-84.
[65] SHAHAB A, SHAFAIT F, DENGEL A. ICDAR 2011 robust reading competition challenge 2:Reading text in scene images[C]//2011 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2011:1491-1496.
[66] KARATZAS D, SHAFAIT F, UCHIDA S, et al. ICDAR 2013 robust reading competition[C]//International Conference on Document Analysis and Recognition. IEEE Computer Society, 2013:1484-1493.
[67] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]//International Conference on Document Analysis and Recognition. IEEE, 2015:1156-1160.
[68] NAYEF N, YIN F, BIZID I, et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017:1454-1459.
[69] LEE S, CHO M S, JUNG K, et al. Scene text extraction with edge constraint and text collinearity[C]//2010 20th International Conference on Pattern Recognition. 2010:3983-3986.
[70] NAGY R, DICKER A, MEYER-WEGENER K. NEOCR:A configurable dataset for natural image text recognition[C]//Camera-Based Document Analysis and Recognition. Berlin:Springer, 2011:150-163.
[71] YI C, TIAN Y. Text string detection from natural scenes by structure-based partition and grouping[J]. IEEE Transactions on Image Processing, 2011, 20(9):2594-2605.
[72] RISNUMAWAN A, SHIVAKUMARA P, CHAN C S, et al. A robust arbitrary text detection system for natural scene images[J]. Expert Systems with Applications, 2014, 41(18):8027-8048.
[73] YAO C, BAI X, LIU W. A unified framework for multioriented text detection and recognition[J]. IEEE Transactions on Image Processing, 2014, 23(11):4737-4749.
[74] YIN X C, PEI W Y, ZHANG J, et al. Multi-orientation scene text detection with adaptive clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1930-1937.
[75] ZHANG S Y. Deep models and their application in visual text analysis[D]. Guangzhou:South China University of Technology, 2016. (in Chinese)
[76] VEIT A, MATERA T, NEUMANN L, et al. COCO-Text:Dataset and benchmark for text detection and recognition in natural images[J]. arXiv preprint, arXiv:1601.07140v2[cs.CV] 19 Jun 2016.
[77] SHI B, YAO C, LIAO M, et al. ICDAR2017 competition on reading Chinese text in the wild (RCTW-17)[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017:1429-1434.
[78] CHNG C K, CHAN C S. Total-text:A comprehensive dataset for scene text detection and recognition[C]//2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017:935-942.
[79] LIU Y L, JIN L W, ZHANG S T, et al. Detecting curve text in the wild:New dataset and new solution[J]. arXiv preprint, arXiv:1712.02170v1[cs.CV] 6 Dec 2017.
[80] YUAN T L, ZHU Z, XU K, et al. Chinese text in the wild[J]. arXiv preprint, arXiv:1803.00085v1[cs.CV] 28 Feb 2018.
[81] HUA X S, LIU W Y, ZHANG H J. An automatic performance evaluation protocol for video text detection algorithms[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2004, 14(4):498-507.
[82] WOLF C, JOLION J M. Object count/area graphs for the evaluation of object detection and segmentation algorithms[J]. International Journal of Document Analysis and Recognition (IJDAR), 2006, 8(4):280-296.
[83] EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The PASCAL visual object classes challenge:A retrospective[J]. International Journal of Computer Vision, 2015, 111(1):98-136.