




Recognition of mathematical formulas based on support vector machines

  • LIU Ting-ting ,
  • CHENG Tao ,
  • JIN Gang-zeng ,
  • WANG Xi-kun ,
  • GAO Ming
  • 1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China;
    2. The High School Affiliated to Liaoning Normal University, Dalian, Liaoning 164500, China

Received date: 2018-08-06

  Online published: 2019-05-30




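LIU Ting-ting, CHENG Tao, JIN Gang-zeng, WANG Xi-kun, GAO Ming. Recognition of mathematical formulas based on support vector machines[J]. Journal of East China Normal University (Natural Science), 2019, 2019(3): 78-85. DOI: 10.3969/j.issn.1000-5641.2019.03.009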


The recognition of mathematical formulas is widely used in intelligent education applications, such as searching for answers to questions supplied as images, automatic marking, and constructing question databases. In many of these applications, mathematical formulas exist only as images; hence, identifying the formulas in these images is an important research topic in the field of intelligent education. Because mathematical formulas have a complex structure, however, recognizing them in images is far more complicated than a general optical character recognition task. This paper decomposes formula recognition into three steps: character segmentation, character recognition, and formula reconstruction. First, the characters are separated from an image by using a combination of projection and connected-domain methods. Second, the features of characters are extracted based on the proportion of pixels in a single character relative to pixels in all characters, and a supervised learning model is established to identify each character. Finally, the mathematical formula is reconstructed based on the location of each character in the formula. Experimental results on a real data set show that the proposed mathematical formula recognition method can achieve an accuracy of up to 98.0%.
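The segmentation and feature steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary image stored as a list of rows (1 for ink, 0 for background), uses only vertical projection (the connected-domain pass and the support vector machine classifier are omitted), and reads the pixel-proportion feature literally as the share of all ink pixels that falls in each character. The names `vertical_projection`, `segment_columns`, and `pixel_proportions` are hypothetical.

```python
def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def segment_columns(image):
    """Return (start, end) column spans of characters separated by blank columns.

    Projection-only segmentation; symbols that overlap horizontally would
    need an additional connected-component pass, which is omitted here.
    """
    spans, start = [], None
    for x, count in enumerate(vertical_projection(image)):
        if count > 0 and start is None:
            start = x                      # a character begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # a character ends (end is exclusive)
            start = None
    if start is not None:                  # character touching the right edge
        spans.append((start, len(image[0])))
    return spans


def pixel_proportions(image, spans):
    """Share of all ink pixels that falls inside each character span.

    Assumption: one literal reading of the feature described in the abstract.
    """
    total = sum(map(sum, image))
    return [sum(sum(row[a:b]) for row in image) / total for a, b in spans]


# Toy image with two "characters" separated by blank columns.
image = [
    [0, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
]
spans = segment_columns(image)             # spans are already in left-to-right order
features = pixel_proportions(image, spans)
```

Because the spans come out ordered by horizontal position, they also give the simplest possible basis for reconstruction of a single-line formula; handling superscripts, subscripts, and fractions would additionally require the vertical position of each character.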


