Journal of East China Normal University(Natural Sc ›› 2019, Vol. 2019 ›› Issue (3): 78-85. doi: 10.3969/j.issn.1000-5641.2019.03.009

• Computer Science •

Recognition of mathematical formulas based on support vector machines

LIU Ting-ting1, CHENG Tao1, JIN Gang-zeng1, WANG Xi-kun2, GAO Ming1   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China;
    2. The High School Affiliated to Liaoning Normal University, Dalian Liaoning 164500, China
  • Received: 2018-08-06 Online: 2019-05-25 Published: 2019-05-30

Abstract: Mathematical formula recognition is widely used in intelligent education applications, such as searching for answers to questions supplied as images, automatic marking, and constructing question databases. In many of these applications, formulas exist only as images; hence, identifying the formulas in these images is an important research topic in the field of intelligent education. Given the complex structure of mathematical formulas, however, recognizing them in images is far more complicated than a general optical character recognition task. This paper decomposes formula recognition into three steps: character segmentation, character recognition, and formula reconstruction. First, the characters are separated from an image using a combination of projection and connected-domain methods. Second, features of each character are extracted based on the proportion of pixels in a single character relative to the pixels in all characters, and a supervised learning model is trained to identify each character. Finally, the mathematical formula is reconstructed from the location of each character in the formula. Experimental results on a real data set show that the proposed method achieves an accuracy of up to 98.0%.
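
The segmentation and feature steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the projection part of the segmentation (no connected-domain pass), it assumes the image is already binarized into rows of 0/1 values, and it reads the "proportion of pixels" feature literally as each segment's share of all foreground pixels. The function names `column_projection`, `segment_columns`, and `ink_proportions` are hypothetical and chosen for illustration.

```python
def column_projection(image):
    """Count foreground (1) pixels in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def segment_columns(image):
    """Split the image at empty columns.

    Returns (start, end) column ranges, end exclusive, one per run of
    non-empty columns. Assumption: characters do not overlap horizontally.
    """
    spans = []
    start = None
    for x, count in enumerate(column_projection(image)):
        if count > 0 and start is None:
            start = x                      # a run of ink begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # the run ended at an empty column
            start = None
    if start is not None:                  # run reaches the right edge
        spans.append((start, len(image[0])))
    return spans


def ink_proportions(image, spans):
    """Share of all foreground pixels that falls inside each span.

    Assumption: a literal reading of the abstract's pixel-proportion feature.
    """
    total = sum(sum(row) for row in image)
    if total == 0:
        return []
    return [sum(sum(row[a:b]) for row in image) / total for a, b in spans]


# Toy 3x9 binary image containing three glyph-like blobs.
image = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
]

spans = segment_columns(image)
features = ink_proportions(image, spans)
print(spans)
print(features)
```

In a fuller pipeline, feature vectors of this kind would be passed to a supervised classifier such as an SVM, and the column ranges in `spans` would supply the positions needed to reassemble the formula. Symbols that overlap horizontally, such as fractions or radicals, are the case where a connected-domain pass becomes necessary.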

Key words: mathematical formula recognition, support vector machine (SVM), optical character recognition (OCR)
