A Learning-Based Approach for Word Segmentation in Text Document Images


  • Jean-Pierre Lomaliza, Department of Electronic Engineering, Pukyong National University, Busan, South Korea
  • Hanhoon Park, Department of Electronic Engineering, Pukyong National University, Busan, South Korea
  • Kwang-Seok Moon, Department of Electronic Engineering, Pukyong National University, Busan, South Korea


Word Segmentation, Deep Learning, Space Classification, Locally Likely Arrangement Hashing, Document Image Retrieval


In conventional document image retrieval (DIR) systems based on locally likely arrangement hashing (LLAH), word detection is sensitive to the distance between the camera and the text document; for example, a single word may be detected as several words when the camera is too close. Such systems therefore work well only when the distance at which the text document was registered is similar to the distance at retrieval time. Moreover, they were implemented in desktop setups, where the distance problem may not arise because the camera is rigidly attached to the computer. In this paper, a new word segmentation approach is proposed to increase the robustness of LLAH-based DIR systems so that they can be implemented on a mobile platform, where the distance between the camera and the text document changes easily. The proposed method uses a deep neural network to classify each space between connected components as a between-word space or an intra-word space. In experiments, the proposed method successfully detected the same words at different camera distances and orientations, with the neural networks reaching a classification accuracy as high as 92.5%. Moreover, it showed higher robustness than state-of-the-art methods when implemented on a mobile platform.
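
The space-classification idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bounding-box format, the two gap features, the function names, and the fixed 0.35 threshold are all assumptions made for illustration, and `threshold_classifier` is only a stand-in for the trained deep neural network described above. The sketch assumes that the connected components of one text line have already been extracted as (x, y, width, height) boxes, and it normalises each gap by the median component height so that the features do not change when the camera moves closer or farther away.

```python
from statistics import median
from typing import Callable, List, Tuple

# Hypothetical box format: (x, y, width, height) of one connected component.
Box = Tuple[float, float, float, float]
Features = Tuple[float, float]


def gap_features(left: Box, right: Box, line_height: float) -> Features:
    """Scale-normalised features for the space between two neighbouring components.

    These two features are an assumption; the abstract does not list the
    features the authors feed to their network.
    """
    gap = right[0] - (left[0] + left[2])        # horizontal space in pixels
    mean_width = (left[2] + right[2]) / 2.0     # average width of the two neighbours
    return (gap / line_height, mean_width / line_height)


def threshold_classifier(features: Features) -> bool:
    """Stand-in for the trained network: True means between-word space.

    The 0.35 cut-off is an arbitrary illustrative value, not a reported one.
    """
    return features[0] > 0.35


def segment_words(
    boxes: List[Box],
    is_word_gap: Callable[[Features], bool] = threshold_classifier,
) -> List[List[Box]]:
    """Group the connected components of one text line into words."""
    if not boxes:
        return []
    ordered = sorted(boxes, key=lambda b: b[0])   # left-to-right reading order
    line_height = median(b[3] for b in ordered)   # proxy for the text scale
    words = [[ordered[0]]]
    for left, right in zip(ordered, ordered[1:]):
        if is_word_gap(gap_features(left, right, line_height)):
            words.append([right])       # between-word space: start a new word
        else:
            words[-1].append(right)     # intra-word space: extend the current word
    return words


# Two words of two components each, seen from far away and from three times closer.
far = [(0, 0, 10, 20), (12, 0, 10, 20), (40, 0, 10, 20), (52, 0, 10, 20)]
near = [tuple(3 * v for v in box) for box in far]
words_far = segment_words(far)
words_near = segment_words(near)
```

Because every feature is divided by the line height, `far` and `near` produce identical feature values and therefore the same grouping, which is the behaviour the abstract asks of a distance-robust word detector. A trained model could be passed in through the `is_word_gap` parameter in place of the threshold rule.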


How to Cite

Lomaliza, J.-P., Park, H., & Moon, K.-S. (2018). A Learning-Based Approach for Word Segmentation in Text Document Images. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 10(3), 1–7. Retrieved from https://jtec.utem.edu.my/jtec/article/view/3289