Comparative Evaluation of String Metrics for Context Ontology Database

Authors

  • Farhanah Atiqah Norki, Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia.
  • Radziah Mohamad, Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia.
  • Noraini Ibrahim, Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia.

Keywords:

Comparative Evaluation, Context Ontology, String Matching, String Similarity

Abstract

Static Context Code Coverage Program (SCCCP) is a program developed to calculate the coverage of context code in the Java files of an Android application. The database built for SCCCP contains records of Android-specific location and speech context. Because strings extracted from the source code must be checked for similarity against the database records before context coverage can be calculated, SCCCP requires a string matching algorithm. Three string metrics were therefore analyzed before the most suitable one was chosen for SCCCP. This paper analyzes the results of the Jaro-Winkler, Levenshtein, and Strike a Match string metrics on the task of matching source code against database records, as well as on other pairs of strings. Issues encountered during the source code matching experiment are also discussed. The findings indicate that the Strike a Match algorithm is the best option, as it gave the highest accuracy of the three metrics.
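The three metrics compared in the abstract can be sketched as follows. This is a minimal Python illustration of their textbook definitions, not the SCCCP implementation. It assumes a prefix scale of p = 0.1 and a maximum prefix of four characters for Jaro-Winkler, normalization by the longer string for Levenshtein similarity, and the word-letter-pair formulation described by Simon White for Strike a Match. The function names and the sample Android identifiers used in the demo are hypothetical choices for illustration.

```python
"""Three string metrics as commonly defined (an illustrative sketch, not SCCCP's code)."""
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Distance mapped to [0, 1] by dividing by the longer length (one common convention)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(a: str, b: str) -> float:
    """Jaro similarity: matching characters within a window, penalized by transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(i + window + 1, len(b))):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    t = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b) + (matches - t) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro score boosted by the length of the common prefix (capped at 4 characters)."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def _word_letter_pairs(s: str) -> list:
    """Adjacent character pairs inside each whitespace-separated word, upper-cased."""
    pairs = []
    for word in s.upper().split():
        pairs.extend(word[k:k + 2] for k in range(len(word) - 1))
    return pairs


def strike_a_match(a: str, b: str) -> float:
    """2 * |shared letter pairs| / (|pairs in a| + |pairs in b|), pairs counted as a multiset."""
    pa, pb = Counter(_word_letter_pairs(a)), Counter(_word_letter_pairs(b))
    total = sum(pa.values()) + sum(pb.values())
    if total == 0:
        # Assumption: strings with no letter pairs are compared for exact (case-folded) equality.
        return 1.0 if a.upper() == b.upper() else 0.0
    return 2 * sum((pa & pb).values()) / total


if __name__ == "__main__":
    # Hypothetical pair: an identifier found in source code vs. a database record.
    source, record = "requestLocationUpdates", "requestSingleUpdate"
    for name, fn in [("Levenshtein", levenshtein_similarity),
                     ("Jaro-Winkler", jaro_winkler),
                     ("Strike a Match", strike_a_match)]:
        print(f"{name:>14}: {fn(source, record):.3f}")
```

In this sketch only `strike_a_match` folds case, following White's description. The other two functions compare characters exactly, so case normalization before matching identifiers would be a separate design decision.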

Published

2017-10-20

How to Cite

Norki, F. A., Mohamad, R., & Ibrahim, N. (2017). Comparative Evaluation of String Metrics for Context Ontology Database. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 9(3-3), 7–11. Retrieved from https://jtec.utem.edu.my/jtec/article/view/2864