Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper

W. M. F. Wan Tamlikha; B. Ranaivo-Malançon; S. Chua

Authors

W. M. F. Wan Tamlikha Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia.
B. Ranaivo-Malançon Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia.
S. Chua Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia.

Keywords:

J48, Naïve Bayes, Named Entity, Sarawak Gazette, SVM-SMO

Abstract

To accelerate the annotation of named entities (NEs) in historical newspapers such as the Sarawak Gazette, two options are available: a fully automatic approach or a semi-automatic one. This paper presents a fully automatic annotation of the NEs occurring in the Sarawak Gazette. First, a subset of the newspaper is fed to ANNIE, an established rule-based named entity recognizer (NER). The pre-annotated corpus is then used as training and testing data for three supervised NER classifiers, based respectively on Naïve Bayes, J48 decision trees, and SVM-SMO. None of these classifiers is fully accurate, but SVM-SMO and J48 appear to perform better than Naïve Bayes. A detailed study of the errors made by SVM-SMO and J48 therefore led to a set of ad hoc rules that correct these errors automatically. The proposed approach is promising, although further experiments are needed to refine the rules.
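The pipeline described in the abstract (rule-based pre-annotation used as "silver" training data, a supervised token classifier, then ad hoc correction rules) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper's feature set, its Weka classifiers (J48 is Weka's C4.5 implementation; SMO is Weka's SVM trained with sequential minimal optimization), and its actual correction rules are not given here. The sketch swaps in a hand-written multinomial Naive Bayes, hypothetical features (`features`), invented toy data standing in for ANNIE output, and a hypothetical honorific rule (`TITLES`, `correct`).

```python
import math
from collections import Counter, defaultdict


def features(tokens, i):
    """Hypothetical token features: word, capitalisation, previous word."""
    prev = tokens[i - 1] if i > 0 else "<S>"
    return [
        "w=" + tokens[i].lower(),
        "cap=" + str(tokens[i][:1].isupper()),
        "prev=" + prev.lower(),
    ]


class NaiveBayesNER:
    """Multinomial Naive Bayes over feature strings with add-one smoothing."""

    def fit(self, sentences, tag_seqs):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, tags in zip(sentences, tag_seqs):
            for i, tag in enumerate(tags):
                feats = features(tokens, i)
                self.label_counts[tag] += 1
                self.feat_counts[tag].update(feats)
                self.vocab.update(feats)
        self.n_tokens = sum(self.label_counts.values())
        return self

    def predict(self, tokens):
        v = len(self.vocab)
        out = []
        for i in range(len(tokens)):
            feats = features(tokens, i)
            best, best_score = None, -math.inf
            for label, n in self.label_counts.items():
                counts = self.feat_counts[label]
                denom = sum(counts.values()) + v
                # log prior + sum of smoothed log likelihoods
                score = math.log(n / self.n_tokens)
                score += sum(math.log((counts[f] + 1) / denom) for f in feats)
                if score > best_score:
                    best, best_score = label, score
            out.append(best)
        return out


# Hypothetical post-correction rule in the spirit of the paper's ad hoc rules:
# a capitalised token directly after an honorific is relabelled PERSON.
TITLES = {"mr.", "mrs.", "tuan"}


def correct(tokens, tags):
    fixed = list(tags)
    for i in range(1, len(tokens)):
        if tokens[i - 1].lower() in TITLES and tokens[i][:1].isupper():
            fixed[i] = "PERSON"
    return fixed


# Invented toy "silver" data standing in for ANNIE pre-annotations.
silver_tokens = [
    ["Kuching", "is", "quiet"],
    ["Mr.", "Brooke", "visited", "Kuching"],
    ["the", "Resident", "left", "Sibu"],
]
silver_tags = [
    ["LOCATION", "O", "O"],
    ["O", "PERSON", "O", "LOCATION"],
    ["O", "O", "O", "LOCATION"],
]

model = NaiveBayesNER().fit(silver_tokens, silver_tags)
sentence = ["Mr.", "Ong", "reached", "Kuching"]
raw = model.predict(sentence)
final = correct(sentence, raw)
print(raw)
print(final)
```

On this toy data the classifier mislabels the unseen name "Ong" as LOCATION (capitalised words are mostly locations in the tiny training set), and the honorific rule relabels it PERSON. This mirrors the design choice in the abstract: rather than hand-correcting classifier output, recurring error patterns are turned into automatic rules.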

Published

2017-09-15

How to Cite

Wan Tamlikha, W. M. F., Ranaivo-Malançon, B., & Chua, S. (2017). Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 9(2-10), 41–46. Retrieved from https://jtec.utem.edu.my/jtec/article/view/2704

Issue

Vol. 9 No. 2-10: Technology Transforming Lives II

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
