Solving the Difficult Problem of Topic Extraction in Thai Tweets

Rungsiman Nararatwong; Roberto Legaspi; Nagul Cooharojananone; Hitoshi Okada; Hiroshi Maruyama

Authors

Rungsiman Nararatwong: The Graduate University for Advanced Studies, Kanagawa, Japan; National Institute of Informatics, Tokyo, Japan
Roberto Legaspi: Research Organization of Information and Systems, Transdisciplinary Research Integration Center, The Institute of Statistical Mathematics, Tokyo, Japan
Nagul Cooharojananone: Chulalongkorn University
Hitoshi Okada: National Institute of Informatics, Tokyo, Japan
Hiroshi Maruyama: Research Organization of Information and Systems, Transdisciplinary Research Integration Center, The Institute of Statistical Mathematics, Tokyo, Japan

Keywords:

LDA, Topic Extraction, Thai Tweets

Abstract

In this study, we tackled the difficult problem of extracting topics from Thai tweets about the country’s historic flood in 2011. After we used Latent Dirichlet Allocation (LDA) to extract the topics, the first difficulty we faced was the inaccuracy of the word segmentation task, which affected our interpretation of the LDA result. To solve this, we used the LDA result to refine the stop word list, removing uninformative words produced by the word segmentation, which resulted in a more relevant and comprehensible outcome. With the improved results, we then constructed a rule-based categorization model and used it to categorize all the collected tweets on a weekly basis to observe changes in tweeting trends. Not only did the categories reveal the most relevant and compelling topics that people raised at that time, they also allowed us to understand how people perceived the situation as it unfolded over time.
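
The last two steps described in the abstract (filtering out uninformative tokens, then categorizing tweets with rules and counting them per week) can be illustrated with a minimal sketch. This is not the authors' model: the category names, keywords, extra stop words, and sample tokens below are hypothetical placeholders in English, and the sketch assumes the tweets have already been segmented into tokens.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical keyword rules; the abstract does not list the actual
# categories or rules used in the paper.
RULES = {
    "water_level": {"water", "level", "river"},
    "relief": {"donate", "shelter", "volunteer"},
}

# Hypothetical extra stop words, standing in for uninformative fragments
# that a word segmenter might produce.
EXTRA_STOP_WORDS = {"ka", "na"}


def clean(tokens, stop_words=EXTRA_STOP_WORDS):
    """Drop tokens that appear in the refined stop word list."""
    return [t for t in tokens if t not in stop_words]


def categorize(tokens, rules=RULES):
    """Return every category whose keyword set shares a token with the tweet."""
    token_set = set(tokens)
    matched = [name for name, keywords in rules.items() if token_set & keywords]
    return matched or ["other"]


def weekly_counts(tweets, rules=RULES):
    """Count categories per ISO (year, week) for (date, tokens) pairs."""
    counts = defaultdict(Counter)
    for day, tokens in tweets:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        for category in categorize(clean(tokens), rules):
            counts[week][category] += 1
    return counts


# Made-up sample data for illustration only.
sample = [
    (date(2011, 10, 10), ["water", "level", "na"]),
    (date(2011, 10, 12), ["donate", "ka"]),
    (date(2011, 10, 18), ["hello"]),
]
counts = weekly_counts(sample)
```

Grouping by ISO week is one possible way to obtain a per-week view; the abstract does not say how the weeks were delimited in the study.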

Published

2016-09-01

How to Cite

Nararatwong, R., Legaspi, R., Cooharojananone, N., Okada, H., & Maruyama, H. (2016). Solving the Difficult Problem of Topic Extraction in Thai Tweets. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 8(6), 141–145. Retrieved from https://jtec.utem.edu.my/jtec/article/view/1263

Issue

Vol. 8 No. 6: Emerging Technologies in Communication and Computer Engineering for Smart City Applications II

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
