Solving the Difficult Problem of Topic Extraction in Thai Tweets

Authors

  • Rungsiman Nararatwong The Graduate University for Advanced Studies, Kanagawa, Japan. National Institute of Informatics, Tokyo, Japan
  • Roberto Legaspi Research Organization of Information and Systems, Transdisciplinary Research Integration Center, The Institute of Statistical Mathematics, Tokyo, Japan
  • Nagul Cooharojananone Chulalongkorn University
  • Hitoshi Okada National Institute of Informatics, Tokyo, Japan
  • Hiroshi Maruyama Research Organization of Information and Systems, Transdisciplinary Research Integration Center, The Institute of Statistical Mathematics, Tokyo, Japan

Keywords:

LDA, Topic Extraction, Thai Tweets

Abstract

In this study, we tackled the difficult problem of topic extraction from Thai tweets about the country’s historic flood in 2011. After we used Latent Dirichlet Allocation (LDA) to extract the topics, the first difficulty we faced was the inaccuracy of the word segmentation task, which affected our interpretation of the LDA result. To solve this, we refined the stop word list using the LDA result, removing uninformative words caused by the word segmentation, which resulted in a more relevant and comprehensible outcome. With the improved results, we then constructed a rule-based categorization model and used it to categorize all the collected tweets on a per-week scale to observe changes in tweeting trends. Not only did the categories reveal the most relevant and compelling topics that people raised at that time, they also allowed us to understand how people perceived the situations as they unfolded over time.
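The last two steps described in the abstract (filtering with a refined stop word list, then rule-based categorization counted per week) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the category names, keyword rules, stop words, and sample tweets are hypothetical and are not taken from the paper, and the tokens are English stand-ins for the output of a Thai word segmenter. The LDA step itself is not shown.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical keyword rules; the actual rules are not given in the abstract.
RULES = {
    "water_level": {"water", "level", "flood"},
    "relief": {"donate", "volunteer", "shelter"},
}

# Hypothetical stop words, standing in for uninformative tokens
# (e.g. segmentation noise) identified by inspecting the LDA output.
STOP_WORDS = {"the", "a", "rt"}


def categorize(tokens):
    """Return the first category whose keywords overlap the tokens, else 'other'."""
    kept = [t for t in tokens if t not in STOP_WORDS]
    for category, keywords in RULES.items():
        if keywords.intersection(kept):
            return category
    return "other"


def weekly_counts(tweets):
    """Count categories per ISO (year, week); tweets is an iterable of (date, tokens)."""
    counts = defaultdict(Counter)
    for day, tokens in tweets:
        iso = day.isocalendar()
        counts[(iso[0], iso[1])][categorize(tokens)] += 1
    return counts


# Made-up, already-segmented sample tweets.
tweets = [
    (date(2011, 10, 24), ["rt", "water", "level", "rising"]),
    (date(2011, 10, 26), ["please", "donate", "food"]),
    (date(2011, 11, 2), ["the", "traffic", "today"]),
]
result = weekly_counts(tweets)
```

Grouping by ISO week is one possible reading of "per-week scale"; the study may have used a different week boundary.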

Published

2016-09-01

How to Cite

Nararatwong, R., Legaspi, R., Cooharojananone, N., Okada, H., & Maruyama, H. (2016). Solving the Difficult Problem of Topic Extraction in Thai Tweets. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 8(6), 141–145. Retrieved from https://jtec.utem.edu.my/jtec/article/view/1263