Evaluating LSTM Networks, HMM and WFST in Malay Part-of-Speech Tagging

Tien-Ping Tan; Bali Ranaivo-Malançon; Laurent Besacier; Yin-Lai Yeong; Keng Hoon Gan; Enya Kong Tang

Authors

Tien-Ping Tan, School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia.
Bali Ranaivo-Malançon, Faculty of Computer Science & Information Technology, Universiti Malaysia Sarawak, Sarawak, Malaysia.
Laurent Besacier, LIG, Université Grenoble Alpes, CNRS, Grenoble, France.
Yin-Lai Yeong, School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia.
Keng Hoon Gan, School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia.
Enya Kong Tang, School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia.

Keywords:

Malay Part-of-Speech Tagging, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) Networks, Sequence-to-Sequence Learning

Abstract

Long short-term memory (LSTM) networks have been gaining popularity in sequence modeling tasks such as phoneme recognition, speech translation, language modeling, speech synthesis, and chatbot-like dialog systems. This paper investigates attention-based encoder-decoder LSTM networks for Malay part-of-speech (POS) tagging and compares them with a weighted finite state transducer (WFST) and a hidden Markov model (HMM). The appeal of LSTM networks lies in their strength in modeling long-distance dependencies. Malay POS tagging is examined under two conditions: with and without morphological information. The experimental results show that LSTM networks trained without any explicit morphological knowledge perform nearly as well as the WFST and better than the HMM approach trained with morphological information.
