Wavelet-based Parametric Feature Subset Selection for Speaker and Accent Recognition using Genetic Algorithm

Authors

  • Rokiah Abdullah, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis (UniMAP), Kampus Pauh Putra, 02600 Arau, Perlis, Malaysia
  • Vikneswaran Vijean, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis (UniMAP), Kampus Pauh Putra, 02600 Arau, Perlis, Malaysia
  • Hariharan Muthusamy, Department of Electronic Engineering, National Institute of Technology, Srinagar (Garhwal), Uttarakhand, India
  • Farah Nazlia Che Kassim, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis (UniMAP), Kampus Pauh Putra, 02600 Arau, Perlis, Malaysia
  • Zulkapli Abdullah, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis (UniMAP), Kampus Pauh Putra, 02600 Arau, Perlis, Malaysia
  • Mohammad Nazri Md Noor, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis (UniMAP), Kampus Pauh Putra, 02600 Arau, Perlis, Malaysia
  • Jamaludin A. R. Rawi, Kolej Komuniti Bandar Darulaman, No. 17, Bandar Darulaman Jaya, 06000 Jitra, Kedah, Malaysia

Keywords:

Feature Selection, Genetic Algorithm, Speaker and Accent Recognition, Wavelet Packet Transform

Abstract

Research on speaker and accent recognition for the Malay language in the field of Automatic Speech Recognition (ASR) is limited, with most studies focusing on speech recognition. This study aims to improve the performance of Malaysian speaker and accent recognition using wavelet transforms, namely the Wavelet Packet Transform (WPT) and the Dual-Tree Complex Wavelet Packet Transform (DT-CWPT). A variety of feature extraction combinations, including conventional Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) and wavelet transforms, were implemented to compare the effectiveness of the proposed method. Although the proposed approach improved the detection rate, it suffered from high feature dimensionality and increased computation time. To address these issues, a Genetic Algorithm (GA) was adopted to reduce the number of irrelevant features, accelerate the learning system and achieve better performance. The extracted features were used to train several classifiers, including k-Nearest Neighbors (k-NN), Support Vector Machine (SVM) and Extreme Learning Machine (ELM). The experimental results showed that the best speaker recognition accuracy was 97.33% for English numbers using the SVM classifier and 96.02% for Malay words using the ELM classifier, both with a combination of wavelet, LPC and MFCC features. For accent recognition, the ELM classifier yielded the best performance, achieving 95.28% accuracy for English numbers with a combination of wavelet and MFCC features and 96.72% for Malay words with a combination of wavelet, LPC and MFCC features. It can be concluded that Malay words yielded better recognition rates than English numbers. Furthermore, the use of GA effectively reduced the overall number of features while maintaining a high accuracy level.
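The GA-based feature subset selection summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the GA settings, the fitness function, or the classifier parameters, so everything below is an assumption made for illustration. Each chromosome is a binary mask over the feature vector, fitness is the leave-one-out accuracy of a 1-nearest-neighbor classifier (a stand-in for the k-NN, SVM and ELM classifiers) minus a small per-feature penalty, and the data are synthetic. The names make_toy_data, knn_loo_accuracy, fitness and ga_select are hypothetical.

```python
import random


def make_toy_data(n_per_class=20, n_features=8, seed=1):
    """Synthetic stand-in for extracted features: only features 0 and 1 are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 4.0 * label
            row[1] += 4.0 * label
            X.append(row)
            y.append(label)
    return X, y


def knn_loo_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Assumed fitness: accuracy minus a small cost per selected feature."""
    return knn_loo_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=16, generations=15, p_cross=0.8, p_mut=0.05, seed=0):
    """Return the best binary feature mask found by a simple elitist GA."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # avoid re-evaluating identical masks
            cache[key] = fitness(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            # binary tournament selection for each parent
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randint(1, n - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best


X, y = make_toy_data()
best_mask = ga_select(X, y)
print("selected features:", [j for j, b in enumerate(best_mask) if b])
print("LOO accuracy:", knn_loo_accuracy(X, y, best_mask))
```

The per-feature penalty is one simple way to express the trade-off described above between keeping accuracy high and shrinking the feature set; other formulations, such as a hard cap on subset size, would serve the same purpose.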

Published

2023-03-29

How to Cite

Abdullah, R., Vijean, V., Muthusamy, H., Che Kassim, F. N., Abdullah, Z., Md Noor, M. N., & Rawi, J. A. R. (2023). Wavelet-based Parametric Feature Subset Selection for Speaker and Accent Recognition using Genetic Algorithm. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 15(1), 29–37. Retrieved from https://jtec.utem.edu.my/jtec/article/view/6258

Funding data

  • Universiti Malaysia Perlis
    Grant numbers Dr Noriha Basir, Senior Lecturer of Centre for Liberal Sciences, Faculty of Applied and Human Sciences