Big Data Analytics: Feature Selection and Machine Learning for Intrusion Detection on Microsoft Azure Platform
Keywords: Big Data, Feature Selection, Intrusion Detection
Abstract: In recent years, the volume of network data has grown at an exponential rate. An intrusion detection system that handles such a massive dataset needs a platform that provides both storage and computing capacity. This research used cloud analytics on Microsoft Azure, which provides a suitable environment, to store a big dataset, preprocess the data, classify it and evaluate the results. Because of the growth in data volume, intrusion detection models that adopt data mining techniques have been used to detect intrusion patterns. Our research used mutual information and chi-square as feature selection techniques to reduce the feature set and thus the computation time. Then, decision forest and neural network classifiers were used to classify the attack type of each intrusion, using the 100% KDD CUP 1999 dataset. Intrusion detection performance was measured by the accuracy with which attack types were detected, as reported by the evaluation process in Microsoft Azure.
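The feature selection step named in the abstract can be illustrated with a minimal sketch. It is not the authors' Azure workflow: it only shows how mutual information and the chi-square statistic score a discrete feature against a class label, and how the top-ranked features are kept. The toy records, the feature values, the helper name select_top_k and the choice of k are assumptions made for illustration; the classification step (decision forest, neural network) is not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs only; unobserved pairs contribute 0.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def chi_square(xs, ys):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n
            stat += (pxy.get((x, y), 0) - expected) ** 2 / expected
    return stat


def select_top_k(columns, labels, score, k):
    """Hypothetical helper: rank feature names by score and keep the best k."""
    ranked = sorted(
        columns, key=lambda name: score(columns[name], labels), reverse=True
    )
    return ranked[:k]


# Toy records (assumed data, loosely modelled on KDD-style categorical fields).
labels = ["normal"] * 4 + ["attack"] * 4
columns = {
    # Perfectly separates the two classes.
    "flag": ["SF", "SF", "SF", "SF", "S0", "S0", "S0", "S0"],
    # Partially informative.
    "protocol_type": ["tcp", "tcp", "udp", "tcp", "tcp", "icmp", "icmp", "icmp"],
    # Independent of the label.
    "noise": ["a", "b", "a", "b", "a", "b", "a", "b"],
}

for name, score in (("mutual information", mutual_information),
                    ("chi-square", chi_square)):
    print(name, "keeps:", select_top_k(columns, labels, score, k=2))
```

On this toy data both scores rank the features in the same order, so the uninformative column is the one dropped. On real data the two rankings can differ, which is one reason to compare both techniques.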
TRANSFER OF COPYRIGHT AGREEMENT
The manuscript is herewith submitted for publication in the Journal of Telecommunication, Electronic and Computer Engineering (JTEC). It has not been published before, and it is not under consideration for publication in any other journals. It contains no material that is scandalous, obscene, libelous or otherwise contrary to law. When the manuscript is accepted for publication, I, as the author, hereby agree to transfer to JTEC, all rights including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author(s) specifically retain(s):
- All proprietary right other than copyright, such as patent rights
- The right to make further copies of all or part of the published article for my use in classroom teaching
- The right to reuse all or part of this manuscript in a compilation of my own works or in a textbook of which I am the author; and
- The right to make copies of the published work for internal distribution within the institution that employs me
I agree that copies made under these circumstances will continue to carry the copyright notice that appears in the original published work. I agree to inform my co-authors, if any, of the above terms. I certify that I have obtained written permission for the use of text, tables, and/or illustrations from any copyrighted source(s), and I agree to supply such written permission(s) to JTEC upon request.