Hierarchical Density-based Clustering of Malware Behaviour
Keywords: Anomaly Detection, Automated Dynamic Malware Analysis, Clustering, Malware Behaviour
Abstract
The number and diversity of malware variants have grown exponentially over the years, creating a need to analyse large numbers of malware samples more efficiently. To address this problem, we propose a framework for the automatic analysis of a malware sample’s dynamic properties using a clustering technique. The framework also supports outlier discovery, abnormal behaviour analysis, and discrimination between malware variants. We also created a module that normalises malware labels obtained from VirusTotal, giving consistent labels for accurate analysis of malware families and types. An evaluation model for the proposed framework is also discussed. Ultimately, the proposed framework is intended to enable rapid analysis of malware samples and lead to better protection against malware.
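The abstract does not say how the label normalisation module works. As an illustration only, the sketch below shows one common approach: split each antivirus vendor's label into tokens, discard generic words, and take a majority vote on the remaining token. The vendor names, the stop-list in `GENERIC_TOKENS`, the minimum token length, and the two-vendor agreement threshold are all assumptions made for this example, not details taken from the paper.

```python
import re
from collections import Counter

# Hypothetical stop-list of generic tokens; a practical list would be longer.
GENERIC_TOKENS = {
    "trojan", "worm", "virus", "backdoor", "generic",
    "malware", "variant", "agent", "heur", "application",
}


def tokens(label):
    """Split a raw vendor label into lowercase alphabetic tokens of length >= 4."""
    return [t for t in re.split(r"[^a-z]+", label.lower()) if len(t) >= 4]


def normalise_family(vendor_labels):
    """Return the most common non-generic token across vendors, or None.

    vendor_labels maps a vendor name to the raw label that vendor assigned.
    """
    votes = Counter()
    for label in vendor_labels.values():
        # Count each token once per vendor so one long label cannot dominate.
        for tok in set(tokens(label)):
            if tok not in GENERIC_TOKENS:
                votes[tok] += 1
    if not votes:
        return None
    family, count = votes.most_common(1)[0]
    # Assumed threshold: at least two vendors must agree on the token.
    return family if count >= 2 else None


# Illustrative labels for a single sample (vendor names are placeholders).
sample_labels = {
    "VendorA": "Trojan.Win32.Zbot.abc",
    "VendorB": "Win32/Zbot.gen!A",
    "VendorC": "Trojan-Spy.Zbot",
    "VendorD": "Generic.Malware.xyz",
}
family = normalise_family(sample_labels)
```

Three of the four placeholder vendors mention `zbot`, so it wins the vote. When no token reaches the agreement threshold, the function returns `None`, which leaves the sample unlabelled instead of forcing it into a family.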
TRANSFER OF COPYRIGHT AGREEMENT
The manuscript is herewith submitted for publication in the Journal of Telecommunication, Electronic and Computer Engineering (JTEC). It has not been published before, and it is not under consideration for publication in any other journals. It contains no material that is scandalous, obscene, libelous or otherwise contrary to law. When the manuscript is accepted for publication, I, as the author, hereby agree to transfer to JTEC, all rights including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author(s) specifically retain(s):
- All proprietary rights other than copyright, such as patent rights
- The right to make further copies of all or part of the published article for my use in classroom teaching
- The right to reuse all or part of this manuscript in a compilation of my own works or in a textbook of which I am the author; and
- The right to make copies of the published work for internal distribution within the institution that employs me
I agree that copies made under these circumstances will continue to carry the copyright notice that appears in the original published work. I agree to inform my co-authors, if any, of the above terms. I certify that I have obtained written permission for the use of text, tables, and/or illustrations from any copyrighted source(s), and I agree to supply such written permission(s) to JTEC upon request.