Malware Classification Based On Visualizing Binary Content of Samples

Document Type: Original Article

Authors

1 Master's degree, Babol Noshirvani University of Technology, Babol, Iran

2 Assistant Professor, Babol Noshirvani University of Technology, Babol, Iran

Abstract

Malware is a constant challenge of the modern world and is of particular importance because of the harm it causes to users. Over the last decade, malware has increased so much in number and complexity that current security tools and methods can no longer defend against it. Visualizing the binary content of malware and searching for malicious elements among suspicious image patterns is one of the newer methods, and it has made considerable progress in efficiency over the last decade thanks to deep learning algorithms. In this research, we combine several ideas from the field of malware image analysis and present an algorithm for classifying malware into their corresponding families. The proposed algorithm consists of three steps: visualizing the binary content of the malware executable file, applying the GIST descriptor, and classifying the extracted features with an SVM classifier. Although it uses traditional machine learning methods, the algorithm achieves results comparable to those of previous research, with average classification accuracies of 99.72% and 99.16% on the Malimg and Microsoft datasets, respectively.
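The first two steps of such a pipeline can be sketched as follows. This is a minimal illustration and not the authors' implementation: the image width, the nearest-neighbour resize to 64×64, the filter-bank parameters, and the helper names `bytes_to_image`, `resize_nearest`, `gabor_bank`, and `gist_like` are all assumptions made for the example, and the descriptor is a simplified GIST-style feature (oriented band-pass filter responses averaged over a 4×4 grid) rather than a reference GIST implementation.

```python
import numpy as np

def bytes_to_image(data: bytes, width: int = 64) -> np.ndarray:
    """Map each byte to one grayscale pixel (0-255), filling rows left to right."""
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = len(arr) // width
    if rows == 0:
        raise ValueError("input is shorter than one image row")
    return arr[: rows * width].reshape(rows, width)

def resize_nearest(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Nearest-neighbour resize to size x size so all samples share one shape."""
    r = np.arange(size) * img.shape[0] // size
    c = np.arange(size) * img.shape[1] // size
    return img[np.ix_(r, c)].astype(np.float64)

def gabor_bank(size: int = 64, scales: int = 4, orientations: int = 8) -> np.ndarray:
    """Oriented band-pass filters defined in the frequency domain (assumed parameters)."""
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    filters = []
    for s in range(scales):
        f0 = 0.3 / (1.85 ** s)  # centre frequency per scale, illustrative choice
        radial = np.exp(-((radius / f0 - 1.0) ** 2) / (2 * 0.35 ** 2))
        for o in range(orientations):
            theta = np.pi * o / orientations
            # angular distance to the filter orientation, wrapped to [-pi, pi]
            d_theta = np.angle(np.exp(1j * (angle - theta)))
            angular = np.exp(-(d_theta ** 2) / (2 * (np.pi / orientations) ** 2))
            filters.append(radial * angular)
    return np.stack(filters)

def gist_like(img: np.ndarray, bank: np.ndarray, grid: int = 4) -> np.ndarray:
    """Average each filter response over a grid x grid layout and concatenate."""
    spectrum = np.fft.fft2(img)
    cell = img.shape[0] // grid
    feats = []
    for filt in bank:
        response = np.abs(np.fft.ifft2(spectrum * filt))
        blocks = response.reshape(grid, cell, grid, cell).mean(axis=(1, 3))
        feats.append(blocks.ravel())
    return np.concatenate(feats)

# Stand-in for the raw bytes of an executable (hypothetical sample data).
sample = bytes(range(256)) * 64
img = resize_nearest(bytes_to_image(sample))
features = gist_like(img, gabor_bank())
print(features.shape)  # (512,)
```

With these assumed settings each sample yields a 512-dimensional vector (4 scales × 8 orientations × 16 grid cells). In the third step, vectors of this kind, one per sample and labelled by malware family, would be used to train an SVM classifier; that step is omitted here because the abstract does not state the kernel or its parameters.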



  • Receive Date: 18 June 2024
  • Revise Date: 04 September 2024
  • Accept Date: 14 October 2024
  • Publish Date: 22 October 2024