Detecting of Botnets’ Malicious Domains with Deep Autoencoder Neural Network

Document Type : Original Article

Authors

1 Department of Computer Engineering, Khamneh Branch, Khamneh, Iran

2 Associate Professor, Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

3 Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran

Abstract

A botnet is a group of hosts infected with the same malicious code and managed by an attacker, or botmaster, through one or more command and control (C&C) servers. The new generation of botnets generates the list of C&C server domain names dynamically. This dynamic list, created by a domain generation algorithm (DGA), helps the attacker change C&C servers periodically and keeps their addresses from being blacklisted. Each infected host generates a large number of domain names using a predefined algorithm and attempts to resolve them to their corresponding addresses by sending queries to the domain name server. In this paper, a deep autoencoder neural network is used to identify such domains without any knowledge of the generating algorithm, and the performance of the proposed method is compared with that of machine learning algorithms. First, a new dataset is created by combining one dataset of normal domains with two datasets of malicious and abnormal domains, and both manual and automated methods are used to extract the features of the new dataset. The deep autoencoder neural network is then applied to the new and pre-processed datasets, and the results are compared with those of machine learning algorithms. The results show that malicious domains generated by domain generation algorithms can be identified with the deep autoencoder neural network at a higher speed and with an accuracy above 98.61%.
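To make the domain generation step and the manual feature extraction more concrete, the following is a minimal sketch and not the method of the paper. The generator `toy_dga`, its seed string, and the four features in `manual_features` are hypothetical choices made only for illustration; the abstract does not say which algorithms or features were actually used.

```python
from __future__ import annotations

import hashlib
import math
from collections import Counter
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12) -> list[str]:
    """Hypothetical DGA: derive `count` pseudo-random .com names from a seed and a date."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}-{day.isoformat()}-{i}".encode()).hexdigest()
        # Map each hex digit (0-15) to a letter in the range a-p.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + ".com")
    return domains


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def manual_features(domain: str) -> dict[str, float]:
    """Illustrative hand-crafted features of the leftmost label (assumed, not from the paper)."""
    label = domain.split(".")[0]
    n = len(label) or 1  # avoid division by zero for an empty label
    return {
        "length": float(len(label)),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
    }


if __name__ == "__main__":
    for name in ["google.com"] + toy_dga("example-seed", date(2020, 4, 2)):
        print(name, manual_features(name))
```

Because the generator is deterministic in its seed and date, every infected host computes the same candidate list on a given day, which is what lets the attacker register only one of the names. Feature vectors like these could then be fed to a classifier or an autoencoder; how the paper does this is not specified in the abstract.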
 

Keywords


Volume 9, Issue 1 - Serial Number 33
Serial No. 33, Spring Quarterly
April 2021
Pages 61-74
  • Receive Date: 02 April 2020
  • Revise Date: 11 June 2020
  • Accept Date: 05 August 2020
  • Publish Date: 21 April 2021