Comparison of Supervised Machine Learning Algorithms in Detection of Botnets Domain Generation Algorithms

Document Type : Original Article

Authors

1 Ph.D. Student, Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar; Faculty member of the Department of Computer Engineering, Khamneh Branch

2 Assistant Professor, Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran

3 Associate Professor, Department of Computer, Science and Technology University, Tehran, Iran

4 Assistant Professor, Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran

Abstract

Botnets use domain generation algorithms (DGAs) to produce rendezvous points with their command and control (C&C) servers. A DGA can continuously generate a large number of domains, which lets it evade traditional detection methods such as blacklists. Internet security vendors often rely on blacklists to detect botnets and malware, but a DGA can keep changing its domains faster than a blacklist can be updated. In this paper, three types of features (structural, statistical, and linguistic) are first extracted through feature engineering for DGA detection. A new dataset is then produced by combining one dataset of normal domains with two datasets of malicious DGA domains. Supervised machine learning algorithms are used to classify the domains, and their results are compared to determine a DGA detection model with a higher accuracy and a lower error rate. The results show that the random forest algorithm achieves an accuracy of 89.32%, a detection rate of 91.67%, and a receiver operating characteristic (ROC) value of 0.889. Random forest also presents a lower false positive rate (FPR), equal to 0.373, than the other algorithms investigated.
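To make the feature types and evaluation measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not list the exact features, so the four features here (label length, digit ratio, character entropy, vowel ratio) and the helper names `shannon_entropy`, `domain_features`, and `rates` are illustrative assumptions. The sketch also assumes that only the leftmost label of the domain is analysed.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def domain_features(domain: str) -> dict:
    """Hypothetical feature set: one or two examples per feature type."""
    # Assumption: analyse only the leftmost label, e.g. "google" in "google.com".
    label = domain.lower().split(".")[0]
    n = len(label)
    return {
        "length": n,  # structural
        "digit_ratio": sum(ch.isdigit() for ch in label) / n if n else 0.0,  # structural
        "entropy": shannon_entropy(label),  # statistical
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n if n else 0.0,  # linguistic
    }


def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, detection rate (recall), and false positive rate from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }
```

Feature dictionaries of this kind could be stacked into a matrix and passed to any supervised classifier. The `rates` helper shows how the three reported measures relate to the counts of true and false positives and negatives, with DGA domains treated as the positive class.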
 

Keywords

