Detecting Malicious Botnet Domains Using a Deep Autoencoder Neural Network

Asadi, Mehdi; Parsa, Saeed; Vosoughi, Vahid

Article type: Research article

Authors

¹ Instructor, Department of Computer Engineering, Khamneh Branch, Islamic Azad University, Khamneh, Iran

² Associate Professor, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

³ M.Sc., Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran

Abstract

A botnet is a group of hosts infected with the same malicious code and directed by an attacker, or botmaster, through one or more command-and-control (C&C) servers. In new-generation botnets, the list of C&C server domain names is generated dynamically. This dynamic list, produced by a domain generation algorithm, helps the attacker periodically relocate the C&C servers and keep their addresses off blacklists. Each infected host uses a predefined algorithm to generate a large number of domain names and tries to map them to their corresponding addresses by sending domain name server queries. In this paper, a deep autoencoder neural network is used to identify domains for which no knowledge of the generating algorithm is available, and the performance of the proposed method is compared with that of machine learning algorithms. First, a new dataset is built by combining one dataset of benign domains with two datasets containing malicious and abnormal domains, and two scenarios, manual and automatic, are used to extract features from the new dataset. The deep autoencoder neural network is then applied to the new, preprocessed dataset, and the results are examined against those of the machine learning algorithms. According to the results, the deep autoencoder neural network can identify malicious domains produced by domain generation algorithms faster and with an accuracy above 98.61%.
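To make the notion of a domain generation algorithm (DGA) concrete, the following is a minimal, purely illustrative sketch of how infected hosts could derive the same list of candidate names from a shared seed and the current date. The function name `toy_dga`, the seed string, the SHA-256 construction, and the reserved `.example` suffix are all assumptions chosen for illustration; they do not correspond to any real malware family or to the datasets used in the paper.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, tld: str = ".example") -> List[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Hypothetical construction: every host that knows `seed` and `day`
    computes the same list, so no fixed domain list has to be shipped.
    """
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits (0-15) onto the letters a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("demo-seed", date(2021, 1, 1)):
        print(name)
```

Because the output changes with the date, a static blacklist goes stale quickly, which is why detection from the names themselves is attractive.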

Keywords

DOR:20.1001.1.23224347.1400.9.1.5.2

Article title [English]

Detecting of Botnets’ Malicious Domains with Deep Autoencoder Neural Network

Authors [English]

M. Asadi ¹
S. Parsa ²
V. Vosoughi ³

¹ Department of Computer Engineering, Khamneh Branch, Islamic Azad University, Khamneh, Iran

² Associate Professor, Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

³ Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran

Abstract [English]

A botnet is a group of hosts infected with the same malicious code and managed by an attacker, or botmaster, through one or more command-and-control (C&C) servers. New-generation botnets generate the list of C&C server domain names dynamically. This dynamic list, created by a domain generation algorithm, helps the attacker periodically change the location of the C&C servers and prevent their addresses from being blacklisted. Each infected host generates a large number of domain names using a predefined algorithm and attempts to map them to their corresponding addresses by sending queries to the domain name server. In this paper, a deep autoencoder neural network is used to identify domains without any knowledge of their generating algorithm, and the performance of the proposed method is compared with that of machine learning algorithms. First, a new dataset is created by combining one dataset of normal domains with two datasets containing malicious and abnormal domains, and both manual and automated methods are used to extract features from the new dataset. The deep autoencoder neural network is applied to the new, preprocessed dataset, and the results are compared with those of the machine learning algorithms. Based on the obtained results, the deep autoencoder neural network can identify malicious domains generated by domain generation algorithms at a higher speed and with an accuracy above 98.61%.
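The abstract does not list the manually extracted features, so the following is only a sketch of what manual lexical feature extraction from a domain name might look like. The feature set (label length, Shannon entropy, vowel ratio, digit ratio) and the helper names `shannon_entropy` and `lexical_features` are assumptions chosen for illustration, not the paper's actual features. The sketch also assumes the input is a registered domain without a subdomain prefix, so the first label is the one of interest.

```python
import math
from collections import Counter
from typing import Dict


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(domain: str) -> Dict[str, float]:
    """Compute a few hypothetical lexical features from the first label."""
    label = domain.lower().split(".")[0]
    length = len(label)
    vowels = sum(ch in "aeiou" for ch in label)
    digits = sum(ch.isdigit() for ch in label)
    return {
        "length": float(length),
        "entropy": shannon_entropy(label),
        # Guard against an empty label to avoid division by zero.
        "vowel_ratio": vowels / length if length else 0.0,
        "digit_ratio": digits / length if length else 0.0,
    }


if __name__ == "__main__":
    for name in ("google.com", "xkqzjvwpbtrm.example"):
        print(name, lexical_features(name))
```

Algorithmically generated names tend to have higher character entropy and fewer vowels than human-chosen names, which is the intuition behind features of this kind; a feature vector like this could then be fed to a classifier or an autoencoder.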

Keywords [English]

Botnet
Domain Generation Algorithms (DGAs)
Feature Extraction
Deep Neural Network
Deep Autoencoder Neural Network
