Spam Email Detection Using an Improved Flow Direction Optimization Algorithm

Raie, Hojjat; Soleimanian Gharehchopogh, Farhad

Article type: Research article

Authors

¹ Master's degree, Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran

² Associate Professor, Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran

Abstract

With advances in science and technology, the use of the Internet, and of email in particular, has grown widely. Spam email is one of the most destructive attacks mounted by cyber attackers; it is used mainly to spread harmful content, including commercial advertising, computer viruses, and misleading information. Attackers typically target systems and servers with various kinds of malware and viruses, aiming to compromise system or email accounts or to gain unauthorized access to them. In this paper, an improved Flow Direction Algorithm is used for feature selection and the k-nearest neighbor algorithm is used to classify spam email. The Flow Direction Algorithm typically suffers from drawbacks such as getting trapped in local optima and a lack of population diversity. To strengthen it, chaos operators are used to increase population diversity and speed up convergence. The proposed method uses two chaotic maps: the circle map and the logistic map. The proposed model is evaluated on the Spambase dataset, which has 4601 samples and 57 features. The results show that the accuracy of the proposed model with the logistic map is higher than that of the other methods compared.
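The two chaotic maps named in the abstract can be illustrated with a short sketch. The abstract does not give the map parameters or say how chaotic values enter the Flow Direction Algorithm, so everything below is an assumption: `r = 4.0` for the logistic map, `a = 0.5` and `b = 0.2` for the circle map (values often seen in chaotic metaheuristics), and a hypothetical `chaotic_mask` helper that thresholds a chaotic sequence into a binary feature mask of length 57, the Spambase feature count.

```python
import math


def logistic_map(x0, n, r=4.0):
    """Return n iterates of x_{k+1} = r * x_k * (1 - x_k).

    r = 4.0 is an assumed control parameter, not taken from the paper.
    """
    xs = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def circle_map(x0, n, a=0.5, b=0.2):
    """Return n iterates of x_{k+1} = (x_k + b - a/(2*pi) * sin(2*pi*x_k)) mod 1.

    a and b are assumed values, not taken from the paper.
    """
    xs = []
    x = x0
    for _ in range(n):
        x = (x + b - (a / (2.0 * math.pi)) * math.sin(2.0 * math.pi * x)) % 1.0
        xs.append(x)
    return xs


def chaotic_mask(values, threshold=0.5):
    """Hypothetical helper: 1 keeps a feature, 0 drops it."""
    return [1 if v > threshold else 0 for v in values]


# One candidate feature subset per map, 57 bits each.
mask_logistic = chaotic_mask(logistic_map(0.7, 57))
mask_circle = chaotic_mask(circle_map(0.7, 57))
```

A chaotic sequence is deterministic yet non-repeating over long runs, which is one reason such maps are used in place of uniform random draws when the goal is a more diverse population.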

DOR

20.1001.1.23224347.1404.13.1.2.7

Subjects

Vulnerabilities and Threats in Cyberspace

Article Title [English]

An Improved Flow Direction Optimization Algorithm for Spam Email Detection

Authors [English]

Hojjat Raie ¹
Farhad Soleimanian Gharehchopogh ²

¹ Master's degree, Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran

² Associate Professor, Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran

Abstract [English]

With the advancement of science and technology, use of the Internet, and of email in particular, has increased significantly. Email spam has become one of the most prevalent forms of cyberattack, primarily used to disseminate malicious content, including commercial advertisements, computer viruses, and misleading information. Cyber attackers often target systems and servers with various types of malware and viruses to compromise systems or email accounts or to gain unauthorized access to them. This paper uses an improved Flow Direction Algorithm (FDA) for feature selection and the k-nearest neighbor algorithm for email spam classification. The FDA typically suffers from getting stuck in local optima and from a lack of population diversity. To enhance the FDA's capabilities, chaos operators are introduced to promote population diversity and speed up convergence. The proposed method employs two chaotic maps: the circle map and the logistic map. The proposed model was evaluated on the Spambase dataset, which consists of 4601 samples and 57 features. The results show that the proposed model, particularly with the logistic map, achieves higher accuracy than the other methods compared.
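To make the pairing of feature selection with k-nearest neighbor classification concrete, the sketch below scores a binary feature mask by the error of a small k-NN classifier on the selected columns. It is a minimal illustration, not the authors' implementation: the abstract does not state the fitness function, so the weighted form `alpha * error + (1 - alpha) * selected_ratio`, the values `k = 3` and `alpha = 0.99`, the helper names, and the toy data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def apply_mask(X, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


def fitness(mask, train_X, train_y, test_X, test_y, k=3, alpha=0.99):
    """Assumed wrapper fitness (lower is better): error rate plus a size penalty."""
    if not any(mask):
        return 1.0  # an empty feature subset is treated as the worst case
    tr = apply_mask(train_X, mask)
    te = apply_mask(test_X, mask)
    wrong = sum(knn_predict(tr, train_y, x, k) != y for x, y in zip(te, test_y))
    error = wrong / len(test_y)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1 - alpha) * ratio


# Toy data (hypothetical): column 0 separates the classes, column 1 is noise.
train_X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [0.9, 9.1]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05, 9.0], [1.05, 1.0]]
test_y = [0, 1]

score_one_feature = fitness([1, 0], train_X, train_y, test_X, test_y)
score_both_features = fitness([1, 1], train_X, train_y, test_X, test_y)
```

On Spambase the mask would have 57 entries, and an optimizer such as the FDA would search over masks to minimize this score. The size penalty means that, at equal error, the smaller subset is preferred.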

Keywords [English]

Spam email detection
feature selection
flow direction algorithm
chaos mapping
