Feature Selection Using a Combination of Genetic, Whale, and Ant Colony Algorithms for Machine-Learning-Based Software Fault Prediction

Article type: Research article

Authors

1 Assistant Professor, Imam Hossein Comprehensive University, Tehran, Iran

2 M.Sc., Imam Hossein Comprehensive University, Tehran, Iran

3 Researcher, Imam Hossein Comprehensive University, Tehran, Iran

Abstract

Software fault prediction methods are used to identify fault-prone modules in the early stages of software development. Machine learning techniques are currently the most widely used techniques in this field. High data dimensionality is one of the problems that affect the performance of machine learning algorithms. High dimensionality implies the presence of irrelevant or redundant features that may mislead the learning algorithm and thus reduce its accuracy. Low prediction accuracy causes some faulty modules to be detected late, which abnormally increases the effort and cost of fixing faults. Solving the high-dimensionality problem is therefore necessary to increase the accuracy of software fault prediction. To reduce data dimensionality, researchers use feature selection algorithms, which fall into two categories: filter-based and wrapper-based. Wrapper-based algorithms lead to prediction models with higher accuracy. These algorithms can use various methods to search for solutions, the best of which is metaheuristic search. Each metaheuristic algorithm has strengths and weaknesses, and researchers combine these algorithms to address the weaknesses. In this research, to address the weaknesses of each metaheuristic algorithm, a combination of three algorithms (genetic, ant colony, and whale optimization) is used for wrapper-based feature selection. Applying early software fault prediction methods before actual testing is one of the effective passive defense techniques for reducing the development costs of software systems. To evaluate the proposed method, 19 software projects were examined and tested, and the results were compared with other methods. The evaluation results show that the proposed method performs better than the other methods.

Article title [English]

Feature selection using a combination of Genetic-Whale-Ant colony algorithms for software fault prediction by machine learning

Authors [English]

  • Ali Karimi 1
  • Mohammadreza Irajimoghaddam 2
  • Esmaeil Bastami 3
1 Assistant Professor, Imam Hossein University, Tehran, Iran
2 Master's degree, Imam Hossein University (AS), Tehran, Iran
3 Researcher, Imam Hossein University, Tehran, Iran
Abstract [English]

Software fault prediction methods are used to predict fault-prone modules in the early stages of software development. Machine learning techniques are the most commonly used techniques in software fault prediction. High data dimensionality is one of the problems that affect the performance of machine learning algorithms. It implies the existence of irrelevant or redundant features that may mislead the learning algorithm and hence decrease its accuracy. Low accuracy in software fault prediction causes late detection of some faulty modules and, as a result, abnormally increases the effort and cost of fixing faults. Therefore, solving the high-dimensionality problem is necessary to increase the accuracy of software fault prediction. Researchers use feature selection algorithms for dimensionality reduction. Feature selection algorithms are divided into two types: filter-based and wrapper-based. Wrapper-based algorithms lead to prediction models with higher accuracy. These algorithms can use different methods to search for good solutions, the best of which is metaheuristic search. Each metaheuristic algorithm has strengths and weaknesses, so researchers combine these algorithms to address the weaknesses. In this research, to address the weaknesses of each metaheuristic algorithm, a combination of the genetic, ant colony, and whale optimization algorithms is used for wrapper-based feature selection. Applying early software fault prediction methods before actual testing is one of the effective passive defense techniques for reducing software system development costs. Nineteen software projects are used to evaluate the proposed method. Comparison of the results with other methods shows that the proposed method outperforms its counterparts.
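
To make the idea of wrapper-based feature selection with a metaheuristic search concrete, the following minimal sketch may help. It is not the authors' hybrid method: it uses only a plain genetic algorithm (no whale or ant colony component), a simple nearest-centroid classifier as the wrapped learner, and synthetic data. All function names, parameter values, and the fitness penalty of 0.01 are assumptions chosen for illustration.

```python
import random

def make_data(n=200, n_informative=3, n_noise=7, seed=0):
    """Synthetic two-class data (an assumption): the first columns carry signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_val, y_val, mask):
    """Wrapper step: fit a nearest-centroid classifier on the selected columns, return validation accuracy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])) for c in (0, 1)}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_val)

def genetic_feature_selection(X_train, y_train, X_val, y_val,
                              pop_size=20, generations=30, mutation_rate=0.1, seed=1):
    """Search over binary feature masks with a basic genetic algorithm."""
    rng = random.Random(seed)
    n_features = len(X_train[0])

    def fitness(mask):
        # Reward accuracy, lightly penalize large subsets (penalty weight is an assumption).
        acc = centroid_accuracy(X_train, y_train, X_val, y_val, mask)
        return acc - 0.01 * sum(mask) / n_features

    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: the best mask always survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best, fitness(best)

X, y = make_data()
X_train, y_train, X_val, y_val = X[:100], y[:100], X[100:], y[100:]
best_mask, best_score = genetic_feature_selection(X_train, y_train, X_val, y_val)
print("selected feature mask:", best_mask)
print("fitness:", round(best_score, 3))
```

Each candidate solution is a binary mask over the features, and its fitness is obtained by actually training and scoring the classifier on the selected columns, which is what distinguishes a wrapper from a filter. A hybrid approach such as the one described in the abstract would replace or augment the search loop with other metaheuristics while keeping this evaluation step.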

Keywords [English]

  • Software Fault Prediction
  • Feature Selection
  • Metaheuristic Algorithm
  • Genetic Algorithm
  • Whale Optimization Algorithm
  • Ant Colony Optimization Algorithm

Volume 10, Issue 1 - Serial Number 37
Serial Number 37, Spring quarterly issue
Khordad 1401
Pages 33-45
  • Received: 12 Ordibehesht 1400
  • Revised: 21 Tir 1400
  • Accepted: 05 Shahrivar 1400
  • Published: 01 Khordad 1401