Feature selection using a combination of Genetic-Whale-Ant colony algorithms for software fault prediction by machine learning

Document Type: Original Article

Authors

1 Assistant Professor, Imam Hossein University, Tehran, Iran

2 Master's degree, Imam Hossein University (AS), Tehran, Iran

3 Researcher, Imam Hossein University, Tehran, Iran

Abstract

Software fault prediction methods identify fault-prone modules in the early stages of software development, and machine learning techniques are the most common approach to this task. High data dimensionality, that is, the presence of irrelevant or redundant features, can mislead a learning algorithm and reduce its accuracy. Low prediction accuracy delays the detection of some faulty modules and thus sharply increases the effort and cost of fixing faults, so addressing the dimensionality problem is necessary to improve software fault prediction. Researchers use feature selection algorithms for dimensionality reduction. Feature selection algorithms fall into two categories, filter-based and wrapper-based; wrapper-based algorithms generally yield more accurate prediction models. Within a wrapper, different strategies can be used to search for good feature subsets, the most effective of which is metaheuristic search. Since each metaheuristic algorithm has its own strengths and weaknesses, researchers combine several of them to compensate for the weaknesses of each. In this research, a combination of the genetic, ant colony, and whale optimization algorithms is used as the wrapper feature selection to address the weaknesses of each individual metaheuristic. Applying early software fault prediction before the actual testing phase is also an effective passive defense technique for reducing software system development costs. Nineteen software projects are used to evaluate the proposed method, and comparison with other methods shows that the proposed approach outperforms its counterparts.
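To make the wrapper idea concrete, the following is a minimal sketch of wrapper-based feature selection driven by a single binary genetic algorithm, with cross-validated classifier accuracy as the fitness of a feature subset. The dataset (load_breast_cancer as a stand-in for a software fault dataset), the decision-tree classifier, and all GA parameters are illustrative assumptions; the sketch covers only the genetic-algorithm component, not the full genetic/whale/ant-colony hybrid proposed in the paper.

# Minimal illustrative sketch of wrapper-based feature selection with a
# binary genetic algorithm. Fitness = mean cross-validated accuracy of a
# classifier trained on the selected feature subset. Dataset, classifier,
# and GA parameters are placeholders, not the authors' experimental setup.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X, y = load_breast_cancer(return_X_y=True)   # stand-in for a fault dataset
n_features = X.shape[1]

def fitness(mask):
    """Wrapper fitness: mean 5-fold accuracy on the selected columns."""
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

pop_size, generations, mutation_rate = 20, 30, 0.05
population = rng.random((pop_size, n_features)) < 0.5      # random bit masks
scores = np.array([fitness(ind) for ind in population])

def pick():
    # binary tournament selection over the current population
    i, j = rng.integers(pop_size, size=2)
    return population[i] if scores[i] >= scores[j] else population[j]

for _ in range(generations):
    children = []
    for _ in range(pop_size):
        p1, p2 = pick(), pick()
        cut = rng.integers(1, n_features)                  # one-point crossover
        child = np.concatenate([p1[:cut], p2[cut:]])
        flip = rng.random(n_features) < mutation_rate      # bit-flip mutation
        child = np.where(flip, ~child, child)
        children.append(child)
    population = np.array(children)
    scores = np.array([fitness(ind) for ind in population])

best = population[scores.argmax()]
print(f"best accuracy: {scores.max():.3f} with {best.sum()} of {n_features} features")

In the full hybrid, the same subset-evaluation loop would be shared by the genetic, whale, and ant colony components so that each metaheuristic's search behavior compensates for the others' weaknesses.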

Keywords



Volume 10, Issue 1 (Serial No. 37), Spring Quarterly
May 2022
Pages 33-45
  • Received: 02 May 2021
  • Revised: 12 July 2021
  • Accepted: 27 August 2021
  • Published: 22 May 2022