A model for feature selection in software fault prediction based on memetic algorithm and fuzzy logic

Document Type : Original Article

Authors

1 Master's student, Faculty of Computer, Imam Hossein University, Tehran, Iran

2 Assistant Professor, Imam Hossein University, Tehran, Iran

3 Researcher, Semnan University, Semnan, Iran

Abstract

Because of the high costs involved, it is not feasible to test every part of a software system comprehensively. If the fault-prone parts are identified before testing, however, the testing effort can be concentrated on them, which reduces cost. Identifying fault-prone components is the main purpose of software fault prediction. A prediction model receives software modules, described by their features, as input and predicts which ones are prone to faults. Machine learning techniques are commonly used to build these models, and their performance depends heavily on the training dataset. Training datasets usually contain many software features, some of which are irrelevant or redundant; such features are removed with feature selection methods. In this research, a new wrapper-based feature selection method is proposed that combines a memetic algorithm, the random forest technique, and a new evaluation criterion based on a fuzzy inference system. The results show that the proposed fuzzy evaluation criterion outperforms the existing criteria and improves the effectiveness of feature selection. The ultimate goal of this research is a robust, high-performance model for software fault prediction, and the comparison results show that the proposed model outperforms the other models.
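As a rough illustration of the kind of pipeline the abstract describes, the sketch below combines a genetic search over binary feature masks with a bit-flip local search (the memetic step), uses a random forest as the wrapper classifier, and scores each subset with a small Sugeno-style fuzzy rule base over accuracy and feature reduction. This is a minimal sketch, not the paper's implementation: the synthetic dataset, membership functions, rule consequents, and all parameters are illustrative assumptions.

```python
# Minimal sketch (assumed parameters, not the authors' method): memetic
# wrapper feature selection with a random-forest classifier and a small
# Sugeno-style fuzzy scorer standing in for the paper's fuzzy criterion.
import random
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = random.Random(0)
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
N_FEATURES = X.shape[1]

def mu_high(x):
    """Membership in the fuzzy set 'high' over [0, 1] (shoulder at 0.5)."""
    return min(max((x - 0.5) / 0.5, 0.0), 1.0)

def fuzzy_score(accuracy, reduction):
    """Zero-order Sugeno inference: fuse classifier accuracy and the
    fraction of features removed into one subset-quality score."""
    acc_hi, red_hi = mu_high(accuracy), mu_high(reduction)
    acc_lo, red_lo = 1.0 - acc_hi, 1.0 - red_hi
    rules = [(min(acc_hi, red_hi), 1.0),   # accurate and compact -> excellent
             (min(acc_hi, red_lo), 0.7),   # accurate but large   -> good
             (acc_lo, 0.2)]                # inaccurate           -> poor
    return sum(w * c for w, c in rules) / sum(w for w, _ in rules)

def fitness(mask):
    """Wrapper evaluation of a binary feature mask."""
    cols = [i for i, m in enumerate(mask) if m]
    if not cols:
        return 0.0
    clf = RandomForestClassifier(n_estimators=30, random_state=0)
    acc = cross_val_score(clf, X[:, cols], y, cv=3).mean()
    return fuzzy_score(acc, 1.0 - len(cols) / N_FEATURES)

def local_search(mask, score, tries=3):
    """Memetic refinement: accept single-bit flips that improve the score."""
    for _ in range(tries):
        cand = mask[:]
        cand[rng.randrange(N_FEATURES)] ^= 1
        s = fitness(cand)
        if s > score:
            mask, score = cand, s
    return mask, score

# Genetic loop: tournament selection, one-point crossover, mutation,
# local search on offspring, elitist survivor selection.
scored = [(ind, fitness(ind)) for ind in
          ([rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(8))]
for _ in range(5):
    children = []
    for _ in range(len(scored)):
        a, b = (max(rng.sample(scored, 2), key=lambda t: t[1])[0]
                for _ in range(2))
        cut = rng.randrange(1, N_FEATURES)
        child = [bit ^ (rng.random() < 0.05) for bit in a[:cut] + b[cut:]]
        children.append(local_search(child, fitness(child)))
    scored = sorted(scored + children, key=lambda t: t[1],
                    reverse=True)[:len(scored)]

best_mask, best_score = scored[0]
print("score %.3f, selected features: %s"
      % (best_score, [i for i, m in enumerate(best_mask) if m]))
```

In the paper's setting, the synthetic dataset would be replaced by a software fault dataset (module-level metrics with a faulty/non-faulty label) and the toy rule base by the fuzzy inference criterion proposed in the article.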

Keywords


Volume 9, Issue 3 (Serial No. 35), Autumn Quarterly
December 2021
Pages 143-163
  • Receive Date: 02 February 2021
  • Revise Date: 14 April 2021
  • Accept Date: 29 May 2021
  • Publish Date: 22 November 2021