Improving code smell detection accuracy based on gray wolf algorithm and majority voting

Document Type : Original Article

Authors

1 Assistant Professor, Imam Hossein University (AS), Tehran, Iran

2 Master's degree, Imam Hossein University, Tehran, Iran

Abstract

A code smell is a surface symptom that may indicate a deeper problem in an application. Code smells make a program harder to maintain, extend and evolve. The presence of a code smell does not necessarily mean that the software is malfunctioning, but it may slow processing and increase the risk of failures and software errors. One effective way to raise software quality is refactoring, that is, restructuring the code, which is directly tied to the code smells it contains. Much research has been done on identifying and removing code smells in software systems, and four types have attracted the most attention from researchers: long method, feature envy, god class and data class. Researchers use feature selection algorithms to increase the prediction accuracy of code smell detection and to reduce the dimensionality of the data. In this article, the gray wolf algorithm is used to select an optimal subset of features. Feature selection makes the model simpler, improves accuracy and reduces training time. To identify and classify code smells, a detection model is built from three machine learning algorithms: nearest neighbor, decision tree and support vector machine. The final output of the model is determined by a majority voting mechanism. The improved version of the Fontana dataset is used to evaluate the proposed method, with precision, accuracy, recall and F-measure as the evaluation criteria. The results of the proposed method are then compared with those of other related methods. The experiments show that the proposed method performs acceptably compared to the other methods.
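The majority voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the label encoding (1 = smelly, 0 = clean) and the three prediction lists are hypothetical, and the predictions stand in for the outputs of already-trained nearest neighbor, decision tree and support vector machine classifiers.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` holds one list of labels per classifier; all lists
    must cover the same samples in the same order.
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    combined = []
    for votes in zip(*predictions):
        # Pick the label with the most votes for this sample. With three
        # classifiers and two labels, a tie cannot occur.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions for five code samples (1 = smelly, 0 = clean).
knn_pred = [1, 0, 1, 0, 1]   # nearest neighbor
tree_pred = [1, 1, 0, 0, 1]  # decision tree
svm_pred = [0, 0, 1, 0, 1]   # support vector machine

final_pred = majority_vote([knn_pred, tree_pred, svm_pred])
print(final_pred)  # [1, 0, 1, 0, 1]
```

Using an odd number of binary classifiers, as here, guarantees that every sample has a strict majority, so no tie-breaking rule is needed.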


