Evaluating Deep Learning Models for Test Data Generation In File Based Fuzzers

Taghavi, Mohammad Taghi; Bagheri, Masood

Article type: Research article

Authors

¹ PhD student, Imam Hossein University (AS), Tehran, Iran

² Assistant Professor, Imam Hossein University (AS), Tehran, Iran

Abstract

Fuzzing means repeatedly executing the program under test with mutated inputs in order to find vulnerabilities. When the inputs of the program under test have a complex structure, generating mutated inputs for fuzzing is not easy. The best solution in such cases is to use the input structure of the program under test to generate test data precisely. The difficulty is that documentation of the input structure may not be available. Moreover, human understanding of such complex structures is very difficult, costly, time-consuming, and error-prone. To overcome these problems, machine learning and deep neural networks have been proposed for automatically learning the complex structure of program inputs and generating test data that conforms to it. One of the main challenges in this area is using a learning model suited to the intended application. This paper examines deep learning models suitable for learning and generating test data in file-based fuzzers. It also evaluates these models by introducing appropriate parameters for assessing performance. On this basis, recurrent neural networks and their derivatives are selected as the best deep learning models for text data. The parameters considered for evaluating the performance of the deep learning models include training time, the models' loss value at training time and evaluation time. The loss value, as the main parameter, is evaluated once across different deep learning models with the same structure and once across the same deep learning models with different structures, and the best deep learning model is selected and presented.

Keywords

20.1001.1.23224347.1401.10.2.6.2

Article Title [English]

Evaluating Deep Learning Models for Test Data Generation In File Based Fuzzers

Authors [English]

Mohammad Taghi Taghavi ¹
Masood Bagheri ²

¹ PhD student, Imam Hossein University (AS), Tehran, Iran

² Assistant Professor, Imam Hossein University (AS), Tehran, Iran

Abstract [English]

Fuzzing means repeatedly running the program under test with modified inputs, with the aim of finding its vulnerabilities. If the program has a complex input structure, generating modified inputs for fuzzing is not an easy task. The best solution in such cases is to use the input structure of the program under test to produce accurate test data. The problem is that documentation of the input structure of the program under test may not be available. Human understanding of such complex structures is also hard to achieve, costly, time-consuming, and prone to errors. To overcome the above problems, this research proposes the use of machine learning and deep neural networks, which automatically learn the complex structures of program inputs and generate test data tailored to this structure. One of the main challenges in this field is choosing a deep learning model that suits the intended application. In this paper, suitable deep learning models for learning and test data generation in file-based fuzzers are studied. The models are also evaluated by introducing and applying appropriate performance evaluation parameters. On this basis, recurrent neural networks and their variants are selected as the best deep learning models for text data. The parameters considered for performance evaluation include the training time, loss value in training and evaluation time. The loss value, as the main parameter, is evaluated once across various deep learning models with the same structure and again across the same deep learning models with various structures, and the best deep learning model is selected and proposed.
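The fuzzing loop described in the first sentence of the abstract can be illustrated with a minimal sketch. It is not the method evaluated in the paper: it uses plain random byte mutation rather than a learned model, and the target `toy_parser`, its `<length>:<payload>` record format, and its deliberate bug are all hypothetical, introduced only for illustration.

```python
import random
from typing import List


def toy_parser(data: bytes) -> None:
    """Hypothetical program under test: reads a '<length>:<payload>' record."""
    header, sep, payload = data.partition(b":")
    if not sep:
        raise ValueError("missing separator")  # input rejected cleanly
    length = int(header)  # non-numeric header raises ValueError (clean rejection)
    if length:
        # Deliberate bug: the declared length is trusted without a bounds check,
        # so an oversized length raises IndexError, which we treat as a crash.
        _ = payload[length - 1]


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of the seed with one randomly chosen byte overwritten."""
    data = bytearray(seed)
    pos = rng.randrange(len(data))
    data[pos] = rng.randrange(256)
    return bytes(data)


def fuzz(seed: bytes, iterations: int, rng_seed: int = 0) -> List[bytes]:
    """Run the target repeatedly on mutated inputs and collect crashing inputs."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            toy_parser(candidate)
        except ValueError:
            continue  # malformed input handled gracefully by the target
        except IndexError:
            crashes.append(candidate)  # unhandled fault: record the input
    return crashes


if __name__ == "__main__":
    found = fuzz(b"5:hello", iterations=20000)
    print(len(found), "crashing inputs, for example:", found[:3])
```

Most mutants here are rejected by the parser before they reach the faulty code, which is the inefficiency that structure-aware or learned test data generation aims to reduce for inputs with complex formats.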

Keywords [English]

Fuzzing
Deep Learning
Text Test Data Generation
Performance Evaluation


Volume 10, Issue 2 - Serial Number 38
Serial Number 38, Summer Quarterly
Mehr 1401
Pages 57-73

Article History

Received: 21 Ordibehesht 1400
Revised: 13 Azar 1400
Accepted: 18 Mordad 1401
Published: 01 Mehr 1401
