The Semantic Segmentation of Autonomous Vehicles Images with the Teacher-Student Technique

Article type: Research article

Authors

  • Amir Khosravian 1
  • Masoud Masih-Tehrani 2
  • Abdollah Amirkhani 2
1 M.Sc., University of Science and Technology, Tehran, Iran
2 Assistant Professor, University of Science and Technology, Tehran, Iran

Abstract

Semantic segmentation is one of the most common image-processing outputs for vision-based autonomous vehicles. Deep learning models need large amounts of data to learn new environmental features from different domains, yet manually labeling data at this scale is very time-consuming. While many papers train deep learning models with supervised learning, this paper performs semantic segmentation with a semi-supervised method. More specifically, the teacher-student technique is used to establish an interaction between deep learning models. First, the DABNet and ContextNet models are trained as teacher networks on the BDD100K dataset. Because generalization and robustness are critical for models used in autonomous vehicles, these properties of the teacher networks are evaluated through simulations in the CARLA simulator. The teacher networks then train the FastSCNN model on the Cityscapes dataset through semi-supervised learning, entirely without human intervention in the training process. Unlike other semi-supervised approaches, the use of two datasets with a noticeable domain shift makes the teacher-student technique even more challenging. The results show that for the vehicle, pedestrian, and road classes, whose detection is among the highest priorities of an autonomous vehicle, the student model's performance differs from that obtained with manual labels by only 1.2%, 3%, and 3.8%, respectively. In addition, the student model's mean intersection over union (mIoU) is only 4.5% lower than that of a similar model trained in a fully supervised manner on a dataset that takes a very long time to prepare.
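The abstract does not specify how the two teachers' outputs are turned into training labels for FastSCNN, so the Python sketch below shows one common pseudo-labeling scheme under stated assumptions: the teachers' per-pixel softmax probabilities are averaged, pixels below a hypothetical confidence threshold of 0.7 are marked with an ignore value of 255, and the function names `ensemble_pseudo_labels`, `per_class_iou`, and `mean_iou` are illustrative, not taken from the paper. It also computes per-class IoU and mIoU, the metric family behind the reported gaps. Plain Python lists stand in for image tensors to keep the example dependency-free.

```python
import math
from typing import List, Sequence

# Hypothetical label value for pixels the student should not learn from
# (255 is a common "ignore" value in Cityscapes-style training pipelines).
IGNORE_INDEX = 255


def ensemble_pseudo_labels(
    teacher_probs: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.7,
    ignore_index: int = IGNORE_INDEX,
) -> List[int]:
    """Turn several teachers' per-pixel softmax outputs into hard pseudo-labels.

    teacher_probs[t][p][c] is teacher t's probability of class c at pixel p.
    Averaging and the confidence threshold are assumptions of this sketch.
    """
    num_teachers = len(teacher_probs)
    labels: List[int] = []
    for pixel in range(len(teacher_probs[0])):
        num_classes = len(teacher_probs[0][pixel])
        # Average the class probabilities of all teachers at this pixel.
        avg = [
            sum(t[pixel][c] for t in teacher_probs) / num_teachers
            for c in range(num_classes)
        ]
        best = max(range(num_classes), key=avg.__getitem__)
        # Keep confident predictions; mark the rest as "ignore" for the student loss.
        labels.append(best if avg[best] >= threshold else ignore_index)
    return labels


def per_class_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = IGNORE_INDEX,
) -> List[float]:
    """IoU_c = TP / (TP + FP + FN), skipping pixels whose target is ignore_index."""
    ious: List[float] = []
    for c in range(num_classes):
        tp = fp = fn = 0
        for p, t in zip(pred, target):
            if t == ignore_index:
                continue
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1
            elif t == c:
                fn += 1
        denom = tp + fp + fn
        # A class absent from both prediction and target has no defined IoU.
        ious.append(tp / denom if denom else math.nan)
    return ious


def mean_iou(ious: Sequence[float]) -> float:
    """Average the IoU over classes that have a defined value."""
    valid = [v for v in ious if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


if __name__ == "__main__":
    # Two toy "teachers", 4 pixels, 3 classes (e.g. road, vehicle, person).
    teacher_a = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
    teacher_b = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.4, 0.4, 0.2], [0.2, 0.1, 0.7]]
    print(ensemble_pseudo_labels([teacher_a, teacher_b]))  # [0, 1, 255, 2]

    # Compare a toy student prediction against manual labels.
    ious = per_class_iou([0, 1, 1, 1], [0, 1, 1, 2], num_classes=3)
    print(ious, mean_iou(ious))
```

Under these assumptions the student is trained only on confident teacher predictions, and `per_class_iou` and `mean_iou` can then compare the student's output with manual Cityscapes labels. The abstract does not state which per-class metric underlies the 1.2%, 3%, and 3.8% figures, so treating them as IoU gaps is itself an assumption.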

Keywords

  • Autonomous vehicles
  • Convolutional neural networks
  • Semantic segmentation
  • Teacher-student technique
Volume 9, Issue 4 (Serial No. 36), Winter quarterly issue
Esfand 1400 (Solar Hijri calendar)
Pages 1-19
  • Received: 14 Mehr 1399
  • Revised: 18 Tir 1400
  • Accepted: 19 Tir 1400
  • Published: 01 Esfand 1400