Semantic Segmentation of Autonomous Vehicle Images with the Teacher-Student Technique

Document Type : Original Article

Authors

1 M.Sc., University of Science and Technology, Tehran, Iran

2 Assistant Professor, University of Science and Technology, Tehran, Iran

Abstract

Semantic segmentation is one of the most common image-processing outputs for vision-based autonomous vehicles. Deep neural networks require large-scale data to learn the features of new environments across diverse domains. While most papers take a supervised-learning approach, in this paper semantic segmentation is implemented with a semi-supervised learning method. Specifically, the teacher-student technique is used to establish interaction between deep learning models. First, the DABNet and ContextNet models are trained as teacher networks on the BDD100K dataset. Given the importance of generalization and robustness for models in autonomous vehicles, these criteria are evaluated for the teacher models through simulations in the CARLA software. Finally, the teacher networks train the Fast-SCNN student model automatically on the Cityscapes dataset, without any human intervention. In contrast with other semi-supervised approaches, using two different datasets with a noticeable domain shift challenges the teacher-student technique even further. The results indicate that the student's accuracy in the vehicle, pedestrian, and road classes, which are the highest-priority classes to detect, differs from the teacher's by only 1.2%, 3%, and 3.8%, respectively. There is also a 4.5% drop in mean intersection over union between the teacher's performance and a similar model trained with a fully supervised method. Likewise, the student model's mean accuracy differs by only 4.5% from that of a model whose dataset requires a long preparation time.
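The abstract does not detail how the teacher networks label the unlabeled Cityscapes images for the student. A minimal NumPy sketch of one common way this can be done, fusing the per-pixel class probabilities of two teachers and keeping only confident pixels as pseudo-labels, is shown below; the function name, the 0.7 threshold, and the ignore index 255 are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def fuse_pseudo_labels(teacher_probs, conf_thresh=0.7, ignore_index=255):
    """Average per-pixel class probabilities from several teachers and
    keep only confident predictions as pseudo-labels for the student.

    teacher_probs: array of shape (num_teachers, H, W, num_classes).
    Returns an (H, W) label map; uncertain pixels get ignore_index.
    """
    mean_probs = np.mean(teacher_probs, axis=0)        # (H, W, C) ensemble average
    labels = np.argmax(mean_probs, axis=-1)            # (H, W) hard labels
    confidence = np.max(mean_probs, axis=-1)           # (H, W) winning probability
    labels[confidence < conf_thresh] = ignore_index    # mask low-confidence pixels
    return labels

# Toy example: two "teachers", a 1x2 image, 3 classes.
t1 = np.array([[[0.80, 0.10, 0.10], [0.40, 0.30, 0.30]]])
t2 = np.array([[[0.90, 0.05, 0.05], [0.50, 0.25, 0.25]]])
pseudo = fuse_pseudo_labels(np.stack([t1, t2]))
# First pixel is kept as class 0 (mean confidence 0.85);
# second pixel (mean confidence 0.45) is masked out.
```

Masked pixels are simply excluded from the student's loss, so the student trains only on pixels where the teacher ensemble agrees strongly.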

Keywords


Volume 9, Issue 4, Serial No. 36, Winter Quarterly
February 2022
Pages 1-19
  • Receive Date: 05 October 2020
  • Revise Date: 09 July 2021
  • Accept Date: 10 July 2021
  • Publish Date: 20 February 2022