Predicting Pedestrian Intentions in Self-Driving Cars: Leveraging Non-Visual Features and Semantic Mapping

Document Type: Research Paper

Authors

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 8415683111, Iran.

Abstract

Predicting pedestrians' intention to cross the path of a vehicle, particularly at intersections and crosswalks, is critical for autonomous driving. While recent studies have demonstrated the effectiveness of vision-based deep learning models in this domain, current models often lack the confidence required for integration into autonomous systems, leaving several issues unresolved. Our proposed model addresses this core challenge of accurately predicting whether a pedestrian intends to cross in front of a self-driving car. It employs convolutional neural networks to predict crossing intention from non-visual inputs, including body pose, car velocity, and the pedestrian bounding box, across sequential video frames. By logically arranging these non-visual features in a 2D matrix and utilizing an RGB semantic map to help the network comprehend and distinguish the fused features, the model achieves higher accuracy in pedestrian crossing intention prediction than previous approaches. Evaluation on the JAAD benchmark for pedestrian crossing intention prediction demonstrates significant improvements over prior studies.
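To make the feature-arrangement idea concrete, the sketch below (in PyTorch) illustrates one plausible reading of the described pipeline, not the paper's released code: per-frame non-visual features (2D pose keypoints, the pedestrian bounding box, and car velocity) are stacked over an observation window into a 2D matrix, which a small CNN classifies as crossing or not crossing. The window length, keypoint count, and layer sizes are illustrative assumptions rather than the paper's actual configuration.

```python
# Illustrative sketch only: a 2D matrix of fused non-visual features
# (rows = frames, columns = features) classified by a small CNN.
# All sizes below are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

T = 16            # assumed observation window, in frames
N_KEYPOINTS = 18  # assumed body-pose keypoints (x, y pairs), e.g. OpenPose
FEAT = N_KEYPOINTS * 2 + 4 + 1  # pose (36) + bbox (x1, y1, x2, y2) + speed

class IntentCNN(nn.Module):
    """Toy CNN over a (frames x features) matrix of non-visual inputs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # pool over the whole matrix
            nn.Flatten(),
            nn.Linear(32, 2),          # logits: [not crossing, crossing]
        )

    def forward(self, x):              # x: (batch, 1, T, FEAT)
        return self.net(x)

# Example: one pedestrian track, with random stand-ins for real features.
matrix = torch.randn(1, 1, T, FEAT)   # rows = frames, columns = fused features
logits = IntentCNN()(matrix)
print(logits.softmax(dim=-1))         # predicted crossing probability
```

In the pipeline the abstract describes, the fused features are additionally encoded as an RGB semantic map before classification; here the raw matrix is fed to the network directly for brevity.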

