CVAE-ADS: A Deep Learning Framework for Traffic Accident Detection and Video Summarization

Keywords: Anomaly Detection, Video Summarization, Convolutional Variational Autoencoder, Latent Space Clustering

Abstract

Manually monitoring surveillance video to identify accidents is becoming increasingly difficult and error-prone as road traffic and the volume of surveillance footage grow rapidly. This underscores the urgent need for robust, automated systems that can identify accidents and relieve the burden of summarizing long videos. To address this issue, we propose CVAE-ADS, an unsupervised approach for traffic monitoring that both detects anomalies and summarizes a video through keyframes. The method operates in two stages. In the first stage, a Convolutional Variational Autoencoder trained on normal frames detects abnormalities in traffic video, identifying anomalies based on reconstruction error. In the second stage, the detected anomalous frames are clustered in the latent space, and representative keyframes are selected to form a summary video. We evaluated the method on two benchmark datasets: the IITH Accident Dataset and a subset of UCF-Crime. The results show that the proposed approach detects accidents accurately, with an AUC of 90.61% on IITH and 87.95% on UCF-Crime, along with low reconstruction error and low Equal Error Rates. For summarization, the method achieves substantial frame reduction while producing diverse keyframes of high visual quality. It reaches reduction rates of up to 85% with 92.5% coverage on the IITH dataset and 80% with 90% coverage on the accident subset of the UCF-Crime dataset. CVAE-ADS offers a lightweight solution for continuous traffic monitoring that uses limited computational resources to detect accidents in real time and summarize video footage of the accidents.
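The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the CVAE has already produced a per-frame reconstruction error and a latent vector, and it makes three illustrative choices that the abstract does not specify: a mean-plus-k-standard-deviations threshold, plain k-means for the latent-space clustering, and a reduction rate measured against the total frame count. All function names are hypothetical.

```python
import numpy as np


def flag_anomalies(errors, normal_errors, k=3.0):
    """Return indices of frames whose reconstruction error exceeds a threshold.

    Assumption: the threshold is mean + k * std of the errors observed on
    normal frames; the paper may use a different rule.
    """
    threshold = normal_errors.mean() + k * normal_errors.std()
    return np.flatnonzero(errors > threshold), threshold


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means, used here as a stand-in clustering method."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), n_clusters, replace=False)]

    def assign(cents):
        # Distance from every point to every centroid, then nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for c in range(n_clusters):
            members = points[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids, assign(centroids)


def select_keyframes(latents, frame_ids, n_clusters):
    """Pick, per cluster, the frame whose latent vector is nearest the centroid."""
    latents = np.asarray(latents, dtype=float)
    centroids, labels = kmeans(latents, n_clusters)
    keyframes = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue
        dists = np.linalg.norm(latents[members] - centroids[c], axis=1)
        keyframes.append(int(frame_ids[members[dists.argmin()]]))
    return sorted(keyframes)


def reduction_rate(n_total, n_keyframes):
    """Percentage of frames removed (assumed definition, relative to all frames)."""
    return 100.0 * (n_total - n_keyframes) / n_total


if __name__ == "__main__":
    # Synthetic stand-ins for CVAE outputs; real values would come from the model.
    rng = np.random.default_rng(0)
    normal_errors = rng.normal(0.02, 0.005, size=500)
    errors = np.concatenate([rng.normal(0.02, 0.005, size=180),
                             rng.normal(0.10, 0.01, size=20)])
    latents = rng.normal(size=(200, 8))

    anomalous, thr = flag_anomalies(errors, normal_errors)
    n_clusters = min(3, len(anomalous))
    keys = select_keyframes(latents[anomalous], anomalous, n_clusters)
    print("threshold:", thr)
    print("keyframes:", keys)
    print("reduction rate (%):", reduction_rate(len(errors), len(keys)))
```

Choosing the frame nearest each centroid keeps one representative per group of similar anomalous frames, which is one simple way to trade frame reduction against coverage; the number of clusters controls that trade-off.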

Published
2026-01-01
How to Cite
[1]
A. Chauhan and S. Vegad, “CVAE-ADS: A Deep Learning Framework for Traffic Accident Detection and Video Summarization”, j.electron.electromedical.eng.med.inform, vol. 8, no. 1, pp. 185-205, Jan. 2026.
Section
Electronics