Deep Learning Advancements And Challenges In
DOI:
https://doi.org/10.63665/j20d6q38
Keywords:
Video Summarization, Deep Learning, BLIP Transformer, Vision-Language Models, YouTube Video Analysis, Semantic Video Retrieval, Automated Video Indexing, Social Media Analytics, Content-Based Video Retrieval, Artificial Intelligence, Multimedia Processing, Video Captioning, Semantic Understanding, Large-Scale Video Data, Computer Vision
Abstract
With the exponential rise of video content on social media platforms, particularly YouTube, which handles over 500 hours of uploads every minute, efficient video indexing, retrieval, and summarization have become critical challenges. Traditional methods rely heavily on user-provided metadata such as titles, tags, and descriptions, which are often inaccurate or unrelated to the actual content. To overcome these limitations, recent vision-language models such as BLIP (Bootstrapping Language-Image Pretraining) transformers enable more accurate, automated video understanding by learning jointly from the visual and textual modalities.
This paper presents a systematic review of deep learning-based video summarization approaches, with particular emphasis on BLIP-based models and their potential to bridge the gap between raw video content and semantic interpretation. Out of more than 300 research studies, 44 were shortlisted using strict inclusion criteria, and their methodologies, applications, and datasets are critically analyzed. The review highlights how BLIP transformers enhance summarization performance by generating context-aware captions, enabling semantic indexing, and improving retrieval efficiency. The insights provided in this study offer guidance for researchers and practitioners aiming to leverage deep learning and vision-language models for managing large-scale video data on social networking platforms.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
