A Hybrid Deep Learning For Image Captioning

Authors

  • Thotakura Naga Sai Venkatarama Raju, PG Scholar, Department of MCA, DNR College, Bhimavaram, Andhra Pradesh.
  • Ch. Jeevan Babu, Assistant Professor, Department of Master of Computer Applications, DNR College, Bhimavaram, Andhra Pradesh.

Keywords:

Image captioning, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM).

Abstract

The Image Caption Generator introduces an innovative approach to automatically describing image content by integrating computer vision and natural language processing (NLP). Leveraging recent advances in neural networks, the model combines a Convolutional Neural Network (CNN), specifically the pre-trained Xception model, for image feature extraction with a Recurrent Neural Network (RNN) built from Long Short-Term Memory (LSTM) cells for coherent sentence generation. A Beam Search algorithm and an Attention mechanism further improve the accuracy and relevance of the generated captions by dynamically focusing on different parts of the image and exploring multiple candidate caption sequences. Trained over multiple epochs on the 8,000-image Flickr8K dataset, whose images are paired with human-written descriptions, the model achieves a significant reduction in loss. It also incorporates a text-to-speech module based on the pyttsx3 library that audibly articulates the generated captions, improving accessibility for visually impaired users and those who prefer audio output. Evaluation with the BLEU and METEOR metrics confirms the model's ability to produce coherent and contextually accurate image captions, marking a significant advancement in image captioning technology.
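The Beam Search decoding described above can be sketched in a model-agnostic way. In this minimal sketch, `step_fn` stands in for the trained CNN–LSTM decoder's next-word distribution (the toy lookup table in the usage example is purely illustrative, not the paper's model); the decoder keeps the `beam_width` highest-scoring partial captions, ranked by summed log-probability:

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Generic beam search decoder.

    step_fn(sequence) -> dict mapping each candidate next token to its
    probability. At every step the beam_width best partial captions are
    kept, scored by the sum of log-probabilities of their tokens.
    """
    beams = [([start_token], 0.0)]  # list of (sequence, log-prob score)
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                # This caption is finished; move it out of the beam.
                completed.append((seq, score))
                continue
            for token, prob in step_fn(seq).items():
                candidates.append((seq + [token], score + math.log(prob)))
        if not candidates:
            break
        # Prune to the beam_width best partial captions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    completed.extend(beams)
    return max(completed, key=lambda c: c[1])[0]
```

Unlike greedy decoding, which commits to the single most probable word at each step, beam search can recover a caption whose first word is locally less likely but whose overall probability is higher, which is why it tends to yield more fluent captions.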

Published

2025-05-01

Section

Articles

How to Cite

A Hybrid Deep Learning For Image Captioning. (2025). International Journal of Multidisciplinary Engineering In Current Research, 10(5), 168-173. https://ijmec.com/index.php/multidisciplinary/article/view/636
