Visual Question Answering from Multi Models

Authors

  • Kodeti Veerabrahmani, PG Scholar, Department of MCA, CDNR College, Bhimavaram, Andhra Pradesh.
  • A. Durga Devi, Assistant Professor, Master of Computer Applications, DNR College, Bhimavaram, Andhra Pradesh.

Abstract

This work addresses the open-ended, free-form task of Visual Question Answering (VQA). Given an image and a natural language question about that image, the goal is to produce an accurate answer in natural language. Questions and answers remain open-ended to reflect real-world situations, such as assisting blind users. Visual questions selectively target different areas of an image, including the underlying context and background details. Because of this, a system that excels at visual question answering usually requires a deeper understanding of the image and more advanced reasoning than a system that generates generic image descriptions. Furthermore, since many open-ended answers contain only a few words, or can be drawn from a restricted set of responses presented in a multiple-choice format, VQA lends itself to automatic evaluation. In this project, the RoBERTa model is used to extract features from the questions and answers, and the BEiT model is used to extract features from the images. The two sets of features are fused in a multimodal model that answers the given question about the given image. The multimodal model is trained on the VQA 2.0 dataset.
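
The pipeline described above can be illustrated with a minimal sketch in Python using the Hugging Face Transformers library. The checkpoint names ("roberta-base", "microsoft/beit-base-patch16-224"), the concatenation-based fusion head, and the answer-vocabulary size are illustrative assumptions for this sketch, not details taken from the paper.

import torch
import torch.nn as nn
from transformers import RobertaTokenizerFast, RobertaModel, BeitModel

class VQAFusionModel(nn.Module):
    def __init__(self, num_answers=3129):
        # 3129 is a commonly used VQA 2.0 answer-vocabulary size; treat it as a placeholder.
        super().__init__()
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        self.image_encoder = BeitModel.from_pretrained("microsoft/beit-base-patch16-224")
        fused_dim = self.text_encoder.config.hidden_size + self.image_encoder.config.hidden_size
        # Late fusion: concatenate the pooled question and image features, then classify.
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, num_answers),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        q = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        v = self.image_encoder(pixel_values=pixel_values).pooler_output
        return self.classifier(torch.cat([q, v], dim=-1))

# Example forward pass with a dummy image tensor in place of a processed photo.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
inputs = tokenizer("What color is the umbrella?", return_tensors="pt")
pixel_values = torch.rand(1, 3, 224, 224)
model = VQAFusionModel()
logits = model(inputs["input_ids"], inputs["attention_mask"], pixel_values)

In practice, training such a head on VQA 2.0 is usually framed as classification over the most frequent answers; the exact fusion design and training regime used in this work are described in the full paper.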

Published

2025-05-01

Section

Articles

How to Cite

Visual Question Answering from Multi Models. (2025). International Journal of Multidisciplinary Engineering In Current Research, 10(5), 283-288. https://ijmec.com/index.php/multidisciplinary/article/view/654