Visual Question Answering with Multimodal Models
Abstract
This work addresses the open-ended, free-form task of Visual Question Answering (VQA). Given an image and a natural language question about that image, the goal is to produce an appropriate answer in natural language. Questions and answers are kept open-ended to reflect real-world use cases, such as assisting visually impaired users. Visual questions selectively target different aspects of an image, such as the underlying context and background details. Because of this, a system that excels at visual question answering typically requires a deeper understanding of the image and more advanced reasoning than a system that generates generic image descriptions. Furthermore, since many open-ended answers consist of only a few words, or can be drawn from a restricted set of candidates presented in a multiple-choice format, VQA lends itself to automatic evaluation. In this project, the RoBERTa model is used to extract features from the questions and answers, and the BEiT model is used to extract features from the images. The two feature sets are fused in a multimodal model that answers a given question about a given image. The multimodal model is trained on the VQA 2.0 dataset.
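
The following is a minimal sketch of the described fusion architecture, assuming the Hugging Face transformers library and PyTorch. The checkpoint names, the answer-vocabulary size of 3,129 (a common choice for VQA 2.0 classification heads), and the concatenation-based fusion head are illustrative assumptions, not design decisions specified by this abstract.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoImageProcessor, RobertaModel, BeitModel

class VQAFusionModel(nn.Module):
    def __init__(self, num_answers=3129):  # assumed answer-vocabulary size for VQA 2.0
        super().__init__()
        # RoBERTa encodes the question text.
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        # BEiT encodes the image.
        self.image_encoder = BeitModel.from_pretrained("microsoft/beit-base-patch16-224")
        text_dim = self.text_encoder.config.hidden_size    # 768
        image_dim = self.image_encoder.config.hidden_size  # 768
        # Simple late fusion: concatenate pooled features, then classify
        # over the fixed answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, 1024),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(1024, num_answers),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        text_out = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask)
        image_out = self.image_encoder(pixel_values=pixel_values)
        # Use the first-token (CLS-position) hidden state from each encoder.
        text_feat = text_out.last_hidden_state[:, 0]
        image_feat = image_out.last_hidden_state[:, 0]
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.classifier(fused)  # logits over the answer vocabulary

Usage would follow the standard tokenizer/image-processor pattern; the question text and image below are placeholders:

from PIL import Image

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
processor = AutoImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
model = VQAFusionModel()

question = tokenizer("What color is the umbrella?", return_tensors="pt")
image = processor(images=Image.new("RGB", (224, 224)), return_tensors="pt")
logits = model(question.input_ids, question.attention_mask, image.pixel_values)
predicted_answer_index = logits.argmax(dim=-1)

Training on VQA 2.0 would then treat answering as classification over the fixed answer set, with a cross-entropy or soft-label loss over these logits.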