Research about Visual Question Answering Published in ArXiv

4 minute read

Visual Question Answering (VQA) is a recent topic in computer vision and natural language processing that has attracted a great deal of attention from the deep learning, computer vision, and natural language processing communities (Kafle and Kanan, 2017). I have collected and curated some publications from ArXiv related to visual question answering, and the results are listed here. Enjoy!

Last updated: August 14, 2020
Source: ArXiv

No.	Year	Title	URL
1	2020	Visual Question Answering Using Semantic Information from Image Descriptions	View
2	2020	Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing	View
3	2020	Generating Rationales in Visual Question Answering	View
4	2020	PathVQA: 30000+ Questions for Medical Visual Question Answering	View
5	2020	RUBi: Reducing Unimodal Biases in Visual Question Answering	View
6	2020	VQA-LOL: Visual Question Answering under the Lens of Logic	View
7	2020	Component Analysis for Visual Question Answering Architectures	View
8	2020	Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach	View
9	2020	Robust Explanations for Visual Question Answering	View
10	2020	Generating Question Relevant Captions to Aid Visual Question Answering	View
11	2019	Assessing the Robustness of Visual Question Answering	View
12	2019	Self-Critical Reasoning for Robust Visual Question Answering	View
13	2019	Learning Sparse Mixture of Experts for Visual Question Answering	View
14	2019	Inverse Visual Question Answering with Multi-Level Attentions	View
15	2019	Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering	View
16	2019	VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering	View
17	2019	Fusion of Detected Objects in Text for Visual Question Answering	View
18	2019	An Empirical Study on Leveraging Scene Graphs for Visual Question Answering	View
19	2019	A Comparative Evaluation of Visual and Natural Language Question Answering Over Linked Data	View
20	2019	Quantifying and Alleviating the Language Prior Problem in Visual Question Answering	View
21	2019	GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering	View
22	2019	Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention	View
23	2018	Textually Enriched Neural Module Networks for Visual Question Answering	View
24	2018	Faithful Multimodal Explanation for Visual Question Answering	View
25	2018	Question-Guided Hybrid Convolution for Visual Question Answering	View
26	2018	Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining	View
27	2018	Learning Visual Question Answering by Bootstrapping Hard Attention	View
28	2018	Question Relevance in Visual Question Answering	View
29	2018	Learning Visual Knowledge Memory Networks for Visual Question Answering	View
30	2018	Think Visually: Question Answering through Virtual Imagery	View
31	2018	R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering	View
32	2018	Reciprocal Attention Fusion for Visual Question Answering	View
33	2018	Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering	View
34	2018	Attention on Attention: Architectures for Visual Question Answering (VQA)	View
35	2018	Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering	View
36	2018	Learning to Count Objects in Natural Images for Visual Question Answering	View
37	2018	Dual Recurrent Attention Units for Visual Question Answering	View
38	2017	Interpretable Counting for Visual Question Answering	View
39	2017	Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering	View
40	2017	Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge	View
41	2017	MemexQA: Visual Memex Question Answering	View
42	2017	Visual Question Answering with Memory-Augmented Networks	View
43	2017	Learning Convolutional Text Representations for Visual Question Answering	View
44	2017	Survey of Visual Question Answering: Datasets and Techniques	View
45	2017	Speech-Based Visual Question Answering	View
46	2017	The Promise of Premise: Harnessing Question Premises in Visual Question Answering	View
47	2017	C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1	View
48	2017	Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets	View
49	2017	An Analysis of Visual Question Answering Algorithms	View
50	2017	Recurrent and Contextual Models for Visual Question Answering	View
51	2017	VQABQ: Visual Question Answering by Basic Questions	View
52	2017	Task-driven Visual Saliency and Attention-based Visual Question Answering	View
53	2016	VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering	View
54	2016	Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering	View
55	2016	Zero-Shot Visual Question Answering	View
56	2016	Hierarchical Question-Image Co-Attention for Visual Question Answering	View
57	2016	Proposing Plausible Answers for Open-ended Visual Question Answering	View
58	2016	Visual Question Answering: Datasets, Algorithms, and Future Challenges	View
59	2016	The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering (FSVQA)	View
60	2016	Graph-Structured Representations for Visual Question Answering	View
61	2016	Measuring Machine Intelligence Through Visual Question Answering	View
62	2016	Interpreting Visual Question Answering Models	View
63	2016	Analyzing the Behavior of Visual Question Answering Models	View
64	2016	Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?	View
65	2016	Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding	View
66	2016	Hierarchical Co-Attention for Visual Question Answering	View
67	2016	Ask Your Neurons: A Deep Learning Approach to Visual Question Answering	View
68	2016	A Focused Dynamic Attention Model for Visual Question Answering	View
69	2016	Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering	View
70	2016	VQA: Visual Question Answering	View
71	2016	Dynamic Memory Networks for Visual and Textual Question Answering	View


Factors Affecting Students Acceptance of Learning Simulation Tools in Computing Education Courses from Social, Technology, and Personal Trait Perspectives

2 minute read

Published: September 23, 2025

This study presents a theoretical model to explore the factors influencing students’ acceptance of simulation tools in computing education. These factors include social influences, technology-related aspects, and personal characteristics. The term simulation tools refers to systems that can replicate complex processes and situations, providing students with realistic, hands-on experiences without the risks or costs associated with physical setups. To validate the proposed model, 312 responses from university students were collected. A cross-sectional online survey was conducted, and the participants were selected through purposive sampling. The findings indicated that subjective norms have the most significant direct effect on students’ perceptions of usefulness, influencing their views on learning outcomes from using simulation tools in computing education courses. Social support and self-efficacy were also found to have significant effects. However, the impacts of fidelity and innovativeness were not supported. This study sets itself apart from previous research by using a comprehensive approach to explore the factors influencing student acceptance of simulation tools in computing education. Specifically, this research develops a theory based on the Technology Acceptance Model (TAM) and expands it by incorporating environmental factors and personal characteristics of students.

Extraction and attribution of public figures statements for journalism in Indonesia using deep learning

2 minute read

Published: February 27, 2024

News articles are usually written by journalists based on statements taken from interviews with public figures. The attribution of such statements provides important information, and it can be extracted from news articles to build a knowledge base by developing a sequence tagging scheme, as in entity recognition. This research applies two families of deep learning architectures, recurrent neural network (RNN)-based and transformer-based, to build models for extracting and attributing public figures’ statements in the Indonesian language. The experiments are conducted using five deep learning model architectures with two different corpus sizes to investigate the impact of corpus size on each model’s performance. The experiments show that the best RNN-based model is PFSA-ID-BLWCA, which achieves an F1 score of 81.34%, and the best transformer-based model is PFSA-ID-TWCA, which obtains an F1 score of 81.01%. This research also finds that corpus size influences model performance. Furthermore, the study lays a foundation for addressing attribution extraction in other languages, especially low-resource languages, with some necessary adjustments.

PFSA-ID: An Annotated Indonesian Corpus and Baseline Model of Public Figures Statements Attributions

2 minute read

Published: November 07, 2022

So far, corpora for the quotation extraction and quotation attribution tasks in Indonesian are still limited in quantity and depth. This study aims to develop an Indonesian corpus of public figures’ statement attributions and a baseline model for attribution extraction, thereby fostering research in information extraction for the Indonesian language.

Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia

3 minute read

Published: December 09, 2020

Extracting information from unstructured data is a challenging task in computational linguistics. A public figure’s statement, attributed by a journalist in a story, is one type of information that can be processed into structured data. A knowledge base built from such data would be beneficial for further uses such as opinion mining, claim detection, and fact-checking. This study aims to understand statement extraction tasks and the models that have already been applied to them, in order to formulate a framework for further study.

Sigit Purnomo
