AIML at VQA-Med 2020: Knowledge Inference via a Skeleton-based Sentence Mapping Approach for Medical Domain Visual Question Answering

Zhibin Liao, Qi Wu, Chunhua Shen, Anton van den Hengel, Johan Verjans

Research output: Contribution to journal › Conference article › peer-review

7 Citations (Scopus)

Abstract

In this paper, we describe our contribution to the 2020 ImageCLEF Medical Domain Visual Question Answering (VQA-Med) challenge. Our submissions scored first place on the VQA challenge leaderboard and first place on the associated Visual Question Generation (VQG) challenge leaderboard. Our VQA approach was developed using a knowledge inference methodology called Skeleton-based Sentence Mapping (SSM). Using all the questions and answers, we derived a set of classifiable tasks and inferred the corresponding labels. As a result, we were able to transform the VQA task into a multi-task image classification problem, which allowed us to focus on the image modelling aspect. We further propose a class-wise and task-wise normalization that facilitates the optimization of multiple tasks within a single network. This enabled us to apply a multi-scale and multi-architecture ensemble strategy for robust prediction. Lastly, we positioned the VQG task as a transfer learning problem using the models trained on the VQA task. The VQG task was also solved using classification.
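The abstract describes SSM and the multi-task head only at a high level. The sketch below illustrates one way the described pipeline could be wired up: regex "skeletons" map each question to a derived classification task, and a shared image backbone feeds one classification head per task. The skeleton patterns, task names, class counts, and the learnable per-class scaling used here to stand in for the class-wise and task-wise normalization are all illustrative assumptions, not the authors' implementation.

# Minimal sketch of the SSM idea, assuming hypothetical skeleton patterns,
# task names, and class counts; not the authors' released code.
import re
import torch
import torch.nn as nn

# Hypothetical skeleton -> task mapping: each question skeleton identifies a
# classifiable task; the answer vocabulary of that task supplies the labels.
SKELETONS = {
    r"what (?:is the )?organ .*": "organ",
    r"what (?:is the )?abnormality .*": "abnormality",
    r"what .*modality.*": "modality",
    r"which plane .*": "plane",
}

def question_to_task(question: str) -> str:
    """Map a free-form question onto one of the derived classification tasks."""
    q = question.lower().strip()
    for pattern, task in SKELETONS.items():
        if re.match(pattern, q):
            return task
    raise ValueError(f"no skeleton matches: {question!r}")

class MultiTaskVQA(nn.Module):
    """Shared image backbone with one classification head per derived task.

    Logits of each head are multiplied by a learnable per-class scale, a
    simple stand-in for the paper's class-wise and task-wise normalization.
    """
    def __init__(self, backbone, feat_dim, task_classes):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleDict(
            {task: nn.Linear(feat_dim, n) for task, n in task_classes.items()}
        )
        self.class_scale = nn.ParameterDict(
            {task: nn.Parameter(torch.ones(n)) for task, n in task_classes.items()}
        )

    def forward(self, image, task):
        feat = self.backbone(image)
        return self.heads[task](feat) * self.class_scale[task]

if __name__ == "__main__":
    # Toy backbone and invented class counts, purely for demonstration.
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
    model = MultiTaskVQA(backbone, feat_dim=128,
                         task_classes={"organ": 10, "abnormality": 330,
                                       "modality": 45, "plane": 16})
    task = question_to_task("What is the organ shown in this image?")
    logits = model(torch.randn(2, 3, 64, 64), task)  # shape (2, 10)
    print(task, logits.shape)

Because every question resolves to a (task, label) pair, answering reduces to running the image through the head selected by the question, which is what lets the ensemble operate on image models alone.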

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2696
Publication status: Published or Issued - 2020
Event: 11th Conference and Labs of the Evaluation Forum, CLEF 2020 - Thessaloniki, Greece
Duration: 22 Sept 2020 - 25 Sept 2020

Keywords

  • Class-wise
  • Deep Neural Networks
  • Knowledge Inference
  • Skeleton-based Sentence Mapping
  • Task-wise Normalization
  • Visual Question Answering
  • Visual Question Generation

ASJC Scopus subject areas

  • General Computer Science