Abstract
In this paper, we describe our contribution to the 2020 ImageCLEF Medical Domain Visual Question Answering (VQA-Med) challenge. Our submissions scored first place on the VQA challenge leaderboard and first place on the associated Visual Question Generation (VQG) challenge leaderboard. Our VQA approach was developed using a knowledge inference methodology called Skeleton-based Sentence Mapping (SSM). Using all the questions and answers, we derived a set of classifiable tasks and inferred the corresponding labels. As a result, we were able to transform the VQA task into a multi-task image classification problem, which allowed us to focus on the image modelling aspect. We further propose a class-wise and task-wise normalization that facilitates the optimization of multiple tasks in a single network. This enabled us to apply a multi-scale and multi-architecture ensemble strategy for robust prediction. Lastly, we positioned the VQG task as a transfer learning problem using the models trained on the VQA task; the VQG task, too, was solved as a classification problem.
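To illustrate the formulation described in the abstract, the following is a minimal sketch of how the VQA task could be cast as multi-task image classification with a per-task normalization of the logits. It is not the authors' released code: the backbone choice, the standardization used to stand in for the task-wise normalization, and all task names and class counts are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class MultiTaskVQAClassifier(nn.Module):
    """Shared CNN backbone with one classification head per inferred task."""

    def __init__(self, task_num_classes: dict[str, int]):
        super().__init__()
        # Shared image encoder; a single ResNet-50 stands in here for one
        # member of the paper's multi-scale, multi-architecture ensemble.
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # One linear head per task derived from the question/answer pairs.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(feat_dim, n) for task, n in task_num_classes.items()}
        )

    def forward(self, images: torch.Tensor) -> dict[str, torch.Tensor]:
        feats = self.backbone(images)
        out = {}
        for task, head in self.heads.items():
            logits = head(feats)
            # Assumed stand-in for the task-wise normalization: standardize
            # each task's logit vector (zero mean, unit variance over its
            # classes) so tasks with different class counts contribute
            # comparably when optimized in a single network.
            out[task] = (logits - logits.mean(dim=1, keepdim=True)) / (
                logits.std(dim=1, keepdim=True) + 1e-6
            )
        return out


# Hypothetical usage: task names and class counts are illustrative only.
model = MultiTaskVQAClassifier({"abnormality": 330, "modality": 36})
preds = model(torch.randn(2, 3, 224, 224))  # {task: (batch, num_classes)} logits
```

In this reading, each task inferred via SSM gets its own classification head over a shared image encoder, and standardizing each head's logits keeps tasks with very different label-set sizes on a comparable scale during joint training; the ensemble strategy would then average predictions from several such models at multiple input scales.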
| Original language | English |
| --- | --- |
| Journal | CEUR Workshop Proceedings |
| Volume | 2696 |
| Publication status | Published or Issued - 2020 |
| Event | 11th Conference and Labs of the Evaluation Forum, CLEF 2020, Thessaloniki, Greece, 22–25 Sept 2020 |
Keywords
- Class-wise Normalization
- Deep Neural Networks
- Knowledge Inference
- Skeleton-based Sentence Mapping
- Task-wise Normalization
- Visual Question Answering
- Visual Question Generation
ASJC Scopus subject areas
- General Computer Science