vqa-models

Modulated Residual Network

MODERN, short for Modulated Residual Network, is a visual question answering architecture that uses conditional batch normalization to let the question influence visual processing. A linguistic embedding produced by an LSTM modulates the batch normalization parameters of a ResNet, so the language input can manipulate entire feature maps by scaling them up or down, negating them, or shutting them off.
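The modulation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the tensor shapes, and the single linear projection standing in for whatever predictor maps the LSTM embedding to parameter changes are all assumptions made for illustration. The sketch normalizes each channel as ordinary batch normalization would, then adds a per-example offset to the scale and shift.

```python
import numpy as np

def predict_deltas(question_embedding, w_gamma, w_beta):
    """Map a question embedding (N, D) to per-channel offsets (N, C).

    Hypothetical stand-in: a single linear projection, assumed here for brevity.
    """
    return question_embedding @ w_gamma, question_embedding @ w_beta

def conditional_batch_norm(x, gamma, beta, delta_gamma, delta_beta, eps=1e-5):
    """x: (N, C, H, W); gamma, beta: (C,); delta_gamma, delta_beta: (N, C)."""
    # Standard batch-norm statistics: one mean and variance per channel.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # The language-dependent offsets change scale and shift per example.
    scale = (gamma[None, :] + delta_gamma)[:, :, None, None]
    shift = (beta[None, :] + delta_beta)[:, :, None, None]
    return scale * x_hat + shift

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))        # 2 images, 3 feature maps each
gamma, beta = np.ones(3), np.zeros(3)    # unmodulated batch-norm parameters
no_change = np.zeros((2, 3))

baseline = conditional_batch_norm(x, gamma, beta, no_change, no_change)

# Hand-picked offsets for the first image only: the effective scales become
# 2 (scaled up), -1 (negated), and 0 (shut off) for channels 0, 1, and 2.
delta_gamma = np.array([[1.0, -2.0, -1.0], [0.0, 0.0, 0.0]])
modulated = conditional_batch_norm(x, gamma, beta, delta_gamma, no_change)

# With zero projection weights, the predicted offsets leave the network unchanged.
embedding = rng.normal(size=(2, 8))      # stand-in for LSTM question embeddings
dg, db = predict_deltas(embedding, np.zeros((8, 3)), np.zeros((8, 3)))
unchanged = conditional_batch_norm(x, gamma, beta, dg, db)
```

Because the offsets are indexed by example, two different questions about the same image would produce different feature maps, which is the point of conditioning the normalization on language.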

Uncertainty Class Activation Map (U-CAM) Using Gradient Certainty Method

Overview of U-CAM

Deep learning models have revolutionized the field of artificial intelligence by enabling computers to process and understand complex data, such as images and speech. However, these models are often considered "black boxes" because their decisions are difficult to interpret and explain. As a result, researchers have been developing methods that explain how these models arrive at their predictions. One such method is U-CAM or Uncertainty-based V