Atrous Spatial Pyramid Pooling

What is Atrous Spatial Pyramid Pooling (ASPP)? Atrous Spatial Pyramid Pooling (ASPP) is a module used in semantic segmentation that enables the resampling of a given feature layer at multiple rates prior to convolution. In simpler terms, it probes the same feature map with filters that have different effective fields of view, so that objects are captured at multiple scales along with more contextual information from the image. This technique makes use of multiple parallel atrous convolutional layers, each wi
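The idea of parallel atrous branches can be illustrated with a minimal one-dimensional sketch. This is not the DeepLab implementation: the function names (`atrous_conv1d`, `aspp_1d`, `field_of_view`), the zero padding, and the rates `(1, 2, 4)` are assumptions chosen for illustration. It only shows how one small kernel covers a wider span as the rate grows.

```python
def atrous_conv1d(signal, kernel, rate):
    """Same-size 1D atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `rate` samples apart, so a 3-tap kernel
    spans (3 - 1) * rate + 1 input samples.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - half) * rate
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several rates in parallel (one output per rate)."""
    return [atrous_conv1d(signal, kernel, rate) for rate in rates]


def field_of_view(kernel_size, rate):
    """Number of input samples spanned by a dilated kernel."""
    return (kernel_size - 1) * rate + 1


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    for rate, branch in zip((1, 2, 4), aspp_1d(impulse, [1, 1, 1])):
        print(rate, field_of_view(3, rate), branch)
```

In a real network the branch outputs would typically be concatenated along the channel axis and fused by a further convolution; that step is left out here.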

Bilateral Guided Aggregation Layer

What is Bilateral Guided Aggregation Layer? Bilateral Guided Aggregation Layer is a technique that is used in the field of computer vision to improve semantic segmentation. It is a feature fusion layer that aims to bring together different types of feature representation and enhance their mutual connections. The Bilateral Guided Aggregation Layer was first used in the BiSeNet V2 architecture, which targets real-time semantic segmentation. Specifically, within the BiSeNet im

Channel-wise Cross Attention

What is Channel-wise Cross Attention? Channel-wise cross attention is a module used in the UCTransNet architecture to perform semantic segmentation. It fuses features of inconsistent semantics between the Channel Transformer and U-Net decoder, eliminating ambiguity with the decoder features. The operation is a blend of convolutional neural networks and transformer networks, which work together to improve the performance of the model across various tasks. How does Channel-wise Cross Attention

Channel-wise Cross Fusion Transformer

The Channel-wise Cross Fusion Transformer, also known as the CCT module, is a component of the UCTransNet architecture for semantic segmentation. What is UCTransNet? UCTransNet is a deep learning architecture used for semantic segmentation, a computer vision task that assigns each pixel of an image to a category. For example, a semantic segmentation model can identify and label objects in an image like cars, pedestrians, or buildings. This

Deeper Atrous Spatial Pyramid Pooling

DeepLabv3 employs an ASPP module (the module itself dates back to DeepLabv2) that improves segmentation accuracy by exploiting multi-scale and global context information. DASPP is a more advanced version of this module, designed to further refine the features of the ASPP module to better identify objects in images. What is DASPP? DASPP stands for "Deeper ASPP" and is a refinement of the ASPP module of DeepLabv3. It adds an additional 3 × 3 convolution after the 3 × 3 dilated convolutions of ASPP to further refine t

Flow Alignment Module

Overview of Flow Alignment Module (FAM) The Flow Alignment Module, or FAM, is a specialized module used for scene parsing. FAM helps to identify the Semantic Flow between feature maps of different levels and effectively broadcasts high-level features to high-resolution features. The process is efficient and helps reduce information loss during the transmission process. This article explains the concept of Semantic Flow and how FAM works. Understanding this technology can help us improve our sc

Global Convolutional Network

A Global Convolutional Network, or GCN, is a building block for semantic segmentation. It is designed to address two tasks simultaneously: classification and localization. The GCN uses a large kernel to generate semantic score maps, similar to the structure of a Fully Convolutional Network (FCN). How Does a GCN Work? A GCN employs a combination of 1 × k + k × 1 and k × 1 + 1 × k convolutions instead of directly using global convolutions or larger kernels. T
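The combination of 1 × k and k × 1 convolutions can be sketched in plain Python as follows. This is a single-channel illustration under stated assumptions, not the published implementation: all function names are hypothetical, padding is omitted ("valid" mode), and summing the two branches element-wise is assumed as the way they are combined. The sketch shows that a 1 × k convolution followed by a k × 1 convolution covers the same k × k window as a dense kernel while using 2k weights per branch instead of k².

```python
def correlate2d_valid(image, kernel):
    """Plain 2D cross-correlation without padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def row_then_col(image, row_kernel, col_kernel):
    """Branch 1: a 1 x k convolution followed by a k x 1 convolution."""
    step = correlate2d_valid(image, [list(row_kernel)])
    return correlate2d_valid(step, [[w] for w in col_kernel])


def col_then_row(image, col_kernel, row_kernel):
    """Branch 2: a k x 1 convolution followed by a 1 x k convolution."""
    step = correlate2d_valid(image, [[w] for w in col_kernel])
    return correlate2d_valid(step, [list(row_kernel)])


def gcn_block(image, branch1, branch2):
    """Sum both branches element-wise (assumed combination rule).

    Each branch is a (row_kernel, col_kernel) pair of length-k weight lists.
    """
    a = row_then_col(image, branch1[0], branch1[1])
    b = col_then_row(image, branch2[1], branch2[0])
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def outer(col_kernel, row_kernel):
    """Dense k x k kernel equivalent to one separable branch."""
    return [[c * r for r in row_kernel] for c in col_kernel]
```

Ignoring intermediate channel counts, the two branches together need 4k weights per channel pair, so for k = 15 that is 60 weights against 225 for a dense 15 × 15 kernel.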

Neural Attention Fields

Overview of NEAT, Neural Attention Fields NEAT, or Neural Attention Fields, is a feature representation for end-to-end imitation learning models. It is a technique used to compress high-dimensional 2D image features into a compact representation by selectively attending to relevant regions in the input while ignoring irrelevant information. This way, the model associates the images with the Bird's Eye View (BEV) representation, which facilitates the driving task. In this article, we will explor

Point-wise Spatial Attention

Overview of Point-wise Spatial Attention (PSA) Point-wise Spatial Attention (PSA) is a module used in semantic segmentation, which is the process of dividing an image into multiple regions or objects, each with its own semantic meaning. The goal of PSA is to capture contextual information, especially in the long range, by aggregating information across the entire feature map. This helps to improve the accuracy and efficiency of semantic segmentation models. How PSA Works The PSA module takes

PointRend

PointRend is a neural network module that produces high-quality image segmentation by treating segmentation as an image rendering problem. The module uses a subdivision strategy to select critical points at which to compute labels, making it more efficient than direct, dense computation. This article aims to explain PointRend and how it can be incorporated into popular meta-architectures for both instanc
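The point-selection step can be sketched roughly as below. The selection rule is an assumption made for illustration: the excerpt above only says that critical points are selected, and this sketch picks the pixels whose predicted foreground probability is closest to 0.5 (the most uncertain ones). The names `most_uncertain_points`, `refine_at_points`, and `point_head` are hypothetical.

```python
def most_uncertain_points(prob_map, num_points):
    """Return (row, col) of the pixels whose probability is closest to 0.5."""
    scored = [
        (abs(p - 0.5), r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
    ]
    scored.sort()  # smallest distance to 0.5 first, i.e. most uncertain first
    return [(r, c) for _, r, c in scored[:num_points]]


def refine_at_points(prob_map, points, point_head):
    """Recompute predictions only at the selected points.

    `point_head` is a hypothetical callable (row, col) -> probability that
    stands in for a learned point-wise predictor.
    """
    refined = [list(row) for row in prob_map]  # copy; all other pixels are kept
    for r, c in points:
        refined[r][c] = point_head(r, c)
    return refined
```

Only `num_points` predictions are recomputed per step instead of one per pixel, which is where the savings over dense computation come from.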

Pyramid Pooling Module

Overview of Pyramid Pooling Module In the world of computer vision, semantic segmentation involves labeling every pixel in an image with a corresponding category. As such, it is a challenging task that requires a lot of computation. Convolutional neural networks like ResNet have proven to be effective in tackling the problem, but they still have their own limitations that need to be addressed. One of these limitations is the small empirical receptive field on high-level layers, which makes it d

Short-Term Dense Concatenate

The STDC module is a tool used for semantic segmentation, which is a technique used in visual recognition tasks to identify and classify objects within an image. This module proves to be effective as it extracts deep features from images with scalable receptive fields and multi-scale information. By removing structure redundancy in the BiSeNet architecture, STDC aims to improve the efficiency of object recognition tasks. What is STDC? Short-term Dense Concatenate (STDC) is a software module d
