SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings

Speed is a critical factor in many computer vision tasks, such as scene understanding and visual odometry, which are essential components of autonomous and robotic systems. Estimating depth from a single frame is called monocular depth estimation (MDE), and many computer vision applications depend on it. However, vision transformer architectures are too deep and complex for real-time inference on low-resource platforms. This is where the Separable Pyramidal pooling E

ViP-DeepLab

Introduction to ViP-DeepLab

ViP-DeepLab is a model for depth-aware video panoptic segmentation. It was created by adding a depth prediction head and a next-frame instance branch to the existing Panoptic-DeepLab model. With these additions, ViP-DeepLab can perform video panoptic segmentation and monocular depth estimation simultaneously.

What is Depth-Aware Video Panoptic Segmentation?

Video panoptic segmentation is a process that includes segmenting objects and backgrounds
