Mask SSD: An Effective Single-Stage Approach to Object Instance Segmentation.
Abstract. We propose Mask SSD, an efficient and effective approach to the challenging task of instance segmentation. Built on a single-shot detector, Mask SSD detects all instances in an image and marks the pixels that belong to each instance. It consists of a detection subnetwork that predicts object categories and bounding box locations, and an instance-level segmentation subnetwork that generates the foreground mask for each instance. In the detection subnetwork, multi-scale and feedback features from different layers are used to better represent objects of various sizes and to provide high-level semantic information. We then adopt an assistant classification network to guide per-class score prediction, which consists of an objectness prior and a category likelihood. The instance-level segmentation subnetwork takes the multi-scale and feedback features from different layers as input and outputs a pixel-wise segmentation for each detection. The two subnetworks are jointly optimized with a multi-task loss function, which allows Mask SSD to predict detection and segmentation results directly. We conduct extensive experiments on the PASCAL VOC, SBD, and MS COCO datasets to evaluate Mask SSD. The results show that, compared with state-of-the-art approaches, our method achieves comparable precision with lower runtime overhead.
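
To make the two scoring ideas in the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes that the per-class score is the product of the objectness prior and the category likelihood, and that the multi-task loss is a weighted sum of classification, localization, and mask terms. The abstract states neither formula, and the function names and the weights `w_loc` and `w_mask` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, used here for the objectness prior."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_class_scores(objectness_logit, class_logits):
    """ASSUMPTION: score_c = P(object) * P(class c | object).

    The abstract only says the score consists of an objectness prior
    and a category likelihood; the product form is a guess.
    """
    p_obj = sigmoid(objectness_logit)
    return [p_obj * p for p in softmax(class_logits)]


def multi_task_loss(cls_loss, loc_loss, mask_loss, w_loc=1.0, w_mask=1.0):
    """ASSUMPTION: weighted sum of the three task losses.

    w_loc and w_mask are hypothetical balancing weights.
    """
    return cls_loss + w_loc * loc_loss + w_mask * mask_loss


if __name__ == "__main__":
    # One candidate box with three foreground classes.
    print(per_class_scores(2.0, [0.5, 1.5, -1.0]))
    # Combine per-task losses into a single training objective.
    print(multi_task_loss(cls_loss=0.8, loc_loss=0.3, mask_loss=0.5))
```

Under this assumed product form, a box with a low objectness prior receives low scores for every class regardless of the category likelihood, which is one plausible way an auxiliary classifier could guide per-class prediction.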