Learning Universal Shape Dictionary for Realtime Instance Segmentation

We present a novel explicit shape representation for instance segmentation. Depending on how they model object shape, current instance segmentation systems fall into two categories: implicit and explicit models. Implicit methods, which represent the object mask or contour through intractable network parameters and produce it by pixel-wise classification, are predominant. Explicit methods, which parameterize the shape with simple and explainable models, are less explored. Because the operations that generate the final shape are lightweight, explicit methods have a clear speed advantage over implicit ones, which is crucial for real-world applications. The proposed USD-Seg adopts a linear model, sparse coding with a dictionary, for object shapes. First, it learns a dictionary from a large collection of shape datasets, so that any shape can be decomposed into a linear combination of dictionary entries; hence the name "Universal Shape Dictionary". Then it adds a simple shape-vector regression head to an ordinary object detector, giving the detector segmentation ability with minimal overhead. For quantitative evaluation, we use both average precision (AP) and the proposed Efficiency of AP (AP$_E$) metric, which is intended to also measure the computational cost of the framework, in line with the requirements of real-world applications. We report experimental results on the challenging COCO dataset, where our single model on a single Titan Xp GPU achieves 35.8 AP and 27.8 AP$_E$ at 65 fps with YOLOv4 as the base detector, and 34.1 AP and 28.6 AP$_E$ at 12 fps with FCOS as the base detector.
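
To make the idea of sparse coding with a dictionary concrete, here is a minimal NumPy sketch of how a flattened shape could be encoded as a short coefficient vector and decoded back by a single matrix product. It is an illustration under stated assumptions, not the USD-Seg implementation: the dictionary is random rather than learned from shape datasets, the encoder is plain ISTA (iterative soft-thresholding), and the names `sparse_encode` and `decode_shape`, the 8x8 shape size, and the values of `lam` and `n_iter` are all hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_encode(shape_vec, dictionary, lam=0.01, n_iter=500):
    """Approximately solve min_a 0.5 * ||a @ D - x||^2 + lam * ||a||_1 with ISTA.

    dictionary has shape (n_atoms, dim); each row is one dictionary entry.
    This is a generic sparse-coding solver, assumed here for illustration.
    """
    # Step size 1/L, where L is the largest eigenvalue of D @ D.T.
    step = 1.0 / np.linalg.norm(dictionary @ dictionary.T, 2)
    coeffs = np.zeros(dictionary.shape[0])
    for _ in range(n_iter):
        residual = coeffs @ dictionary - shape_vec
        grad = dictionary @ residual
        coeffs = soft_threshold(coeffs - step * grad, step * lam)
    return coeffs


def decode_shape(coeffs, dictionary, size):
    """Decode a shape as a linear combination of dictionary entries."""
    return (coeffs @ dictionary).reshape(size)


# Demo with a random stand-in dictionary (a real one would be learned from shapes).
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm entries

true_coeffs = np.zeros(16)
true_coeffs[[1, 5, 9]] = [1.0, -0.8, 0.6]      # a shape built from 3 entries
x = true_coeffs @ D

a = sparse_encode(x, D)
reconstruction = decode_shape(a, D, (8, 8))
rel_err = np.linalg.norm(reconstruction.ravel() - x) / np.linalg.norm(x)
print("relative reconstruction error:", rel_err)
```

The decoding step is one small matrix product, which is the kind of lightweight operation the abstract credits for the speed advantage of explicit methods. In a detector, the coefficient vector would be the target of the regression head, so the iterative encoder would only be needed when preparing training targets.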
