YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design
The rapid development and wide adoption of object detection techniques have drawn attention to both the accuracy and the speed of object detectors. However, current state-of-the-art object detection works are either accuracy-oriented, using a large model at the cost of high latency, or speed-oriented, using a lightweight model at the cost of accuracy. In this work, we propose the YOLObile framework, which achieves real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed that applies to any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves a 14$\times$ compression rate of YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve an inference speed of 17 FPS using the GPU on a Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed increases to 19.1 FPS, a 5$\times$ speedup over the original YOLOv4. Source code is at: \url{https://github.com/nightsnack/YOLObile}.
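The abstract names block-punched pruning without detailing it. As a rough illustration only, the sketch below assumes the scheme partitions a convolutional layer's weights into blocks of consecutive filters and channels and, within each block, zeroes the same kernel positions across every kernel in that block. The function name `block_punched_prune`, its parameters, and the magnitude-based selection rule (sum of squared weights per position) are hypothetical choices made for this example and should not be read as the authors' training procedure.

```python
def block_punched_prune(weights, block_filters, block_channels, keep_per_kernel):
    """Return a pruned copy of a 4-D weight list [filter][channel][row][col].

    Assumed scheme (illustrative): inside each block of consecutive filters
    and channels, the same kernel positions are zeroed for every kernel.
    Positions are ranked by the block-wide sum of squared weights, which is
    a stand-in criterion chosen for this sketch.
    """
    n_f, n_c = len(weights), len(weights[0])
    kh, kw = len(weights[0][0]), len(weights[0][0][0])
    # Deep copy so the input is left untouched.
    pruned = [[[row[:] for row in kernel] for kernel in filt] for filt in weights]

    for f0 in range(0, n_f, block_filters):
        for c0 in range(0, n_c, block_channels):
            fs = range(f0, min(f0 + block_filters, n_f))
            cs = range(c0, min(c0 + block_channels, n_c))
            # Score each kernel position over the whole block.
            scores = {
                (i, j): sum(weights[f][c][i][j] ** 2 for f in fs for c in cs)
                for i in range(kh)
                for j in range(kw)
            }
            ranked = sorted(scores, key=scores.get, reverse=True)
            # "Punch" (zero) every position outside the top keep_per_kernel.
            for i, j in ranked[keep_per_kernel:]:
                for f in fs:
                    for c in cs:
                        pruned[f][c][i][j] = 0.0
    return pruned
```

Because the zeroed positions are shared across a block, the surviving weights keep a regular layout within that block, which is the kind of structure a compiler can plausibly exploit; smaller blocks give finer-grained pruning, larger blocks give more regularity.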