Handgun detection using combined human pose and weapon appearance

CCTV surveillance systems are essential for preventing and mitigating security threats and dangerous situations such as mass shootings or terrorist attacks, in which early detection is crucial. These systems are manually supervised by a security operator, which imposes significant limitations. Novel deep learning-based methods have made it possible to develop automatic, real-time weapon detectors with promising results. However, these approaches rely on the visual appearance of the weapon only and exploit no additional contextual information. For handguns, body pose may be a useful cue, especially when the gun is barely visible, and may also help reduce false positives. In this work, a novel method is proposed that combines weapon appearance and 2D human pose information in a single architecture. First, pose keypoints are estimated in order to extract hand regions and generate binary pose images, which are the model inputs. Then, each input is processed by a separate subnetwork to extract two feature maps. Finally, these feature maps are combined to produce the hand region prediction (handgun vs. no-handgun). A new dataset composed of samples collected from different sources has been used to evaluate model performance under different situations. Moreover, the robustness of the model to different brightness and weapon size conditions (simulating appearance degraded by low light and by distance to the camera) has also been tested. The results show that the combined model substantially improves overall performance over appearance-only approaches, such as the popular YOLOv3 detector.
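The preprocessing step described above (keypoints to hand regions and binary pose images) can be illustrated with a minimal sketch. This is not the authors' implementation: the keypoint names, the `LIMBS` list, the rule that sizes the hand box from the forearm length, and the `scale` parameter are all assumptions introduced here for illustration, and the pose estimator itself is taken as given.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates

# Hypothetical one-arm skeleton; a real pose estimator returns many more joints.
LIMBS = [("shoulder", "elbow"), ("elbow", "wrist")]


def hand_box(elbow: Point, wrist: Point, scale: float = 1.0) -> Tuple[int, int, int, int]:
    """Square box centred on the wrist, as (x_min, y_min, x_max, y_max).

    Assumption: the side length is scale * forearm length, so the crop
    grows and shrinks with the apparent size of the person.
    """
    forearm = ((wrist[0] - elbow[0]) ** 2 + (wrist[1] - elbow[1]) ** 2) ** 0.5
    half = max(1, round(scale * forearm / 2))
    x, y = wrist
    return (x - half, y - half, x + half, y + half)


def draw_line(img: List[List[int]], p0: Point, p1: Point) -> None:
    """Set the pixels on the segment p0-p1 to 1 (Bresenham's line algorithm)."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        # Skip pixels that fall outside the image instead of raising.
        if 0 <= y0 < len(img) and 0 <= x0 < len(img[0]):
            img[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def binary_pose_image(keypoints: Dict[str, Point], width: int, height: int) -> List[List[int]]:
    """Rasterise the skeleton into a height x width image of 0s and 1s."""
    img = [[0] * width for _ in range(height)]
    for a, b in LIMBS:
        if a in keypoints and b in keypoints:  # ignore undetected joints
            draw_line(img, keypoints[a], keypoints[b])
    return img


keypoints = {"shoulder": (2, 2), "elbow": (2, 6), "wrist": (6, 6)}
box = hand_box(keypoints["elbow"], keypoints["wrist"])
pose_img = binary_pose_image(keypoints, width=10, height=10)
```

In this sketch, `box` would be used to crop the appearance input from the video frame and `pose_img` would be the input to the pose branch; how the two resulting feature maps are combined is left to the architecture described in the paper.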
