Compressing Representations for Embedded Deep Learning

Despite recent advances in architectures for mobile devices, the computational requirements of deep learning remain prohibitive for most embedded devices. To address that issue, we envision sharing the computational costs of inference between local devices and the cloud, taking advantage of the compression performed by the first layers of the networks to reduce communication costs. Inference in such a distributed setting would allow new applications, but requires balancing a triple trade-off between computation cost, communication bandwidth, and model accuracy. We explore that trade-off by studying the compressibility of representations at different stages of MobileNetV2, showing that those results agree with theoretical intuitions about deep learning, and that an optimal splitting layer for the network can be found with a simple PCA-based compression scheme.
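
To make the idea of PCA-based compression at a splitting layer more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the intermediate activations have already been flattened into a matrix with one row per sample, and it uses synthetic data in place of real MobileNetV2 activations. The helper names (`fit_pca`, `compress`, `decompress`) and all sizes are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_pca(features, k):
    """Fit PCA on an (n_samples, n_dims) matrix and keep the top-k components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, vt[:k], explained[:k].sum()


def compress(features, mean, components):
    """Device side (assumed): project activations onto k components."""
    return (features - mean) @ components.T


def decompress(codes, mean, components):
    """Cloud side (assumed): map the k-dim codes back to activation space."""
    return codes @ components + mean


rng = np.random.default_rng(0)

# Synthetic stand-in for split-layer activations (NOT real MobileNetV2 data):
# 256 samples of 64 dimensions lying close to an 8-dimensional subspace.
latent = rng.normal(size=(256, 8))
mixing = rng.normal(size=(8, 64))
acts = latent @ mixing + 0.01 * rng.normal(size=(256, 64))

mean, comps, ratio = fit_pca(acts, k=8)
codes = compress(acts, mean, comps)        # what would be transmitted
recon = decompress(codes, mean, comps)     # what the remote half would receive

rel_err = np.linalg.norm(acts - recon) / np.linalg.norm(acts)
print(f"sent {codes.shape[1]} of {acts.shape[1]} dims per sample")
print(f"explained variance: {ratio:.4f}, relative error: {rel_err:.4f}")
```

In a setting like the one described above, one would presumably repeat this for several candidate splitting layers and values of `k`, then compare the transmitted size against the accuracy of the remaining layers on the reconstructed activations. That comparison is not shown here.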