● Goal: Convert FP32 CNNs into INT8 without significant accuracy loss.
● Why: INT8 math has higher throughput and lower memory requirements.
● Challenge: INT8 has significantly lower precision and dynamic range than FP32.
● Solution: Minimize loss of information when quantizing trained model weights to INT8 and during INT8 computation of activations.
● Result: The method was implemented in TensorRT. It does not require any additional fine-tuning or retraining.
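
To make the precision trade-off concrete, here is a minimal sketch of quantizing a set of FP32 weights to INT8. It assumes symmetric linear quantization with a single scale per tensor and a max-abs threshold. That is one common choice, not necessarily the exact scheme TensorRT uses. The names `quantize_int8`, `dequantize`, and `threshold` are hypothetical and chosen only for illustration.

```python
import random

INT8_MAX = 127  # symmetric range [-127, 127]; -128 is left unused in this sketch


def quantize_int8(values, threshold=None):
    """Map floats to INT8 codes using one shared scale (symmetric, no zero point).

    Assumption: if no threshold is given, the largest absolute value is used,
    so nothing is clipped. A smaller threshold clips outliers in exchange for
    finer resolution on the remaining values.
    """
    if threshold is None:
        threshold = max(abs(v) for v in values)
    scale = threshold / INT8_MAX if threshold > 0 else 1.0
    codes = [max(-INT8_MAX, min(INT8_MAX, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]


# Stand-in for trained weights: small values centered on zero.
random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(1024)]

codes, scale = quantize_int8(weights)
recovered = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))

print(f"scale={scale:.6f}  max abs error={max_err:.6f}")
```

With a max-abs threshold, the round-trip error of each value is bounded by half the scale, so the information lost depends directly on how the threshold is chosen.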