A Quantization Scheme with High Expressiveness and Accuracy

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which yield tremendous speed-ups and memory savings but suffer large accuracy degradation compared with their full-precision counterparts. We first identify three overlooked issues in extremely low-bit networks: the unexploited properties of ternary quantized networks, the squashed range of quantized values, and gradient vanishing during backpropagation. By reparameterizing the quantized activation vector with a full-precision scale $\gamma$ and offset $\beta$ applied to a ternary direction vector $\mathbf{A}^t\in\{-1, 0, +1\}^n$, we decouple the range ($\gamma$, $\beta$) from the direction $\mathbf{A}^t$ to alleviate the above problems. The learnable scale and offset automatically adjust the range and sparsity of the quantized values without gradient vanishing. A novel encoding and computation pattern is designed to support efficient computing for our reparameterized ternary network~(RTN). Extensive experiments with ResNet-18 on ImageNet demonstrate that the proposed RTN achieves a much better trade-off between bitwidth and accuracy, with up to 24.22% relative accuracy improvement over state-of-the-art extremely low-bit networks. Moreover, we validate the proposed computation pattern on a Field Programmable Gate Array~(FPGA), where it brings $46.46\times$ and $89.17\times$ savings in power and area compared with full-precision convolution.
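Below is a minimal PyTorch-style sketch of the reparameterization idea described in the abstract: a full-precision scale $\gamma$ and offset $\beta$ wrap a ternary direction vector, with a straight-through estimator so that gradients reach $\gamma$ and $\beta$ during backpropagation. The names (`ReparamTernaryQuant`, `ternarize`) and the fixed threshold are illustrative assumptions, not the paper's exact implementation.

```python
import torch


def ternarize(x, threshold=0.5):
    # Hypothetical hard ternarization: map each element to {-1, 0, +1}.
    return torch.where(x > threshold, torch.ones_like(x),
                       torch.where(x < -threshold, -torch.ones_like(x),
                                   torch.zeros_like(x)))


class ReparamTernaryQuant(torch.nn.Module):
    """Sketch of a reparameterized ternary quantizer: A_q = gamma * A^t + beta."""

    def __init__(self):
        super().__init__()
        self.gamma = torch.nn.Parameter(torch.tensor(1.0))  # learnable scale (range)
        self.beta = torch.nn.Parameter(torch.tensor(0.0))   # learnable offset

    def forward(self, x):
        # Remove the learnable range, then ternarize only the direction.
        normalized = (x - self.beta) / (self.gamma + 1e-8)
        t = ternarize(normalized)
        # Straight-through estimator: forward uses the ternary values,
        # backward passes gradients through `normalized`, so gamma and beta
        # keep receiving gradients instead of vanishing at the hard quantizer.
        t_ste = normalized + (t - normalized).detach()
        return self.gamma * t_ste + self.beta
```

Because the hard ternarization only acts on the normalized direction, adjusting $\gamma$ and $\beta$ rescales and shifts the quantized values (and changes how many elements fall into the zero bin, i.e., the sparsity) without touching the ternary codebook itself.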

Results on ImageNet

Xin Dong
Ph.D. student in machine learning at Harvard

Xin Dong is a Ph.D. student at Harvard University. His research focuses on efficient deep learning, at the intersection of machine learning and computer architecture. He completed his undergraduate studies at the Yingcai Honors College of the University of Electronic Science and Technology of China (UESTC). He was a Research Assistant at Nanyang Technological University (NTU), Singapore, and at UC San Diego (UCSD), working on techniques related to machine learning.