Field-Configurable Multi-resolution Inference: Rethinking Quantization

Abstract

Departing from traditional quantization, which targets a single fixed resolution, we describe a novel architectural approach that supports inference at multiple resolution deployment points. A single meta multi-resolution model with a small footprint can select among multiple resolutions at runtime to satisfy given resource constraints. The proposed scheme relies on term quantization to enable flexible bit annihilation at any position of a value in the context of a group of values. This is in contrast to conventional uniform quantization, which always truncates the lowest-order bits. We present multi-resolution training of the meta model and a field-configurable multi-resolution multiplier-accumulator (mMAC) design. We compare our design against a traditional MAC design and evaluate inference performance on a variety of datasets, including ImageNet, COCO, and WikiText-2.
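To illustrate the contrast the abstract draws, the following is a minimal Python sketch, not the paper's implementation: uniform quantization keeps a fixed number of low-order bits for every value, whereas a group-based term quantization scheme can keep a budget of the largest signed power-of-two terms across the whole group, dropping terms at any bit position. The function names, the `term_budget` parameter, and the exact budgeting rule are illustrative assumptions.

```python
import numpy as np

def uniform_quantize(x, frac_bits):
    # Conventional uniform quantization: every value is truncated below
    # the same lowest retained bit position (one fixed resolution).
    scale = 2 ** frac_bits
    return np.floor(x * scale) / scale

def term_quantize(group, term_budget):
    # Illustrative group-level term quantization (an assumption about the
    # general idea): decompose each value into signed power-of-two terms,
    # then keep only the `term_budget` largest-magnitude terms across the
    # whole group. Low-order bits of large values can survive while small
    # terms anywhere in the group are annihilated.
    terms = []  # entries of (magnitude, value index, signed term)
    for i, v in enumerate(group):
        sign = 1.0 if v >= 0 else -1.0
        r = abs(v)
        count = 0
        while r > 0 and count < 64:  # safety bound for the sketch
            e = int(np.floor(np.log2(r)))
            t = 2.0 ** e
            terms.append((t, i, sign * t))
            r -= t
            count += 1
    terms.sort(key=lambda item: item[0], reverse=True)
    out = np.zeros(len(group))
    for _, i, signed_term in terms[:term_budget]:
        out[i] += signed_term
    return out

group = np.array([5.25, 0.375, -2.5, 0.0625])
print(uniform_quantize(group, 1))  # keeps only 1 fractional bit of every value
print(term_quantize(group, 5))     # keeps the 5 largest power-of-two terms in the group
```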

Publication
The 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2021)