Publications

(2020). Field-Configurable Multi-resolution Inference: Rethinking Quantization. The 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2021).

PDF

(2020). exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources. Findings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020).

PDF Code

(2019). Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks. The 8th International Conference on Learning Representations (ICLR 2020).

PDF Code

(2019). RTN: Reparameterized Ternary Network. The 34th AAAI Conference on Artificial Intelligence (AAAI 2020).

PDF

(2019). Maestro: A Memory-on-Logic Architecture for Coordinated Parallel Use of Many Systolic Arrays. The 30th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2019).

PDF

(2019). Full-stack Optimization for Accelerating CNNs with FPGA Validation. The 33rd ACM International Conference on Supercomputing (ICS 2019).

PDF

(2017). Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon. The 31st Annual Conference on Neural Information Processing Systems (NeurIPS 2017).

PDF Code Slides