Non-uniform quantization achieves promising performance for compressing neural networks because it adapts better to the distribution of weights. However, traditional non-uniform quantization methods rely solely on the density of the weight distribution, which degrades model performance after quantization. To tackle this challenge, we propose a novel non-uniform quantization method that not only automatically learns the clipping threshold but also adaptively adjusts the quantization levels, effectively reducing the quantization error. Specifically, we first develop a local uniform quantization strategy that assigns finer quantization levels to dense regions of the weight distribution. The gradients of the weights are also taken into account when assigning quantization levels. Furthermore, to reduce the quantization error still further, we propose a linear interpolation-based clipping method with a learnable threshold, which minimizes the impact of outlier values on quantization. The efficacy of our method is validated on the CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet-100 datasets, yielding improved model performance and reduced quantization error.
Keywords: Learnable clipping threshold; Neural network; Non-uniform quantization.
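Since the abstract describes the method only at a high level, the following is a minimal, hypothetical PyTorch sketch of the general ideas it names: quantization levels placed at quantiles of the weight distribution (so dense regions receive finer levels), a PACT-style learnable clipping threshold, and a straight-through estimator for the rounding step. The class name, level count, and initialization are illustrative assumptions; the paper's gradient-aware level assignment and linear interpolation-based clipping are not reproduced here.

```python
import torch
import torch.nn as nn

class NonUniformQuantizer(nn.Module):
    """Hypothetical sketch of density-aware non-uniform quantization with a
    learnable clipping threshold (not the paper's exact algorithm)."""

    def __init__(self, num_levels: int = 16, init_clip: float = 1.0):
        super().__init__()
        self.num_levels = num_levels
        # Learnable clipping threshold, trained jointly with the weights
        # (the initial value is an illustrative assumption).
        self.clip = nn.Parameter(torch.tensor(init_clip))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # PACT-style clipping: gradients reach self.clip through the
        # min/max boundaries, so the threshold is learned automatically.
        t = torch.abs(self.clip)
        w_c = torch.max(torch.min(w, t), -t)
        # Place levels at empirical quantiles of the clipped weights, so
        # dense regions of the distribution receive finer levels.
        probs = torch.linspace(0.0, 1.0, self.num_levels, device=w.device)
        levels = torch.quantile(w_c.detach().flatten(), probs)
        # Snap each weight to its nearest quantization level.
        idx = torch.argmin((w_c.detach().unsqueeze(-1) - levels).abs(), dim=-1)
        w_q = levels[idx]
        # Straight-through estimator: the forward pass uses the quantized
        # values, while gradients flow back through the clipped weights.
        return w_c + (w_q - w_c).detach()

# Usage: quantize a layer's weight tensor during a forward pass.
quantizer = NonUniformQuantizer(num_levels=16)
w = torch.randn(64, 64)
w_q = quantizer(w)
```

The quantile-based placement is one simple way to realize "finer levels in dense regions": equally spaced probability masses translate into closely spaced levels wherever weights cluster.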