7. Quantization-Aware Training (QAT)
Quantization refers to techniques for performing computations and storing tensors at bit widths lower than floating-point precision. A quantized model executes some or all tensor operations with integers rather than floating-point values. horizon_plugin_pytorch supports INT8 quantization, which, compared to a typical FP32 model, yields a 4x reduction in model size and a 4x reduction in memory bandwidth requirements. Hardware support for INT8 computation is typically 2 to 4 times faster than FP32 computation. Quantization is primarily a technique to accelerate inference, and quantized operators support forward computation only.
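To make these numbers concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain PyTorch. It is illustrative only: the scale choice and clamping range are assumptions for the example, not the scheme horizon_plugin_pytorch uses.

```python
import torch

x = torch.randn(1024, 1024)                      # FP32 tensor, 4 bytes per element
scale = x.abs().max() / 127                      # map [-|x|_max, |x|_max] onto [-127, 127]
x_int8 = (x / scale).round().clamp(-128, 127).to(torch.int8)
x_dequant = x_int8.float() * scale               # approximate FP32 reconstruction

print(x.element_size(), x_int8.element_size())   # 4 vs. 1 byte: the 4x size reduction
print((x - x_dequant).abs().max())               # worst-case quantization error (~scale/2)
```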
horizon_plugin_pytorch provides BPU-adapted quantization operations and supports quantization-aware training (QAT). QAT inserts fake-quantization modules to model quantization error during both forward computation and backpropagation; note that throughout QAT, all computation is still performed with floating-point operations. At the end of QAT, horizon_plugin_pytorch provides conversion functions that turn the trained model into a fixed-point model, with a more compact representation that enables high-performance vectorized operations on the BPU.
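The fake-quantization idea can be sketched with a custom autograd function: the forward pass applies a quantize-dequantize round trip so the network trains against INT8 rounding error, while the backward pass lets gradients flow straight through the non-differentiable round() (the straight-through estimator). This is a conceptual sketch, not the plugin's actual fake-quantize module:

```python
import torch

class FakeQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale):
        # Forward: quantize then dequantize, so downstream layers
        # see the rounding error that INT8 inference would introduce.
        return (x / scale).round().clamp(-128, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Backward: straight-through estimator; treat the
        # quantize-dequantize round trip as the identity.
        return grad_output, None

x = torch.randn(8, requires_grad=True)
y = FakeQuantize.apply(x, x.detach().abs().max() / 127)
y.sum().backward()
print(x.grad)  # all ones: gradients pass through the fake-quant node unchanged
```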
This section gives a detailed introduction to the horizon_plugin_pytorch quantization training tool, which is built on top of PyTorch. A rough code sketch of the end-to-end quick-start workflow follows the table of contents below.
- 7.1. Environment Dependencies
- 7.2. Quick Start
- 7.2.1. Building a Floating-point Model
- 7.2.2. Pre-train a Floating-point Model
- 7.2.3. Set the BPU Architecture
- 7.2.4. Operator Fusion
- 7.2.5. Convert a Floating-point Model to a Quantized Model
- 7.2.6. Quantization-Aware Training (QAT)
- 7.2.7. Converting a Quantized Model to a Fixed-point Model
- 7.2.8. Check and Compile a Fixed-point Prediction Model
- 7.3. TUTORIAL
- 7.3.1. Floating-point Model Preparation
- 7.3.2. Operator Fusion
- 7.3.3. Set Up Different BPU Architectures
- 7.3.4. Heterogeneous Model QAT
- 7.3.5. FX-based Quantization
- 7.3.6. Building Quantization-friendly Floating-point Models
- 7.3.7. QAT Experience Summary
- 7.3.8. Model Precision Debug Tool
- 7.3.9. Ideas for Debugging Quantization Precision
- 7.4. API REFERENCE
- 7.5. NOTE
- 7.6. Calibration
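As a rough end-to-end illustration of the quick-start workflow listed above, the sketch below uses PyTorch's stock eager-mode quantization API as a stand-in. horizon_plugin_pytorch substitutes its own BPU-adapted fusion, qconfig, prepare/convert functions, and BPU architecture setting, so consult sections 7.2.3 to 7.2.8 for the plugin's actual calls.

```python
import torch
import torch.ao.quantization as tq

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)
        self.bn = torch.nn.BatchNorm2d(8)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = Net().train()                                         # 1-2. float model (pretrained in practice)
model = tq.fuse_modules_qat(model, [["conv", "bn", "relu"]])  # 3. operator fusion
model.qconfig = tq.get_default_qat_qconfig("fbgemm")          # 4. fake-quant configuration
tq.prepare_qat(model, inplace=True)                           #    insert fake-quant modules

for _ in range(3):                                            # 5. QAT fine-tuning (dummy loop)
    model(torch.randn(2, 3, 32, 32)).sum().backward()

quantized = tq.convert(model.eval())                          # 6. convert to a fixed-point model
```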