7.4.2. qconfig in Detail

7.4.2.1. What Is a qconfig

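How a model is quantized is determined by its qconfig, so a qconfig must be set on the model before preparing the QAT / calibration model. We do not recommend defining your own qconfig; use the predefined qconfig variables wherever possible. Writing a custom qconfig requires a clear understanding of the target processor's constraints and detailed knowledge of how the training tools work. A mistake in the definition can prevent the model from converging or from compiling, wasting considerable time and effort.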

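Note

The Plugin currently maintains two versions of qconfig. The earlier version will be deprecated in the near future, so we recommend using only the qconfig usage described in this document.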

7.4.2.2. How to Obtain a qconfig

Use the predefined qconfig variables. They are defined in horizon_plugin_pytorch/quantization/qconfig.py and cover the vast majority of cases. They include:

from horizon_plugin_pytorch.quantization.qconfig import (
    default_calib_8bit_fake_quant_qconfig,
    default_qat_8bit_fake_quant_qconfig,
    default_qat_8bit_fixed_act_fake_quant_qconfig,
    default_calib_8bit_weight_16bit_act_fake_quant_qconfig,
    default_qat_8bit_weight_16bit_act_fake_quant_qconfig,
    default_qat_8bit_weight_16bit_fixed_act_fake_quant_qconfig,
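    default_qat_8bit_weight_32bit_out_fake_quant_qconfig, # see the operator list; operators that support high-precision output can use this qconfig for higher accuracy
    default_calib_8bit_weight_32bit_out_fake_quant_qconfig, # see the operator list; operators that support high-precision output can use this qconfig for higher accuracy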
)

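Use the get_default_qconfig interface. It is more flexible than the fixed qconfig variables, and we recommend using it only once you have a clear understanding of quantization and the hardware constraints. The commonly used parameters are explained below: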

import torch
from horizon_plugin_pytorch.dtype import qint8, qint16
from horizon_plugin_pytorch.quantization.qconfig import get_default_qconfig

qconfig = get_default_qconfig(
    activation_fake_quant="fake_quant",  # supports fake_quant, lsq, pact; fake_quant is the usual choice
    weight_fake_quant="fake_quant", # supports fake_quant, lsq, pact; fake_quant is the usual choice
    activation_observer="min_max", # supports min_max, fixed_scale, clip, percentile, clip_std, mse, kl
    weight_observer="min_max", # supports min_max, fixed_scale, clip, percentile, clip_std, mse, kl
    activation_qkwargs={
        "dtype": qint16, # whether int16 is supported depends on the specific operator
        "is_sync_quantize": False, # whether to synchronize statistics; off by default to speed up forward
        "averaging_constant": 0.01 # moving-average coefficient; when set to 0, scale is not updated
    },
    weight_qkwargs={ # only dtype = qint8, qscheme = torch.per_channel_symmetric, ch_axis = 0 are supported; extra configuration is not recommended
        "dtype": qint8,
        "qscheme": torch.per_channel_symmetric,
        "ch_axis": 0,
    },
)
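The effect of averaging_constant can be illustrated with a small standalone sketch. It is not the Plugin's observer implementation; it assumes the min_max observer keeps an exponential moving average of the observed minimum and maximum (the same update rule used by PyTorch's MovingAverageMinMaxObserver) and derives a symmetric scale from them. The helper names update_min_max and symmetric_scale are hypothetical.

```python
QINT16_MAX = 32767  # largest value representable in signed 16-bit


def update_min_max(running, batch, averaging_constant):
    """Moving-average update of (min, max) statistics (illustrative only)."""
    if running is None:
        # First batch: take its statistics directly.
        return (min(batch), max(batch))
    c = averaging_constant
    new_min = running[0] + c * (min(batch) - running[0])
    new_max = running[1] + c * (max(batch) - running[1])
    return (new_min, new_max)


def symmetric_scale(min_val, max_val, qmax):
    """Symmetric scale: the largest magnitude maps to qmax."""
    return max(abs(min_val), abs(max_val)) / qmax


stats = (-1.0, 1.0)
# With averaging_constant = 0 the statistics, and therefore the scale, stay frozen.
frozen = update_min_max(stats, [-4.0, 4.0], averaging_constant=0.0)
# With a small non-zero constant the statistics drift slowly toward the new batch.
moving = update_min_max(stats, [-4.0, 4.0], averaging_constant=0.01)

frozen_scale = symmetric_scale(frozen[0], frozen[1], QINT16_MAX)
moving_scale = symmetric_scale(moving[0], moving[1], QINT16_MAX)
```

Under these assumptions, a batch with a much wider range leaves frozen_scale unchanged, while moving_scale grows slightly.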

7.4.2.3. How to Set a qconfig

There are three ways to set a qconfig. We recommend the first two; the last one will be deprecated.

Set the qconfig attribute directly. This method has the highest priority; the other methods do not override a qconfig that was set directly.

model.qconfig = default_qat_8bit_fake_quant_qconfig

qconfig templates. Pass a qconfig setter and example_inputs to the prepare interface, and the qconfig is set on the model automatically.

model = prepare_qat_fx(
    model,
    example_inputs=data,
    qconfig_setter=default_qat_qconfig_setter,
)

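qconfig_dict. Pass a qconfig_dict to the prepare_qat_fx interface. This usage is being phased out and is not recommended unless you need it for compatibility, so it is not covered in detail here.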

model = prepare_qat_fx(
    model,
    qconfig_dict={"": default_qat_8bit_fake_quant_qconfig},  # maps to a qconfig, not a qconfig setter
)

7.4.2.4. qconfig Templates

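Misconfigured qconfigs have long been a common source of problems, which is why we developed qconfig templates. A qconfig template uses the subclass trace approach to discover the model's graph structure and then sets qconfigs automatically according to predefined rules. It is the method we recommend most for setting qconfigs. Usage: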

qat_model = prepare_qat_fx(
    model,
    example_inputs=example_input,  # used to discover the graph structure
    qconfig_setter=( # qconfig templates; several may be passed, ordered from highest to lowest priority
        sensitive_op_qat_8bit_weight_16bit_act_qconfig_setter(table, ratio=0.2),
        default_qat_qconfig_setter,  # QAT variant, matching the QAT sensitivity template above
    )
)

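Note

Templates have lower priority than a qconfig attribute set directly on the model. If the model was already configured with model.qconfig = xxx before prepare, the templates will not take effect. Unless you have a specific need, we do not recommend mixing the two, because doing so easily leads to avoidable mistakes. In the vast majority of cases, using either templates or model.qconfig = xxx on its own is sufficient.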
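The priority rules described above can be pictured with a toy resolver. This is only a sketch of the ordering, not how the Plugin resolves qconfigs internally: the function resolve_qconfig, the string labels standing in for qconfig objects, and the module names are all hypothetical.

```python
def resolve_qconfig(module_name, direct_qconfigs, setters):
    """Pick a qconfig label for one module (illustrative only).

    direct_qconfigs: module name -> label set directly on the module.
    setters: callables ordered from highest to lowest priority; each
             returns a label, or None when it does not apply.
    """
    # A directly set qconfig always wins over any template.
    if module_name in direct_qconfigs:
        return direct_qconfigs[module_name]
    # Otherwise the first template that applies decides.
    for setter in setters:
        choice = setter(module_name)
        if choice is not None:
            return choice
    return None


def sensitive_setter(name):
    # Stand-in for a sensitivity template: only claims the listed modules.
    return "int16" if name in {"head.conv"} else None


def default_setter(name):
    # Stand-in for a default template: applies to everything.
    return "int8"


templates = (sensitive_setter, default_setter)

from_sensitive = resolve_qconfig("head.conv", {}, templates)
from_default = resolve_qconfig("backbone.conv", {}, templates)
from_direct = resolve_qconfig("head.conv", {"head.conv": "direct"}, templates)
```

In this toy setup the sensitivity stand-in overrides the default one for head.conv, and a directly set value overrides both.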

Templates fall into three categories:

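Fixed templates. The calibration / qat / qat_fixed_act_scale variants differ in the observer type they use and in how the scale is updated. They are intended for calibration, QAT training, and QAT training with fixed activation scales, respectively.

The default templates ( default_calibration_qconfig_setter / default_qat_qconfig_setter / default_qat_fixed_act_qconfig_setter ) do three things. First, they enable high-precision output wherever it can be set, and emit a notice for outputs that do not support it. Next, starting from the grid input of each grid sample operator, they search upstream (toward the model inputs) until they reach the first gemm-type operator or QuantStub, and set every operator in between to int16. In our experience the grid usually spans a wide value range, so int8 is likely to fall short of the accuracy requirement. Finally, they set all remaining operators to int8.

The int16 templates ( qat_8bit_weight_16bit_act_qconfig_setter / qat_8bit_weight_16bit_fixed_act_qconfig_setter / calibration_8bit_weight_16bit_act_qconfig_setter ) do two things. First, they enable high-precision output wherever it can be set, and emit a notice for outputs that do not support it. Second, they set all remaining operators to int16.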

from horizon_plugin_pytorch.quantization.qconfig_template import (
    default_calibration_qconfig_setter,
    default_qat_qconfig_setter,
    default_qat_fixed_act_qconfig_setter,
    qat_8bit_weight_16bit_act_qconfig_setter,
    qat_8bit_weight_16bit_fixed_act_qconfig_setter,
    calibration_8bit_weight_16bit_act_qconfig_setter,
)
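The upstream search from a grid sample operator's grid input, described for the default templates, can be sketched on a toy graph. This is an illustration under several assumptions and not the Plugin's implementation: the graph encoding, the set of operator types treated as gemm-type, and the choice to exclude the boundary operator itself from the int16 set are all hypothetical.

```python
from collections import deque

# Toy graph: node name -> (op type, list of input node names).
graph = {
    "quant": ("QuantStub", []),
    "conv1": ("conv2d", ["quant"]),
    "sigmoid": ("sigmoid", ["conv1"]),
    "mul": ("mul", ["sigmoid"]),
    "feat": ("conv2d", ["quant"]),
    "sample": ("grid_sample", ["feat", "mul"]),  # inputs are (input, grid)
}

# Assumed boundary types: gemm-type operators and QuantStub.
STOP_TYPES = {"conv2d", "linear", "matmul", "QuantStub"}


def ops_feeding_grid(graph, grid_sample_node):
    """Collect operators between the grid input and the first boundary op."""
    _, inputs = graph[grid_sample_node]
    grid_producer = inputs[1]  # second input of grid_sample is the grid
    int16_ops, seen = [], set()
    queue = deque([grid_producer])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        op_type, op_inputs = graph[name]
        if op_type in STOP_TYPES:
            continue  # boundary reached: stop searching along this path
        int16_ops.append(name)
        queue.extend(op_inputs)
    return int16_ops


int16_candidates = ops_feeding_grid(graph, "sample")
```

Here the search starts at mul, passes through sigmoid, and stops at conv1, so only the operators on the grid path are marked for int16 while the feature path is left alone.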

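Sensitivity templates. The sensitivity templates are sensitive_op_calibration_8bit_weight_16bit_act_qconfig_setter, sensitive_op_qat_8bit_weight_16bit_act_qconfig_setter, and sensitive_op_qat_8bit_weight_16bit_fixed_act_qconfig_setter. They differ in the same way as the three fixed-template variants and are likewise used for calibration, QAT training, and QAT training with fixed activation scales. The first argument is the sensitivity result produced by the accuracy debug tool; the second argument specifies either ratio or topk. The template sets the topk operators with the highest quantization sensitivity to int16. Combined with a fixed template, this makes mixed-precision tuning straightforward.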

from horizon_plugin_pytorch.quantization.qconfig_template import (
    default_calibration_qconfig_setter,
    default_qat_qconfig_setter,
    default_qat_fixed_act_qconfig_setter,
    qat_8bit_weight_16bit_act_qconfig_setter,
    qat_8bit_weight_16bit_fixed_act_qconfig_setter,
    calibration_8bit_weight_16bit_act_qconfig_setter,
    sensitive_op_qat_8bit_weight_16bit_act_qconfig_setter,
    sensitive_op_qat_8bit_weight_16bit_fixed_act_qconfig_setter,
    sensitive_op_calibration_8bit_weight_16bit_act_qconfig_setter,
)

table = torch.load("output_0-0_dataindex_1_sensitive_ops.pt")

qat_model = prepare_qat_fx(
    model,
    example_inputs=example_input,
    qconfig_setter=( 
        sensitive_op_qat_8bit_weight_16bit_fixed_act_qconfig_setter(table, ratio=0.2),
        default_qat_fixed_act_qconfig_setter,  # fixed-act QAT variant, matching the sensitivity template above
    )
)
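How ratio and topk might translate into a set of int16 operators can be sketched as follows. The real format of the sensitivity table and the Plugin's rounding rule are not described here, so this sketch assumes a list of (operator name, sensitivity) pairs where a higher value means more sensitive, and rounds ratio up to a whole number of operators. The function select_sensitive_ops and the sample values are hypothetical.

```python
import math


def select_sensitive_ops(table, ratio=None, topk=None):
    """Return the names of the most sensitive operators (illustrative only)."""
    if (ratio is None) == (topk is None):
        raise ValueError("specify exactly one of ratio or topk")
    # Most sensitive operators first.
    ranked = sorted(table, key=lambda row: row[1], reverse=True)
    if topk is None:
        # Assumed rule: ratio is a fraction of all operators, rounded up.
        topk = math.ceil(len(ranked) * ratio)
    return [name for name, _ in ranked[:topk]]


# Made-up sensitivity values, for illustration only.
sensitivity_table = [
    ("conv1", 0.02),
    ("head.conv", 0.91),
    ("mul", 0.40),
    ("conv2", 0.05),
    ("add", 0.10),
]

by_ratio = select_sensitive_ops(sensitivity_table, ratio=0.2)
by_topk = select_sensitive_ops(sensitivity_table, topk=2)
```

With five operators, ratio=0.2 selects one operator in this sketch and topk=2 selects two; the remaining operators would fall through to the lower-priority template.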

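Custom templates. The only custom template is ModuleNameQconfigSetter, which takes a dict mapping module names to their qconfigs. It is typically used for special requirements such as setting a fixed scale, and can be combined with fixed templates and sensitivity templates.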

from horizon_plugin_pytorch.quantization.qconfig_template import (
    default_calibration_qconfig_setter,
    default_qat_qconfig_setter,
    default_qat_fixed_act_qconfig_setter,
    qat_8bit_weight_16bit_act_qconfig_setter,
    qat_8bit_weight_16bit_fixed_act_qconfig_setter,
    calibration_8bit_weight_16bit_act_qconfig_setter,
    sensitive_op_qat_8bit_weight_16bit_act_qconfig_setter,
    sensitive_op_qat_8bit_weight_16bit_fixed_act_qconfig_setter,
    sensitive_op_calibration_8bit_weight_16bit_act_qconfig_setter,
    ModuleNameQconfigSetter,
)

table = torch.load("output_0-0_dataindex_1_sensitive_ops.pt")

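# Keys are module names; values are the qconfig to apply to that module.
# OP2_MAX and QINT16_MAX are not defined in this snippet: OP2_MAX is presumably
# the largest absolute activation value expected at op_2, and QINT16_MAX the
# largest qint16 value (32767).
module_name_to_qconfig = {
    "op_1": default_qat_8bit_fake_quant_qconfig,
    "op_2": get_default_qconfig(
        activation_observer="fixed_scale",
        activation_qkwargs={
            "dtype": qint16,
            "scale": OP2_MAX / QINT16_MAX,  # fixed activation scale for op_2
        },
    )
}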

qat_model = prepare_qat_fx(
    model,
    example_inputs=example_input,
    qconfig_setter=(
        ModuleNameQconfigSetter(module_name_to_qconfig),
        sensitive_op_qat_8bit_weight_16bit_fixed_act_qconfig_setter(table, ratio=0.2),
        default_qat_fixed_act_qconfig_setter,  # fixed-act QAT variant, matching the sensitivity template above
    )
)
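The fixed scale expression OP2_MAX / QINT16_MAX used above can be worked through numerically. This is a plain-Python sketch of symmetric int16 quantization and not the Plugin's fake-quantize kernel; the value chosen for OP2_MAX and the helpers fixed_scale and fake_quantize are hypothetical.

```python
QINT16_MIN, QINT16_MAX = -32768, 32767


def fixed_scale(abs_max, qmax=QINT16_MAX):
    """Scale that maps the expected largest magnitude onto qmax."""
    return abs_max / qmax


def fake_quantize(x, scale, qmin=QINT16_MIN, qmax=QINT16_MAX):
    """Quantize to an integer grid, clamp, then map back to float."""
    q = round(x / scale)
    q = max(qmin, min(qmax, q))
    return q * scale


OP2_MAX = 8.0  # hypothetical: largest activation magnitude expected at op_2
scale = fixed_scale(OP2_MAX)

in_range = fake_quantize(8.0, scale)      # representable, stays close to 8.0
saturated = fake_quantize(100.0, scale)   # out of range, clamps to about 8.0
zero = fake_quantize(0.0, scale)
```

The sketch shows why the chosen maximum matters: values beyond OP2_MAX saturate, while a maximum that is much too large wastes resolution because the step size (the scale) grows with it.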