4.2.4.1. Quantization API

This file is in the process of migrating to torch/ao/quantization and is kept here for compatibility while the migration is ongoing. If you are adding a new entry or functionality, please add it to torch/ao/quantization/fuse_modules.py and add an import statement here.

torch.quantization.fuse_modules.fuse_modules(model, modules_to_fuse, inplace=False, fuser_func=<function fuse_known_modules>, fuse_custom_config_dict=None)

Fuses a list of modules into a single module.

Fuses only the following sequences of modules: conv, bn; conv, bn, relu; conv, relu; linear, relu; bn, relu. All other sequences are left unchanged. For these sequences, the first item in the list is replaced with the fused module, and the rest of the modules are replaced with identity.

Parameters
  • model – Model containing the modules to be fused

  • modules_to_fuse – list of list of module names to fuse. Can also be a list of strings if there is only a single list of modules to fuse.

  • inplace – bool specifying if fusion happens in place on the model, by default a new model is returned

  • fuser_func – Function that takes in a list of modules and outputs a list of fused modules of the same length. For example, fuser_func([convModule, BNModule]) returns the list [ConvBNModule, nn.Identity()]. Defaults to torch.quantization.fuse_known_modules

  • fuse_custom_config_dict – custom configuration for fusion

# Example of fuse_custom_config_dict
fuse_custom_config_dict = {
    # Additional fuser_method mapping
    "additional_fuser_method_mapping": {
        (torch.nn.Conv2d, torch.nn.BatchNorm2d): fuse_conv_bn
    },
}
Returns

model with fused modules. A new copy is created if inplace=False.

Examples:

>>> m = myModel()
>>> # m is a module containing the sub-modules below
>>> modules_to_fuse = [['conv1', 'bn1', 'relu1'], ['submodule.conv', 'submodule.relu']]
>>> fused_m = torch.ao.quantization.fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)

>>> m = myModel()
>>> # Alternately provide a single list of modules to fuse
>>> modules_to_fuse = ['conv1', 'bn1', 'relu1']
>>> fused_m = torch.ao.quantization.fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)

prepare and convert

horizon_plugin_pytorch.quantization.quantize.convert(module, mapping=None, inplace=False, remove_qconfig=True)

Converts submodules in the input module to a different module according to mapping by calling the from_float method on the target module class, and removes the qconfig at the end if remove_qconfig is set to True.

Parameters
  • module – input module

  • mapping – a dictionary that maps from source module type to target module type; can be overridden to allow swapping of user-defined Modules

  • inplace – carry out model transformations in-place, the original module is mutated
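
A minimal usage sketch (qat_model is a placeholder for a trained QAT model whose submodules carry qconfig attributes):

# qat_model: a trained QAT model produced by prepare_qat
from horizon_plugin_pytorch.quantization.quantize import convert

quantized_model = convert(qat_model.eval(), inplace=False)  # returns a new model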

horizon_plugin_pytorch.quantization.quantize.prepare_calibration(model, inplace=False)

Prepare the model for calibration.

Parameters
  • model – Float model with fused ops

  • inplace – carry out model transformations in-place or not. Defaults to False.
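
A minimal calibration sketch (float_model and calib_data are placeholders for a fused float model and a calibration data loader):

import torch
from horizon_plugin_pytorch.quantization.quantize import prepare_calibration

calib_model = prepare_calibration(float_model, inplace=False)
calib_model.eval()
with torch.no_grad():
    for image in calib_data:
        calib_model(image)  # observers record activation statistics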

horizon_plugin_pytorch.quantization.quantize.prepare_qat(model: torch.nn.modules.module.Module, mapping: Optional[Dict[torch.nn.modules.module.Module, torch.nn.modules.module.Module]] = None, inplace: bool = False, optimize_graph: bool = False, hybrid: bool = False)

Prepares a copy of the model for quantization-aware training and converts it to the quantized version.

Quantization configuration should be assigned beforehand to individual submodules via the .qconfig attribute.

Parameters
  • model – input model to be modified in-place

  • mapping – dictionary that maps float modules to quantized modules to be replaced.

  • inplace – carry out model transformations in-place, the original module is mutated

  • optimize_graph – whether to do some extra processing on the original model for a special purpose. Currently the only supported option uses torch.fx to fix cat input scale (only used on Bernoulli)

  • hybrid – whether to generate a hybrid model in which some intermediate operations are computed in float. There are some constraints on this functionality for now: 1. The hybrid model cannot pass check_model and cannot be compiled. 2. Some quantized operations cannot directly accept input from float operations; users need to manually insert QuantStub.
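
A minimal sketch of the eager-mode QAT flow (float_model is a placeholder for a fused float model):

from horizon_plugin_pytorch.quantization import get_default_qat_qconfig
from horizon_plugin_pytorch.quantization.quantize import prepare_qat

float_model.qconfig = get_default_qat_qconfig()  # see the qconfig helpers below
qat_model = prepare_qat(float_model, inplace=False)
# ... run quantization aware training on qat_model ...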

horizon_plugin_pytorch.quantization.quantize_fx.convert_fx(graph_module: torch.fx.graph_module.GraphModule, convert_custom_config_dict: Optional[Dict[str, Any]] = None, _remove_qconfig: bool = True) → horizon_plugin_pytorch.quantization.fx.graph_module.QuantizedGraphModule

Convert a calibrated or trained model to a quantized model.

Parameters
  • graph_module – A prepared and calibrated/trained model (GraphModule)

  • convert_custom_config_dict

    dictionary for custom configurations for convert function:

    convert_custom_config_dict = {
    # We automatically preserve all attributes, this option is
        # just in case and not likely to be used.
        "preserved_attributes": ["preserved_attr"],
    }
    

  • _remove_qconfig – Option to remove the qconfig attributes in the model after convert. For internal use only.

Returns

A quantized model (GraphModule)

Example:

# prepared_model: the model after prepare_fx/prepare_qat_fx and
# calibration/training
quantized_model = convert_fx(prepared_model)

horizon_plugin_pytorch.quantization.quantize_fx.fuse_fx(model: torch.nn.modules.module.Module, fuse_custom_config_dict: Optional[Dict[str, Any]] = None) → horizon_plugin_pytorch.quantization.fx.graph_module.GraphModuleWithAttr

Fuse modules like conv+add+bn+relu etc. Fusion rules are defined in horizon_plugin_pytorch.quantization.fx.fusion_pattern.py

Parameters
  • model – a torch.nn.Module model

  • fuse_custom_config_dict

    Dictionary for custom configurations for fuse_fx, e.g.

    fuse_custom_config_dict = {
    # We automatically preserve all attributes, this option is
        # just in case and not likely to be used.
        "preserved_attributes": ["preserved_attr"],
    }
    

Example:

from horizon_plugin_pytorch.quantization.quantize_fx import fuse_fx
m = fuse_fx(m)

horizon_plugin_pytorch.quantization.quantize_fx.prepare_calibration_fx(model, qconfig_dict: Optional[Dict[str, Any]] = None, prepare_custom_config_dict: Optional[Dict[str, Any]] = None, optimize_graph: bool = False, hybrid: bool = False, hybrid_dict: Optional[Dict[str, List]] = None) → horizon_plugin_pytorch.quantization.fx.graph_module.ObservedGraphModule

Prepare the model for calibration.

Parameters: same as prepare_qat_fx.
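
A minimal sketch of the fx calibration flow. This is a hedged example: it assumes prepare_calibration_fx and convert_fx are re-exported from horizon_plugin_pytorch.quantization the same way prepare_qat_fx is in the example further below; float_model, calib_qconfig and calib_data are placeholders:

import torch
from horizon_plugin_pytorch.quantization import prepare_calibration_fx, convert_fx

calib_model = prepare_calibration_fx(float_model, {"": calib_qconfig})
calib_model.eval()
with torch.no_grad():
    for image, _ in calib_data:
        calib_model(image)  # observers collect statistics
quantized_model = convert_fx(calib_model)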

horizon_plugin_pytorch.quantization.quantize_fx.prepare_qat_fx(model: torch.nn.modules.module.Module, qconfig_dict: Optional[Dict[str, Any]] = None, prepare_custom_config_dict: Optional[Dict[str, Any]] = None, optimize_graph: bool = False, hybrid: bool = False, hybrid_dict: Optional[Dict[str, List]] = None) → horizon_plugin_pytorch.quantization.fx.graph_module.ObservedGraphModule

Prepare a model for quantization aware training.

Parameters
  • model – torch.nn.Module model or GraphModule model (maybe from fuse_fx)

  • qconfig_dict

    qconfig_dict is a dictionary with the following configurations:

    qconfig_dict = {
        # optional, global config
        "": qconfig,
    
        # optional, used for module types
        "module_type": [
            (torch.nn.Conv2d, qconfig),
            ...,
        ],
    
        # optional, used for module names
        "module_name": [
        ("foo.bar", qconfig),
            ...,
        ],
        # priority (in increasing order):
        #   global, module_type, module_name, module.qconfig
        # qconfig == None means quantization should be
        # skipped for anything matching the rule.
        # The qconfig of function or method is the same as the
        # qconfig of its parent module, if it needs to be set
        # separately, please wrap this function as a module.
    }
    

  • prepare_custom_config_dict

    customization configuration dictionary for quantization tool:

    prepare_custom_config_dict = {
    # We automatically preserve all attributes, this option is
        # just in case and not likely to be used.
        "preserved_attributes": ["preserved_attr"],
    }
    

  • optimize_graph – whether to do some extra processing on the original model for a special purpose. Currently the only supported option uses torch.fx to fix cat input scale (only used on Bernoulli)

  • hybrid – Whether to prepare the model in hybrid mode. The default value is False, in which case the model runs entirely on the BPU. It should be True if the model is quantized by model convert or contains some CPU ops. In hybrid mode, ops which are not supported by the BPU and ops which are specified by the user will run on the CPU.

    How to set qconfig: the qconfig in hybrid mode is the same as in non-hybrid mode. For a BPU op, we should ensure that its input is quantized: the activation qconfig of its preceding non-QuantStub op should not be None, even if that preceding op is a CPU op.

    How to specify CPU ops: define CPU module_name or module_type in hybrid_dict.

  • hybrid_dict

    hybrid_dict is a dictionary to define user-specified CPU op:

    hybrid_dict = {
        # optional, used for module types
        "module_type": [torch.nn.Conv2d, ...],
    
        # optional, used for module names
        "module_name": ["foo.bar", ...],
    }
    # priority (in increasing order): module_type, module_name
    # To set a function or method as CPU op, wrap it as a module.
    

Returns

A GraphModule with fake quant modules (configured by qconfig_dict), ready for quantization aware training

Example:

import torch
from horizon_plugin_pytorch.quantization import get_default_qat_qconfig
from horizon_plugin_pytorch.quantization import prepare_qat_fx

qconfig = get_default_qat_qconfig()
def train_loop(model, train_data):
    model.train()
    for image, target in train_data:
        ...

qconfig_dict = {"": qconfig}
prepared_model = prepare_qat_fx(float_model, qconfig_dict)
# Run QAT training
train_loop(prepared_model, train_data)

Extended tracer and wrap of torch.fx

This file defines a tracer inheriting from torch.fx.Tracer and an extended wrap that allows wrapping of user-defined Modules or methods, which helps users optimize their own modules with torch.fx.

horizon_plugin_pytorch.utils.fx_helper.wrap(obj)

This function can be:

  1. called or used as a decorator on a string to:

    register a builtin function as a “leaf function”

  2. called or used as a decorator on a function to:

    register this function as a “leaf function”

  3. called or used as a decorator on a subclass of torch.nn.Module to:

    register this module as a “leaf module”, and register all user-defined methods in this class as “leaf methods”

  4. called or used as a decorator on a class method to:

    register it as a “leaf method”
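
For example, a user-defined module can be registered as a leaf so that torch.fx does not trace into it (a sketch; MyPostProcess is a hypothetical module):

import torch
from horizon_plugin_pytorch.utils.fx_helper import wrap

@wrap  # register MyPostProcess as a "leaf module"
class MyPostProcess(torch.nn.Module):
    def forward(self, x):
        return x.clamp(min=0)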

Same as torch.quantization.FakeQuantize.

class horizon_plugin_pytorch.quantization.fake_quantize.FakeQuantize(observer=<class 'horizon_plugin_pytorch.quantization.observer.MovingAverageMinMaxObserver'>, quant_min=None, quant_max=None, saturate=None, in_place=False, channel_len=1, **observer_kwargs)

Simulate the quantize and dequantize operations at training time. The output of this module is given by

x_out = (clamp(round(x / scale + zero_point), quant_min, quant_max) - zero_point) * scale

  • scale defines the scale factor used for quantization.

  • zero_point specifies the quantized value to which 0 in floating point maps

  • quant_min specifies the minimum allowable quantized value.

  • quant_max specifies the maximum allowable quantized value.

  • fake_quant_enabled controls the application of fake quantization on tensors, note that statistics can still be updated.

  • observer_enabled controls statistics collection on tensors

  • dtype specifies the quantized dtype that is being emulated with fake-quantization; the allowable values are qint8 and qint16. The values of quant_min and quant_max should be chosen to be consistent with the dtype
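
A quick numeric check of the formula above, with illustrative values:

import torch

scale, zero_point = 0.1, 0
quant_min, quant_max = -128, 127  # qint8 range
x = torch.tensor([0.123, 5.0, -20.0])
x_out = (torch.clamp(torch.round(x / scale + zero_point),
                     quant_min, quant_max) - zero_point) * scale
# tensor([ 0.1000,  5.0000, -12.8000]); -20.0 saturates at quant_min * scale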

Parameters
  • observer (module) – Module for observing statistics on input tensors and calculating scale and zero-point.

  • quant_min (int) – The minimum allowable quantized value.

  • quant_max (int) – The maximum allowable quantized value.

  • channel_len (int) – Size of data at channel dim.

  • observer_kwargs (optional) – Arguments for the observer module

observer

User provided module that collects statistics on the input tensor and provides a method to calculate scale and zero-point.

Type

Module

extra_repr()

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(X)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

set_qparams(scale: Union[torch.Tensor, Sequence, float], zero_point: Optional[Union[torch.Tensor, Sequence, int]] = None)

Set the quantization parameters. If zero_point is not given, symmetric quantization is used by default.

classmethod with_args(**kwargs)

Wrapper that allows creation of class factories.

This can be useful when there is a need to create classes with the same constructor arguments, but different instances. Can be used in conjunction with _callable_args

Example:

>>> Foo.with_args = classmethod(_with_args)
>>> foo_builder = Foo.with_args(a=3, b=4).with_args(answer=42)
>>> foo_instance1 = foo_builder()
>>> foo_instance2 = foo_builder()
>>> id(foo_instance1) == id(foo_instance2)
False

class horizon_plugin_pytorch.quantization.observer.MovingAverageMinMaxObserver(averaging_constant=0.01, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=None, quant_max=None, is_sync_quantize=False, factory_kwargs=None)

Observer module for computing the quantization parameters based on the moving average of the min and max values.

This observer computes the quantization parameters based on the moving averages of minimums and maximums of the incoming tensors. The module records the average minimum and maximum of incoming tensors, and uses this statistic to compute the quantization parameters.

Parameters
  • averaging_constant – Averaging constant for min/max.

  • dtype – Quantized data type

  • qscheme – Quantization scheme to be used; only the per_tensor_symmetric scheme is supported

  • quant_min – Minimum quantization value.

  • quant_max – Maximum quantization value.

  • is_sync_quantize – Whether to use sync quantize

  • factory_kwargs – Arguments for registering data buffers

forward(x_orig)

Records the running minimum and maximum of x.
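
The statistics update follows the standard moving-average recipe; a sketch of the update rule, not the actual implementation:

def moving_average_update(min_val, max_val, x, c=0.01):
    # c is averaging_constant; on the first call the observer simply
    # initializes min_val/max_val from x.
    new_min = min_val + c * (x.min() - min_val)  # (1 - c) * min_val + c * min(x)
    new_max = max_val + c * (x.max() - max_val)
    return new_min, new_max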

class horizon_plugin_pytorch.quantization.observer.MovingAveragePerChannelMinMaxObserver(averaging_constant=0.01, ch_axis=0, dtype=torch.qint8, qscheme=torch.per_channel_symmetric, quant_min=None, quant_max=None, is_sync_quantize=False, factory_kwargs=None)

Observer module for computing the quantization parameters based on the running per channel min and max values.

This observer uses the tensor min/max statistics to compute the per channel quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses this statistic to compute the quantization parameters.

Parameters
  • averaging_constant – Averaging constant for min/max.

  • ch_axis – Channel axis

  • dtype – Quantized data type

  • qscheme – Quantization scheme to be used; only per_channel_symmetric is supported

  • quant_min – Minimum quantization value.

  • quant_max – Maximum quantization value.

  • is_sync_quantize – Whether to use sync quantize

  • factory_kwargs – Arguments for registering data buffers

forward(x_orig)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

fuse modules

horizon_plugin_pytorch.quantization.fuse_modules.fuse_known_modules(mod_list, additional_fuser_method_mapping=None)

Returns a list of modules that fuses the operations specified in the input module list.

Fuses only the following sequences of modules: conv, bn; conv, bn, relu; conv, relu; conv, bn, add; conv, bn, add, relu; conv, add; conv, add, relu; linear, bn; linear, bn, relu; linear, relu; linear, bn, add; linear, bn, add, relu; linear, add; linear, add, relu. For these sequences, the first element in the output module list performs the fused operation. The rest of the elements are set to nn.Identity().

class horizon_plugin_pytorch.march.March

BPU platform

BAYES: Bayes platform
BERNOULLI2: Bernoulli2 platform
META: Meta platform
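
The platform is typically selected once before prepare/convert. A minimal sketch, assuming the companion set_march helper exposed by horizon_plugin_pytorch.march:

from horizon_plugin_pytorch.march import March, set_march

set_march(March.BAYES)  # select the target BPU platform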

horizon_plugin_pytorch.quantization.qconfig.get_default_calib_qconfig(dtype='qint8', calib_qkwargs=None, backend='')

Get default calibration qconfig.

Parameters
  • dtype (str) – quantization type; the allowable values are qint8 and qint16

  • calib_qkwargs (dict) – A dict that contains args of CalibFakeQuantize and args of calibration observer.

  • backend (str) – backend implementation

horizon_plugin_pytorch.quantization.qconfig.get_default_qat_out_qconfig(dtype='qint8', weight_fake_quant='fake_quant', weight_qkwargs=None, backend='')

Get default qat out qconfig.

Parameters
  • dtype (str) – quantization type; the allowable values are qint8 and qint16

  • weight_fake_quant (str) – FakeQuantize type of weight, default is fake_quant. Available items are fake_quant, lsq and pact

  • weight_qkwargs (dict) – A dict containing the weight Observer type, args of the weight FakeQuantize, and args of the weight Observer.

  • backend (str) – backend implementation

horizon_plugin_pytorch.quantization.qconfig.get_default_qat_qconfig(dtype='qint8', weight_dtype='qint8', activation_fake_quant='fake_quant', weight_fake_quant='fake_quant', activation_qkwargs=None, weight_qkwargs=None, backend='')

Get default qat qconfig.

Parameters
  • dtype (str) – Activation quantization type; the allowable values are qint8 and qint16

  • weight_dtype (str) – Weight quantization type; the allowable values are qint8 and qint16

  • activation_fake_quant (str) – FakeQuantize type of activation, default is fake_quant. Available items are fake_quant, lsq and pact

  • weight_fake_quant (str) – FakeQuantize type of weight, default is fake_quant. Available items are fake_quant, lsq and pact

  • activation_qkwargs (dict) – A dict containing the activation Observer type, args of the activation FakeQuantize, and args of the activation Observer.

  • weight_qkwargs (dict) – A dict containing the weight Observer type, args of the weight FakeQuantize, and args of the weight Observer.

  • backend (str) – backend implementation
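
A minimal sketch: build a QAT qconfig and use it as the global entry of a qconfig_dict for prepare_qat_fx:

from horizon_plugin_pytorch.quantization import get_default_qat_qconfig

qconfig = get_default_qat_qconfig(dtype="qint8", weight_dtype="qint8")
qconfig_dict = {"": qconfig}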

horizon_plugin_pytorch.utils.onnx_helper.export_to_onnx(model, args, f, export_params=True, verbose=False, training=<TrainingMode.EVAL: 0>, input_names=None, output_names=None, operator_export_type=<OperatorExportTypes.ONNX_FALLTHROUGH: 3>, opset_version=11, do_constant_folding=True, example_outputs=None, strip_doc_string=True, dynamic_axes=None, keep_initializers_as_inputs=None, custom_opsets=None, enable_onnx_checker=False)

Export a (float or QAT) model into ONNX format.

Parameters
  • model (torch.nn.Module/torch.jit.ScriptModule/ScriptFunction) – the model to be exported.

  • args (tuple or torch.Tensor) –

    args can be structured either as:

    1. ONLY A TUPLE OF ARGUMENTS:

      args = (x, y, z)
      

    The tuple should contain model inputs such that model(*args) is a valid invocation of the model. Any non-Tensor arguments will be hard-coded into the exported model; any Tensor arguments will become inputs of the exported model, in the order they occur in the tuple.

    2. A TENSOR:

      args = torch.Tensor([1])
      

    This is equivalent to a 1-ary tuple of that Tensor.

    3. A TUPLE OF ARGUMENTS ENDING WITH A DICTIONARY OF NAMED ARGUMENTS:

    args = (x,
            {'y': input_y,
             'z': input_z})
    

    All but the last element of the tuple will be passed as non-keyword arguments, and named arguments will be set from the last element. If a named argument is not present in the dictionary, it is assigned the default value, or None if a default value is not provided.

  • f – a file-like object or a string containing a file name. A binary protocol buffer will be written to this file.

  • export_params (bool, default True) – if True, all parameters will be exported.

  • verbose (bool, default False) – if True, prints a description of the model being exported to stdout; doc_string will be added to the graph. doc_string may contain a mapping of module scope to node name in future torch onnx.

  • training (enum, default TrainingMode.EVAL) –

    • TrainingMode.EVAL: export the model in inference mode.

    • TrainingMode.PRESERVE: export the model in inference mode if model.training is False and in training mode if model.training is True.

    • TrainingMode.TRAINING: export the model in training mode. Disables optimizations which might interfere with training.

  • input_names (list of str, default empty list) – names to assign to the input nodes of the graph, in order.

  • output_names (list of str, default empty list) – names to assign to the output nodes of the graph, in order.

  • operator_export_type (enum, default ONNX_FALLTHROUGH) –

    • OperatorExportTypes.ONNX: Export all ops as regular ONNX ops (in the default opset domain).

    • OperatorExportTypes.ONNX_FALLTHROUGH: Try to convert all ops to standard ONNX ops in the default opset domain.

    • OperatorExportTypes.ONNX_ATEN: All ATen ops (in the TorchScript namespace “aten”) are exported as ATen ops.

    • OperatorExportTypes.ONNX_ATEN_FALLBACK: Try to export each ATen op (in the TorchScript namespace “aten”) as a regular ONNX op. If we are unable to do so, fall back to exporting an ATen op.

  • opset_version (int, default 11) – by default we export the model to the opset version of the onnx submodule.

  • do_constant_folding (bool, default True) – Apply the constant-folding optimization. Constant-folding will replace some of the ops that have all constant inputs with pre-computed constant nodes.

  • example_outputs (Tensor/Tuple of Tensor, default None) – Must be provided when exporting a ScriptModule or ScriptFunction, ignored otherwise. Used to determine the type and shape of the outputs without tracing the execution of the model. A single object is treated as equivalent to a tuple of one element.

  • strip_doc_string (bool, default True) – if True, strips the field “doc_string” from the exported model, which contains information about the stack trace.

  • dynamic_axes (dict<str, list(int)/dict<int, str>>, default empty dict) –

    By default the exported model will have the shapes of all input and output tensors set to exactly match those given in args (and example_outputs when that arg is required). To specify axes of tensors as dynamic (i.e. known only at run-time), set dynamic_axes to a dict with schema:

    • KEY (str): an input or output name. Each name must also be provided in input_names or output_names.

    • VALUE (dict or list): If a dict, keys are axis indices and values are axis names. If a list, each element is an axis index.

  • keep_initializers_as_inputs (bool, default None) – If True, all the initializers (typically corresponding to parameters) in the exported graph will also be added as inputs to the graph. If False, then initializers are not added as inputs to the graph, and only the non-parameter inputs are added as inputs. This may allow for better optimizations (e.g. constant folding) by backends/runtimes.

  • custom_opsets (dict<str, int>, default empty dict) –

    A dict with schema:

    • KEY (str): opset domain name

    • VALUE (int): opset version

    If a custom opset is referenced by model but not mentioned in this dictionary, the opset version is set to 1.

  • enable_onnx_checker (bool, default False) – If True, the onnx model checker will be run as part of the export to ensure the exported model is a valid ONNX model.
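
A minimal export sketch (model and the input shape are placeholders):

import torch
from horizon_plugin_pytorch.utils.onnx_helper import export_to_onnx

dummy_input = torch.randn(1, 3, 224, 224)
export_to_onnx(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)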