4.2.4.1. Quantization API¶
This file is in the process of migration to torch/ao/quantization and is kept here for compatibility while the migration is ongoing. If you are adding a new entry/functionality, please add it to torch/ao/quantization/fuse_modules.py and add an import statement here.
- torch.quantization.fuse_modules.fuse_modules(model, modules_to_fuse, inplace=False, fuser_func=<function fuse_known_modules>, fuse_custom_config_dict=None)¶
Fuses a list of modules into a single module.
Fuses only the following sequences of modules: conv, bn; conv, bn, relu; conv, relu; linear, relu; bn, relu. All other sequences are left unchanged. For these sequences, the first item in the list is replaced with the fused module and the rest of the modules are replaced with identity.
- Parameters
model – Model containing the modules to be fused
modules_to_fuse – list of list of module names to fuse. Can also be a list of strings if there is only a single list of modules to fuse.
inplace – bool specifying if fusion happens in place on the model, by default a new model is returned
fuser_func – Function that takes in a list of modules and outputs a list of fused modules of the same length. For example, fuser_func([convModule, BNModule]) returns the list [ConvBNModule, nn.Identity()]. Defaults to torch.quantization.fuse_known_modules
fuse_custom_config_dict – custom configuration for fusion
# Example of fuse_custom_config_dict
fuse_custom_config_dict = {
    # Additional fuser_method mapping
    "additional_fuser_method_mapping": {
        (torch.nn.Conv2d, torch.nn.BatchNorm2d): fuse_conv_bn
    },
}
- Returns
model with fused modules. A new copy is created if inplace=False.
Examples:
>>> m = myModel()
>>> # m is a module containing the sub-modules below
>>> modules_to_fuse = [ ['conv1', 'bn1', 'relu1'], ['submodule.conv', 'submodule.relu']]
>>> fused_m = torch.ao.quantization.fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)

>>> m = myModel()
>>> # Alternately provide a single list of modules to fuse
>>> modules_to_fuse = ['conv1', 'bn1', 'relu1']
>>> fused_m = torch.ao.quantization.fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)
prepare and convert
- horizon_plugin_pytorch.quantization.quantize.convert(module, mapping=None, inplace=False, remove_qconfig=True)¶
Converts submodules of the input module to a different module according to mapping by calling the from_float method on the target module class, and removes qconfig at the end if remove_qconfig is set to True.
- Parameters
module – input module
mapping – a dictionary that maps from source module type to target module type, can be overwritten to allow swapping user defined Modules
inplace – carry out model transformations in-place, the original module is mutated
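A minimal usage sketch, assuming an eager-mode model named qat_model that was prepared with prepare_qat and already trained or calibrated (the variable name is a placeholder):

from horizon_plugin_pytorch.quantization.quantize import convert

# qat_model: placeholder for a prepared and trained/calibrated QAT model
quantized_model = convert(qat_model, mapping=None, inplace=False)
quantized_model.eval()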
- horizon_plugin_pytorch.quantization.quantize.prepare_calibration(model, inplace=False)¶
Prepare the model for calibration.
- Parameters
model – Float model with fused ops
inplace – carry out model transformations in-place or not. Defaults to False.
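A hedged sketch of a typical calibration pass; float_model and calib_data are placeholders, and the model is assumed to already have fused ops and a calibration qconfig set:

import torch
from horizon_plugin_pytorch.quantization.quantize import prepare_calibration

calib_model = prepare_calibration(float_model, inplace=False)  # float_model is a placeholder
calib_model.eval()
with torch.no_grad():
    for image in calib_data:  # calib_data is a placeholder iterable of inputs
        calib_model(image)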
- horizon_plugin_pytorch.quantization.quantize.prepare_qat(model: torch.nn.modules.module.Module, mapping: Optional[Dict[torch.nn.modules.module.Module, torch.nn.modules.module.Module]] = None, inplace: bool = False, optimize_graph: bool = False, hybrid: bool = False)¶
Prepares a copy of the model for quantization-aware training and converts it to the quantized version.
Quantization configuration should be assigned preemptively to individual submodules in the .qconfig attribute.
- Parameters
model – input model to be modified in-place
mapping – dictionary that maps float modules to quantized modules to be replaced.
inplace – carry out model transformations in-place, the original module is mutated
optimize_graph – whether to do some processing on the original model for special purposes. Currently only supports using torch.fx to fix the cat input scale (only used on Bernoulli)
hybrid – whether to generate a hybrid model in which some intermediate operations are computed in float. There are some constraints for this functionality now: 1. The hybrid model cannot pass check_model and cannot be compiled. 2. Some quantized operations cannot directly accept input from float operations; the user needs to manually insert QuantStub.
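A minimal eager-mode sketch of the prepare_qat workflow; MyModel and the fused module names are illustrative placeholders:

import torch
from torch.quantization import fuse_modules
from horizon_plugin_pytorch.quantization import get_default_qat_qconfig
from horizon_plugin_pytorch.quantization.quantize import prepare_qat

model = MyModel()  # placeholder float model
# fuse conv/bn/relu sequences first (module names are illustrative)
model = fuse_modules(model, [["conv1", "bn1", "relu1"]])
model.qconfig = get_default_qat_qconfig()
qat_model = prepare_qat(model, inplace=False)
# qat_model is now ready for quantization-aware training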
- horizon_plugin_pytorch.quantization.quantize_fx.convert_fx(graph_module: torch.fx.graph_module.GraphModule, convert_custom_config_dict: Optional[Dict[str, Any]] = None, _remove_qconfig: bool = True) horizon_plugin_pytorch.quantization.fx.graph_module.QuantizedGraphModule ¶
Convert a calibrated or trained model to a quantized model.
- Parameters
graph_module – A prepared and calibrated/trained model (GraphModule)
convert_custom_config_dict –
dictionary for custom configurations for convert function:
convert_custom_config_dict = {
    # We automatically preserve all attributes, this option is
    # just in case and not likely to be used.
    "preserved_attributes": ["preserved_attr"],
}
_remove_qconfig – Option to remove the qconfig attributes in the model after convert. For internal use only.
- Returns
A quantized model (GraphModule)
Example:
# prepared_model: the model after prepare_fx/prepare_qat_fx and
# calibration/training
quantized_model = convert_fx(prepared_model)
- horizon_plugin_pytorch.quantization.quantize_fx.fuse_fx(model: torch.nn.modules.module.Module, fuse_custom_config_dict: Optional[Dict[str, Any]] = None) horizon_plugin_pytorch.quantization.fx.graph_module.GraphModuleWithAttr ¶
Fuse modules like conv+add+bn+relu etc. Fusion rules are defined in horizon_plugin_pytorch.quantization.fx.fusion_pattern.py
- Parameters
model – a torch.nn.Module model
fuse_custom_config_dict –
Dictionary for custom configurations for fuse_fx, e.g.
fuse_custom_config_dict = {
    # We automatically preserve all attributes, this option is
    # just in case and not likely to be used.
    "preserved_attributes": ["preserved_attr"],
}
Example:
from horizon_plugin_pytorch.quantization.quantize_fx import fuse_fx
m = fuse_fx(m)
- horizon_plugin_pytorch.quantization.quantize_fx.prepare_calibration_fx(model, qconfig_dict: Optional[Dict[str, Any]] = None, prepare_custom_config_dict: Optional[Dict[str, Any]] = None, optimize_graph: bool = False, hybrid: bool = False, hybrid_dict: Optional[Dict[str, List]] = None) horizon_plugin_pytorch.quantization.fx.graph_module.ObservedGraphModule ¶
Prepare the model for calibration.
Args: Same as prepare_qat_fx
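A hedged fx-mode calibration sketch; float_model and calib_loader are placeholders, and get_default_calib_qconfig (documented below) is assumed to provide a suitable calibration qconfig:

import torch
from horizon_plugin_pytorch.quantization.quantize_fx import prepare_calibration_fx
from horizon_plugin_pytorch.quantization.qconfig import get_default_calib_qconfig

qconfig_dict = {"": get_default_calib_qconfig()}
calib_model = prepare_calibration_fx(float_model, qconfig_dict)  # float_model is a placeholder
calib_model.eval()
with torch.no_grad():
    for image in calib_loader:  # calib_loader is a placeholder iterable of inputs
        calib_model(image)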
- horizon_plugin_pytorch.quantization.quantize_fx.prepare_qat_fx(model: torch.nn.modules.module.Module, qconfig_dict: Optional[Dict[str, Any]] = None, prepare_custom_config_dict: Optional[Dict[str, Any]] = None, optimize_graph: bool = False, hybrid: bool = False, hybrid_dict: Optional[Dict[str, List]] = None) horizon_plugin_pytorch.quantization.fx.graph_module.ObservedGraphModule ¶
Prepare a model for quantization aware training.
- Parameters
model – torch.nn.Module model or GraphModule model (maybe from fuse_fx)
qconfig_dict –
qconfig_dict is a dictionary with the following configurations:
qconfig_dict = {
    # optional, global config
    "": qconfig,
    # optional, used for module types
    "module_type": [
        (torch.nn.Conv2d, qconfig),
        ...,
    ],
    # optional, used for module names
    "module_name": [
        ("foo.bar", qconfig),
        ...,
    ],
    # priority (in increasing order):
    #     global, module_type, module_name, module.qconfig
    # qconfig == None means quantization should be
    # skipped for anything matching the rule.
    # The qconfig of a function or method is the same as the
    # qconfig of its parent module; if it needs to be set
    # separately, please wrap the function as a module.
}
prepare_custom_config_dict –
customization configuration dictionary for quantization tool:
prepare_custom_config_dict = {
    # We automatically preserve all attributes, this option is
    # just in case and not likely to be used.
    "preserved_attributes": ["preserved_attr"],
}
optimize_graph – whether to do some processing on the original model for special purposes. Currently only supports using torch.fx to fix the cat input scale (only used on Bernoulli)
hybrid – Whether to prepare the model in hybrid mode. The default value is False and the model runs on the BPU completely. It should be True if the model is quantized by model convert or contains some CPU ops. In hybrid mode, ops which aren't supported by the BPU and ops which are specified by the user will run on the CPU. How to set qconfig: qconfig in hybrid mode is the same as qconfig in non-hybrid mode. For a BPU op, we should ensure that its input is quantized: the activation qconfig of its previous non-quantstub op should not be None, even if that previous non-quantstub op is a CPU op. How to specify a CPU op: define CPU module_name or module_type in hybrid_dict.
hybrid_dict –
hybrid_dict is a dictionary to define user-specified CPU op:
hybrid_dict = {
    # optional, used for module types
    "module_type": [torch.nn.Conv2d, ...],
    # optional, used for module names
    "module_name": ["foo.bar", ...],
}
# priority (in increasing order): module_type, module_name
# To set a function or method as CPU op, wrap it as a module.
- Returns
A GraphModule with fake quant modules (configured by qconfig_dict), ready for quantization aware training
Example:
import torch
from horizon_plugin_pytorch.quantization import get_default_qat_qconfig
from horizon_plugin_pytorch.quantization import prepare_qat_fx

qconfig = get_default_qat_qconfig()

def train_loop(model, train_data):
    model.train()
    for image, target in train_data:
        ...

qconfig_dict = {"": qconfig}
prepared_model = prepare_qat_fx(float_model, qconfig_dict)
# Run QAT training
train_loop(prepared_model, train_data)
Extended tracer and wrap of torch.fx
This file defines an inherited tracer of torch.fx.Tracer and an extended wrap to allow wrapping of a user-defined Module or method, which helps users do some optimization of their own modules with torch.fx.
- horizon_plugin_pytorch.utils.fx_helper.wrap(obj)¶
This function can be called or used as a decorator:
1. on a string, to register a builtin function as a “leaf function”;
2. on a function, to register this function as a “leaf function”;
3. on a subclass of torch.nn.Module, to register this module as a “leaf module” and register all user-defined methods in this class as “leaf methods”;
4. on a class method, to register it as a “leaf method”.
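A minimal sketch of using wrap as a decorator on a user-defined Module so that the extended tracer treats it as a leaf; MyBlock is a hypothetical module:

import torch
from horizon_plugin_pytorch.utils.fx_helper import wrap

@wrap  # register MyBlock as a "leaf module" for the extended torch.fx tracer
class MyBlock(torch.nn.Module):
    def forward(self, x):
        # kept opaque (not traced into) when building the graph
        return x * 2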
Same as torch.quantization.FakeQuantize.
- class horizon_plugin_pytorch.quantization.fake_quantize.FakeQuantize(observer=<class 'horizon_plugin_pytorch.quantization.observer.MovingAverageMinMaxObserver'>, quant_min=None, quant_max=None, saturate=None, in_place=False, channel_len=1, **observer_kwargs)¶
Simulates the quantize and dequantize operations at training time. The output of this module is given by:
x_out = (clamp(round(x / scale + zero_point), quant_min, quant_max) - zero_point) * scale
scale defines the scale factor used for quantization.
zero_point specifies the quantized value to which 0 in floating point maps.
quant_min specifies the minimum allowable quantized value.
quant_max specifies the maximum allowable quantized value.
fake_quant_enabled controls the application of fake quantization on tensors; note that statistics can still be updated.
observer_enabled controls statistics collection on tensors.
dtype specifies the quantized dtype that is being emulated with fake-quantization; the allowable values are qint8 and qint16. The values of quant_min and quant_max should be chosen to be consistent with the dtype.
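A minimal numeric sketch of the formula above using plain tensors; this only illustrates the math, not the module's actual implementation:

import torch

def fake_quantize(x, scale, zero_point, quant_min, quant_max):
    # quantize, clamp to the representable range, then dequantize
    q = torch.clamp(torch.round(x / scale + zero_point), quant_min, quant_max)
    return (q - zero_point) * scale

x = torch.tensor([0.05, 1.5, -2.0])
# qint8-like symmetric range with scale = 1/128
print(fake_quantize(x, scale=1.0 / 128, zero_point=0, quant_min=-128, quant_max=127))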
- Parameters
observer (module) – Module for observing statistics on input tensors and calculating scale and zero-point.
quant_min (int) – The minimum allowable quantized value.
quant_max (int) – The maximum allowable quantized value.
channel_len (int) – Size of data at channel dim.
observer_kwargs (optional) – Arguments for the observer module
- observer¶
User provided module that collects statistics on the input tensor and provides a method to calculate scale and zero-point.
- Type
Module
- extra_repr()¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(X)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- set_qparams(scale: Union[torch.Tensor, Sequence, float], zero_point: Optional[Union[torch.Tensor, Sequence, int]] = None)¶
Set the quantization parameters directly. If zero_point is not given, symmetric quantization is assumed by default.
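A hedged illustration of setting quantization parameters directly; the scale value is arbitrary and constructing FakeQuantize with all defaults is assumed to be valid here:

from horizon_plugin_pytorch.quantization.fake_quantize import FakeQuantize

fq = FakeQuantize()        # default observer and qint8 settings (assumed)
fq.set_qparams(1.0 / 128)  # symmetric: zero_point is left at its default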
- classmethod with_args(**kwargs)¶
Wrapper that allows creation of class factories.
This can be useful when there is a need to create classes with the same constructor arguments, but different instances. Can be used in conjunction with _callable_args
Example:
>>> Foo.with_args = classmethod(_with_args)
>>> foo_builder = Foo.with_args(a=3, b=4).with_args(answer=42)
>>> foo_instance1 = foo_builder()
>>> foo_instance2 = foo_builder()
>>> id(foo_instance1) == id(foo_instance2)
False
- class horizon_plugin_pytorch.quantization.observer.MovingAverageMinMaxObserver(averaging_constant=0.01, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=None, quant_max=None, is_sync_quantize=False, factory_kwargs=None)¶
Observer module for computing the quantization parameters based on the moving average of the min and max values.
This observer computes the quantization parameters based on the moving averages of minimums and maximums of the incoming tensors. The module records the average minimum and maximum of incoming tensors, and uses this statistic to compute the quantization parameters.
- Parameters
averaging_constant – Averaging constant for min/max.
dtype – Quantized data type
qscheme – Quantization scheme to be used, only supports the per_tensor_symmetric scheme
reduce_range – Reduces the range of the quantized data type by 1 bit
quant_min – Minimum quantization value.
quant_max – Maximum quantization value.
is_sync_quantize – Whether to use sync quantize
factory_kwargs – Arguments for registering data buffers
- forward(x_orig)¶
Records the running minimum and maximum of x.
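A sketch of the moving-average rule this observer is described to use; the exact implementation may differ, so treat this as illustrative pseudocode:

import torch

def update_moving_min_max(running_min, running_max, x, averaging_constant=0.01):
    # blend the statistics of the incoming tensor into the running values
    new_min, new_max = x.min(), x.max()
    running_min = running_min + averaging_constant * (new_min - running_min)
    running_max = running_max + averaging_constant * (new_max - running_max)
    return running_min, running_max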
- class horizon_plugin_pytorch.quantization.observer.MovingAveragePerChannelMinMaxObserver(averaging_constant=0.01, ch_axis=0, dtype=torch.qint8, qscheme=torch.per_channel_symmetric, quant_min=None, quant_max=None, is_sync_quantize=False, factory_kwargs=None)¶
Observer module for computing the quantization parameters based on the running per channel min and max values.
This observer uses the tensor min/max statistics to compute the per channel quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses this statistic to compute the quantization parameters.
- Parameters
averaging_constant – Averaging constant for min/max.
ch_axis – Channel axis
dtype – Quantized data type
qscheme – Quantization scheme to be used, only supports per_channel_symmetric
quant_min – Minimum quantization value.
quant_max – Maximum quantization value.
is_sync_quantize – Whether to use sync quantize
factory_kwargs – Arguments for registering data buffers
- forward(x_orig)¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
fuse modules
- horizon_plugin_pytorch.quantization.fuse_modules.fuse_known_modules(mod_list, additional_fuser_method_mapping=None)¶
Returns a list of modules that fuses the operations specified in the input module list.
Fuses only the following sequence of modules: conv, bn; conv, bn, relu; conv, relu; conv, bn, add; conv, bn, add, relu; conv, add; conv, add, relu; linear, bn; linear, bn, relu; linear, relu; linear, bn, add; linear, bn, add, relu; linear, add; linear, add, relu. For these sequences, the first element in the output module list performs the fused operation. The rest of the elements are set to nn.Identity()
- class horizon_plugin_pytorch.march.March¶
BPU platform
BAYES: Bayes platform
BERNOULLI2: Bernoulli2 platform
META: Meta platform
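A hedged example of selecting the target BPU platform before preparing or compiling a model; set_march is assumed to be available in horizon_plugin_pytorch.march alongside March:

from horizon_plugin_pytorch.march import March, set_march

set_march(March.BAYES)  # assumed helper; choose BERNOULLI2 or META for other platforms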
- horizon_plugin_pytorch.quantization.qconfig.get_default_calib_qconfig(dtype='qint8', calib_qkwargs=None, backend='')¶
Get default calibration qconfig.
- Parameters
dtype (str) – quantization type, the allowable values are qint8 and qint16
calib_qkwargs (dict) – A dict that contains args of CalibFakeQuantize and args of calibration observer.
backend (str) – backend implementation
- horizon_plugin_pytorch.quantization.qconfig.get_default_qat_out_qconfig(dtype='qint8', weight_fake_quant='fake_quant', weight_qkwargs=None, backend='')¶
Get default qat out qconfig.
- Parameters
dtype (str) – quantization type, the allowable values are qint8 and qint16
weight_fake_quant (str) – FakeQuantize type of weight, default is fake_quant. Available items are fake_quant, lsq and pact
weight_qkwargs (dict) – A dict containing the weight Observer type, args of the weight FakeQuantize and args of the weight Observer.
backend (str) – backend implementation
- horizon_plugin_pytorch.quantization.qconfig.get_default_qat_qconfig(dtype='qint8', weight_dtype='qint8', activation_fake_quant='fake_quant', weight_fake_quant='fake_quant', activation_qkwargs=None, weight_qkwargs=None, backend='')¶
Get default qat qconfig.
- Parameters
dtype (str) – Activation quantization type, the allowable values are qint8 and qint16
weight_dtype (str) – Weight quantization type, the allowable values are qint8 and qint16
activation_fake_quant (str) – FakeQuantize type of activation, default is fake_quant. Available items are fake_quant, lsq and pact
weight_fake_quant (str) – FakeQuantize type of weight, default is fake_quant. Available items are fake_quant, lsq and pact
activation_qkwargs (dict) – A dict containing the activation Observer type, args of the activation FakeQuantize and args of the activation Observer.
weight_qkwargs (dict) – A dict containing the weight Observer type, args of the weight FakeQuantize and args of the weight Observer.
backend (str) – backend implementation
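A hedged sketch of assembling a qconfig_dict for prepare_qat_fx from these helpers; float_model and the module name "head.conv" are hypothetical:

from horizon_plugin_pytorch.quantization import get_default_qat_qconfig, prepare_qat_fx
from horizon_plugin_pytorch.quantization.qconfig import get_default_qat_out_qconfig

qconfig_dict = {
    "": get_default_qat_qconfig(dtype="qint8"),
    # hypothetical output layer that should use the "out" qconfig
    "module_name": [("head.conv", get_default_qat_out_qconfig(dtype="qint8"))],
}
qat_model = prepare_qat_fx(float_model, qconfig_dict)  # float_model is a placeholder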
- horizon_plugin_pytorch.utils.onnx_helper.export_to_onnx(model, args, f, export_params=True, verbose=False, training=<TrainingMode.EVAL: 0>, input_names=None, output_names=None, operator_export_type=<OperatorExportTypes.ONNX_FALLTHROUGH: 3>, opset_version=11, do_constant_folding=True, example_outputs=None, strip_doc_string=True, dynamic_axes=None, keep_initializers_as_inputs=None, custom_opsets=None, enable_onnx_checker=False)¶
Export a (float or QAT) model into ONNX format.
- Parameters
model (torch.nn.Module/torch.jit.ScriptModule/ScriptFunction) – the model to be exported.
args (tuple or torch.Tensor) –
args can be structured either as:

1. ONLY A TUPLE OF ARGUMENTS:

args = (x, y, z)

The tuple should contain model inputs such that model(*args) is a valid invocation of the model. Any non-Tensor arguments will be hard-coded into the exported model; any Tensor arguments will become inputs of the exported model, in the order they occur in the tuple.

2. A TENSOR:

args = torch.Tensor([1])

This is equivalent to a 1-ary tuple of that Tensor.

3. A TUPLE OF ARGUMENTS ENDING WITH A DICTIONARY OF NAMED ARGUMENTS:

args = (x, {'y': input_y, 'z': input_z})

All but the last element of the tuple will be passed as non-keyword arguments, and named arguments will be set from the last element. If a named argument is not present in the dictionary, it is assigned the default value, or None if a default value is not provided.
f – a file-like object or a string containing a file name. A binary protocol buffer will be written to this file.
export_params (bool, default True) – if True, all parameters will be exported.
verbose (bool, default False) – if True, prints a description of the model being exported to stdout, doc_string will be added to graph. doc_string may contain mapping of module scope to node name in future torch onnx.
training (enum, default TrainingMode.EVAL) –
TrainingMode.EVAL: export the model in inference mode.
TrainingMode.PRESERVE: export the model in inference mode if model.training is False and in training mode if model.training is True.
TrainingMode.TRAINING: export the model in training mode. Disables optimizations which might interfere with training.
input_names (list of str, default empty list) – names to assign to the input nodes of the graph, in order.
output_names (list of str, default empty list) – names to assign to the output nodes of the graph, in order.
operator_export_type (enum, default ONNX_FALLTHROUGH) –
OperatorExportTypes.ONNX: Export all ops as regular ONNX ops (in the default opset domain).
OperatorExportTypes.ONNX_FALLTHROUGH: Try to convert all ops to standard ONNX ops in the default opset domain.
OperatorExportTypes.ONNX_ATEN: All ATen ops (in the TorchScript namespace “aten”) are exported as ATen ops.
OperatorExportTypes.ONNX_ATEN_FALLBACK: Try to export each ATen op (in the TorchScript namespace “aten”) as a regular ONNX op. If we are unable to do so, fall back to exporting an ATen op.
opset_version (int, default 11) – by default we export the model to the opset version of the onnx submodule.
do_constant_folding (bool, default True) – Apply the constant-folding optimization. Constant-folding will replace some of the ops that have all constant inputs with pre-computed constant nodes.
example_outputs (Tensor/Tuple of Tensor, default None) – Must be provided when exporting a ScriptModule or ScriptFunction, ignored otherwise. Used to determine the type and shape of the outputs without tracing the execution of the model. A single object is treated as equivalent to a tuple of one element.
strip_doc_string (bool, default True) – if True, strips the field “doc_string” from the exported model, which contains information about the stack trace.
dynamic_axes (dict<str, list(int)/dict<int, str>>, default empty dict) –
By default the exported model will have the shapes of all input and output tensors set to exactly match those given in args (and example_outputs when that arg is required). To specify axes of tensors as dynamic (i.e. known only at run-time), set dynamic_axes to a dict with schema:
KEY (str): an input or output name. Each name must also be provided in input_names or output_names.
VALUE (dict or list): If a dict, keys are axis indices and values are axis names. If a list, each element is an axis index.
keep_initializers_as_inputs (bool, default None) – If True, all the initializers (typically corresponding to parameters) in the exported graph will also be added as inputs to the graph. If False, then initializers are not added as inputs to the graph, and only the non-parameter inputs are added as inputs. This may allow for better optimizations (e.g. constant folding) by backends/runtimes.
custom_opsets (dict<str, int>, default empty dict) –
A dict with schema:
KEY (str): opset domain name
VALUE (int): opset version
If a custom opset is referenced by model but not mentioned in this dictionary, the opset version is set to 1.
enable_onnx_checker (bool, default False) – If True the onnx model checker will be run as part of the export, to ensure the exported model is a valid ONNX model.
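A hedged usage sketch for export_to_onnx; model and the input shape are placeholders:

import torch
from horizon_plugin_pytorch.utils.onnx_helper import export_to_onnx

dummy_input = torch.randn(1, 3, 224, 224)  # placeholder input matching the model
export_to_onnx(
    model,                       # placeholder float or QAT model
    (dummy_input,),
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)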