10.6.6. profiler¶
Profilers widely used for performance analysis in HAT.
10.6.6.1. profiler¶
- This class should be used when you don't want the (small) overhead of profiling.
- This profiler simply records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run.
- Base class for defining the process of model analysis.
- Compute the similarity of two models.
- Profile featuremap values with log or tensorboard.
- Check if a model has shared ops.
- Check if a model has unfused ops.
- Compare weights of float/qat/quantized models.
- Check the deploy device (BPU or CPU) of a hybrid model.
- Run a model and save each op's info.
- Run an hbir model and save each op's info.
10.6.6.2. API Reference¶
This file is modified from pytorch-lightning.
Profilers help you check whether there are any bottlenecks in your code.
- class hat.profiler.profilers.PassThroughProfiler(dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, auto_describe: bool = False, schedule: Optional[Callable[[int], hat.profiler.profilers.ProfilerAction]] = None, summary_interval: int = -1)¶
This class should be used when you don’t want the (small) overhead of profiling. The Trainer uses this class by default.
- start(action_name: str) → None¶
Define how to start recording an action.
- stop(action_name: str) → None¶
Define how to record the duration once an action is complete.
- summary() → str¶
Create profiler summary in text format.
- class hat.profiler.profilers.SimpleProfiler(dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, warmup_step: int = 1, use_real_duration: bool = False, auto_describe: bool = False, schedule: Optional[Callable[[int], hat.profiler.profilers.ProfilerAction]] = None, summary_interval: int = -1)¶
This profiler simply records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run.
- start(action_name: str) → None¶
Define how to start recording an action.
- stop(action_name: str) → None¶
Define how to record the duration once an action is complete.
- summary() → str¶
Create profiler summary in text format.
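The start/stop/summary cycle above can be illustrated with a minimal stand-alone sketch. Note this is a toy `MiniSimpleProfiler`, not the actual `hat` implementation; it only mimics what SimpleProfiler is documented to record (per-action durations, mean duration, and total time):

```python
import time
from collections import defaultdict


class MiniSimpleProfiler:
    """Toy illustration of a SimpleProfiler-style recorder (not the hat API)."""

    def __init__(self):
        self.recorded = defaultdict(list)  # action_name -> list of durations
        self._starts = {}

    def start(self, action_name):
        # Remember when this action began.
        self._starts[action_name] = time.perf_counter()

    def stop(self, action_name):
        # Record the elapsed duration (in seconds) for this action.
        begin = self._starts.pop(action_name)
        self.recorded[action_name].append(time.perf_counter() - begin)

    def summary(self):
        # Mean duration per action plus the total time spent.
        lines = []
        for name, durations in self.recorded.items():
            mean = sum(durations) / len(durations)
            lines.append(f"{name}: mean {mean:.6f}s over {len(durations)} call(s)")
        total = sum(sum(d) for d in self.recorded.values())
        lines.append(f"total: {total:.6f}s")
        return "\n".join(lines)


profiler = MiniSimpleProfiler()
for _ in range(3):
    profiler.start("train_step")
    time.sleep(0.01)  # stand-in for real training work
    profiler.stop("train_step")
print(profiler.summary())
```

A PassThroughProfiler would keep the same `start`/`stop` interface but do nothing in both, which is why it adds no overhead.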
Memory profiling.
Helps profile GPU or CPU memory bottlenecks during model training.
- class hat.profiler.memory_profiler.CPUMemoryProfiler(dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, auto_describe: bool = False, schedule: Optional[Callable[[int], hat.profiler.profilers.ProfilerAction]] = None, summary_interval: int = -1)¶
- start(action_name: str) → None¶
Define how to start recording an action.
- stop(action_name: str) → None¶
Define how to record the duration once an action is complete.
- summary()¶
Create profiler summary in text format.
- class hat.profiler.memory_profiler.GPUMemoryProfiler(dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, record_snapshot: bool = False, snapshot_interval: int = 1, record_functions: Optional[Set[str]] = None, auto_describe: bool = False, schedule: Optional[Callable[[int], hat.profiler.profilers.ProfilerAction]] = None, summary_interval: int = -1)¶
- save_snapshots()¶
Dump all snapshots.
- snapshots format (dict(dict)):
{
    step_id: {
        action_name: [...],
    },
}
- start(action_name: str) → None¶
Define how to start recording an action.
- stop(action_name: str) → None¶
Define how to record the duration once an action is complete.
- summary()¶
Create profiler summary in text format.
- class hat.profiler.memory_profiler.StageCPUMemoryProfiler(profile_action_name: str, leaks: bool = True, dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, auto_describe: bool = False)¶
- describe() → None¶
Log a profile report after the run concludes.
- profile(action_name: str)¶
Yield a context manager to encapsulate the scope of a profiled action.
Example:
with self.profile('load training data'):
    # load training data code
The profiler will start once you’ve entered the context and will automatically stop once you exit the code block.
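The enter/exit behavior described above is exactly what a Python context manager provides. Below is a minimal sketch (a hypothetical `MiniStageProfiler`, not the `hat` implementation) showing how a `profile()` method can start timing on entry and stop on exit, even if the wrapped code raises:

```python
import time
from contextlib import contextmanager


class MiniStageProfiler:
    """Toy sketch of a profile() context manager (not the hat implementation)."""

    def __init__(self):
        self.durations = {}  # action_name -> elapsed seconds

    @contextmanager
    def profile(self, action_name):
        # Start timing when the with-block is entered; stop when it is
        # exited, even if the body raises an exception.
        begin = time.perf_counter()
        try:
            yield
        finally:
            self.durations[action_name] = time.perf_counter() - begin


p = MiniStageProfiler()
with p.profile("load training data"):
    time.sleep(0.005)  # the "load training data code" would go here
print(f"load training data took {p.durations['load training data']:.4f}s")
```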
- class hat.profiler.model_profiler.BaseModelProfiler¶
Base class for defining the process of model analysis.
- class hat.profiler.model_profiler.CheckDeployDevice(print_tabulate: bool = True, out_dir: Optional[str] = None)¶
Check the deploy device (BPU or CPU) of a hybrid model.
- Parameters
print_tabulate (bool, optional) – Whether to print the result as a table. Defaults to True.
out_dir – Path to save the result file ‘deploy_device.txt’. If None, the file is saved in the current directory. Default: None
- Returns
- A dict of model deploy info with schema
KEY (str): module name
VALUE (Tuple): (deploy device (BPU or CPU), module type)
- class hat.profiler.model_profiler.CheckFused(print_tabulate: bool = True)¶
Check if a model has unfused ops.
Check unfused modules in a model. NOTE: This function can only find unfused modules. To verify the correctness of fusion, use featuremap_similarity to compare the features of the fused and unfused models.
- Parameters
print_tabulate (bool) – Whether to print the result as a table. Default: True.
- Returns
The qualified names of modules that can be fused.
- Return type
List[List[str]]
Check if a model has shared ops.
Count the call times of all leaf modules in a model.
- Parameters
check_leaf_module (callable, optional) – A function to check whether a module is a leaf. Pass None to use the pre-defined is_leaf_module. Default: None.
print_tabulate (bool, optional) – Whether to print the result as a table. Default: True.
- Returns
The qualified name and call times of each leaf module.
- Return type
Dict[str, int]
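The shared-op check above boils down to counting how often each leaf module fires during one forward pass. The sketch below illustrates that idea in pure Python with a hypothetical `CallCounter` (the real checker hooks into the model's modules; the names here are made up for illustration):

```python
from collections import Counter


class CallCounter:
    """Toy sketch of shared-op detection: count how often each named
    'leaf module' is called in one forward pass (not the hat implementation)."""

    def __init__(self):
        self.call_times = Counter()  # qualified name -> call count

    def record(self, qualified_name):
        # In the real checker this would be triggered by a forward hook.
        self.call_times[qualified_name] += 1

    def shared_modules(self):
        # A leaf module called more than once in a single forward pass
        # is shared between several places in the graph.
        return {name: n for name, n in self.call_times.items() if n > 1}


counter = CallCounter()
# Simulate one forward pass where `backbone.relu` is reused twice.
for name in ["backbone.conv1", "backbone.relu", "backbone.conv2", "backbone.relu"]:
    counter.record(name)
print(counter.shared_modules())  # -> {'backbone.relu': 2}
```

The returned `Dict[str, int]` mirrors the documented return type: qualified name mapped to call times.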
- class hat.profiler.model_profiler.CompareWeights(similarity_func='Cosine', with_tensorboard: bool = False, tensorboard_dir: Optional[str] = None, out_dir: Optional[str] = None)¶
Compare weights of float/qat/quantized models.
This function compares the weights of each layer based on torch.quantization._numeric_suite.compare_weights. The weight similarity and atol will be printed on the screen and saved in “weight_comparison.txt”. If you want to see histograms of the weights, set with_tensorboard=True.
- Parameters
similarity_func – Similarity computation function. Supports “Cosine”, “MSE”, “L1”, “KL”, “SQNR”, or any user-defined Callable object. A user-defined object should return a scalar or a tensor containing a single number; otherwise the displayed result may be unexpected. Default: “Cosine”
with_tensorboard – Whether to use tensorboard. Default: False
tensorboard_dir – Tensorboard log file path. Default: None
out_dir – Path to save the result txt and picture. If None, results are saved in the current directory. Default: None
- Returns
- A weight comparison dict with schema:
KEY (str): module name (e.g. layer1.0.conv.weight)
VALUE (dict): a dict of the corresponding weights in the two models:
“float”: weight value in the float model; “quantized”: weight value in the qat/quantized model
- A list of lists. Each inner list is one layer’s weight similarity in the format [module name, similarity, atol (N scale)]
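For intuition, the default “Cosine” similarity between a float weight and its quantized counterpart can be sketched in pure Python. This is a stand-in for the built-in similarity function, operating on flattened weight lists rather than tensors; the sample weights are made up:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened weight tensors
    (a pure-Python stand-in for the 'Cosine' similarity_func)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Float weights vs. their (here: identical) quantized counterparts.
float_w = [0.50, -1.25, 0.75, 2.00]
quant_w = [0.50, -1.25, 0.75, 2.00]
print(cosine_similarity(float_w, quant_w))  # identical weights -> ~1.0
```

A similarity near 1.0 means quantization barely changed that layer's weights; values noticeably below 1.0 flag layers worth inspecting in the saved “weight_comparison.txt”.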
- class hat.profiler.model_profiler.FeaturemapSimilarity(similarity_func: Union[str, callable] = 'Cosine', threshold: Optional[numbers.Real] = None, devices: Optional[Union[torch.device, tuple]] = None, out_dir: Optional[str] = None)¶
Compute the similarity of two models.
Compute the similarity of feature maps. The input models can be float/fused/calibration/qat/quantized models.
- Parameters
similarity_func – Similarity computation function. Supports “Cosine”, “MSE”, “L1”, “KL”, “SQNR”, or any user-defined Callable object. A user-defined object should return a scalar or a tensor containing a single number; otherwise the displayed result may be unexpected. Default: “Cosine”
threshold – If the similarity value exceeds or falls below this threshold, the featuremap name will be shown in red. If threshold is None, it is set to a different value for each similarity function. Default: None
devices – Which devices to run the models on (cpu, gpu). Can be: None – run the models with the given inputs as they are; torch.device – both models and the given inputs are moved to the specified device; tuple – a tuple of two torch.device, and the two models are moved to the specified devices separately, which can be used to compare CPU and GPU results.
out_dir – Path to save the result txt and picture. If None, results are saved in the current directory. Default: None
- Returns
A list of lists. Each inner list holds one layer’s similarity info in the format [index, module name, module type, similarity, scale, atol, atol (N scale), single op error (N scale)]
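Of the supported similarity functions, “SQNR” (signal-to-quantization-noise ratio) is the least self-explanatory, so here is a small pure-Python sketch of how such a metric is conventionally computed between a float featuremap and its quantized counterpart. The formula and the sample values are illustrative, not taken from the hat source:

```python
import math


def sqnr(signal, noisy):
    """Signal-to-quantization-noise ratio in dB (a conventional
    definition, sketched as a stand-in for the 'SQNR' similarity_func)."""
    signal_power = sum(s * s for s in signal)
    noise_power = sum((s - q) ** 2 for s, q in zip(signal, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


float_fm = [1.0, 2.0, 3.0, 4.0]
quant_fm = [1.1, 1.9, 3.1, 3.9]  # hypothetical quantized featuremap
print(f"{sqnr(float_fm, quant_fm):.2f} dB")
```

Higher SQNR means the quantized featuremap tracks the float one more closely; unlike cosine similarity, it is unbounded above, which is why the threshold defaults differ per similarity function.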
- class hat.profiler.model_profiler.HbirModelProfiler(show_table: bool = True, show_tensorboard: bool = False, prefixes: Optional[Tuple[str, ...]] = None, types: Optional[Tuple[Type, ...]] = None, with_stack: bool = False, force_per_channel: bool = False, out_dir: Optional[str] = None)¶
Run hbir model and save each op info.
This function runs an hbir model and saves each op’s info to disk; the info can be shown in a table or in tensorboard.
- Parameters
show_table – Whether to show each op’s info in a table, which is also saved in statistic.txt
show_tensorboard – Whether to show each op’s histogram in tensorboard.
prefixes – Only show ops whose qualified names start with these prefixes
types – Only show ops of the given types
with_stack – Whether to show each op’s location in the code
force_per_channel – Whether to show data per channel in tensorboard
out_dir – Directory to save op info and result files
- class hat.profiler.model_profiler.ModelProfilerv2(show_table: bool = True, show_tensorboard: bool = False, prefixes: Optional[Tuple[str, ...]] = None, types: Optional[Tuple[Type, ...]] = None, with_stack: bool = False, force_per_channel: bool = False, out_dir: Optional[str] = None)¶
Run model and save each op info.
This function runs a model and saves each op’s info to disk; the info can be shown in a table or in tensorboard.
- Parameters
show_table – Whether to show each op’s info in a table, which is also saved in statistic.txt
show_tensorboard – Whether to show each op’s histogram in tensorboard.
prefixes – Only show ops whose qualified names start with these prefixes
types – Only show ops of the given types
with_stack – Whether to show each op’s location in the code
force_per_channel – Whether to show data per channel in tensorboard
out_dir – Directory to save op info and result files
- class hat.profiler.model_profiler.ProfileFeaturemap(prefixes: Tuple = (), types: Tuple = (), device: Optional[torch.device] = None, preserve_int: bool = False, use_class_name: bool = False, skip_identity: bool = False, with_tensorboard: bool = False, tensorboard_dir: Optional[str] = None, print_per_channel_scale: bool = False, show_per_channel: bool = False, out_dir: Optional[str] = None, file_name: Optional[str] = None, profile_func: Optional[callable] = None)¶
Profile featuremap value with log or tensorboard.
Print the min/max/mean/var/scale of each feature profiled by get_raw_features by default. If with_tensorboard is set to True, the histogram of each feature will be shown in tensorboard, which is useful for inspecting the data distribution.
If you want more information about the features, you can define custom profile functions to process the results of get_raw_features.
- Parameters
prefixes – Get feature info by the prefix of qualified names. Default: tuple().
types – Get feature info by module type. Default: tuple().
device – Which device to run the model on. Default: None
preserve_int – If True, record each op’s result as an int type. Default: False
use_class_name – If True, record the class name instead of the class type. Default: False
skip_identity – If True, do not record the results of Identity modules. Default: False
with_tensorboard – Whether to use tensorboard. Default: False
tensorboard_dir – Tensorboard log file path. Default: None
print_per_channel_scale – Whether to print per-channel scales. Default: False
show_per_channel – Show each featuremap per channel in tensorboard. Default: False
out_dir – Path to save the result txt and picture. If None, results are saved in the current directory. Default: None
file_name – Result file name. If None, the result and figure are saved as ‘statistic’ (statistic.txt and statistic.html). Default: None
profile_func (callable, None) – Your custom featuremap profiler function. Default: None
- Returns
A list of lists. Each inner list holds one layer’s statistics in the format [index, module name, module type, attr, min, max, mean, var, scale]
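The per-featuremap statistics listed above (min, max, mean, var) are standard reductions; a custom profile_func would receive raw feature values and could compute them like this pure-Python sketch (the function name and sample data are illustrative, not the hat API):

```python
def feature_statistics(values):
    """Compute the min/max/mean/var reported per featuremap
    (a pure-Python sketch of the default statistics)."""
    n = len(values)
    mean = sum(values) / n
    # Population variance, i.e. mean squared deviation from the mean.
    var = sum((v - mean) ** 2 for v in values) / n
    return {"min": min(values), "max": max(values), "mean": mean, "var": var}


# A flattened toy featuremap.
stats = feature_statistics([1.0, 2.0, 3.0, 4.0])
print(stats)  # -> {'min': 1.0, 'max': 4.0, 'mean': 2.5, 'var': 1.25}
```

Watching these statistics per layer (especially min/max against the quantization scale) is how this profiler surfaces layers whose value range is poorly covered by their quantization parameters.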