10.5. Modelzoo

10.5.1. Classification

| network | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|
| MobileNetV1 | 74.12 | 73.92 | 73.61 | ImageNet | 1x3x224x224 | 0.983 | 1318.82 |
| MobileNetV2 | 72.65 | 72.51 | 72.11 | ImageNet | 1x3x224x224 | 0.888 | 1508.56 |
| ResNet 18 | 72.04 | 72.03 | 72.03 | ImageNet | 1x3x224x224 | 2.507 | 443.53 |
| ResNet 50 | 77.37 | 76.99 | 76.94 | ImageNet | 1x3x224x224 | 4.917 | 214.31 |
| VargNetV2 | 73.94 | 73.56 | 73.64 | ImageNet | 1x3x224x224 | 0.871 | 1557.39 |
| EfficientNet-B0 | 74.31 | 74.23 | 74.18 | ImageNet | 1x3x224x224 | 1.183 | 1047.60 |
| SwinTransformer | 80.24 | 80.15 | 80.05 | ImageNet | 1x3x224x224 | 22.76 | 44.59 |
| MixVarGENet | 71.33 | 71.23 | 71.04 | ImageNet | 1x3x224x224 | 1.023 | 1282.44 |
| VargConvert | 78.98 | 78.92 | 78.89 | ImageNet | 1x3x224x224 | 3.409 | 316.88 |
| EfficieNasNetm | 80.24 | 79.99 | 79.94 | ImageNet | 1x3x300x300 | 3.531 | 305.12 |
| EfficieNasNets | 76.63 | 76.23 | 76.03 | ImageNet | 1x3x280x280 | 1.536 | 766.52 |

Torchvision (the float models come from the community):

| network | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|
| ResNet 18 | 69.76 | 69.71 | 69.73 | ImageNet | 1x3x224x224 | 2.479 | 403.38 |
| ResNet 50 | 76.13 | 76.07 | 76.06 | ImageNet | 1x3x224x224 | 4.989 | 200.44 |
| MobileNetV2 | 71.88 | 71.27 | 71.27 | ImageNet | 1x3x224x224 | 1.32 | 1249.79 |

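
The float, qat, and quantization columns can be compared row by row to see how much accuracy each model loses on its way to the quantized model. The following is a minimal sketch of that arithmetic, using three rows copied from the first classification table in this section. The helper name `accuracy_drop` is hypothetical, and the values are assumed to be accuracy in percent, since the table does not name the metric.

```python
# Accuracy values copied from the first classification table in this section.
# Assumption: the numbers are accuracy in percent (the table does not say so).
# Tuple order: (float, qat, quantization).
classification_rows = {
    "MobileNetV1": (74.12, 73.92, 73.61),
    "ResNet 50": (77.37, 76.99, 76.94),
    "SwinTransformer": (80.24, 80.15, 80.05),
}


def accuracy_drop(float_acc, quantized_acc):
    """Absolute difference in percentage points, rounded to 2 decimals."""
    return round(float_acc - quantized_acc, 2)


# Drop from the float model to the final quantized model, per network.
drops = {
    name: accuracy_drop(float_acc, quant_acc)
    for name, (float_acc, _qat_acc, quant_acc) in classification_rows.items()
}

for name, drop in drops.items():
    print(f"{name}: {drop:.2f} points below the float model")
```

The same helper works for any table in this section, but note that some tables report an error metric where lower is better, so the sign of the difference has to be read accordingly.
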
10.5.2. Detection

RetinaNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| Retinanet-vargnetv2 | vargnetv2 | 31.51 | 31.21 | 31.20 | MS COCO | 1x3x1024x1024 | 99.92 | 10.05 |

YOLOv3

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| YOLOv3-MobileNetv1 | mobilenetv1 | 76.57 | 75.62 | 75.61 | VOC | 1x3x416x416 | 10.83 | 96.46 |
| YOLOv3-VarGDarknet | VarGDarknet | 33.90 | 33.60 | 33.36 | COCO | 1x3x416x416 | 19.03 | 52.54 |

FCOS

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| FCOS-efficientnet | efficientnetb0 | 36.26 | 35.79 | 35.59 | MS COCO | 1x3x512x512 | 3.164 | 345.95 |
| FCOS-efficientnet | efficientnetb1 | 41.37 | 41.21 | 40.71 | MS COCO | 1x3x640x640 | 7.909 | 131.95 |
| FCOS-efficientnet | efficientnetb2 | 45.35 | 45.10 | 45.00 | MS COCO | 1x3x768x768 | 13.91 | 73.76 |
| FCOS-efficientnet | efficientnetb3 | 48.03 | 47.65 | 47.58 | MS COCO | 1x3x896x896 | 25.06 | 38.92 |

DETR

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| DETR-resnet50 | resnet50 | 35.70 | 31.42 | 31.31 | MS COCO | 1x3x800x1333 | 128.4 | 7.81 |
| DETR-efficientnetb3 | efficientnetb3 | 37.21 | 35.95 | 35.99 | MS COCO | 1x3x800x1333 | 91.82 | 10.93 |

FCOS3D

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| FCOS3D-efficientnetb0 | efficientnetb0 | 30.62 | 30.27 | 30.38 | nuscenes | 1x3x512x896 | 11.26 | 92.79 |

10.5.3. Segmentation

UNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| UNet | MobileNetV1 | 68.02 | 67.56 | 67.53 | Cityscapes | 1x3x1024x2048 | 5.775 | 181.81 |

Deeplab

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
|---|---|---|---|---|---|---|---|---|
| Deeplab | EfficientNet-M0 | 76.30 | 76.22 | 76.12 | Cityscapes | 1x3x1024x2048 | 21.25 | 47.73 |
| Deeplab | EfficientNet-M1 | 77.94 | 77.64 | 77.65 | Cityscapes | 1x3x1024x2048 | 44.36 | 22.69 |
| Deeplab | EfficientNet-M2 | 78.82 | 78.65 | 78.63 | Cityscapes | 1x3x1024x2048 | 70.67 | 14.20 |

FastScnn

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| FastScnn | EfficientNet-B0lite | 69.97 | 69.90 | 69.88 | Cityscapes | 1x3x1024x2048 | 8.001 | 129.36 |

10.5.4. OpticalFlow

PwcNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| PwcNet-lg | PwcNet | 1.4117 | 1.4112 | 1.4075 | FlyingChairs | 1x6x384x512 | 41.04 | 24.61 |

10.5.5. Lidar

PointPillars

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| PointPillars | SequentialBottleNeck | 77.31 | 76.86 | 76.76 | KITTI3D | 150000x4 | 55.88 | 25.66 |

CenterPoint

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| CenterPoint | SequentialBottleNeck | 58.32 | 58.11 | 58.14 | nuscenes | 1x5x20x40000, 40000x4 | 63.26 | 18.99 |

LidarMultiTask

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| LidarMultiTask | MixVarGENet | 58.09 | 57.72 | 57.62 | nuscenes | 1x5x20x40000, 40000x4 | 44.85 | 27.77 |

Note
The metric reported for PointPillars is Box3d Moderate.

10.5.6. Lane Detection

GaNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| GaNet | MixVarGENet | 79.49 | 78.72 | 78.72 | CuLane | 1x3x320x800 | 3.741 | 291.64 |

10.5.7. Multiple Object Track

Motr

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| Motr | efficientnetb3 | 58.02 | 57.62 | 57.76 | Mot17 | 1x3x800x1422, 1x256x2x128, 1x1x1x256, 1x4x2x128 | 75.75 | 12.26 |

10.5.8. Binocular depth estimation

StereoNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| StereoNet | StereoNeck | 1.1270 | 1.1677 | 1.1685 | SceneFlow | 1x6x540x960 | 90.44 | 11.13 |
| StereoNetPlus | MixVarGENet | 1.1270 | 1.1329 | 1.1351 | SceneFlow | 2x3x544x960 | 18.6 | 55.23 |

10.5.9. Bev

BevIPM

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| BevIPM | efficientnetb0 | 30.59 | 30.80 | 30.41 | nuscenes det | 6x3x512x960, 6x128x128x2 | 40.34 | 25.45 |
| BevIPM | efficientnetb0 | 51.47 | 51.41 | 50.98 | nuscenes seg | 6x3x512x960, 6x128x128x2 | 40.34 | 25.45 |
| BevLSS | efficientnetb0 | 30.09 | 30.05 | 30.01 | nuscenes det | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 32.35 | 31.91 |
| BevLSS | efficientnetb0 | 51.78 | 51.47 | 51.46 | nuscenes seg | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 32.35 | 31.91 |
| BevGKT | MixVarGENet | 28.11 | 28.12 | 27.90 | nuscenes det | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 93.56 | 10.86 |
| BevGKT | MixVarGENet | 48.53 | 48.02 | 48.37 | nuscenes seg | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 93.56 | 10.86 |
| BevIPM4D | efficientnetb0 | 37.24 | 37.19 | 37.31 | nuscenes det | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 43.51 | 23.49 |
| BevIPM4D | efficientnetb0 | 52.90 | 53.80 | 53.86 | nuscenes seg | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 43.51 | 23.49 |
| Detr3d | efficientnetb3 | 34.04 | 33.87 | 33.39 | nuscenes det | 6x3x512x1408, 6x2x4x256, 6x2x4x256, 6x2x4x256, 6x2x4x256, 1x24x4x256 | 196.9 | 5.078 |
| BevCFT | efficientnetb3 | 32.93 | 32.68 | 32.63 | nuscenes det | 6x3x512x1408 | 179.9 | 5.558 |

10.5.10. Keypoint Detection

HeatmapKeypointModel

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| HeatmapKeypointModel | efficientnetb0 | 94.33 | 94.30 | 94.31 | carfusion | 1x3x128x128 | 0.958 | 1377.6 |

10.5.11. Trajectory Prediction

DenseTNT

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| DenseTNT | vectornet | 1.2974 | 1.2989 | 1.3038 | argoverse 1 | 30x9x19x32, 30x11x9x64, 30x1x1x96, 30x2x1x2048, 30x1x1x2048 | 52.77 | 20.42 |

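
The tables in this section list bpu latency and FPS as separate columns, and the source does not describe how FPS was measured. The sketch below compares the reported FPS with the naive figure 1000 / latency for three rows copied from the tables. The function name `single_stream_fps` is hypothetical. For these rows the reported FPS is higher than the naive figure, so the two columns should not be treated as reciprocals of each other. One possible explanation is that FPS reflects multi-core or multi-threaded throughput (the Deeplab table labels its column "FPS (dual core)"), but that is an assumption rather than something the tables state.

```python
# (bpu latency in ms, reported FPS) copied from the tables in this section.
benchmark_rows = {
    "MobileNetV1": (0.983, 1318.82),
    "ResNet 50": (4.917, 214.31),
    "UNet": (5.775, 181.81),
}


def single_stream_fps(latency_ms):
    """Frames per second if frames ran strictly one after another."""
    return 1000.0 / latency_ms


# Ratio of reported FPS to the naive 1000 / latency estimate.
fps_ratios = {}
for name, (latency_ms, reported_fps) in benchmark_rows.items():
    naive_fps = single_stream_fps(latency_ms)
    fps_ratios[name] = reported_fps / naive_fps
    print(
        f"{name}: 1000/latency = {naive_fps:.1f}, "
        f"reported = {reported_fps}, ratio = {fps_ratios[name]:.2f}"
    )
```

A ratio above 1 means the reported throughput exceeds what back-to-back single-frame inference at the listed latency would give.
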