10.5. Modelzoo

10.5.1. Classification

| network | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|
| MobileNetV1 | 74.12 | 73.92 | 73.61 | ImageNet | 1x3x224x224 | 0.983 | 1318.82 |
| MobileNetV2 | 72.65 | 72.51 | 72.11 | ImageNet | 1x3x224x224 | 0.888 | 1508.56 |
| ResNet 18 | 72.04 | 72.03 | 72.03 | ImageNet | 1x3x224x224 | 2.507 | 443.53 |
| ResNet 50 | 77.37 | 76.99 | 76.94 | ImageNet | 1x3x224x224 | 4.917 | 214.31 |
| VargNetV2 | 73.94 | 73.56 | 73.64 | ImageNet | 1x3x224x224 | 0.871 | 1557.39 |
| EfficientNet-B0 | 74.31 | 74.23 | 74.18 | ImageNet | 1x3x224x224 | 1.183 | 1047.60 |
| SwinTransformer | 80.24 | 80.15 | 80.05 | ImageNet | 1x3x224x224 | 22.76 | 44.59 |
| MixVarGENet | 71.33 | 71.23 | 71.04 | ImageNet | 1x3x224x224 | 1.023 | 1282.44 |
| VargConvert | 78.98 | 78.92 | 78.89 | ImageNet | 1x3x224x224 | 3.409 | 316.88 |
| EfficieNasNetm | 80.24 | 79.99 | 79.94 | ImageNet | 1x3x300x300 | 3.531 | 305.12 |
| EfficieNasNets | 76.63 | 76.23 | 76.03 | ImageNet | 1x3x280x280 | 1.536 | 766.52 |

Torchvision (the float models come from the community):

| network | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|
| ResNet 18 | 69.76 | 69.71 | 69.73 | ImageNet | 1x3x224x224 | 2.479 | 403.38 |
| ResNet 50 | 76.13 | 76.07 | 76.06 | ImageNet | 1x3x224x224 | 4.989 | 200.44 |
| MobileNetV2 | 71.88 | 71.27 | 71.27 | ImageNet | 1x3x224x224 | 1.32 | 1249.79 |
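The float, qat, and quantization columns are easiest to compare as a drop relative to the float baseline. The following is a minimal sketch of that arithmetic using three rows copied from the first classification table above. The `ZooEntry` class and its field names are hypothetical helpers introduced only for illustration, and the sketch assumes a higher-is-better metric, which would not hold for error-style metrics such as those in the optical flow or depth estimation tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZooEntry:
    """One table row (hypothetical helper, not part of any toolchain)."""
    network: str
    float_acc: float   # "float" column
    qat_acc: float     # "qat" column
    quant_acc: float   # "quantization" column

    @property
    def quant_drop(self) -> float:
        # Metric lost between the float model and the quantized model.
        return round(self.float_acc - self.quant_acc, 2)


# Values copied from the classification table (ImageNet rows).
ENTRIES = [
    ZooEntry("MobileNetV1", 74.12, 73.92, 73.61),
    ZooEntry("ResNet 50", 77.37, 76.99, 76.94),
    ZooEntry("SwinTransformer", 80.24, 80.15, 80.05),
]


def worst_drop(entries):
    """Return the entry that loses the most when quantized."""
    return max(entries, key=lambda e: e.quant_drop)


for entry in ENTRIES:
    print(f"{entry.network}: float -> quantized drop = {entry.quant_drop:.2f}")
```

The same calculation applies to any other row by adding another `ZooEntry`; for lower-is-better metrics the sign of the difference would need to be flipped.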

10.5.2. Detection

RetinaNet

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| Retinanet-vargnetv2 | vargnetv2 | 31.51 | 31.21 | 31.20 | MS COCO | 1x3x1024x1024 | 99.92 | 10.05 |

YOLOv3

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| YOLOv3-MobileNetv1 | mobilenetv1 | 76.57 | 75.62 | 75.61 | VOC | 1x3x416x416 | 10.83 | 96.46 |
| YOLOv3-VarGDarknet | VarGDarknet | 33.90 | 33.60 | 33.36 | COCO | 1x3x416x416 | 19.03 | 52.54 |

FCOS

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| FCOS-efficientnet | efficientnetb0 | 36.26 | 35.79 | 35.59 | MS COCO | 1x3x512x512 | 3.164 | 345.95 |
| FCOS-efficientnet | efficientnetb1 | 41.37 | 41.21 | 40.71 | MS COCO | 1x3x640x640 | 7.909 | 131.95 |
| FCOS-efficientnet | efficientnetb2 | 45.35 | 45.10 | 45.00 | MS COCO | 1x3x768x768 | 13.91 | 73.76 |
| FCOS-efficientnet | efficientnetb3 | 48.03 | 47.65 | 47.58 | MS COCO | 1x3x896x896 | 25.06 | 38.92 |

DETR

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| DETR-resnet50 | resnet50 | 35.70 | 31.42 | 31.31 | MS COCO | 1x3x800x1333 | 128.4 | 7.81 |
| DETR-efficientnetb3 | efficientnetb3 | 37.21 | 35.95 | 35.99 | MS COCO | 1x3x800x1333 | 91.82 | 10.93 |

FCOS3D

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| FCOS3D-efficientnetb0 | efficientnetb0 | 30.62 | 30.27 | 30.38 | nuscenes | 1x3x512x896 | 11.26 | 92.79 |

10.5.3. Segmentation

UNet

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| UNet | MobileNetV1 | 68.02 | 67.56 | 67.53 | Cityscapes | 1x3x1024x2048 | 5.775 | 181.81 |

Deeplab

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS (dual core) |
|---|---|---|---|---|---|---|---|---|
| Deeplab | EfficientNet-M0 | 76.30 | 76.22 | 76.12 | Cityscapes | 1x3x1024x2048 | 21.25 | 47.73 |
| Deeplab | EfficientNet-M1 | 77.94 | 77.64 | 77.65 | Cityscapes | 1x3x1024x2048 | 44.36 | 22.69 |
| Deeplab | EfficientNet-M2 | 78.82 | 78.65 | 78.63 | Cityscapes | 1x3x1024x2048 | 70.67 | 14.20 |

FastScnn

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| FastScnn | EfficientNet-B0lite | 69.97 | 69.90 | 69.88 | Cityscapes | 1x3x1024x2048 | 8.001 | 129.36 |

10.5.4. OpticalFlow

PwcNet

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| PwcNet-lg | PwcNet | 1.4117 | 1.4112 | 1.4075 | FlyingChairs | 1x6x384x512 | 41.04 | 24.61 |

10.5.5. Lidar

PointPillars

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| PointPillars | SequentialBottleNeck | 77.31 | 76.86 | 76.76 | KITTI3D | 150000x4 | 55.88 | 25.66 |

CenterPoint

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| CenterPoint | SequentialBottleNeck | 58.32 | 58.11 | 58.14 | nuscenes | 1x5x20x40000, 40000x4 | 63.26 | 18.99 |

LidarMultiTask

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| LidarMultiTask | MixVarGENet | 58.09 | 57.72 | 57.62 | nuscenes | 1x5x20x40000, 40000x4 | 44.85 | 27.77 |

Note

The metric reported for PointPillars is Box3d Moderate.

10.5.6. Lane Detection

GaNet

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| GaNet | MixVarGENet | 79.49 | 78.72 | 78.72 | CuLane | 1x3x320x800 | 3.741 | 291.64 |

10.5.7. Multiple Object Tracking

Motr

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| Motr | efficientnetb3 | 58.02 | 57.62 | 57.76 | Mot17 | 1x3x800x1422, 1x256x2x128, 1x1x1x256, 1x4x2x128 | 75.75 | 12.26 |

10.5.8. Binocular Depth Estimation

StereoNet

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| StereoNet | StereoNeck | 1.1270 | 1.1677 | 1.1685 | SceneFlow | 1x6x540x960 | 90.44 | 11.13 |
| StereoNetPlus | MixVarGENet | 1.1270 | 1.1329 | 1.1351 | SceneFlow | 2x3x544x960 | 18.6 | 55.23 |

10.5.9. Bev

BevIPM

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| BevIPM | efficientnetb0 | 30.59 | 30.80 | 30.41 | nuscenes det | 6x3x512x960, 6x128x128x2 | 40.34 | 25.45 |
| BevIPM | efficientnetb0 | 51.47 | 51.41 | 50.98 | nuscenes seg | 6x3x512x960, 6x128x128x2 | 40.34 | 25.45 |
| BevLSS | efficientnetb0 | 30.09 | 30.05 | 30.01 | nuscenes det | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 32.35 | 31.91 |
| BevLSS | efficientnetb0 | 51.78 | 51.47 | 51.46 | nuscenes seg | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 32.35 | 31.91 |
| BevGKT | MixVarGENet | 28.11 | 28.12 | 27.90 | nuscenes det | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 93.56 | 10.86 |
| BevGKT | MixVarGENet | 48.53 | 48.02 | 48.37 | nuscenes seg | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 93.56 | 10.86 |
| BevIPM4D | efficientnetb0 | 37.24 | 37.19 | 37.31 | nuscenes det | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 43.51 | 23.49 |
| BevIPM4D | efficientnetb0 | 52.90 | 53.80 | 53.86 | nuscenes seg | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 43.51 | 23.49 |
| Detr3d | efficientnetb3 | 34.04 | 33.87 | 33.39 | nuscenes det | 6x3x512x1408, 6x2x4x256, 6x2x4x256, 6x2x4x256, 6x2x4x256, 1x24x4x256 | 196.9 | 5.078 |
| BevCFT | efficientnetb3 | 32.93 | 32.68 | 32.63 | nuscenes det | 6x3x512x1408 | 179.9 | 5.558 |

10.5.10. Keypoint Detection

HeatmapKeypointModel

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| HeatmapKeypointModel | efficientnetb0 | 94.33 | 94.30 | 94.31 | carfusion | 1x3x128x128 | 0.958 | 1377.6 |

10.5.11. Trajectory Prediction

DenseTNT

| network | backbone | float | qat | quantization | dataset | input shape | BPU latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|
| DenseTNT | vectornet | 1.2974 | 1.2989 | 1.3038 | argoverse 1 | 30x9x19x32, 30x11x9x64, 30x1x1x96, 30x2x1x2048, 30x1x1x2048 | 52.77 | 20.42 |