3 Intermediate Cases

Interactive AI application cases

Multimodal Interactive Assistant

Combining ASR (Automatic Speech Recognition) with a VLM (Vision-Language Model) enables "voice + vision" multimodal interaction. The system transcribes what the user says, interprets that request in the context of the current scene, and uses the result to decide how to respond. This pattern is widely used in robotics, smart cockpits, smart terminals, and exhibition demos.
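The "voice + vision" flow can be sketched as a single interaction turn. This is a minimal sketch that assumes the ASR engine and the VLM are each available as a plain callable. The names `interact`, `Turn`, `Transcriber`, `VisionLanguageModel`, and the prompt wording are hypothetical choices for illustration. They do not come from any specific library or from the case itself, and real engines would replace the fake lambdas in the usage block.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: a real system would wrap an ASR engine and a VLM here.
Transcriber = Callable[[bytes], str]                 # audio bytes -> transcript
VisionLanguageModel = Callable[[bytes, str], str]    # (image bytes, prompt) -> reply


@dataclass
class Turn:
    transcript: str
    reply: str


def interact(audio: bytes, frame: bytes,
             asr: Transcriber, vlm: VisionLanguageModel) -> Turn:
    """One voice + vision turn: transcribe speech, then query the VLM about the current frame."""
    transcript = asr(audio).strip()
    if not transcript:
        # Nothing intelligible was said, so skip the (comparatively expensive) VLM call.
        return Turn(transcript="", reply="")
    # Ground the spoken request in the current scene by sending it together with the frame.
    prompt = f"The user said: {transcript!r}. Answer using what is visible in the image."
    return Turn(transcript=transcript, reply=vlm(frame, prompt))


if __name__ == "__main__":
    # Stand-in engines so the sketch runs without any model installed.
    fake_asr = lambda audio: audio.decode("utf-8")
    fake_vlm = lambda image, prompt: f"[{len(image)}-byte frame] {prompt}"
    turn = interact(b"what is on the table?", b"\x00" * 16, fake_asr, fake_vlm)
    print(turn.reply)
```

Keeping ASR and VLM behind simple callables makes it easy to swap in different engines, or to test the control flow offline as the usage block does.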
