4 Advanced Cases

Embodied AI application cases

Vision-Language-Action Model (VLA)

A VLA (Vision-Language-Action model) is an end-to-end model that unifies visual perception, language understanding, and robot control. Given camera images and a natural-language instruction, it outputs robot actions directly. VLAs are widely used in embodied AI and robotic manipulation. Representative work includes Google DeepMind's RT-2 and Physical Intelligence's Pi0 (π0).
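One way a VLA can output actions "directly" is the approach used by RT-2: each continuous action dimension is discretized into 256 bins, so actions become a short sequence of tokens that the same decoder used for text can generate. The sketch below shows only that discretization step, assuming uniform bins between known per-dimension bounds. The function names `discretize`/`undiscretize`, the [-1, 1] bounds, and the example action vector are illustrative assumptions, not RT-2's actual code. Pi0 takes a different route and generates continuous actions with flow matching, so this sketch does not describe it.

```python
from typing import List, Sequence

# Assumption: 256 uniform bins per action dimension, as described for RT-2.
NUM_BINS = 256


def discretize(action: Sequence[float], low: Sequence[float],
               high: Sequence[float], num_bins: int = NUM_BINS) -> List[int]:
    """Map a continuous action vector to one integer bin index (token) per dimension."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)  # clip out-of-range values to the valid bounds
        idx = int((a - lo) / (hi - lo) * num_bins)
        tokens.append(min(idx, num_bins - 1))  # a == hi belongs to the last bin
    return tokens


def undiscretize(tokens: Sequence[int], low: Sequence[float],
                 high: Sequence[float], num_bins: int = NUM_BINS) -> List[float]:
    """Map bin indices back to continuous values at the center of each bin."""
    return [lo + (t + 0.5) / num_bins * (hi - lo)
            for t, lo, hi in zip(tokens, low, high)]


if __name__ == "__main__":
    # Hypothetical 7-D action: end-effector position deltas (3),
    # rotation deltas (3), and gripper command (1), all bounded to [-1, 1].
    low, high = [-1.0] * 7, [1.0] * 7
    action = [0.0, 0.25, -0.5, 0.1, -0.1, 0.9, 1.0]
    tokens = discretize(action, low, high)
    print("tokens:", tokens)
    print("decoded:", undiscretize(tokens, low, high))
```

Because decoding returns the center of each bin, the round-trip error per dimension is at most half a bin width, (high - low) / (2 * num_bins). More bins give finer control at the cost of a larger action vocabulary.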
