📄️ Vision-Language-Action Model (VLA)
A Vision-Language-Action model (VLA) is an end-to-end model that unifies visual perception, language understanding, and robot control. It maps camera observations and natural-language instructions directly to robot actions, without a separately engineered planning or control stage in between, and is widely used in embodied intelligence and robot manipulation. Representative work includes Google DeepMind's RT-2 and Physical Intelligence's π0 (Pi0).
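
To make the input/output contract concrete, here is a minimal sketch of what "actions directly from visual input and language instructions" can look like in code. Everything in it is an illustrative assumption rather than the API of RT-2, Pi0, or any real library: the `Observation` type, the `toy_vla_policy` and `control_loop` functions, the 7-dimensional action layout (six end-effector values plus one gripper value), and the keyword rule are all hypothetical. A real VLA replaces the stub body with a learned vision-language backbone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One control step's input (hypothetical type, for illustration only)."""
    image: List[List[float]]  # H x W grayscale pixels in [0, 1], a toy camera frame
    instruction: str          # natural-language command


def toy_vla_policy(obs: Observation, action_dim: int = 7) -> List[float]:
    """Stand-in for a VLA forward pass: (image, instruction) -> action vector.

    This stub only derives numbers from its inputs so the contract is visible;
    it does not learn or perceive anything.
    """
    pixels = [p for row in obs.image for p in row]
    brightness = sum(pixels) / len(pixels) if pixels else 0.0
    # Hypothetical rule: close the gripper when the instruction mentions "pick".
    gripper = 1.0 if "pick" in obs.instruction.lower() else 0.0
    # First action_dim - 1 entries mimic an end-effector command; last is the gripper.
    return [brightness] * (action_dim - 1) + [gripper]


def control_loop(frames: List[List[List[float]]], instruction: str,
                 steps: int = 3) -> List[List[float]]:
    """Query the policy once per camera frame, as a closed-loop controller would."""
    actions = []
    for frame in frames[:steps]:
        actions.append(toy_vla_policy(Observation(frame, instruction)))
    return actions


if __name__ == "__main__":
    frames = [[[0.5, 0.5]], [[1.0, 0.0]]]  # two tiny 1 x 2 frames
    for action in control_loop(frames, "Pick up the cup"):
        print(action)
```

The point of the sketch is the shape of the loop: one observation and one instruction go in, one action vector comes out, and the policy is queried again on the next frame.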