PI: Per-Erik Forssén, Linköping University. Co-PI: Volker Krüger, Lund University
We will improve the safety, reliability, and robustness of Vision–Language–Action (VLA) models for robot control by making them situation-, self-, and ambiguity-aware. Building on recent advances in VLA and large reasoning models (LRMs), and on our ELLIIT C08 results in out-of-distribution detection and uncertainty quantification, we will develop methods that fuse sensory input with reasoning, recognize when the robot operates outside its training regime, and detect and resolve ambiguity in language instructions. Our approach is novel in extending uncertainty handling from perception to multimodal planning and decision-making, and feasible because it leverages established open architectures and our previously validated methods. The outcome is a trustworthy VLA interface that improves human–robot collaboration and accelerates safe adoption in industrial and societal applications. Project number: F4

