Abstract

CodeGraphVLP studies non-Markovian long-horizon robot manipulation by separating scene-state representation from action planning. The method maintains a persistent semantic graph state over task-relevant entities and relations, then plans by running executable code over that graph (code-as-planner reasoning).

The project targets settings where task-relevant evidence can be occluded or appear only earlier in the trajectory, and where clutter makes fine-grained visual grounding brittle.
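
To make the separation between graph state and planner concrete, the following is a minimal Python sketch, not the authors' implementation. The names `SemanticGraph` and `plan_fetch`, the relation labels, and the action strings are all hypothetical and chosen only for illustration. It shows a fact observed early in the trajectory persisting in the graph, so that a planner program can still use it after the object is occluded.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticGraph:
    """Hypothetical persistent scene state: entities plus (subject, relation, object) triples."""
    entities: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def update(self, observed: list[tuple[str, str, str]]) -> None:
        # Merge newly observed relations. Facts from earlier steps are kept,
        # so evidence that is later occluded remains queryable.
        for subj, rel, obj in observed:
            self.entities.update((subj, obj))
            self.relations.add((subj, rel, obj))

    def objects_of(self, subj: str, rel: str) -> list[str]:
        # Return every object linked to `subj` by relation `rel`, sorted for determinism.
        return sorted(o for s, r, o in self.relations if s == subj and r == rel)


def plan_fetch(graph: SemanticGraph, target: str) -> list[str]:
    """Hypothetical planner program: open whatever contains the target, then pick it."""
    containers = graph.objects_of(target, "inside")
    steps = [f"open({c})" for c in containers]
    steps.append(f"pick({target})")
    return steps


graph = SemanticGraph()
# Step 1: the block is seen being placed inside the drawer.
graph.update([("red_block", "inside", "drawer")])
# Step 2: the drawer is closed and the block is occluded; only the cup is observed.
graph.update([("cup", "on", "table")])

plan = plan_fetch(graph, "red_block")
print(plan)  # ['open(drawer)', 'pick(red_block)']
```

Because the planner reads only the graph, it does not need the current camera frame to contain the target; a purely Markovian policy conditioned on the step-2 observation alone would have no record of where the block went.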