Publication
SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation
A slot-attention-based VLA framework for compact object-relation representations in robotic manipulation.
Abstract
SlotVLA introduces object-relation representations for robotic manipulation. The work uses a slot-based visual tokenizer and relation-centric decoding to produce compact object-centric representations for multitask manipulation policies.
The project pairs the model with LIBERO+, a benchmark with object-centric annotations and temporal tracking, designed to evaluate fine-grained object-relation reasoning in manipulation tasks.
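To give a rough sense of what a slot-based visual tokenizer does, the sketch below implements a stripped-down version of generic slot attention in NumPy. It is not SlotVLA's tokenizer: the function name `slot_attention`, the random projection matrices standing in for learned weights, the slot count, and the direct slot update (in place of the learned recurrent update used in standard slot attention) are all assumptions made for illustration. It shows only the core idea, that many patch features are pooled into a handful of slot tokens that compete for the patches.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(features, num_slots=4, iters=3, seed=0):
    """Pool (N, D) patch features into (num_slots, D) slot tokens.

    Illustrative only: the projections are random, not learned.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape

    # Hypothetical stand-ins for learned query/key/value projections.
    w_q = rng.normal(scale=d ** -0.5, size=(d, d))
    w_k = rng.normal(scale=d ** -0.5, size=(d, d))
    w_v = rng.normal(scale=d ** -0.5, size=(d, d))

    slots = rng.normal(size=(num_slots, d))  # random slot initialization
    k = features @ w_k
    v = features @ w_v

    for _ in range(iters):
        q = slots @ w_q
        logits = q @ k.T / np.sqrt(d)  # shape (num_slots, N)
        # Softmax over the slot axis: slots compete for each patch.
        attn = softmax(logits, axis=0)
        # Normalize over patches so each slot takes a weighted mean.
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        # Simplified update (assumption): overwrite slots directly
        # instead of applying a learned recurrent update.
        slots = weights @ v

    return slots, attn


# Example: 196 patch features of width 32 become 4 slot tokens.
features = np.random.default_rng(1).normal(size=(196, 32))
slots, attn = slot_attention(features)
print(slots.shape)  # (4, 32)
```

The compactness comes from the fixed slot count: however many patches the image yields, a downstream policy would see only `num_slots` tokens. Taking the softmax over slots rather than over patches is what makes the slots compete for each patch, which encourages each slot to bind to a distinct region.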