SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation
A slot-attention-based VLA framework for compact object-relation representations in robotic manipulation.
Publications
Papers and preprints spanning multimodal AI, embodied AI, robot learning, and vision-language-action models.
A non-Markovian VLA framework that combines persistent semantic-graph state with executable code-as-planner reasoning.
A memory-aware robotic manipulation benchmark and slot-centric VLA model for non-Markovian manipulation tasks.
A VLA framework that uses object-centric and geometry-grounded views for clutter-resistant robot manipulation.
A multi-scale Transformer approach for solar PV profiling and obstruction localization for degradation mitigation.
A multi-resolution Transformer for semantic segmentation of aerial imagery.
Real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D observations.
A multi-scale Transformer model for solar PV profiling from aerial imagery.