Projects
Selected work in multimodal AI, embodied AI, robot learning, and vision-language-action (VLA) models.
A slot-attention-based VLA framework for compact object-relation representations in robotic manipulation.
A non-Markovian manipulation benchmark and slot-centric VLA framework for memory-aware robot policies.
A VLA framework that separates perceptual grounding from action reasoning for clutter-resistant robot manipulation.
Real-time open-vocabulary 3D mapping and queryable scene representation from RGB-D observations.
A multi-resolution Transformer architecture for semantic segmentation in aerial imagery.
A multi-scale Transformer model for solar photovoltaic profiling from aerial imagery.