Publication
Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation
Real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D observations.
Abstract
Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on a fixed set of concepts defined at training time, or are too slow at generating semantic maps for real-time use. Open-Fusion is a real-time, open-vocabulary 3D mapping and queryable scene representation system that operates on RGB-D data.
The method combines a pretrained vision-language foundation model, used for open-set semantic comprehension, with 3D scene reconstruction based on a Truncated Signed Distance Function (TSDF). It integrates region-based embeddings, confidence maps, and 3D geometry through an enhanced Hungarian-based feature-matching mechanism.
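To make the TSDF component concrete, the following is a minimal sketch of the standard weighted running-average TSDF update for a single voxel. It illustrates the general technique only: the abstract does not say that Open-Fusion uses exactly this rule, and the function name `tsdf_update` and the parameters `trunc`, `obs_weight`, and `max_weight` are hypothetical choices made for illustration.

```python
def tsdf_update(tsdf, weight, sdf_obs, trunc=0.05, obs_weight=1.0, max_weight=64.0):
    """Fuse one signed-distance observation (in meters) into a voxel.

    All names and default values are illustrative assumptions,
    not taken from the Open-Fusion implementation.
    """
    # Normalize by the truncation distance and clamp to [-1, 1].
    d = max(-1.0, min(1.0, sdf_obs / trunc))
    # Weighted running average of the stored value and the new observation.
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    # Cap the weight so the voxel can still adapt to later observations.
    return new_tsdf, min(new_weight, max_weight)


# Fuse three depth-derived observations into an initially empty voxel.
voxel = (0.0, 0.0)  # (tsdf value, accumulated weight)
for sdf in (0.02, 0.03, 0.01):
    voxel = tsdf_update(*voxel, sdf_obs=sdf)
print(voxel)
```

With equal observation weights, the stored value is the mean of the normalized observations. A real system would apply the same update to every voxel along each camera ray, typically on the GPU.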