Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Abstract

Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on a fixed set of concepts predefined at training time or are time-intensive when generating semantic maps. This paper presents Open-Fusion, a real-time open-vocabulary 3D mapping and queryable scene representation system that operates on RGB-D data.

The method combines a pretrained vision-language foundation model for open-set semantic comprehension with TSDF-based 3D scene reconstruction. It integrates region-based embeddings, confidence maps, and 3D geometry through an enhanced Hungarian-based feature-matching mechanism.
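
To make the TSDF component concrete, here is a minimal sketch of the standard weighted-average TSDF update for a single voxel along one camera ray. It illustrates the general technique only and is not taken from the Open-Fusion implementation; the function name `tsdf_update`, the parameters `trunc` and `obs_weight`, and their default values are assumptions chosen for illustration.

```python
def tsdf_update(tsdf, weight, voxel_depth, observed_depth,
                trunc=0.05, obs_weight=1.0):
    """Fuse one depth observation into a single voxel (hypothetical helper).

    tsdf, weight   : current truncated signed distance and accumulated weight
    voxel_depth    : depth of the voxel along the camera ray, in meters
    observed_depth : depth measured by the sensor along the same ray, in meters
    trunc          : truncation distance in meters (assumed value)
    """
    # Signed distance: positive when the voxel lies in front of the surface.
    sdf = observed_depth - voxel_depth
    if sdf < -trunc:
        # Voxel is well behind the observed surface (occluded): leave it as is.
        return tsdf, weight
    # Normalize by the truncation distance and clamp the positive side to 1.
    d = min(1.0, sdf / trunc)
    # Running weighted average of all observations seen so far.
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    return new_tsdf, new_weight


# Two observations of a voxel at 1.00 m, then one occluded observation.
t, w = tsdf_update(0.0, 0.0, 1.00, 1.02)
t, w = tsdf_update(t, w, 1.00, 1.04)
t, w = tsdf_update(t, w, 1.00, 0.90)  # surface is 0.10 m in front: skipped
```

In a full system the same update would run over every voxel visible in a frame, and a per-voxel semantic feature could be stored alongside the distance value; how Open-Fusion does this is not specified in the abstract.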
