LagMemo

Abstract

Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting (3DGS) memory. During a one-time exploration, LagMemo constructs a unified 3D language memory with robust spatial-semantic correlations. When task goals arrive, the system efficiently queries the memory, predicts candidate goal locations, and uses a local perception-based verification mechanism to dynamically match and validate goals. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary localization, and that LagMemo significantly outperforms state-of-the-art methods in multi-goal visual navigation.

Video

Overview

Framework

Language 3DGS Memory Reconstruction and Memory-Guided Visual Navigation Pipeline

Experiments

Goal Localization

Instance Query Illustration: Each text query is matched against the 3DGS memory, and the corresponding response is retrieved. Renderings from the corresponding viewpoints show that the retrieved results match the queried instances.
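To make the query step concrete, the following is a minimal sketch of retrieving candidate locations by feature similarity. It is not LagMemo's implementation: the memory layout (a list of position and feature pairs), the function names `cosine` and `query_memory`, and the toy 3-dimensional features are all assumptions made for illustration. In a real system the features would come from a vision-language encoder and the memory from the reconstructed 3DGS.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_memory(memory, query_feature, top_k=3):
    """Return the top_k (score, position) pairs, best match first.

    memory: list of (position, feature) pairs. This layout is a
    hypothetical stand-in for a language-embedded 3D memory.
    """
    scored = [(cosine(feat, query_feature), pos) for pos, feat in memory]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy memory: three instances with made-up positions and features.
memory = [
    ((1.0, 0.0, 2.0), [1.0, 0.0, 0.0]),
    ((4.0, 0.0, 1.0), [0.0, 1.0, 0.0]),
    ((2.5, 0.0, 3.0), [0.7, 0.7, 0.0]),
]

# Made-up query feature standing in for an encoded text or image goal.
candidates = query_memory(memory, [1.0, 0.1, 0.0], top_k=2)
```

Returning several ranked candidates rather than a single best match leaves room for a later verification step to reject false positives.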

Visual Navigation

Step-by-step Visualization of Memory-Guided Navigation to an Image Goal: Columns show key steps (28, 67, 141, 165); rows show the front view, the top-down map, and the 3D localization results (red). In this case, the agent visits waypoints 1, 2, and 3 in turn (yellow stars; current waypoint in red). After checking the first two, it arrives at the third, where the goal verification module identifies the goal. The agent then proceeds to the final goal (green star), and the subtask terminates successfully at step 165.
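The visit-then-verify behavior in this example can be sketched as a simple loop over ranked waypoints. This is an illustrative sketch only: `navigate_with_verification`, the `verify` and `go_to` callbacks, and the 2D coordinates are hypothetical and do not reflect the actual planner or verification module.

```python
def navigate_with_verification(waypoints, verify, go_to):
    """Visit candidate waypoints in order until the goal is verified.

    waypoints: candidate positions, best first (hypothetical input).
    verify(position): returns the final goal position if the goal is
        confirmed from this waypoint, otherwise None (hypothetical).
    go_to(position): moves the agent to a position (hypothetical).
    Returns (success, visited_waypoints, goal_position_or_None).
    """
    visited = []
    for waypoint in waypoints:
        go_to(waypoint)
        visited.append(waypoint)
        goal = verify(waypoint)
        if goal is not None:
            # Goal confirmed locally: move to it and stop.
            go_to(goal)
            return True, visited, goal
    return False, visited, None


# Toy run mirroring the figure: the goal is confirmed at the third waypoint.
waypoints = [(1.0, 2.0), (3.0, 0.5), (5.0, 4.0)]
final_goal = (5.5, 4.2)
motion_log = []


def toy_verify(position):
    """Pretend the goal is only visible from the third waypoint."""
    return final_goal if position == (5.0, 4.0) else None


success, visited, goal = navigate_with_verification(
    waypoints, toy_verify, motion_log.append
)
```

Here `motion_log` records every position the agent was sent to, so the final entry is the verified goal rather than a waypoint.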

Real-world Deployment

Real-world Deployment of LagMemo. In a physical indoor environment, we reconstruct a 3DGS memory, and successfully locate and navigate to sequential multi-modal open-vocabulary goals, such as a “Mickey Mouse” doll.

LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation