MemOVCD: Training-Free Open-Vocabulary Change Detection

Key Takeaways

  • MemOVCD is a training-free framework designed to identify semantic changes in remote sensing images using natural language queries.
  • Existing open-vocabulary change detection methods combine foundation models such as SAM, DINO, and CLIP, but process each timestamp independently or interact only at the final comparison stage, making it hard to separate genuine semantic changes from appearance-only discrepancies (lighting, season).
  • MemOVCD reformulates bi-temporal change detection as a two-frame tracking problem, using weighted bidirectional propagation and histogram-aligned transition frames to stabilize memory reasoning across large temporal gaps.
  • A global-local adaptive rectification strategy fuses patch-level and whole-image predictions, countering the fragmented change regions produced by patch-dominant inference on high-resolution imagery.
  • Across five benchmarks spanning two change detection tasks, MemOVCD achieves favorable performance without any task-specific training.
Paper Abstract

Open-vocabulary change detection aims to identify semantic changes in bi-temporal remote sensing images without predefined categories. Recent methods combine foundation models such as SAM, DINO and CLIP, but typically process each timestamp independently or interact only at the final comparison stage. Such paradigms suffer from insufficient temporal coupling during semantic reasoning, which limits their ability to distinguish genuine semantic changes from non-semantic appearance discrepancies. In addition, patch-dominant inference on high-resolution images often weakens global semantic continuity and produces fragmented change regions. To address these issues, we propose MemOVCD, a training-free open-vocabulary change detection framework based on cross-temporal memory reasoning and global-local adaptive rectification. Specifically, we reformulate bi-temporal change detection as a two-frame tracking problem and introduce weighted bidirectional propagation to aggregate semantic evidence from both temporal directions. To stabilize memory propagation across large temporal gaps, we construct histogram-aligned transition frames to smooth abrupt appearance changes. Moreover, a global-local adaptive rectification strategy adaptively fuses local and global-view predictions, improving spatial consistency while preserving fine-grained details. Experiments on five benchmarks demonstrate that MemOVCD achieves favorable performance on two change detection tasks, validating its effectiveness and generalization under diverse open-vocabulary settings.

MemOVCD is a training-free framework designed to identify semantic changes in remote sensing images using natural language queries. Unlike traditional methods that process images from different times independently and compare them only at the very end, MemOVCD treats change detection as a continuous tracking problem. By leveraging cross-temporal memory and adaptive spatial refinement, the system can better distinguish between actual land-surface changes and simple appearance variations caused by lighting or seasonal shifts.
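The abstract's "weighted bidirectional propagation" aggregates semantic evidence from both temporal directions. The paper does not publish the exact weighting rule, so the sketch below is only a plausible minimal form: per-pixel score maps propagated forward and backward in time are fused with a confidence-based convex weight (all array names and the normalization scheme are assumptions, not the authors' implementation).

```python
import numpy as np

def fuse_bidirectional(fwd_scores, bwd_scores, fwd_conf, bwd_conf, eps=1e-8):
    """Fuse per-pixel semantic scores propagated in both temporal
    directions, weighting each direction by its confidence map.
    All inputs are (H, W) float arrays; confidences lie in [0, 1].
    NOTE: illustrative weighting only -- not the paper's exact rule."""
    w_f = fwd_conf / (fwd_conf + bwd_conf + eps)  # convex weight for forward pass
    w_b = 1.0 - w_f
    return w_f * fwd_scores + w_b * bwd_scores

# Toy example: the forward pass is confident on the left half of the
# image, the backward pass on the right half.
fwd = np.full((2, 4), 0.8)
bwd = np.full((2, 4), 0.2)
conf_f = np.array([[1.0, 1.0, 0.0, 0.0]] * 2)
conf_b = 1.0 - conf_f
fused = fuse_bidirectional(fwd, bwd, conf_f, conf_b)
```

Each pixel ends up dominated by whichever temporal direction was more confident there, which is the intuition behind aggregating evidence from both directions rather than trusting a single pass.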

Bridging Temporal Gaps

A major challenge in comparing images taken at different times is that abrupt changes in appearance can confuse a model. MemOVCD addresses this by reformulating change detection as a two-frame tracking task. It uses a "histogram-aligned transition-frame" strategy, which creates a smooth sequence of images between the two original timestamps. This allows the model to propagate semantic information more reliably across large temporal gaps, reducing the "ghosting" artifacts that often occur when trying to compare two very different images directly.
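The transition-frame idea above can be sketched with classic histogram matching: map the first image's intensity distribution onto the second's, then blend between the original and the matched view to obtain intermediate frames. This is a minimal single-channel sketch under assumed function names; the paper's actual alignment may differ in detail.

```python
import numpy as np

def match_histogram(src, ref):
    """Remap src intensities so its histogram matches ref (one channel).
    Standard CDF-matching, as in classic histogram specification."""
    src_vals, src_idx, src_cnt = np.unique(
        src.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_cnt = np.unique(ref.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_cnt) / src.size
    ref_cdf = np.cumsum(ref_cnt) / ref.size
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)  # CDF-to-CDF lookup
    return mapped[src_idx].reshape(src.shape)

def transition_frames(img_t1, img_t2, n=3):
    """Build n intermediate frames that blend img_t1 toward its
    histogram-matched appearance under img_t2's intensity statistics,
    smoothing the appearance jump between the two timestamps."""
    aligned = match_histogram(img_t1, img_t2)
    alphas = np.linspace(0.0, 1.0, n + 2)[1:-1]  # interior blend weights
    return [(1.0 - a) * img_t1 + a * aligned for a in alphas]

# Toy example: the second image is a uniformly brighter version of the first.
img_t1 = np.arange(16, dtype=float).reshape(4, 4)
img_t2 = 2.0 * img_t1
frames = transition_frames(img_t1, img_t2, n=3)
```

A memory-based tracker can then step through these synthetic frames instead of jumping directly from one timestamp to the other, which is what stabilizes propagation across large temporal gaps.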

Visual Exemplar Prompting

To improve how the model understands what to look for, MemOVCD extracts "visual exemplars": summaries of stable, consistent features found in the images. By identifying regions that remain unchanged and carry high prediction confidence, the model builds a visual prior that is combined with the user's text query. This injected visual context steers the model toward the specific categories requested by the user, yielding more accurate and robust segmentation across different environments.
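One way to realize this exemplar extraction, sketched under assumed names and thresholds (the paper's exact pooling and blending are not specified here), is masked average pooling over stable, high-confidence pixels, followed by blending the pooled vector into the text query embedding:

```python
import numpy as np

def extract_exemplar(feats, change_score, conf, change_thr=0.2, conf_thr=0.8):
    """Average features over pixels that are stable (low change score)
    and high confidence, yielding one exemplar vector.
    feats: (H, W, C) feature map; change_score, conf: (H, W) maps.
    Thresholds are illustrative assumptions."""
    stable = (change_score < change_thr) & (conf > conf_thr)
    if not stable.any():  # fall back to global pooling if nothing is stable
        return feats.reshape(-1, feats.shape[-1]).mean(axis=0)
    return feats[stable].mean(axis=0)

def prompt_with_exemplar(text_emb, exemplar, beta=0.5):
    """Blend the text query embedding with the visual exemplar and
    re-normalize, producing a combined visual-textual prompt."""
    v = (1.0 - beta) * text_emb + beta * exemplar
    return v / (np.linalg.norm(v) + 1e-8)

# Toy example: top row of the image is stable, bottom row changed.
feats = np.zeros((2, 2, 2))
feats[0] = [1.0, 0.0]  # features in the stable region
feats[1] = [0.0, 1.0]  # features in the changed region
change = np.array([[0.0, 0.0], [1.0, 1.0]])
conf = np.ones((2, 2))
exemplar = extract_exemplar(feats, change, conf)
prompt = prompt_with_exemplar(np.array([0.0, 1.0]), exemplar)
```

The resulting prompt vector carries both the user's textual intent and the scene's stable visual statistics, which is what makes the prior robust across environments.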

Global-Local Adaptive Rectification

High-resolution remote sensing images are typically processed in small patches, which can lead to fragmented results and a loss of global context. MemOVCD introduces a global-local adaptive rectification strategy to solve this. It performs a "connected-component-aware" fusion, where the model compares local patch-based predictions with a global view of the entire image. If a region appears fragmented or poorly defined in the local patches, the system automatically shifts its reliance toward the global prediction to ensure spatial consistency while still preserving fine-grained details.
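The rectification logic above can be illustrated with a small connected-component check: if the local patch-based mask breaks into many components, fall back to the global-view mask. The component threshold, the binary-mask inputs, and the all-or-nothing fallback are simplifying assumptions; the paper's fusion is adaptive rather than a hard switch.

```python
import numpy as np
from collections import deque

def label_components(mask):
    """4-connected component labelling of a binary mask (plain BFS)."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                count += 1
                labels[i, j] = count
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            q.append((ny, nx))
    return labels, count

def rectify(local_pred, global_pred, max_components=2):
    """If the local patch-based mask is fragmented (too many connected
    components), rely on the global-view prediction; otherwise keep
    the fine-grained local mask. Hard switch for illustration only."""
    _, n_local = label_components(local_pred)
    return global_pred if n_local > max_components else local_pred

# Toy example: a fragmented local mask vs. a coherent global one.
fragmented = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=bool)
coherent = np.ones((3, 3), dtype=bool)
result = rectify(fragmented, coherent)
```

Here the fragmented local mask has five isolated components, so the fusion defers to the coherent global mask; a single solid local blob would instead be kept, preserving fine detail.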

Performance and Generalization

MemOVCD is a training-free framework, meaning it does not require task-specific optimization on new datasets. Experiments across five benchmarks—covering both building-specific and multi-class land-cover change detection—demonstrate that this approach consistently outperforms previous training-free methods. By integrating cross-temporal reasoning with multi-scale refinement, the framework provides a more reliable and spatially consistent way to analyze how landscapes evolve over time.
