MemOVCD is a training-free framework designed to identify semantic changes in remote sensing images using natural language queries. Unlike traditional methods that process images from different times independently and compare them only at the very end, MemOVCD treats change detection as a continuous tracking problem. By leveraging cross-temporal memory and adaptive spatial refinement, the system can better distinguish between actual land-surface changes and simple appearance variations caused by lighting or seasonal shifts.
Bridging Temporal Gaps
A major challenge in comparing images taken at different times is that abrupt changes in appearance can confuse a model. MemOVCD addresses this by reformulating change detection as a two-frame tracking task. It uses a "histogram-aligned transition-frame" strategy, which creates a smooth sequence of images between the two original timestamps. This allows the model to propagate semantic information more reliably across large temporal gaps, reducing the "ghosting" artifacts that often occur when trying to compare two very different images directly.
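The paper does not publish the exact procedure, but the transition-frame idea can be sketched as follows: remap the second image's intensity histogram onto the first (so lighting/seasonal shifts are suppressed), then blend linearly to produce intermediate frames for the tracker to follow. The function names and the number of intermediate frames here are illustrative assumptions, not MemOVCD's actual implementation.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap source's intensity distribution to match reference's.

    Both inputs are 2-D arrays; returns a float array. (Simplified
    single-channel sketch; the real system would handle RGB bands.)
    """
    s_values, s_counts = np.unique(source.ravel(), return_counts=True)
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    # For each source intensity, pick the reference intensity whose
    # cumulative probability is closest.
    mapped = np.interp(s_cdf, r_cdf, r_values)
    return mapped[np.searchsorted(s_values, source)]

def transition_frames(img_t1, img_t2, n=3):
    """Blend linearly between t1 and a histogram-aligned t2, yielding
    a smooth pseudo-sequence that bridges the temporal gap."""
    aligned_t2 = match_histogram(img_t2, img_t1)
    alphas = np.linspace(0.0, 1.0, n + 2)  # endpoints included
    return [(1 - a) * img_t1 + a * aligned_t2 for a in alphas]
```

Feeding the tracker this gradual sequence instead of the raw pair is what lets semantic labels propagate without the abrupt appearance jump that causes ghosting.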
Visual Exemplar Prompting
To improve how the model understands what to look for, MemOVCD extracts "visual exemplars"—a summary of stable, consistent features found in the images. By identifying regions that remain unchanged across both timestamps and are predicted with high confidence, the model builds a visual prior that is combined with the user's text query. This injected visual context helps the model focus on the specific categories requested by the user, leading to more accurate and robust segmentation across different environments.
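A minimal sketch of this idea: pool the backbone features over pixels that keep the same predicted class with high confidence at both timestamps, then blend the resulting exemplar vector with the text-query embedding. The function names, the confidence threshold, and the blending weight are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def extract_exemplar(feats, probs_t1, probs_t2, conf_thresh=0.8):
    """Average features over pixels that are stable across time.

    feats:    (H, W, D) per-pixel feature map from a shared backbone
    probs_t1, probs_t2: (H, W, C) class probabilities per timestamp
    Returns a (D,) exemplar vector, or None if no pixel qualifies.
    """
    same_class = probs_t1.argmax(-1) == probs_t2.argmax(-1)
    confident = (probs_t1.max(-1) > conf_thresh) & (probs_t2.max(-1) > conf_thresh)
    stable = same_class & confident
    if not stable.any():
        return None
    return feats[stable].mean(axis=0)

def fuse_prompt(text_emb, exemplar, alpha=0.5):
    """Blend the text-query embedding with the visual exemplar
    (both L2-normalized) into one prompt vector."""
    t = text_emb / np.linalg.norm(text_emb)
    v = exemplar / np.linalg.norm(exemplar)
    p = (1 - alpha) * t + alpha * v
    return p / np.linalg.norm(p)
```

The fused prompt then conditions segmentation the same way a pure text embedding would, but anchored to how the queried categories actually look in this scene.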
Global-Local Adaptive Rectification
High-resolution remote sensing images are typically processed in small patches, which can lead to fragmented results and a loss of global context. MemOVCD introduces a global-local adaptive rectification strategy to solve this. It performs a "connected-component-aware" fusion, where the model compares local patch-based predictions with a global view of the entire image. If a region appears fragmented or poorly defined in the local patches, the system automatically shifts its reliance toward the global prediction to ensure spatial consistency while still preserving fine-grained details.
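One way to make "connected-component-aware fusion" concrete: measure how fragmented a patch-level mask is (many small connected components) and, past some threshold, defer to the global prediction for that region. The threshold and fallback rule below are illustrative assumptions; MemOVCD's actual rectification may weigh the two predictions differently.

```python
import numpy as np
from collections import deque

def component_sizes(mask):
    """Sizes of 4-connected foreground components in a binary mask."""
    seen = np.zeros_like(mask, dtype=bool)
    sizes = []
    H, W = mask.shape
    for i in range(H):
        for j in range(W):
            if mask[i, j] and not seen[i, j]:
                # Breadth-first flood fill from this unvisited pixel.
                q, size = deque([(i, j)]), 0
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                sizes.append(size)
    return sizes

def rectify(local_mask, global_mask, min_mean_size=4.0):
    """Keep the patch-level prediction unless it is fragmented
    (mean component size too small); then defer to the global view."""
    sizes = component_sizes(local_mask)
    if sizes and np.mean(sizes) < min_mean_size:
        return global_mask
    return local_mask
```

Applied per region, this keeps the fine detail of patch-level inference where it is coherent and falls back on the global pass only where tiling artifacts appear.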
Performance and Generalization
Because MemOVCD is training-free, it requires no task-specific optimization on new datasets. Experiments across five benchmarks—covering both building-specific and multi-class land-cover change detection—show that the approach consistently outperforms previous training-free methods. By integrating cross-temporal reasoning with multi-scale refinement, the framework provides a more reliable and spatially consistent way to analyze how landscapes evolve over time.