GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards

Key Takeaways

  • Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene.
  • However, developing this capability is hindered by the cost of annotating a vast and combinatorial question space.
  • A verifier executes each proposed program and converts its result into a reward signal that jointly optimizes the two roles (proposer and solver) via reinforcement learning.
  • GeoX consistently improves its base VLMs by up to 5.5 points on average, matching or exceeding conventional baselines trained on millions of curated examples.
Paper Abstract

Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost of annotating a vast and combinatorial question space. We propose GeoX, a self-play framework that acquires spatial logic through executable programs that yield verifiable rewards, without relying on large-scale human-curated data. Given a satellite or aerial image, our framework employs a single multimodal policy that proposes spatial problems as executable programs and solves them under three reasoning modes (abduction, deduction, and induction) over spatial primitives and an image understanding tool. A verifier executes each program to convert its result into a reward signal that jointly optimizes the two roles via reinforcement learning. GeoX consistently improves its base VLMs by up to 5.5 points on average, matching or exceeding conventional baselines trained on millions of curated examples. Alongside the proposed method, we release a benchmark for geospatial understanding accumulated through self-play.

GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards
Geospatial reasoning, the ability to understand the complex spatial relationships between objects in satellite and aerial imagery, is essential for tasks like urban planning and disaster response. However, training AI models for this purpose has historically been difficult because it requires massive amounts of human-labeled data, which is expensive and time-consuming to create. GeoX addresses this by introducing a self-play framework that allows a model to learn spatial logic autonomously, without relying on large-scale human-curated data.

Learning Through Self-Play

Instead of relying on human-provided question-answer pairs, GeoX has a single multimodal model play two roles, a "proposer" and a "solver", that work together in a cycle. The proposer creates spatial problems in the form of executable programs, and the solver attempts to find the correct answers. Because these problems are built from actual geometric and topological operations (like calculating distance, area, or adjacency), a verifier can run each program and check the answers automatically. This creates a feedback loop where the model learns by posing its own challenges and checking its own work.
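The propose-execute-verify loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GeoX implementation: the bounding-box scene format, the primitives `area`, `distance`, and `adjacent`, the `solve(scene)` convention, and the binary reward are all hypothetical stand-ins for whatever primitives and reward shaping the paper actually uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical scene format)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box):
    return (box.x_max - box.x_min) * (box.y_max - box.y_min)


def distance(a, b):
    """Euclidean distance between box centroids."""
    ax, ay = (a.x_min + a.x_max) / 2, (a.y_min + a.y_max) / 2
    bx, by = (b.x_min + b.x_max) / 2, (b.y_min + b.y_max) / 2
    return math.hypot(ax - bx, ay - by)


def adjacent(a, b, gap=5.0):
    """True when the closest edges of the boxes are at most `gap` pixels apart."""
    dx = max(a.x_min - b.x_max, b.x_min - a.x_max, 0.0)
    dy = max(a.y_min - b.y_max, b.y_min - a.y_max, 0.0)
    return math.hypot(dx, dy) <= gap


# Assumed primitive set exposed to proposed programs.
PRIMITIVES = {"area": area, "distance": distance, "adjacent": adjacent}


def run_program(source, scene):
    """Execute a proposed program, which must define solve(scene).

    Note: exec() is not a sandbox; a real system would isolate this step.
    """
    namespace = dict(PRIMITIVES)
    exec(source, namespace)
    return namespace["solve"](scene)


def reward(source, scene, solver_answer, tol=1e-6):
    """Binary reward (an assumption): 1.0 if the solver matches the executed result."""
    try:
        truth = run_program(source, scene)
    except Exception:
        return 0.0  # a program that fails to run yields no reward
    numeric = (int, float)
    if (isinstance(truth, numeric) and not isinstance(truth, bool)
            and isinstance(solver_answer, numeric)):
        return 1.0 if abs(solver_answer - truth) <= tol else 0.0
    return 1.0 if solver_answer == truth else 0.0


scene = {"a": Box("building", 0, 0, 10, 10), "b": Box("road", 12, 0, 40, 4)}
program = "def solve(scene):\n    return area(scene['a']) + area(scene['b'])\n"

print(reward(program, scene, 212.0))  # 1.0: the answer matches the executed program
print(reward(program, scene, 200.0))  # 0.0: the answer is wrong
```

The point of the design is that the ground truth comes from running the program rather than from a human label, so any question the proposer can express with the primitives is checkable for free. The sketch only scores the solver; how the proposer role is rewarded is not covered here.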

Three Modes of Reasoning

To ensure the model develops a deep understanding of a scene, GeoX recasts every problem into three distinct reasoning modes:

  • Abduction: Inferring the cause from observed evidence.

  • Deduction: Predicting the outcome from a known cause.

  • Induction: Synthesizing the underlying procedure that connects a cause to an effect.

By looking at the same scene through these three different lenses, the model gains a more robust and structural understanding of the physical world captured in the imagery.
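One way to picture the three modes is as three different questions asked about the same (procedure, cause, effect) triplet, each with its own automatic check. The sketch below assumes that reading: the "cause" is a program input, the "effect" is its output, and the "procedure" is the program itself. The `Triplet` type, the checker functions, and the toy `count_large` program are hypothetical names chosen for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Triplet:
    program: Callable[[Any], Any]  # the procedure linking cause to effect
    cause: Any                     # program input
    effect: Any                    # program output


def make_triplet(program, cause):
    """Run the program once so the effect is known and verifiable."""
    return Triplet(program, cause, program(cause))


def check_deduction(t, predicted_effect):
    """Deduction: given program and cause, predict the effect."""
    return predicted_effect == t.effect


def check_abduction(t, proposed_cause):
    """Abduction: given program and effect, propose a cause.

    Any cause that reproduces the effect is accepted, because several
    inputs can map to the same output.
    """
    return t.program(proposed_cause) == t.effect


def check_induction(examples, candidate_program):
    """Induction: given (cause, effect) pairs, synthesize a matching program."""
    return all(candidate_program(c) == e for c, e in examples)


def count_large(areas):
    """Toy spatial program: count footprints larger than 100 square pixels."""
    return sum(1 for a in areas if a > 100)


t = make_triplet(count_large, [50, 120, 300])

print(check_deduction(t, 2))            # True: two footprints exceed 100
print(check_abduction(t, [150, 200]))   # True: a different cause, same effect
print(check_abduction(t, [150]))        # False: only one large footprint

examples = [([50, 120, 300], 2), ([10], 0)]
print(check_induction(examples, lambda xs: len([x for x in xs if x > 100])))  # True
print(check_induction(examples, len))   # False: len ignores the size threshold
```

All three checks reduce to executing a program and comparing values, which is what lets a single verifier supply feedback for every mode.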

Performance and Results

GeoX significantly improves the performance of base vision-language models, boosting their accuracy by up to 5.5 points on average. In many cases, this self-taught approach matches or exceeds the performance of models trained on millions of human-curated examples. The gains are particularly strong in tasks that require counting objects and understanding spatial relationships, which are areas where traditional human-labeled datasets often fall short.

A New Benchmark for Geospatial AI

Alongside the framework, the researchers have released a new benchmark for geospatial understanding that was built entirely through this self-play process. By grounding the learning process in executable programs, GeoX moves away from simple pattern recognition and toward a more structural representation of the Earth's surface. This approach provides a scalable path for future AI development, as it allows models to continue improving their spatial reasoning capabilities without the bottleneck of manual data annotation.
