Rethinking Infrastructure Inspection as Image Difference Classification: A Traffic Sign Case Study
Maintaining road infrastructure, such as traffic signs, is a critical safety task that currently relies on manual, time-consuming visual inspections. While Digital Twins (DTs) offer a way to modernize this process by tracking the condition of assets over time, progress is often stalled by a lack of high-quality, annotated data. This research proposes a new approach called Image Difference Classification (IDC). Instead of training a model to identify defects from a single image, the researchers reformulate the task to compare a new inspection image against a historical reference image of the same asset. By leveraging existing data from road management systems, this method aims to reduce the reliance on large, manually labeled datasets.
How the Approach Works
The researchers curated a new, high-quality dataset containing 970 pairs of traffic sign images—one showing the asset in its original, undamaged state and another showing its current condition. The team tested two types of model architectures: encoder-based pipelines, which use specialized vision models to compare image features, and instruction-based pipelines, which use Vision-Language Models (VLMs) to "reason" about the differences between the two images based on specific prompts. To simulate real-world constraints, the models were fine-tuned using only a small number of examples per defect category.
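The data setup described above can be sketched in a few lines. This is a minimal illustration, not the researchers' code: the `SignPair` record, the label strings, the `sample_k_shot` helper, and the `difference_features` combination (concatenating both embeddings with their absolute difference) are all assumptions chosen for clarity. The summary does not say how the encoder-based pipelines actually fuse the two images.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class SignPair:
    """One IDC sample: a reference image, a current image, and a label."""
    reference_path: str   # historical image of the asset (undamaged state)
    inspection_path: str  # new inspection image of the same asset
    label: str            # hypothetical label, e.g. "ok" or "graffiti"


def sample_k_shot(pairs: Sequence[SignPair], k: int, seed: int = 0) -> List[SignPair]:
    """Pick up to k pairs per label to form a few-shot fine-tuning set."""
    by_label: Dict[str, List[SignPair]] = defaultdict(list)
    for pair in pairs:
        by_label[pair.label].append(pair)
    rng = random.Random(seed)  # seeded so a given split is reproducible
    subset: List[SignPair] = []
    for label in sorted(by_label):
        group = by_label[label]
        subset.extend(rng.sample(group, min(k, len(group))))
    return subset


def difference_features(ref_emb: Sequence[float], cur_emb: Sequence[float]) -> List[float]:
    """Combine two embeddings into one vector: [ref, cur, |ref - cur|].

    Assumed fusion scheme for illustration only; any image encoder
    could supply the embeddings.
    """
    if len(ref_emb) != len(cur_emb):
        raise ValueError("embeddings must have the same length")
    abs_diff = [abs(r - c) for r, c in zip(ref_emb, cur_emb)]
    return list(ref_emb) + list(cur_emb) + abs_diff
```

With `k=1`, `sample_k_shot` yields the one-example-per-class regime; changing `seed` produces a different few-shot split from the same pool of pairs.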
Key Findings
The study found that instruction-based models significantly outperformed encoder-based models both at detecting whether a defect was present and at classifying the specific type of damage. While encoder-based models did not consistently benefit from the reference images, the instruction-based approach showed clear performance gains when given the historical context. Notably, the models required only a small amount of "calibration" (fine-tuning with as few as one example per class) to learn how to use the reference images to improve their accuracy.
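To make the "with versus without historical context" comparison concrete, the sketch below builds the input for an instruction-based model in both settings. Everything here is hypothetical: the prompt wording, the `build_idc_request` name, and the generic list-of-parts structure are illustrative assumptions and do not reproduce the prompts used in the study or the request format of any particular VLM library.

```python
from typing import Dict, List, Optional, Sequence


def build_idc_request(
    inspection_path: str,
    defect_classes: Sequence[str],
    reference_path: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Return an ordered list of image/text parts for one query.

    If reference_path is given, the model is asked to compare the two
    images; otherwise it sees only the inspection image (the
    single-image baseline).
    """
    parts: List[Dict[str, str]] = []
    if reference_path is not None:
        parts.append({"type": "image", "path": reference_path})
    parts.append({"type": "image", "path": inspection_path})

    classes = ", ".join(defect_classes)
    if reference_path is not None:
        # Hypothetical wording for the image-difference setting.
        instruction = (
            "The first image shows a traffic sign in its original state. "
            "The second image shows the same sign today. "
            f"Answer with exactly one of: {classes}."
        )
    else:
        # Hypothetical wording for the single-image baseline.
        instruction = (
            "The image shows a traffic sign. "
            f"Answer with exactly one of: {classes}."
        )
    parts.append({"type": "text", "text": instruction})
    return parts
```

Running the same evaluation set through both variants, before and after few-shot fine-tuning, is one way to measure how much the reference image contributes.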
Performance and Limitations
The research demonstrates that IDC is a viable strategy for updating Digital Twin assets, particularly in low-resource environments where data is scarce. However, the study also highlights some important considerations. First, the models require a "calibration" phase: without fine-tuning, they could not make effective use of the reference images and, in some cases, performed worse than they would have with a single image. Second, the researchers noted that few-shot learning (training on very limited data) can lead to inconsistent results, with performance fluctuating depending on the specific data split used. Future work will aim to address these fluctuations by testing the method on larger datasets and exploring the use of multiple reference images to provide even more context for the models.
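One common way to make split-dependent fluctuation visible is to repeat the few-shot experiment over several seeds and report the spread alongside the mean. The sketch below assumes a caller-supplied `evaluate(seed)` function that draws a split, fine-tunes, and returns a single score. That function and the `summarize_over_splits` name are hypothetical, and the summary does not state how the study itself aggregated results.

```python
import statistics
from typing import Callable, Dict, Iterable


def summarize_over_splits(
    evaluate: Callable[[int], float],
    seeds: Iterable[int],
) -> Dict[str, float]:
    """Run evaluate once per seed and summarize the resulting scores."""
    scores = [evaluate(seed) for seed in seeds]
    if len(scores) < 2:
        raise ValueError("need at least two splits to estimate spread")
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }
```

A large `std` relative to the gap between two methods suggests that a difference seen on one split may not hold on another.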