Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection
Recent advances in Vision-Language Models (VLMs) have transformed many fields, yet these models have historically struggled to identify abnormal patterns in sequential data. This paper addresses a critical gap: while existing anomaly detection benchmarks label which intervals of the data are anomalous, they lack the natural-language rationales needed to train models that give grounded, interpretable explanations for their decisions. The authors introduce a framework to bridge this gap, enabling more reliable and efficient reasoning in time-series analysis.
Creating a Foundation for Reasoning
To improve how models interpret sequential data, the authors developed VisAnomBench, a curated benchmark built from public time-series datasets. To make the data more useful for training, they augmented it with natural-language anomaly explanations. These explanations were drawn from multiple large VLMs and selected using fine-grained, task-specific rewards, so that the model learns from high-quality, logically sound reasoning rather than from raw data points alone.
Introducing VisAnomReasoner
Using this benchmark, the authors developed VisAnomReasoner, a parameter-efficient VLM designed specifically for time-series anomaly detection and fine-tuned on the augmented data in VisAnomBench. The resulting system is "tiny" in the sense of being parameter-efficient and "trusted" in the sense that its decisions about anomalies in sequential data come with interpretable, grounded rationales.
Performance and Generalization
The experimental results show that VisAnomReasoner significantly outperforms existing baselines. On VisAnomBench, it achieved substantial gains in anomaly localization, improving precision by at least 21.23 percentage points and F1 score by 23.87 percentage points. The model also generalized well across benchmarks: on TSB-AD-U it improved precision by 9.57 percentage points and F1 score by 13.39 percentage points, indicating that its reasoning capabilities hold up across different datasets.
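The article reports localization gains in percentage points but does not define how precision and F1 are computed. As an illustration only, the sketch below assumes simple point-wise scoring of predicted anomaly intervals against labeled ones; TSB-AD and similar benchmarks also use range-based or affiliation-style metrics, so this is not necessarily the paper's protocol. It also shows why "percentage points" (an absolute difference) differs from a relative percentage gain. The function names and sample numbers are hypothetical.

```python
def to_points(intervals):
    """Expand half-open [start, end) intervals into a set of time indices."""
    return {t for start, end in intervals for t in range(start, end)}


def pointwise_prf(pred, truth):
    """Point-wise precision, recall and F1 of predicted vs. labeled anomaly intervals."""
    p, g = to_points(pred), to_points(truth)
    tp = len(p & g)                       # time steps flagged and truly anomalous
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def pp_gain(before, after):
    """Absolute improvement in percentage points (scores given in [0, 1])."""
    return (after - before) * 100


def relative_gain(before, after):
    """Relative improvement in percent, for contrast with percentage points."""
    return (after - before) / before * 100


# Hypothetical example: one labeled anomaly [40, 60), two predicted intervals.
precision, recall, f1 = pointwise_prf([(30, 50), (80, 90)], [(40, 60)])
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")

# Raising F1 from 0.40 to 0.60 is +20 percentage points but +50% relative.
print(round(pp_gain(0.40, 0.60), 2), round(relative_gain(0.40, 0.60), 2))
```

Under this reading, a gain of 23.87 percentage points in F1 means the absolute F1 score rose by about 0.24, regardless of the baseline's starting value.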