# Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Paper Abstract

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typically provide interval annotations but not natural-language rationales, making it difficult to fine-tune VLMs to produce grounded, interpretable decisions. To address this gap, we construct VisAnomBench, a curated benchmark built from public time-series datasets and augmented with high-quality anomaly explanations selected from multiple large VLMs using fine-grained, task-specific rewards. Through fine-tuning on this benchmark, we develop VisAnomReasoner, a parameter-efficient VLM for time-series anomaly detection. Experimental results on VisAnomBench show that VisAnomReasoner achieves more accurate anomaly localization and consistently outperforms all baselines, with improvements of at least 21.23 and 23.87 percentage points in precision and F1, respectively. Additional experiments on the TSB-AD-U benchmark demonstrate strong cross-benchmark generalization, with VisAnomReasoner improving precision and F1 by 9.57 and 13.39 percentage points, respectively.

Vision-Language Models (VLMs) have advanced rapidly across many tasks, yet prior studies report that large language and multimodal models perform poorly at identifying abnormal patterns in sequential data. This paper addresses a specific gap: public anomaly detection benchmarks annotate the intervals where anomalies occur, but they lack the natural-language rationales needed to fine-tune models to give grounded, interpretable explanations for their decisions. The authors introduce a benchmark and a fine-tuned model to close this gap.

Creating a Foundation for Reasoning

To improve how models interpret sequential data, the authors constructed VisAnomBench, a curated benchmark built from public time-series datasets. They augmented it with anomaly explanations drawn from multiple large VLMs, selecting the high-quality ones with fine-grained, task-specific rewards. As a result, a model fine-tuned on the benchmark learns from vetted reasoning as well as from interval labels.
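The selection step can be pictured as a best-of-N filter over candidate explanations. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Candidate` type, the model labels, and the `overlap_reward` function (intersection-over-union against the annotated intervals) are all hypothetical stand-ins, since this summary does not specify the actual task-specific rewards.

```python
from dataclasses import dataclass

Interval = tuple[int, int]  # inclusive [start, end] time indices


@dataclass
class Candidate:
    model_name: str            # hypothetical label for the VLM that wrote it
    intervals: list[Interval]  # anomaly intervals the explanation commits to
    rationale: str             # natural-language explanation text


def to_points(intervals: list[Interval]) -> set[int]:
    """Expand inclusive intervals into the set of covered time indices."""
    return {t for start, end in intervals for t in range(start, end + 1)}


def overlap_reward(pred: list[Interval], truth: list[Interval]) -> float:
    """Illustrative reward (an assumption): IoU of covered time indices."""
    p, g = to_points(pred), to_points(truth)
    union = p | g
    return len(p & g) / len(union) if union else 1.0


def select_best(candidates: list[Candidate], truth: list[Interval]) -> Candidate:
    """Keep the candidate whose intervals best match the annotation."""
    return max(candidates, key=lambda c: overlap_reward(c.intervals, truth))


# Toy data, invented purely for illustration.
truth = [(10, 19)]
candidates = [
    Candidate("vlm_a", [(0, 9)], "Flat segment at the start looks unusual."),
    Candidate("vlm_b", [(12, 19)], "Sharp spike that breaks the seasonal pattern."),
    Candidate("vlm_c", [(5, 30)], "Something changes around the middle."),
]
best = select_best(candidates, truth)
print(best.model_name)  # prints "vlm_b"
```

A real reward would presumably also score the rationale text itself; this sketch only checks that the explanation is grounded in the right time span.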

Introducing VisAnomReasoner

Using this new benchmark, the authors developed VisAnomReasoner, a parameter-efficient VLM specifically designed for time-series anomaly detection. By fine-tuning the model on the augmented data in VisAnomBench, the researchers created a system that is not only "tiny" in terms of parameter efficiency but also "trusted" because it provides interpretable, grounded decisions regarding anomalies in sequential data.

Performance and Generalization

The experimental results show that VisAnomReasoner consistently outperforms all baselines. On VisAnomBench, it localizes anomalies more accurately, improving precision by at least 21.23 percentage points and F1 by at least 23.87 percentage points. On the TSB-AD-U benchmark, it improves precision by 9.57 percentage points and F1 by 13.39 percentage points, which indicates that its detection performance generalizes across benchmarks.
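To make the units concrete, the sketch below computes precision and F1 from predicted and annotated intervals and reports the gap between two detectors in percentage points. It assumes a simple point-wise protocol (every time index counts once); the summary does not say which evaluation protocol the paper uses, and the intervals here are toy values, not results from the paper.

```python
def to_points(intervals):
    """Expand inclusive (start, end) intervals into a set of time indices."""
    return {t for start, end in intervals for t in range(start, end + 1)}


def precision_recall_f1(pred, truth):
    """Point-wise precision, recall, and F1 (an assumed protocol)."""
    p, g = to_points(pred), to_points(truth)
    tp = len(p & g)  # indices flagged by the detector that are truly anomalous
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def gain_in_points(new_score, old_score):
    """Difference between two scores in [0, 1], in percentage points."""
    return (new_score - old_score) * 100


# Toy intervals, invented for illustration only.
truth = [(10, 19)]
model_pred = [(12, 21)]     # tight prediction, slightly shifted
baseline_pred = [(15, 34)]  # loose prediction with many false alarms

m_prec, _, m_f1 = precision_recall_f1(model_pred, truth)
b_prec, _, b_f1 = precision_recall_f1(baseline_pred, truth)
print(f"precision gain: {gain_in_points(m_prec, b_prec):.2f} points")
print(f"F1 gain: {gain_in_points(m_f1, b_f1):.2f} points")
```

Note that a percentage-point gain is an absolute difference between two scores, not a relative improvement: moving from 0.25 to 0.80 precision is a gain of 55 points.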
