Unsupervised Skill Discovery for Agentic Data Analysis

Paper Abstract

Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, as reliable supervision is expensive and success criteria vary across analytical formats. This raises the key question of how to discover reusable data-analysis skills from unlabeled exploration alone. We propose DataCOPE, an unsupervised verifier-guided skill discovery framework for data-analytic agents. DataCOPE derives verifier signals from the exploration trajectories and uses them to characterize relative quality or agreement among trajectories. It iteratively coordinates a Data-Analytic Agent for trajectory generation, an Unsupervised Verifier for signal extraction, and a Skill Manager for contrastive skill distillation. For report-style analysis, we instantiate the verifier as an Adaptive Checklist Verifier that derives task-specific criteria, scores reports by verifiable coverage, and iteratively refines the checklist. For reasoning-style analysis, we instantiate it as an Answer Agreement Verifier that groups trajectories by answer agreement and uses self-consistency as an auxiliary signal. We evaluate DataCOPE on report-style analysis from Deep Data Research and reasoning-style analysis from DABStep. Across both settings, DataCOPE consistently improves held-out performance over baselines. Averaged across four model settings, DataCOPE improves the mean score by 9.71% and 32.30% on report-style and reasoning-style tasks, respectively.

Unsupervised Skill Discovery for Agentic Data Analysis
Data-analytic agents—AI systems designed to process data, write code, and generate reports—often struggle to adapt to the wide variety of data formats and analytical goals they encounter. While "skill augmentation" can help by injecting reusable procedural knowledge into these agents, creating these skills usually requires expensive human annotation or ground-truth labels. This paper introduces DataCOPE, a framework that allows agents to discover and refine their own analytical skills through unsupervised exploration, eliminating the need for human-provided supervision.

How DataCOPE Works

DataCOPE operates as a closed-loop system that iteratively improves an agent’s performance. It consists of three main components: a Data-Analytic Agent that generates exploratory trajectories, an Unsupervised Verifier that extracts quality signals from these trajectories without needing gold-standard answers, and a Skill Manager that distills the most effective strategies into reusable "skills." Because no ground truth is available, "effective" is always relative: the Skill Manager contrasts trajectories the verifier rates higher against those it rates lower, so that the resulting skills favor robust analytical procedures and steer away from common errors.
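The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `Trajectory`, `SkillLibrary`, `discovery_loop`, and the three `toy_*` stand-ins are hypothetical names, and the toy agent, verifier, and distiller are deliberately trivial placeholders for what would be LLM-driven components in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trajectory:
    task: str
    output: str  # the steps taken, joined by "; " in this toy setup


@dataclass
class SkillLibrary:
    skills: List[str] = field(default_factory=list)


def discovery_loop(
    tasks: List[str],
    agent: Callable[[str, List[str], int], Trajectory],
    verifier: Callable[[Trajectory], float],
    distill: Callable[[Trajectory, Trajectory], str],
    library: SkillLibrary,
    rounds: int = 2,
    samples: int = 4,
) -> SkillLibrary:
    """Generate trajectories, score them without labels, distill by contrast."""
    for _ in range(rounds):
        for task in tasks:
            # 1. The agent explores the task several times with the current skills.
            trajectories = [agent(task, library.skills, i) for i in range(samples)]
            # 2. The verifier ranks trajectories using label-free signals only.
            ranked = sorted(trajectories, key=verifier)
            worst, best = ranked[0], ranked[-1]
            # 3. Distill a skill only when there is a real contrast to learn from.
            if verifier(best) > verifier(worst):
                skill = distill(best, worst)
                if skill not in library.skills:
                    library.skills.append(skill)
    return library


# --- Hypothetical stand-ins (assumptions for illustration only) ---
def toy_agent(task: str, skills: List[str], seed: int) -> Trajectory:
    # A real agent would condition on `skills`; this toy one ignores them.
    steps = ["load data", "check missing values", "summarize"][: 1 + seed % 3]
    return Trajectory(task=task, output="; ".join(steps))


def toy_verifier(traj: Trajectory) -> float:
    # Placeholder signal: more analysis steps counts as "better".
    return float(len(traj.output.split("; ")))


def toy_distill(best: Trajectory, worst: Trajectory) -> str:
    worst_steps = worst.output.split("; ")
    extra = [s for s in best.output.split("; ") if s not in worst_steps]
    return f"For tasks like '{best.task}', also: " + ", ".join(extra)


library = discovery_loop(
    ["sales.csv"], toy_agent, toy_verifier, toy_distill, SkillLibrary()
)
print(library.skills)
```

The point of the sketch is the control flow: skills are only written when the verifier sees a difference between trajectories, and the updated library is passed back to the agent in the next round.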

Verifying Without Answers

Because data analysis tasks vary significantly, DataCOPE uses two different verification strategies depending on the task type:

  • For report-style analysis: The framework uses an "Adaptive Checklist Verifier." It derives task-specific criteria and scores each report by how many of those criteria it verifiably covers. If the agent begins to overfit to these checklists, the system triggers a refinement stage where the checklist itself is updated to better distinguish between high-quality and low-quality reports.

  • For reasoning-style analysis: The framework uses an "Answer Agreement Verifier." It groups different attempts at a problem based on whether they reach the same conclusion, using self-consistency as a signal to identify the most reliable reasoning paths.
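To make the two verification strategies concrete, here is a small sketch of both signals. It rests on simplifying assumptions that are not from the paper: checklist coverage is approximated by keyword matching (a real verifier would more plausibly use a model to judge each criterion), the refinement trigger is a hypothetical "scores no longer separate reports" check, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def checklist_coverage(report: str, checklist: List[str]) -> float:
    """Fraction of checklist items found in the report (keyword match: an assumption)."""
    if not checklist:
        return 0.0
    text = report.lower()
    hits = sum(1 for item in checklist if item.lower() in text)
    return hits / len(checklist)


def needs_refinement(scores: List[float], min_spread: float = 0.1) -> bool:
    """Hypothetical trigger: refine the checklist when it stops separating reports."""
    return max(scores) - min(scores) < min_spread


def normalize(answer: str) -> str:
    """Light normalization so trivially different answers land in one group."""
    return answer.strip().lower().rstrip(".")


def group_by_agreement(answers: List[str]) -> List[Tuple[str, List[int], float]]:
    """Group trajectory indices by final answer; larger groups rank first.

    Each tuple is (answer, trajectory indices, share of all trajectories),
    where the share serves as a self-consistency signal.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, ans in enumerate(answers):
        groups[normalize(ans)].append(idx)
    total = len(answers)
    ranked = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(ans, idxs, len(idxs) / total) for ans, idxs in ranked]


# Report-style: score one report against a three-item checklist.
report = "We report the median revenue and flag outliers."
print(checklist_coverage(report, ["median", "outliers", "seasonality"]))

# Reasoning-style: four attempts at the same question.
for answer, members, share in group_by_agreement(["42", " 42.", "41", "42"]):
    print(answer, members, share)
```

In both cases the output is a relative signal rather than a correctness label: it says which trajectories look stronger or more consistent than others, which is what contrastive skill distillation needs.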

Performance Gains

The researchers evaluated DataCOPE on two benchmarks: report-style analysis (Deep Data Research) and reasoning-style analysis (DABStep). In both settings, DataCOPE consistently improves held-out performance over the baselines. Averaged across four model settings, the framework improved the mean score by 9.71% on report-style tasks and 32.30% on reasoning-style tasks. These gains indicate that the discovered skills transfer to held-out tasks even though no ground-truth labels are used during the discovery process.

Key Takeaways

The primary contribution of DataCOPE is skill distillation driven entirely by self-guided exploration. By removing the reliance on costly human-labeled data, the framework offers a scalable way to improve data-analytic agents across different analytical formats. Because the system is iterative, refining the verifier's criteria alongside the agent's skills, the discovered skills are designed to stay relevant and effective as the agent encounters new, complex data-analysis challenges.
