Unsupervised Skill Discovery for Agentic Data Analysis
Data-analytic agents—AI systems designed to process data, write code, and generate reports—often struggle to adapt to the wide variety of data formats and analytical goals they encounter. While "skill augmentation" can help by injecting reusable procedural knowledge into these agents, creating these skills usually requires expensive human annotation or ground-truth labels. This paper introduces DataCOPE, a framework that allows agents to discover and refine their own analytical skills through unsupervised exploration, eliminating the need for human-provided supervision.
How DataCOPE Works
DataCOPE operates as a closed-loop system that iteratively improves an agent’s performance. It consists of three main components: a Data-Analytic Agent that generates exploratory trajectories, an Unsupervised Verifier that evaluates these trajectories without needing gold-standard answers, and a Skill Manager that distills the most effective strategies into reusable "skills." By contrasting successful trajectories against less effective ones, the system learns to prioritize robust analytical procedures and avoid common errors.
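The loop described above can be pictured with a small toy sketch. Everything in it is an illustrative assumption rather than DataCOPE's actual implementation: the names `Trajectory`, `SkillManager`, `discovery_loop`, `toy_agent`, and `toy_verifier` are hypothetical, the "agent" is a deterministic stub instead of a language model, and the "verifier" is a fixed three-item rubric. It only shows the control flow: sample several trajectories, score them without labels, and distill what separates the best from the worst.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical rubric standing in for an unsupervised verifier's criteria.
RUBRIC = {"load data", "check missing values", "summarize"}


@dataclass
class Trajectory:
    task: str
    steps: List[str]
    score: float = 0.0


@dataclass
class SkillManager:
    skills: List[str] = field(default_factory=list)

    def distill(self, best: Trajectory, worst: Trajectory) -> None:
        # Contrast step: keep steps the best trajectory used and the worst lacked.
        for step in best.steps:
            if step not in worst.steps and step not in self.skills:
                self.skills.append(step)


def toy_agent(task: str, skills: List[str], seed: int) -> Trajectory:
    # Stub agent: always follows known skills; only even seeds "explore"
    # the missing-value check on their own.
    steps = ["load data", *skills]
    if seed % 2 == 0 and "check missing values" not in steps:
        steps.append("check missing values")
    steps.append("summarize")
    return Trajectory(task, steps)


def toy_verifier(traj: Trajectory) -> float:
    # Label-free score: fraction of rubric items the trajectory covers.
    return len(RUBRIC & set(traj.steps)) / len(RUBRIC)


def discovery_loop(
    tasks: List[str],
    agent: Callable[[str, List[str], int], Trajectory],
    verifier: Callable[[Trajectory], float],
    manager: SkillManager,
    samples: int = 3,
) -> List[str]:
    for task in tasks:
        trajectories = [agent(task, manager.skills, i) for i in range(samples)]
        for traj in trajectories:
            traj.score = verifier(traj)
        ranked = sorted(trajectories, key=lambda t: t.score, reverse=True)
        best, worst = ranked[0], ranked[-1]
        # Only distill when the verifier actually separates the trajectories.
        if best.score > worst.score:
            manager.distill(best, worst)
    return manager.skills


skills = discovery_loop(["task-1", "task-2"], toy_agent, toy_verifier, SkillManager())
print(skills)
```

In this toy run the first task yields a skill because the sampled trajectories differ in quality; on the second task every trajectory already applies that skill, so the scores tie and nothing new is distilled.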
Verifying Without Answers
Because data analysis tasks vary significantly, DataCOPE uses two different verification strategies depending on the task type:
For report-style analysis: The framework uses an "Adaptive Checklist Verifier." It generates task-specific criteria to score reports based on their coverage and quality. If the agent begins to overfit to these checklists, the system triggers a refinement stage where the checklist itself is updated to better distinguish between high-quality and low-quality reports.
For reasoning-style analysis: The framework uses an "Answer Agreement Verifier." It groups different attempts at a problem based on whether they reach the same conclusion, using self-consistency as a signal to identify the most reliable reasoning paths.
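As a rough illustration of the agreement idea for reasoning-style tasks, the sketch below groups attempts by their normalized final answer and treats the largest group as the more reliable one. The function names `normalize` and `answer_agreement`, the normalization rule, and the returned fields are assumptions made for this example; the summary does not specify how DataCOPE compares answers or breaks ties.

```python
from typing import Dict, List, Tuple


def normalize(answer: str) -> str:
    # Assumed normalization: trim whitespace, lowercase, drop trailing periods.
    return answer.strip().lower().rstrip(".")


def answer_agreement(attempts: List[Tuple[str, str]]) -> Dict[str, object]:
    """Group (trajectory_id, final_answer) pairs for one task by answer."""
    if not attempts:
        raise ValueError("need at least one attempt")

    groups: Dict[str, List[str]] = {}
    for traj_id, answer in attempts:
        groups.setdefault(normalize(answer), []).append(traj_id)

    # Largest group wins; ties go to the answer that appeared first.
    majority_answer, majority_ids = max(groups.items(), key=lambda kv: len(kv[1]))
    minority_ids = [t for t, a in attempts if normalize(a) != majority_answer]

    return {
        "majority_answer": majority_answer,
        "agreement": len(majority_ids) / len(attempts),
        "consistent": majority_ids,    # candidates for "successful" trajectories
        "inconsistent": minority_ids,  # candidates for contrastive negatives
    }


result = answer_agreement([("a", "42"), ("b", " 42."), ("c", "17"), ("d", "42")])
print(result)
```

The agreement ratio gives a label-free confidence signal: a task where attempts split evenly is weak evidence either way, so a real system would likely want a threshold before treating the majority group as correct.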
Performance Gains
The researchers evaluated DataCOPE on two benchmarks: Deep Data Research for report-style analysis and DABStep for reasoning-style analysis. DataCOPE consistently outperforms baselines that lack the discovered skills. Across four different model architectures, the framework improved mean performance by 9.71% on report-style tasks and 32.30% on reasoning-style tasks. These gains indicate that the agent can transfer discovered skills to new, unseen tasks without requiring any ground-truth labels during the discovery process.
Key Takeaways
The primary contribution of DataCOPE is its ability to perform "skill distillation" entirely through self-guided exploration. By removing the reliance on costly human-labeled data, the framework provides a scalable way to improve AI agents across diverse domains. Because the system is iterative, refining both the agent's performance and the verifier's criteria, the discovered skills are meant to stay relevant and effective as the agent encounters new, complex data-analysis challenges.