SkillOS: Learning Skill Curation for Self-Evolving...

Paper Abstract

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.

SkillOS is a research framework designed to help AI agents evolve by learning how to curate their own "skills." While many current AI agents solve tasks in isolation, they often struggle to learn from their past experiences. SkillOS addresses this by creating a system where an agent can store, update, and delete reusable skills—represented as Markdown files—allowing it to become more efficient and capable over time as it encounters new, related tasks.

A Modular Approach to Self-Evolution

The SkillOS framework splits the agent’s responsibilities into two distinct parts: a "frozen" executor and a trainable curator. The executor is responsible for solving tasks using a collection of skills stored in an external repository (SkillRepo). Once a task is finished, the curator reviews the agent's performance and decides how to manage the repository. It can insert new skills, update existing ones, or delete those that are no longer useful. By keeping the executor fixed and training only the curator, the system ensures that the agent’s ability to manage its own knowledge base improves independently of its core problem-solving logic.
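The three curation operations described above can be illustrated with a small sketch. This is not the paper's implementation: the `SkillRepo` and `CurationOp` classes, their fields, and the word-overlap retrieval are all assumptions made for illustration. The sketch only shows how an external store of Markdown skills might expose insert, update, and delete to a curator while an executor reads from it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Literal


@dataclass
class CurationOp:
    """One curator decision (hypothetical schema).

    `kind` mirrors the three operations named in the text.
    """
    kind: Literal["insert", "update", "delete"]
    name: str
    markdown: str = ""  # ignored for delete


@dataclass
class SkillRepo:
    """External skill store: skill name -> Markdown body (assumed layout)."""
    skills: Dict[str, str] = field(default_factory=dict)

    def apply(self, op: CurationOp) -> bool:
        """Apply one operation; return False if it is invalid for the current state."""
        if op.kind == "insert":
            if op.name in self.skills:
                return False  # inserting over an existing skill is rejected
            self.skills[op.name] = op.markdown
            return True
        if op.kind == "update":
            if op.name not in self.skills:
                return False  # nothing to update
            self.skills[op.name] = op.markdown
            return True
        if op.kind == "delete":
            return self.skills.pop(op.name, None) is not None
        return False

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        """Toy retrieval (an assumption): rank skills by word overlap with the query."""
        query_words = set(query.lower().split())

        def overlap(body: str) -> int:
            return len(query_words & set(body.lower().split()))

        ranked = sorted(self.skills.items(), key=lambda kv: overlap(kv[1]), reverse=True)
        # Drop skills that share no words with the query at all.
        return [name for name, body in ranked[:k] if overlap(body) > 0]


repo = SkillRepo()
repo.apply(CurationOp("insert", "factoring", "# Factoring\nFactor quadratic terms first."))
repo.apply(CurationOp("update", "factoring", "# Factoring\nFactor quadratic terms, then check roots."))
print(repo.retrieve("solve a quadratic"))
```

Keeping the repository outside both models is what lets the executor stay frozen: only the sequence of `CurationOp` decisions changes as the curator is trained.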

Learning Through Experience

To teach the curator how to manage skills effectively, the researchers use a reinforcement learning approach that mimics real-world streaming tasks. Instead of training on random, disconnected problems, the system organizes tasks into groups based on their skill-relevant dependencies. For example, if an agent learns a specific mathematical technique in an early task, the system evaluates the curator based on whether that skill helps the agent solve a more complex, related problem later on. Within each group, earlier trajectories update the SkillRepo and later related tasks evaluate those updates, so this "grouped" training ties each curation decision to its delayed payoff and helps the curator learn the long-term value of the skills it chooses to keep or refine.
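A rough sketch of one grouped rollout may make the idea concrete. Everything here is an assumption for illustration: the function names, the split into curation and evaluation tasks via `n_curate`, and the reward, which is simply the success rate on the later tasks. The paper describes composite rewards whose exact form is not given in this summary, so this sketch swaps in the simplest possible downstream signal.

```python
from typing import Callable, Dict, List, Tuple

Repo = Dict[str, str]  # skill name -> Markdown body (hypothetical layout)


def rollout_group(
    tasks: List[str],
    execute: Callable[[str, Repo], Tuple[str, bool]],
    curate: Callable[[str, Repo], Repo],
    n_curate: int,
) -> float:
    """Run one grouped task stream and return a scalar reward for the curator.

    The first `n_curate` tasks produce trajectories that the curator uses to
    update the repo; the remaining related tasks only evaluate those updates.
    """
    if not 0 < n_curate < len(tasks):
        raise ValueError("need at least one curation task and one evaluation task")
    repo: Repo = {}
    for task in tasks[:n_curate]:
        trajectory, _ = execute(task, repo)  # frozen executor solves the task
        repo = curate(trajectory, repo)      # trainable curator edits the repo
    outcomes = [execute(task, repo)[1] for task in tasks[n_curate:]]
    # Assumed reward: fraction of later tasks solved with the curated repo.
    return sum(outcomes) / len(outcomes)


# Toy stand-ins so the sketch runs end to end (not real models).
def toy_execute(task: str, repo: Repo) -> Tuple[str, bool]:
    topic = task.split(":")[0]
    return task, topic in repo  # "succeeds" only if a skill for the topic exists


def toy_curate(trajectory: str, repo: Repo) -> Repo:
    topic = trajectory.split(":")[0]
    return {**repo, topic: f"# {topic}\nNotes distilled from: {trajectory}"}


tasks = ["algebra: easy", "algebra: hard", "geometry: hard"]
reward = rollout_group(tasks, toy_execute, toy_curate, n_curate=1)
print(reward)  # 0.5: the algebra skill helps one of the two later tasks
```

Because the reward only arrives after the later tasks run, a curator that stores nothing useful scores zero on this group, which is the delayed signal the grouping is meant to expose.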

Performance and Generalization

Experiments on multi-turn agentic tasks and single-turn reasoning tasks show that SkillOS improves both the effectiveness and efficiency of AI agents compared to memory-free and memory-based baselines. In testing, the system achieved up to a 9.8% performance boost and required 6% fewer interaction steps to complete tasks. A key finding is that the learned curator generalizes: it works with different executor backbones and across task domains. Furthermore, as the agent continues to operate, the SkillRepo evolves, moving from simple instructions to more richly structured Markdown files that encode high-level "meta-skills," allowing the agent to handle increasingly complex challenges.

Key Takeaways

SkillOS demonstrates that agents can move beyond "one-off" problem solving by treating their knowledge as a living library. By using Markdown as a standard format for skills, the system makes the agent's internal knowledge more readable and structured. The research highlights that the bottleneck for self-evolving agents is not just the ability to store information, but the ability to curate it—deciding what is worth remembering and how to refine those lessons for future success.
