The paper "ENPIRE: Agentic Robot Policy Self-Improvement in the Real World" introduces a framework that automates the process of teaching robots complex physical skills. Traditionally, training robots for dexterous tasks requires significant human intervention: someone has to reset the environment by hand, evaluate performance, and adjust the algorithms. ENPIRE replaces this human-led cycle with a repeatable, autonomous feedback loop in which coding agents refine robot policies in the real world on their own.
The Four Modules of Autoresearch
To enable this autonomous learning, the framework organizes the research process into four core modules. The Environment module (EN) handles the physical setup, including safety constraints, automatic resets, and real-time verification of task success. The Policy Improvement module (PI) allows the agent to modify training code and algorithms based on the feedback it receives. The Rollout module (R) manages the execution of these policies on physical robots, while the Evolution module (E) enables the agent to analyze logs, consult literature, and improve its own infrastructure to address failure modes.
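As a rough illustration of how the four modules could fit together, the sketch below wires up toy stand-ins for each one. Every class, method, and parameter name here is hypothetical and not taken from the paper; the "policy" is reduced to a single number, and the Evolution stand-in only counts failure reasons rather than consulting literature or rewriting infrastructure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Environment:
    """EN: safety limit, automatic reset, success verification (toy stand-in)."""
    target: float = 0.8      # hypothetical action that completes the task
    tolerance: float = 0.05  # hypothetical success margin
    max_action: float = 1.0  # hypothetical safety limit

    def reset(self) -> None:
        pass  # a real reset would return the robot and objects to a start state

    def is_safe(self, action: float) -> bool:
        return abs(action) <= self.max_action

    def verify(self, action: float) -> bool:
        return abs(action - self.target) <= self.tolerance


@dataclass
class Rollout:
    """R: runs one policy in the environment and logs the outcome."""
    env: Environment
    log: List[dict] = field(default_factory=list)

    def run(self, action: float) -> bool:
        self.env.reset()
        if not self.env.is_safe(action):
            self.log.append({"action": action, "success": False, "reason": "unsafe"})
            return False
        success = self.env.verify(action)
        reason = "ok" if success else "missed"
        self.log.append({"action": action, "success": success, "reason": reason})
        return success


class PolicyImprovement:
    """PI: proposes the next policy from feedback (here, a fixed-step search)."""

    def propose(self, action: float, step: float) -> float:
        return action + step


class Evolution:
    """E: reads the rollout log and summarises failure modes."""

    def failure_counts(self, log: List[dict]) -> Dict[str, int]:
        counts: Dict[str, int] = {}
        for entry in log:
            if not entry["success"]:
                counts[entry["reason"]] = counts.get(entry["reason"], 0) + 1
        return counts


def run_cycle(start: float = 0.0, step: float = 0.1,
              max_iters: int = 20) -> Tuple[float, List[dict], Dict[str, int]]:
    """Roll out, check, improve, and repeat until success or the budget runs out."""
    rollout = Rollout(Environment())
    improver = PolicyImprovement()
    action = start
    for _ in range(max_iters):
        if rollout.run(action):
            break
        action = improver.propose(action, step)
    return action, rollout.log, Evolution().failure_counts(rollout.log)
```

With the default arguments, `run_cycle()` stops after nine rollouts: eight are logged as `missed` and the ninth succeeds. The point of the split is that each concern (checking, executing, proposing, analysing) can be swapped out independently.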
Two-Stage Learning Process
The framework operates in two distinct phases. First, the agent uses human feedback to construct an "environment interface." This is a one-time setup in which the agent learns how to reset the scene safely and how to verify whether a task was completed. Once these tools are established, the second phase begins: fully autonomous policy improvement. In this stage, the agent works without human help, using the established tools to experiment, test hypotheses, and optimize its policy until it achieves high success rates on tasks like pin insertion or cutting zip ties.
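The two phases can be sketched in a few lines, under heavy simplification. In this toy version, the human feedback of phase one is a handful of labelled sensor readings used to fit a success threshold, and phase two searches over candidate policy parameters using only that fitted verifier. The function names, the threshold rule, and the `toy_rollout` model are all assumptions made for illustration, not the paper's method.

```python
from typing import Callable, List, Optional, Tuple


def build_verifier(labelled: List[Tuple[float, bool]]) -> Callable[[float], bool]:
    """Phase 1: fit a success threshold from human-labelled readings.

    Assumes successes always read higher than failures (separable classes).
    """
    successes = [reading for reading, ok in labelled if ok]
    failures = [reading for reading, ok in labelled if not ok]
    threshold = (min(successes) + max(failures)) / 2
    return lambda reading: reading >= threshold


def toy_rollout(param: float, trial: int) -> float:
    """Hypothetical stand-in for a real rollout: returns a sensor reading."""
    return param - 0.02 * trial


def improve_autonomously(
    verifier: Callable[[float], bool],
    candidates: List[float],
    trials: int,
    target_rate: float,
) -> Tuple[Optional[float], float]:
    """Phase 2: no human input; try candidates until one meets the target rate."""
    best_param, best_rate = None, 0.0
    for param in candidates:
        wins = sum(verifier(toy_rollout(param, t)) for t in range(trials))
        rate = wins / trials
        if rate > best_rate:
            best_param, best_rate = param, rate
        if rate >= target_rate:
            break
    return best_param, best_rate


# One-time setup with human labels, then a fully automatic search.
verifier = build_verifier([(0.2, False), (0.4, False), (0.7, True), (0.9, True)])
result = improve_autonomously(verifier, [0.3, 0.6, 0.8], trials=5, target_rate=0.9)
```

The design choice worth noting is that the human appears only in `build_verifier`; once the verifier exists, the improvement loop never asks for a label again.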
Scaling with Robot Fleets
ENPIRE can accelerate the learning process by distributing the workload across a fleet of robots. By deploying multiple agents to test different training recipes simultaneously, the system can identify successful strategies much faster than a single robot could. To measure the efficiency of this multi-agent approach, the researchers introduced two metrics: Mean Robot Utilization (MRU), which tracks how much time the robots spend actively experimenting, and Mean Token Utilization (MTU), which measures the computational cost of the agents' decision-making process.
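This summary does not give formulas for the two metrics, so the sketch below shows only one plausible reading. It assumes MRU is the fraction of wall-clock time each robot spends executing rollouts, averaged over the fleet, and it stands in for the token metric with average tokens consumed per agent per hour. Both definitions, along with the `RobotRun` record and its field names, are assumptions; the paper's exact definitions may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RobotRun:
    """Hypothetical per-robot record for one experiment session."""
    active_seconds: float  # time spent executing rollouts
    wall_seconds: float    # total session length
    tokens_used: int       # tokens consumed by the agent driving this robot


def mean_robot_utilization(runs: List[RobotRun]) -> float:
    """Assumed MRU: mean over robots of active time / wall-clock time."""
    return sum(r.active_seconds / r.wall_seconds for r in runs) / len(runs)


def mean_tokens_per_hour(runs: List[RobotRun]) -> float:
    """Assumed token-cost proxy: mean over agents of tokens per hour."""
    return sum(r.tokens_used / (r.wall_seconds / 3600) for r in runs) / len(runs)


# Made-up inputs for illustration only, not measurements from the paper.
fleet = [RobotRun(1800, 3600, 100_000), RobotRun(900, 3600, 300_000)]
mru = mean_robot_utilization(fleet)    # (0.5 + 0.25) / 2
tokens_rate = mean_tokens_per_hour(fleet)
```

Under these assumed definitions, a robot that sits idle while its agent reads peers' summaries lowers MRU and raises the token rate at the same time.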
Current Limitations
While the framework successfully automates complex tasks, it faces challenges regarding resource efficiency. As the number of robots in a fleet increases, the agents spend more time coordinating and summarizing results from their peers, which can lower robot utilization. Additionally, the token cost (the computational resources the coding agents need to "think" and write code) tends to grow faster than the speed gains achieved by adding more robots. The result is a trade-off: larger fleets reach success faster, but at a significantly higher computational cost.
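One way to make this trade-off concrete is to compare a fleet run against a single-robot baseline on two ratios: how much faster it reached success, and how many more tokens it spent. The helper below is a minimal sketch; the function name is hypothetical and the numbers in the example are invented for illustration, not results from the paper.

```python
from typing import Tuple


def scaling_tradeoff(
    base_hours: float,
    base_tokens: float,
    fleet_hours: float,
    fleet_tokens: float,
) -> Tuple[float, float, float]:
    """Return (speedup, token_ratio, tokens_per_unit_speedup) versus a baseline.

    A third value above 1.0 means token cost grew faster than the speed gain.
    """
    speedup = base_hours / fleet_hours
    token_ratio = fleet_tokens / base_tokens
    return speedup, token_ratio, token_ratio / speedup


# Invented example: the fleet halves time-to-success but triples token spend.
speedup, token_ratio, cost_per_speedup = scaling_tradeoff(
    base_hours=8.0, base_tokens=1_000_000,
    fleet_hours=4.0, fleet_tokens=3_000_000,
)
```

In this invented case the fleet is 2.0 times faster but uses 3.0 times the tokens, so each unit of speedup costs 1.5 times as many tokens as the baseline.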