
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Paper Abstract

Achieving dexterous robotic manipulation in the real world heavily relies on human supervision and algorithm engineering, which becomes a central bottleneck in the pursuit of general physical intelligence. Although emerging coding agents can generate code to automate algorithm search, their successes remain largely confined to digital environments. We conjecture that the missing abstraction to automate robotics research is a repeatable feedback loop for real-world policy improvement: reset the scene, execute a policy, verify the outcome, and refine the next iteration. To bridge this gap, we introduce ENPIRE, a harness framework for coding agents that instantiates this physical feedback routine with four core modules: an Environment module (EN) for automatic reset and verification, a Policy Improvement module (PI) that launches policy refinement, a Rollout module (R) to evaluate policies with one or multiple physical robots operating in parallel, and an Evolution module (E) in which coding agents analyze logs, consult literature, and improve training infrastructure and algorithm code to address failure modes. This closed-loop system transforms real-world manipulation learning into a controllable optimization procedure, minimizing human effort while allowing fair ablations across training-recipe and agent variants. Powered by ENPIRE, frontier coding agents can autonomously train a policy to achieve a 99% success rate on challenging, dexterous manipulation tasks, such as organizing a pin box, fastening a zip tie, and tool use, a process that further accelerates when we dispatch an agent team on a robot fleet. Our results suggest a practical and scalable path toward deploying coding agents to autonomously advance robotics in the physical world.
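The four-step routine named in the abstract (reset, execute, verify, refine) can be pictured as a simple loop. The Python sketch below is illustrative only. `ToyScene`, `refine`, and `improve` are hypothetical names, the "policy" is reduced to a single skill number, and the refinement rule is a placeholder rather than anything described in the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyScene:
    """Hypothetical stand-in for a physical workcell (not part of ENPIRE)."""
    rng: random.Random
    solved: bool = False

    def reset(self) -> None:
        # Reset: restore the scene to a known starting state.
        self.solved = False

    def execute(self, skill: float) -> None:
        # Execute: one policy rollout; higher skill means a higher chance of success.
        self.solved = self.rng.random() < skill

    def verify(self) -> bool:
        # Verify: report the outcome automatically instead of asking a human.
        return self.solved


def refine(skill: float, success_rate: float, step: float = 0.1) -> float:
    # Refine: toy update that raises skill in proportion to the observed failure rate.
    # In ENPIRE, refinement involves policy training (PI) and code changes by agents (E).
    return min(1.0, skill + step * (1.0 - success_rate))


def improve(scene: ToyScene, skill: float, iterations: int,
            episodes_per_iter: int, target: float) -> Tuple[float, List[float]]:
    """Run reset -> execute -> verify batches, refining until the target rate is met."""
    history: List[float] = []
    for _ in range(iterations):
        successes = 0
        for _ in range(episodes_per_iter):
            scene.reset()
            scene.execute(skill)
            successes += int(scene.verify())
        rate = successes / episodes_per_iter
        history.append(rate)
        if rate >= target:
            break
        skill = refine(skill, rate)
    return skill, history
```

Calling `improve(ToyScene(random.Random(0)), 1.0, 5, 10, 0.99)` stops after one batch. A skill of 1.0 succeeds on every episode, so the first measured rate already meets the target.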

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World introduces a framework designed to automate the process of teaching robots complex physical skills. Traditionally, training robots for dexterous tasks requires significant human intervention, such as manually resetting the environment, evaluating performance, and adjusting algorithms. ENPIRE replaces this human-led cycle with a repeatable, autonomous feedback loop, allowing coding agents to independently refine robot policies in the real world.

The Four Modules of Autoresearch

To enable this autonomous learning, the framework organizes the research process into four core modules. The Environment module (EN) handles the physical setup, including safety constraints, automatic scene resets, and real-time verification of task success. The Policy Improvement module (PI) launches policy refinement runs that update the policy. The Rollout module (R) evaluates the resulting policies on one or several physical robots operating in parallel. Finally, the Evolution module (E) is where the coding agents analyze logs, consult the literature, and improve the training infrastructure and algorithm code to address observed failure modes.
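The following minimal Python sketch makes the module boundaries concrete by showing the Rollout step feeding the Evolution step. It assumes a hypothetical `run_episode(robot_id, episode)` callable that wraps the EN module's reset and verification. The thread pool stands in for robots operating in parallel. `summarize` only produces the statistics a coding agent might read; it does not perform the agent's analysis itself.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EpisodeLog:
    robot_id: int
    episode: int
    success: bool  # verdict reported by the EN module's verifier


# Hypothetical signature: run_episode(robot_id, episode) resets the scene (EN),
# executes the current policy, and returns the verifier's verdict (EN).
EpisodeFn = Callable[[int, int], bool]


def rollout(run_episode: EpisodeFn, num_robots: int,
            episodes_per_robot: int) -> List[EpisodeLog]:
    """R: evaluate one policy on several robots at once, one worker per robot."""

    def run_robot(robot_id: int) -> List[EpisodeLog]:
        return [EpisodeLog(robot_id, k, run_episode(robot_id, k))
                for k in range(episodes_per_robot)]

    with ThreadPoolExecutor(max_workers=num_robots) as pool:
        per_robot = list(pool.map(run_robot, range(num_robots)))
    return [log for logs in per_robot for log in logs]


def summarize(logs: List[EpisodeLog]) -> Dict[str, object]:
    """E (input side): condense rollout logs into statistics an agent would read."""
    if not logs:
        raise ValueError("no rollout logs to summarize")
    failures = Counter(log.robot_id for log in logs if not log.success)
    success_rate = sum(log.success for log in logs) / len(logs)
    return {"success_rate": success_rate, "failures_per_robot": dict(failures)}
```

For example, suppose `run_episode = lambda r, k: (r + k) % 2 == 0` runs on three robots with four episodes each. Then `summarize(rollout(run_episode, 3, 4))` reports a success rate of 0.5 and two failures per robot.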

Two-Stage Learning Process

The framework operates in two distinct phases. First, the agent uses human feedback to construct an "environment interface." This is a one-time setup in which the agent learns how to safely reset the scene and verify whether a task was completed successfully. Once these tools are established, the second phase begins: fully autonomous policy improvement. In this stage, the agent works without human help. It uses the established tools to run experiments, test hypotheses, and optimize the policy until it achieves high success rates on tasks like pin insertion or cutting zip ties.
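The hand-off between the two stages can be pictured as a gate that opens only when the human-assisted setup looks reliable. The sketch below is an assumption, not the paper's procedure. It treats an episode as a plain string and calls a hypothetical `verifier` on it. The gate then requires two things: every human-inspected reset succeeded, and the verifier agrees with human labels on at least a chosen fraction of episodes (0.95 here, an arbitrary value).

```python
from typing import Callable, Sequence, Tuple

# Hypothetical: a verifier maps an episode record (here just a string) to success/failure.
Verifier = Callable[[str], bool]


def agreement(verifier: Verifier, labeled: Sequence[Tuple[str, bool]]) -> float:
    """Fraction of human-labeled episodes where the verifier matches the human label."""
    if not labeled:
        raise ValueError("need at least one human-labeled episode")
    hits = sum(verifier(episode) == label for episode, label in labeled)
    return hits / len(labeled)


def ready_for_stage_two(verifier: Verifier,
                        reset_checks: Sequence[bool],
                        labeled: Sequence[Tuple[str, bool]],
                        min_agreement: float = 0.95) -> bool:
    """Open the gate to autonomous improvement only if the setup looks reliable.

    reset_checks: human inspections of trial resets (True = scene restored correctly).
    """
    resets_reliable = len(reset_checks) > 0 and all(reset_checks)
    return resets_reliable and agreement(verifier, labeled) >= min_agreement
```

For instance, suppose a verifier disagrees with the human on one of four labeled episodes. It scores an agreement of 0.75, which keeps the system in the setup stage.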

Scaling with Robot Fleets

ENPIRE can accelerate the learning process by distributing the workload across a fleet of robots. By deploying multiple agents to test different training recipes simultaneously, the system can identify successful strategies much faster than a single robot could. To measure the efficiency of this multi-agent approach, the researchers introduced two metrics: Mean Robot Utilization (MRU), which tracks how much time the robots spend actively experimenting, and Mean Token Utilization (MTU), which measures the computational cost of the agents' decision-making process.
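Here is a minimal sketch of how the two fleet metrics could be computed. The source only describes them informally, so both formulas are assumptions. MRU is taken as the mean, over robots, of busy time divided by wall-clock time. MTU is approximated by the mean number of tokens each agent consumes. The paper's exact definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class RobotUsage:
    # Assumed to cover resets, policy rollouts, and verification.
    busy_seconds: float
    # Length of the experiment window for this robot.
    wall_seconds: float


def mean_robot_utilization(robots: Sequence[RobotUsage]) -> float:
    """Assumed MRU: average fraction of wall-clock time each robot is busy."""
    if not robots:
        raise ValueError("need at least one robot")
    return float(mean(r.busy_seconds / r.wall_seconds for r in robots))


def mean_token_usage(tokens_per_agent: Sequence[int]) -> float:
    """Hypothetical MTU proxy: average tokens consumed per agent over the run."""
    if not tokens_per_agent:
        raise ValueError("need at least one agent")
    return float(mean(tokens_per_agent))
```

Under these assumed definitions, two robots busy for 30 and 45 minutes of a one-hour window give an MRU of 0.625.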

Current Limitations

While the framework successfully automates complex tasks, it faces challenges in resource efficiency. As the number of robots in a fleet increases, the agents spend more time coordinating and summarizing results from their peers, which can lower robot utilization. Additionally, the token cost (the compute required for the coding agents to "think" and write code) tends to grow faster than the speedup gained by adding more robots. This creates a trade-off: larger fleets reach the target success rate faster, but at a significantly higher computational cost.
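A small calculation makes the trade-off concrete by comparing a baseline run with a larger fleet. The `FleetRun` fields and the numbers in the usage line are invented for illustration. They are not results from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class FleetRun:
    robots: int             # number of physical robots in the fleet
    hours_to_target: float  # wall-clock time until the target success rate is reached
    total_tokens: int       # tokens consumed by all coding agents during the run


def scaling_report(baseline: FleetRun, scaled: FleetRun) -> Dict[str, Union[float, bool]]:
    """Compare a larger fleet against a baseline run on speed and token cost."""
    speedup = baseline.hours_to_target / scaled.hours_to_target
    token_growth = scaled.total_tokens / baseline.total_tokens
    robot_ratio = scaled.robots / baseline.robots
    return {
        "speedup": speedup,
        "token_growth": token_growth,
        # True when token cost grows faster than time-to-target shrinks.
        "tokens_outpace_speed": token_growth > speedup,
        # 1.0 would mean speedup grows linearly with the number of robots.
        "parallel_efficiency": speedup / robot_ratio,
    }
```

Take a hypothetical baseline of one robot, 10 hours, and 1,000,000 tokens, and compare it with four robots, 4 hours, and 5,000,000 tokens. The report gives a 2.5x speedup, 5x token growth, and a parallel efficiency of 0.625. This is the pattern the limitation describes.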
