AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions
AutoRPA is a framework designed to bridge the gap between two common methods of automating computer tasks: the flexible but expensive "ReAct" LLM agent approach and the efficient but rigid traditional Robotic Process Automation (RPA). LLM agents are good at solving new tasks, but they are costly to run repeatedly. Traditional RPA is fast and cheap to run, but it requires significant manual effort to build and maintain. AutoRPA addresses both problems by automatically "distilling" the decision-making logic of LLM agents into robust, reusable, and low-cost RPA code.

The Translator-Builder Pipeline

The core of AutoRPA is a two-stage process that converts human-like reasoning into machine-executable code. First, a "translator agent" takes the step-by-step actions performed by a ReAct-style LLM agent and converts them into "soft-coded" procedures. Unlike hard-coded actions that rely on fixed screen coordinates, soft-coded actions locate buttons and fields by semantic attributes (such as element type or text content), which makes the automation much more resilient to changes in the interface layout.
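
The difference between hard-coded and soft-coded actions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `UIElement` record, the `find_element` and `soft_click` helpers, and the sample layouts are hypothetical and are not taken from AutoRPA itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    """Hypothetical snapshot of one on-screen element."""
    role: str  # e.g. "button", "textbox"
    text: str  # visible label or content
    x: int     # center coordinates on screen
    y: int


def find_element(screen: List[UIElement], role: str, text: str) -> Optional[UIElement]:
    """Soft-coded lookup: match on semantic attributes, not on position."""
    for element in screen:
        if element.role == role and text.lower() in element.text.lower():
            return element
    return None


def soft_click(screen: List[UIElement], role: str, text: str) -> Tuple[int, int]:
    """Resolve the click target at run time from the current screen."""
    target = find_element(screen, role, text)
    if target is None:
        raise LookupError(f"no {role!r} containing {text!r} on screen")
    return (target.x, target.y)


# The same "Submit" button before and after a layout change.
old_layout = [UIElement("button", "Submit", 100, 200)]
new_layout = [
    UIElement("textbox", "Name", 100, 200),
    UIElement("button", "Submit", 340, 480),
]

# A hard-coded action replays the recorded position, which now hits the textbox.
HARD_CODED_CLICK = (100, 200)
```

In this sketch, `soft_click(new_layout, "button", "Submit")` still finds the button after it moves, while `HARD_CODED_CLICK` points at whatever element now occupies the old position.
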
Next, a "builder agent" synthesizes these procedures into a final RPA function. To ensure the code is accurate, the builder uses a retrieval-augmented generation (RAG) mechanism. It accesses a tree-structured database of past interaction trajectories, allowing it to look up specific details about how the interface behaved in previous attempts. This prevents the agent from making incorrect assumptions about the screen state and results in cleaner, more efficient code.
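
One way to picture a tree-structured store of interaction trajectories is a prefix tree keyed by actions. The sketch below is an assumption about what such a structure could look like, not the layout AutoRPA actually uses: `TrajectoryNode`, `TrajectoryTree`, and the exact-match `lookup` (standing in for the retrieval step) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrajectoryNode:
    """One observed screen state; children are keyed by the action taken from it."""
    observation: str
    children: Dict[str, "TrajectoryNode"] = field(default_factory=dict)


class TrajectoryTree:
    def __init__(self, root_observation: str) -> None:
        self.root = TrajectoryNode(root_observation)

    def add_trajectory(self, steps: List[Tuple[str, str]]) -> None:
        """Insert (action, resulting_observation) pairs, sharing common prefixes."""
        node = self.root
        for action, observation in steps:
            if action not in node.children:
                node.children[action] = TrajectoryNode(observation)
            node = node.children[action]

    def lookup(self, actions: List[str]) -> Optional[str]:
        """Return the observation recorded after a sequence of actions, or None if unseen."""
        node = self.root
        for action in actions:
            child = node.children.get(action)
            if child is None:
                return None
            node = child
        return node.observation


# Two past attempts that share their first step.
tree = TrajectoryTree("home screen")
tree.add_trajectory([("click Settings", "settings page"), ("click Wi-Fi", "wifi list")])
tree.add_trajectory([("click Settings", "settings page"), ("click Display", "display options")])
```

A builder could query such a tree to check what the interface showed after a given action sequence instead of guessing; a `None` result signals that no past attempt covered that path.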

Hybrid Repair Strategy

AutoRPA includes a verification phase to ensure the generated code is reliable. If an RPA function fails during execution, the system does not simply give up or rely on a basic retry. Instead, it uses a "hybrid repair strategy." An analyzer agent examines the point of failure and determines why the code stopped working. A ReAct agent then takes over to complete the task from that exact point. This successful "fix" is recorded and fed back to the builder agent, which uses the new information to refine and improve the RPA code for future use.
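
The control flow of this repair loop can be sketched as follows. This is a simplified illustration under stated assumptions: the three agents are modeled as plain callables, and `StepFailed`, `run_with_repair`, `analyze`, `react_finish`, and `rebuild` are hypothetical names rather than AutoRPA's interface.

```python
from typing import Callable, List, Tuple

Step = Callable[[], None]


class StepFailed(Exception):
    """Raised by an RPA step that cannot complete (e.g. element not found)."""


def run_with_repair(
    rpa_steps: List[Step],
    analyze: Callable[[StepFailed], str],
    react_finish: Callable[[int, str], List[str]],
    rebuild: Callable[[str, List[str]], List[Step]],
) -> Tuple[bool, List[Step]]:
    """Run the RPA steps in order.

    On the first failure: diagnose it, let the fallback agent finish the task
    from the failing step, and return a rebuilt step list for future runs.
    Returns (repaired, steps_to_use_next_time).
    """
    for index, step in enumerate(rpa_steps):
        try:
            step()
        except StepFailed as failure:
            diagnosis = analyze(failure)                # analyzer agent
            fix_trace = react_finish(index, diagnosis)  # ReAct agent resumes here
            return True, rebuild(diagnosis, fix_trace)  # builder agent refines code
    return False, rpa_steps
```

The key design point mirrored here is that the fallback resumes at the failing step rather than restarting the task, and that its trace is handed back to the builder instead of being discarded.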

Efficiency and Performance

By transforming complex, multi-step LLM reasoning into streamlined RPA functions, AutoRPA significantly reduces the computational overhead of GUI automation. Experiments across multiple GUI environments show that the RPA functions generated by this framework successfully handle tasks similar to those they were built from. Most notably, this approach reduces token usage by 82% to 96% compared to standard LLM-based agents, while maintaining or exceeding their success rates. This makes AutoRPA a highly efficient solution for repetitive tasks that would otherwise be too costly or difficult to automate.
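
To make the reported range concrete, the short calculation below applies an 82% and a 96% reduction to a baseline token count. The baseline of 50,000 tokens per task is a made-up figure chosen only for illustration; the source gives percentages, not absolute token counts.

```python
def tokens_after_reduction(baseline_tokens: int, reduction_percent: int) -> int:
    """Tokens remaining after cutting usage by a whole-number percentage."""
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction_percent must be between 0 and 100")
    return baseline_tokens * (100 - reduction_percent) // 100


BASELINE = 50_000  # hypothetical tokens per task for an LLM agent; not from the source

at_82_percent = tokens_after_reduction(BASELINE, 82)  # 9000 tokens remain
at_96_percent = tokens_after_reduction(BASELINE, 96)  # 2000 tokens remain
```

Under that assumed baseline, a task would consume between 2,000 and 9,000 tokens instead of 50,000.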
