AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions

Key Takeaways

  • Large Language Model (LLM) based agents have demonstrated proficiency in multi-step interactions with graphical user interfaces (GUIs).
  • While most research focuses on improving single-task performance, practical scenarios often involve repetitive GUI tasks for which invoking LLM reasoning repeatedly, i.e., the ReAct paradigm, is inefficient.
  • Traditional Robotic Process Automation (RPA), which predates LLMs, offers runtime efficiency but demands significant manual effort to develop and maintain.
  • To bridge this gap, the authors propose AutoRPA, a framework that automatically distills the decision logic of ReAct-style agents into robust RPA functions.
Paper Abstract

Large Language Model (LLM) based agents have demonstrated proficiency in multi-step interactions with graphical user interfaces (GUIs). While most research focuses on improving single-task performance, practical scenarios often involve repetitive GUI tasks for which invoking LLM reasoning repeatedly, i.e., the ReAct paradigm, is inefficient. Prior to LLMs, traditional Robotic Process Automation (RPA) offers runtime efficiency but demands significant manual effort to develop and maintain. To bridge this gap, we propose AutoRPA, a framework that automatically distills the decision logic of ReAct-style agents into robust RPA functions. AutoRPA introduces two core innovations: (1) A translator-builder pipeline, where a translator agent converts hard-coded ReAct actions into soft-coded procedures, and a builder agent synthesizes robust RPA functions via retrieval-augmented generation over multiple trajectories; (2) A hybrid repair strategy during code verification, combining RPA execution with ReAct-based fallback for iterative refinement. Experiments across multiple GUI environments demonstrate that RPA functions generated by AutoRPA successfully solve similar tasks while reducing token usage by 82% to 96%, significantly improving runtime efficiency and reusability.

AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions
AutoRPA is a framework designed to bridge the gap between two common methods of automating computer tasks: the flexible but expensive "ReAct" LLM agent approach and the efficient but rigid traditional Robotic Process Automation (RPA). While LLM agents are great at solving new tasks, they are costly to run repeatedly. Traditional RPA is fast and efficient but requires significant manual effort to build and maintain. AutoRPA solves this by automatically "distilling" the decision-making logic of LLM agents into robust, reusable, and low-cost RPA code.

The Translator-Builder Pipeline

The core of AutoRPA is a two-stage process that converts an agent's step-by-step reasoning into machine-executable code. First, a "translator agent" takes the actions performed by a ReAct-style LLM agent and converts them into "soft-coded" procedures. Unlike hard-coded actions that rely on fixed screen coordinates, these soft-coded actions use semantic attributes (such as element types or text content) to locate buttons and fields, which makes the automation much more resilient to changes in the interface layout.
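
To make the hard-coded versus soft-coded distinction concrete, here is a minimal Python sketch. The `Element` schema, the `find_element` helper, and the sample layouts are all hypothetical and chosen for illustration; they are not AutoRPA's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """A simplified UI element (hypothetical schema, not AutoRPA's)."""
    kind: str  # e.g. "button" or "textbox"
    text: str  # visible label
    x: int     # centre coordinates on screen
    y: int


def find_element(screen: list[Element], kind: str, text: str) -> Optional[Element]:
    """Soft-coded lookup: match on semantic attributes, not on position."""
    wanted = text.strip().lower()
    for element in screen:
        if element.kind == kind and element.text.strip().lower() == wanted:
            return element
    return None


def hard_coded_click() -> tuple[int, int]:
    """Hard-coded action: always returns the same coordinates."""
    return (120, 300)


def soft_coded_click(screen: list[Element]) -> tuple[int, int]:
    """Soft-coded action: resolve the target on the current screen first."""
    target = find_element(screen, kind="button", text="Submit")
    if target is None:
        raise LookupError("no 'Submit' button on the current screen")
    return (target.x, target.y)


# Two made-up layouts of the same form; the button moves in the second one.
old_layout = [Element("textbox", "Name", 120, 200), Element("button", "Submit", 120, 300)]
new_layout = [Element("button", "Submit", 400, 520), Element("textbox", "Name", 120, 200)]

print(soft_coded_click(old_layout))  # same point as the hard-coded click
print(soft_coded_click(new_layout))  # follows the button to its new position
```

On the old layout both actions agree. On the new layout the hard-coded click would land on empty space, while the soft-coded one still finds the button.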
Next, a "builder agent" synthesizes these procedures into a final RPA function. To ensure the code is accurate, the builder uses a retrieval-augmented generation (RAG) mechanism. It accesses a tree-structured database of past interaction trajectories, allowing it to look up specific details about how the interface behaved in previous attempts. This prevents the agent from making incorrect assumptions about the screen state and results in cleaner, more efficient code.
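
The summary does not say how the tree-structured database is organized. One plausible reading, sketched below purely as an assumption, is a prefix tree over action sequences: trajectories that start the same way share nodes, and each node stores the observation recorded after its action. The names `Node`, `TrajectoryTree`, `add`, and `lookup` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in the tree: the observation recorded after an action."""
    observation: str = ""
    children: dict[str, Node] = field(default_factory=dict)


class TrajectoryTree:
    """Prefix tree over action sequences (an assumed layout, for illustration)."""

    def __init__(self) -> None:
        self.root = Node(observation="start screen")

    def add(self, trajectory: list[tuple[str, str]]) -> None:
        """Insert one trajectory given as (action, observation) pairs."""
        node = self.root
        for action, observation in trajectory:
            node = node.children.setdefault(action, Node())
            node.observation = observation

    def lookup(self, actions: list[str]) -> Node | None:
        """Return the node reached by an action prefix, or None if unseen."""
        node = self.root
        for action in actions:
            child = node.children.get(action)
            if child is None:
                return None
            node = child
        return node


# Made-up trajectories that share their first action.
tree = TrajectoryTree()
tree.add([("click Compose", "empty draft opened"),
          ("type recipient", "recipient field filled")])
tree.add([("click Compose", "empty draft opened"),
          ("click Attach", "file picker opened")])

node = tree.lookup(["click Compose"])
print(node.observation)       # what the interface showed after this action
print(sorted(node.children))  # which actions were observed next
```

A builder could query such a structure for what the screen looked like after a given prefix instead of guessing, which is the role the text assigns to retrieval.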

Hybrid Repair Strategy

AutoRPA includes a verification phase to ensure the generated code is reliable. If an RPA function fails during execution, the system does not simply give up or rely on a basic retry. Instead, it uses a "hybrid repair strategy." An analyzer agent examines the point of failure and determines why the code stopped working. A ReAct agent then takes over to complete the task from that exact point. This successful "fix" is recorded and fed back to the builder agent, which uses the new information to refine and improve the RPA code for future use.
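
The control flow of this repair loop can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not AutoRPA's implementation: the LLM agents are replaced by plain functions (`fake_react`, `fake_builder`), the analyzer is reduced to the failing step index and error message carried by `StepFailed`, and all names are hypothetical.

```python
from __future__ import annotations

from typing import Callable

Step = Callable[[dict], None]  # a step reads and mutates a shared state dict


class StepFailed(Exception):
    """Carries the failure point, standing in for the analyzer's diagnosis."""

    def __init__(self, index: int, reason: str) -> None:
        super().__init__(f"step {index} failed: {reason}")
        self.index = index
        self.reason = reason


def run_rpa(steps: list[Step], state: dict) -> None:
    """Execute the steps in order, reporting which one breaks."""
    for index, step in enumerate(steps):
        try:
            step(state)
        except Exception as exc:
            raise StepFailed(index, str(exc)) from exc


def verify_and_repair(steps, make_state, react_fallback, rebuild, max_rounds=3):
    """Run the steps; on failure, let the fallback finish and rebuild the steps."""
    for _ in range(max_rounds):
        state = make_state()  # fresh environment for each verification run
        try:
            run_rpa(steps, state)
            return steps  # verified without needing a repair
        except StepFailed as failure:
            fix = react_fallback(state, failure)  # finish the task from the failure point
            steps = rebuild(steps, failure, fix)  # fold the recorded fix into the code
    raise RuntimeError("still failing after the allowed repair rounds")


# Toy steps for a made-up form whose button is labelled "Send", not "Submit".
def open_form(state: dict) -> None:
    state["form_open"] = True


def click_submit(state: dict) -> None:
    if "Submit" not in state["buttons"]:
        raise LookupError("no 'Submit' button")
    state["submitted"] = True


def click_send_or_submit(state: dict) -> None:
    if not {"Submit", "Send"} & set(state["buttons"]):
        raise LookupError("no submit-like button")
    state["submitted"] = True


def fake_react(state: dict, failure: StepFailed) -> Step:
    """Stand-in for the ReAct agent: completes the task and reports what worked."""
    click_send_or_submit(state)
    return click_send_or_submit


def fake_builder(steps: list[Step], failure: StepFailed, fix: Step) -> list[Step]:
    """Stand-in for the builder: swap the failing step for the recorded fix."""
    return steps[:failure.index] + [fix] + steps[failure.index + 1:]


repaired = verify_and_repair(
    [open_form, click_submit], lambda: {"buttons": ["Send"]}, fake_react, fake_builder
)
print([step.__name__ for step in repaired])
```

The first round fails at the second step, the fallback completes the task, and the rebuilt function passes the second round on a fresh state.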

Efficiency and Performance

By transforming complex, multi-step LLM reasoning into streamlined RPA functions, AutoRPA significantly reduces the computational overhead of GUI automation. Experiments across multiple GUI environments show that the RPA functions generated by this framework successfully handle tasks similar to those whose interaction trajectories they were synthesized from. Most notably, this approach reduces token usage by 82% to 96% compared to standard LLM-based agents, while maintaining or exceeding their success rates. This makes AutoRPA an efficient option for repetitive tasks that would otherwise be too costly to run through an LLM agent every time.
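
To give a feel for what the reported 82% to 96% range means per run, the short calculation below applies both ends of the range to a baseline. The baseline of 10,000 tokens is a hypothetical figure picked for easy arithmetic; it does not come from the paper.

```python
def tokens_after_reduction(baseline_tokens: int, reduction: float) -> float:
    """Tokens left per run after cutting usage by the given fraction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return baseline_tokens * (1.0 - reduction)


BASELINE = 10_000  # hypothetical tokens per ReAct run, not a measured value

for reduction in (0.82, 0.96):
    remaining = tokens_after_reduction(BASELINE, reduction)
    print(f"{reduction:.0%} reduction: {remaining:.0f} tokens per run")
```

Put another way, an 82% reduction means using roughly 5.6 times fewer tokens, and a 96% reduction means using 25 times fewer.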
