GUI agent: Guided Exploration of User-Sensitive Screens

Key Takeaways

  • LLM agents are increasingly being used to automate tasks for users within an open GUI environment.
  • They inevitably encounter screens containing user-sensitive information, for which takeover of task execution by the user is highly desirable or even necessary.
  • State-of-the-art LLM-driven agents are usually fine-tuned to complete tasks regardless of the safety implications of their actions.
  • This makes their real-world deployment difficult and adversely affects the reliability.
Paper Abstract

LLM agents are increasingly being used to automate tasks for users within an open GUI environment. They inevitably encounter screens containing user-sensitive information, for which takeover of task execution by the user is highly desirable or even necessary. State-of-the-art LLM-driven agents are usually fine-tuned to complete tasks regardless of the safety implications of their actions. This makes their real-world deployment difficult and adversely affects their reliability. Therefore, it is crucial to identify and categorize user-sensitive states and define user-sensitive queries. A dataset of such states and queries would help engineers build agents that recognize critical scenarios and request handover to the user. This short paper develops an explorer agent that systematically explores the query space starting from one demonstrated task to identify queries that, if executed, would lead to user-sensitive states in a GUI environment.

GUI agent: Guided Exploration of User-Sensitive Screens
As LLM-driven agents become more autonomous in controlling graphical user interfaces (GUIs), they often perform tasks that involve sensitive user data or irreversible actions, such as making payments or deleting files. Current agents are typically fine-tuned to complete tasks without regard for these safety risks. This paper introduces an "explorer agent" designed to systematically map out these high-risk scenarios. By starting from a single demonstrated task, the agent explores the application's query space to identify and categorize screens that require human intervention, helping engineers build safer, more reliable systems.

Identifying Sensitive Interactions

The core objective of the explorer agent is to discover "user-sensitive" states within a GUI. It treats the exploration process as a search through a space of potential tasks and screen states. By using a combination of instruction-tuning and a saturation algorithm, the agent generates a variety of tasks that might lead to critical screens—such as those involving authentication, personal data, or financial transactions. This allows the system to build a dataset of sensitive scenarios that would otherwise be difficult to identify manually.
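
To make the idea of "search through a space of tasks and screen states" concrete, here is a minimal Python sketch. It is not the paper's method: a plain breadth-first walk stands in for the guided search, and the toy application graph, the `SENSITIVE` labels, and the `SensitiveQuery` record are all hypothetical names invented for illustration.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical toy GUI: screen -> {action: next screen}. Not taken from the paper.
TOY_APP = {
    "home": {"open settings": "settings", "open shop": "shop"},
    "settings": {"open account": "login", "open notifications": "notifications"},
    "shop": {"check out": "payment"},
    "login": {},
    "notifications": {},
    "payment": {},
}

# Assumed labels, loosely following the categories named in the text.
SENSITIVE = {"login": "authentication", "payment": "financial transaction"}


@dataclass(frozen=True)
class SensitiveQuery:
    actions: tuple  # action sequence that reaches the sensitive screen
    screen: str
    category: str


def explore(app, start, sensitive):
    """Breadth-first walk from `start`, recording paths that end on sensitive screens."""
    found = []
    seen = {start}
    queue = deque([(start, ())])
    while queue:
        screen, path = queue.popleft()
        if screen in sensitive:
            found.append(SensitiveQuery(path, screen, sensitive[screen]))
            continue  # treat the sensitive screen as the handover point
        for action, nxt in app[screen].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + (action,)))
    return found


results = explore(TOY_APP, "home", SENSITIVE)
for r in results:
    print(" -> ".join(r.actions), "=>", r.screen, f"({r.category})")
```

Each recorded entry pairs an action sequence with the sensitive screen it reaches, which is roughly the kind of record a dataset of sensitive scenarios would need. A real explorer would not have the application graph in advance and would have to discover it by acting in the GUI.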

How the Explorer Works

The framework utilizes two models: a native language model that executes actions and an "explorer" model that directs the search. The explorer model uses a Monte Carlo Tree Search (MCTS) approach, where it iteratively selects and expands queries based on novelty and sensitivity. To ensure the agent doesn't just repeat the same actions, it uses a reward system that scores queries and steps based on how unique they are compared to previous attempts. This process is refined through Group Relative Policy Optimization (GRPO), which helps the agent learn to prioritize paths that lead to new, critical, or sensitive screen categories.
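
The two scoring ideas in this paragraph can be sketched in a few lines of Python. The summary does not say how novelty is measured, so the word-overlap (Jaccard) metric below is an assumption, and the function names are hypothetical. The advantage computation follows the general GRPO idea of standardizing each reward against the other samples in its group; the tree search itself is left out.

```python
import statistics


def novelty(query, history):
    """1 minus the highest Jaccard word overlap with any earlier query (assumed metric)."""
    words = set(query.lower().split())
    best = 0.0
    for past in history:
        past_words = set(past.lower().split())
        union = words | past_words
        if union:
            best = max(best, len(words & past_words) / len(union))
    return 1.0 - best


def group_relative_advantages(rewards):
    """GRPO-style advantage: each reward standardized against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All samples scored the same, so none is preferred over another.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


history = ["open the settings page"]
group = [
    "open the settings page",          # exact repeat of an earlier query
    "delete my saved payment card",    # shares no words with the history
    "open the account settings page",  # near-duplicate
]
rewards = [novelty(q, history) for q in group]
advantages = group_relative_advantages(rewards)

for q, r, a in zip(group, rewards, advantages):
    print(f"{q!r}: reward={r:.2f}, advantage={a:+.2f}")
```

In this toy group the repeated query gets a negative advantage and the unseen one a positive advantage, so a policy update would shift probability toward queries unlike those already tried. A fuller reward would presumably also score the sensitivity of the screens reached, which this sketch omits.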

Results and Observations

The researchers conducted experiments using the Qwen-2.5-32B-Instruct model as the explorer. They observed that as the training progressed over multiple rounds, the total reward decreased, indicating that the agent successfully "saturated" the available query space—meaning it had effectively mapped out the reachable sensitive states. Additionally, the agent showed improved accuracy in predicting steps as it gained experience. The data suggests that this systematic approach is effective at narrowing down the scope of potential user-sensitive interactions within an application.
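
One way to read "saturation" operationally is as a stopping rule on the per-round reward. The Python sketch below is an illustration under that assumption only: the `window` and `threshold` parameters are hypothetical, and the reward values are made up, not results from the paper.

```python
def is_saturated(round_rewards, window=3, threshold=0.05):
    """True when the mean reward of the last `window` rounds falls below `threshold`."""
    if len(round_rewards) < window:
        return False
    recent = round_rewards[-window:]
    return sum(recent) / window < threshold


# Made-up, illustrative rewards that shrink as fewer novel queries remain.
rewards_per_round = [0.82, 0.61, 0.40, 0.22, 0.09, 0.04, 0.03, 0.02]

# First round (1-indexed) at which the stopping rule fires, or None if it never does.
stop_round = next(
    (
        i + 1
        for i in range(len(rewards_per_round))
        if is_saturated(rewards_per_round[: i + 1])
    ),
    None,
)
print("stop after round:", stop_round)
```

Averaging over a short window rather than checking a single round keeps one unlucky low-reward round from ending exploration early.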

Future Directions

While the current method successfully automates the identification of sensitive screens, the authors note that the search could be made more aggressive. Future work may focus on rejecting low-novelty queries more strictly and expanding the exploration to include more nuanced interactions, such as toggling specific settings (e.g., turning notifications on versus off). These improvements aim to provide more comprehensive coverage of the GUI environment, so that agents can reliably recognize when to hand over control to a human user.
