Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

Paper Abstract

Recent advances in Multimodal Large Language Models (MLLMs) have led to promising progress in web agents. However, existing web agents often rely on handcrafted execution pipelines or expensive expert trajectories, limiting their adaptability to complex, dynamic environments. To address these challenges, we propose SCALE (Self-Cognitive-Aware Learning and Exploration), which leverages three adversarial roles, Selector, Predictor, and Judger, to autonomously discover the agent's limitations and expand its cognitive boundaries through environmental exploration. Moreover, we propose SCALE-Hop, a graph exploration strategy that facilitates global planning and helps agents avoid local exploration traps. To further support learning, we construct SCALE-20k, a large-scale dataset collected from 19 real-world websites, containing diverse task types and structured demonstrations generated from SCALE's exploration traces. Experimental results show that our approach significantly improves the performance and generalization of multiple MLLMs in various web environments. Our framework offers a scalable and generalizable solution for building truly autonomous and adaptive web agents.

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration introduces a framework designed to make web-browsing AI agents more autonomous and adaptable. Current web agents often rely on handcrafted execution pipelines or expensive expert demonstrations to learn how to navigate websites. This paper proposes a way for agents to learn on their own: they actively explore websites, identify their own knowledge gaps, and improve their reasoning from what they find, without constant human supervision.

The SCALE Framework

The core of the approach is the SCALE (Self-Cognitive-Aware Learning and Exploration) framework, which assigns three distinct roles to a single AI model: the Selector, the Predictor, and the Judger. The Selector identifies potentially confusing or unfamiliar elements on a webpage and proposes an action. The Predictor then predicts what will happen if that action is taken. Finally, the Judger compares the prediction to the actual outcome. If the prediction is wrong, the agent has hit a "cognitive boundary", a limit in its current understanding. By focusing on these failures, the agent can update its knowledge and learn from its mistakes in a continuous, self-improving loop.
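The Selector, Predictor, and Judger loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the names `BoundaryCase`, `scale_step`, and `run_episode`, the toy site, and the `beliefs` dictionary standing in for the model's knowledge are all hypothetical, and the three roles are plain functions rather than prompts to one MLLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BoundaryCase:
    """One step where the prediction did not match the real outcome."""
    state: str
    action: str
    predicted: str
    observed: str


def scale_step(
    state: str,
    select: Callable[[str], str],
    predict: Callable[[str, str], str],
    act: Callable[[str, str], str],
    judge: Callable[[str, str], bool],
) -> Tuple[str, Optional[BoundaryCase]]:
    """Run one Selector -> Predictor -> environment -> Judger step."""
    action = select(state)               # Selector proposes an action
    predicted = predict(state, action)   # Predictor guesses the outcome
    observed = act(state, action)        # the environment gives the real outcome
    if judge(predicted, observed):       # Judger compares the two
        return observed, None
    return observed, BoundaryCase(state, action, predicted, observed)


# Hypothetical toy environment: (page, action) -> resulting page.
TOY_SITE: Dict[Tuple[str, str], str] = {
    ("home", "click_cart"): "cart",
    ("cart", "click_checkout"): "login",
}
# Hypothetical stand-in for the model's current knowledge; one belief is wrong.
beliefs: Dict[Tuple[str, str], str] = {
    ("home", "click_cart"): "cart",
    ("cart", "click_checkout"): "payment",
}


def toy_select(state: str) -> str:
    return "click_cart" if state == "home" else "click_checkout"


def toy_predict(state: str, action: str) -> str:
    return beliefs.get((state, action), "unknown")


def toy_act(state: str, action: str) -> str:
    return TOY_SITE[(state, action)]


def toy_judge(predicted: str, observed: str) -> bool:
    return predicted == observed


def run_episode(start: str, steps: int) -> List[BoundaryCase]:
    """Collect boundary cases and update beliefs from each one."""
    state, failures = start, []
    for _ in range(steps):
        state, case = scale_step(state, toy_select, toy_predict, toy_act, toy_judge)
        if case is not None:
            failures.append(case)
            # Stand-in for training on the failure: overwrite the wrong belief.
            beliefs[(case.state, case.action)] = case.observed
    return failures
```

In this toy setup the first episode from `"home"` surfaces one boundary case (the checkout click leads to a login page, not a payment page), and a second episode surfaces none because the belief has been corrected. A real agent would replace the dictionary update with fine-tuning on the collected cases.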

Global Planning with SCALE-Hop

To prevent agents from getting stuck in repetitive loops or focusing only on small, local areas of a website, the researchers introduced SCALE-Hop. This strategy treats the agent’s exploration history as a graph in which each node represents a specific state of a webpage. SCALE-Hop monitors how much of a site has been explored and uses a "verification-guided backtracking" mechanism. When the agent judges that it has exhausted its options in one area, this mechanism helps it navigate back to previously visited or unexplored sections, so that the agent builds a comprehensive, global understanding of the environment rather than learning only shallow, local behaviors.
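One way to picture graph-based exploration with backtracking is the sketch below. It is not SCALE-Hop itself: it swaps in a simple breadth-first search for the nearest state with untried actions, and it models "verification" as replaying the recorded path and checking that each step still lands where the graph says it should. The toy site and every function name here (`explore`, `path_to_frontier`, `replay`, `untried_actions`) are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Site = Dict[str, Dict[str, str]]  # page state -> {action: resulting page state}

# Hypothetical toy site with a dead end at "detail" (no link back).
TOY_SITE: Site = {
    "home": {"products": "list", "about": "about"},
    "list": {"item": "detail"},
    "detail": {},
    "about": {},
}


def untried_actions(site: Site, graph: Site, state: str) -> List[str]:
    """Actions available in `state` that the explored graph has not recorded yet."""
    return [a for a in site[state] if a not in graph.get(state, {})]


def path_to_frontier(site: Site, graph: Site, root: str) -> Optional[List[str]]:
    """BFS from the root over explored edges to the nearest state with untried actions."""
    queue = deque([(root, [])])
    seen = {root}
    while queue:
        state, path = queue.popleft()
        if untried_actions(site, graph, state):
            return path
        for action, nxt in graph.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None  # nothing left to explore


def replay(site: Site, graph: Site, root: str, path: List[str]) -> str:
    """Re-execute a recorded path, verifying each step against the graph."""
    state = root
    for action in path:
        observed = site[state][action]
        if observed != graph[state][action]:
            raise RuntimeError(f"{action!r} from {state!r} no longer leads to the recorded state")
        state = observed
    return state


def explore(site: Site, root: str, max_steps: int = 100) -> Tuple[Site, List[str]]:
    """Explore greedily; when the current state is exhausted, hop back to a frontier state."""
    graph: Site = {root: {}}
    hops: List[str] = []
    state = root
    for _ in range(max_steps):
        pending = untried_actions(site, graph, state)
        if pending:
            action = pending[0]
            nxt = site[state][action]
            graph[state][action] = nxt   # record the new edge
            graph.setdefault(nxt, {})
            state = nxt
            continue
        path = path_to_frontier(site, graph, root)
        if path is None:
            break                        # every known action has been tried
        state = replay(site, graph, root, path)
        hops.append(state)
    return graph, hops
```

Calling `explore(TOY_SITE, "home")` follows `home -> list -> detail`, hits the dead end, hops back to `home`, and then visits `about`. A purely local explorer would stop at `detail` and never see the `about` page.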

SCALE-20k Dataset

To support the training of these agents, the authors created SCALE-20k, a large-scale dataset derived from the agent's own exploration traces across 19 real-world websites. This dataset contains over 25,000 items, including single-step actions, multi-step task trajectories, and page comprehension questions. By using this data, the researchers demonstrated that the framework significantly boosts the performance of existing Multimodal Large Language Models (MLLMs), such as InternVL2.5-8B and Qwen2.5-VL-7B, helping them become more effective at navigating complex and dynamic web environments.
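The three item types mentioned above could be represented roughly as follows. The article does not describe the actual SCALE-20k schema, so every class name, field name, and sample value in this sketch is a hypothetical placeholder chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class SingleStepItem:
    """One action taken on one page (hypothetical fields)."""
    website: str
    page_state: str
    instruction: str
    action: str


@dataclass
class TrajectoryItem:
    """A multi-step task made of ordered single-step actions (hypothetical fields)."""
    website: str
    task: str
    steps: List[SingleStepItem]


@dataclass
class PageQAItem:
    """A page comprehension question with its answer (hypothetical fields)."""
    website: str
    page_state: str
    question: str
    answer: str


DatasetItem = Union[SingleStepItem, TrajectoryItem, PageQAItem]


def count_by_type(items: List[DatasetItem]) -> Dict[str, int]:
    """Count how many items of each type a dataset split contains."""
    return dict(Counter(type(item).__name__ for item in items))


# Made-up sample records, only to show how the types fit together.
step = SingleStepItem("example-shop", "home", "Open the cart", "click_cart")
sample: List[DatasetItem] = [
    step,
    TrajectoryItem("example-shop", "Reach the cart page", [step]),
    PageQAItem("example-shop", "cart", "What does the checkout button do?", "Opens the login page"),
]
```

Keeping the three types separate makes it easy to balance or filter a training mix, for example with `count_by_type(sample)`.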

Key Findings

The experimental results show that the SCALE framework allows agents to move beyond rigid, pre-defined task flows. By proactively seeking out challenging scenarios and learning from the resulting errors, the agents achieved significant improvements in task success rates. The study highlights that this self-driven approach is highly generalizable, meaning it can be applied to different AI models to help them adapt to novel websites more effectively than traditional methods that rely on static, human-provided data.
