Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

Key Takeaways

  • Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis.
  • Feeding all signals indiscriminately causes dilution and hallucination, while manually curating the event-to-(data, knowledge) mapping is intractable under dozens of daily releases.
  • Our framework achieves a 99.0% pass rate on offline evaluations.
  • Our code is available at this https URL.

Paper Abstract

Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but orchestration: selecting, for each operational event, the relevant data (metrics, logs, change events) and the applicable operational knowledge (handbook rules and practitioner experience). Feeding all signals indiscriminately causes dilution and hallucination, while manually curating the event-to-(data, knowledge) mapping is intractable under dozens of daily releases. We present Bian Que, an agentic framework with three contributions: (i) a unified operational paradigm abstracting day-to-day O&M into three canonical patterns: release interception, proactive inspection, and alert root cause analysis; (ii) Flexible Skill Arrangement, where each Skill specifies which data and knowledge to retrieve for a given business-module context and can be automatically generated and updated by LLMs or iteratively refined through natural-language instructions from on-call engineers; (iii) a unified self-evolving mechanism in which one correction signal drives two parallel pathways, case-memory-to-knowledge distillation and targeted Skill refinement. Deployed on the e-commerce search engine of KuaiShou, the major short-video platform in China, Bian Que reduces alert volume by 75%, achieves 80% root-cause analysis accuracy, and cuts mean time to resolution by over 50%. Our framework achieves a 99.0% pass rate on offline evaluations. Our code is available at this https URL.

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
Maintaining large-scale online engine systems, such as search, recommendation, and advertising platforms, is a complex, labor-intensive task. Engineers must constantly monitor system releases, perform routine health checks, and diagnose the root causes of alerts. While LLM-based agents are well suited to these tasks, the primary challenge is not reasoning but orchestration: knowing which data (logs, metrics, or change events) and which operational knowledge (handbook rules or practitioner experience) to use for a specific event. Feeding every signal to the agent dilutes the relevant ones and invites hallucination. Bian Que is an agentic framework designed to solve this by automating the selection and refinement of these operational inputs, so that agents receive only the information relevant to the event at hand.

A Unified Operational Paradigm

To manage the complexity of modern engine systems, Bian Que organizes all operational work into three "lines of defense." Instead of treating every task as a unique, isolated problem, the framework categorizes them into three canonical patterns: release interception (monitoring system behavior during updates), proactive inspection (regularly checking system health), and alert root cause analysis (diagnosing issues after they occur). By assigning a specialized agent to each of these patterns, the system creates a consistent structure for handling the entire operational lifecycle.
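As an illustration of what routing work into three canonical patterns might look like, here is a minimal Python sketch. The event kinds (`release_started`, `scheduled_check`, `alert_fired`), the `OpsEvent` type, and the `route` function are hypothetical names chosen for this example; the source does not describe Bian Que's actual triggers or interfaces.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    """The three canonical O&M patterns named in the paper."""
    RELEASE_INTERCEPTION = "release_interception"
    PROACTIVE_INSPECTION = "proactive_inspection"
    ALERT_RCA = "alert_root_cause_analysis"


# Hypothetical event kinds; the real trigger names are not given in the source.
EVENT_TO_PATTERN = {
    "release_started": Pattern.RELEASE_INTERCEPTION,
    "scheduled_check": Pattern.PROACTIVE_INSPECTION,
    "alert_fired": Pattern.ALERT_RCA,
}


@dataclass(frozen=True)
class OpsEvent:
    kind: str    # what happened, e.g. "alert_fired"
    module: str  # business module the event belongs to, e.g. "ranking"


def route(event: OpsEvent) -> Pattern:
    """Map an operational event to the pattern that should handle it."""
    try:
        return EVENT_TO_PATTERN[event.kind]
    except KeyError:
        raise ValueError(f"unknown event kind: {event.kind!r}") from None
```

Keeping the mapping in one table means every event lands in exactly one of the three patterns, and an unrecognized event fails loudly instead of being silently dropped.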

Flexible Skill Arrangement

The core innovation of Bian Que is its "Flexible Skill" mechanism. Because systems evolve rapidly, with dozens of releases per day, manually mapping every operational event to the correct data and knowledge is intractable. Instead, Bian Que uses Skills: modular units that each specify which data and which operational knowledge to retrieve for a given business-module context. Skills can be generated and updated automatically by LLMs, or refined iteratively through natural-language instructions from on-call engineers. This allows the system to adapt to new business requirements or changing system architectures without requiring manual code updates.
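One way to picture a Skill is as a small record that names the data sources and knowledge entries relevant to a module, plus a function that uses it to filter what the agent sees. The sketch below is an assumption-laden illustration: the `Skill` fields, the `build_context` helper, and all signal and knowledge names are hypothetical, not the framework's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical Skill record: what to retrieve for one module and pattern."""
    module: str
    pattern: str
    data_sources: list[str] = field(default_factory=list)   # metrics, logs, change events
    knowledge_ids: list[str] = field(default_factory=list)  # handbook rules, past cases
    version: int = 1


def build_context(skill: Skill, signals: dict, knowledge: dict) -> dict:
    """Keep only the signals and knowledge entries that the Skill names.

    Everything else is left out of the agent's context, which is the point:
    the agent sees a small, relevant slice rather than every available signal.
    """
    data = {name: signals[name] for name in skill.data_sources if name in signals}
    rules = {kid: knowledge[kid] for kid in skill.knowledge_ids if kid in knowledge}
    return {"data": data, "knowledge": rules}
```

For example, a Skill for a ranking module that lists only `latency_p99` and `change_events` would exclude an unrelated `cpu` series from the context even if that series is available.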

Self-Evolving Knowledge and Skills

Bian Que features a unified self-evolving mechanism that allows the system to improve over time based on practitioner feedback. When an engineer provides feedback on an agent’s performance, that single signal triggers two parallel learning processes. First, it updates the "knowledge pathway," where the system distills new insights and failure patterns into a persistent memory store. Second, it updates the "Skill pathway," where the agent refines its data-retrieval logic or reasoning prompts to avoid repeating past mistakes. This creates a continuous loop where the knowledge base and the operational skills co-evolve.
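The "one signal, two pathways" idea can be sketched as a single function that takes one correction and updates both a knowledge store and a Skill. In the paper both pathways are described as LLM-driven; the plain list and dataclass updates below are simplified stand-ins, and `Correction`, `SkillState`, and `apply_correction` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """Hypothetical correction signal from an on-call engineer."""
    module: str
    note: str                 # what the agent got wrong, in the engineer's words
    missing_data: list[str]   # data sources the agent should have consulted


@dataclass
class SkillState:
    data_sources: list[str] = field(default_factory=list)
    version: int = 1


def apply_correction(correction: Correction,
                     knowledge_base: list[str],
                     skill: SkillState) -> None:
    """Drive both pathways from a single correction signal."""
    # Pathway 1 (stand-in for knowledge distillation): record the case.
    knowledge_base.append(f"[{correction.module}] {correction.note}")

    # Pathway 2 (stand-in for Skill refinement): add any data source the
    # Skill was missing, and bump the version only if something changed.
    changed = False
    for source in correction.missing_data:
        if source not in skill.data_sources:
            skill.data_sources.append(source)
            changed = True
    if changed:
        skill.version += 1
```

Because both updates happen inside one call, the knowledge store and the Skill cannot drift apart after a correction, which is one plausible reading of why the mechanism is described as unified.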

Real-World Impact

Deployed on the e-commerce search engine at Kuaishou, a major Chinese short-video platform, Bian Que reduced alert volume by 75%, achieved 80% accuracy in root-cause analysis, and cut mean time to resolution by more than 50%. In offline evaluations, the framework achieved a 99.0% pass rate.
