Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
Maintaining large-scale online engine systems, such as search, recommendation, and advertising platforms, is a complex, labor-intensive task. Engineers must constantly monitor system releases, perform routine health checks, and diagnose the root causes of alerts. LLM-based agents are well suited to these tasks, but the primary challenge is orchestration rather than reasoning: knowing exactly which data (logs, metrics, or change events) and which operational knowledge (handbooks or past experience) to use for a given event. Bian Que is an agentic framework that addresses this challenge by automating the selection and refinement of these operational inputs, so that each agent receives only the information most relevant to its task.
A Unified Operational Paradigm
To manage the complexity of modern engine systems, Bian Que organizes all operational work into three "lines of defense." Instead of treating every task as a unique, isolated problem, the framework categorizes them into three canonical patterns: release interception (monitoring system behavior during updates), proactive inspection (regularly checking system health), and alert root cause analysis (diagnosing issues after they occur). By assigning a specialized agent to each of these patterns, the system creates a consistent structure for handling the entire operational lifecycle.
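The routing idea behind the three lines of defense can be illustrated with a small dispatcher. This is a minimal sketch, not Bian Que's implementation: the names `DefenseLine`, `OperationalEvent`, `classify`, and `route`, and the event kinds `"release"`, `"scheduled_check"`, and `"alert"`, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class DefenseLine(Enum):
    """The three canonical operational patterns (names are hypothetical)."""
    RELEASE_INTERCEPTION = "release_interception"
    PROACTIVE_INSPECTION = "proactive_inspection"
    ALERT_RCA = "alert_root_cause_analysis"


@dataclass
class OperationalEvent:
    # Hypothetical event schema: "kind" is one of "release", "scheduled_check", "alert".
    kind: str
    module: str


# Hypothetical mapping from raw event kinds to a line of defense.
_KIND_TO_LINE = {
    "release": DefenseLine.RELEASE_INTERCEPTION,
    "scheduled_check": DefenseLine.PROACTIVE_INSPECTION,
    "alert": DefenseLine.ALERT_RCA,
}


def classify(event: OperationalEvent) -> DefenseLine:
    """Map an event onto one of the three canonical patterns."""
    try:
        return _KIND_TO_LINE[event.kind]
    except KeyError:
        raise ValueError(f"unknown event kind: {event.kind!r}") from None


def route(event: OperationalEvent,
          agents: Dict[DefenseLine, Callable[[OperationalEvent], str]]) -> str:
    """Hand the event to the specialized agent registered for its pattern."""
    return agents[classify(event)](event)
```

Because every event is forced into one of three patterns, adding a new business module only requires new inputs for an existing agent, not a new agent type.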
Flexible Skill Arrangement
The core innovation of Bian Que is its "Flexible Skill" mechanism. Because systems evolve rapidly, manually mapping every operational event to the correct data and knowledge is impractical. Instead, Bian Que uses Skills: modular units that define exactly what data to retrieve and how to reason over it for a specific business module. These Skills are generated automatically by LLMs and can be refined through natural-language instructions from engineers. This lets the system adapt to new business requirements or changing system architectures without manual code updates.
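A Skill can be pictured as a small, versioned record pairing a data-retrieval scope with a reasoning prompt. The sketch below assumes a structure the source does not specify: the `Skill` fields, the `gather_inputs` and `refine` helpers, and the `llm` callable (a stand-in for whatever model call performs the rewrite) are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Skill:
    """Hypothetical Skill record: which data to pull and how to reason over it."""
    module: str                      # business module the Skill applies to
    data_sources: Tuple[str, ...]    # e.g. "logs", "metrics", "change_events"
    reasoning_prompt: str            # instructions handed to the LLM
    version: int = 1


def gather_inputs(skill: Skill,
                  fetchers: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Retrieve only the data sources the Skill declares, scoped to its module."""
    return {src: fetchers[src](skill.module) for src in skill.data_sources}


def refine(skill: Skill, instruction: str, llm: Callable[[str], str]) -> Skill:
    """Return a new Skill version whose prompt is rewritten per an engineer's instruction."""
    request = (
        f"Current prompt:\n{skill.reasoning_prompt}\n"
        f"Engineer instruction:\n{instruction}\n"
        "Return the revised prompt only."
    )
    # The original Skill is left untouched; refinement yields a new version.
    return replace(skill, reasoning_prompt=llm(request), version=skill.version + 1)
```

Keeping Skills immutable and versioned makes each natural-language refinement auditable and easy to roll back, which is one plausible way to let engineers adjust behavior without touching code.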
Self-Evolving Knowledge and Skills
Bian Que features a unified self-evolving mechanism that allows the system to improve over time based on practitioner feedback. When an engineer provides feedback on an agent’s performance, that single signal triggers two parallel learning processes. First, it updates the "knowledge pathway," where the system distills new insights and failure patterns into a persistent memory store. Second, it updates the "Skill pathway," where the agent refines its data-retrieval logic or reasoning prompts to avoid repeating past mistakes. This creates a continuous loop where the knowledge base and the operational skills co-evolve.
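The fan-out of a single feedback signal into the knowledge and Skill pathways can be sketched as two concurrent updates. This is an illustrative sketch only: `Feedback`, `KnowledgeStore`, `apply_feedback`, and the `distill` and `revise` callables (stand-ins for LLM-driven insight extraction and prompt revision) are hypothetical, and the in-memory store merely stands in for a persistent memory store.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Feedback:
    # Hypothetical feedback record from an engineer about one agent run.
    module: str
    verdict: str      # e.g. "correct" or "wrong"
    comment: str


class KnowledgeStore:
    """Toy in-memory stand-in for a persistent memory store, keyed by module."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[str]] = {}

    def add(self, module: str, insight: str) -> None:
        self.entries.setdefault(module, []).append(insight)


def apply_feedback(fb: Feedback,
                   store: KnowledgeStore,
                   skills: Dict[str, str],
                   distill: Callable[[Feedback], str],
                   revise: Callable[[str, Feedback], str]) -> None:
    """Fan one feedback signal out to the knowledge and Skill pathways."""

    def knowledge_pathway() -> None:
        # Distill the feedback into a reusable insight and persist it.
        store.add(fb.module, distill(fb))

    def skill_pathway() -> None:
        # Revise the module's reasoning prompt so the mistake is not repeated.
        skills[fb.module] = revise(skills[fb.module], fb)

    # The two pathways touch disjoint state, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(knowledge_pathway), pool.submit(skill_pathway)]
        for f in futures:
            f.result()  # re-raise any exception from either pathway
```

Driving both updates from the same `Feedback` object is what keeps the knowledge base and the Skills in step with each other.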
Real-World Impact
Deployed on the e-commerce search engine at Kuaishou, a major Chinese short-video platform, Bian Que has demonstrated significant improvements in operational efficiency. The framework reduced alert volume by 75%, achieved 80% accuracy in root-cause analysis, and cut the mean time to resolution for system issues by more than 50%. The system also achieved a 99% pass rate in offline evaluations, supporting its reliability for a high-stakes, large-scale production environment.