A Multi-Agent System for Multi-Objective Constrained Optimization
In many computing and networking systems, decision-making involves a difficult balancing act: minimizing operational costs while simultaneously meeting strict performance requirements, such as response-time guarantees. Reinforcement Learning (RL) is often applied to these problems by folding costs and constraint violations into a single scalar reward, typically a weighted sum. However, this approach relies on manually chosen weights to balance the conflicting goals. If the weights are poorly selected, the system may become too aggressive (leading to performance failures) or too conservative (leading to excessive costs). This paper introduces MAMO, a framework that automates this balancing process by using a multi-agent system to learn the optimal trade-offs dynamically.
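To make the weighting problem concrete, here is a minimal Python sketch of a scalarized reward. It assumes a two-term reward (a cost term and a constraint-violation term) combined as a negative weighted sum. The function names, the weights `w_cost` and `w_violation`, and the two candidate actions with their numbers are hypothetical and are not taken from the paper. The point is only that the same pair of actions is ranked differently depending on the weights.

```python
def weighted_reward(cost: float, violation: float, w_cost: float, w_violation: float) -> float:
    """Scalarize cost and constraint violation into one reward (higher is better).

    Hypothetical form: reward = -(w_cost * cost + w_violation * violation).
    """
    return -(w_cost * cost + w_violation * violation)


# Two hypothetical actions: few replicas (cheap but violates) vs. many (costly but safe).
ACTIONS = {
    "few_replicas": {"cost": 1.0, "violation": 0.3},
    "many_replicas": {"cost": 3.0, "violation": 0.0},
}


def best_action(w_cost: float, w_violation: float) -> str:
    """Return the action with the highest weighted reward under the given weights."""
    return max(
        ACTIONS,
        key=lambda a: weighted_reward(
            ACTIONS[a]["cost"], ACTIONS[a]["violation"], w_cost, w_violation
        ),
    )


print(best_action(w_cost=1.0, w_violation=1.0))   # few_replicas  (too aggressive)
print(best_action(w_cost=1.0, w_violation=10.0))  # many_replicas (violations now dominate)
```

With a low violation weight the cheap action wins despite its violations, which is the aggressive failure mode; raising the weight flips the choice to the costly, safe action. Choosing that weight by hand is the step MAMO aims to automate.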
Decoupling Task Execution from Objective Design
MAMO addresses the limitations of manual weight tuning by splitting the problem into two distinct roles handled by two separate agents. The Task-Execution (TE) agent handles the immediate job of controlling the system, such as scaling function replicas in an edge computing environment, using a reward that combines cost and constraint-violation terms according to a given set of weights. Meanwhile, a higher-level Weight-Adaptation (WA) agent observes the system's long-term performance. Instead of controlling the system directly, the WA agent learns to adjust the weights used by the TE agent. By treating weight selection as a learning problem in its own right, MAMO lets the system discover the best balance between cost and performance through experience rather than human trial and error.
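The decoupling can be illustrated with a deliberately simplified TE agent. In this sketch, a stateless epsilon-greedy bandit over replica counts stands in for the TE agent's RL algorithm. The workload (uniform between 40 and 80 requests), the capacity of 20 requests per replica, and names such as `TaskExecutionAgent` and `train_te` are all hypothetical. The agent receives the weights as plain inputs and never sees the quality-of-service target, which is left to the WA agent.

```python
import random


class TaskExecutionAgent:
    """Hypothetical TE agent: an epsilon-greedy bandit over replica counts.

    It only ever sees the weighted reward; the QoS target lives elsewhere.
    """

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = {a: 0.0 for a in self.actions}  # running mean reward per action
        self.n = {a: 0 for a in self.actions}    # number of times each action was taken

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore
        return max(self.actions, key=self.q.get)  # exploit (first action wins ties)

    def update(self, action, reward):
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]  # incremental mean


def train_te(w_cost, w_violation, steps=2000, seed=0):
    """Train a fresh TE agent under weights supplied from outside (e.g., by a WA agent)."""
    agent = TaskExecutionAgent(actions=range(1, 6), seed=seed)
    env_rng = random.Random(seed + 1)  # hypothetical workload source
    for _ in range(steps):
        replicas = agent.act()
        requests = env_rng.randint(40, 80)
        rejected = max(0, requests - 20 * replicas)  # hypothetical: 20 requests per replica
        reward = -(w_cost * replicas + w_violation * rejected / requests)
        agent.update(replicas, reward)
    return max(agent.actions, key=agent.q.get)  # greedy policy after training


print(train_te(1.0, 1.0))    # low violation weight
print(train_te(1.0, 100.0))  # high violation weight
```

Handing the same learner different weights yields different policies: the low violation weight drives it to a single replica, while the high one drives it to enough replicas to avoid rejections entirely. Choosing among such weight settings is, in effect, the WA agent's action.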
A Hierarchical Learning Loop
The MAMO framework operates in an iterative, two-phase cycle. First, the WA agent sets the weighting coefficients, which remain fixed for a specific training period. During this time, the TE agent interacts with the environment and updates its policy to optimize for those specific weights. Once this phase concludes, the WA agent evaluates the results based on aggregated performance indicators, such as average execution costs and the frequency of constraint violations. If the system fails to meet its quality-of-service requirements, the WA agent receives a penalty, prompting it to adjust the weights for the next cycle. This hierarchical structure allows the system to adapt its definition of "optimality" as environmental conditions, such as workload fluctuations, change over time.
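The two-phase cycle can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the paper's implementation. Both agents are epsilon-greedy bandits, and the WA agent chooses only the violation weight from a hypothetical candidate set while the cost weight stays at 1.0. The TE agent is retrained from scratch in each phase, and the 5% rejection tolerance and the penalty of 10 are invented for the example. Phase 1 (`run_phase`) keeps the weights fixed while the TE agent learns and records aggregated indicators. Phase 2 (`wa_reward`) scores those indicators and applies the penalty when the tolerance is exceeded.

```python
import random

ACTIONS = (1, 2, 3, 4, 5)  # hypothetical replica counts the TE agent may choose
CAPACITY = 20              # hypothetical requests served per replica per step
TOLERANCE = 0.05           # hypothetical maximum acceptable rejection rate
PENALTY = 10.0             # hypothetical WA penalty for missing the QoS target


def greedy(q):
    """Return the key with the highest estimate (the first key wins ties)."""
    return max(q, key=q.get)


def run_phase(w_violation, steps, rng, epsilon=0.1):
    """Phase 1: weights stay fixed while a fresh TE bandit learns; return indicators."""
    q = {a: 0.0 for a in ACTIONS}
    n = {a: 0 for a in ACTIONS}
    total_cost = 0.0
    total_requests = total_rejected = 0
    for _ in range(steps):
        replicas = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q)
        requests = rng.randint(40, 80)                             # hypothetical workload
        rejected = max(0, requests - CAPACITY * replicas)
        cost = float(replicas)                                     # one cost unit per replica
        reward = -(1.0 * cost + w_violation * rejected / requests)  # w_cost fixed at 1.0
        n[replicas] += 1
        q[replicas] += (reward - q[replicas]) / n[replicas]
        total_cost += cost
        total_requests += requests
        total_rejected += rejected
    # Aggregated indicators: average cost per step and overall rejection rate.
    return total_cost / steps, total_rejected / total_requests


def wa_reward(avg_cost, rejection_rate):
    """Phase 2: score the phase from aggregated indicators, penalizing QoS misses."""
    return -avg_cost - (PENALTY if rejection_rate > TOLERANCE else 0.0)


def mamo_loop(candidates=(1.0, 10.0, 100.0), cycles=30, steps=500, epsilon=0.2, seed=0):
    """Outer loop: the WA bandit picks a violation weight for each cycle."""
    rng = random.Random(seed)
    q = {w: 0.0 for w in candidates}
    n = {w: 0 for w in candidates}
    for _ in range(cycles):
        w = rng.choice(candidates) if rng.random() < epsilon else greedy(q)
        avg_cost, rejection_rate = run_phase(w, steps, rng)
        n[w] += 1
        q[w] += (wa_reward(avg_cost, rejection_rate) - q[w]) / n[w]
    return greedy(q)  # weight the WA agent currently rates best


print(mamo_loop())
```

In this toy setup, the lowest violation weight drives the TE agent to a cheap policy that misses the tolerance in every phase, so the WA agent learns to avoid it and settles on a weight that keeps rejections within bounds. A stateless bandit is enough for the WA agent here only because the sketch's workload is stationary; adapting to fluctuating workloads would require the WA agent to take observed system conditions into account.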
Performance in Dynamic Environments
The author tested MAMO in a simulated edge computing scenario involving Function-as-a-Service (FaaS) scaling. In this environment, the system must decide how many function replicas to run to handle incoming requests without exceeding resource limits or incurring unnecessary costs. Experimental results showed that, as the WA agent trained, it steered the TE agent toward weight configurations that kept request rejection rates within the required tolerance. Even under noisy, non-stationary workload patterns, MAMO maintained the performance constraints while keeping costs reasonable, suggesting that the framework can automate the trade-off between efficiency and reliability.
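A toy version of the scaling environment helps make the trade-off tangible. The model below is hypothetical. Each replica serves up to a fixed number of requests per interval, excess requests are rejected, and cost is proportional to the replica count. Arrivals follow a sinusoidal trend plus Gaussian noise to mimic a noisy, non-stationary workload. None of these parameters come from the paper's experimental setup.

```python
import math
import random


def workload(t, rng, base=50.0, amplitude=30.0, period=24, noise=10.0):
    """Hypothetical arrivals: sinusoidal trend plus Gaussian noise, clipped at 0."""
    mean = base + amplitude * math.sin(2 * math.pi * t / period)
    return max(0, round(rng.gauss(mean, noise)))


def step(requests, replicas, capacity_per_replica, cost_per_replica):
    """One decision interval: serve what capacity allows, reject the rest."""
    served = min(requests, replicas * capacity_per_replica)
    rejected = requests - served
    cost = replicas * cost_per_replica
    return served, rejected, cost


def evaluate(replicas, horizon=240, seed=0, capacity_per_replica=20, cost_per_replica=1.0):
    """Average cost per interval and overall rejection rate for a fixed replica count."""
    rng = random.Random(seed)
    total_cost = 0.0
    total_requests = total_rejected = 0
    for t in range(horizon):
        requests = workload(t, rng)
        _, rejected, cost = step(requests, replicas, capacity_per_replica, cost_per_replica)
        total_cost += cost
        total_requests += requests
        total_rejected += rejected
    return total_cost / horizon, total_rejected / max(total_requests, 1)


for r in (1, 3, 6):
    avg_cost, rejection_rate = evaluate(r)
    print(f"replicas={r}: avg cost={avg_cost:.1f}, rejection rate={rejection_rate:.3f}")
```

Because the arrival sequence depends only on the seed, runs with different fixed replica counts see identical workloads, so the comparison is direct: adding replicas raises the average cost linearly and can only lower the rejection rate. A learned policy would instead vary the replica count with the observed load.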
Future Directions
While MAMO shows promise in balancing conflicting objectives, the author notes that this is a preliminary step toward more autonomous RL solutions. Future research aims to test the framework across different application domains beyond edge computing. Additionally, the author plans to compare MAMO against other established weight-selection strategies, such as Bayesian optimization and dual-decomposition schemes, to further validate its effectiveness and robustness in complex, real-world scenarios.