A Multi-Agent system for Multi-Objective constrained optimization

Key Takeaways

  • Many decision-making problems in computing and networking systems can be naturally formulated as cost-minimization problems under performance constraints.
  • When RL folds costs and constraint violations into a single weighted reward, the behavior of the learned policy critically depends on the choice of these weights, which are typically selected manually.
  • This paper presents MAMO (Multi-Agent system for Multi-Objective constrained optimization), an approach to tackle this balancing problem through multi-agent RL.
Paper Abstract

Many decision-making problems in computing and networking systems can be naturally formulated as cost-minimization problems under performance constraints. In dynamic environments, reinforcement learning (RL) is often used to solve such problems at runtime by embedding both costs and constraint violations into a single scalar reward through weighted penalty terms, following a Lagrangian-inspired formulation. However, in this context the behavior of the learned policy critically depends on the choice of these weights, which are typically selected manually. This makes it difficult to identify an appropriate trade-off between optimizing the primary objective and effectively avoiding constraint violations, particularly in non-stationary environments where their relative importance may change. This paper presents MAMO (Multi-Agent system for Multi-Objective constrained optimization), an approach to tackle this balancing problem through multi-agent RL. MAMO decouples task execution from objective design by formulating the selection of reward weights as a learning problem, providing a first step towards more autonomous and robust RL-based solutions for constrained optimization problems in dynamic environments.

A Multi-Agent system for Multi-Objective constrained optimization
In many computing and networking systems, decision-making involves a difficult balancing act: minimizing operational costs while simultaneously meeting strict performance requirements, such as response-time guarantees. Reinforcement Learning (RL) is often used to solve these problems by combining costs and constraint violations into a single "reward" signal. However, this approach usually relies on manually chosen weights to balance these conflicting goals. If these weights are poorly selected, the system may become too aggressive (leading to performance failures) or too conservative (leading to excessive costs). This paper introduces MAMO, a framework that automates this balancing process by using a multi-agent system to learn the optimal trade-offs dynamically.
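To make the sensitivity to weights concrete, here is a minimal sketch of a weighted-penalty reward. The function name, the two candidate actions, and their cost and violation numbers are all illustrative assumptions, not values from the paper.

```python
def weighted_reward(cost, violation, w_cost=1.0, w_violation=1.0):
    """Scalar reward: the negative weighted sum of cost and constraint violation."""
    return -(w_cost * cost + w_violation * violation)


# Two hypothetical actions, each described by a (cost, violation) pair.
ACTIONS = {
    "few_replicas": (1.0, 0.5),   # cheap, but violates the constraint
    "many_replicas": (3.0, 0.0),  # expensive, no violation
}


def preferred_action(w_violation):
    """Return the action with the highest reward for a given violation weight."""
    return max(
        ACTIONS,
        key=lambda name: weighted_reward(*ACTIONS[name], w_violation=w_violation),
    )


print(preferred_action(1.0))   # low penalty weight favors the cheap action
print(preferred_action(10.0))  # high penalty weight favors the safe action
```

With these made-up numbers the preference flips once the violation weight exceeds 4, which is the kind of threshold a manually chosen weight can easily land on the wrong side of.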

Decoupling Task Execution from Objective Design

MAMO addresses the limitations of manual weight tuning by splitting the problem into two roles handled by separate agents. The Task-Execution (TE) agent handles the immediate job of controlling the system (for example, scaling function replicas in an edge computing environment) using a standard weighted reward. A higher-level Weight-Adaptation (WA) agent observes the system's long-term performance and, instead of controlling the system directly, learns to adjust the weights used by the TE agent. By treating the selection of these weights as a learning problem, MAMO lets the system discover the balance between cost and performance through experience rather than manual trial and error.
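One simple way to picture the WA role is as a bandit that chooses among a few candidate penalty weights. The summary does not say which learning algorithm the WA agent uses, so the epsilon-greedy design, the class name, and the candidate values below are assumptions made purely for illustration.

```python
import random


class WeightAdaptationAgent:
    """Epsilon-greedy bandit over candidate penalty weights (an assumed design)."""

    def __init__(self, candidate_weights, epsilon=0.1, seed=0):
        self.candidates = list(candidate_weights)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.candidates)
        self.values = [0.0] * len(self.candidates)  # running mean reward per weight

    def select(self):
        """Return the index of the weight to hand to the TE agent."""
        untried = [i for i, count in enumerate(self.counts) if count == 0]
        if untried:
            return untried[0]  # try every candidate once before exploiting
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.candidates))
        return max(range(len(self.candidates)), key=lambda i: self.values[i])

    def update(self, index, reward):
        """Fold the reward observed for one weight into its running mean."""
        self.counts[index] += 1
        self.values[index] += (reward - self.values[index]) / self.counts[index]


agent = WeightAdaptationAgent([0.5, 2.0, 8.0], epsilon=0.0)
for observed in (-3.0, -1.0, -2.0):  # made-up WA-level rewards
    agent.update(agent.select(), observed)
print(agent.candidates[agent.select()])  # the weight with the best mean so far
```

The point of the sketch is the interface: the WA agent never emits a scaling action, only a weight that shapes the reward the TE agent trains on.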

A Hierarchical Learning Loop

The MAMO framework operates in an iterative, two-phase cycle. First, the WA agent sets the weighting coefficients, which remain fixed for a specific training period. During this time, the TE agent interacts with the environment and updates its policy to optimize for those specific weights. Once this phase concludes, the WA agent evaluates the results based on aggregated performance indicators, such as average execution costs and the frequency of constraint violations. If the system fails to meet its quality-of-service requirements, the WA agent receives a penalty, prompting it to adjust the weights for the next cycle. This hierarchical structure allows the system to adapt its definition of "optimality" as environmental conditions, such as workload fluctuations, change over time.
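The two-phase cycle can be sketched as a short loop. Everything here is a toy stand-in: `train_te_phase` replaces real TE training with a made-up formula, and the tolerance, penalty, and selection rule in `run_mamo_cycles` are hypothetical choices rather than the paper's settings.

```python
def train_te_phase(w_violation):
    """Stand-in for one TE training phase under fixed weights.

    Returns aggregated indicators (average cost, violation rate) from a
    made-up model where a larger penalty weight buys fewer violations
    at a higher cost.
    """
    avg_cost = 1.0 + 0.5 * w_violation
    violation_rate = 0.2 / (1.0 + w_violation)
    return avg_cost, violation_rate


def wa_reward(avg_cost, violation_rate, tolerance=0.05, penalty=10.0):
    """WA-level reward: negative cost, minus a penalty if the tolerance is missed."""
    reward = -avg_cost
    if violation_rate > tolerance:
        reward -= penalty
    return reward


def run_mamo_cycles(candidates, cycles):
    """Alternate WA weight selection and TE training for a number of cycles."""
    estimates = {w: None for w in candidates}
    history = []
    for _ in range(cycles):
        # Phase 1: WA picks a weight (untested ones first, then the best so far).
        untested = [w for w in candidates if estimates[w] is None]
        w = untested[0] if untested else max(candidates, key=lambda c: estimates[c])
        # Phase 2: TE trains with the weight held fixed; WA scores the outcome.
        avg_cost, violation_rate = train_te_phase(w)
        estimates[w] = wa_reward(avg_cost, violation_rate)
        history.append(w)
    return history


print(run_mamo_cycles([1.0, 4.0, 16.0], cycles=5))
```

In this toy model the smallest weight misses the tolerance and is penalized, the largest meets it at a high cost, and the loop settles on the middle weight after trying each candidate once.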

Performance in Dynamic Environments

The author tested MAMO in a simulated edge computing scenario involving Function-as-a-Service (FaaS) scaling. In this environment, the system must decide how many function replicas to run to handle incoming requests without exceeding resource limits or incurring unnecessary costs. Experimental results showed that as the WA agent trained, it successfully steered the TE agent toward weight configurations that kept request rejection rates within the required tolerance levels. Even when faced with noisy, non-stationary workload patterns, MAMO was able to maintain performance constraints while keeping costs reasonable, demonstrating that the framework can effectively automate the trade-off between efficiency and reliability.
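The two quantities being traded off in such a scenario, replica cost and request rejection rate, can be illustrated with a toy replay function. This is not the simulator used in the paper; the per-replica capacity, the cost model, and the workload numbers are assumptions chosen only to keep the example small.

```python
def simulate_scaling(workload, replicas, capacity_per_replica=10, cost_per_replica=1.0):
    """Replay a workload against a per-step replica plan.

    workload: number of requests arriving in each time step.
    replicas: number of replicas running in each time step.
    Returns (total_cost, rejection_rate).
    """
    if len(workload) != len(replicas):
        raise ValueError("workload and replicas must have the same length")
    total_requests = sum(workload)
    rejected = 0
    total_cost = 0.0
    for arrivals, n in zip(workload, replicas):
        served = min(arrivals, n * capacity_per_replica)
        rejected += arrivals - served  # requests beyond capacity are dropped
        total_cost += n * cost_per_replica
    rejection_rate = rejected / total_requests if total_requests else 0.0
    return total_cost, rejection_rate


workload = [10, 30, 50, 20]
print(simulate_scaling(workload, [1, 2, 3, 2]))  # cheaper plan, some rejections
print(simulate_scaling(workload, [1, 3, 5, 2]))  # costlier plan, no rejections
```

A scaling policy tuned with too small a penalty weight would drift toward the first plan, and one tuned with too large a weight would over-provision beyond the second.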

Future Directions

While MAMO shows promise in balancing conflicting objectives, the author notes that this is a preliminary step toward more autonomous RL solutions. Future research aims to test the framework across different application domains beyond edge computing. Additionally, the author plans to compare MAMO against other established weight-selection strategies, such as Bayesian optimization and dual-decomposition schemes, to further validate its effectiveness and robustness in complex, real-world scenarios.
