
Key Takeaways

  • Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
  • This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training.
  • The framework combines a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller.
  • To stress-test early adaptation, we also consider a strict no-pretraining deployment regime.
  • The high-level advisor is defined offline from a structured task specification and compiled into deterministic rules.
Paper Abstract

This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller. To stress-test early adaptation, we also consider a strict no-pretraining deployment regime. The high-level advisor is defined offline from a structured task specification and compiled into deterministic rules. It provides interpretable mission- and safety-aware guidance through recommended actions, avoided actions, and regime-dependent arbitration weights. The low-level controller learns online from task-defined dense rewards and reuses experience through a mode-aware prioritized replay mechanism augmented with rule-derived metadata. We evaluate the framework on two tasks: battery-aware multi-goal delivery and moving-target delivery in obstacle-rich environments. Across both tasks, the proposed method improves early safety and sample efficiency primarily by reducing collision terminations, while preserving the ability to adapt online to scenario-specific dynamics.

Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training

This paper introduces a hierarchical decision-making framework designed to improve the performance of unmanned aerial vehicles (UAVs) in search-and-rescue (SAR) missions. The researchers address the challenge of training robots in environments where simulation time is limited, testing their system under a "no-pretraining" regime where the UAV must learn to operate effectively from the moment it is deployed. By combining a fixed, rule-based advisor with a flexible reinforcement learning (RL) controller, the framework enables robots to learn complex tasks safely and efficiently.

The Hierarchical Approach

The framework splits decision-making into two distinct layers. The high-level advisor is defined offline using a structured task specification, which is then compiled into deterministic rules. This advisor acts as a coach, providing the low-level controller with interpretable guidance, such as recommended actions, prohibited moves, and safety-aware arbitration weights. Meanwhile, the low-level controller uses goal-conditioned reinforcement learning to adapt to the environment in real-time, learning from dense rewards while utilizing a specialized memory system that incorporates the advisor's metadata to improve its learning efficiency.
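The advisor-plus-learner split described above can be sketched in a few lines. This is a minimal illustration only: the action names, regime thresholds, and weighting scheme below are assumptions for demonstration, not the paper's actual rules. The idea is that deterministic rules emit recommended actions, avoided actions, and an arbitration weight, which are then blended with the low-level controller's learned action values.

```python
import numpy as np

ACTIONS = ["forward", "left", "right", "climb", "descend", "hover"]

def advisor(state):
    """Illustrative deterministic rules compiled from a task specification.
    Returns (recommended, avoided, arbitration_weight). All names and
    thresholds here are hypothetical, not taken from the paper."""
    recommended, avoided = set(), set()
    if state["battery"] < 0.2:            # low-battery regime: conserve energy
        recommended.add("hover")
        avoided.add("climb")
        weight = 0.8                      # lean heavily on the advisor
    elif state["obstacle_dist"] < 2.0:    # near-obstacle regime: avoid collision
        avoided.add("forward")
        weight = 0.6
    else:                                 # nominal regime: trust the learner
        weight = 0.2
    return recommended, avoided, weight

def arbitrate(q_values, state):
    """Blend learned Q-values with rule preferences; hard-mask avoided actions."""
    recommended, avoided, w = advisor(state)
    prefs = np.array([1.0 if a in recommended else 0.0 for a in ACTIONS])
    scores = (1 - w) * q_values + w * prefs
    scores[[a in avoided for a in ACTIONS]] = -np.inf  # safety mask
    return ACTIONS[int(np.argmax(scores))]
```

With an untrained controller (all Q-values zero) and a low battery, the rules dominate and the agent hovers; as training proceeds and the regime is nominal, the small arbitration weight lets the learned values take over.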

Improving Safety and Efficiency

A primary goal of this research is to enable robots to adapt to new scenarios without extensive prior training. By using the high-level advisor to provide safety-aware guidance, the system significantly reduces the number of collision-related failures during the early stages of deployment. This "coaching" allows the low-level RL agent to focus on learning the task objectives more effectively, yielding better sample efficiency: the robot acquires the necessary skills using fewer interactions with the environment.
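One ingredient of that sample efficiency is the mode-aware prioritized replay mentioned earlier, where stored transitions carry rule-derived metadata. The toy buffer below is a hedged sketch of that idea: the mode tags and the fixed per-mode priorities are assumptions for illustration, not the paper's exact mechanism, which the abstract does not specify in detail.

```python
import random

class ModeAwareReplay:
    """Toy replay buffer that samples transitions in proportion to a
    priority assigned to their rule-derived mode tag (hypothetical scheme)."""
    def __init__(self, mode_priority):
        self.buffer = []                  # list of (transition, mode) pairs
        self.mode_priority = mode_priority

    def add(self, transition, mode):
        """Store a transition together with the advisor's regime label."""
        self.buffer.append((transition, mode))

    def sample(self, k):
        """Draw k transitions, oversampling safety-critical modes."""
        weights = [self.mode_priority.get(m, 1.0) for _, m in self.buffer]
        return random.choices(self.buffer, weights=weights, k=k)
```

Oversampling near-obstacle or low-battery experience in this way would let the learner revisit the rare, safety-critical situations more often than uniform replay would.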

Performance in Complex Tasks

The authors evaluated the framework across two challenging scenarios: battery-aware multi-goal delivery and moving-target delivery within obstacle-rich environments. In both cases, the proposed method demonstrated the ability to preserve the flexibility of online learning while maintaining high safety standards. The results indicate that the integration of rule-based coaching successfully mitigates the risks typically associated with early-stage reinforcement learning, allowing the UAVs to navigate complex, dynamic environments effectively even when starting from scratch.
