Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
This paper introduces a hierarchical decision-making framework designed to improve the performance of unmanned aerial vehicles (UAVs) in search-and-rescue (SAR) missions. The authors address the challenge of training agents when simulation time is limited, evaluating their system under a "no-pretraining" regime in which the UAV must learn to operate effectively from the moment it is deployed. By combining a fixed, rule-based advisor with a flexible reinforcement learning (RL) controller, the framework enables UAVs to learn complex tasks safely and efficiently.
The Hierarchical Approach
The framework splits decision-making into two distinct layers. The high-level advisor is defined offline from a structured task specification, which is compiled into deterministic rules. This advisor acts as a coach, providing the low-level controller with interpretable guidance: recommended actions, prohibited moves, and safety-aware arbitration weights. Meanwhile, the low-level controller uses goal-conditioned reinforcement learning to adapt to the environment in real time, learning from dense rewards and drawing on a specialized memory system that incorporates the advisor's metadata to improve learning efficiency.
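To make the interface between the two layers concrete, here is a minimal Python sketch of a rule-based advisor that emits the three kinds of guidance described above. The action set, rule predicates, and field names (`recommended`, `prohibited`, `arbitration_weight`) are illustrative assumptions, not the authors' actual specification format.

```python
from dataclasses import dataclass

ACTIONS = ("north", "south", "east", "west", "hover")

@dataclass
class Advice:
    recommended: str           # coach's suggested action
    prohibited: frozenset      # safety-masked actions
    arbitration_weight: float  # how strongly to trust the coach

class RuleBasedAdvisor:
    """Deterministic rules, compiled offline from a structured task spec."""

    def __init__(self, safety_margin: float, low_battery: float):
        self.safety_margin = safety_margin  # minimum obstacle clearance (m)
        self.low_battery = low_battery      # battery fraction that triggers caution

    def advise(self, obstacle_dist: dict, goal_dir: str, battery: float) -> Advice:
        # Prohibit any move whose obstacle clearance falls below the safety margin.
        prohibited = frozenset(a for a, d in obstacle_dist.items()
                               if d < self.safety_margin)
        # Recommend heading toward the goal unless that move is unsafe.
        recommended = goal_dir if goal_dir not in prohibited else "hover"
        # Lean harder on the coach when the battery is low.
        weight = 0.9 if battery < self.low_battery else 0.3
        return Advice(recommended, prohibited, weight)

advisor = RuleBasedAdvisor(safety_margin=2.0, low_battery=0.2)
advice = advisor.advise(
    obstacle_dist={"north": 1.5, "south": 8.0, "east": 6.0, "west": 4.0, "hover": 9.0},
    goal_dir="north",
    battery=0.15,
)
print(advice)  # "north" is masked, so the coach recommends "hover" with weight 0.9
```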
Improving Safety and Efficiency
A primary goal of this research is to enable UAVs to adapt to new scenarios without extensive prior training. By using the high-level advisor to provide safety-aware guidance, the system significantly reduces collision-related failures during the early stages of deployment. This "coaching" lets the low-level RL agent concentrate on learning the task objectives, improving sample efficiency: the agent acquires the necessary skills with fewer interactions with the environment.
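The sketch below shows one way such safety-aware arbitration could work: the learner's action preferences are masked by the advisor's prohibitions, then mixed with the coach's recommendation according to the arbitration weight. The blending rule and the `Advice` container (from the sketch above) are assumptions for illustration; the paper's exact arbitration mechanism may differ.

```python
import numpy as np

def arbitrate(q_values: dict, advice, temperature: float = 1.0) -> str:
    """Blend the learner's preferences with the coach's recommendation.

    q_values maps each action to the learner's current value estimate;
    advice is an Advice object from the advisor sketch above.
    """
    # Hard safety constraint: never consider prohibited actions.
    allowed = [a for a in q_values if a not in advice.prohibited]
    # Softmax over the learner's Q-values, restricted to safe actions.
    qs = np.array([q_values[a] for a in allowed])
    probs = np.exp((qs - qs.max()) / temperature)
    probs /= probs.sum()
    # Shift probability mass toward the coach's recommendation.
    w = advice.arbitration_weight
    mixed = (1.0 - w) * probs
    if advice.recommended in allowed:
        mixed[allowed.index(advice.recommended)] += w
    return allowed[int(np.argmax(mixed))]

# With the low-battery advice above, the coach's "hover" wins even when the
# learner's Q-values favor another safe direction.
action = arbitrate({"north": 0.8, "south": 0.1, "east": 0.3, "west": 0.2,
                    "hover": 0.0}, advice)
print(action)  # hover
```

Because the weight is produced by the advisor rather than fixed, the coach can dominate in hazardous states (low battery, near obstacles) while ceding control to the learned policy elsewhere.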
Performance in Complex Tasks
The authors evaluated the framework on two challenging scenarios: battery-aware multi-goal delivery and moving-target delivery in obstacle-rich environments. In both cases, the proposed method preserved the flexibility of online learning while maintaining high safety standards. The results indicate that rule-based coaching successfully mitigates the risks typically associated with early-stage reinforcement learning, allowing the UAVs to navigate complex, dynamic environments effectively even when starting from scratch.
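As an illustration of what the dense rewards in such a scenario might look like, the sketch below shapes the reward around progress toward the current goal while charging an energy cost and penalizing collisions. All weights and terms are assumptions for exposition, not the paper's reward function.

```python
import math

def dense_reward(pos, goal, prev_dist, battery, collided, goal_reached,
                 w_progress=1.0, w_energy=0.05,
                 collision_penalty=10.0, goal_bonus=20.0):
    """Illustrative dense shaping for battery-aware multi-goal delivery."""
    dist = math.dist(pos, goal)
    reward = w_progress * (prev_dist - dist)   # positive when closing the gap
    reward -= w_energy * (1.0 - battery)       # costlier to fly as battery drains
    if collided:
        reward -= collision_penalty            # safety failures dominate shaping
    if goal_reached:
        reward += goal_bonus                   # then the next goal is activated
    return reward, dist                        # carry dist forward as prev_dist

r, d = dense_reward(pos=(1.0, 2.0, 5.0), goal=(4.0, 6.0, 5.0),
                    prev_dist=6.0, battery=0.6,
                    collided=False, goal_reached=False)
print(round(r, 3), round(d, 3))  # 0.98 5.0
```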