Hierarchical Behaviour Spaces

Key Takeaways

  • Hierarchical Behaviour Spaces (HBS) is a new approach to reinforcement learning designed to help agents navigate environments that require reasoning over very long time horizons.
  • Recent work in hierarchical reinforcement learning has shown success in scaling to billions of timesteps when learning over a set of predefined option reward functions.
  • We call this method Hierarchical Behaviour Spaces (HBS).
  • We evaluate HBS on the NetHack Learning Environment, demonstrating strong performance.
  • We conduct a series of experiments and determine that, perhaps going against conventional wisdom, the benefits of hierarchy in our method come from increased exploration rather than long-term reasoning.
Paper Abstract

Recent work in hierarchical reinforcement learning has shown success in scaling to billions of timesteps when learning over a set of predefined option reward functions. We show that, instead of using a single reward function per option, the reward functions can be effectively used to induce a space of behaviours, by letting the controller specify linear combinations over reward functions, allowing a more expressive set of policies to be represented. We call this method Hierarchical Behaviour Spaces (HBS). We evaluate HBS on the NetHack Learning Environment, demonstrating strong performance. We conduct a series of experiments and determine that, perhaps going against conventional wisdom, the benefits of hierarchy in our method come from increased exploration rather than long term reasoning.

Hierarchical Behaviour Spaces (HBS) is a new approach to reinforcement learning designed to help agents navigate environments that require reasoning over very long time horizons. While traditional hierarchical methods often struggle in online settings, HBS improves performance by allowing a high-level controller to dynamically combine multiple predefined reward functions. This creates a flexible "space" of possible behaviors, enabling the agent to adapt its strategy more effectively than if it were restricted to choosing from a fixed set of pre-set options.

How HBS Works

In standard hierarchical reinforcement learning, an agent might choose between a few rigid options, each optimized for a specific, singular goal. HBS changes this by allowing the controller to specify a "linear combination" of several reward functions. For example, if an agent has reward functions for finding food, gaining experience, and exploring new areas, the controller can blend these together in varying amounts. This creates a continuous spectrum of behaviors, allowing the agent to perform complex tasks that might not be captured by any single reward function on its own. The controller and the low-level policy are trained simultaneously, with the controller operating on a compressed timescale to manage long-term goals.
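The blending step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the reward function names, the observation fields, and the example weights are all hypothetical. The only idea taken from the text is that the controller emits a weight vector and the low-level policy is trained on the resulting scalarised reward.

```python
import numpy as np

# Hypothetical per-step option reward functions; the names and the
# observation keys are illustrative, not from the paper.
def food_reward(obs):
    return float(obs.get("food_found", 0))

def experience_reward(obs):
    return float(obs.get("exp_gained", 0))

def exploration_reward(obs):
    return float(obs.get("new_tiles_seen", 0))

REWARD_FNS = [food_reward, experience_reward, exploration_reward]

def blended_reward(weights, obs):
    """Scalarise the option rewards with the controller's weight vector.

    Rather than selecting one discrete option, the controller outputs
    a weight vector w, and the low-level policy is trained to maximise
    the dot product w . r(obs) over its (compressed) time horizon.
    """
    rewards = np.array([fn(obs) for fn in REWARD_FNS])
    return float(np.dot(weights, rewards))

# Example: a controller that currently favours exploration.
w = np.array([0.2, 0.3, 0.5])
obs = {"food_found": 1, "exp_gained": 0, "new_tiles_seen": 4}
print(blended_reward(w, obs))  # 0.2*1 + 0.3*0 + 0.5*4 = 2.2
```

Each new weight vector picks out a point in a continuous behaviour space: adding another reward function adds another axis, which matches the paper's observation that HBS improves as more reward functions are supplied.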

Performance in NetHack

The researchers tested HBS on the NetHack Learning Environment, a notoriously difficult benchmark that requires agents to make decisions over thousands of steps to succeed. HBS outperformed existing methods, showing a superior ability to reach key milestones and navigate different branches of the game’s map. Notably, as the researchers added more reward functions to the system, HBS became more effective, demonstrating that the method successfully scales by utilizing additional "axes of behavior" to improve its decision-making.

Rethinking Hierarchy and Exploration

A key finding of this research challenges the conventional wisdom regarding why hierarchical methods work. It is often assumed that hierarchy helps agents solve problems by improving "long-term reasoning"—the ability to plan far into the future. However, the experiments with HBS suggest that its success is primarily driven by enhanced exploration. By providing the agent with a diverse, expressive space of behaviors to choose from, the agent is better equipped to discover and navigate different parts of the environment. This suggests that the primary benefit of this hierarchical structure is not necessarily better credit assignment over long periods, but rather the ability to automatically tune and apply intrinsic rewards to encourage more effective exploration.
