TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning
Continual offline reinforcement learning (CORL) is a challenging field where AI models must learn a sequence of tasks from static datasets without interacting with the environment, all while avoiding "catastrophic forgetting"—the tendency for a model to lose its ability to perform old tasks as it learns new ones. This paper introduces TSN-Affinity, a method that addresses this by using a combination of sparse, task-specific subnetworks and a smart routing system that determines how to reuse existing knowledge for new tasks.
The Architectural Approach
Unlike traditional methods that rely on storing past data (replay-based) to remember old tasks, TSN-Affinity uses an architectural approach. It builds upon the Decision Transformer, a model that treats reinforcement learning as a sequence-modeling problem. Within this framework, the method creates "TinySubNetworks"—sparse, task-specific masks that select only a small portion of the model's total parameters for each task. By freezing these parameters once a task is learned, the model ensures that new learning does not overwrite or interfere with previously acquired skills.
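The mask-and-freeze idea can be sketched in a few lines. Everything below is illustrative rather than taken from the paper: the class name `MaskedLinear`, the magnitude-based mask selection, and the 75% sparsity level are assumptions used to show the mechanism, not the authors' implementation.

```python
import numpy as np

class MaskedLinear:
    """Toy layer with per-task sparse masks over one shared weight tensor.

    Each task gets a binary mask selecting a small fraction of the weights;
    once a task is trained, its selected weights are frozen so later tasks
    cannot overwrite them (the core anti-forgetting mechanism described above).
    """

    def __init__(self, in_features, out_features, sparsity=0.75, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(scale=0.02, size=(out_features, in_features))
        self.sparsity = sparsity
        self.task_masks = {}  # task_id -> boolean mask, same shape as weight
        self.frozen = np.zeros_like(self.weight, dtype=bool)

    def new_task(self, task_id):
        # Pick the largest-magnitude *unfrozen* weights for the new task
        # (magnitude-based selection is an assumption for this sketch).
        scores = np.abs(self.weight)
        scores[self.frozen] = -1.0  # frozen weights are never re-selected
        k = int(self.weight.size * (1 - self.sparsity))
        idx = np.argsort(scores.ravel())[-k:]
        mask = np.zeros(self.weight.size, dtype=bool)
        mask[idx] = True
        self.task_masks[task_id] = mask.reshape(self.weight.shape)

    def freeze_task(self, task_id):
        # After training, lock in the parameters this task's mask selected.
        self.frozen |= self.task_masks[task_id]

    def forward(self, x, task_id):
        # Only the task's sparse subnetwork participates in the forward pass.
        return x @ (self.weight * self.task_masks[task_id]).T


layer = MaskedLinear(8, 4, sparsity=0.75)
layer.new_task("task0")
out = layer.forward(np.ones((2, 8)), "task0")
print(out.shape)                      # (2, 4)
layer.freeze_task("task0")
print(int(layer.frozen.sum()))        # 8 -> 25% of the 32 weights are frozen
```

In a real training loop the frozen set would also be used to zero out gradients on protected weights; that step is omitted here for brevity.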
Affinity-Based Routing
The core innovation of the method is "Affinity Routing." When a new task arrives, the system evaluates whether it should allocate an entirely new subnetwork or reuse an existing one. It makes this decision based on two types of similarity:
Action Affinity: It checks if an existing subnetwork can already predict the correct actions for the new task.
Latent Affinity: It compares the internal representations (latent statistics) of the new task against those of previously learned tasks.
By combining these signals, the model can intelligently decide when to transfer knowledge from a similar past task, which helps improve performance without needing to store large amounts of past data.
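The routing decision described above can be sketched as a simple scoring rule. This is a minimal illustration under stated assumptions: the function names, the action-match and cosine-similarity affinity measures, the weighting factor `alpha`, and the reuse `threshold` are all hypothetical choices for this sketch, not values from the paper.

```python
import numpy as np

def action_affinity(predicted_actions, dataset_actions):
    # Fraction of the new task's dataset actions an existing subnetwork
    # already predicts correctly (assumes discrete actions).
    return float(np.mean(predicted_actions == dataset_actions))

def latent_affinity(new_stats, old_stats):
    # Cosine similarity between mean latent vectors, mapped into [0, 1].
    cos = new_stats @ old_stats / (
        np.linalg.norm(new_stats) * np.linalg.norm(old_stats))
    return 0.5 * (cos + 1.0)

def route(new_task, subnetworks, alpha=0.5, threshold=0.7):
    """Return the id of the best existing subnetwork to reuse,
    or None to signal that a fresh subnetwork should be allocated."""
    best_id, best_score = None, -1.0
    for task_id, net in subnetworks.items():
        score = (alpha * action_affinity(net["predict"](new_task["obs"]),
                                         new_task["actions"])
                 + (1 - alpha) * latent_affinity(new_task["latent_mean"],
                                                 net["latent_mean"]))
        if score > best_score:
            best_id, best_score = task_id, score
    return best_id if best_score >= threshold else None


# Dummy example: "t0" matches the new task's actions and latents, "t1" does not.
new_task = {
    "obs": np.zeros((10, 4)),
    "actions": np.ones(10, dtype=int),
    "latent_mean": np.array([1.0, 0.0]),
}
subnetworks = {
    "t0": {"predict": lambda o: np.ones(len(o), dtype=int),
           "latent_mean": np.array([1.0, 0.1])},
    "t1": {"predict": lambda o: np.zeros(len(o), dtype=int),
           "latent_mean": np.array([-1.0, 0.0])},
}
print(route(new_task, subnetworks))  # t0
```

Combining both signals guards against false matches: two tasks with similar latent statistics but incompatible action distributions, or vice versa, would score below the reuse threshold and receive a fresh subnetwork.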
Performance and Findings
The researchers tested TSN-Affinity on two distinct types of environments: Atari games (discrete control) and robotic manipulation tasks using the Franka Emika Panda arm (continuous control). The results indicate that using sparse, task-specific subnetworks is highly effective at preventing forgetting. Furthermore, the Affinity Routing mechanism was shown to improve multi-task performance, particularly in the discrete Atari benchmarks.
Considerations for Future Application
While the method shows strong potential, the study highlights that knowledge transfer is inherently more difficult in heterogeneous continuous-control settings, such as the robotic manipulation tasks. The findings suggest that similarity-guided architectural reuse is a viable and powerful alternative to traditional replay-based strategies, offering a way to manage memory and performance in environments where live interaction is impossible or too risky.