TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning
Continual offline reinforcement learning (CORL) is a challenging field where AI models must learn a sequence of tasks from static datasets without interacting with the environment, all while avoiding "catastrophic forgetting"—the tendency for a model to lose its ability to perform old tasks as it learns new ones. This paper introduces TSN-Affinity, a method that addresses this by using a combination of sparse, task-specific subnetworks and a smart routing system that determines how to reuse existing knowledge for new tasks.
The Architectural Approach
Unlike traditional methods that rely on storing past data (replay-based) to remember old tasks, TSN-Affinity uses an architectural approach. It builds upon the Decision Transformer, a model that treats reinforcement learning as a sequence-modeling problem. Within this framework, the method creates "TinySubNetworks"—sparse, task-specific masks that select only a small portion of the model's total parameters for each task. By freezing these parameters once a task is learned, the model ensures that new learning does not overwrite or interfere with previously acquired skills.
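The mask-and-freeze idea can be sketched in a few lines. Everything below is illustrative rather than taken from the paper: the class name `MaskedLinear`, the magnitude-based mask selection, and the 75% sparsity level are assumptions used to show the mechanism, not the authors' implementation.

```python
import numpy as np

class MaskedLinear:
    """Toy layer with per-task sparse masks over one shared weight tensor.

    Each task gets a binary mask selecting a small fraction of the weights;
    once a task is trained, its selected weights are frozen so later tasks
    cannot overwrite them (the core anti-forgetting mechanism described above).
    """

    def __init__(self, in_features, out_features, sparsity=0.75, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(scale=0.02, size=(out_features, in_features))
        self.sparsity = sparsity
        self.task_masks = {}  # task_id -> boolean mask, same shape as weight
        self.frozen = np.zeros_like(self.weight, dtype=bool)

    def new_task(self, task_id):
        # Pick the largest-magnitude *unfrozen* weights for the new task
        # (magnitude-based selection is an assumption for this sketch).
        scores = np.abs(self.weight)
        scores[self.frozen] = -1.0  # frozen weights are never re-selected
        k = int(self.weight.size * (1 - self.sparsity))
        idx = np.argsort(scores.ravel())[-k:]
        mask = np.zeros(self.weight.size, dtype=bool)
        mask[idx] = True
        self.task_masks[task_id] = mask.reshape(self.weight.shape)

    def freeze_task(self, task_id):
        # After training, lock in the parameters this task's mask selected.
        self.frozen |= self.task_masks[task_id]

    def forward(self, x, task_id):
        # Only the task's sparse subnetwork participates in the forward pass.
        return x @ (self.weight * self.task_masks[task_id]).T


layer = MaskedLinear(8, 4, sparsity=0.75)
layer.new_task("task0")
out = layer.forward(np.ones((2, 8)), "task0")
print(out.shape)                      # (2, 4)
layer.freeze_task("task0")
print(int(layer.frozen.sum()))        # 8 -> 25% of the 32 weights are frozen
```

In a real training loop the frozen set would also be used to zero out gradients on protected weights; that step is omitted here for brevity.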
Affinity-Based Routing
The core innovation of the method is "Affinity Routing." When a new task arrives, the system evaluates whether it should allocate an entirely new subnetwork or reuse an existing one. It makes this decision based on two types of similarity:
Action Affinity: It checks if an existing subnetwork can already predict the correct actions for the new task.
Latent Affinity: It compares the internal representations (latent statistics) of the new task against those of previously learned tasks.
By combining these signals, the model can intelligently decide when to transfer knowledge from a similar past task, which helps improve performance without needing to store large amounts of past data.
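The routing decision described above can be sketched as a simple scoring rule. This is a minimal illustration under stated assumptions: the function names, the action-match and cosine-similarity affinity measures, the weighting factor `alpha`, and the reuse `threshold` are all hypothetical choices for this sketch, not values from the paper.

```python
import numpy as np

def action_affinity(predicted_actions, dataset_actions):
    # Fraction of the new task's dataset actions an existing subnetwork
    # already predicts correctly (assumes discrete actions).
    return float(np.mean(predicted_actions == dataset_actions))

def latent_affinity(new_stats, old_stats):
    # Cosine similarity between mean latent vectors, mapped into [0, 1].
    cos = new_stats @ old_stats / (
        np.linalg.norm(new_stats) * np.linalg.norm(old_stats))
    return 0.5 * (cos + 1.0)

def route(new_task, subnetworks, alpha=0.5, threshold=0.7):
    """Return the id of the best existing subnetwork to reuse,
    or None to signal that a fresh subnetwork should be allocated."""
    best_id, best_score = None, -1.0
    for task_id, net in subnetworks.items():
        score = (alpha * action_affinity(net["predict"](new_task["obs"]),
                                         new_task["actions"])
                 + (1 - alpha) * latent_affinity(new_task["latent_mean"],
                                                 net["latent_mean"]))
        if score > best_score:
            best_id, best_score = task_id, score
    return best_id if best_score >= threshold else None


# Dummy example: "t0" matches the new task's actions and latents, "t1" does not.
new_task = {
    "obs": np.zeros((10, 4)),
    "actions": np.ones(10, dtype=int),
    "latent_mean": np.array([1.0, 0.0]),
}
subnetworks = {
    "t0": {"predict": lambda o: np.ones(len(o), dtype=int),
           "latent_mean": np.array([1.0, 0.1])},
    "t1": {"predict": lambda o: np.zeros(len(o), dtype=int),
           "latent_mean": np.array([-1.0, 0.0])},
}
print(route(new_task, subnetworks))  # t0
```

Combining both signals guards against false matches: two tasks with similar latent statistics but incompatible action distributions, or vice versa, would score below the reuse threshold and receive a fresh subnetwork.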
Performance and Findings
The researchers tested TSN-Affinity on two distinct types of environments: Atari games (discrete control) and robotic manipulation tasks using the Franka Emika Panda arm (continuous control). The results indicate that using sparse, task-specific subnetworks is highly effective at preventing forgetting. Furthermore, the Affinity Routing mechanism was shown to improve multi-task performance, particularly in the discrete Atari benchmarks.
Considerations for Future Application
While the method shows strong potential, the study highlights that knowledge transfer is inherently more difficult in heterogeneous continuous-control settings, such as the robotic manipulation tasks. The findings suggest that similarity-guided architectural reuse is a viable and powerful alternative to traditional replay-based strategies, offering a way to manage memory and performance in environments where live interaction is impossible or too risky.