Learning to Recover Task Experts from a Multi-Task Merged Model

Key Takeaways

  • Multi-task model merging aims to consolidate several task-specific experts into a unified model, yet static merging consistently suffers from parameter interference.
  • While dynamic merging models aim to bridge this gap, many works rely on the costly storage and loading of redundant expert components at inference.
  • In this work, from the perspective of a task expert, we view parameter interference as a parameter perturbation introduced to each expert during the merging process.
  • We show that such parameter perturbations can be modeled as an affine transformation, which can be approximated as additive offsets.
Paper Abstract

Multi-task model merging aims to consolidate several task-specific experts into a unified model, yet static merging consistently suffers from parameter interference. While dynamic merging models aim to bridge this gap, many works rely on the costly storage and loading of redundant expert components at inference. In this work, from the perspective of a task expert, we view parameter interference as a parameter perturbation introduced to each expert during the merging process. We show that such parameter perturbations can be modeled as an affine transformation, which can be approximated as additive offsets. Motivated by these observations, we propose Recover Task eXpert (ReTeX), a framework that predicts those offsets, in order to undo parameter interference and recover task-expert performance from a single merged checkpoint. To recover the appropriate expert when task identity is unknown, we introduce a router-free task identifier based on SVD subspace signatures computed offline before inference. At inference, the identifier selects the task whose subspace yields the smallest projection residual for a given input. As a result, ReTeX recovers over 95% of individual-expert performance in both vision and NLP domains, while significantly improving generalization to unseen tasks. Crucially, we also show that the parameter offset prediction leads to emergent adaptive interpolation of expert knowledge for out-of-distribution (OOD) tasks. ReTeX adaptively interpolates seen expert knowledge to handle unseen tasks. Our code is available at this https URL

Learning to Recover Task Experts from a Multi-Task Merged Model

Multi-task model merging combines several specialized models into one unified model. This saves storage, but it often causes "parameter interference": parameters tuned for different tasks conflict once merged, which degrades performance relative to the individual experts. This paper introduces Recover Task eXpert (ReTeX), a framework that aims to undo this interference and recover each expert's performance from a single merged checkpoint, without storing and loading redundant expert components at inference.

Understanding Parameter Interference

The authors reframe the problem from the perspective of each task expert: merging perturbs that expert's parameters, and the perturbation is what shows up as interference. They show that these perturbations can be modeled as affine transformations, which in turn can be approximated as additive offsets. ReTeX predicts those offsets and uses them to undo the interference, so the merged model behaves like the original task-specific expert.
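
As a rough illustration of why an affine perturbation can be handled with an additive offset, the NumPy sketch below works on toy vectors. The matrix `A`, the shift `b`, and the sign convention (merged parameters equal the perturbed expert) are assumptions chosen for this example, not the paper's parameterization. The offset is also computed directly here, whereas ReTeX predicts it.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy stand-in for one expert's flattened parameters (assumption).
theta_expert = rng.normal(size=dim)

# Hypothetical affine perturbation introduced by merging:
# theta_merged = A @ theta_expert + b, with A close to the identity.
A = np.eye(dim) + 0.05 * rng.normal(size=(dim, dim))
b = 0.1 * rng.normal(size=dim)
theta_merged = A @ theta_expert + b

# For a fixed expert, the whole perturbation collapses into one additive offset.
offset = (A - np.eye(dim)) @ theta_expert + b

# Subtracting that offset from the merged parameters undoes the perturbation.
theta_recovered = theta_merged - offset

recovery_error = float(np.max(np.abs(theta_recovered - theta_expert)))
print(f"max recovery error: {recovery_error:.2e}")
```

In this toy setting the offset is known exactly. In the setting the paper describes, only the merged checkpoint is available, which is why the offsets have to be predicted.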

Identifying Tasks Without a Router

Recovering the right expert requires knowing which task an input belongs to, and at inference the task identity is often unknown. Instead of training a router, ReTeX uses a router-free task identifier based on SVD (singular value decomposition) subspace signatures, which are computed offline before inference. At inference, the identifier selects the task whose subspace yields the smallest projection residual for the given input.
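
The selection rule can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say which features the signatures are built from, so the code uses synthetic feature vectors, and the helper names (`fit_subspace`, `projection_residual`, `identify_task`), the rank, and the task names are all hypothetical.

```python
import numpy as np


def fit_subspace(features: np.ndarray, rank: int) -> np.ndarray:
    """Offline step: orthonormal basis (dim x rank) from the top right singular vectors."""
    # Rows of `features` are samples, so the rows of vt span the feature space.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:rank].T


def projection_residual(x: np.ndarray, basis: np.ndarray) -> float:
    """Norm of the part of x that lies outside the subspace spanned by `basis`."""
    projected = basis @ (basis.T @ x)
    return float(np.linalg.norm(x - projected))


def identify_task(x: np.ndarray, signatures: dict) -> str:
    """Inference step: pick the task whose subspace leaves the smallest residual."""
    residuals = {name: projection_residual(x, basis) for name, basis in signatures.items()}
    return min(residuals, key=residuals.get)


# Synthetic data (assumption): each task's features lie near its own
# low-dimensional subspace, plus a small amount of noise.
rng = np.random.default_rng(0)
dim, rank = 16, 3


def sample_task(basis: np.ndarray, n: int) -> np.ndarray:
    coeffs = rng.normal(size=(n, rank))
    return coeffs @ basis.T + 0.01 * rng.normal(size=(n, dim))


true_bases = {
    name: np.linalg.qr(rng.normal(size=(dim, rank)))[0]
    for name in ("task_a", "task_b")
}

# Signatures are computed once, before any inference request arrives.
signatures = {
    name: fit_subspace(sample_task(basis, 200), rank)
    for name, basis in true_bases.items()
}

# A new input drawn from task_b, with its task label withheld.
query = sample_task(true_bases["task_b"], 1)[0]
predicted = identify_task(query, signatures)
print(predicted)
```

Because the signatures are fixed bases, identification costs one small matrix product per task at inference and involves no trained routing network.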

Performance and Adaptive Knowledge

ReTeX recovers over 95% of individual-expert performance in both vision and natural language processing (NLP) domains, and the authors report significantly improved generalization to unseen tasks. They also report an emergent behavior: offset prediction adaptively interpolates the knowledge of seen experts to handle out-of-distribution (OOD) tasks, rather than only reproducing the experts that were merged.
