Learning to Recover Task Experts from a Multi-Task Merged Model
Multi-task model merging combines several specialized AI models into one unified model. Merging saves storage, but it often causes "parameter interference": the tasks' parameters conflict with one another, which degrades the performance of the individual experts. This paper introduces a framework called Recover Task eXpert (ReTeX), which aims to undo this interference and restore the original performance of each expert from a single merged model, without needing to store redundant components.
Understanding Parameter Interference
The authors propose a new way to look at the problem of model merging. They suggest that when experts are combined, the resulting interference acts like a "perturbation", or distortion, applied to each expert's parameters. They model these distortions as affine transformations and find that they can be approximated as simple additive offsets. By predicting these offsets, the ReTeX framework can effectively "clean" the merged model so that it behaves like the original task-specific expert.
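The additive-offset view can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the averaging merge, the flat parameter vectors, and the function names (`merge_by_averaging`, `oracle_offset`, `recover_expert`) are assumptions made for illustration. In particular, the offsets here are computed directly from the known experts as an oracle, whereas ReTeX is described as predicting them.

```python
import numpy as np

def merge_by_averaging(experts):
    # Assumption: a plain parameter average stands in for the merging method.
    return np.mean(np.stack(experts), axis=0)

def oracle_offset(expert, merged):
    # The additive offset that maps the merged parameters back to one expert.
    # Computed from the true expert here; a real system would have to predict it.
    return expert - merged

def recover_expert(merged, predicted_offset):
    # "Cleaning" the merged model: add the task-specific offset.
    return merged + predicted_offset

rng = np.random.default_rng(0)
# Three toy "experts", each a flat vector of 8 parameters.
experts = [rng.normal(size=8) for _ in range(3)]
merged = merge_by_averaging(experts)

offsets = [oracle_offset(e, merged) for e in experts]
recovered = [recover_expert(merged, o) for o in offsets]

# With exact offsets, recovery is exact up to floating-point error.
max_error = max(float(np.max(np.abs(r - e))) for r, e in zip(recovered, experts))
print(f"max recovery error: {max_error:.2e}")
```

With oracle offsets the recovery is trivially exact, so the sketch only shows what is being approximated: the quality of the recovered expert depends entirely on how well the offset is predicted.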
Identifying Tasks Without a Router
A common challenge in multi-task models is knowing which "expert" to use when the task identity is unknown. Traditional methods often rely on a separate, learned router to make this decision. ReTeX avoids this by using a router-free identifier based on SVD (Singular Value Decomposition) subspace signatures. These signatures are calculated offline, before the model is used. During inference, the system checks which task's subspace produces the smallest "projection residual" for a given input and selects that task.
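The identification step can be sketched as follows, under stated assumptions. The summary does not say which features are used, how the rank is chosen, or whether features are centered, so the synthetic features, the fixed `rank`, and the helper names (`subspace_signature`, `projection_residual`, `identify_task`) are all hypothetical. The sketch only shows the general technique: compute a low-rank basis per task offline, then pick the task whose subspace leaves the smallest residual.

```python
import numpy as np

def subspace_signature(features, rank):
    # Offline step: top-`rank` right singular vectors of a task's feature
    # matrix (n_samples x dim), returned as an orthonormal (dim x rank) basis.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:rank].T

def projection_residual(x, basis):
    # Norm of the part of x that lies outside the task subspace.
    projected = basis @ (basis.T @ x)
    return float(np.linalg.norm(x - projected))

def identify_task(x, signatures):
    # Inference step: choose the task with the smallest projection residual.
    residuals = [projection_residual(x, basis) for basis in signatures]
    return int(np.argmin(residuals))

rng = np.random.default_rng(0)
dim, rank, n_tasks = 32, 4, 3

# Synthetic setup (assumption): each task's features lie near its own
# random low-dimensional subspace, plus a small amount of noise.
true_bases = [np.linalg.qr(rng.normal(size=(dim, rank)))[0] for _ in range(n_tasks)]

def sample_features(basis, n):
    coeffs = rng.normal(size=(n, rank))
    noise = 0.01 * rng.normal(size=(n, dim))
    return coeffs @ basis.T + noise

signatures = [subspace_signature(sample_features(b, 200), rank) for b in true_bases]

# One unseen query per task; each should map back to its own task index.
queries = [sample_features(b, 1)[0] for b in true_bases]
predicted = [identify_task(q, signatures) for q in queries]
print(predicted)
```

Because the signatures are fixed matrices computed once, identification at inference time needs only a few matrix-vector products per task and no trained routing network.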
Performance and Adaptive Knowledge
The results show that ReTeX recovers over 95% of the performance of individual experts across both vision and natural language processing (NLP) tasks. Beyond restoring performance, the framework demonstrates an emergent capability: it can adaptively interpolate knowledge from seen experts to handle out-of-distribution (OOD) tasks. In other words, the model does not just recall what it was trained on; it can combine its internal knowledge to generalize to tasks it has not explicitly seen before.