Adapting large pretrained models to new tasks typically involves two separate steps: compressing the model to reduce its size and fine-tuning it to improve performance. This "compress-then-adapt" approach often creates a mismatch, where the compressed model discards information that is actually vital for the new task. The paper introduces JACTUS (Joint Adaptation and Compression with a Task-aware Union of Subspaces), a framework that merges these two processes into a single, unified workflow. By doing so, JACTUS ensures that the model retains the specific information needed for the downstream task while simultaneously meeting strict size requirements.
A Unified Approach to Compression and Tuning
Instead of compressing a model and then trying to fix its performance through fine-tuning, JACTUS identifies the most important directions for a task before any compression occurs. It uses a small set of data to estimate how the model’s inputs and gradients behave during the task. By combining these task-specific insights with the model’s existing structural information, JACTUS creates a "union of subspaces." The result is a search space that is both compact and highly relevant to the task at hand, allowing the model to be compressed and adapted simultaneously rather than in sequence.
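To make the idea concrete, here is a minimal sketch of how task-aware directions could be combined into a single orthonormal basis. This is not the paper's implementation: the calibration data, the rank choices, and the use of second-moment SVDs are illustrative assumptions, standing in for whatever statistics the method actually collects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration data for one linear layer (purely illustrative):
# X holds a small batch of layer inputs, G the matching output gradients.
d, n = 64, 256
X = rng.standard_normal((n, d))
G = rng.standard_normal((n, d))

def top_directions(M, k):
    """Leading k eigendirections of the (d x d) second-moment matrix of M."""
    U, _, _ = np.linalg.svd(M.T @ M)
    return U[:, :k]

# Task-aware bases estimated from input and gradient statistics.
B_in = top_directions(X, k=8)
B_grad = top_directions(G, k=8)

# "Union of subspaces": stack both bases and re-orthonormalize via QR,
# so the combined basis spans the input- and gradient-relevant directions.
Q, _ = np.linalg.qr(np.hstack([B_in, B_grad]))
union_basis = Q

print(union_basis.shape)  # (64, 16)
```

Compression and adaptation can then both be restricted to this small basis, rather than operating over the full 64-dimensional weight space.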
Intelligent Resource Allocation
A major challenge in model compression is deciding which layers of a neural network should be prioritized. JACTUS solves this with a cost-aware global rank allocator. Rather than assigning the same amount of compression to every layer, the framework evaluates the "marginal gain" of each layer—essentially asking how much performance improvement a layer provides for every additional parameter it is allowed to keep. By greedily allocating the parameter budget to the layers that offer the highest return on investment, JACTUS ensures that the model’s limited capacity is used as effectively as possible.
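The greedy allocation described above can be sketched in a few lines. Everything here is a toy illustration: the layer names, gain tables, and per-rank costs are made up, standing in for whatever task-score estimates the real allocator uses.

```python
def allocate_ranks(gain, cost_per_rank, max_rank, budget):
    """Greedily raise the rank of whichever layer currently offers the best
    marginal gain per extra parameter, until the parameter budget is spent.

    gain[l][r] = estimated task score of layer l at rank r (nondecreasing in r).
    """
    ranks = {l: 0 for l in gain}
    spent = 0
    while True:
        best, best_ratio = None, 0.0
        for l, g in gain.items():
            r = ranks[l]
            if r >= max_rank[l] or spent + cost_per_rank[l] > budget:
                continue
            # Marginal gain per additional parameter: the "return on investment".
            ratio = (g[r + 1] - g[r]) / cost_per_rank[l]
            if ratio > best_ratio:
                best, best_ratio = l, ratio
        if best is None:
            break
        ranks[best] += 1
        spent += cost_per_rank[best]
    return ranks

# Two toy layers with diminishing returns per extra unit of rank.
gain = {"attn": [0.0, 0.30, 0.45, 0.50], "mlp": [0.0, 0.20, 0.38, 0.52]}
ranks = allocate_ranks(gain,
                       cost_per_rank={"attn": 10, "mlp": 10},
                       max_rank={"attn": 3, "mlp": 3},
                       budget=40)
print(ranks)  # → {'attn': 2, 'mlp': 2}
```

Note how the budget ends up split across layers according to where each extra unit of rank helps most, rather than being divided uniformly.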
Efficient Deployment
One of the key advantages of JACTUS is that it produces a compact, low-rank model that is ready for deployment. Unlike traditional Parameter-Efficient Fine-Tuning (PEFT) methods, which often require keeping the full, original model weights in memory during inference, JACTUS results in a model that is inherently small. Because the optimization happens within the pre-defined union subspace, the fine-tuning process is computationally efficient, allowing for robust performance without the need to store or load the massive, uncompressed base model.
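The deployment advantage comes from the standard low-rank trick: a dense weight is replaced by two skinny factors, so the full matrix never needs to be stored or loaded at inference time. The sketch below uses a plain SVD truncation on a random matrix purely for illustration; the dimensions and rank are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for one full-rank layer weight (values are illustrative).
d_out, d_in, r = 128, 256, 16
W = rng.standard_normal((d_out, d_in))

# Factor once at compression time; only U and V ship to deployment.
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :r] * s[:r]   # (d_out, r)
V = Vt[:r, :]               # (r, d_in)

def forward(x):
    # Two skinny matmuls replace one dense matmul; W itself is never kept.
    return (x @ V.T) @ U.T

x = rng.standard_normal((4, d_in))
y = forward(x)
print(y.shape)  # (4, 128)

# Parameter count drops from d_out*d_in to r*(d_out + d_in).
print(d_out * d_in, r * (d_out + d_in))  # 32768 6144
```

This is the contrast with adapter-style PEFT: there the low-rank update is added on top of the frozen full weight, which must still reside in memory, whereas a factored model like this carries only the small matrices.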
Strong Performance Across Modalities
JACTUS demonstrates significant improvements over existing methods in both vision and language tasks. For example, when testing on ViT-Base models across eight different image datasets, JACTUS achieved an average accuracy of 89.2% while retaining only 80% of the original parameters, outperforming standard PEFT baselines. Similarly, on the Llama2-7B language model, it achieved 80.9% accuracy on commonsense question-answering tasks under the same 80% budget. These results suggest that by coupling compression with task-aware adaptation, models can be made smaller and faster without sacrificing their ability to learn new tasks.