Adapting large pretrained models to new tasks typically involves two separate steps: compressing the model to reduce its size and fine-tuning it to improve performance. This "compress-then-adapt" approach often creates a mismatch, where the compressed model discards information that is actually vital for the new task. The paper introduces JACTUS (Joint Adaptation and Compression with a Task-aware Union of Subspaces), a framework that merges these two processes into a single, unified workflow. By doing so, JACTUS ensures that the model retains the specific information needed for the downstream task while simultaneously meeting strict size requirements.
A Unified Approach to Compression and Tuning
Instead of compressing a model and then trying to fix its performance through fine-tuning, JACTUS identifies the most important directions for a task before any compression occurs. It uses a small set of data to estimate how the model’s inputs and gradients behave during the task. By combining these task-specific insights with the model’s existing structural information, JACTUS creates a "union of subspaces." The result is a search space that is both compact and highly relevant to the task at hand, allowing the model to be compressed and adapted simultaneously rather than in sequence.
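To make the idea concrete, here is a minimal sketch of how task-aware directions could be combined into a single orthonormal basis. This is not the paper's implementation: the calibration data, the rank choices, and the use of second-moment SVDs are illustrative assumptions, standing in for whatever statistics the method actually collects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration data for one linear layer (purely illustrative):
# X holds a small batch of layer inputs, G the matching output gradients.
d, n = 64, 256
X = rng.standard_normal((n, d))
G = rng.standard_normal((n, d))

def top_directions(M, k):
    """Leading k eigendirections of the (d x d) second-moment matrix of M."""
    U, _, _ = np.linalg.svd(M.T @ M)
    return U[:, :k]

# Task-aware bases estimated from input and gradient statistics.
B_in = top_directions(X, k=8)
B_grad = top_directions(G, k=8)

# "Union of subspaces": stack both bases and re-orthonormalize via QR,
# so the combined basis spans the input- and gradient-relevant directions.
Q, _ = np.linalg.qr(np.hstack([B_in, B_grad]))
union_basis = Q

print(union_basis.shape)  # (64, 16)
```

Compression and adaptation can then both be restricted to this small basis, rather than operating over the full 64-dimensional weight space.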
Intelligent Resource Allocation
A major challenge in model compression is deciding which layers of a neural network should be prioritized. JACTUS solves this with a cost-aware global rank allocator. Rather than assigning the same amount of compression to every layer, the framework evaluates the "marginal gain" of each layer—essentially asking how much performance improvement a layer provides for every additional parameter it is allowed to keep. By greedily allocating the parameter budget to the layers that offer the highest return on investment, JACTUS ensures that the model’s limited capacity is used as effectively as possible.
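The greedy allocation described above can be sketched in a few lines. Everything here is a toy illustration: the layer names, gain tables, and per-rank costs are made up, standing in for whatever task-score estimates the real allocator uses.

```python
def allocate_ranks(gain, cost_per_rank, max_rank, budget):
    """Greedily raise the rank of whichever layer currently offers the best
    marginal gain per extra parameter, until the parameter budget is spent.

    gain[l][r] = estimated task score of layer l at rank r (nondecreasing in r).
    """
    ranks = {l: 0 for l in gain}
    spent = 0
    while True:
        best, best_ratio = None, 0.0
        for l, g in gain.items():
            r = ranks[l]
            if r >= max_rank[l] or spent + cost_per_rank[l] > budget:
                continue
            # Marginal gain per additional parameter: the "return on investment".
            ratio = (g[r + 1] - g[r]) / cost_per_rank[l]
            if ratio > best_ratio:
                best, best_ratio = l, ratio
        if best is None:
            break
        ranks[best] += 1
        spent += cost_per_rank[best]
    return ranks

# Two toy layers with diminishing returns per extra unit of rank.
gain = {"attn": [0.0, 0.30, 0.45, 0.50], "mlp": [0.0, 0.20, 0.38, 0.52]}
ranks = allocate_ranks(gain,
                       cost_per_rank={"attn": 10, "mlp": 10},
                       max_rank={"attn": 3, "mlp": 3},
                       budget=40)
print(ranks)  # → {'attn': 2, 'mlp': 2}
```

Note how the budget ends up split across layers according to where each extra unit of rank helps most, rather than being divided uniformly.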
Efficient Deployment
One of the key advantages of JACTUS is that it produces a compact, low-rank model that is ready for deployment. Unlike traditional Parameter-Efficient Fine-Tuning (PEFT) methods, which often require keeping the full, original model weights in memory during inference, JACTUS results in a model that is inherently small. Because the optimization happens within the pre-defined union subspace, the fine-tuning process is computationally efficient, allowing for robust performance without the need to store or load the massive, uncompressed base model.
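The deployment advantage comes from the standard low-rank trick: a dense weight is replaced by two skinny factors, so the full matrix never needs to be stored or loaded at inference time. The sketch below uses a plain SVD truncation on a random matrix purely for illustration; the dimensions and rank are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for one full-rank layer weight (values are illustrative).
d_out, d_in, r = 128, 256, 16
W = rng.standard_normal((d_out, d_in))

# Factor once at compression time; only U and V ship to deployment.
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :r] * s[:r]   # (d_out, r)
V = Vt[:r, :]               # (r, d_in)

def forward(x):
    # Two skinny matmuls replace one dense matmul; W itself is never kept.
    return (x @ V.T) @ U.T

x = rng.standard_normal((4, d_in))
y = forward(x)
print(y.shape)  # (4, 128)

# Parameter count drops from d_out*d_in to r*(d_out + d_in).
print(d_out * d_in, r * (d_out + d_in))  # 32768 6144
```

This is the contrast with adapter-style PEFT: there the low-rank update is added on top of the frozen full weight, which must still reside in memory, whereas a factored model like this carries only the small matrices.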
Strong Performance Across Modalities
JACTUS demonstrates significant improvements over existing methods in both vision and language tasks. For example, when testing on ViT-Base models across eight different image datasets, JACTUS achieved an average accuracy of 89.2% while retaining only 80% of the original parameters, outperforming standard PEFT baselines. Similarly, on the Llama2-7B language model, it achieved 80.9% accuracy on commonsense question-answering tasks under the same 80% budget. These results suggest that by coupling compression with task-aware adaptation, models can be made smaller and faster without sacrificing their ability to learn new tasks.