

InduceKV: Fixed-Footprint Continual Adaptation of Multimodal LLMs via Inducing KV Memories

Key Takeaways

  • We study fixed-footprint continual adaptation: the deployed adaptation state is kept under a fixed memory budget, while the backbone model is left unchanged and task-specific updates are externalized.
  • We further report backbone-matched, stage-1 CoIN, compute-matched, and scalability diagnostics, showing that the gains are not due to a stronger backbone, replay alone, or an unbounded candidate pool.
Paper Abstract

Multimodal large language models must adapt to evolving tasks and domains, yet continual improvement under a bounded deployment footprint remains difficult because repeated parameter updates or growing replay stores can accumulate adaptation state over time. We study fixed-footprint continual adaptation: the deployed adaptation state is kept under a fixed memory budget, while the backbone model is left unchanged and task-specific updates are externalized. We propose InduceKV, a retrieval-based method that stores each selected training prefix as an attention-ready memory entry, consisting of a frozen retrieval key and compact layerwise key-value (KV) payloads that can be appended to the model's self-attention cache. Under a strict memory budget, InduceKV constructs a compact inducing set through bilevel selection: a lightweight calibration is fit for retrieval, while the selected memory balances current-task likelihood, anchor-based retention, and coverage in the frozen retrieval space. Across task-incremental instruction tuning, continual VQA, domain-incremental adaptation, and lifelong multimodal instruction tuning, InduceKV consistently improves over PEFT, MoE, replay, and prompt-retrieval baselines under matched memory budgets. We further report backbone-matched, stage-1 CoIN, compute-matched, and scalability diagnostics, showing that the gains are not due to a stronger backbone, replay alone, or an unbounded candidate pool.

InduceKV: Fixed-Footprint Continual Adaptation of Multimodal LLMs via Inducing KV Memories
Multimodal large language models (MLLMs) are increasingly used for diverse tasks, but teaching them new skills over time is difficult. Typically, this requires updating the model's internal parameters, which can lead to "catastrophic forgetting" of previous knowledge or require an ever-growing amount of storage. This paper introduces InduceKV, a method that allows MLLMs to adapt to new tasks without changing their core parameters. Instead, it stores task-specific information in a compact, external memory that the model can access during its normal operation.

How InduceKV Works

Rather than retraining the model, InduceKV treats continual learning as a memory management problem. When a new task arrives, InduceKV stores selected training prefixes as "attention-ready" memory entries. Each entry pairs a frozen retrieval key, used to look the entry up, with compact layerwise key-value (KV) payloads that can be appended directly to the model's self-attention cache.
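A minimal Python sketch of what one memory entry and a budget-bounded store might look like. The class names `MemoryEntry` and `FixedBudgetMemory`, the list-of-vectors layout, and measuring the budget in cached prefix tokens are illustrative assumptions, not the paper's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One layer's payload: (keys, values), each a list of per-token vectors.
LayerKV = Tuple[List[List[float]], List[List[float]]]


@dataclass
class MemoryEntry:
    """Hypothetical layout of one attention-ready memory entry."""
    retrieval_key: Tuple[float, ...]  # frozen vector, used only for lookup
    layer_kv: Tuple[LayerKV, ...]     # one (K, V) pair per transformer layer

    def num_tokens(self) -> int:
        # Every layer caches the same prefix, so layer 0's K gives the length.
        return len(self.layer_kv[0][0]) if self.layer_kv else 0


@dataclass
class FixedBudgetMemory:
    """Hypothetical store whose total cached prefix length never exceeds `budget_tokens`."""
    budget_tokens: int
    entries: List[MemoryEntry] = field(default_factory=list)

    def used_tokens(self) -> int:
        return sum(e.num_tokens() for e in self.entries)

    def try_add(self, entry: MemoryEntry) -> bool:
        # Reject instead of growing: the deployed state must stay under budget.
        if self.used_tokens() + entry.num_tokens() > self.budget_tokens:
            return False
        self.entries.append(entry)
        return True


def toy_entry(key: List[float], n_tokens: int, n_layers: int = 2, dim: int = 4) -> MemoryEntry:
    """Build a zero-filled entry with the given prefix length (illustration only)."""
    layers = tuple(
        ([[0.0] * dim for _ in range(n_tokens)], [[0.0] * dim for _ in range(n_tokens)])
        for _ in range(n_layers)
    )
    return MemoryEntry(retrieval_key=tuple(key), layer_kv=layers)


mem = FixedBudgetMemory(budget_tokens=10)
print(mem.try_add(toy_entry([1.0, 0.0], 6)))  # True: 6 <= 10
print(mem.try_add(toy_entry([0.0, 1.0], 6)))  # False: 6 + 6 > 10
print(mem.used_tokens())                      # 6
```

The store refuses additions rather than growing, which is the property that keeps the footprint fixed; deciding *which* entries deserve the limited space is the job of the selection step.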
To keep memory within its fixed budget, InduceKV builds a compact "inducing set" of entries through bilevel selection: a lightweight calibration is fit for retrieval, while the choice of which entries to keep balances three competing goals. These are likelihood on the current task, retention of earlier tasks (measured on a small set of "anchor" examples), and coverage of the frozen retrieval space. The coverage term discourages storing near-duplicate entries, so the fixed budget is spent on diverse information rather than repetitive examples.
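The selection trade-off can be illustrated with a simple greedy heuristic. This is a stand-in, not the paper's bilevel procedure: the inner calibration step is skipped (keys are assumed to already live in the frozen retrieval space), the per-candidate `task_gain` and `retention_gain` scores are assumed to be precomputed, and coverage is modeled as a facility-location score over clipped cosine similarities. All function names and weights here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coverage(selected: List[Sequence[float]], pool: List[Sequence[float]]) -> float:
    """Facility-location coverage: each pool point counts its best match in `selected`."""
    if not selected:
        return 0.0
    return sum(max(max(cosine(p, s), 0.0) for s in selected) for p in pool)


def greedy_select(
    keys: List[Sequence[float]],
    task_gain: List[float],
    retention_gain: List[float],
    budget: int,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[int]:
    """Greedily pick up to `budget` candidate indices.

    Hypothetical score per candidate i:
      w_task * task_gain[i] + w_ret * retention_gain[i] + w_cov * (coverage gain of i)
    """
    w_task, w_ret, w_cov = weights
    chosen: List[int] = []
    remaining = set(range(len(keys)))
    while remaining and len(chosen) < budget:
        current = [keys[j] for j in chosen]
        base = coverage(current, keys)

        def score(i: int) -> float:
            cov_gain = coverage(current + [keys[i]], keys) - base
            return w_task * task_gain[i] + w_ret * retention_gain[i] + w_cov * cov_gain

        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        chosen.append(best)
        remaining.remove(best)
    return chosen


keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # candidates 0 and 1 are duplicates
print(greedy_select(keys, [1.0, 1.0, 0.8], [0.0, 0.0, 0.0], budget=2))  # [0, 2]
```

Here candidate 1 duplicates candidate 0, so once candidate 0 is chosen its coverage gain drops to zero and the more distinct candidate 2 wins the second slot despite its lower task gain. Setting the coverage weight to zero would instead pick the duplicate.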

Integration with the Model

InduceKV adapts through retrieval rather than gradient updates to the backbone, whose weights stay frozen; the lightweight retrieval calibration is fit separately. When the model processes a new input, the calibrated retrieval step selects the most relevant memory entries, and their KV payloads are appended to the model's self-attention cache, the same pathway the model uses for its own context tokens. The model can then attend to this task-specific knowledge during generation, adapting its output without modifying its underlying weights.
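The read path can be sketched with single-head attention over plain Python lists. The linear `calib` matrix is a hypothetical stand-in for the paper's lightweight calibration, positional encodings are ignored, and the memory KV is concatenated ahead of the context cache as a stored prefix would be; none of these details are specified in the summary above.

```python
import math
from typing import List, Sequence

Vec = List[float]


def matvec(m: List[Vec], v: Sequence[float]) -> Vec:
    return [sum(r * x for r, x in zip(row, v)) for row in m]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb: Vec, calib: List[Vec], memory_keys: List[Vec], k: int) -> List[int]:
    """Indices of the k memory keys most similar to the calibrated query."""
    q = matvec(calib, query_emb)  # hypothetical linear calibration
    ranked = sorted(range(len(memory_keys)), key=lambda i: -cosine(q, memory_keys[i]))
    return ranked[:k]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def attend_with_memory(q: Vec, ctx_k: List[Vec], ctx_v: List[Vec],
                       mem_k: List[Vec], mem_v: List[Vec]) -> Vec:
    # Model weights are untouched; only the set of attended positions grows.
    return attend(q, mem_k + ctx_k, mem_v + ctx_v)


# Usage: the calibration swaps coordinates, so the query lands near memory key 0.
memory_keys = [[1.0, 0.0], [0.0, 1.0]]
calib = [[0.0, 1.0], [1.0, 0.0]]
print(retrieve([0.1, 0.9], calib, memory_keys, k=1))  # [0]

q, ctx_k, ctx_v = [1.0, 0.0], [[0.0, 1.0]], [[0.0, 0.0]]
print(attend(q, ctx_k, ctx_v))                                         # [0.0, 0.0]
print(attend_with_memory(q, ctx_k, ctx_v, [[1.0, 0.0]], [[1.0, 1.0]]))  # ~[0.67, 0.67]
```

The same query produces a different output once the retrieved payload is in the cache, which is the sense in which behavior adapts while the weights stay fixed.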

Performance and Results

The researchers evaluated InduceKV in four settings: task-incremental instruction tuning, continual visual question answering (VQA), domain-incremental adaptation, and lifelong multimodal instruction tuning. Under matched memory budgets, InduceKV consistently outperformed Parameter-Efficient Fine-Tuning (PEFT), Mixture-of-Experts (MoE), replay, and prompt-retrieval baselines. The authors also ran backbone-matched, compute-matched, stage-1 CoIN, and scalability diagnostics, which indicate that the gains do not come from a stronger backbone, from replay alone, or from an unbounded pool of candidate memories.

Key Considerations

The primary advantage of InduceKV is its fixed footprint: the deployed adaptation state stays within a fixed memory budget regardless of how many tasks the model learns. The method does add a small amount of inference overhead, namely an extra pass to compute retrieval keys and slightly larger attention matrices during the prefill stage, because the retrieved KV payloads lengthen the attended sequence. In exchange, it avoids the complexities of parameter-space updates, which makes it a practical option for deploying MLLMs where memory and storage are strictly limited.
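The footprint and prefill overhead can be estimated with back-of-the-envelope arithmetic. The model shape below (32 layers, 8 KV heads, head dimension 128, fp16), the 64-token prefix length, the 1 GiB budget, and the 4 retrieved entries are hypothetical numbers chosen for illustration, not settings reported for InduceKV. The estimate counts only KV payloads, not retrieval keys, and assumes dense (unmasked) prefill scores.

```python
def kv_bytes(n_layers: int, n_kv_heads: int, head_dim: int, n_tokens: int,
             bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache K and V for n_tokens across all layers."""
    # Factor 2: one key tensor and one value tensor per layer.
    return n_layers * 2 * n_kv_heads * head_dim * n_tokens * bytes_per_elem


def entries_within_budget(budget_bytes: int, per_entry_bytes: int) -> int:
    """How many equal-sized entries fit in the fixed budget (independent of task count)."""
    return budget_bytes // per_entry_bytes


def prefill_score_ratio(ctx_tokens: int, mem_tokens: int) -> float:
    """Dense prefill score size with memory vs. without, per head.

    Each of the T context queries scores T keys without memory and T + M keys
    with M memory tokens appended: the matrix grows from T x T to T x (T + M).
    """
    return (ctx_tokens + mem_tokens) / ctx_tokens


# Hypothetical model shape and budget, for illustration only.
per_entry = kv_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=64)
print(per_entry)                                                 # 8388608 (8 MiB in fp16)
print(entries_within_budget(2**30, per_entry))                   # 128 entries in 1 GiB
print(prefill_score_ratio(ctx_tokens=1024, mem_tokens=4 * 64))   # 1.25
```

Under these assumed numbers, retrieving four 64-token entries makes the dense prefill score matrix 25% larger, while the stored budget stays at 128 entries no matter how many tasks arrive.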
