Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting
Runze Xu, Arpit Garg, Hemanth Saratchandran, Simon Lucey
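AI Summary: This work addresses catastrophic forgetting in LoRA fine-tuning when the target distribution differs substantially from the original training distribution. It proposes a replay-free, output-space regularization method that masks the target token and applies KL regularization only over the non-target vocabulary, improving the balance between new learning and forgetting without adding inference overhead.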
Comments: In submission
Low-Rank Adaptation (LoRA) has become one of the most widely used fine-tuning mechanisms for adapting large language models to new domains, tasks, and users. Yet adaptation performance alone can obscure an important failure mode: LoRA updates may improve performance on the target distribution while degrading prior capabilities learned during pretraining and alignment. We show that this forgetting becomes especially severe when the adaptation distribution differs substantially from the model's original training or alignment distributions. The challenge is amplified in practical settings, where the original training and alignment data are typically unavailable. Motivated by this constraint, we study how LoRA-based adaptation balances new learning against forgetting in a replay-free setting, and introduce a simple output-space regularizer that can be added directly to existing training pipelines. Our method removes the ground-truth token from both the base and adapted model distributions, renormalizes the remaining probabilities, and applies KL regularization only over the non-target vocabulary. This preserves the base model's relative preferences among alternative tokens without directly opposing the cross-entropy signal required for adaptation. Because the regularizer acts only at the loss level, it requires no replay data, architectural changes, adapter redesign, or inference-time overhead, and can be applied directly to existing LoRA variants. Across all LoRA variants and backbones tested, our method improves the frontier between new learning and forgetting when the adaptation distribution differs substantially from the base model's original training or alignment distributions, suggesting a broadly applicable route toward more reliable LLM updating.
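The masking step described in the abstract can be illustrated with a small, dependency-free Python sketch for a single token position with target token y. Each distribution p is renormalized over the non-target vocabulary as q_i = p_i / (1 - p_y) for i != y, and the penalty is the KL divergence between the two renormalized distributions. The function names, the weight `lam`, and the choice of KL(base || adapted) rather than the reverse direction are our assumptions, since the abstract does not specify them. In a real LoRA training loop the same computation would run on batched tensors, with the frozen base model's logits computed without gradients.

```python
import math
from typing import Sequence


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cross_entropy(adapted_logits: Sequence[float], target: int) -> float:
    """Standard next-token cross-entropy: -log softmax(adapted_logits)[target]."""
    return _logsumexp(adapted_logits) - adapted_logits[target]


def non_target_kl(
    base_logits: Sequence[float],
    adapted_logits: Sequence[float],
    target: int,
) -> float:
    """KL(base || adapted) over the vocabulary with the target token removed.

    Dropping the target and renormalizing p_i / (1 - p_target) equals a softmax
    over the remaining logits, so log q_i = z_i - logsumexp_{j != target}(z_j).
    Assumes a vocabulary of at least two tokens.
    """
    keep = [i for i in range(len(base_logits)) if i != target]
    lse_base = _logsumexp([base_logits[i] for i in keep])
    lse_adapted = _logsumexp([adapted_logits[i] for i in keep])
    kl = 0.0
    for i in keep:
        log_q_base = base_logits[i] - lse_base
        log_q_adapted = adapted_logits[i] - lse_adapted
        kl += math.exp(log_q_base) * (log_q_base - log_q_adapted)
    return kl


def regularized_loss(
    base_logits: Sequence[float],
    adapted_logits: Sequence[float],
    target: int,
    lam: float = 0.1,  # hypothetical regularization weight
) -> float:
    """Cross-entropy on the target plus a weighted non-target KL penalty."""
    return cross_entropy(adapted_logits, target) + lam * non_target_kl(
        base_logits, adapted_logits, target
    )


if __name__ == "__main__":
    base = [2.0, 1.0, 0.5, -1.0]  # token 0 is the ground-truth target
    # Raising only the target logit lowers cross-entropy but leaves the penalty untouched.
    sharper = [5.0, 1.0, 0.5, -1.0]
    print(non_target_kl(base, sharper, target=0))  # 0.0
    print(cross_entropy(sharper, 0) < cross_entropy(base, 0))  # True
    # Reordering the alternative tokens is what the penalty responds to.
    reordered = [2.0, -1.0, 0.5, 1.0]
    print(non_target_kl(base, reordered, target=0))
```

Because the target logit never enters the renormalized distributions, pushing it up (as cross-entropy training does) leaves the penalty at zero, while changing the relative ranking of the alternative tokens makes it positive. This is the sense in which the regularizer preserves the base model's preferences among alternatives without directly opposing the adaptation signal.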