- Journal ref: IEEE International Conference on Language Modeling (COLM), 2025
- Comments: arXiv admin note: substantial text overlap with arXiv:2312.01523
Abstract
Recent advances in instruction fine-tuning inject noise into embeddings, with NEFTune (Jain et al., 2024) setting the benchmark using uniform noise. NEFTune's experiments found that uniform noise outperforms Gaussian noise, but the reason remains unclear. This paper addresses that question with a thorough theoretical and empirical analysis, which indicates that the two noise types perform comparably. We also introduce SymNoise, a new fine-tuning method for language models that adds symmetric noise to the embeddings. The method aims to improve the model's function by regulating its local curvature more stringently, and it outperforms the current method, NEFTune. When LLaMA-2-7B is fine-tuned on Alpaca, standard techniques score 29.79% on AlpacaEval. SymNoise raises this score to 69.04%, a 6.7% relative improvement (4.35 percentage points) over the state-of-the-art method, NEFTune (64.69%). When tested on various models and on stronger baseline instruction datasets such as Evol-Instruct, ShareGPT, and OpenPlatypus, SymNoise consistently outperforms NEFTune. The current literature, including NEFTune, has underscored the need for more in-depth research into noise-based strategies for fine-tuning language models. SymNoise is a further step in this direction, showing notable improvement over the existing state-of-the-art method.
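To make the idea of noisy embeddings concrete, the following is a minimal sketch, not the authors' implementation. It assumes the NEFTune convention of scaling the noise by alpha / sqrt(L * d) for a sequence of L tokens with d-dimensional embeddings. The function name `noisy_embeddings`, its parameters, and in particular the "symmetric" branch (random signs of +1 or -1) are illustrative assumptions; the abstract does not specify how SymNoise constructs its symmetric noise.

```python
import math
import random


def noisy_embeddings(embeddings, alpha=5.0, kind="uniform", rng=None):
    """Return a copy of `embeddings` (L tokens x d dims) with scaled noise added.

    The alpha / sqrt(L * d) scaling follows the NEFTune convention.
    The "symmetric" branch (random +/-1 signs) is an assumption made for
    illustration only; it is not taken from the SymNoise paper.
    """
    rng = rng or random.Random()
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)

    def sample():
        if kind == "uniform":
            return rng.uniform(-1.0, 1.0)    # NEFTune-style uniform noise
        if kind == "gaussian":
            return rng.gauss(0.0, 1.0)       # Gaussian alternative
        if kind == "symmetric":
            return rng.choice((-1.0, 1.0))   # assumed: random signs only
        raise ValueError(f"unknown noise kind: {kind}")

    # Build a new nested list so the input embeddings are left unchanged.
    return [[x + scale * sample() for x in row] for row in embeddings]


if __name__ == "__main__":
    toy = [[0.0] * 4 for _ in range(4)]      # L = 4 tokens, d = 4 dims
    for kind in ("uniform", "gaussian", "symmetric"):
        print(kind, noisy_embeddings(toy, alpha=4.0, kind=kind)[0])
```

In an actual fine-tuning run, noise of this kind would be applied to the embedding layer's output during training only, and the clean embeddings would be used at inference time.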