STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations
Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye, Amirali Abdullah, Bernhard Schölkopf, Zhijing Jin
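AI summary: Proposes the STRIDE framework, which casts training data attribution as a sparse recovery problem in the style of compressive sensing. Lightweight "steering operators" in activation space simulate the influence of data subsets, enabling efficient and accurate attribution for LLM pre-training.
Details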
- Comments
- project page: https://stride-tda.github.io/
Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold standard for TDA relies on causal interventions, observing how a model changes when data is added or removed, but repeated retraining is computationally challenging for Large Language Models (LLMs). Consequently, most approaches approximate this effect in the parameter space using gradients. However, tracking gradients across billions of parameters is not only prohibitively expensive but also relies on local approximations. In this work, we propose a shift: rather than estimating parameter changes, we model the functional effect of training data in the activation space. We introduce STRIDE (Steering-based Training Data Influence Decomposition), a framework that formulates TDA as a sparse recovery problem in the spirit of compressive sensing. STRIDE learns lightweight "steering operators" that mimic the behavioral shift caused by training on data subsets. By measuring how these operators perturb test predictions, we recover the influence of individual training examples via sparse linear decomposition. STRIDE achieves state-of-the-art performance for LLM pre-training attribution while being an order of magnitude ($13\times$) faster than prior methods. We further validate its practical utility through downstream applications including data selection, data contamination, and qualitative analysis.
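The sparse-recovery idea in the abstract can be illustrated with a toy sketch. This is not the STRIDE implementation: the steering operators are not modeled at all, and the per-subset shift in a test prediction is simulated as a noisy linear sum of hidden per-example influences. The names `ista`, `membership`, `influence_true`, and `shift`, the problem sizes, and the choice of an L1-regularized least-squares solver (ISTA) are all assumptions made for illustration; the abstract only states that influences are recovered "via sparse linear decomposition".

```python
import numpy as np


def ista(A, y, lam, n_iter=2000):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))  # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x


rng = np.random.default_rng(0)
n_train, n_subsets, k = 200, 100, 5  # hypothetical sizes

# membership[j, i] = 1 if training example i belongs to random subset j.
membership = rng.integers(0, 2, size=(n_subsets, n_train)).astype(float)

# Hidden ground truth: only k training examples influence the test prediction.
influence_true = np.zeros(n_train)
support = rng.choice(n_train, size=k, replace=False)
influence_true[support] = rng.uniform(1.0, 2.0, size=k)

# Simulated measurement: shift in the test prediction for each subset.
# In the paper's setting this would come from applying a learned steering
# operator; here it is a stand-in linear model with small Gaussian noise.
shift = membership @ influence_true + 0.01 * rng.standard_normal(n_subsets)

# Center columns and measurements so the 0/1 design behaves like a +/-0.5 one.
A = membership - membership.mean(axis=0)
y = shift - shift.mean()

influence_hat = ista(A, y, lam=0.5)
top = np.argsort(-np.abs(influence_hat))[:k]
print("true support:     ", sorted(support.tolist()))
print("recovered support:", sorted(top.tolist()))
```

The point of the sketch is the measurement budget: 100 subset-level measurements are used to estimate 200 per-example influences, which is only feasible because the influence vector is assumed sparse.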