AI-Assisted Variance Reduction in Randomized Experiments
David Arbour, Eli Ben-Michael, Avi Feller, Apoorva Lal, Lo-Hua Yuan
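AI summary: Proposes including AI predictions as covariates in standard regression adjustment to reduce variance in randomized experiments. The approach has a "do no harm" property, and simulations and three empirical applications confirm the efficiency gains.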
Comments: camera-ready version for KDD 2026
Generative AI and large language models can produce realistic predictions of human behavior from rich, unstructured inputs with little to no task-specific training data. Recent work uses these "digital twin" predictions to supplement human responses in surveys and experiments. We study the special case of using AI-generated predictions to reduce variance in randomized experiments. We argue that doing so requires no new estimators and that researchers can simply include AI predictions as covariates in standard regression adjustment, analogous to adjusting for a prognostic score. A benefit of this approach is a "do no harm" property whereby the adjusted estimator reverts to the unadjusted difference in means when predictions are uninformative. Other methods, such as variants of prediction-powered inference, do not have this guarantee. We provide implementation guidance, including how to obtain continuous scores from discrete LLM outputs and how to use LLMs to featurize unstructured inputs as auxiliary covariates. We demonstrate these ideas in simulations and three empirical applications: a survey mega-study, an email marketing A/B test, and a large-scale technology platform experiment. Overall, efficiency gains are real if modest, with greater benefits in studies that contain substantial text and other unstructured data. We also confirm the do-no-harm property empirically. Given these gains and limited costs, we recommend adjusting for AI-generated predictions as a regular empirical practice.
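The central idea is to treat the AI prediction as just another covariate in a regression-adjusted treatment-effect estimate. The simulation below is a minimal sketch of that idea, built on several assumptions. First, "standard regression adjustment" is taken to be the interacted OLS estimator of Lin (2013). Second, discrete LLM outputs are turned into a continuous score by averaging k repeated yes/no answers, which is only one possible approach. Third, no LLM is called: the "AI prediction" is simulated. The paper may use a different specification. The helper names (`lin_adjusted`, `score_from_samples`, `simulate`), the data-generating process, and all parameter values are hypothetical.

```python
import numpy as np


def diff_in_means(y, z):
    # Unadjusted estimator: mean outcome of treated units minus that of controls.
    return y[z == 1].mean() - y[z == 0].mean()


def lin_adjusted(y, z, x):
    # Interacted regression adjustment (Lin, 2013): regress y on an intercept,
    # the treatment indicator z, covariates centered at their full-sample means,
    # and z-by-covariate interactions. The coefficient on z estimates the ATE.
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    xc = x - x.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), z, xc, z[:, None] * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def score_from_samples(labels):
    # Hypothetical helper: average k discrete LLM answers per unit
    # (1 = "yes", 0 = "no") into a continuous score in [0, 1].
    return np.asarray(labels, dtype=float).mean(axis=1)


def simulate(n=500, reps=1000, k=10, informative=True, seed=0):
    # Hypothetical data-generating process; assumes even n.
    # Returns the Monte Carlo variances of the two estimators.
    rng = np.random.default_rng(seed)
    est_dm, est_adj = [], []
    for _ in range(reps):
        latent = rng.normal(size=n)                      # prognostic signal, unseen by the analyst
        z = rng.permutation(np.repeat([0, 1], n // 2))   # complete randomization, n/2 treated
        y = 1.0 * z + 2.0 * latent + rng.normal(size=n)  # true ATE = 1
        # Simulated "AI prediction": k yes/no draws whose success probability
        # tracks the latent signal (informative) or is a coin flip (uninformative).
        p = 1.0 / (1.0 + np.exp(-latent)) if informative else np.full(n, 0.5)
        x = score_from_samples(rng.random((n, k)) < p[:, None])
        est_dm.append(diff_in_means(y, z))
        est_adj.append(lin_adjusted(y, z, x))
    return np.var(est_dm), np.var(est_adj)


if __name__ == "__main__":
    for informative in (True, False):
        v_dm, v_adj = simulate(informative=informative)
        print(f"informative={informative}: Var(adjusted) / Var(diff in means) = {v_adj / v_dm:.2f}")
```

The AI score enters through an ordinary least-squares fit. A score that carries no information about the outcome therefore receives a coefficient near zero, and the adjusted estimate stays close to the difference in means. An informative score instead absorbs part of the outcome variance, which shrinks the variance of the treatment-effect estimate. This mirrors the "do no harm" behavior described above.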