arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.20435 2026-06-19 econ.EM New submission

Choosing A Headline Estimand from Matching, DID, and Hybrid Designs: A Minimax-Regret Approach

Yechan Park, Yuya Sasaki

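AI summary: For estimating causal effects with panel data, this paper shows that the estimand of the hybrid design (DIDM) lies between those of matching (M) and difference-in-differences (DID) and is the minimax-regret choice under a broad class of loss functions. It recommends reporting DIDM as the headline estimate, with matching and DID as bounds.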

English abstract

Researchers using panel data to estimate causal effects routinely choose among three approaches to using past outcomes: difference-in-differences (DID), conditioning on lagged outcomes (matching, M), and a hybrid that does both (DIDM). The corresponding identifying assumptions are non-nested, leaving little guidance on which to report. We give conditions under which the corresponding estimands are ordered, with DIDM bracketed between matching and DID. This makes DIDM the minimax-regret choice among the three under a broad class of loss functions. We recommend reporting DIDM as the headline estimate, with matching and DID as bounds. We illustrate in applications.
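
The bracketing argument can be illustrated with a toy calculation. The sketch below is not the paper's formal setup: it assumes that exactly one of the three estimands equals the truth, uses absolute-error loss as one member of the broad class of loss functions, and uses made-up numbers for the three estimates. The function names are hypothetical.

```python
def max_regret(report, candidates, loss=lambda a, theta: abs(a - theta)):
    """Worst-case regret of reporting `report` when the truth is one of `candidates`."""
    worst = 0.0
    for theta in candidates:
        # Loss of the best possible report if theta were the truth.
        best = min(loss(a, theta) for a in candidates)
        worst = max(worst, loss(report, theta) - best)
    return worst


def minimax_regret_choice(estimates):
    """Return the label whose estimate has the smallest worst-case regret."""
    values = list(estimates.values())
    return min(estimates, key=lambda name: max_regret(estimates[name], values))


# Hypothetical estimates with DIDM bracketed between matching (M) and DID.
estimates = {"M": 0.8, "DIDM": 1.1, "DID": 1.6}
regrets = {name: max_regret(v, list(estimates.values())) for name, v in estimates.items()}
choice = minimax_regret_choice(estimates)
print(regrets, choice)
```

In this toy setting the middle value can be wrong by at most the distance to the farther endpoint, while either endpoint can be wrong by the full width of the bracket, so the bracketed estimate has the smallest worst-case regret.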

2606.20286 2026-06-19 econ.EM New submission

Institutions, Inputs, and Agricultural Growth in China: Revisiting Several Controversies, 1949–1986

Jiyuan Lyu

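AI summary: Using a unified dataset and econometric methods, this paper revisits four controversies about China's agricultural growth: the price scissors, heavy industrial investment, the 1978 reforms, and the effect of decollectivization on irrigation.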

English abstract

Scholarly debates on China's agricultural growth between 1949 and 1986 continue to differ over the extent of the price scissors, the effect of heavy industrial investment, the role of the 1978 reforms, and the impact of decollectivization on irrigation. Using a single dataset and complementary econometric methods, this paper addresses each of these controversies. The results show that 1952–1957 was the only net extraction period across all three channels, after which the state channelled a net inflow of about 168.6 billion yuan into agriculture via fiscal and credit instruments. Heavy industrial investment exerted a significant positive lagged effect on agriculture, while the contemporaneous negative correlation stemmed from the zero-sum nature of the investment share indicator. The input-output elasticity shifted abruptly in 1970, and collective agricultural loans broke in 1971, both pointing to the rectification effects of the North China Agricultural Conference. Disaster prevention capacity fell from 0.70 under the collective era to 0.53 after household contracting, mainly because the collective maintenance system collapsed rather than because state investment declined. After 1979 the price elasticity of agricultural supply approached zero, suggesting that the 1979 procurement price increase acted more like a one-off recalibration than a sustained marginal incentive.

2606.19972 2026-06-19 econ.EM New submission

Biodiversity Media Narratives and Stock Market Performance: Evidence from Europe

Andres Azqueta-Gavaldon, Ben Jabeur Sami, Leila Hedhili

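AI summary: Using the GDELT Global Knowledge Graph, the study builds biodiversity media risk indicators for France, Germany, Italy, and Spain over 2015-2025. Panel Granger causality tests and an augmented inverse probability weighting event study show that biodiversity risk significantly lowers stock prices, and that the positive effects of low-risk episodes exceed the negative effects of high-risk episodes.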

English abstract

This study constructs novel biodiversity-related media risk indicators for France, Germany, Italy, and Spain over 2015-2025, capturing media attention to biodiversity threats using the GDELT Global Knowledge Graph. Using panel Granger causality tests and an augmented inverse probability weighting (AIPW) event-study design, we find highly significant evidence that biodiversity risk reduces stock prices, with effects peaking between 3 and 10 months after a shock. Moreover, we uncover a marked asymmetry whereby the positive effects of low biodiversity risk episodes outweigh the negative effects of high-risk episodes. Results are robust across quantiles of the return distribution and hold when controlling for European equity market volatility and economic policy uncertainty. Our findings provide the first evidence that biodiversity media narratives drive stock market valuations in Europe.
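
For readers unfamiliar with AIPW, the standard estimator of an average treatment effect can be written in a few lines. This is a generic textbook sketch, not the authors' event-study implementation: it assumes a binary treatment indicator (for example, a high-risk episode) and takes the propensity scores and outcome-model predictions as given inputs. All data values are hypothetical.

```python
def aipw_ate(y, t, e, m1, m0):
    """Augmented inverse probability weighting estimate of the average treatment effect.

    y: observed outcomes; t: 0/1 treatment indicators;
    e: propensity scores P(T=1 | X), strictly between 0 and 1;
    m1, m0: outcome-model predictions under treatment and under control.
    """
    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        total += (
            m1i - m0i                           # outcome-model contrast
            + ti * (yi - m1i) / ei              # weighted residual, treated units
            - (1 - ti) * (yi - m0i) / (1 - ei)  # weighted residual, control units
        )
    return total / len(y)


# Hypothetical data: four units, two treated, known propensity of 0.5.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]

# With outcome models that fit perfectly, the residual terms vanish.
ate_model = aipw_ate(y, t, e, m1=[3.0, 3.0, 4.0, 4.0], m0=[1.0, 1.0, 2.0, 2.0])
# With outcome models set to zero, the estimator reduces to plain IPW.
ate_ipw = aipw_ate(y, t, e, m1=[0.0] * 4, m0=[0.0] * 4)
print(ate_model, ate_ipw)
```

The estimator is consistent if either the propensity model or the outcome model is correctly specified, which is the usual motivation for preferring it to either component alone.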

2606.20240 2026-06-19 econ.EM stat.AP New submission

Two-Sample IV: Efficient Two-Step Estimation and Tests for Overidentification and Weak-Instruments

Fatima Kasenally, Ruoxi Guan, Frank Windmeijer

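AI summary: For two-sample IV estimation, the paper proposes an efficient two-step estimator and an overidentification test that are robust to heteroskedasticity and sample heterogeneity and require only summary statistics from linear regressions. It also extends weak-instrument tests.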

English abstract

Two-sample IV is a popular estimation method when the outcome and treatment variables are available in different samples, whereas instruments are available in both samples. The standard estimator is the two-sample two-stage least squares estimator, which is efficient under homoskedasticity and homogeneity of the samples. We develop a robust two-step procedure for efficient estimation under general heteroskedasticity and heterogeneity of the samples, and propose a related two-sample Hansen overidentification test. A key feature of our approach is that only summary statistics from the linear regressions of the reduced form and first stage in the two samples are needed. These are six objects: the estimated coefficient vectors and the homoskedastic and heteroskedasticity-robust estimated variance matrices. We further show that the first-stage F-statistic in the treatment sample can be used as a test for weak instruments in the standard way under homoskedasticity and homogeneity, with the relative bias here being a proportional bias. We propose an extension of the effective F-statistic of Montiel-Olea and Pflueger (2013) for the heteroskedastic case, following the generalization in Windmeijer (2025). We illustrate the estimators and tests in an application studying the effect of education on voting behavior from Marshall (2019), with cluster-robust inference.
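
The idea of estimating from summary statistics alone can be sketched with a generic two-step minimum-distance calculation. This is not the authors' estimator: it assumes a single treatment variable, independent samples, and diagonal variance matrices (one variance per instrument), all of which are simplifications introduced here. The function name and inputs are hypothetical.

```python
def two_sample_md(pi_y, var_y, pi_x, var_x):
    """Two-step minimum-distance estimate of beta from per-instrument summary statistics.

    pi_y, var_y: reduced-form coefficients and their variances (outcome sample).
    pi_x, var_x: first-stage coefficients and their variances (treatment sample).
    Simplifying assumptions: independent samples and diagonal variance matrices.
    """
    def weighted_beta(weights):
        num = sum(w * px * py for w, px, py in zip(weights, pi_x, pi_y))
        den = sum(w * px * px for w, px in zip(weights, pi_x))
        return num / den

    # Step 1: weight by the reduced-form precision only.
    beta1 = weighted_beta([1.0 / vy for vy in var_y])
    # Step 2: reweight by the variance of (pi_y - beta * pi_x), evaluated at beta1.
    w2 = [1.0 / (vy + beta1 ** 2 * vx) for vy, vx in zip(var_y, var_x)]
    beta2 = weighted_beta(w2)
    # Hansen-type J statistic built from the step-2 weights.
    j_stat = sum(w * (py - beta2 * px) ** 2 for w, px, py in zip(w2, pi_x, pi_y))
    return beta2, j_stat


# Hypothetical summary statistics for three instruments; here pi_y = 2 * pi_x exactly.
beta, j_stat = two_sample_md(
    pi_y=[1.0, 2.0, 4.0], var_y=[0.04, 0.09, 0.16],
    pi_x=[0.5, 1.0, 2.0], var_x=[0.01, 0.02, 0.05],
)
print(beta, j_stat)
```

Under the usual asymptotics the J statistic would be compared with a chi-square distribution with degrees of freedom equal to the number of instruments minus one. A large value signals that the instruments do not agree on a common beta.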

2606.19599 2026-06-19 eess.SY cs.SY econ.EM Cross-list

Ramping Procurement and Bid-Cost Recovery in Real-Time Market

Cong Chen, Valentina Norambuena, Lang Tong

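AI summary: The paper studies ramping procurement co-optimized with economic dispatch under net-demand uncertainty, analyzes single-interval and multi-interval co-optimization designs, develops analytical frameworks for evaluating generator profits, consumer payments, bid cost recovery, and operational efficiency, and compares three pricing mechanisms.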

Comments 4 figures

English abstract

We study ramping procurement co-optimized with economic dispatch under net-demand uncertainty. We examine two flexible ramp product designs implemented by grid operators: single-interval and multi-interval co-optimization. Both rely on rolling-window stochastic optimization with binding and advisory interval decisions. We develop analytical frameworks to evaluate generator profits, consumer payments, bid cost recovery (BCR), and operational efficiency. In particular, net-demand uncertainty may lead to generator under-compensation, requiring discriminatory BCR. While operational efficiency is invariant to energy and ramp prices, producer profits and consumer payments depend critically on pricing. We examine locational marginal pricing (LMP) and two uniform pricing schemes: maximum dispatch cost pricing (MDCP) and maximum temporal locational marginal pricing (MTLMP). With out-of-market BCR, LMP yields discriminatory energy prices, whereas MDCP eliminates BCR and MTLMP does so in most cases. This property enables us to establish truthful bidding incentives for price-taking generators under MDCP. Our analysis highlights trade-offs between single- and multi-interval co-optimization and pricing designs: single-interval energy-ramp co-optimization is advantageous under high forecast uncertainty and moderate ramping requirements, whereas multi-interval co-optimization is superior when net-demand forecasts are relatively accurate and ramp needs are challenging. Empirical results on CAISO and ERCOT data show that MDCP and MTLMP increase producer profits with negligible BCR, albeit at the expense of higher consumer payments relative to LMP.

2606.17165 2026-06-19 stat.ME cs.AI econ.EM math.ST stat.TH Cross-list

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

Joel Persson, Mårten Schultzberg, Sebastian Ankargren

Affiliation: Spotify USA, Inc.

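AI summary: The paper proposes a surrogacy framework, shows that calibrated LLM outputs identify the average treatment effect under conditions weaker than distributional equivalence, and analyzes the bias and variance introduced by LLM stochasticity.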

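AI Chinese abstract (translated)

Organizations and researchers are increasingly interested in using large language models (LLMs) in place of human participants in A/B tests, in the hope of running experiments faster and at lower cost. We study when a treatment effect estimated on LLM outcomes can recover the effect measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid, but it is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs. The framework shows that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. When these conditions fail, the effect of interest is only partially identified; we provide diagnostics that can falsify surrogacy on historical experiments and give a bound on the worst-case bias under limited overlap. We further show that the stochasticity inherent to LLMs introduces bias and variance, but that using the average of multiple draws as the surrogate mitigates both. We demonstrate the methods and theory in simulations and in an application to A/B tests on Upworthy headlines. A central takeaway of our work is that the validity of LLM outcomes as surrogates can only be falsified for past treatments, not verified for new ones, so human experiments remain indispensable for novel interventions. We discuss the role of LLM choice, prompting, and temperature as design variables, and how to size human experiments for validation.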

English abstract

Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes can recover the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs, showing that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. We present a falsification test for surrogacy and a bound on the worst-case bias from limited overlap between the LLM and human samples. We further show that the stochasticity inherent to LLMs can weaken surrogacy for identification while also introducing bias and variance during estimation, but that using an average over multiple LLM draws per unit as the surrogate mitigates these issues. Simulations validate the results, and an empirical application to A/B tests on Upworthy headlines shows that raw LLM predictions recover only 39% of the human treatment effect while nonparametric calibration closes the gap. A central takeaway is that A/B testing on LLMs yields correct results only by assumption, whereas A/B testing on humans is correct by design, and that the required assumptions are hardest to justify precisely where A/B testing on LLMs promises the greatest benefit. We discuss the role of LLM choice, prompting, and temperature as design variables, the compounded challenge posed by long-term outcomes, and how to size human pilot studies for validation.
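
The calibrate-then-compare idea can be sketched as follows. This is an illustration under assumptions made here, not the authors' procedure: LLM scores are assumed to lie in [0, 1], a simple binned mean stands in for the nonparametric calibration the abstract mentions, and the calibration data (paired LLM scores and human outcomes from past experiments) are made up. Function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_calibration(llm_scores, human_outcomes, n_bins=5):
    """Binned estimate of E[human outcome | LLM score], for scores in [0, 1]."""
    by_bin = defaultdict(list)
    for s, y in zip(llm_scores, human_outcomes):
        by_bin[min(int(s * n_bins), n_bins - 1)].append(y)
    table = {b: mean(ys) for b, ys in by_bin.items()}
    overall = mean(human_outcomes)  # fallback for bins unseen in calibration data

    def calibrate(s):
        return table.get(min(int(s * n_bins), n_bins - 1), overall)

    return calibrate


def calibrated_ate(draws_treated, draws_control, calibrate):
    """Difference in mean calibrated outcomes; each unit carries several LLM draws."""
    # Averaging the draws per unit first gives a less noisy surrogate.
    treated = [calibrate(mean(d)) for d in draws_treated]
    control = [calibrate(mean(d)) for d in draws_control]
    return mean(treated) - mean(control)


# Hypothetical calibration sample: per-unit average LLM scores paired with human outcomes.
calibrate = fit_calibration(
    llm_scores=[0.10, 0.15, 0.50, 0.55, 0.90, 0.95],
    human_outcomes=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
)
# Hypothetical new experiment: two units per arm, two LLM draws per unit.
ate = calibrated_ate(
    draws_treated=[[0.90, 0.95], [0.50, 0.60]],
    draws_control=[[0.10, 0.20], [0.50, 0.50]],
    calibrate=calibrate,
)
print(ate)
```

The calibration map is only as good as the overlap between the scores seen in past experiments and those produced in the new one, which is where the abstract's worst-case bias bound becomes relevant.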

2412.17470 2026-06-19 math.ST econ.EM stat.ME stat.TH Version update

A Necessary and Sufficient Condition for Size Controllability of Heteroskedasticity Robust Test Statistics

Benedikt M. Pötscher, David Preinerstorfer

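AI summary: For testing a single restriction in regression models, the paper gives a necessary and sufficient condition for size controllability of heteroskedasticity-robust test statistics, improving on existing results that give only a sufficient condition.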

Comments Clarification in Footnote 15 added

English abstract

We revisit size controllability results in Pötscher and Preinerstorfer (2025) concerning heteroskedasticity robust test statistics in regression models. For the special, but important, case of testing a single restriction (e.g., a zero restriction on a single coefficient), we provide a necessary and sufficient condition for size controllability, whereas the condition in Pötscher and Preinerstorfer (2025) is, in general, only sufficient (even in the case of testing a single restriction).

2603.06820 2026-06-19 econ.EM stat.OT Version update

Hippocratic Utility and Status Quo Bias

Tomasz Strzalecki

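AI summary: Using a simple example, the paper shows that a utility function that weighs lives lost more heavily than lives saved has a much more limited scope of applicability than it first appears.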

English abstract

A utility function has been proposed that values lives that are lost more than those that are saved. I do not dispute the ethical motivation behind this kind of asymmetry. However, I show with a simple example that the scope of applicability of such a decision criterion is considerably more limited than it may first appear.

2512.02203 2026-06-19 econ.EM stat.AP Version update

Statistical Inference in Large Multi-way Networks

Lucas Resende, Guillaume Lecué, Lionel Wilner, Philippe Choné

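AI summary: The paper proposes a classification-based method for estimating structural parameters in multi-way networks. It needs no assumptions on the number or structure of fixed effects, avoids the incidental parameter problem, and in sparse networks is faster than PPML with more reliable confidence intervals. It is applied to the causal effect of a health care policy reform in France.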

Comments Working paper

English abstract

We propose a new method to estimate structural parameters in multi-way networks while controlling for rich structures of fixed effects. The method is based on a series of classification tasks and is agnostic to both the number and structure of fixed effects. In contrast to full maximum likelihood approaches, our estimator does not suffer from the incidental parameter problem. For sparsely connected networks, it is also computationally faster than PPML. We provide empirical evidence that our estimator yields more reliable confidence intervals than PPML and its bias-correction strategies. These improvements hold even under model misspecification and are more pronounced in sparse settings. While PPML remains competitive in dense, low-dimensional data, our approach offers a robust alternative for multi-way models that scales efficiently with sparsity. The method is applied to study the causal effect of a policy reform on spatial accessibility to health care in France.

2502.06866 2026-06-19 cs.LG cs.AI econ.EM stat.AP stat.ML Version update

Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies

Arun Kumar Selvaraj, Tanay Panat, Rohitash Chandra

Affiliations: Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics; Centre for Artificial Intelligence and Innovation; Pingla Institute

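AI summary: The paper proposes the Global Ease of Living Index, which combines socio-economic and infrastructural factors, uses machine learning to handle missing data, reduces dimensionality with principal component analysis and factor analysis, and gives policymakers an actionable tool for improving quality of life.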

English abstract

The drastic changes in the global economy, geopolitical conditions, and disruptions such as the COVID-19 pandemic have impacted the cost of living and quality of life. It is essential to comprehend the long-term implications of the cost of living and quality of life in major economies. A transparent and comprehensive living index must include multiple dimensions of living conditions. In this study, we present an approach to quantifying the quality of life through the Global Ease of Living Index that combines various socio-economic and infrastructural factors into a single composite score. Our index utilises economic indicators that define living standards, which could help in targeted interventions to improve specific areas. We present a machine learning framework to address missing data for certain economic indicators in specific countries. We then curate and update the data and use a dimensionality reduction approach (Principal Component Analysis and Factor Analysis) to create the Ease of Living Index for major economies since 1970. Our work significantly adds to the literature by offering a practical tool for policymakers to identify areas needing improvement, such as healthcare systems, employment opportunities, and public safety. Our approach with open data and code can be easily reproduced and applied to various contexts, providing transparency and accessibility for ongoing research and policy development in quality-of-life assessment.
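
The dimensionality-reduction step can be illustrated with a first-principal-component score. The sketch below is a simplified stand-in, not the authors' pipeline: it skips the missing-data step and factor analysis, assumes every indicator is oriented so that higher is better (so the indicators are positively correlated), and uses made-up numbers. Function names are hypothetical.

```python
from math import sqrt


def standardize(rows):
    """Column-wise z-scores for a list of equal-length rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def first_component(z, iters=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    # Equal starting weights; suitable when indicators are positively correlated.
    v = [1.0 / sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def composite_index(rows):
    """Score each row by projecting its standardized indicators on the first component."""
    z = standardize(rows)
    v = first_component(z)
    return [sum(zi * vi for zi, vi in zip(r, v)) for r in z]


# Hypothetical data: four economies, three indicators each.
rows = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 7.0],
    [3.0, 30.0, 6.0],
    [4.0, 40.0, 9.0],
]
scores = composite_index(rows)
print(scores)
```

Because the indicators are standardized first, no single indicator dominates the score through its units alone. The loadings in the leading eigenvector then show which indicators drive the index.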

2202.03332 2026-06-19 stat.ME econ.EM stat.AP Version update

Practical Forecasting of Environmental Maps: A Functional Data Approach

Alexander Gleim, Nazarii Salish

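AI summary: The paper proposes a statistical method based on functional data analysis for forecasting environmental data over a geographic region through time. It produces forecast surfaces by incorporating spatial and temporal dependence and is validated on forecasts of ground-level ozone concentration in Germany.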

English abstract

Environmental problems are receiving increasing attention in socio-economic and health studies, fostering advances in recording and data collection of related real-life processes. However, traditional tools for data processing are often found too restrictive as they do not account for the rich nature of such data sets. In this paper, we propose a simple statistical perspective on forecasting environmental data collected sequentially over time across some predefined geographic region. We treat such a data set as a surface (or functional) time series with a possibly complicated geographical domain. Using techniques from functional data analysis, we develop a forecasting methodology that accounts for both geographic and temporal dependencies. This methodology allows integration of traditional multivariate techniques to provide forecast surfaces. We demonstrate the practical value of our approach with a forecasting example of ground-level ozone concentration across Germany, showcasing its effectiveness and potential for broad application.