arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.15966 2026-05-18 econ.EM stat.ME

Quasi-Bayesian Local Projection Instrumental-Variables Method: Application to Renewable Energy and Electricity Prices

Masahiro Tanaka

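AI summary: This paper proposes a quasi-Bayesian method for local projection instrumental-variables (LP-IV) estimation. It builds a quasi-posterior from the generalized method of moments (GMM) objective and uses a roughness-penalty prior to smooth impulse responses across horizons. The method retains the first-order properties of traditional LP-IV, improves finite-sample stability, and permits joint inference. Simulations show that the regularization lowers root mean squared error at medium and longer horizons.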

Comments
This paper supersedes a working paper circulated under the title "Quasi-Bayesian Local Projections: Simultaneous Inference and Extension to the Instrumental Variable Method" (arXiv:2503.20249)
Abstract

This paper introduces a quasi-Bayesian approach for local projection instrumental-variables (LP-IV) estimation. It builds a moment-based quasi-posterior using the generalized method of moments (GMM) objective and applies a roughness-penalty prior to smooth impulse responses over different horizons. The approach maintains the key first-order features of traditional LP-IV methods, while enhancing stability in finite samples and allowing for joint inference through simultaneous bands. Simulations indicate that this regularization decreases root mean squared error compared to standard GMM, especially at medium and longer horizons. An application to Danish electricity markets highlights the method's practical usefulness.
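
The roughness-penalty idea can be illustrated with a minimal sketch. This is not the paper's quasi-posterior: it assumes a simplified setting in which horizon-by-horizon impulse response estimates (`irf_hat`, made-up numbers here) are already available, and it shrinks them with a second-difference penalty whose weight `lam` is a hypothetical tuning parameter.

```python
import numpy as np

def second_difference_matrix(n_horizons):
    """Return the (n_horizons - 2) x n_horizons second-difference operator D."""
    D = np.zeros((n_horizons - 2, n_horizons))
    for i in range(n_horizons - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def smooth_irf(irf_hat, lam):
    """Minimize ||b - irf_hat||^2 + lam * ||D b||^2 over b (closed form)."""
    irf_hat = np.asarray(irf_hat, dtype=float)
    D = second_difference_matrix(irf_hat.size)
    # First-order condition: (I + lam * D'D) b = irf_hat
    return np.linalg.solve(np.eye(irf_hat.size) + lam * D.T @ D, irf_hat)

# Illustrative (made-up) noisy impulse responses over 8 horizons.
rng = np.random.default_rng(0)
irf_hat = np.exp(-0.3 * np.arange(8)) + 0.1 * rng.standard_normal(8)
irf_smooth = smooth_irf(irf_hat, lam=5.0)
```

With `lam=0` the estimates are returned unchanged, and a response that is exactly linear in the horizon is never penalized, because its second differences are zero.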

2605.04336 2026-05-18 econ.TH cs.CR cs.GT

The Adversarial Discount -- AI, Signal Correlation, and the Cybersecurity Arms Race

James W. Bono

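AI summary: This paper studies a contest model of adversarial investment in which an attacker and a defender allocate resources to AI-augmented capabilities across multiple attack surfaces. It analyzes how signal cross-correlation affects the arms race ratio and argues that information aggregation can dominate private capability investment.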

Abstract

We study a contest-theoretic model of adversarial investment in which an attacker and a defender allocate resources to AI-augmented capabilities across multiple attack surfaces. The attacker's investment operates through two channels: it amplifies offensive potency unconditionally and erodes defensive effectiveness conditionally, generating an adversarial discount that deepens endogenously with the defender's own investment. We derive a closed-form arms race ratio decomposing the relative marginal effectiveness of offensive and defensive investment into six structural primitives and establish equilibrium uniqueness and global convergence under a continuous best-response dynamic. The central result concerns signal cross-correlation, the degree to which threat intelligence on one surface informs detection on another. With full cross-correlation, the arms race ratio is independent of the number of attack surfaces: the attacker's structural advantage from surface proliferation is completely neutralized. Under the benchmark full-dilution case, without cross-correlation, per-surface defense effectiveness vanishes as the attack surface grows. Extending the analysis to heterogeneous defenders facing an attacker who targets by expected value, we argue that the model points to a dual inefficiency: overinvestment in private defense (a zero-sum redirective externality) and underinvestment in shared signal correlation (a public good). These formal results, together with public-good reasoning outside the base model, characterize when collective information aggregation can dominate private capability investment as the decisive margin in adversarial contests.

2509.11673 2026-05-18 econ.TH

Grabbing the Forbidden Fruit: Restriction-Sensitive Choice

Niels Boissonnet, Alexis Ghersengorin

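AI summary: This paper proposes the restriction-sensitive choice model, which explains the forbidden fruit effect, and applies it to the backfire effect of beliefs and the backlash against integration policies targeted at minorities.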

Abstract

Restricting individuals' access to some opportunities may steer their desire toward their substitutes, a phenomenon known as the forbidden fruit effect. We axiomatize a choice model named restriction-sensitive choice (RSC), which rationalizes the forbidden fruit effect and is compatible with the prominent psychological explanations: reactance theory and commodity theory. The model is identifiable from choice data, specifically from the observation of choice reversals caused by the removal of options. We conduct a normative analysis both in terms of the agent's freedom and welfare. We apply our model to shed light on two phenomena: the backfire effect of beliefs and the backlash of integration policies targeted towards minorities.

2508.14522 2026-05-18 econ.TH

Equal Treatment of Equals and Efficiency in Probabilistic Assignments

Yasunori Okumura

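AI summary: This paper studies multi-unit probabilistic assignment of indivisible objects, asking how to achieve the fairness notion of equal treatment of equals (ETE) alongside several efficiency criteria. It introduces the ETE reassignment procedure, examines whether the procedure preserves the efficiency properties of the original assignment, and proposes a computationally efficient method for constructing assignments that satisfy both ETE and ordinal efficiency under general upper bound constraints.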

Abstract

This paper studies general multi-unit probabilistic assignment problems involving indivisible objects, with a particular focus on achieving the fairness notion of equal treatment of equals (ETE) and satisfying various efficiency criteria. We extend the definition of ETE so that it accommodates a wide range of constraints and applications. We introduce the ETE reassignment procedure, which transforms any assignment into one that satisfies ETE, and examine whether the efficiency properties satisfied by the original assignment -- namely, ex-post efficiency, ordinal efficiency, and rank-minimizing efficiency -- are preserved under the ETE reassignment. We show that, while the ETE reassignment of an ex-post efficient assignment remains ex-post efficient, it may fail to preserve ordinal efficiency in general settings. However, since the ETE reassignment of a rank-minimizing assignment preserves rank-minimizing efficiency, there must exist an assignment satisfying both ETE and ordinal efficiency. Furthermore, we propose a computationally efficient method for constructing assignments that satisfy both ETE and ordinal efficiency under general upper bound constraints by combining the serial dictatorship rule with appropriately specified priority lists and the ETE reassignment procedure.
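
One simple way to picture a reassignment that enforces ETE is sketched here. It is an assumption-laden illustration, not the paper's procedure: it treats "equals" as agents reporting identical preference lists, ignores constraints, and replaces each such agent's row of assignment probabilities with the group average. The function name `ete_reassign` and the example data are hypothetical.

```python
import numpy as np
from collections import defaultdict

def ete_reassign(assignment, preferences):
    """Average the rows of agents who report identical preferences.

    assignment: agents x objects matrix of assignment probabilities.
    preferences: one ordered preference list per agent.
    """
    P = np.asarray(assignment, dtype=float)
    groups = defaultdict(list)
    for agent, pref in enumerate(preferences):
        groups[tuple(pref)].append(agent)
    Q = P.copy()
    for members in groups.values():
        # Every member of a group of equals receives the group's mean row.
        Q[members] = P[members].mean(axis=0)
    return Q

# Agents 0 and 1 are equals; the deterministic assignment treats them unequally.
prefs = [("a", "b", "c"), ("a", "b", "c"), ("b", "a", "c")]
P = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])
Q = ete_reassign(P, prefs)
```

Averaging within a group leaves the total probability assigned to each object unchanged, so feasibility of the object supplies is preserved in this unconstrained case.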

2506.22440 2026-05-18 cs.CY cs.LG cs.MA econ.GN q-fin.EC

From Model Design to Organizational Design: Complexity Redistribution and Trade-Offs in Generative AI

Sharique Hasan, Alexander Oettl, Sampsa Samila

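AI summary: This paper proposes the Generality-Accuracy-Simplicity (GAS) framework to analyze how large language models reshape organizations and competitive strategy. It highlights the trade-offs among generality, accuracy, and simplicity in generative AI and the managerial challenges created by the redistribution of complexity.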

Abstract

This paper introduces the Generality-Accuracy-Simplicity (GAS) framework to analyze how large language models (LLMs) are reshaping organizations and competitive strategy. We argue that viewing AI as a simple reduction in input costs overlooks two critical dynamics: (a) the inherent trade-offs among generality, accuracy, and simplicity, and (b) the redistribution of complexity across stakeholders. While LLMs appear to defy the traditional trade-off by offering high generality and accuracy through simple interfaces, this user-facing simplicity masks a significant shift of complexity to infrastructure, compliance, and specialized personnel. The GAS trade-off, therefore, does not disappear but is relocated from the user to the organization, creating new managerial challenges, particularly around accuracy in high-stakes applications. We contend that competitive advantage no longer stems from mere AI adoption, but from mastering this redistributed complexity through the design of abstraction layers, workflow alignment, and complementary expertise. This study advances AI strategy by clarifying how scalable cognition relocates complexity and redefines the conditions for technology integration.

2506.05996 2026-05-18 econ.EM

Statistical significance in choice modelling: computation, usage and reporting

Stephane Hess, Andrew Daly, Michiel Bliemer, Angelo Guevara, Ricardo Daziano, Thijs Dekker

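AI summary: This paper discusses the computation, use, and reporting of statistical significance in choice modelling. It points to over-reliance on 95% confidence levels and misunderstandings of significance, stresses the need to consider behavioural or policy significance as well, and draws attention to issues specific to choice modelling, such as derived measures like willingness-to-pay and random heterogeneity.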

Abstract

This paper offers a commentary on the use of notions of statistical significance in choice modelling. We review the reasons for uncertainty in parameter estimates, provide a precise discussion on the computation of measures of uncertainty and confidence intervals, and discuss the use of statistical tests. We argue that, as in many other areas of science, there is an over-reliance on 95% confidence levels, and misunderstandings of the meaning of significance. We also observe a lack of precision in the reporting of measures of uncertainty in many studies, especially when using p-values and even more so with "star" measures. The paper also stresses the importance of considering behavioural or policy significance in addition to statistical significance. Finally, we stress a number of points that are specific to choice modelling and which require special attention, notably in relation to derived measures such as willingness-to-pay, the treatment of random heterogeneity, and the use of repeated choice data.
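
For a derived measure such as willingness-to-pay, one standard way to obtain a standard error is the delta method. The sketch below is a generic illustration and is not taken from the paper: it assumes WTP is the ratio of an attribute coefficient to a cost coefficient, and every number in it is hypothetical.

```python
import math

def wtp_delta_method(b_attr, b_cost, var_attr, var_cost, cov):
    """Point estimate and delta-method standard error of WTP = b_attr / b_cost."""
    wtp = b_attr / b_cost
    # Gradient of the ratio with respect to (b_attr, b_cost).
    g_attr = 1.0 / b_cost
    g_cost = -b_attr / b_cost ** 2
    var = (g_attr ** 2 * var_attr
           + g_cost ** 2 * var_cost
           + 2.0 * g_attr * g_cost * cov)
    return wtp, math.sqrt(var)

# Hypothetical estimates: time and cost coefficients with their (co)variances.
wtp, se = wtp_delta_method(b_attr=-0.06, b_cost=-0.30,
                           var_attr=0.0001, var_cost=0.0025, cov=0.0001)
ci_95 = (wtp - 1.96 * se, wtp + 1.96 * se)
```

The covariance term matters: dropping it changes the reported interval, which is one reason a derived measure cannot be assessed from the separate significance of its two coefficients.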

2404.02871 2026-05-18 math.OC econ.TH

Existence and uniqueness results for a mean-field game of optimal investment

Alessandro Calvia, Salvatore Federico, Giorgio Ferrari, Fausto Gozzi

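AI summary: This paper establishes existence and uniqueness of the equilibrium for a stochastic mean-field game of optimal investment over both finite and infinite time horizons, and also studies the deterministic counterpart of the game.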

Abstract

We establish the existence and uniqueness of the equilibrium for a stochastic mean-field game of optimal investment. The analysis covers both finite and infinite time horizons, and the mean-field interaction of the representative company with a mass of identical and indistinguishable firms is modeled through the time-dependent price at which the produced good is sold. At equilibrium, this price is given in terms of a nonlinear function of the expected (optimally controlled) production capacity of the representative company at each time. The proof of the existence and uniqueness of the mean-field equilibrium relies on a priori estimates and the study of nonlinear integral equations, but employs different techniques for the finite and infinite horizon cases. Additionally, we investigate the deterministic counterpart of the mean-field game under study.

2605.15644 2026-05-18 econ.TH

Dynamic Macroeconomics with Multiple Regimes

Jorge R. Chávez F

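AI summary: This paper proposes the DMR framework, in which economic evolution is described by multiple regime-specific propagation operators. It shows that invariant-law and regime-dependent systems are not topologically equivalent and that regime dependence is dynamically irreducible: it cannot be eliminated by any injective transformation of the state space.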

Comments
24 pages
Abstract

Macroeconomic dynamics is typically modeled under the assumption that the economy evolves according to a single invariant law of motion. This paper shows that this assumption imposes a structural restriction. We develop Dynamic Macroeconomics with Multiple Regimes (DMR), a framework in which economic evolution is governed by multiple regime-specific propagation operators. As a result, trajectories arise from ordered compositions of heterogeneous operators rather than from the iteration of a single mapping. We establish three structural results. First, invariant-law and regime-dependent systems are not topologically equivalent. Second, regime dependence is dynamically irreducible: it cannot be eliminated through any injective transformation of the state space. Third, whenever regime operators fail to commute, there exists no map $F:\mathbb{R}^n\to\mathbb{R}^n$ whose iterates reproduce all regime-admissible trajectories. These results establish a structural separation between invariant-law macroeconomics and regime-dependent dynamics, implying that stability, policy evaluation, and structural characterization must be conducted at the level of interacting propagation operators rather than within a single invariant mapping.
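
The role of non-commuting regime operators can be seen in a small numerical sketch. It only illustrates order dependence and does not reproduce the paper's results; the two linear operators `A` and `B` and the starting state `x0` are hypothetical.

```python
import numpy as np

# Two hypothetical regime-specific linear propagation operators.
A = np.array([[0.9, 0.5],
              [0.0, 0.8]])
B = np.array([[0.7, 0.0],
              [0.4, 0.9]])
x0 = np.array([1.0, 1.0])

def trajectory(x0, operators):
    """Apply the operators in the given order and return every visited state."""
    path = [np.asarray(x0, dtype=float)]
    for op in operators:
        path.append(op @ path[-1])
    return np.array(path)

path_ab = trajectory(x0, [A, B])   # regime A first, then regime B
path_ba = trajectory(x0, [B, A])   # regime B first, then regime A

commute = np.allclose(A @ B, B @ A)
same_endpoint = np.allclose(path_ab[-1], path_ba[-1])
```

Because `A @ B` differs from `B @ A` here, the two regime orderings send the same starting state to different endpoints after two periods, so the state reached depends on the regime history and not only on the number of periods elapsed.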

2605.15614 2026-05-18 econ.GN q-fin.EC

When Redistribution Becomes a State Variable: Monetary-Fiscal Stabilization with Type-Specific Sticky Wages

Kenji Miyazaki

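AI summary: This paper studies the macroeconomic consequences of redistribution acting as a state variable when wage contracts are type-specific. It shows that the cross-type wage gap becomes a distributional state variable and examines what this implies for fiscal policy.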

Comments
35 pages, 7 figures
Abstract

Many tractable TANK models treat redistribution as a contemporaneous wedge. I show that this view is incomplete once wage contracts are type-specific. In a tractable Two-Agent New Keynesian model, each household type adjusts its nominal wage relative to its own previous wage. This own-lag contract makes the cross-type wage gap a payoff-relevant distributional state variable. The wage gap follows a second-order expectational law of motion and feeds back into aggregate demand through consumption dispersion. Inflation stabilization or contemporaneous profit-wedge neutralization therefore generally fails to restore the corresponding representative-agent allocation. Under the maintained commitment benchmark, RANK-equivalent stabilization from period $t=1$ onward requires history-dependent transfers that respond to inherited wage dispersion, not only current profits. In the benchmark calibration, wage rigidity raises the peak output response to a transfer shock by a factor of 3.27, from $2.49\times 10^{-4}$ to $8.14\times 10^{-4}$.

2605.15405 2026-05-18 econ.GN q-fin.EC stat.ME

Estimating Social Norm Complementarities

Eliana La Ferrara, Cheaheon Lim, Davide Viviano

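AI summary: This paper models and estimates complementarities between social norms along technological and social dimensions. It finds evidence of complementarity between female genital cutting and child marriage, especially in Sierra Leone, and of substitutability between polygyny and child marriage, particularly in Nigeria, with implications for policy design.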

Abstract

We develop a model of choice over social norms that allows for complementarities along two dimensions: technological, analogous to complementarities between consumption goods, and social, capturing returns from conformity. Together, these determine whether two norms are complements, substitutes, or independent, as defined by how the equilibrium prevalence of one norm responds to a marginal shift in the utility of another. We estimate the model using repeated cross-sections from Sierra Leone and Nigeria, focusing on female genital cutting, polygyny, and child marriage. Social returns are significant across all specifications. For female genital cutting and child marriage, we find evidence of complementarities, especially strong in Sierra Leone. For polygyny and child marriage, we find evidence of social substitutability, particularly in Nigeria. We interpret these differences using insights from anthropology. Finally, we iterate the model forward to study policy counterfactuals, assessing the potential effects of legal reforms and social interventions.

2605.15358 2026-05-18 econ.EM

Double Descent and Benign Overfitting in Macroeconomic Forecasting

Andrea Carriero, Florian Huber, Davide Pettenuzzo

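AI summary: This paper studies double descent and benign overfitting in macroeconomic forecasting. It improves forecasts by augmenting the data with synthetic copies and proves that this strategy converges to a kernel ridge regression with a factor-structured kernel. The results suggest that benign overfitting succeeds because overparameterization implicitly constructs a well-behaved kernel.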

Comments
56 pages, 8 figures
Abstract

We study double descent and benign overfitting in macroeconomic forecasting. We document that double-descent risk curves arise in standard macroeconomic datasets that are driven by a small number of latent factors, and we characterize when the underlying benign-overfitting mechanism holds. The conditions of Bartlett et al. (2020) are satisfied under the exact factor model and can also hold under the more realistic approximate factor model, provided idiosyncratic variances are not too dispersed across series. Because macroeconomic panels have only moderate dimensions, the overparameterization ratio N/T required by the theory is not naturally available. Our solution is to augment the data with synthetic copies from an estimated factor model and we prove that this strategy converges to a kernel ridge regression with a factor-structured kernel. Using monthly (FRED-MD) and quarterly (FRED-QD) US data, the resulting estimator consistently outperforms the Stock-Watson factor model for point forecasting across all series and horizons, with gains that are pervasive, statistically significant, and increasing with the forecast horizon. Our results suggest that benign overfitting, when it works, succeeds because overparameterization implicitly constructs a well-behaved kernel, not because overparameterization is intrinsically desirable.
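
The link between ridge regression and kernel ridge regression can be checked numerically. The sketch below is the generic linear-kernel identity, not the paper's estimator or its synthetic-data augmentation; the panel is simulated from a hypothetical factor model with more predictors than observations, and the penalty `lam` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 40, 120, 3                      # N > T: overparameterized panel
F = rng.standard_normal((T + 1, r))       # latent factors (last row held out)
Lam = rng.standard_normal((N, r))         # factor loadings
X_all = F @ Lam.T + 0.5 * rng.standard_normal((T + 1, N))
y_all = F @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.standard_normal(T + 1)
X, y, x_new = X_all[:T], y_all[:T], X_all[T]

def ridge_primal(X, y, x_new, lam):
    """Ridge forecast from the N x N normal equations."""
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return x_new @ beta

def ridge_kernel(X, y, x_new, lam):
    """Same forecast from the T x T linear kernel K = X X'."""
    K = X @ X.T
    alpha = np.linalg.solve(K + lam * np.eye(X.shape[0]), y)
    return (x_new @ X.T) @ alpha

pred_primal = ridge_primal(X, y, x_new, lam=1.0)
pred_kernel = ridge_kernel(X, y, x_new, lam=1.0)
```

The two forecasts coincide up to floating-point error, which is why a heavily overparameterized regression can be read as a kernel method whose behaviour is governed by the structure of `K`.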

2605.15230 2026-05-18 econ.EM

EnergyAgentBench: Benchmarking LLM Agents on Live Energy Infrastructure Data

Eliseo Curcio

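AI summary: This paper introduces EnergyAgentBench, the first agentic benchmark grounded in live electricity market data, for evaluating how large language models make decisions in power infrastructure settings. The benchmark comprises 70 tasks across five families, covering datacenter siting, long-horizon portfolio optimization, and causal grid diagnosis; tasks require calls to multiple live data endpoints and multi-step reasoning. Experiments show marked differences across models in task performance and cost-effectiveness, and the causal diagnosis family discriminates most sharply between models.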

Abstract

Selecting the right electricity market region for a hyperscale AI datacenter requires reasoning across live electricity prices, grid carbon intensity, technology cost trajectories, and causal grid dynamics -- a multi-step, multi-source analytical task that static knowledge benchmarks cannot evaluate. We introduce EnergyAgentBench, the first agentic benchmark grounded in live electricity market data for this problem class. The benchmark comprises 70 task variants across five families: datacenter siting under cost-carbon trade-offs (F1), long-horizon portfolio siting (F1-LH), lifetime LCOE ranking over multi-decade cost trajectories (F2), 30-year portfolio optimization (F2-LH), and causal grid diagnosis (F3). Tasks require 3 to 48 sequential tool calls against live endpoints from the QuarluxAI infrastructure platform, the U.S. Energy Information Administration (EIA), and the National Renewable Energy Laboratory (NREL) with ground truth derived from trained XGBoost cost-surface models (R^2 0.967--0.995) and the NREL Annual Technology Baseline 2024. We evaluate nine models across Anthropic, OpenAI, and HuggingFace over 1,414 runs at three random seeds. Claude Sonnet 4.6 achieves the highest overall score (0.900) at one-quarter the cost of Claude Opus 4.7 (0.889). Claude Haiku 4.5 leads on long-horizon procedural siting (0.986), outperforming all frontier models including those costing 16x more per run. F3 Causal is the most discriminating family, with a 30.7-point spread between Sonnet (0.793) and Llama 3.3 70B (0.486), versus a 6.6-point spread on F1 Siting. A failure taxonomy of 135 coded failures identifies null-value integration in NREL ATB trajectories as the dominant failure mode (70%), followed by premature commitment on causal tasks (20%) and adversarial injection blindness (6%). Benchmark code, run trajectories, and the failure taxonomy dataset are publicly released.

2605.15217 2026-05-18 cs.AI cs.CY cs.LG econ.GN q-fin.EC

Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions

Jagdish Tripathy, Marcus Buckmann

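AI summary: This study examines the asymmetry between the behavioural fairness that instruction-tuned language models show in high-stakes decisions such as mortgage underwriting and the bias latent in their internal representations. Although the models appear unbiased at the output level, their internal representations retain and amplify race-associated information, and this hidden bias is causally potent: targeted interventions can reverse decisions. The study also shows that the bias is asymmetric across groups and argues that output-only behavioural audits cannot detect or govern such latent bias, motivating a dual-layer framework that adds representational analysis.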

Comments
39 pages, 16 figures, 2 tables
Abstract

Instruction-tuned language models exhibit behavioural fairness in high-stakes decisions while retaining biased associations in their internal representations. However, whether these suppressed representations can affect model outputs - and whether such causal potency is symmetric across demographic groups - remains unknown. We investigate the use of open-weight models for mortgage underwriting using matched applications that differ only in racially-associated names and reveal a critical disconnect: models show no output-level bias, yet retain and amplify demographic representations across model layers. Through activation steering and novel cross-layer interventions, we demonstrate that this suppressed information is decision-relevant: when reinjected at critical layers, it produces near-complete decision reversals. Critically, this latent bias is asymmetric - steering interventions affect decisions in one demographic direction, while producing minimal effects in reverse - and susceptible to adversarial prompt engineering and parameter-efficient fine-tuning. These findings demonstrate that behavioural audits focused on outputs are insufficient: fair outputs can mask exploitable internal biases. They also motivate dual-layer testing frameworks combining output evaluation with representational analysis for AI governance in high-stakes decisions.

2605.15210 2026-05-18 q-fin.TR econ.TH

TradeMech: A Method to Multilaterally Net Trades Without Altering Counterparty Exposure

Daniel Aronoff, Robert M. Townsend, Madars Virza

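AI summary: This paper proposes TradeMech, a mechanism for netting trades multilaterally without altering counterparty exposure. It applies to markets in which one or two homogeneous fungible objects are traded. It transforms the initial bilateral contracts into chains and cycles, achieves maximal multilateral netting of the designated object, and preserves each agent's contractual profit and the location of counterparty risk. When a party fails to pre-commit a required object, the affected trade reverts to the original bilateral contract and the remaining trades are re-netted on the residual chains, so no new counterparty exposure arises.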

Comments
26 pages
Abstract

Financial markets such as bond, derivatives, and repo markets form networks of interdependent obligations. Existing multilateral netting methods typically trade off the extent of netting against preservation of counterparty exposure: central clearing reallocates exposure to a central counterparty, while trade compression may alter bilateral counterparty relationships. TradeMech is a mechanism for markets in which one or two homogeneous fungible objects are traded. The mechanism transforms a network of initial bilateral contracts into chains and cycles, nets the designated object multilaterally on those chains and cycles, and replaces initial contracts with multiparty contracts whose assigned trades remain fractions of the original bilateral trades. The construction achieves maximal multilateral netting of the designated object while preserving each agent's contractual profit and preserving the location of counterparty risk. When a party fails to pre-commit a required object, the affected assigned trade is recovered as a bilateral contract between the same original counterparties and the remaining assigned trades are re-netted on residual chains, so no new counterparty exposure is created.
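
Netting a single fungible object around a cycle can be sketched in a few lines. This is a textbook-style illustration under simplifying assumptions, not the TradeMech construction: it takes one closed cycle of obligations and reduces each by the smallest amount on the cycle, so every residual obligation stays between the same two original counterparties. Agent names and amounts are hypothetical.

```python
def net_cycle(cycle_trades):
    """Reduce every obligation on a closed cycle by the smallest one.

    cycle_trades: list of (payer, receiver, amount) forming a closed cycle.
    """
    common = min(amount for _, _, amount in cycle_trades)
    return [(payer, receiver, amount - common)
            for payer, receiver, amount in cycle_trades]

def net_positions(trades):
    """Net amount each agent receives (positive) or delivers (negative)."""
    positions = {}
    for payer, receiver, amount in trades:
        positions[payer] = positions.get(payer, 0) - amount
        positions[receiver] = positions.get(receiver, 0) + amount
    return positions

# Hypothetical cycle: A delivers to B, B to C, C back to A.
trades = [("A", "B", 10), ("B", "C", 7), ("C", "A", 5)]
netted = net_cycle(trades)

gross_before = sum(amount for _, _, amount in trades)
gross_after = sum(amount for _, _, amount in netted)
```

Each agent's net position is the same before and after netting, while the gross volume that has to be delivered falls; no obligation is created between agents who had no bilateral contract to begin with.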

2605.08812 2026-05-18 econ.GN q-fin.EC

Little Impact of ChatGPT Availability on High School Student Test Score Performance

Nick Huntington-Klein

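AI summary: This study examines how ChatGPT use affects high school test scores. It uses the drop in ChatGPT activity during the non-school summer months of 2023 and 2024 to identify areas with heavy educational AI use and to estimate the effect of AI as actually used. It finds no meaningful positive or negative effect on average high school test scores, which implies that, to the extent students use AI to avoid learning, the effect on scores is either small or offset by positive uses of AI.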

Comments
41 pages, 4 figures
Abstract

In educational settings, AI can be used as a learning aid, but can also be used to avoid schoolwork, thereby passing classes while learning little. Many existing studies on the impact of AI on education focus on AI use in controlled settings or with specialized tools. In this paper, the dropoff in ChatGPT activity during non-school summer months in 2023 and 2024 is used to identify areas with heavy educational AI use and thus estimate the educational impact of AI as it is actually used. I find no meaningful impact of AI usage on high school test score averages in either direction. These results imply that, to the extent that high school students use AI to avoid learning, it either does not matter much for their test performance or is cancelled out by positive uses of AI in the aggregate.

2504.17061 2026-05-18 econ.TH

On the stability of utilitarian aggregation

Leandro Nascimento

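AI summary: This paper studies how bounded violations of the Pareto conditions characterize approximately utilitarian aggregation rules when aggregating von Neumann-Morgenstern utilities. It proves that, when the Pareto principles are nearly satisfied, the distance between the group utility function and a weighted sum of individual utilities does not exceed half of the positive parameter that defines the weakened Pareto conditions. The result indicates that Harsanyi's (1955) aggregation theorem is stable: small deviations from the Pareto principles do not move the aggregation rule far from utilitarian aggregation.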

Comments
typos and minor mistakes in the appendix corrected
Abstract

In the context of aggregating von Neumann-Morgenstern utilities, we show that bounded violations of the Pareto conditions characterize aggregation rules that are approximately utilitarian. When a single utility function is intended to represent the preference judgments of a group of individuals and the Pareto principles are nearly satisfied, we prove that its distance from a weighted sum of individual cardinal utilities does not exceed half of the positive parameter that differentiates our weaker versions of the Pareto conditions from their conventional forms. This result suggests the stability of Harsanyi's (1955) aggregation theorem, in that small deviations from the Pareto principles lead to aggregation rules that remain close to utilitarian aggregation.

2410.11263 2026-05-18 econ.EM

Closed-form estimation and inference for panels with attrition and refreshment samples

Grigory Franguridi, Lidia Kosenkova

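AI summary: This paper studies estimation and inference for panel data with attrition and refreshment samples. The authors propose a nonparametric identifying assumption and build on it a closed-form estimator that needs no tuning parameters; it is based on a transformation of the empirical cumulative distribution function, is simple to compute, and is consistent and asymptotically normal. Simulations and income data from the Understanding America Study illustrate its performance.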

Abstract

It has long been established that, if a panel dataset suffers from attrition, auxiliary (refreshment) sampling restores full identification under additional assumptions that still allow for nontrivial attrition mechanisms. Such identification results rely on implausible assumptions about the attrition process or lead to theoretically and computationally challenging estimation procedures. We propose an alternative identifying assumption that, despite its nonparametric nature, suggests a simple estimation algorithm based on a transformation of the empirical cumulative distribution function of the data. This estimation procedure requires neither tuning parameters nor optimization in the first step, i.e., it has a closed form. We prove that our estimator is consistent and asymptotically normal and demonstrate its good performance in simulations. We provide an empirical illustration with income data from the Understanding America Study.

2312.04827 2026-05-18 econ.TH

A Separability Foundation for Random Coefficients Logit

Fedor Sandomirskiy, Po Hyun Sung, Omer Tamuz, Ben Wincelberg

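AI summary: This paper studies consistency of stochastic choice across decision problems, each consisting of a menu of action labels paired with observable outcome vectors. The authors propose a separability condition: in a decision problem composed of two separable components, choice probabilities must agree with those obtained when each component is considered in isolation. Together with monotonicity and continuity, this condition fully characterizes the family of random coefficients logit rules.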

Comments
46 pages
Abstract

We study stochastic choice across decision problems, each represented as a menu of action labels paired with observable outcome vectors. We propose a consistency condition for behavior in decision problems composed of two separable components: choice probabilities must agree with those obtained when each component is considered in isolation. Together with monotonicity and continuity, this separability requirement characterizes the family of random coefficients logit rules.
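
A random coefficients logit rule, and one reading of the separability property, can be sketched as follows. The setup is an assumption made for illustration and may differ from the paper's formal definitions: utilities are linear in outcome vectors, the coefficient distribution is approximated by simulated draws, and a composite action's outcome vector is taken to be the sum of its two components' outcome vectors. All menus and parameters are hypothetical.

```python
import numpy as np

def logit_probs(outcomes, beta):
    """Multinomial logit probabilities for utilities outcomes @ beta."""
    u = outcomes @ beta
    w = np.exp(u - u.max())          # subtract the max for numerical stability
    return w / w.sum()

def rc_logit_probs(outcomes, betas):
    """Average logit probabilities over draws of the coefficient vector."""
    return np.mean([logit_probs(outcomes, b) for b in betas], axis=0)

rng = np.random.default_rng(2)
betas = rng.normal(loc=[1.0, -0.5], scale=0.7, size=(500, 2))

menu_1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
menu_2 = np.array([[2.0, 1.0], [0.0, 0.0]])

# Composite problem: pick one action from each menu; outcomes add up (assumption).
combined = (menu_1[:, None, :] + menu_2[None, :, :]).reshape(-1, 2)
p_joint = rc_logit_probs(combined, betas).reshape(len(menu_1), len(menu_2))

p_1 = rc_logit_probs(menu_1, betas)   # first component considered in isolation
p_2 = rc_logit_probs(menu_2, betas)   # second component considered in isolation
```

Summing `p_joint` over the second component recovers `p_1`, and summing over the first recovers `p_2`, because for each coefficient draw the logit probabilities of an additive composite problem factor into the product of the component probabilities.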