arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.18512 2026-06-18 econ.EM stat.ME New submission

Causal Forecasting in Panel Data: A Two-Way Synthetic Forecasting Approach

Dennis Shen

AI summary: Addresses the problem of forecasting future outcomes for a target unit in panel data that has not experienced the intervention. Proposes the Two-Way Synthetic Forecasting (TWSF) method, which combines synthetic controls with time-series extrapolation, gives finite-sample error bounds and asymptotic normality, and validates the approach with an NFL stadium opening case study.

Abstract

Estimating causal effects in panel data is a central problem in policy evaluation. Existing methods largely address retrospective questions of the form: what would have happened to a target unit under a different intervention during the observed panel? In many applications, however, decision-makers face prospective questions: what will happen to a target unit under an intervention it has not yet experienced, beyond the observed panel? This article develops a framework for answering such causal forecasting questions by integrating the retrospective counterfactual logic of synthetic-controls-based approaches with the extrapolative structure of multivariate time-series forecasting. Building on the latent factor models that justify unit-side regressions in synthetic controls, we impose low-rank temporal structure on the latent time factors to identify prospective causal forecast estimands. We operationalize this strategy through the Two-Way Synthetic Forecasting estimator, or TWSF, which learns cross-unit relationships from pre-treatment outcomes and combines them with a time-series model learned from treated donor trajectories under the intervention of interest. Under suitable conditions, we establish finite-sample forecasting error bounds that imply pointwise consistency and introduce an orthogonalized correction that yields asymptotic normality and thus enables pointwise inference. We extend the framework to fixed multi-step forecasting horizons through both direct and recursive procedures, each of which inherits analogous pointwise guarantees. We corroborate the theory with simulation studies and illustrate the practical utility of TWSF by studying the public-health impact of opening NFL stadiums during the 2020 season.

2606.19294 2026-06-18 stat.AP New submission

Accelerating Network-Agent Dispersion: Territorial Behavior and Directionally Biased Lazy Random Walks

Li Zeng, Steve Alpern

AI summary: Studies how territorial behavior and directionally biased lazy random walks accelerate network-agent dispersion. Theoretical analysis and simulations show that territorial behavior greatly reduces dispersion time and that directional bias can accelerate it further.

Abstract

Territorial behavior can greatly accelerate decentralized agent dispersion on networks. This paper studies a network-agent dispersion problem in which m autonomous agents move in discrete time on a connected graph and seek a configuration in which no two agents occupy the same node. We focus on the dispersion case m = n, where successful configurations contain exactly one agent per node. In the baseline model, each agent follows a lazy random walk with a common laziness parameter p. This process defines a finite absorbing Markov chain, and the expected absorption time is used to measure dispersion efficiency. We introduce two local behavioral extensions: territorial behavior, in which an agent that is alone at a node claims that node and repels later arrivals, and directional bias, in which agents share a preferred direction of movement on paths and cycles. Exact calculations on three-agent path and cycle networks and Monte Carlo simulations on larger instances show that territorial behavior substantially reduces expected dispersion time, with larger relative reductions as network size increases. Directional bias alone has limited effect in most small-network cases, but when combined with territorial behavior it can produce large additional speedups. In particular, the simulations show reductions of 99.22% on L100 and 97.48% on C100 when all agents start from one node. These results show how simple local movement rules can strongly affect global dispersion time in decentralized networked multi-agent systems.
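The baseline model in this abstract can be sketched as a small Monte Carlo simulation in Python. This is an illustration only, under stated assumptions: the laziness parameter p is taken to be the probability of staying put, all agents move synchronously, the graph is the cycle C_n with every agent starting at node 0, and the function names are hypothetical. The territorial and directional-bias rules are not implemented because the abstract does not specify them in enough detail.

```python
import random

def dispersion_time(n, p, rng, max_steps=1_000_000):
    """Steps until n lazy random walkers on the cycle C_n occupy n distinct nodes.

    Assumption: p is the probability that an agent stays where it is.
    """
    positions = [0] * n  # all agents start at node 0
    if len(set(positions)) == n:
        return 0  # already dispersed (only possible for n == 1)
    for step in range(1, max_steps + 1):
        for i in range(n):
            if rng.random() >= p:  # move with probability 1 - p
                positions[i] = (positions[i] + rng.choice((-1, 1))) % n
        if len(set(positions)) == n:  # one agent per node: absorbing state
            return step
    raise RuntimeError("no dispersion within max_steps")

def mean_dispersion_time(n, p, runs, seed=0):
    """Monte Carlo estimate of the expected absorption (dispersion) time."""
    rng = random.Random(seed)
    return sum(dispersion_time(n, p, rng) for _ in range(runs)) / runs
```

Calling `mean_dispersion_time(3, 0.5, runs=10_000)` gives a simulation estimate for three agents on a three-node cycle, a case the abstract says can also be computed exactly from the absorbing Markov chain.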

2606.19148 2026-06-18 stat.CO New submission

Fast Computation of Free-Support Wasserstein Medians

Kisung You, Mauro Giuffré, Dennis Shung

AI summary: Proposes a direct fixed-weight free-support solver that solves exact optimal transport subproblems and relocates support points, avoiding the inner loop. It achieves monotone descent, convex-hull invariance, and a finite-time best-residual rate, and is markedly more computationally efficient than the nested Weiszfeld method.

Abstract

The Wasserstein median is a robust alternative to the Wasserstein barycenter for averaging probability measures, but exact empirical computation can be expensive. A natural metric-space Weiszfeld scheme updates the current candidate by solving a weighted Wasserstein barycenter problem at each outer iteration, producing a nested optimization problem. We propose a direct fixed-weight free-support solver that avoids this inner barycenter loop. At each iteration, the method solves exact optimal transport (OT) subproblems from the current candidate to the input measures, computes barycentric projections of the selected plans, and relocates each support atom to an inverse-distance-weighted average of its projected destinations. For a smoothed median objective, we show that this relocation is the exact minimizer of a tight majorization-minimization surrogate. This yields monotone descent for exact transport subproblems, convex-hull invariance, a finite-time best-residual rate, residual-to-gradient control under differentiability, and fixed-point and stationarity characterizations. We also give smoothing, stability, and resolution-consistency results clarifying the fixed-weight approximation. In exact-OT benchmarks, the direct solver attains median objectives close to tightly solved nested Weiszfeld baselines while using substantially fewer exact transport subproblems. Additional contamination, posterior aggregation, and image-prototype experiments show that the direct solver produces median summaries comparable to nested computation and less sensitive to outlying distributions than Wasserstein barycenters.
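The inverse-distance-weighted averaging mentioned in this abstract is easiest to see in the classical Euclidean setting. The Python sketch below is the standard Weiszfeld iteration for the geometric median of points in R^d. It is not the paper's free-support Wasserstein solver, which additionally solves optimal transport subproblems. The function names and the `eps` guard against division by zero are illustrative assumptions.

```python
import math

def weiszfeld_step(y, points, eps=1e-12):
    """One Weiszfeld update: inverse-distance-weighted average of the points."""
    weights = [1.0 / max(math.dist(y, x), eps) for x in points]
    total = sum(weights)
    return [sum(w * x[k] for w, x in zip(weights, points)) / total
            for k in range(len(y))]

def geometric_median(points, iters=200):
    """Iterate Weiszfeld updates, starting from the arithmetic mean."""
    dim = len(points[0])
    y = [sum(x[k] for x in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        y = weiszfeld_step(y, points)
    return y
```

For the points (0, 0), (1, 0), (0, 1) and the outlier (100, 100), the arithmetic mean is pulled to (25.25, 25.25), while the iteration settles at (0.5, 0.5), which is the kind of robustness to outliers that motivates medians over barycenters.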

2606.19087 2026-06-18 stat.ME New submission

What does ethnic density represent? Spatial co-occurrence networks of a widely used contextual measure using harmonised UK small-area census data

Joseph Lam

AI summary: Analyzes census data for 239,000 UK small areas using mixed graphical models and spatial autocorrelation analysis, showing that ethnic density is not a single contextual scalar and that density percentages for different ethnic groups represent different neighbourhood contexts.

Abstract

Ethnic density is widely used in epidemiology and health geography as a contextual exposure, yet it is rarely examined as a measurement problem in its own right. Equivalent percentage values may represent different neighbourhood contexts across groups and places, particularly where migration, religion, language, household structure and socioeconomic conditions are spatially co-located. Using the harmonised Unified UK Census Data release, I analysed 239,023 small-area census data to examine ethnic density as an exploratory contextual co-occurrence construct. I estimated UK-wide mixed graphical models (MGM) for eight ethnic-density targets using 239,019 complete cases and 32 nodes per target-specific model. England-only spatial analyses then used k-nearest-neighbour Output Area centroids (k = 8) to estimate LISA and spatially adjusted residual networks. Ethnic density did not behave as a single contextual scalar. In the UK-wide MGM, the strongest retained target-neighbour edges differed across groups. Asian density was linked most strongly with Middle East/Asia-born share (0.59), Indian density with Hindu share (0.55), Pakistani density with Muslim share (0.47), Bangladeshi density with Muslim share (0.23), Black density with Africa-born share (0.42), and White density with Middle East/Asia-born share (0.35). England-only ethnic-density measures were strongly spatially autocorrelated, with Global Moran's I ranging from 0.57 for Mixed share to 0.90 for White share. After residualising against English region and local spatial lag, 64.3% to 96.4% of original target-node edges persisted across ethnic-density networks. Equivalent percentage values are not necessarily comparable across ethnic groups. This has implications for estimand definition, adjustment strategies, and the interpretation of ethnic density and other bundled contextual measures in urban health research.
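Global Moran's I, the spatial autocorrelation statistic reported in this abstract, can be computed from k-nearest-neighbour weights in a few lines of Python. The sketch below is a minimal illustration: it assumes binary, non-row-standardised weights and uses hypothetical function names, and the abstract does not say which weighting convention the paper applies to its k = 8 Output Area centroids.

```python
import math

def knn_weights(coords, k):
    """For each point, the indices of its k nearest other points (binary weights)."""
    neighbours = []
    for i, ci in enumerate(coords):
        order = sorted((j for j in range(len(coords)) if j != i),
                       key=lambda j: math.dist(ci, coords[j]))
        neighbours.append(order[:k])
    return neighbours

def morans_i(values, neighbours):
    """Global Moran's I = (n / S0) * sum_ij w_ij d_i d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours)  # total weight
    cross = sum(dev[i] * dev[j] for i, nb in enumerate(neighbours) for j in nb)
    return (n / s0) * cross / sum(d * d for d in dev)
```

Values that vary smoothly across neighbouring areas give a positive statistic, while values that alternate between neighbours give a negative one.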

2606.19086 2026-06-18 stat.ME New submission

Probability Bound Analysis for Dependence Uncertainty in Risk and Decision Models

Rowan Iskandar

AI summary: For settings where marginal and dependence information are incomplete, proposes a dependence-sensitive PBA framework that propagates uncertainty through p-boxes, copulas, and Fréchet coupling sets, and shows in a risk decision model how dependence assumptions affect output bounds and tail risk.

Abstract

Risk and decision models often combine sparse marginal information, precisely specified probability distributions, and dependence assumptions that are only partly justified. Probability bound analysis (PBA) represents epistemic uncertainty through probability boxes, but many applications assume independence or require dependence structures to be fully specified. We develop a dependence-sensitive PBA framework for black-box risk and decision models in which both marginal information and dependence information may be incomplete. The framework combines p-box parameters, precise-CDF parameters, and fixed quantities; incorporates specified dependence through copulas; and propagates unknown dependence through Fréchet-style admissible coupling sets. We also extend the construction to cross-dependence between imprecisely specified and precisely specified inputs. In an illustrative risk decision model, dependence assumptions materially affected output bounds and tail-risk summaries; analyses that ignored or simplified dependence produced narrower characterizations of plausible outcomes. The framework supports transparent uncertainty propagation when evidence is insufficient to justify either precise marginal distributions or a single dependence model.
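The contrast between unknown dependence and an independence assumption can be illustrated on the simplest case, the probability that two events both occur, using the classical Fréchet inequalities. The Python sketch below is an illustration only: the function names and the interval inputs are assumptions, and it is not the paper's framework, which handles p-boxes, copulas, and black-box models.

```python
def and_unknown_dependence(a, b):
    """Fréchet bounds on P(A and B) when nothing is known about dependence.

    a and b are (lower, upper) intervals for P(A) and P(B).
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(0.0, a_lo + b_lo - 1.0), min(a_hi, b_hi)

def and_independent(a, b):
    """Bounds on P(A and B) if A and B are assumed independent."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo * b_lo, a_hi * b_hi
```

With P(A) in [0.2, 0.4] and P(B) in [0.5, 0.7], the independence interval lies strictly inside the unknown-dependence interval, matching the abstract's observation that simplifying dependence narrows the characterization of plausible outcomes.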

2606.19065 2026-06-18 stat.ME stat.AP New submission

Regularized covariance estimation from partially observed interferometric data

Teresa Bortolotti, Roberta Troilo, Francesco Casu, Simone Vantini, Alessandra Menafoglio

AI summary: For systematic missingness in partially observed interferometric data, proposes a matrix completion method with Laplacian regularization for nonparametric covariance estimation, requiring no stationarity or isotropy assumptions and performing well on both simulated and field data.

Abstract

The Small BAseline Subset technique provides remote measurements of ground displacement with high spatial resolution, making it a key tool for monitoring geophysical processes in hazard-prone areas. An effective analysis of this type of data requires reliable estimation of their second-order structure, which is difficult to achieve because the measurements are systematically missing over relatively large portions of the investigated areas. We tackle the problem from a functional data analysis perspective and treat the observations as partially observed functional data with two-dimensional domain. To properly characterize the data, we introduce the fragmented regime of partial observation, where parts of the curves are systematically missing across replicates. For this regime, we propose a novel method for covariance estimation, formulating the task as a matrix completion problem with Laplacian regularization. The estimator is nonparametric and free from stationarity or isotropy assumptions. Extensive simulations show that our method achieves consistently low estimation error across a range of covariance structures. Application to ground displacement data relative to the Phlegraean Fields demonstrates its ability to recover meaningful spatial dependence patterns, highlighting its potential for environmental risk assessment and monitoring.

2606.19044 2026-06-18 stat.CO stat.AP stat.ME New submission

smoothbp: Fast Bayesian Hierarchical Piecewise Regression with Smoothed Transitions and Spike-and-Slab Model Selection

Aidan D. Bindoff

AI summary: Presents the R package smoothbp, which uses a Metropolis-within-Gibbs sampler implemented in Rust together with HMC to fit Bayesian hierarchical piecewise regression with logistic-smoothed transitions, supporting multiple breakpoints, random effects, and automatic breakpoint selection.

Comments 16 pages, 2 figures, R package on CRAN

Abstract

Piecewise regression models are essential for identifying structural changes in longitudinal or spatial data across diverse scientific domains. While standard approaches often assume sharp, instantaneous transitions and single, non-hierarchical breakpoints, many real-world phenomena exhibit gradual, smoothed transitions that vary systematically across groups. We introduce smoothbp, an R package for fast, Bayesian hierarchical piecewise regression featuring logistic-smoothed transitions. By implementing a bespoke Metropolis-within-Gibbs sampler in Rust, smoothbp combines exact conjugate updates for linear terms with Hamiltonian Monte Carlo (HMC) transitions for non-linear location and sharpness parameters. smoothbp natively supports multiple change-points, random intercepts, random change-point timing, and structural covariates on all segment parameters. It also incorporates Kuo and Mallick (1998) spike-and-slab priors for automatic inference on the number of active breakpoints via the smoothbp_ss function. We document the sampler, validate parameter recovery and calibration through simulation-based calibration and interval-coverage studies, and contrast smoothbp against the existing software landscape across R, Python, Julia, and MATLAB, demonstrating its competitive efficiency against general-purpose probabilistic programming languages like brms and specialized packages like mcp.
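A logistic-smoothed transition can be pictured with a single-breakpoint mean function in which the change in slope is switched on gradually by a sigmoid. The Python sketch below uses one common parameterisation chosen for illustration. It is an assumption, not the parameterisation or API of smoothbp, and every function and argument name is hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smooth_piecewise_mean(x, intercept, slope, slope_change, breakpoint, sharpness):
    """Mean with a slope change that is phased in around the breakpoint.

    A large sharpness approaches a broken-stick model; a small one gives a
    gradual transition.
    """
    w = sigmoid(sharpness * (x - breakpoint))  # 0 well before, 1 well after
    return intercept + slope * x + slope_change * (x - breakpoint) * w
```

Far to the left of the breakpoint the function reduces to `intercept + slope * x`. Far to the right the slope becomes `slope + slope_change`, and `sharpness` controls how abruptly the change happens.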

2606.19011 2026-06-18 stat.ME New submission

Dimension reduction of multivariate densities in Bayes spaces

Adéla Czolková, Karel Hron, Sonja Greven

AI summary: Proposes an orthogonal decomposition of multivariate probability density functions in Bayes spaces that separates independent and interactive components, proves that the decomposition is optimal in a PCA sense, and demonstrates interpretability on housing and geological data.

Abstract

The Bayes space provides a Hilbert space structure for analysing probability density functions (PDFs), equipping them with a geometry that reflects their relative and constrained nature. A key tool in this framework is the centred logratio (clr) transformation, which establishes an isometric isomorphism between the Bayes space and (a subspace of) the classical $L^2$ space. This makes it possible to apply functional data analysis (FDA) techniques, particularly functional principal component analysis (FPCA), to both univariate and multivariate density data in the context of dimension reduction. For multivariate PDFs, embedding them in the Bayes space enables an orthogonal decomposition into independent and interactive components. Furthermore, the independent part can be decomposed into mutually orthogonal geometric marginals. This structure provides more profound insights into the sources of variation in multivariate densities. We show that this decomposition of the total variance is optimal in a PCA sense, impacting the interpretation of the eigenfunctions and scores resulting from FPCA. We demonstrate that applying FPCA directly to multivariate densities is equivalent in a certain sense to applying multivariate FPCA to their decomposed form, with the resulting eigenfunctions and scores decomposing accordingly. The unique decomposition based on these theoretical results is applied to housing and geological empirical data respectively, demonstrating the interpretability and practical value of this approach.
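For a univariate density discretised on equal-width bins, the centred logratio transformation reduces to subtracting the mean log-density, and its inverse exponentiates and renormalises. The Python sketch below illustrates this under the assumption of a uniform reference measure on a bounded interval and strictly positive density values. The function names are hypothetical.

```python
import math

def clr(density):
    """Centred logratio of a strictly positive discretised density."""
    logs = [math.log(v) for v in density]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def clr_inverse(clr_values, width):
    """Map clr values back to a density that integrates to one on the grid."""
    expd = [math.exp(v) for v in clr_values]
    total = sum(expd) * width
    return [e / total for e in expd]
```

The clr values sum to zero, which is the discrete counterpart of the zero-integral subspace of $L^2$ mentioned in the abstract, and standard tools such as FPCA can then be applied to the transformed values.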

2606.18949 2026-06-18 stat.ME New submission

Feature Screening for High-Dimensional Structural Break Predictive Regression

Zhenjie Qin, Rongmao Zhang, Wenyang Zhang, Yang Zu

AI summary: Proposes a method for screening active predictors and estimating change points in high-dimensional structural break predictive regression, combining SICS, RCRS, and information criteria to achieve consistent estimation and selection.

Abstract

Predictive regression is a crucial tool for exploring return predictability. In this study, we introduce an efficient procedure for selecting and estimating active predictors and change points in structural break predictive regression. Our approach allows the number of change points to increase with the sample size and accommodates sparse active predictors that may be stationary or cointegrated. We begin by identifying the active predictors using a Sure Independence Canonical Screening (SICS) procedure. Next, we estimate the change points through a Ratio-Controlled Regression Screening (RCRS) method. Finally, we reduce redundancy by eliminating unnecessary breakpoints and predictors using information criteria (IC). This approach allows for consistent estimation and selection of true breakpoints and active predictors. Our simulations and empirical studies demonstrate that the proposed procedure performs effectively.

2606.18843 2026-06-18 stat.ME New submission

Improved prediction of extreme random effects in joint models: WRaPs

Eline Vanderpijpen, Els Goetghebeur

AI summary: To address regression to the mean when predicting extreme random effects in joint models, proposes Weighted Random effect Predictors (WRaPs), which improve tail estimates by minimizing weighted squared prediction error, with analytical solutions and MCMC computation in a Bayesian framework.

Abstract

Mixed models are popular for the prediction of subject-specific repeated outcomes or center performance among many centers. When the goal is to identify extreme or poor outcomes, standard random effects predictions may, however, suffer from regression to the mean and underestimate values in the tail of their distribution. Optimally weighted random effect estimators have recently been proposed to mitigate this. Motivated by clinical settings where repeated outcomes may end in death, we extend that method to predict poor outcome defined as 'death or substandard repeated measures'. We start from joint models with shared random effects for the longitudinal and survival outcome and estimate their random effects by minimizing squared weighted prediction errors given available data on survival and repeated measures. As for mixed models, weights are chosen to more heavily penalize errors in the tails. We call the results WRaPs: Weighted Random effect Predictors. For basic models and a select set of weights analytical closed form solutions are derived from the usual joint model parameters. For the more complex setting, computational solutions are developed in rjags using MCMC methods within the Bayesian paradigm. We illustrate finite sample properties of the proposed method in Type I simulations with random intercept and slope; and apply the new approach to predict individual future outcomes and survival in a randomized study with glioblastoma patients.

2606.18809 2026-06-18 stat.ME stat.AP New submission

Applying the Weibull Shape Parameter test for signal detection in pharmacovigilance using the R package WSPsignal

Julia Dyck, Odile Sauzet

AI summary: Presents the family of Weibull shape parameter tests for pharmacovigilance signal detection and the R package WSPsignal, which integrates multiple estimation approaches and distributions, supports default and simulation-tuned specifications, and is demonstrated on large-sample and small-sample examples.

Abstract

Post-marketing pharmacovigilance relies on statistical signal detection methods to identify potential adverse drug reactions. The Weibull shape parameter (WSP) test concept exploits temporal information (electronic health records) to assess the hazard of an adverse event over time after drug initiation. A statistically significant deviation from constancy results in a signal. The WSP framework comprises a family of tests that differ with respect to the estimation approach (frequentist or Bayesian), the chosen time-to-event distribution (Weibull, double Weibull, power generalized Weibull) for hazard modeling, and test specification parameters. To facilitate practical application and encourage consideration of the WSP signal detection test in future research, we developed the R package WSPsignal. The package consolidates all functionalities required for WSP testing into a unified, open-source interface. It enables practitioners and researchers to apply default test specifications or perform simulation-based tuning to identify the optimal test for a given data scenario. We illustrate the package functionalities in two examples to follow along. In a large-sample setting (ca. 20 000 observations), a frequentist WSP test is considered. In a small-sample setting (ca. 1 000 observations), a Bayesian WSP test is chosen. The additional test specifications are optimized through simulation-based tuning.
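The link between the Weibull shape parameter and hazard constancy follows from the standard Weibull hazard function h(t) = (k/λ)(t/λ)^(k-1): the hazard is constant exactly when the shape k equals 1. The Python sketch below only evaluates this textbook formula. It does not reproduce the WSPsignal interface, and the function name is an illustrative assumption.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)
```

A shape below 1 gives a hazard that decreases after drug initiation, a shape above 1 gives an increasing hazard, and a shape of 1 gives the constant hazard from which the test looks for a significant deviation.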

2606.18806 2026-06-18 stat.AP New submission

Spatial emergence of acceleration in global warming

Tanja Korsten Bugajski, Nicolai Peder Bulow Pedersen, J. Eduardo Vera-Valdes

AI summary: Uses a Bayesian hierarchical spatio-temporal model to detect the spatial emergence of acceleration in global warming, finding that high-confidence signals appear first in high-latitude regions and that spatial aggregation delays detection.

Comments Supplementary information included after main manuscript

Abstract

Whether global warming is accelerating remains contested because internal variability and spatial heterogeneity can obscure changes in warming rates. Here we use a Bayesian hierarchical spatio-temporal model with structured spatial dependence to estimate local warming trajectories and acceleration, and apply the model to progressively truncated observations to infer when acceleration becomes detectable. We find that detectable acceleration emerges unevenly across the climate system, with the earliest high-confidence signals concentrated in selected high-latitude regions. Across retained grid cells, the proportion exceeding a 90% posterior probability of positive acceleration increases from 13.6% for 1970-1990 to 39.7% for 1970-2026, while the proportion exceeding a 50% threshold increases from 46.4% to 70.3%. These results show that spatial aggregation can delay detection by averaging regions where acceleration has already emerged with regions where it remains weak or uncertain. The framework provides a probabilistic diagnostic for identifying where warming is intensifying and when acceleration becomes statistically detectable.

2606.18459 2026-06-18 stat.ME New submission

Apportioning Causal Responsibility of Two Risk Factors for an Adverse Outcome via Counterfactual Attribution

Shanshan Luo, Yafang Deng, Qingyuan Zhao, Zhi Geng

AI summary: Proposes a quantitative framework that, under no-confounding and monotonicity assumptions, uses counterfactual attribution to apportion causal responsibility between two binary risk factors for an adverse outcome that has already occurred, establishing nonparametric identification or deriving sharp bounds.

Abstract

Unlike traditional causal inference, which prospectively evaluates the effects of causes, apportioning causal responsibility requires a retrospective assessment to deduce the causes of an outcome that has already occurred. This paper proposes a quantitative framework for apportioning causal responsibility between two binary risk factors that jointly contribute to a realized adverse outcome. Ideally, knowing the individual's latent causal type, defined by the potential outcomes under all possible exposure combinations, would allow precise apportionment; however, these potential outcomes cannot be simultaneously observed. We therefore define the average causal responsibility of each risk factor as its expected responsibility over the distribution of latent causal types. Under the assumptions of no confounding and monotonicity, we establish nonparametric identification of this metric when the type-specific responsibilities satisfy a structural balance condition, and derive sharp bounds otherwise. We illustrate the proposed framework using the classic example of lung cancer attributable to smoking and asbestos exposures.

2606.18412 2026-06-18 stat.ME stat.ML New submission

Bayesian Nonparametric Detection of Anomalies in Multivariate Functional Data

Daniel Krasnov, David Stephens

AI summary: Proposes a Bayesian nonparametric method that models multivariate functional data as an infinite mixture of multi-output Gaussian processes, determines the number of mixture components automatically, uses slice sampling and Besov priors to obtain a sparse representation, and introduces a Carlin-Chib step for covariance kernel selection, so that anomalies are detected without specifying their number in advance.

Comments 29 pages, 8 figures

Abstract

Anomalies in functional data arise from rare or distinct processes that deviate from the dominant data-generating mechanism. Detecting such departures is essential in applications where they may correspond to errors, structural changes, or other behavior of interest. This work introduces a Bayesian nonparametric approach for anomaly detection in multivariate functional data. We model functional data as an infinite mixture of multi-output Gaussian processes, with a finite and automatically determined number of mixture components obtained through slice sampling. Mean functions are represented using a wavelet basis and regularized through Besov priors to obtain a smooth and sparse representation of the data. Cross-functional dependence is captured using the intrinsic coregionalization model and we solve covariance kernel selection by introducing a Carlin-Chib product space step in the Markov Chain Monte Carlo algorithm. Within this model, anomalous observations are assigned to small mixture components without requiring prior specification of the number or nature of anomalies. We consider a semi-supervised setting, in which labels are available for 15% of the normal observations and a large class imbalance is present. The utility of our model is demonstrated on both univariate and multivariate functional data.

2606.18409 2026-06-18 stat.ME New submission

Learning Moment Maps for Continuous-Time Markov Chains under Monte Carlo Noise

Madison Pratt, Olivia Prosper-Feldman

AI summary: Because moments of continuous-time Markov chains are hard to compute, proposes a surrogate model trained on Monte Carlo noise-corrupted data to learn the parameter-to-moment mapping, analyzes how the noise affects mean and covariance estimation, and gives a strategy for allocating computational resources.

Abstract

Continuous-time Markov Chains are widely used to model stochastic dynamical systems, but key summary quantities such as means and covariances are often intractable. While Monte Carlo sampling provides asymptotically exact estimates, it becomes computationally prohibitive when moments must be evaluated across many parameter values. We develop a simulation-based surrogate modeling framework that learns parameter-to-moment mappings from Monte Carlo-derived, noise-corrupted training targets, enabling efficient and accurate approximation across the parameter space. We show that Monte Carlo noise affects mean estimation primarily through additive variance, whereas covariance estimation is additionally impacted by bias arising from nonlinear transformations of empirical estimates. Using a stochastic Susceptible-Infected-Recovered model, we demonstrate that neural networks accurately learn both mean and covariance under fixed simulation budgets allocated to constructing the noisy training labels. We further characterize how to allocate computational resources between parameter-space coverage and Monte Carlo replication, showing that covariance estimation requires a balanced allocation to control both variance and bias, while mean estimation benefits more from increased parameter space coverage. Finally, we show that the learned moment mappings produce valid population-level quantities and perform well in downstream tasks such as whitening. These results highlight the importance of accounting for Monte Carlo noise in surrogate modeling and provide practical guidance for simulation-based learning in stochastic systems.

2606.18366 2026-06-18 stat.ME 新提交

A closed-form sample size correction for always-valid inference with optional stopping

可选停止下始终有效推断的闭式样本量校正

Mårten Schultzberg

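AI summary For sequential tests with continuous monitoring in A/B testing, proposes a closed-form correction factor k* that rescales the fixed-sample size so that empirical power lands close to target, saving 8-20% of the last-point sample budget.
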
AI中文摘要

允许连续监测的顺序检验在A/B实验中很常见。这些检验的功效计算需要模拟,这在实验平台上跨多个指标难以扩展。相反,一种常见的样本量确定启发式方法会膨胀固定样本量,直到计划终点处的边际拒绝概率达到$1-\beta$。这个最后点规则是保守的,因为始终有效(AV)功效是在运行期间任何时间边界跨越的概率,而不仅仅是终点。我们给出了一个闭式校正因子$k^*(\alpha, \beta, t_0)$,用初等函数和二元正态CDF表示,其中$t_0 = m/n_z$是预热比例。闭式近似仅通过边界在计划终点处的值和斜率依赖于边界,并且可以针对任何光滑凹边界进行评估。我们研究了三种情况:Waudby-Smith等人(2023)和Maharaj等人(2023)的置信序列,以及Johari等人(2022)的混合序贯概率比检验。将总样本量设为$k^* \cdot n_z$,其中$n_z$是分配比$r$下的固定样本量,在高斯模拟中使经验功效与目标值相差约3个百分点以内。校正因子仅通过$t_0 = m/n_z(r)$依赖于分配比$r$。我们研究了预热参数的敏感性,并表明校正因子在操作范围内节省了8-20%的最后点样本预算。

英文摘要

Sequential tests that allow continuous monitoring are common in A/B experimentation. Power calculations for these tests require simulations that are hard to scale across many metrics on an experimentation platform. Instead, a common sizing heuristic inflates the fixed-sample size until the marginal rejection probability at the planned endpoint reaches $1-\beta$. This last-point rule is conservative because always-valid (AV) power is the probability of a boundary crossing at any time during the run, not at the endpoint alone. We give a closed-form correction factor $k^*(\alpha, \beta, t_0)$ expressed in elementary functions and the bivariate normal CDF, where $t_0 = m/n_z$ is the burn-in fraction. The closed-form approximation depends on the boundary only through its value and slope at the planned endpoint and can be evaluated for any smooth concave boundary. We work out three cases: the confidence sequences of Waudby-Smith et al. (2023) and Maharaj et al. (2023), and the mixture sequential probability ratio test of Johari et al. (2022). Setting the total sample size to $k^* \cdot n_z$, where $n_z$ is the fixed-sample size for allocation ratio $r$, hits empirical power within approximately 3 percentage points of target in Gaussian simulations. The correction factor depends on the allocation ratio $r$ only through $t_0 = m/n_z(r)$. We study sensitivity to the burn-in parameter and show that the correction saves 8--20% of the last-point sample budget across the operating range.
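
For orientation, the two inputs to the correction, $n_z$ and $t_0 = m/n_z$, can be sketched in a few lines. The sizing formula below is the textbook two-sided, two-sample z-test with known common variance, which is an assumption here: the abstract does not say how $n_z$ is computed. The names `fixed_sample_size` and `burn_in_fraction`, the effect size `delta`, the standard deviation `sigma`, and the burn-in count `m=200` are all hypothetical. The factor $k^*$ itself is not reproduced, because its closed form is not stated in the abstract.

```python
from statistics import NormalDist


def fixed_sample_size(alpha, beta, delta, sigma, r=1.0):
    """Total fixed-sample size for a two-sided two-sample z-test (assumed design).

    alpha: type I error, beta: type II error, delta: difference in means to detect,
    sigma: common standard deviation, r: allocation ratio n_treatment / n_control.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(1 - beta)
    # Var(difference in means) = sigma^2 * (1 + r)^2 / (r * n) for total size n.
    return (z_alpha + z_beta) ** 2 * sigma ** 2 * (1 + r) ** 2 / (r * delta ** 2)


def burn_in_fraction(m, n_z):
    """Burn-in fraction t_0 = m / n_z, with m observations collected before monitoring."""
    return m / n_z


# Hypothetical inputs: 5% level, 80% power, effect of 0.1 standard deviations.
n_z = fixed_sample_size(alpha=0.05, beta=0.2, delta=0.1, sigma=1.0, r=1.0)
t0 = burn_in_fraction(m=200, n_z=n_z)
print(round(n_z), round(t0, 4))
```

Because the abstract states that $k^*$ depends on the allocation ratio only through $t_0$, changing `r` in this sketch would affect the correction solely by changing `n_z` and hence `t0`.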

2606.18365 2026-06-18 stat.ME 新提交

Logarithmic energy distances and Gini covariance for Hilbert-valued random elements

Hilbert值随机元的对数能量距离与Gini协方差

Norbert Henze, M. Dolores Jiménez-Gamero

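AI summary Studies the limit of the generalized energy distance as α↓0, obtaining a logarithmic energy distance that retains the characterization property and admits a representation via Gaussian-kernel maximum mean discrepancies, and then introduces a logarithmic Gini covariance for the k-sample problem.

Comments 18 pages
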
AI中文摘要

对于α∈(0,2),广义能量距离和Gini协方差统计量基于核函数(x,y)↦‖x-y‖^α,其中‖·‖表示实可分Hilbert空间中的范数。本文研究边界情形α↓0。经过适当归一化后,相应的能量距离收敛到涉及核函数(x,y)↦log‖x-y‖的对数能量距离。我们证明所得对数能量距离保留了可分Hilbert空间中普通能量距离的基本特征性质,并导出其高斯核最大均值差异表示。受此表示启发,我们针对k样本问题引入对数Gini协方差,并研究其结构和渐近性质。特别地,我们导出其成对对数能量距离表示,建立分布相等的特征定理,发展相应经验统计量的渐近零假设和备择假设理论,并讨论基于置换的实现。对数框架揭示了能量型统计量族中的新边界现象,并提供了与核方法、函数型数据分析和高维推断的联系。

英文摘要

For $\alpha\in(0,2)$, the generalized energy distance and the Gini covariance statistic are based on kernels of the form $(x,y)\mapsto \|x-y\|^\alpha$, where $\|\cdot\|$ denotes the norm in a real separable Hilbert space. This paper investigates the boundary regime $\alpha\downarrow 0$. After suitable normalization, the corresponding energy distance converges to a logarithmic energy distance involving the kernel $(x,y)\mapsto\log\|x-y\|$. We establish that the resulting logarithmic energy distance retains the fundamental characterization property of ordinary energy distances in separable Hilbert spaces and derive a representation in terms of Gaussian-kernel maximum mean discrepancies. Motivated by this representation, we introduce a logarithmic Gini covariance for the $k$-sample problem and investigate its structural and asymptotic properties. In particular, we derive a representation in terms of pairwise logarithmic energy distances, establish a characterization theorem for equality of distributions, develop asymptotic null and alternative theory for the corresponding empirical statistic, and discuss permutation-based implementation. The logarithmic framework reveals a new boundary phenomenon within the family of energy-type statistics and provides connections with kernel methods, functional data analysis, and high-dimensional inference.
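
The boundary limit can be checked numerically in a small sketch. It rests on several assumptions that are not taken from the paper: the normalization is taken to be division by the exponent (motivated by the expansion of the power kernel as 1 plus the exponent times the log distance, in which the constant cancels across the three terms), the Hilbert space is replaced by the plane, and the estimator is a simple U-statistic over simulated Gaussian samples. All function names are hypothetical.

```python
import math
import random


def dist(x, y):
    """Euclidean norm of x - y (a finite-dimensional stand-in for the Hilbert norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_cross(xs, ys, kernel):
    """Average of kernel(||x - y||) over all cross-sample pairs."""
    return sum(kernel(dist(x, y)) for x in xs for y in ys) / (len(xs) * len(ys))


def mean_within(xs, kernel):
    """Average of kernel(||x - x'||) over distinct within-sample pairs."""
    n = len(xs)
    total = sum(kernel(dist(xs[i], xs[j])) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def energy_distance(xs, ys, kernel):
    """2 E k(X, Y) - E k(X, X') - E k(Y, Y'), estimated from two samples."""
    return 2 * mean_cross(xs, ys, kernel) - mean_within(xs, kernel) - mean_within(ys, kernel)


rng = random.Random(0)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
ys = [(rng.gauss(1, 1), rng.gauss(0, 1)) for _ in range(40)]  # mean shifted in one coordinate

alpha = 1e-4
scaled_ed = energy_distance(xs, ys, lambda d: d ** alpha) / alpha
log_ed = energy_distance(xs, ys, math.log)
print(scaled_ed, log_ed)  # the two values should nearly coincide for small alpha
```

Within-sample averages skip the diagonal so that the log kernel is never evaluated at zero distance.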

2606.18545 2026-06-18 stat.AP q-fin.RM 新提交

The Gini-Bayes Connection: The CAP Slope as Bayes' Theorem, with Applications to Weight of Evidence, Somers' $D$, and Calibration

Gini-Bayes 联系:CAP 斜率作为贝叶斯定理,及其在证据权重、Somers' $D$ 和校准中的应用

Denis Burakov

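AI summary Explicitly identifies the slope of the cumulative accuracy profile (CAP) as Bayes' theorem in cumulative coordinates, derives from it the geometric relationships among weight of evidence, information value, accuracy ratio, Somers' $D$, and the Gini coefficient, and proposes a calibration diagnostic based on the gap between Gini coefficients.

Comments 19 pages, 7 figures, 6 tables. Code and data: https://github.com/deburky/gini-bayes-paper
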
AI中文摘要

累积精度曲线 (CAP) 的概率解释在工业界有着悠久的历史。Falkenstein, Boral 和 Carty (2000) 以离散形式指出,在某个分数百分位处的违约率等于组合平均违约率乘以功效曲线的局部斜率;van der Burgt (2008, 2019) 将其形式化为连续恒等式 $p(D\mid x) = p_D\, dy/dx$,并将连续形式作为工作事实引入;Tasche (2009) 分析了由此产生的校准方法;Voloshyn 和 Voloshyn (2023) 将贝叶斯定理 $f(x\mid D)=p(D\mid x) f(x)/p_D$ 代入面积积分,并将 Gini 系数写为校准曲线的泛函。斜率本身已存在于这一谱系中(van der Burgt 的 $dy/dx$ 是两个累积微分的比值),但它是作为引用的工作事实出现的,从未被视为贝叶斯定理。我们明确地做出这一识别,并阐述其后果。首先,CAP 斜率是累积坐标下的贝叶斯定理:它所恢复的标准化 PD 是经先验概率重新缩放的后验概率。本文的重点在于这一解读所揭示的两个结果。几率形式将证据权重(似然比的对数,即贝叶斯因子)和信息值置于同一几何框架内(某点的证据权重是“坏”和“好”CAP 斜率比值的对数)。准确率比、Somers' $D_{xy}$ 和 Gini 系数 $(2A-1)/(1-p_D)$ 被揭示为同一数值的三种计算方式。在比较模式下(实际结果与模型预测对比),同一恒等式恢复了累积坐标下的可靠性图,其中经验 Gini 系数与模型隐含 Gini 系数之间的差距符号可作为校准诊断指标。一个五组示例以离散形式呈现了所有恒等式,一个核密度示例将其推广到连续情形。

英文摘要

The probabilistic reading of the cumulative accuracy profile (CAP) has a long industry lineage. Falkenstein, Boral and Carty (2000) state, in discrete form, that the default rate at a score percentile equals the portfolio average rate times the local slope of the power curve; van der Burgt (2008, 2019) formalizes this as the continuous identity $p(D\mid x) = p_D\, dy/dx$ and imports the continuous form as a working fact; Tasche (2009) analyzes the resulting calibration method; Voloshyn and Voloshyn (2023) substitute Bayes' theorem, $f(x\mid D)=p(D\mid x) f(x)/p_D$, into the area integral and write the Gini as a functional of the calibration curve. The slope itself is already in the lineage (van der Burgt's $dy/dx$ is the ratio of the two cumulative differentials), but it enters as a cited working fact, never as Bayes' theorem. We make that identification explicit and draw out its consequences. First, the CAP slope is Bayes' theorem in cumulative coordinates: the standardized PD it recovers is the posterior probability rescaled by the prior. The weight of the paper then falls on two results this reading unlocks. The odds form places the weight of evidence (the log of the likelihood ratio, i.e. the Bayes factor) and the information value inside one geometry (the weight of evidence at a point is the log of the ratio of the "bad" and "good" CAP slopes). The accuracy ratio, Somers' $D_{xy}$, and the Gini $(2A-1)/(1-p_D)$ are revealed as one number computed three ways. Run in comparison mode (realized outcomes against model claims), the same identity recovers the reliability diagram in cumulative coordinates, with the sign of the gap between the empirical and model-implied Gini coefficients as a calibration diagnostic. A worked five-band example carries every identity in discrete form, and a kernel-density example extends them to the continuous case.
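
The discrete identities can be illustrated with a five-band sketch. The band counts below are hypothetical and are not the paper's worked example; bands are assumed to be ordered from riskiest to safest, the CAP is assumed piecewise linear between band boundaries, $A$ is taken to be the area under the CAP, and the sign convention for the weight of evidence (log of bad slope over good slope) follows the abstract's wording.

```python
import math

# Hypothetical five-band portfolio, riskiest band first.
bads = [40, 25, 15, 12, 8]
goods = [60, 175, 285, 388, 492]
totals = [b + g for b, g in zip(bads, goods)]
B, G, N = sum(bads), sum(goods), sum(totals)
p_D = B / N  # portfolio average default rate (the prior)

# Local CAP slopes: share of bads (or goods) captured per unit share of population.
bad_slope = [(b / B) / (n / N) for b, n in zip(bads, totals)]
good_slope = [(g / G) / (n / N) for g, n in zip(goods, totals)]

# Bayes in cumulative coordinates: band default rate = prior * CAP slope.
band_default_rate = [b / n for b, n in zip(bads, totals)]
rate_from_slope = [p_D * s for s in bad_slope]

# Weight of evidence per band as the log ratio of "bad" and "good" slopes.
woe = [math.log(sb / sg) for sb, sg in zip(bad_slope, good_slope)]

# Area under the CAP by the trapezoid rule, then the Gini (2A - 1) / (1 - p_D).
area, y_prev = 0.0, 0.0
for b, n in zip(bads, totals):
    y_next = y_prev + b / B
    area += (n / N) * (y_prev + y_next) / 2
    y_prev = y_next
gini = (2 * area - 1) / (1 - p_D)

# Somers' D from concordant and discordant (bad, good) pairs; ties count as neither.
k = len(bads)
concordant = sum(bads[i] * goods[j] for i in range(k) for j in range(k) if i < j)
discordant = sum(bads[i] * goods[j] for i in range(k) for j in range(k) if i > j)
somers_d = (concordant - discordant) / (B * G)

print(gini, somers_d)
```

Under these assumptions the area-based Gini and the pair-counting Somers' $D$ agree up to floating-point error, which is the "one number computed several ways" reading in miniature.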

2606.19318 2026-06-18 q-fin.ST econ.EM q-fin.MF 新提交

Fitting Accumulated Stock Returns with Tempered Skew t-Distribution

用调节偏斜t分布拟合累积股票收益

Siqi Shao, R. A. Serota

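AI summary Analyzes distributions of multi-day S&P500 returns and finds that power-law tails are tempered as the number of accumulation days grows; a model with "capped Inverse Gamma" stochastic volatility yields a "tempered Student-t" distribution, and a Jones-Faddy-like symmetry-breaking mechanism yields a "tempered Skew-t" distribution that fits the asymmetry between gains and losses and is consistent with the linear dependence of mean and variance on the number of accumulation days.

Comments 15 pages, 10 figures, 4 tables
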
AI中文摘要

我们分析了历史S&P500多日收益的分布,累积天数从20到120。随着累积天数的增加,我们观察到幂律尾部明显被调节,趋向于一个看似有限的值。为了解释这一现象,我们采用了一个模型,该模型为随机波动率产生了一个“有界逆伽马”平稳(稳态)分布,进而为收益产生了一个“调节学生t”分布。然后,我们采用了类似Jones-Faddy的对称破缺机制,产生了一个“调节偏斜t”分布。该分布对累积多日S&P500收益的分布提供了相当好的拟合,这些分布表现出收益与损失之间的对称破缺——正如正均值和负偏度所反映的那样。调节偏斜t拟合也与均值以及方差(均方实现波动率)对累积天数的近乎完美线性依赖一致。

英文摘要

We analyze distributions of historic S&P500 multi-day returns, with the number of days of accumulation ranging from 20 to 120. As the number of days of accumulation increases, we observe clear tempering of power-law tails toward a seemingly finite value. To explain this phenomenon, we employ a model that produces a "capped Inverse Gamma" stationary (steady-state) distribution for stochastic volatility which, in turn, produces a "tempered Student-t" distribution for returns. We then employ a Jones-Faddy-like symmetry-breaking mechanism that produces a "tempered Skew-t" distribution. This distribution provides rather good fits to the distributions of accumulated multi-day S&P500 returns, which exhibit symmetry breaking between gains and losses -- as reflected by positive mean and negative skew. Tempered Skew-t fits are also consistent with the near-perfect linear dependence on the number of days of accumulation of the mean values and, even more so, of the variances (mean squared realized volatility) of the distributions.

2606.19214 2026-06-18 econ.GN q-fin.EC 新提交

Testing Centralized and Polycentric Computational Planning

测试集中式和多中心计算规划

Ricardo Alonzo Fernández Salguero

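AI summary Presents a reproducible synthetic benchmark that compares a computational planner, an agent-based market, and a hybrid meta-market in a simulated economy; the planner achieves lower welfare losses, but the results depend on design choices, and the main contribution is methodological rather than ideological.
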
AI中文摘要

本文提出了一个可复现的合成基准,在共同的模拟经济中比较计算规划者、基于代理的市场和混合元市场。该基准包含投入产出生产网络、异质企业、产能约束、内生价格、福利指标、结构性冲击、对抗性压力测试和信息报告实验。在训练、保留和对抗性场景中,规划者始终比分散化替代方案实现更低的福利损失。主要贡献是方法论而非意识形态的。虽然该基准展示了一个可证伪的框架用于比较经济协调机制,但它并未确立规划的实证优越性。若干设计选择机械地偏向规划者,包括信息不对称、不完整的市场表示和简化的制度假设。因此,结果应被解释为对合成实验架构的验证,以及作为未来研究的原型。本文最后概述了一个基于实证校准、结构性保留、敏感性分析、不确定性量化、机制设计测试和独立复制的验证议程。

英文摘要

This paper presents a reproducible synthetic benchmark comparing a computational planner, an agent-based market, and a hybrid meta-market within a common simulated economy. The benchmark incorporates input-output production networks, heterogeneous firms, capacity constraints, endogenous prices, welfare metrics, structural shocks, adversarial stress testing, and information-reporting experiments. Across training, holdout, and adversarial scenarios, the planner consistently achieves lower welfare losses than the decentralized alternatives. The main contribution is methodological rather than ideological. While the benchmark demonstrates a falsifiable framework for comparing economic coordination mechanisms, it does not establish the empirical superiority of planning. Several design choices mechanically favor the planner, including informational asymmetries, incomplete market representation, and simplified institutional assumptions. The results should therefore be interpreted as validation of a synthetic experimental architecture and as a prototype for future research. The paper concludes by outlining a validation agenda based on empirical calibration, structural holdouts, sensitivity analysis, uncertainty quantification, mechanism-design tests, and independent replication.

2606.19052 2026-06-18 q-fin.CP stat.CO 新提交

An extendable, integrated, and dynamic approach to forecasting and stress-testing credit risk

一种可扩展、集成且动态的信用风险预测与压力测试方法

Marcel Muller, Arno Botha, Conrad Beyers

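AI summary Proposes an extendable stress-testing approach that integrates loan production with credit risk, simulating loan portfolios by Monte Carlo, computing risk metrics, and allowing loan parameters to be varied dynamically to evaluate a range of stress scenarios.

Comments 23 pages, 10 figures
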
AI中文摘要

本文提出了一种集成且可扩展的贷款组合压力测试方法,该方法包括贷款生成组件和信用风险组件。在该方法中,我们使用现实的贷款参数和分布假设模拟完整的贷款组合。随后,我们在多状态概率框架内生成这些贷款的不确定现金流历史。我们通过基于模拟的研究来说明我们的方法,尽管该方法可以拟合真实世界的数据。这种基于模拟的方法非常适合压力测试,因为它允许评估一系列条件。根据这些完成的贷款,我们计算组合层面的信用风险指标,例如违约率和损失率。通过在更广泛的蒙特卡洛设置中相应改变贷款参数来引入压力情景,从而产生一系列组合。经典的压力测试方法通常不集成贷款生成或嵌入风险指标之间的相关结构。在我们的方法中,我们将风险指标的预测与收据生成相结合。给定数据,我们可扩展方法中的贷款参数可以使用任何适用的技术动态建模为输入变量的函数。总体而言,我们的方法可以生成更动态且灵活调整的预测,这可以增强任何银行内的压力测试实践。

英文摘要

An integrated and extendable approach for stress-testing loan portfolios is presented, which includes both a loan production component and a credit risk component. In this approach, we simulate a completed portfolio using realistic loan parameters and distributional assumptions. Thereafter, we generate the uncertain cash flow history of these loans within a multistate probabilistic framework. We illustrate our approach using a simulation-based study, though the approach can be fit to real-world data. Such a simulation-based approach is ideal for stress-testing since it allows for evaluating a range of conditions. From these completed loans, we compute portfolio-level credit risk metrics, e.g., default and loss rates. Stress scenarios are introduced by varying the loan parameters accordingly within a broader Monte Carlo setup, thereby resulting in a range of portfolios. A classical approach to stress-testing does not typically integrate loan production or embed the correlation structure amongst risk metrics. In our approach, we integrate the forecasting of risk metrics with receipt-generation. Given data, the loan parameters within our extendable approach can be dynamically modelled as functions of input variables using any applicable technique. Overall, our approach can render predictions that are more dynamic and flexibly tuned, which can enhance stress-testing practices within any bank.

2606.19038 2026-06-18 q-fin.MF 新提交

Collective completeness and pricing-hedging duality II

集体完备性与定价-对冲对偶 II

Alessandro Doldi, Marco Frittelli, Marco Maggis

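AI summary Extends the theory of collective pricing and hedging to the setting where admissible risk exchanges form a finitely generated convex cone, proving that no collective arbitrage implies closedness of the aggregate feasibility cone and establishing the collective First Fundamental Theorem and the pricing-hedging duality.
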
AI中文摘要

本文补充并扩展了Doldi, Frittelli和Maggis的《集体完备性与定价-对冲对偶》(Math. Finan. Econ. 19, 757-784 (2025)),研究了当可接受风险交换构成有限生成凸锥时的集体定价与对冲。集体资产定价第一基本定理和集体定价-对冲对偶被推广到这一设定。一个关键贡献是闭性结果,表明无集体套利意味着结合无限维交易机会与有限维交换的聚合可行锥是闭的。本文还证明了未定权益向量的无集体套利价格构成相对开凸集。最后,引入了强集体可复制性,并证明其等价于价格唯一性。这导出了增强的集体资产定价第二基本定理,提供了集体完备性和强集体完备性在集体等价鞅测度唯一性方面的等价刻画。我们强调,当交换属于凸锥而非向量空间时,理论的几个核心方面发生了实质性改变。

英文摘要

This paper complements and extends Doldi, Frittelli and Maggis, Collective completeness and pricing-hedging duality, Math. Finan. Econ. 19, 757-784 (2025), by studying collective pricing and hedging when admissible risk exchanges form a finitely generated convex cone. The collective First Fundamental Theorem of Asset Pricing and the collective pricing-hedging duality are extended to this setting. A key contribution is a closedness result showing that no collective arbitrage implies the closedness of the aggregate feasibility cone combining infinite-dimensional trading opportunities with finite-dimensional exchanges. The paper also proves that no-collective-arbitrage prices for vectors of contingent claims form a relatively open convex set. Finally, strong collective replicability is introduced and shown to be equivalent to price uniqueness. This leads to an enhanced collective Second Fundamental Theorem of Asset Pricing, providing equivalent characterizations of collective completeness and strong collective completeness in terms of the uniqueness of the collective equivalent martingale measure. We highlight that several core aspects of the theory are substantially altered when exchanges belong to a convex cone rather than a vector space.

2606.18994 2026-06-18 econ.GN q-fin.EC 新提交

Climate Policy and The Energy Transition

气候政策与能源转型

Roy Sarkis

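AI summary Builds a multi-sector dynamic general equilibrium model to study the macroeconomic effects of climate policy, finding that gradual implementation sharply reduces transition costs and that sectoral coverage affects the welfare outcome.

Comments 48 pages, 19 figures
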
AI中文摘要

本文在一个包含可再生能源和不可再生能源、部门特定资本调整摩擦、家庭能源需求以及内生化石资源动态的多部门动态一般均衡模型中,研究了气候政策的宏观经济动态。核心机制在于脱碳需要重新分配能源使用和已安装资本:化石能源需求可以立即收缩,而可再生能源产能和减排措施只能逐步调整。分析得出四个结果。第一,渐进式政策实施显著降低了转型成本:相对于立即实施,在全面监管下渐进式排放上限使福利提高2.26个百分点,在仅企业监管下提高5.06个百分点。第二,可再生能源补贴和不可再生能源税支持可再生能源资本积累,并减少但未消除前置收紧的福利成本。第三,部门覆盖范围改变了不同实施速度下的福利排序。仅企业监管在渐进实施下表现更好,因为它保护了与效用相关的家庭能源服务,但在立即实施下变得几乎与仅碳价格转型一样昂贵。第四,内生化石勘探和依赖于储量的开采成本将气候政策传导为更低的开采量、更少的发现以及不断下降的储量影子价值,为搁浅化石资产提供了结构性机制。结果表明,当政策管理能源-资本重新配置的速度和影响范围时,深度脱碳可以以显著更低的宏观经济成本实现。

英文摘要

This paper studies the macroeconomic dynamics of climate policy in a multi-sector dynamic general equilibrium model with renewable and non-renewable energy, sector-specific capital adjustment frictions, household energy demand, and endogenous fossil resource dynamics. The central mechanism is that decarbonization requires reallocating energy use and installed capital: fossil energy demand can contract immediately, while renewable capacity and abatement adjust only gradually. The analysis delivers four results. First, gradual policy implementation sharply reduces transition costs: relative to immediate implementation, gradual emissions caps improve welfare by 2.26 percentage points under comprehensive regulation and by 5.06 percentage points under firm-only regulation. Second, renewable energy subsidies and non-renewable energy taxes support renewable capital accumulation and reduce, but do not eliminate, the welfare cost of front-loaded tightening. Third, sectoral coverage changes the welfare ranking across implementation speeds. Firm-only regulation performs better under gradual implementation because it shields utility-relevant household energy services, but becomes nearly as costly as the carbon-price-only transition under immediate implementation. Fourth, endogenous fossil exploration and stock-dependent extraction costs transmit climate policy into lower extraction, fewer discoveries, and a declining shadow value of reserves, providing a structural mechanism for stranded fossil assets. The results show that deep decarbonization can be achieved at substantially lower macroeconomic cost when policy manages the speed and incidence of energy-capital reallocation.

2606.18805 2026-06-18 econ.GN q-fin.EC 新提交

Emotional driving: Reference-dependent emotions and risky driving behavior after sporting events

情绪驾驶:体育赛事后的参考依赖情绪与危险驾驶行为

Travis Richardson, Steve Bickley, Ho Fai Ben Chan, Benno Torgler, Shamsunnahar Yasmin, Tim Pawlowski

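AI summary Using 2015-2019 traffic data around five Florida stadiums, finds that after NFL games predicted to be close that end in a home-team loss, average vehicle speed within 3 km of the stadium rises significantly during the first post-game hour (by up to 3 mph), suggesting that sustained suspense combined with a negative outcome can spill over into risky driving.
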
AI中文摘要

利用交通信息频道(TMC)位置级别每10分钟的平均车速数据,以及精确的碰撞时间和位置信息,我们分析了2015年至2019年NFL和NBA常规赛前后佛罗里达五座体育场周围的驾驶行为。我们没有发现NBA比赛后情绪驾驶的证据,但发现NFL比赛后存在强烈且一致的影响,集中在预测比分接近且以主队失望失利告终的比赛中——结合了高赛前悬念与负面结果效价。这些比赛与赛后第一小时内体育场3公里范围内平均车速显著增加相关,且随着时间和距离的增加而消散。相对于预测比分接近但获胜的比赛,平均车速增加高达3英里/小时——这一效应是典型比赛日与非比赛日速度差异的数倍。总体而言,我们的结果强调了在势均力敌的体育比赛中,持续悬念与负面结果效价的结合如何溢出到赛后的危险驾驶行为中,突显了大型体育赛事中情感线索的行为和公共安全影响。

英文摘要

Using average vehicle speed data in 10-minute increments at the Traffic Message Channel (TMC) location level, along with precise crash timing and location information, we analyze driving behavior around five Florida stadiums before and after NFL and NBA regular season games from 2015 to 2019. We find no evidence of emotional driving following NBA games, but strong and consistent effects following NFL games, concentrated in predicted-close games that end in disappointing home-team losses -- combining high pre-game suspense with negative outcome valence. These games are associated with significant increases in average vehicle speed within 3 km of stadiums during the first post-game hour, dissipating with increasing time and distance from the stadium. Average vehicle speed increases by up to 3 mph relative to predicted-close games that ended in a win -- an effect several times larger than the typical game day versus non-game day speed differential. Overall, our results highlight how the combination of sustained suspense and negative outcome valence in close sporting contests can spill over into risky post-game driving behavior, underscoring the behavioral and public safety implications of affective cues in large-scale sporting events.

2606.18719 2026-06-18 econ.GN q-fin.EC 新提交

Reassessing the role of intermediaries in exports

重新评估中间商在出口中的作用

Aitor Garmendia-Lazcano, Raúl Mínguez, Asier Minondo

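AI summary Using Spanish firm-level data, shows that once manufacturer-owned export arms and vertically integrated firms are excluded, the share of intermediaries in exports falls by about 70%, and that pure intermediaries differ markedly from the excluded firms.
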
AI中文摘要

先前的研究认为中间商在出口中占据很大份额。利用西班牙企业层面数据,我们表明许多被归类为中间商的企业要么是制造商拥有的出口部门,负责运输其母公司的产品,要么是垂直整合的企业,控制设计、生产和分销,并主要出口以自有品牌销售的商品。一旦我们排除这些出口部门和垂直整合企业,我们样本中中间商在出口中的份额下降了约70%。我们还表明,纯中间商在关键企业和出口维度上与出口部门和垂直整合企业存在显著差异。

英文摘要

Previous studies conclude that intermediaries account for a large share of exports. Using Spanish firm-level data, we show that many firms classified as intermediaries are either manufacturer-owned export arms that ship their parent firms' products or vertically integrated firms that control design, production, and distribution and predominantly export goods sold under their own brands. Once we exclude these export arms and vertically integrated firms, the share of intermediaries in exports in our sample falls by about 70%. We also show that pure intermediaries differ markedly from export arms and vertically integrated firms along key firm and export dimensions.

2606.18684 2026-06-18 econ.GN q-fin.EC 新提交

How firms export: direct and indirect exporting, intermediaries, and hybrid firms

企业如何出口:直接出口、间接出口、中间商与混合型企业

Raúl Mínguez, Asier Minondo

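AI summary Develops a model with heterogeneity in manufacturing and commercial capability to explain why firms become direct exporters, indirect exporters, pure intermediaries, or hybrid firms, and provides suggestive evidence from Spanish firm-level export data.
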
AI中文摘要

一些企业直接出口自己的产品,另一些依赖中间商代为出口,还有一些既出口自己的产品又为其他生产者提供出口中介服务。为了解释这种异质性,我们构建了一个模型,其中企业在两个维度上存在差异:制造能力和商业能力。制造能力降低了生产品种的边际成本,而商业能力降低了接触外国客户的变动成本。这些能力的组合产生了出口市场中观察到的不同类型的企业:直接出口商、间接出口商、纯中间商和混合型企业。模型预测,商业能力强的中间商与制造能力更强的生产者匹配,且商业能力越强的中间商出口的品种范围越广。我们利用西班牙企业层面的出口数据为这些预测提供了初步证据。

英文摘要

Some firms export their own products directly, others rely on intermediary firms to export on their behalf, and still others both export their own products and intermediate exports for other producers. To explain this heterogeneity, we develop a model in which firms differ along two dimensions: manufacturing capability and commercial capability. Manufacturing capability lowers the marginal cost of producing a variety, whereas commercial capability lowers the variable cost of reaching foreign customers. Different combinations of these capabilities generate the different types of firms observed in export markets: direct exporters, indirect exporters, pure intermediaries, and hybrid firms. The model predicts that commercially capable intermediaries are matched with more manufacturing-capable producers, and that more commercially capable intermediaries export a broader set of varieties. We provide suggestive evidence for these predictions using Spanish firm-level export data.

2606.19280 2026-06-18 q-bio.QM 新提交

CollaboratoR: A scalable workflow for collaborative data entry and management

CollaboratoR:一种用于协作数据录入和管理的可扩展工作流程

Patrick Bills, Ashwini Ramesh, Lais Petri, Alejandra Martinez Blancas, Kelly Kapsar, Amar Deep Tiwari, Phoebe L. Zarnetske

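AI summary To address inconsistency and inefficiency in collaborative data entry, develops the CollaboratoR R package, which automates validation and aggregation and optionally uses Google Sheets and GitHub to support transparent, reproducible data management and higher-quality data synthesis.

Comments 16 pages, 1 table, 1 figure
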
AI中文摘要

有效的协作数据录入和透明度是构建稳健数据库和高质量数据综合的基础。然而,研究人员经常面临不一致的数据录入,无意中引入错误、误读和不一致,损害数据完整性。尽管开源工具的使用日益增多,许多人仍依赖低效的格式或昂贵的商业平台,而较少采用复杂的开源解决方案。这些低效率拖慢了工作流程,阻碍了研究人员构建用于综合研究(包括元分析)的基础数据库。为了解决这个问题,我们开发了CollaboratoR,一个可定制的R包,它自动化数据验证和聚合,确保一致性和透明度,并遵循FAIR数据原则,同时可选地使用Google Sheets进行协作数据录入和GitHub进行版本控制。CollaboratoR填补了临时电子表格和用于元分析数据提取的复杂系统之间的空白。数据被录入共享的Google Sheets,经过验证,推送到GitHub进行版本控制,然后在最终确定前再次验证以确保准确性。在两个案例研究(植物竞争和鸟类互动数据库)中测试,CollaboratoR在管理大型协作数据集方面证明是有效的。在这两个案例中,自动化验证及早标记了常见的录入和格式问题,提高了可追溯性,并减少了事后清理所花费的时间。该框架适用于数据综合为数据驱动决策提供信息的学科,如社会科学、生态学以及医学和药学研究。最终,CollaboratoR为高效、透明和可重复的协作数据管理提供了指导,增强了跨领域和行业的研究综合。

英文摘要

Effective collaborative data entry and transparency are foundational for building robust databases and high-quality data synthesis. Yet researchers often face inconsistent data entries, inadvertently introducing errors, misreadings, and inconsistencies that compromise data integrity. Despite the growing use of open-source tools, many still rely on inefficient formats or costly commercial platforms, while fewer adopt complex open-source solutions. These inefficiencies slow workflows and hinder researchers' ability to build foundational databases for synthesis research, including meta-analyses. To address this, we developed CollaboratoR, a customizable R package that automates data validation and aggregation, ensuring consistency and transparency and adhering to FAIR data principles, while optionally using Google Sheets for collaborative data entry and GitHub for version control. CollaboratoR fills the gap between ad-hoc spreadsheets and complex systems for data extraction in meta-analyses. Data are entered into shared Google Sheets, validated, and pushed to GitHub for version control, then re-validated after verification to ensure accuracy before finalizing. Tested in two case studies, plant competition and avian interaction databases, CollaboratoR proved effective at managing large collaborative datasets. In both, automated validation flagged common entry and formatting issues early, improving traceability and reducing time spent on post-hoc cleaning. This framework applies across disciplines where data synthesis informs data-driven decision-making, such as social science, ecology, and medical and pharmaceutical research. Ultimately, CollaboratoR offers guidance for efficient, transparent, and reproducible collaborative data management, enhancing research synthesis across fields and industries alike.

2606.18667 2026-06-18 q-bio.NC q-bio.QM 新提交

Can neurons speak? Semantic narration of vision at single-cell resolution

神经元能说话吗?单细胞分辨率的视觉语义叙述

Arnau Marin-Llobet, Richard Hakim, Sara Matias, Venkatesh N. Murthy, Na Li, Demba Ba

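AI summary Proposes NEURRATOR, a framework that decodes neuronal activity into natural-language descriptions, enabling semantic narration of vision at single-cell resolution; it is used to quantify decoding fidelity and to characterize the functional contributions of individual neurons and specific cell types.
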
AI中文摘要

识别高级视觉皮层中单个神经元编码的内容是一个开放问题。响应难以直观参数化,而用于替代的深度网络嵌入是黑箱。这里,我们介绍NEURRATOR,一个将尖峰活动解码为单神经元分辨率的自由形式自然语言叙述的框架。一个学习编码器将来自任意子集的同步记录神经元的尖峰序列映射到冻结CLIP的补丁嵌入空间,多模态语言模型和稀疏自编码器生成并验证描述,无需语言侧训练。应用于自然电影观看期间小鼠视觉皮层的Neuropixel记录,NEURRATOR从数千个神经元、单个皮层区域、局部群体或分子定义的细胞类型进行叙述。我们利用这一特性来(i)量化解码保真度如何随群体大小和皮层区域变化,以及(ii)用平实的语言“叙述”单个神经元和基因标记的抑制性细胞类型对视觉表征的贡献。这将细胞身份从分类目标重新定义为视觉系统的功能探针,为神经系统提供了一种新的生物学见解单位。

英文摘要

Identifying what individual neurons encode in higher-order visual cortex is an open problem. Responses resist intuitive parameterization, and the deep-network embeddings used in their place are black boxes. Here, we introduce NEURRATOR, a framework that decodes spiking activity into free-form natural-language narration of the viewed scene at single-neuron resolution. A learned encoder maps spike trains from arbitrary subsets of simultaneously recorded neurons into the patch-embedding space of a frozen CLIP, from which a multimodal language model and a sparse autoencoder generate and validate a description with no language-side training. Applied to Neuropixel recordings of mouse visual cortex during natural-movie viewing, NEURRATOR narrates from thousands of neurons, single cortical regions, local populations, or molecularly defined cell types. We use this property to (i) quantify how decoding fidelity scales with population size and cortical region, and (ii) "neurrate", in plain language, what individual neurons and genetically tagged inhibitory cell types contribute to visual representation. This recasts cell identity from a classification target into a functional probe of the visual system, providing a new unit of biological insight into neural systems.

2606.18575 2026-06-18 q-bio.QM 新提交

Adaptive COVID-19 Trajectory Forecasting Using MAB-Inspired Ensemble Weighting

基于MAB启发式集成加权的自适应COVID-19轨迹预测

Hamed Karami, Javier Redondo Anton, Geunsoo Jang, K. Selcuk Candan, Gerardo Chowell

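AI summary Because no single model is consistently reliable for epidemic forecasting, evaluates MAB-inspired adaptive weighting strategies, comparing UCB, EXP3, and epsilon-greedy weighting rules across three U.S. COVID-19 waves; EXP3Stoch, EXP3Det, and EPSStoch achieve the lowest mean forecast WIS, with the main gains in probabilistic forecast quality.
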
AI中文摘要

预测疫情轨迹对公共卫生决策至关重要,但没有任何单一模型能在不同疫情阶段和预测场景中持续可靠。我们评估了多臂老虎机(MAB)启发的自适应加权策略,用于在组件模型性能随时间变化时组合疫情预测模型。利用来自三个疫情波次的美国COVID-19发病率数据,我们在固定短窗口和增长校准窗口下比较了UCB、EXP3和epsilon-greedy加权规则,包括确定性和随机集成变体。模型池包括SIR、SEIR、GLM、Gompertz、Richards、ARIMA、带漂移的随机游走、简单指数平滑、Holt线性趋势方法和指数增长。自适应集成与单个模型以及朴素、未加权和逆WIS加权集成基准进行比较。使用RMSE、加权区间分数(WIS)、95%预测区间覆盖率和平均95%预测区间宽度评估预测性能。在不同波次、校准窗口和预测时间跨度上,EXP3Stoch、EXP3Det和EPSStoch实现了最低的平均预测WIS。主要收益在于概率预测质量,特别是WIS和区间覆盖率,而非一致更低的点预测误差。简单基准(包括未加权和逆WIS集成)在若干场景中仍具竞争力。这些结果表明,MAB启发的自适应加权是疫情预测中有用的补充工具,尤其当模型技能随时间变化且预测不确定性较大时。

英文摘要

Forecasting epidemic trajectories is important for public health decision-making, but no single model is consistently reliable across epidemic phases and forecasting settings. We evaluate Multi-Armed Bandit (MAB)-inspired adaptive weighting strategies for combining epidemic forecasting models when component-model performance changes over time. Using U.S. COVID-19 incidence data from three epidemic waves, we compare UCB, EXP3, and epsilon-greedy weighting rules under fixed short-window and growing calibration windows, with both deterministic and stochastic ensemble variants. The model pool includes SIR, SEIR, GLM, Gompertz, Richards, ARIMA, random walk with drift, simple exponential smoothing, Holt's linear trend method, and exponential growth. Adaptive ensembles are compared with individual models and with naive, unweighted, and inverse-WIS weighted ensemble benchmarks. Forecast performance is assessed using RMSE, weighted interval score (WIS), 95% prediction-interval coverage, and mean 95% prediction-interval width. Across waves, calibration windows, and forecast horizons, EXP3Stoch, EXP3Det, and EPSStoch achieved the lowest mean forecast WIS. The main gains were in probabilistic forecast quality, especially WIS and interval coverage, rather than uniformly lower point forecast error. Simple benchmarks, including the unweighted and inverse-WIS ensembles, remained competitive in several settings. These results suggest that MAB-inspired adaptive weighting is a useful complementary tool for epidemic forecasting, especially when model skill is time-varying and forecast uncertainty is substantial.
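
To make the weighting idea concrete, here is a minimal sketch of an EXP3-style rule for combining forecasts. It is not the paper's implementation, which the abstract does not specify. Assumptions: every model's loss is observed each round, so the update uses all rewards directly, without the importance weighting of classic bandit EXP3; losses are mapped to rewards by dividing by the worst loss; and the deterministic and stochastic variants are read as "weighted average" and "sample one model". All names, the exploration rate `gamma`, and the numbers are hypothetical.

```python
import math
import random


def mixing_probabilities(weights, gamma):
    """EXP3-style probabilities: normalized weights mixed with uniform exploration."""
    k = len(weights)
    total = sum(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def update_weights(weights, rewards, gamma):
    """Multiply each weight by exp(gamma * reward / k); all models are scored each round."""
    k = len(weights)
    updated = [w * math.exp(gamma * r / k) for w, r in zip(weights, rewards)]
    top = max(updated)
    return [w / top for w in updated]  # rescaling does not change the probabilities


def losses_to_rewards(losses):
    """Map losses (for example WIS, lower is better) to rewards in [0, 1]."""
    worst = max(losses)
    if worst == 0:
        return [1.0] * len(losses)
    return [1 - loss / worst for loss in losses]


def deterministic_forecast(forecasts, probs):
    """Deterministic variant (assumed): probability-weighted average of model forecasts."""
    return sum(p * f for p, f in zip(probs, forecasts))


def stochastic_forecast(forecasts, probs, rng):
    """Stochastic variant (assumed): draw one model's forecast according to probs."""
    return rng.choices(forecasts, weights=probs, k=1)[0]


# Hypothetical per-round losses for three models over 20 rounds; model 0 is best.
gamma = 0.3
weights = [1.0, 1.0, 1.0]
for _ in range(20):
    weights = update_weights(weights, losses_to_rewards([1.0, 3.0, 2.0]), gamma)
probs = mixing_probabilities(weights, gamma)

forecasts = [100.0, 140.0, 120.0]  # hypothetical next-step forecasts
det = deterministic_forecast(forecasts, probs)
sto = stochastic_forecast(forecasts, probs, random.Random(0))
print(probs, det, sto)
```

The uniform mixing term keeps every model's weight bounded away from zero, which is one way to let a model regain influence when skill shifts between epidemic phases.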

2606.18295 2026-06-18 q-bio.QM 新提交

Archetypal Microbiome Profiles as Indicators of Nitrous Oxide Emission States in Activated Sludge

活性污泥中一氧化二氮排放状态的原型微生物组特征指标

Cheng Chen, Marcelo Seppi, Samir Suweis, Andreas Froemelt, Eberhard Morgenroth, Andreas Scheidegger, Carlo Albert

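AI summary Uses archetypal analysis (AA) to reduce activated sludge microbiomes to an interpretable low-dimensional state space; three archetypes capture 63%-73% of community variation, high-N2O-emission samples concentrate around a specific archetype, and the approach offers an interpretable framework for tracking N2O emission states in full-scale WRRFs.
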
AI中文摘要

水资源回收设施(WRRFs)的一氧化二氮(N2O)排放随时间波动,可能源于多种微生物途径,使得源归因和全尺度预测困难。活性污泥微生物组的高维度进一步加剧了难度,其复杂动态的群落结构可能掩盖与N2O排放模式的关系。本研究评估了活性污泥微生物组的可解释低维表示是否与N2O排放状态相关。从瑞士两个全尺度WRRFs收集了时间序列16S rRNA基因扩增子谱和N2O排放指标。使用原型分析(AA)汇总属级相对丰度谱,将每个样本表示为少量可解释群落原型的凸组合。在两个WRRFs中,三个原型捕获了群落组成中大部分可解释变异(63%-73%),并定义了一个单纯形状态空间,其中样本聚集在顶点和边缘附近,表明群落组成围绕不同的原型状态及其混合组织。在训练时不使用排放标签的情况下,原型状态空间与二元N2O排放状态强烈对齐:两个工厂的高排放观测集中在特定原型周围,时间轨迹显示在高排放期间该原型的权重持续较高。功能总结表明高N2O原型具有位点特异性但途径相关的解释。温度进一步结构化原型状态空间,表明与N2O升高相关的微生物组配置的季节性驱动。总体而言,AA提供了一个可解释的框架来追踪微生物组状态转变,并可能支持全尺度WRRFs中高N2O排放状态的运行追踪。

英文摘要

Nitrous oxide (N2O) emissions from water resource recovery facilities (WRRFs) fluctuate over time and can arise from multiple microbial pathways, making source attribution and full-scale prediction difficult. The difficulty is compounded by the high dimensionality of activated sludge microbiomes, whose complex and dynamic community structure can obscure relationships with N2O emission patterns. This study evaluated whether interpretable, low-dimensional representations of activated sludge microbiomes can be correlated with N2O emission states. Temporal 16S rRNA gene amplicon profiles and N2O emission metrics were collected from two full-scale WRRFs in Switzerland. Genus-level relative-abundance profiles were summarized using archetypal analysis (AA), which represents each sample as a convex combination of a small number of interpretable community profiles. In both WRRFs, three archetypes captured most explainable variation in community composition (63%--73%) and defined a simplex state space in which samples clustered near vertices and edges, indicating that community compositions were organized around distinct archetypal states and their mixtures. Without using emission labels while training, the archetypal state space aligned strongly with binary N2O emission states: high-emission observations in both plants concentrated around a specific archetype, and temporal trajectories showed consistent high weights of this archetype during high-emission periods. Functional summaries suggested site-specific but pathway-relevant interpretations of the high-N2O archetype. Temperature further structured the archetypal state space, indicating seasonal forcing of microbiome configurations associated with elevated N2O. Overall, AA provides an interpretable framework to track microbiome regime shifts and may support operational tracking of high-N2O emission states in full-scale WRRFs.